Integrating Machine Learning and Communication Engineering

A spectre is haunting conferences on communication and information theory: the spectre of machine learning.

Some find the (relatively) new tools a liberation from the time-consuming task of acquiring domain knowledge and advanced analytical skills. For this camp, machine learning shifts the engineering problem from the traditional model-based optimization of individual functionalities to an overall data-driven end-to-end design.

Others hold the reduction of communication engineering to the choice of objective functions and hyperparameters of general-purpose machine learning algorithms to be disturbing and demeaning. This second camp views machine learning as an excuse to cop out on the engineering responsibilities of ensuring reliable and verifiable performance guarantees.

And yet, to continue with the initial paraphrase: Where is the research group that has not attempted to apply machine learning to well-studied communication problems? Where is the communication researcher who has not tweaked hyperparameters so as to reproduce known engineering solutions?

Some perspective seems needed in order to find a beneficial synthesis between the two extreme viewpoints that may open new opportunities rather than transform communication engineering into a branch of computer science. To this end, I would propose that the use of machine learning in communications is justified when:

1) a good mathematical model exists for the problem at hand, and:

  • an objective function is hard to define or to optimize mathematically (even approximately);
  • it is easy to draw data samples from the model;
  • it is easy to obtain desired pairs of well-specified inputs and outputs (supervised learning), or to provide feedback on sequentially selected outputs (reinforcement learning);
  • the task does not require detailed explanations or performance guarantees;

or 2) a good mathematical model does not exist, and:

  • it is easy to collect real-world data;
  • the phenomenon under study does not change rapidly over time;
  • it is easy to obtain desired pairs of well-specified inputs and outputs (supervised learning), to assess the quality of the generated output (unsupervised learning), or to provide feedback on sequentially selected outputs (reinforcement learning);
  • the task does not require detailed explanations or performance guarantees.

Even when the use of machine learning methods is justified, the adoption of a data-driven approach does not entail the complete rejection of domain knowledge and engineering principles. As a case in point, efficient message passing schemes are known to generally fail at decoding dense channel codes, whose graphical representations contain many short cycles. Nevertheless, the structure of a message passing algorithm can still be leveraged to define a parametrized scheme that can be tuned based on input-output examples.
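To make the idea concrete, here is a minimal sketch under several assumptions that are not in the text: a toy (7,4) Hamming code (small, but its Tanner graph already contains 4-cycles), BPSK transmission over an AWGN channel, and the all-zero codeword, which suffices for a symmetric decoder on a symmetric channel. The parametrized scheme is normalized min-sum decoding, in which a single factor `alpha` scales every check-to-variable message; `alpha` is tuned by picking, from a grid, the value with the fewest bit errors on simulated input-output examples. The names `min_sum_decode` and `tune_alpha` are illustrative.

```python
import random

# Parity-check matrix of the (7,4) Hamming code (a toy stand-in for a dense code).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
CHECK_NBRS = [[v for v, h in enumerate(row) if h] for row in H]
VAR_NBRS = [[c for c, row in enumerate(H) if row[v]] for v in range(len(H[0]))]


def min_sum_decode(llr, alpha, iters=5):
    """Normalized min-sum: alpha scales every check-to-variable message."""
    v2c = {(c, v): llr[v] for c, vs in enumerate(CHECK_NBRS) for v in vs}
    c2v = {}
    for _ in range(iters):
        # Check nodes: product of signs and minimum magnitude of the other incoming messages.
        for c, vs in enumerate(CHECK_NBRS):
            for v in vs:
                others = [v2c[(c, u)] for u in vs if u != v]
                sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
                c2v[(c, v)] = alpha * sign * min(abs(m) for m in others)
        # Variable nodes: channel LLR plus all incoming messages except the recipient's own.
        for v, cs in enumerate(VAR_NBRS):
            total = llr[v] + sum(c2v[(c, v)] for c in cs)
            for c in cs:
                v2c[(c, v)] = total - c2v[(c, v)]
    posterior = [llr[v] + sum(c2v.get((c, v), 0.0) for c in VAR_NBRS[v])
                 for v in range(len(llr))]
    return [1 if p < 0 else 0 for p in posterior]


def tune_alpha(candidates, sigma=0.8, n_examples=500, iters=5, seed=0):
    """Pick the scaling factor with the fewest bit errors on simulated examples."""
    rng = random.Random(seed)
    # All-zero codeword sent as BPSK (+1 per bit) over AWGN; LLR = 2y / sigma^2.
    examples = [[2.0 * (1.0 + rng.gauss(0.0, sigma)) / sigma ** 2 for _ in range(7)]
                for _ in range(n_examples)]
    # Every decoded 1 is a bit error, since the all-zero codeword was sent.
    # The same examples are reused for every candidate, so the comparison is fair.
    return min(candidates,
               key=lambda a: sum(sum(min_sum_decode(llr, a, iters)) for llr in examples))


best_alpha = tune_alpha([0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
```

Learned variants of this idea assign a separate weight to each edge of the graph and train all weights by gradient descent; the single grid-searched factor above only keeps the sketch short.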

More broadly, engineering a modern communication system is likely to call for a macro-classification of subtasks for which either model- or data-driven approaches are most suitable, followed by a micro-design of individual subproblems and by a possible end-to-end fine-tuning of parameters via learning methods.

Computer scientists and communication engineers of all countries, unite?


Snow Crash

The novel “Snow Crash”, published in 1992 by Neal Stephenson, is often invoked as a prime example of fiction that has shaped, and not merely predicted, today’s technological landscape. The following lightly commented list of quotes from the first one hundred pages offers ample justification for the novel’s reputation. [Comments are in brackets.]

… all of the information got converted into machine-readable form, which is to say, ones and zeroes.  And as the number of media grew, the material became more up to date, and the methods for searching the Library became more up to date […] Millions […] are uploading millions of other fragments at the same time. […] clients, mostly large corporations and Sovereigns, rifle through the Library looking for useful information, and if they find a use for something that Hiro put into it, Hiro gets paid.

[Mosaic, the first popular web browser, was introduced only in 1993, and the first widely known search engine, WebCrawler, came out in 1994. Rather than relying on systems of payments as envisaged in the novel, the world wide web was built on an advertisement-based model. This system is currently being challenged by the changing legal and political climate, while new technologies for micro-payments are emerging.]

“By drawing a slightly different image in front of each eye, the image can be made three-dimensional. By changing the image seventy-two times a second, it can be made to move. By drawing the moving three-dimensional image at a resolution of 2K pixels on a side, it can be as sharp as the eye can perceive, and by pumping stereo digital sound through the little earphones, the moving 3-D pictures can have a perfectly realistic soundtrack. So Hiro’s not actually here at all. He’s in a computer-generated universe that his computer is drawing onto his goggles and pumping into his earphones. In the lingo, this imaginary place is known as the Metaverse.”

[Metaverse, a term coined in the novel, refers to the idea of a shared virtual space. This concept has been extremely influential, inspiring game designers and VR developers.]

“It was, of course, nothing more than sexism, the especially virulent type espoused by male techies who sincerely believe that they are too smart to be sexists.”

[Sadly still relevant today.]

There is something new: A globe about the size of a grapefruit, a perfectly detailed rendition of Planet Earth, hanging in space at arm’s length in front of his eyes. […] It is a piece of […] software called, simply, Earth. It is the user interface that CIC uses to keep track of every bit of spatial information that it owns — all the maps, weather data, architectural plans, and satellite surveillance stuff.

[This idea has reportedly inspired the development of Google Earth.]

“For the most part I write myself,” explains an automated online search engine. “That is, I have the innate ability to learn from experience. But this ability was originally coded into me by my creator.”

[A good self-definition of a machine learning algorithm.]

Human-Imitative AI vs. Useful AI

In a recent post, Michael I. Jordan distinguishes the following aspirations for AI-related research:

  • Human imitation: AI should behave in a way that is indistinguishable from that of a human being.
  • Intelligence Augmentation (IA): AI should augment human capacity to think, communicate, and create.
  • Intelligence Infrastructure (II): AI should manage a network of computing and communicating agents, markets, and repositories of information in a way that is efficient, supportive of human and societal needs, as well as economically and legally viable.

Human-imitative AI is often singled out by technologists, academicians, journalists and venture capitalists as the real aspiration and end goal of AI research. However, most progress in “AI” to date has not concerned high-level abstract thinking machines, but rather low-level engineering solutions with roots in operations research, statistics, pattern recognition, information theory and control theory.

In fact, as argued by Jordan, the single-minded focus on human-imitative AI has become a distraction from the more useful endeavor of addressing IA and II. To this end, the post calls for the founding of a new engineering discipline building on

“ideas that the preceding century gave substance to — ideas such as “information,” “algorithm,” “data,” “uncertainty,” “computing,” “inference,” and “optimization.” Moreover, since much of the focus of the new discipline will be on data from and about humans, its development will require perspectives from the social sciences and humanities.”

The development of such a discipline would call not only for large-scale targeted research efforts, but also for new higher education programs. As proposed in “Robot-Proof”, the new programs should impart data literacy, technological literacy, and human literacy:

Students will need data literacy to manage the flow of big data, and technological literacy to know how their machines work, but human literacy–the humanities, communication, and design–to function as a human being. Life-long learning opportunities will support their ability to adapt to change.

Information Theory is…


[According to Google’s Talk to Books]

“Information theory is a branch of applied mathematics providing a framework allowing the quantification of the information generated or transmitted through a communication channel.” from The Manual of Photography by Elizabeth Allen and Sophie Triantaphillidou

“Information theory is a mathematical theory dealing with highly precise aspects of the communication of information in terms of bits through well-defined channels. The theory of international politics developed in this volume is non-mathematical and non-precise.” from System and Process in International Politics by Morton A. Kaplan

“Information theory is a branch of science, mathematics, and engineering that studies information in a physical and mathematical context rather than a psychological framework.” from Assessing and Measuring Environmental Impact and Sustainability by Jiri J. Klemes

“Information theory is the science of message transmission developed by Claude Shannon and other engineers at Bell Telephone Laboratories in the late 1940s. It provides a mathematical means of measuring information.” from The Creation Hypothesis: Scientific Evidence for an Intelligent Designer by James Porter Moreland

“Information theory is a branch of statistics and probability theory dealing with the study of data, ways to manipulate it (for instance, cryptography and compression) and communicate it (for instance, data transmissions and communication systems).” from AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors by Alex J. Champandard

“Information theory provides a means to quantify the complexity of information that can be used in the design of communication systems (Shannon 1948). It originated during World War II as a tool for assuring the successful transmission…” from Oceanography and Marine Biology: An Annual Review by R. N. Gibson, R. J. A. Atkinson, and J. D. M. Gordon

And so on…
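Several of the quoted definitions speak of measuring information in bits. As a small illustration that does not come from any of the quoted books, the Shannon entropy H = −Σ p log₂ p of a distribution gives the average number of bits per symbol. The helper names below are illustrative, and `empirical_entropy` is a simple plug-in estimate from observed symbol frequencies.

```python
import math
from collections import Counter


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def empirical_entropy(message):
    """Entropy of the symbol frequencies observed in a message (plug-in estimate)."""
    counts = Counter(message)
    n = len(message)
    return entropy_bits(c / n for c in counts.values())


print(entropy_bits([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy_bits([0.25] * 4))   # four equally likely symbols: 2.0 bits
print(empirical_entropy("abab"))  # 1.0 bit per symbol
```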


Rethinking Wearables

Watch any episode of Mad Men and you are likely to be struck by the casual way in which cigarettes were lit and smoked in quick succession in the 60s: How could everyone be so oblivious to the health risks? A new report by the National Toxicology Program of the US Department of Health and Human Services raises concerns that we may have been similarly nonchalant about the use of wireless devices over the last twenty years. While the findings are not conclusive, there appears to be enough evidence to recommend reconsidering the pervasive use of wearable wireless devices around the human body. For further discussion, I recommend this interview.

AI = RL + DL?


One often hears the following two statements made in the same breath: that the current resurgence of artificial intelligence research will fundamentally transform the way in which we live, and that we are on the verge of mastering general intelligence. The frequent conflating of these two strikingly different assessments has led to predictions ranging from an evolutionary catastrophe for the human race to the onset of a downward cycle of AI innovation brought about by overhype. In order to make sense of these claims, it is useful to take a deeper look at the current state of the art and to walk a few steps back in history for some perspective.

One of the most publicized successes of modern AI is that of programs, developed most notably by DeepMind, that have mastered complex games such as Go, obtaining super-human performance. The topic is considered to be of sufficient dramatic heft to justify a Netflix production. An aspect that has particularly captured the attention of commentators is the capability of these algorithms to learn from a blank slate, without requiring even an initial nudge by the programmers towards strategies that human players have found to be effective.

The engine underlying these programs is reinforcement learning, which builds on the idea of using feedback from a large number of simulated experiences to slowly gather information about the environment and/or the optimal strategies to be adopted. The specific algorithms employed date from the 70s and can be eloquently explained by a cartoon. What the new AI wave has brought to the table is hence, by and large, not a novel understanding of intelligence, but rather a bag of clever tricks that allows old algorithms to make an effective statistical use of unprecedented computational resources. Today’s algorithms would look very familiar to a researcher of the 80s. But they would also be useless: with her computing technology, the researcher would have to wait a few million years to obtain the same results that now take just a few days on modern processors.
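The feedback loop described above can be illustrated with a minimal sketch of tabular Q-learning, chosen here only for brevity; it is not the algorithm behind the Go programs, which combine deep networks with tree search. The five-state corridor environment, the reward of 1 at the goal, and all hyperparameters are hypothetical choices for illustration. The agent starts with no knowledge and improves its action-value estimates purely from simulated experience.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
MOVES = (-1, +1)      # action 0 moves left, action 1 moves right


def step(state, action):
    """Simulated environment: reward 1 when the goal is reached, 0 otherwise."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: improve action-value estimates from simulated feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next, r, done = step(s, a)
            # Temporal-difference update toward reward + discounted best next value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)  # expected: "right" in every non-goal state
```

Nothing in the learner encodes the fact that moving right is good; that knowledge emerges from many simulated episodes, which is exactly where abundant computation pays off.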

In fact, even the most straightforward black-box optimization schemes, such as evolution strategies, have been shown to provide results competitive with the state of the art. These schemes simply test many random perturbations of the current strategy by leveraging computing parallelism, and they modify the current solution according to the feedback received from simulations of the environment. As such, these methods rely only on the capability of the computing system to simulate the effect of many variations of the actions on the environment across test runs.
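A minimal sketch of a basic Gaussian evolution strategy in the spirit described above: perturb the parameters with random noise, score each perturbation, and step along the score-weighted average of the perturbations. The quadratic `reward` below is a hypothetical stand-in for the feedback from simulated environments, and the population size, noise scale, and step size are illustrative assumptions. Subtracting the mean score is a common variance-reduction baseline.

```python
import random


def reward(theta, target=(3.0, -2.0)):
    """Hypothetical simulated feedback: highest (zero) when theta equals target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def evolution_strategy(f, theta, iters=300, pop=50, sigma=0.1, lr=0.02, seed=0):
    """Basic Gaussian evolution strategy that maximizes the black-box function f."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        # Random perturbations of the current strategy; these evaluations are
        # independent of each other and could run in parallel.
        eps = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop)]
        scores = [f([t + sigma * e for t, e in zip(theta, ei)]) for ei in eps]
        baseline = sum(scores) / pop
        # Move along the score-weighted average perturbation (a gradient estimate).
        for j in range(len(theta)):
            g = sum((s - baseline) * ei[j] for s, ei in zip(scores, eps)) / (pop * sigma)
            theta[j] += lr * g
    return theta


theta = evolution_strategy(reward, [0.0, 0.0])  # should end close to (3, -2)
```

Note that `evolution_strategy` never inspects the internals of `f`: it only queries it, which is why this family of methods scales so directly with the number of simulations a computer can run.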

The fact that sheer computing power is to be credited for the most visible successes of AI should give us pause when commenting on our understanding of intelligence. The key principles at play in the current AI wave are in fact still those postulated by Norbert Wiener’s cybernetics, namely information processing and feedback. Whether these are the right principles on which to build a theory of intelligence appears to be a valid open question.

As nicely summarized in Jessica Riskin’s essay, our understanding of intelligent behavior has shifted over the centuries to concentrate on different manifestations of intelligence, such as motion or programmability. For example, following Aristotle’s definition of living beings as things that can move at will, hydraulic automata that can make water travel upward, against gravity, would qualify as intelligent. In light of the apparent limitations of our current understanding of intelligence, artificial or otherwise, will some new principle of intelligence emerge that will make our current established framework appear as quaint as Aristotle’s?

Mind-Body Computing

As any quantitative researcher knows all too well, visualizing data facilitates interpretation and extrapolation, and can be a powerful tool for solving problems and motivating decisions and actions. Visualization allows us to leverage our intuitive sense of space in order to grasp connections and relationships, as well as to notice parallels and analogies. (It can, of course, also be used to confuse.)

Modern machine learning algorithms operate on very high-dimensional data structures that cannot be directly visualized by humans. In this sense, machines can “see”, and “think”, in spaces that are inaccessible to our eyes. To put it in Richard Hamming’s words:

“Just as there are odors that dogs can smell and we cannot, as well as sounds that dogs can hear and we cannot, so too there are wavelengths of light we cannot see and flavors we cannot taste. Why then, given our brains wired the way they are, does the remark ‘Perhaps there are thoughts we cannot think’, surprise you?”

In an era of smart homes, smart cities, and smart governments, methods that visualize high-dimensional data in two dimensions can allow us to bridge, albeit partially, the understanding gap between humans and algorithms. This explains the success of techniques such as t-SNE, especially when coupled with interactive graphical interfaces.
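A compact sketch may help make the t-SNE idea concrete: affinities between points in the original space are Gaussian, with a width calibrated per point so that each neighborhood has a prescribed perplexity; affinities in the 2-D map use a heavy-tailed Student-t kernel; and the map is obtained by gradient descent on the KL divergence between the two. This is a simplified, exact O(n²) version written for toy inputs (no adaptive gains, no Barnes-Hut approximation), and the step size, iteration counts, and two-cluster data set are illustrative assumptions. For real use, a maintained implementation such as scikit-learn's `sklearn.manifold.TSNE` is the safer choice.

```python
import numpy as np


def conditional_probs(d2_row, perplexity, tol=1e-5, max_iter=50):
    """Gaussian affinities for one point; bisect the precision to hit the target perplexity."""
    d = d2_row - d2_row.min()            # shift for numerical stability (same distribution)
    beta, lo, hi = 1.0, 0.0, np.inf
    target = np.log(perplexity)          # desired entropy in nats
    for _ in range(max_iter):
        p = np.exp(-beta * d)
        z = p.sum()
        entropy = np.log(z) + beta * (d * p).sum() / z
        if abs(entropy - target) < tol:
            break
        if entropy > target:             # too flat: sharpen the Gaussian
            lo = beta
            beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
        else:                            # too peaked: widen it
            hi = beta
            beta = (beta + lo) / 2.0
    return p / z


def tsne(x, dim=2, perplexity=5.0, iters=500, lr=1.0, exaggeration=4.0, seed=0):
    """Exact t-SNE by gradient descent with momentum (step size tuned for ~30 points)."""
    n = x.shape[0]
    sq = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    p = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        p[i, others] = conditional_probs(d2[i, others], perplexity)
    p = np.maximum((p + p.T) / (2.0 * n), 1e-12)    # symmetric joint affinities
    y = np.random.default_rng(seed).normal(scale=1e-4, size=(n, dim))
    velocity = np.zeros_like(y)
    for t in range(iters):
        sy = (y ** 2).sum(axis=1)
        num = 1.0 / (1.0 + sy[:, None] + sy[None, :] - 2.0 * y @ y.T)  # Student-t kernel
        np.fill_diagonal(num, 0.0)
        q = np.maximum(num / num.sum(), 1e-12)
        exag = exaggeration if t < 100 else 1.0      # early exaggeration phase
        w = (exag * p - q) * num
        grad = 4.0 * (np.diag(w.sum(axis=1)) - w) @ y  # gradient of KL(P || Q)
        momentum = 0.5 if t < 250 else 0.8
        velocity = momentum * velocity - lr * grad
        y = y + velocity
    return y


# Two hypothetical clusters of 15 points each in 5 dimensions.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (15, 5)), rng.normal(10.0, 1.0, (15, 5))])
emb = tsne(x)   # 30 x 2 coordinates, ready for a scatter plot
```

The resulting `emb` array can be passed to any plotting library; with the two well-separated clusters used here, the two groups should appear as separate blobs in the plane.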

As virtual reality and mixed-reality technologies vie with standard two-dimensional interfaces as the dominant medium between us and the machines, data visualization and interactive representation stand to gain another dimension. And the difference may not be merely a quantitative one. As suggested by Jaron Lanier:

“People think differently when they express themselves physically. […] Having a piano in front of me makes me smarter by applying the biggest part of my cortex, the part associated with haptics. […] Don’t just think of VR as the place where you can look at a molecule in 3-D, or perhaps handle one, like all those psychiatrists in Freud avatars. No! VR is the place where you become a molecule. Where you learn to think like a molecule. Your brain is waiting for the chance.”

As such, VR may allow humans “to explore motor cortex intelligence”. Can this result in a new wave of innovations and discoveries?

Common Sense and Smart Appliances


At CES 2018, most new consumer products, such as smart appliances and self-driving cars, will sport the label “AI”. As the New York Times puts it: “the real star is artificial intelligence, the culmination of software, algorithms and sensors working together to make your everyday appliances smarter and more automated”. Given the economic and intellectual clout of the acronym AI, it is worth pausing to question its significance, and to reflect on the possible implications of its inflated use in the tech industry and in the academic world.

In 1978, John McCarthy quipped that producing a human-level, or general, AI would require “1.7 Einsteins, 2 Maxwells, 5 Faradays and .3 Manhattan Projects.” Forty years later, despite many predictions of an upcoming age of superintelligent machines, little progress has been made towards the goal of a general AI. The AI powering the new consumer devices is in fact mostly deep learning, a specialized learning technique that performs pattern recognition by interpolating across a large number of data points.

Nearly twenty years earlier, in 1959, in his seminal paper “Programs with Common Sense”, McCarthy associated human-like intelligence with the capability to deduce consequences by extrapolating from experience and knowledge even in the presence of previously unforeseen circumstances. This is in stark contrast with the interpolation of observations to closely related contingencies enabled by deep learning. According to this view, intelligent thinking is the application of common sense to long-tail phenomena, that is, to previously unobserved events that are outside the “manifold” spanned by the available data.
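The gap between interpolation and extrapolation can be seen even in a tiny example. Here k-nearest-neighbor regression serves as a deliberately simple stand-in for pattern recognition by interpolation (it is not deep learning), and the "phenomenon" y = sin(x), observed only on [0, π], is a hypothetical choice for illustration. Predictions are accurate inside the observed range and badly wrong far outside it.

```python
import math


def knn_regress(train, x, k=3):
    """Predict y at x by averaging the targets of the k nearest training inputs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical phenomenon y = sin(x), observed only on the interval [0, pi].
train = [(i * math.pi / 50, math.sin(i * math.pi / 50)) for i in range(51)]

x_in, x_out = 1.0, 1.5 * math.pi   # inside vs. far outside the observed range
err_in = abs(knn_regress(train, x_in) - math.sin(x_in))
err_out = abs(knn_regress(train, x_out) - math.sin(x_out))
# err_in is small; err_out is roughly 1, because the prediction stays near
# sin(pi) = 0 while the true value is sin(1.5 * pi) = -1.
```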

This generalized form of intelligence appears to be strongly related to the ability to acquire and process information through language. A machine with common sense should be able to answer queries such as “If you stick a pin into a carrot, does it make a hole in the carrot or in the pin?”, without having to rely on many joint observations of carrots and pins. To use Hector Levesque’s distinction, if state-of-the-art machine learning techniques acquire street smarts by learning from repeated observations, general AI requires the book smarts necessary to learn from written or spoken language. Language is indeed widely considered to loom as the next big frontier of AI.

As most of the academic talent in AI is recruited by the Big Five (Amazon, Apple, Google, Facebook and Microsoft), the economic incentives for the development of general AI seem insufficient to meet the level of effort posited by McCarthy. And so, rather than worrying about the takeover of a super-AI, given the dominance of the current state-of-the-art specialized AI, what we should “be most concerned about is the possibility of computer systems that are less than fully intelligent, but are nonetheless considered to be intelligent enough to be given the authority to control machines and make decisions on their own. The true danger […] is with systems without common sense making decisions where common sense is needed.” (Hector Levesque, “Common Sense, the Turing Test, and the Quest for Real AI”).