What shall we do tonight? You ask. Friend A is flexible — You choose — while Friend B has a strong opinion and lets you know it. Which friend is being kinder to you?
In “Algorithms to Live By”, the authors argue that Friend B is the more generous of the two:
“Seemingly innocuous language like ‘Oh, I’m flexible’ […] has a dark computational underbelly that should make you think twice. It has the veneer of kindness about it, but it does two deeply alarming things. First, it passes the cognitive buck: ‘Here’s a problem, you handle it.’ Second, by not stating your preferences, it invites the others to simulate or imagine them.”
If we allow that deciding on a plan for the evening is mostly a matter of computation, this computational kindness principle has evident merits and I, for one, am nodding in agreement. So are the big tech companies, all furiously competing to be your best Friend B. Google’s latest campaign, “Make Google do it”, makes this plain: the ambition is to think for us, giving us directions and telling us what to buy, what to watch, where to go, whom to date, and so on.
The amount of cognitive offload from humans to AI-powered apps is an evident, and possibly defining, trend of our times. As pointed out by James Bridle in “New Dark Age: Technology and the End of the Future”, this process is accompanied by the spreading “belief that any given problem can be solved by the application of computation”, so that
“Computation replaces conscious thought. We think more and more like the machine, or we do not think at all.”
A typical way to justify our over-reliance on machines as surrogates for our own cognitive capacities is to point to the complexity of the modern world, which has been compounded by the inter-connectivity brought about by the Internet. This view echoes this prescient 1926 passage by H. P. Lovecraft as cited by Bridle:
“The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents. We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall either go mad from the revelation or flee from the light into the peace and safety of a new dark age.”
But the effects of this unprecedented cognitive offload, even on our health, are at best unclear. Ominously, Vivienne Ming warns that, as the use of apps deprives our brains of the exercise they have become used to over millions of years, we might see widespread early-onset dementia within a single generation.
Some find the (relatively) new tools a liberation from the time-consuming task of acquiring domain knowledge and advanced analytical skills. For this camp, machine learning shifts the engineering problem from the traditional model-based optimization of individual functionalities to an overall data-driven end-to-end design.
Others hold the reduction of communication engineering to the choice of objective functions and hyperparameters of general-purpose machine learning algorithms to be disturbing and demeaning. This second camp views machine learning as an excuse to cop out on the engineering responsibilities of ensuring reliable and verifiable performance guarantees.
And yet, to continue with the initial paraphrase: Where is the research group that has not attempted to apply machine learning to well-studied communication problems? Where is the communication researcher who has not tweaked hyperparameters so as to reproduce known engineering solutions?
Some perspective seems needed in order to find a beneficial synthesis between the two extreme viewpoints that may open new opportunities rather than transform communication engineering into a branch of computer science. To this end, I would propose that the use of machine learning in communications is justified when:
1) a good mathematical model exists for the problem at hand, and:
- an objective function is hard to define or to optimize mathematically (even approximately);
- it is easy to draw data samples from the model;
- it is easy to evaluate desired pairs of well-specified inputs and outputs (supervised learning), or to provide feedback on sequentially selected outputs (reinforcement learning);
- the task does not require detailed explanations or performance guarantees;
or 2) a good mathematical model does not exist, and:
- it is easy to collect real-world data;
- the phenomenon under study does not change rapidly over time;
- it is easy to evaluate desired pairs of well-specified inputs and outputs (supervised learning), to assess the quality of the generated output (unsupervised learning), or to provide feedback on sequentially selected outputs (reinforcement learning);
- the task does not require detailed explanations or performance guarantees.
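To make the two-branch structure of these conditions explicit, here is a minimal sketch that encodes them as a yes/no check. The class `Task`, its field names, and the function `ml_is_justified` are hypothetical names introduced only for illustration, and the unsupervised-learning option of the second branch is folded into a single "easy to supervise" flag for brevity.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical yes/no answers to the checklist questions for one task."""
    good_model_exists: bool
    objective_hard_to_optimize: bool = False  # branch 1 only
    easy_to_sample_model: bool = False        # branch 1 only
    easy_to_collect_data: bool = False        # branch 2 only
    phenomenon_stable: bool = False           # branch 2 only
    easy_to_supervise: bool = False           # both branches
    needs_guarantees: bool = True             # both branches


def ml_is_justified(task: Task) -> bool:
    """Return True when every condition of the applicable branch holds."""
    common = task.easy_to_supervise and not task.needs_guarantees
    if task.good_model_exists:
        return (common and task.objective_hard_to_optimize
                and task.easy_to_sample_model)
    return common and task.easy_to_collect_data and task.phenomenon_stable


# Branch 1: a model exists, but the objective is hard to optimize.
case_a = Task(good_model_exists=True, objective_hard_to_optimize=True,
              easy_to_sample_model=True, easy_to_supervise=True,
              needs_guarantees=False)
# Branch 2: no model, data is plentiful, but the phenomenon changes quickly.
case_b = Task(good_model_exists=False, easy_to_collect_data=True,
              phenomenon_stable=False, easy_to_supervise=True,
              needs_guarantees=False)

print(ml_is_justified(case_a))  # True
print(ml_is_justified(case_b))  # False
```

The point of the sketch is only that a single failed condition, such as a need for performance guarantees, is enough to tip the answer back towards a model-based design.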
Even when the use of machine learning methods is justified, the adoption of a data-driven approach does not entail the complete rejection of domain knowledge and engineering principles. As a case in point, the decoding of a dense channel code via efficient message passing schemes is known to fail in general. Nevertheless, the structure of a message passing algorithm can still be leveraged to define a parametrized scheme that can be tuned based on input-output examples.
More broadly, engineering a modern communication system is likely to call for a macro-classification of subtasks for which either model- or data-driven approaches are most suitable, followed by a micro-design of individual subproblems and by a possible end-to-end fine-tuning of parameters via learning methods.
Computer scientists and communication engineers of all countries, unite?
Watch any episode of Mad Men and you are likely to be struck by the casual way in which cigarettes were lit and smoked in quick succession in the 60s: How could everyone be so oblivious to the health risks? A new report by the National Toxicology Program of the US Department of Health and Human Services raises concerns that we may have been similarly nonchalant about the use of wireless devices in the last twenty years. While the findings are not conclusive, there appears to be enough evidence to recommend a reconsideration of a pervasive use of wearable wireless devices around the human body. For further discussion, I recommend this interview.
At CES 2018, most new consumer products, such as smart appliances and self-driving cars, will sport the label “AI”. As the New York Times puts it: “the real star is artificial intelligence, the culmination of software, algorithms and sensors working together to make your everyday appliances smarter and more automated”. Given the economic and intellectual clout of the acronym AI, it is worth pausing to question its significance, and to reflect on the possible implications of its inflated use in the tech industry and in the academic world.
In 1978, John McCarthy quipped that producing a human-level, or general, AI would require “1.7 Einsteins, 2 Maxwells, 5 Faradays and .3 Manhattan Projects.” Forty years later, despite many predictions of an upcoming age of superintelligent machines, little progress has been made towards the goal of a general AI. The AI powering the new consumer devices is in fact mostly deep learning, a specialized learning technique that performs pattern recognition by interpolating across a large number of data points.
Nearly twenty years earlier, in 1959, in his seminal paper “Programs with Common Sense”, McCarthy associated human-like intelligence with the capability to deduce consequences by extrapolating from experience and knowledge even in the presence of previously unforeseen circumstances. This is in stark contrast with the interpolation of observations to closely related contingencies allowed by deep learning. According to this view, intelligent thinking is the application of common sense to long-tail phenomena, that is, to previously unobserved events that are outside the “manifold” spanned by the available data.
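The gap between interpolation and extrapolation can be illustrated with a toy example. The sketch below uses a 1-nearest-neighbour predictor as a deliberately simple stand-in for a pattern-recognition method (it is not deep learning), and the target function, training grid, and query points are all assumptions chosen for illustration.

```python
def true_function(x):
    """Toy ground truth, chosen arbitrarily for this illustration."""
    return x * x


# Training data covers only the interval [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [true_function(x) for x in train_x]


def predict(x):
    """1-nearest-neighbour: return the label of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def abs_error(x):
    """Absolute prediction error at the query point x."""
    return abs(predict(x) - true_function(x))


inside_error = abs_error(0.52)   # query within the range of the data
outside_error = abs_error(3.0)   # query far outside the range of the data

print(f"error inside the data range:  {inside_error:.4f}")
print(f"error outside the data range: {outside_error:.4f}")
```

Inside the range covered by the data the prediction is close to the truth; far outside it, the predictor can only repeat the nearest thing it has seen, and the error grows accordingly.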
This generalized form of intelligence appears to be strongly related to the ability to acquire and process information through language. A machine with common sense should be able to answer queries such as “If you stick a pin into a carrot, does it make a hole in the carrot or in the pin?”, without having to rely on many joint observations of carrots and pins. To use Hector Levesque’s distinction, if state-of-the-art machine learning techniques acquire street smarts by learning from repeated observations, general AI requires the book smarts necessary to learn from written or spoken language. Language is indeed widely considered to be the next big frontier of AI.
As most of the academic talent in AI is recruited by the Big Five (Amazon, Apple, Google, Facebook and Microsoft), the economic incentives for the development of general AI seem insufficient to meet the level of effort posited by McCarthy. And so, rather than worrying about the take-over of a super-AI, given the dominance of the current state-of-the-art specialized AI, what we should “be most concerned about is the possibility of computer systems that are less than fully intelligent, but are nonetheless considered to be intelligent enough to be given the authority to control machines and make decisions on their own. The true danger […] is with systems without common sense making decisions where common sense is needed.” (Hector Levesque, “Common Sense, the Turing Test, and the Quest for Real AI”).
Having taught courses on machine learning, I am often asked by colleagues and students with a background in engineering to suggest “the best place to start” to get into this subject. I typically respond with a list of books: for a general, but slightly outdated introduction, read this book; for a detailed survey of methods based on probabilistic models, check this other reference; to learn about statistical learning, I found this text useful; and so on. This answer strikes me, and most likely also my interlocutors, as quite unsatisfactory. This is especially so since the size of many of these books may be discouraging for busy professionals and students working on other projects. These notes are my first attempt to offer a basic and compact reference that describes key ideas and principles in simple terms and within a unified treatment, encompassing also more recent developments and pointers to the literature for further study. This is a work in progress and feedback is very welcome! (MIT Technology Review link)
Feed the data on the left (adapted from this book by Pearl and co-authors) to a learning machine. With confidence, the trained algorithm will predict lower cholesterol levels for individuals who exercise less. While counter-intuitive, the prediction is sound and supported by the available data. Nonetheless, no one could in good faith use the output of this algorithm as a prescription to reduce the number of hours at the gym.
This is clearly a case of correlation being distinct from causation. But how do we know? And how can we ensure that an AI Doctor would not interpret the data incorrectly and produce a harmful diagnosis?
We know because we have prior information on the problem domain. Thanks to our past experience, we can explain away this spurious correlation by including another measurable variable in the model, namely age. To see this, consider the same data, now redrawn by highlighting the age of the individual corresponding to each data point. The resulting figure, shown on the right, reveals that older people (within the observed bracket) tend to have higher cholesterol and also to exercise more: age is a common cause of both exercise and cholesterol levels. To capture the causal relationship between the latter two variables, we hence need to adjust for age. Doing this requires considering the trend within each age group separately, which recovers the expected conclusion that exercising helps lower one’s cholesterol.
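The effect of adjusting for age can be reproduced numerically. The sketch below is a minimal illustration on synthetic data: the records, the age groups, and the slopes are invented for this example and are not the data from the book. Helper names such as `pooled_slope` and `age_adjusted_slope` are likewise hypothetical.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def make_synthetic_data():
    """Invented (age, weekly exercise hours, cholesterol) records."""
    records = []
    for age in (20, 30, 40, 50):
        mean_exercise = age / 5           # assumption: older groups exercise more
        mean_chol = 150 + 2 * (age - 20)  # assumption: older groups have higher cholesterol
        for d in (-2, -1, 0, 1, 2):
            # Within a group, each extra hour lowers cholesterol by 3 units.
            records.append((age, mean_exercise + d, mean_chol - 3 * d))
    return records


def pooled_slope(records):
    """Trend seen when age is ignored."""
    return ols_slope([r[1] for r in records], [r[2] for r in records])


def age_adjusted_slope(records):
    """Trend after centring both variables within each age group."""
    groups = defaultdict(list)
    for age, exercise, chol in records:
        groups[age].append((exercise, chol))
    xs, ys = [], []
    for rows in groups.values():
        mean_e = sum(e for e, _ in rows) / len(rows)
        mean_c = sum(c for _, c in rows) / len(rows)
        xs += [e - mean_e for e, _ in rows]
        ys += [c - mean_c for _, c in rows]
    return ols_slope(xs, ys)


records = make_synthetic_data()
print(f"slope ignoring age:      {pooled_slope(records):+.2f}")
print(f"slope adjusting for age: {age_adjusted_slope(records):+.2f}")
```

On this synthetic data the pooled slope is positive while the age-adjusted slope is negative, which is the sign reversal described above. Note that the code can only adjust for age because a human decided that age belongs in the model.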
And yet an AI Doctor that is given only the data set in the first figure would have no way of deciding that the observed upward trend hides a spurious correlation through another variable. More generally, just like the AI Doctor blinded by a wrong model, AI algorithms used for hiring, rating teachers’ performance or credit assessment can confuse correlation with causation and produce biased, or even discriminatory, decisions.
As seen, solving this problem would require making modeling choices, identifying relevant variables and their relationships — a task that appears to require a human in the loop. Add this to the, still rather short, list of new jobs created by the introduction of AI and machine learning technologies in the workplace.