Turing, London, and Information Theory

In this one-page contribution to the first London Symposium on Information Theory, Alan Turing discusses learning as opposed to programming, the role of computational complexity in information theory, and genetic algorithms — in 1950.

(with thanks to Deniz Gündüz)

The Rise of Hybrid Digital-Analog

As a keen observer of nature, Leonardo da Vinci was more comfortable with geometry than with arithmetic. Shapes, being continuous quantities, were easier to fit, and disappear into, the observable world than discrete, discontinuous, numbers. For centuries since Leonardo, physics has shared his preference for analog thinking, building on calculus to describe macroscopic phenomena. The analog paradigm was upended at the beginning of the last century, when the quantum revolution revealed that the microscopic world behaves digitally, with observable quantities taking only discrete values. Quantum physics is, however, at heart a hybrid analog-digital theory, as it requires the presence of analog hidden variables to model the digital observations.

Computing technology appears to be following a similar path. The state-of-the-art computer that Claude Shannon found in Vannevar Bush’s lab at MIT in the thirties was analog: turning its wheels would set the parameters of a differential equation to be solved by the computer via integration. Shannon’s thesis and the invention of the transistor ushered in the era of digital computing and the information age, relegating analog computing to little more than a historical curiosity.

But analog computing retains important advantages over digital machines. Analog computers can be faster in carrying out specialized tasks. As an example, deep neural networks, which have led to the well-publicized breakthroughs in pattern recognition, reinforcement learning, and data generation tasks, are inherently analog (although they are currently mostly implemented on digital platforms). Furthermore, while the reliance of digital computing on either-or choices can provide a higher accuracy, it can also yield catastrophic failures. In contrast, the lower accuracy of analog systems is accompanied by a gradual performance loss in case of errors. Finally, analog computers can leverage time, not just as a neutral substrate for computation as in digital machines, but as an additional information-carrying dimension. The resulting space-time computing has the potential to reduce the energetic and spatial footprint of information processing.

The outlined complementarity of analog and digital computing has led experts to predict that hybrid digital-analog computers will be the way of the future. Even in the eighties, Terrence J. Sejnowski is reported to have said: “I suspect the computers used in the future will be hybrid designs, incorporating analog and digital.” This conjecture is supported by our current understanding of the operation of biological neurons, which communicate using the digital language of spikes, but maintain internal analog states in the form of membrane potentials.
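To make the analog-state, digital-output picture concrete, here is a minimal sketch of a discrete-time leaky integrate-and-fire neuron. It is a textbook simplification rather than a model of any specific biological neuron or neuromorphic chip, and the function name `simulate_lif` as well as the leak, threshold and input values are arbitrary assumptions chosen for illustration.

```python
def simulate_lif(inputs, leak=0.9, threshold=1.0):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    Returns (spikes, potentials): the 0/1 spike emitted at each step and
    the membrane potential left after that step.
    """
    v = 0.0  # analog internal state (membrane potential)
    spikes, potentials = [], []
    for x in inputs:
        v = leak * v + x      # leaky integration of the analog input
        if v >= threshold:    # digital event: emit a spike, then reset
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials


# A constant input of 0.3 per step (an arbitrary illustrative value).
spikes, potentials = simulate_lif([0.3] * 12)
print(spikes)  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
```

The only thing the neuron communicates is the binary spike train, while the information about how close it is to firing lives in the continuous variable `v`.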

With the emergence of academic and commercial neuromorphic processors, the rise of hybrid digital-analog computing may be just around the corner. As is often the case, the trend has been anticipated by fiction. In Autonomous, robots have a digital main logic unit with a human brain as a coprocessor to interpret people’s reactions and emotions. Analog elements can support common sense and humanity, in contrast to digital AI that “can make a perfect chess move while the room is on fire.” For instance, in H(a)ppy and Gnomon, analog is an element of disruption and reason in an ideally ordered and purified world under constant digital surveillance.

When Message and Meaning are One and the Same

The indigenous creatures of Embassytown (an outpost of the human diaspora somewhere/somewhen in the space-time imagined by China Miéville) communicate via the Language. Despite requiring two coordinated sound sources to be spoken, the Language does not have the capacity to express any duplicitous thought: Every message, in order to be perceived as part of the Language, must correspond to a physical reality. A spoken message is hence merely a link to an existing object, and it ceases being recognized as a message when the linked object is no longer in existence.

As Miéville describes it: “… each word of Language, sound isomorphic with some Real: not a thought, not really, only self-expressed worldness […] Language had always been redundant: it had only ever been the world.”

The Language upends Shannon’s premise that the semantic aspects of communication are irrelevant to the problem of transferring and storing information. In the Language, recorded sounds, untied to the state of the mind that produced them, do not carry any information. In a reversal of Shannon’s framework, information is thus inextricably linked to its meaning, and preserving information requires the maintenance of the physical object that embodies its semantics.

When message and meaning are one and the same as in the Language, information cannot be represented in any format other than in its original expression; Shannon’s information theory ceases to be applicable; and information becomes analog, irreproducible, and intrinsically physical. (And, as the events in the novel show, interactions with the human language may lead to some dramatic unforeseen consequences.)

A Brief Introduction to Machine Learning for Engineers

Having taught courses on machine learning, I am often asked by colleagues and students with a background in engineering to suggest “the best place to start” to get into this subject. I typically respond with a list of books: for a general, but slightly outdated introduction, read this book; for a detailed survey of methods based on probabilistic models, check this other reference; to learn about statistical learning, I found this text useful; and so on. This answer strikes me, and most likely also my interlocutors, as quite unsatisfactory. This is especially so since the size of many of these books may be discouraging for busy professionals and students working on other projects. These notes are my first attempt to offer a basic and compact reference that describes key ideas and principles in simple terms and within a unified treatment, encompassing also more recent developments and pointers to the literature for further study. This is a work in progress and feedback is very welcome! (MIT Technology Review link)

Human In the Loop

Feed the data on the left (adapted from this book by Pearl and co-authors) to a learning machine. With confidence, the trained algorithm will predict lower cholesterol levels for individuals who exercise less. While counter-intuitive, the prediction is sound and supported by the available data. Nonetheless, no one could in good faith use the output of this algorithm as a prescription to reduce the number of hours at the gym.

This is clearly a case of correlation being distinct from causation. But how do we know? And how can we ensure that an AI Doctor would not interpret the data incorrectly and produce a harmful diagnosis?

We know because we have prior information on the problem domain. Thanks to our past experience, we can explain away this spurious correlation by including another measurable variable in the model, namely age. To see this, consider the same data, now redrawn by highlighting the age of the individual corresponding to each data point. The resulting figure, shown on the right, reveals that older people (within the observed bracket) tend to have higher cholesterol as well as to exercise more: Age is a common cause of both exercise and cholesterol levels. In order to capture the causal relationship between the latter variables, we hence need to adjust for age. Doing this requires considering the trend within each age group separately, recovering the expected conclusion that exercising is useful to lower one’s cholesterol.
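The reversal can be reproduced with a few lines of code. The sketch below uses made-up, noise-free numbers (not the data from the book) in which age drives both exercise and cholesterol, while within each age group more exercise lowers cholesterol. The helper names `slope` and `make_data` and all coefficients are assumptions chosen only to illustrate the effect.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def make_data():
    """Synthetic rows of (age, weekly exercise hours, cholesterol)."""
    rows = []
    for age in (20, 30, 40, 50):
        for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
            exercise = age / 5 + delta                # older people exercise more
            cholesterol = 100 + 3 * age - 10 * delta  # age raises it, exercise lowers it
            rows.append((age, exercise, cholesterol))
    return rows


rows = make_data()

# Trend when age is ignored: cholesterol appears to grow with exercise.
pooled_slope = slope([r[1] for r in rows], [r[2] for r in rows])

# Trend within each age group: cholesterol falls with exercise.
slopes_by_age = {}
for age in sorted({r[0] for r in rows}):
    group = [r for r in rows if r[0] == age]
    slopes_by_age[age] = slope([r[1] for r in group], [r[2] for r in group])

print(pooled_slope)   # positive
print(slopes_by_age)  # negative in every age group
```

Nothing in the pooled data alone tells the algorithm that the second analysis is the right one; choosing to condition on age is a modeling decision that comes from outside the data.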

And yet an AI Doctor that is given only the data set in the first figure would have no way of deciding that the observed upward trend hides a spurious correlation through another variable. More generally, just like the AI Doctor blinded by a wrong model, AI algorithms used for hiring, rating teachers’ performance or credit assessment can mistake correlation for causation and produce biased, or even discriminatory, decisions.

As seen, solving this problem would require making modeling choices, identifying relevant variables and their relationships, a task that appears to require a human in the loop. Add this to the, still rather short, list of new jobs created by the introduction of AI and machine learning technologies in the workplace.

A Few Things I Didn’t Know About Claude Shannon

Claude SHANNON, US mathematician. 1962

  • While he was a student at MIT, Claude Shannon, the future father of Information Theory, trained as an aircraft pilot in his spare time (to the protestations of the instructor, who was worried about damaging such a promising brain).
  • What do Coco Chanel, Truman Capote, Albert Camus, Gandhi, Malcolm X and Claude Shannon have in common? They were all photographed by Henri Cartier-Bresson (see photo).
  • Having pioneered artificial intelligence research with his maze-solving mouse and his chess-playing machine, in 1984 Shannon proposed the following targets for 2001: 1) Beat the chess world champion (check); 2) Generate a poem accepted for publication by the New Yorker (work in progress); 3) Prove the Riemann hypothesis (work in progress); 4) Pick stocks outperforming the prime rate by 50% (check, although perhaps with some delay).
  • Shannon corresponded with L. Ron Hubbard of Scientology fame, writing about him that he “has been doing very interesting work lately in using a modified hypnotic technique for therapeutic purposes”, although he later conceded that he did not know “whether or not his treatment contains anything of value”.
  • He is quoted as saying that great insights spring from a “constructive dissatisfaction”, that is, “a slight irritation when things don’t quite look right”.

(From “A Mind at Play”, an excellent book about Claude Shannon by Jimmy Soni and Rob Goodman.)

Impossible Lines

In a formal field such as Information Theory (IT), the boundary between possible and impossible is well delineated: given a problem, the optimality of a solution can in principle be checked and determined unambiguously. As a pertinent example, IT says that there are ways to compress an “information source”, say a class of images, down to some file size, and that no conceivable solution could do any better than this theoretical limit. This is often a cause of confusion among newcomers, who tend to more naturally focus on improving existing solutions (say, on producing a better compression algorithm as in “Silicon Valley”) rather than asking whether the effort could be fruitful at all, given intrinsic informational limits.
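As a small illustration of such a limit, consider a memoryless source that emits three symbols with known probabilities. Shannon's source coding theorem says that no lossless symbol code can use fewer bits per symbol, on average, than the entropy of the source. The sketch below is a toy example under that memoryless assumption; the source distribution, the two codes and the helper names are all made up for illustration.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy, in bits per symbol, of a memoryless source."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)


def average_length(probs, code):
    """Expected number of code bits per source symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())


source = {"a": 0.5, "b": 0.25, "c": 0.25}
fixed_code = {"a": "00", "b": "01", "c": "10"}   # naive 2-bit encoding
prefix_code = {"a": "0", "b": "10", "c": "11"}   # shorter word for the likelier symbol

limit = entropy_bits(source)                   # 1.5 bits per symbol
naive = average_length(source, fixed_code)     # 2.0 bits per symbol
optimal = average_length(source, prefix_code)  # 1.5 bits per symbol

print(limit, naive, optimal)
```

Here the prefix code already meets the entropy limit, so any further effort spent on a "better" lossless code for this source is guaranteed to be wasted.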

The strong formalism has been among the key reasons for the many successes of IT, but — some may argue — it has also hindered its applications to a broader set of problems. (Claude Shannon himself famously warned about an excessively liberal use of the theory.) It is not unusual for IT experts to look with suspicion at fields such as Machine Learning (ML) in which the boundaries between possible and impossible are constantly redrawn by advances in algorithm design and computing power.

In fact, a less formal field such as ML allows practice to precede theory, letting the former push the state-of-the-art boundary in the process. As a case in point, deep neural networks, which power countless algorithms and applications, are still hardly understood from a theoretical viewpoint. The same is true for the more recent algorithmic framework of Generative Adversarial Networks (GANs). GANs can generate realistic images of faces, animals and rooms from datasets of related examples, producing fake faces, animals and rooms that cannot be distinguished from their real counterparts. It is expected that soon enough GANs will even be able to generate videos of events that never happened: watch Françoise Hardy discuss the current US president in the 60s. While the theory may be lagging behind, these methods are making significant practical contributions.

Interestingly, GANs can be interpreted in terms of information-theoretic quantities (namely the Jensen-Shannon divergence), showing that the gap between the two fields is perhaps not as unbridgeable as it has been broadly assumed to be, at least in recent years.
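For the original GAN formulation, the connection is explicit: when the discriminator is optimal, the generator's objective equals -log 4 plus twice the Jensen-Shannon divergence between the data distribution and the generator's distribution. The sketch below evaluates this for small discrete distributions. The distributions and function names are toy assumptions for illustration, and the code computes the divergence directly rather than training any network.

```python
from math import log


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between p and q in nats."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def generator_objective(p_data, p_gen):
    """Value of the original GAN objective at the optimal discriminator."""
    return -log(4) + 2 * jsd(p_data, p_gen)


real = [0.5, 0.5, 0.0, 0.0]
close = [0.4, 0.6, 0.0, 0.0]   # generator nearly matches the data
far = [0.0, 0.0, 0.5, 0.5]     # generator misses the data entirely

print(jsd(real, real))    # 0.0: a perfect generator
print(jsd(real, close))   # small but positive
print(jsd(real, far))     # log(2), the largest possible value
print(generator_objective(real, real))  # -log(4), the minimum of the objective
```

The divergence is zero only when the generator reproduces the data distribution exactly, which is why minimizing it is a sensible target for the generator.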

The Network & the Network

In “The City & the City”, China Miéville imagines an unusual coexistence arrangement between two cities located in the same geographical area that provides a surprisingly apt metaphor for the concept of network slicing in 5G networks: from the city & the city to the network & the network.

The two cities: Besźel and Ul Qoma occupy the same physical location, with buildings, squares, streets and parks either allocated completely to one city or “crosshatched”, that is, shared. The separation and isolation between the two cities is not ensured by physical borders, but is rather enforced by cultural customs and legal norms. The inhabitants of each city are taught from childhood to “unsee” anything that lies in the other city, consciously ignoring people, cars and buildings, even though they share the same sidewalks, roads and city blocks. Recognition of “alter” areas and citizens is made possible by the different architectures, language and clothing styles adopted in the two cities. Breaching the logical divide between Besźel and Ul Qoma by entering areas or interacting with denizens of the other city is a serious crime dealt with by a special police force. (Prospective tourists in Besźel or Ul Qoma are required to attend a long preliminary course to learn how to “unsee”.)

And now for the two networks: Experts predict an upcoming upheaval in telecommunication networks to parallel the recent revolution in computing brought on by cloudification. Just as computing and storage have become readily available on demand to individuals, companies and governments on shared cloud platforms, network slicing technologies are expected to enable the on-demand instantiation of wireless services on a common network substrate. Networking and wireless access for, say, a start-up offering IoT or vehicular communication applications, could be quickly set up on the hardware and spectrum managed by an infrastructure provider. Each service would run its own network on the same physical infrastructure but on logically separated slices — the packets and signals of one slice “unseeing” those of the other. In keeping with the metaphor, ensuring the isolation and security of the coexisting slices is among the key challenges facing this potentially revolutionary technology.

 

The Rebirth of Expertise?

These days, conversations on almost any topic (be it finance, health care, art, the economy, music, or even religion) do not seem complete without a lively, and more or less informed, exchange on AI and on machine learning. The crux of the discussion typically rests on the role of humans in the increasingly large number of enterprises that depend on machines for decision making and manufacturing. In this context, a distinction that may prove useful in thinking about a future society of humans and “intelligent” machines is that proposed back in the 60s in the field of psychology between fluid and crystallized intelligence. As recently pointed out by Sarah Harper, taken to its logical end point, this idea may yield some possibly counter-intuitive conclusions regarding the parts to be played by AI and by different generations in the workplace.

Fluid intelligence relates to the ability to solve new problems by applying well-defined logical rules, such as by means of inductive or deductive reasoning. Fluid intelligence does not depend on any external prior knowledge about the world and the problem domain. In contrast, crystallized intelligence is the capacity to build on one’s experience and knowledge to acquire new skills and to solve problems.

In humans, fluid intelligence tends to decrease with age, while crystallized intelligence follows an inverse trend, peaking much later in life. Machines appear to have surpassed humans in terms of fluid intelligence, given their unprecedented capability to recognize patterns in large volumes of data and to optimize actions over long time horizons. But building general-purpose skills based on expertise in a computer, that is, generating artificial crystallized intelligence, is broadly considered to be unattainable with current AI techniques (listen to Obama’s eloquent explanation of this point!). Current state-of-the-art machine learning methods in fact cannot even explain why they output given decisions.

So there you have it: in a system that can leverage the fluid intelligence of sophisticated AI tools, the crystallized intelligence born out of the experience of older women or men may become more valuable than the speed and flexibility of fresh graduates. Considering the predictions of an increased lifespan, this sounds like good news. Can it be that expertise is not dead after all?