AI's New New Age
February 13, 2017, 20 min read
2016 saw the emergence of artificial intelligence in the public debate, not only within the tech ecosystem but more generally in mainstream media.
This follows on from many news stories that have piled up over the last few years, and can be observed through three parallel aspects:
- A scientific / technological aspect, with the announcement of major breakthroughs such as AlphaGo, which can beat the best go players – something many thought would not be possible for many years, given that there are more possible positions in a game of go than atoms in the Universe.
- A product / business aspect, linked for example to the launch and improvement of intelligent assistants – Siri, Alexa, Google Assistant – or to the boom in the number of autonomous car systems being developed.
- A macro aspect related to the debate on the global impact of artificial intelligence, which we can sum up as: is AI synonymous with apocalypse or cornucopia?
Yet, this very topical subject is a problem for a simple reason: the debate on what artificial intelligence will or should become and lead to is too seldom accompanied by an explanation of what it is, or rather of what the concept covers. This was the starting point of FABERNOVEL’s decision to write this essay and study the past, present and future of artificial intelligence.
It is anything but a trivial starting point, as anyone keeping an eye on AI developments quickly discovers. Defining AI is essential for anyone willing to grasp what is at stake: it is the source of fiery debates on a topic that can be a field, a trend, an opportunity or a threat, but is above all an academic discipline with a short yet rich 60-year history. It is a field whose main characteristic is to be prone to, and torn apart by, ceaseless paradoxes.
Artificial intelligence or the love of paradoxes
Paradox #1: the impossible definition
Getting immersed in artificial intelligence quickly leads to a first obvious conclusion: there is not – nor was there ever – a standardized and globally accepted definition of what it is. The choice of the very name “artificial intelligence” is a perfect example: when the mathematician John McCarthy used these words to propose the Dartmouth Summer Research Project – the workshop of summer 1956 that many consider the kick-off of the discipline – it was as much to set it apart from related research, such as automata theory and cybernetics, as to give it a proper definition.
There are actually many definitions of artificial intelligence. A first large group of definitions could be called “essentialist”, aiming to define the end-goal a system has to demonstrate to enter the category. AI researchers Stuart Russell and Peter Norvig thus compiled four approaches among their peers’ definitions: the art / the creation / the study of systems that
- think like humans,
- or think rationally,
- or act like humans,
- or act rationally.
The main concern is thus to know whether artificial intelligence is a matter of process – the way of “thinking” – or of result, and therefore whether intelligence and the simulation of intelligence are ultimately the same thing or not.
Besides these – and often complementing them – are the definitions one could call “analytical”, meaning they list the abilities required, in part or in whole, to create artificial intelligence: for example, computer vision, knowledge representation, reasoning, language comprehension and the ability to plan an action.
According to this point of view, and to borrow the words of Allen Newell and Herbert Simon (who were among the founders of the AI field), “There is no ‘intelligence principle,’ just as there is no ‘vital principle’ that conveys by its very nature the essence of life.”
Let us add two associated, potential main goals, one related to engineering – artificial intelligence as a method to solve precise problems – and the other to science – to better understand the mechanisms of intelligence, or even consciousness – and you understand that drawing clear limits around the field is simply impossible. You also understand why artificial intelligence researchers can be found in computer science labs as well as in statistics or neuroscience departments.
Given the multiplicity of artificial intelligence, we think that two other definitions – much more malleable – can help us grasp its meaning and stakes:
- A recursive definition: any system categorized as belonging to the field of artificial intelligence can be branded as artificial intelligence. Beneath its appearance of self-evident truth, this definition underlines that defining the outline of a research field is much more than something scientific and rational, centered on the inherent merits of its production: it also points to opposing schools of thought, divergent points of view and sometimes contradictory interests.
- The last definition is subjective: delimiting artificial intelligence can mean borrowing the words of Supreme Court Justice Potter Stewart when he described obscenity in 1964: “I know it when I see it.” With this sentence, we want to suggest not only that there are many possible points of view on what falls under artificial intelligence – and also on what intelligence is per se and which actions require or reveal it – but also how much this perception is bound to evolve with time.
Paradox #2: the constant distance
A second paradox that derives mainly from the absence of a shared definition is sometimes called “the AI effect”, which was cleverly summed up by robotics researcher Rodney Brooks: “Every time we figure out a piece of it, it stops being magical; we say, ‘Oh, that’s just a computation.’”
The evolution of what the 1997 victory of Deep Blue over Garry Kasparov is seen to represent is exemplary in that sense. The feeling of supreme superiority of machine over man was replaced with a shrug: in the end, we hear, Deep Blue only used “brute force” and calculated anything and everything without really thinking. Still, an AI king of chess had been announced for decades, in vain.
If the cycle repeats, maybe the meaning of AlphaGo will be challenged in the years to come. More than a look back at the past or a sedimentation of successful achievements, AI is better understood as an ever-receding horizon.
Paradox #3: the manic-depressive dynamics
The history of artificial intelligence is marked by periods of retreat in funding and interest – logically following phases of optimistic expansion: these are what we call “AI winters”. The two main winters occurred in the second half of the 1970s and in the late 1980s / early 1990s. These episodes are often described with much excess, which can lead one to believe – wrongly – that there was no progress at all in any AI subfield at the time. But there were indeed skeptical public evaluation reports, below-expectation results and series of startup bankruptcies.
If bubble phenomena are not new, AI cycles are in a class of their own. First, because given the first two paradoxes, it is surprising that AI players keep making over-ambitious predictions. Second, because the harshest criticism of AI achievements generally comes from within the community. In the end, this is quite logical: if judgement criteria for AI vary greatly, there will always be someone to denigrate an approach different from their own. Finally, because the mere conceptualization of AI winters is a curious process. As an article from The Verge reads, “it’s worth noting that few disciplines disappoint their acolytes so reliably that they come up with a special name for it.”
Paradox #4: the threshold obsession
Despite unreachable horizons and the inability to draw unanimous limits, specialized texts are full of intelligence-threshold concepts: “artificial general intelligence” (AGI) and its utmost level “super AI,” or “strong AI” – as opposed to “weak AI”, limited to a specific problem – human-level AI… All these concepts face the same fundamental limit as the concept of Web 2.0: trying to crystallize a moment in a fast-evolving field with disputed borders is impossible. Could we precisely date the birth of AGI?
These concepts are all the more debatable because they implicitly consider human intelligence – both individual and collective – as a fixed quantity, even though our activities and abilities have evolved substantially over the centuries, through the machines and other artefacts we created.
Towards a new golden age?
Considering these paradoxes, the ever more insistent proclamation of an artificial intelligence revolution raises skepticism and calls for a critical analysis.
To briefly sum up the following demonstration: yes, we have obtained meaningful results in the artificial intelligence field these past years, yet it would be wrong to see them as a clean break from an ice age during which progress was almost nil. And the risks of entering a new bubble are also quite real.
The current artificial intelligence dynamics can be split into four periods: the origins, the initialization, the acceleration and the (risks of) frenzy.
The shift in approaches: from top-down to bottom-up
The current boom has its roots in the 1980s, the starting point of a shift between the two main approaches structuring artificial intelligence. Until then, the favored approach was top-down: researchers coded into their programs a set of rules that precisely determined the steps leading to the result. By contrast, the approach most used over the past thirty years is bottom-up: you no longer code the logic, you code a training process allowing an algorithm to adjust its internal parameters on its own according to its mistakes and successes. This is what we call machine learning.
We quickly went from a mind-inspired approach – for example, replicating the steps followed by a person to solve a mathematical problem – to a brain-inspired one. Let us note that a dominant subcategory of machine learning is the artificial neural network, which aims to replicate the mechanisms of processing and transmitting information as they exist in a human brain.
If machine learning has become more popular among AI researchers, it is because the top-down approach had started to reach its limits: to solve some problems, the number of rules to code could become huge, and for others it was very difficult to devise efficient rules at all. Take the example of image recognition: what rules could universally characterize a cat, given the many breeds, the different situations – outside / inside, sleeping / running – in which it can be photographed, the time of day, and so on? A learning algorithm, on the contrary, is better able to handle such data complexity.
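The contrast between the two approaches can be sketched in a few lines of Python: instead of hand-writing rules, we write a training loop and let a toy perceptron adjust its own parameters from its mistakes. This is an illustrative sketch only – the names and the trivial AND-gate task are ours, not any real vision system.

```python
# Bottom-up approach in miniature: no logic is hand-coded, only a
# training process that adjusts internal parameters from errors.
# (Toy example for illustration only.)

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of ((x1, x2), label) with label 0 or 1."""
    w1, w2, b = 0.0, 0.0, 0.0          # internal parameters, tuned by training
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            error = label - pred        # 0 when correct; +/-1 on a mistake
            w1 += lr * error * x1       # nudge parameters toward the answer
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

# Learn the logical AND function purely from examples -- no rule was coded.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b = train_perceptron(data)
predict = lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0
```

The same loop, given different examples, learns a different function – which is precisely why learned systems scale to problems where rule lists explode.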
That being said, we need to underline that these approaches have always coexisted and that the process was one of progressive shift. Inspired by progress in the understanding of the brain, the first neural networks date back to the 1950s; conversely, you can still find rule-based projects today – for instance Cyc, which aims to gather all the common-sense rules obvious to a human but unknown to a machine, such as “you can’t be in two places at one time.”
The preeminence of engineering over science
Since the 1970s, problem solving has grown in importance compared with the comprehension of intelligence mechanisms. Whereas, in their 1975 acceptance of the prestigious Turing Award, Allen Newell and Herbert Simon evoked two examples, carefully stressing that “both conceptions have deep significance for understanding how information is processed and how intelligence is achieved,” the three leading figures of deep learning – a branch of machine learning relying on networks made of many neural layers – Yann LeCun, Yoshua Bengio and Geoff Hinton made no reference to the notions of intelligence and mind in their 2015 article summarizing the progress in their field. Instead, they showcased applications such as e-commerce recommendations and spam filters.
Nils Nilsson, artificial intelligence professor at Stanford University and historian of his field, thus summarizes the evolution of mentalities following the 1980s AI winter: “One heard fewer brave predictions about what AI could ultimately achieve. Increasingly, effort was devoted to what AI could (at the time) actually achieve. (…) The emphasis was on using AI to help humans rather than to replace them.”
It is no news that the rapid growth in the number of Internet users and in the associated uses has generated a colossal volume of data. For example, researchers Martin Hilbert and Priscila Lopez estimated that global storage capacity went from 3 GB per inhabitant in 1993 to 45 GB in 2007. But even more than the volume per se, it was its accessibility that benefited AI researchers: only 3% of the data was stored digitally in 1993, versus 94% in 2007. And machine learning algorithms require large training datasets to produce results as reliable as possible.
The available datasets have grown in size and complexity, as proven by the example of ImageNet, a labeled image database which has underpinned a yearly visual recognition contest since 2010. Where a previous contest, PASCAL, was based on 20,000 images split over 20 categories, the ImageNet challenge offers 1.5 million images split over 1,000 categories, collected on the Internet and tagged by humans through Amazon Mechanical Turk. ImageNet greatly contributed to improving the precision of the machine learning programs dedicated to this task – from a 28% classification error rate in 2010 to 3% in 2016, below the human rate of about 5%.
Machine learning is no scientific revolution. As we have said, its main principles were imagined as early as the post-war period, and many key innovations of the 1980s revived this approach – for example, convolutional neural networks, or ConvNets, whose architecture is based on the organization of the animal visual cortex.
Still, in recent years some advances have improved neural networks’ performance and brought complementary methods to solve certain issues. For example, Rectified Linear Units, or ReLU, introduced as recently as 2010, allow for faster training of neural networks.
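One intuition behind ReLU’s speed advantage can be shown in a few lines: unlike the classic sigmoid activation, whose gradient vanishes for strongly activated units, ReLU keeps a gradient of exactly 1 for any positive input, so error signals propagate without shrinking. A minimal sketch (function names are ours):

```python
import math

def sigmoid(x):
    """Classic squashing activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: at most 0.25, near zero for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """Rectified Linear Unit: identity for positive inputs, else 0."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: a constant 1 for any active (positive) unit."""
    return 1.0 if x > 0 else 0.0

# For a strongly activated unit, sigmoid's gradient is nearly zero
# while ReLU's is exactly 1 -- gradients no longer vanish layer by layer.
```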
The emergence of the GPU
One of the main electronic components in a computer is the CPU, for Central Processing Unit, generally described as the “brain” of the computer. In the late 1990s, a new chip was designed: the GPU, for Graphics Processing Unit. As its name suggests, it was originally designed for image processing tasks, and its architecture differs from the CPU’s: the CPU is non-specialized and can take on a great diversity of calculations, which it performs sequentially, whereas the GPU contains many cores that perform calculations in parallel – for example, so that each core updates a group of pixels.
In the early 2000s, GPUs were progressively “redirected” to other purposes requiring parallel calculations – neural networks in particular, because it is more efficient to compute the state of each neuron separately and at the same time. For instance, in 2006, Microsoft researchers reported that one of their models trained up to 50% faster on a GPU than on a CPU.
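Why neural networks map so well onto this parallel hardware can be sketched as follows: each neuron in a layer depends only on the shared inputs and its own weights, so all neuron outputs can be computed independently. The Python below merely illustrates the pattern with a thread pool – real speedups come from GPU kernels such as those CUDA exposes, and all names here are ours.

```python
from concurrent.futures import ThreadPoolExecutor

def neuron_output(weights, inputs, bias=0.0):
    """Weighted sum of the inputs followed by a ReLU activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, total)

def layer_forward(weight_rows, inputs):
    """One independent task per neuron: no task reads another's result,
    which is exactly the structure a GPU's many cores exploit."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: neuron_output(w, inputs), weight_rows))

layer = [[1.0, -1.0], [0.5, 0.5], [-2.0, 1.0]]   # 3 neurons, 2 inputs each
outputs = layer_forward(layer, [2.0, 1.0])        # -> [1.0, 1.5, 0.0]
```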
Manufacturer Nvidia amplified the movement by launching CUDA in 2007, a software platform facilitating GPU programming.
GPU performance had two main consequences for artificial intelligence research. First, more than reducing the time dedicated to research, it made it possible to multiply the number of iterations, and thus to increase the efficiency of the selected algorithm in its final form. Second, it set off experiments on much more complex neural networks, especially regarding the number of neuron layers used.
Let us get back to ImageNet to illustrate this evolution. In 2012, during the third edition of the contest, a team from the University of Toronto won by a wide margin, taking the research community aback: the winning image classification error rate had been 28% in 2010 and 26% in 2011, and they had just dropped it to 16%. Their solution was based on a deep learning algorithm trained on two GPUs for one week – and it used CUDA. Their model comprised 650,000 neurons, 8 layers and 630 million connections.
Multiplying the number of players
The impressive results of deep learning, as demonstrated by ImageNet’s winning model in 2012, drew the attention of many more research teams: ImageNet went from 6 participating teams in 2012 to 24 in 2013 and 36 in 2014.
These teams are now more diverse: they come from different countries, universities and companies – particularly from digital giants. In 2013, Google bought DNNresearch, the startup founded by Geoff Hinton, one of the 2012 winners, and won the 2014 contest. In 2015, Microsoft Research won.
Sharing knowledge and skills
The artificial intelligence field benefits from much openness on the teams’ part regarding their methods, tools, and results. In 2014, the ImageNet contest organizers gave all teams the choice between transparency – the promise to describe their methods – and opacity. 31 out of 36 chose the former.
More and more frameworks offer ready-made building blocks that spare researchers the effort of rebuilding proven models from scratch. Some frameworks are not new, of course – Torch was created in 2002, Theano in 2010 – but the increasing number of players accelerates their development: network effects also exist in open-source programming. Giants such as Google and Microsoft have started to open up parts of their tools – Google’s TensorFlow in 2015. Even visual learning environments for artificial intelligence models follow this opening pattern, as we saw recently with OpenAI and Google DeepMind.
Scaling up with the cloud
The advent of cloud computing also contributed to the strong dynamics of artificial intelligence by facilitating access to growing computing power, which, in conjunction with GPUs and frameworks, enables Google, Microsoft and Amazon to offer turnkey artificial intelligence solutions.
The scaling up of models allowed by cloud computing is considerable, as Andrew Ng – a leading figure of deep learning and scientific director of Chinese giant Baidu – noted in 2015, pointing to the exponential growth in the number of connections in neural networks.
Ever more generalizable models
The history of artificial intelligence is one of progressive disappearance of the silos between models. The programs of the 1960s were completely tied to the problem they solved: the rules encapsulated to prove mathematical theorems, for example, proved of little use for computer vision, and vice versa. The expert systems of the 1970s and 1980s improved things a little: they were basically a set of knowledge rules for a very specific domain – such as diagnosing bacterial infections in medicine – but the rule-combination engine designed to ease decision making could be applied elsewhere. Attempts were made to create general expert systems, to be completed with the rules appropriate to the case at hand.
Because machine learning is agnostic about the problem at hand – the algorithms adjust on their own – it is, on the contrary, well suited to applying one model to new problems, such as moving from image recognition to voice recognition.
It must be stressed that machine learning models, while more generalizable across problems than previous models, are not entirely generalizable: there is always a need to adapt to a new problem, even if you don’t have to start from scratch.
In the end, the distinction made by Seymour Papert a few decades ago between “tau” problems – toy problems of very little intrinsic interest used to test approaches, such as checkers or chess –, “theta” problems – theoretical problems – and “rho” problems – applications to the real world – no longer holds as well. DeepMind’s achievements are a good example: they mix Atari game-learning models, reflections on the integration of deep learning and neuroscience, and recent applications to Google’s data centers to optimize their energy bill.
Here comes the echo chamber
Thanks to its recent progress, research on artificial intelligence has become an object of interest and even attraction. As media and investors take more interest in the results and in the companies seeking tangible applications, the trend becomes self-sustaining: those who yesterday put “big data” forward now feel obliged to align themselves with AI, which gives it even more visibility and thus strengthens the incentive to align with it, and so on.
The 2016 NIPS conference – one of the major events dedicated to machine learning – witnessed an example of this excess: a research team created a fake startup touting a promising new technique – whose acronym read TROLL – and received a great deal of interest, too much of it, especially from several investors who took the bait.
Of course, this increase in the number of projects does not encourage taking a simultaneous step back to classify the various innovations on offer: for example, the concepts of “artificial intelligence” and “machine learning”, which are not synonymous, are often put on the same level and mixed up. We thus recently witnessed the coining of a new term – “machine intelligence” – which does not help clarify things.
Another example: the news regarding the future opening of a no-checkout Amazon Go store, including a recent article by Fortune that explained it was a “mixture of computer vision [and] artificial intelligence.” But computer vision is a subfield of artificial intelligence.
This mix-up is unfortunate given all the paradoxes inherent to AI. To stress this point one last time: distinguishing “true” from “false” AI is a constant battle, and confusion can feed false promises and disillusionment. The unavoidable Gartner hype cycle is relevant here, with one major difference: in the AI field, the ups and downs have always shown excessive amplitude.
A hoped-for or dreaded convergence yet to come
Recent advances in artificial intelligence arouse existential fears for humanity or, on the contrary, fuel new utopias – describing a future without work, for instance. These interrogations are logical, and we can only support studying them early rather than in a rush, but we must also note that neither today nor tomorrow will an artificial “general” intelligence be able to assist or replace us in every single task in life.
Some innovations bring us closer to it – the autonomous car, for example, which combines “bricks” from several AI branches and whose current developments very few dreamed of only five years ago. But the convergence of the different AI “skills” or subcategories has been attempted before, especially in robotics, without ever succeeding. Nothing guarantees that this convergence will occur within the current trend.
Conclusion: winter is (not) coming
We have seen that, while talking about an AI “revolution” would be somewhat excessive – it is more an acceleration of older movements than a historical discontinuity – significant advances have been achieved in the artificial intelligence field in the last few years. AI is overflowing once again.
Now, the question is where it will stop: are we mainly talking about problems solved for good, such as computer vision, now superior in many respects to human faculties? Or will the applications prove limited once again, with AI going back to being endlessly done and undone?
We truly believe there will be disappointing times; that much is obvious. But the greater standardization of methods and the broad availability of tools bode well for real economic impact: the democratization of AI. At what scale? And what will the consequences be for our economy and, beyond it, our society? These are the major questions guiding our long-term study.