Preface
The field of machine learning has, over the past thirty years,
produced a rich variety of algorithms that enable computers to
"learn" from examples. Machine learning algorithms adaptively
filter the noise in telephone data transmissions, allow computers to
recognize human speech, support medical decision making, and influence
our investment strategies. Soon they will take control of vehicles
on our highways, become an integral part of autonomous robots that
assist us in everyday life, and help us access information from the
many new on-line information sources that have only just begun to
emerge.
To study machine learning, it is sometimes beneficial to step back a
little and investigate the species for which the term ``learning'' was
first coined. Undoubtedly, learning is a key component of animal and
human intelligence. Without our ability to learn, we would be unable
to adapt to the world, to continuously extend our knowledge, to create
and to memorize solutions to complex problems, to gain wisdom, in
short: to survive. While the human learning problem is amazingly rich
and diverse---and yet still poorly understood---machine learning has
come up with its own, narrow definition of what it means for a
computer to learn. Put simply, machine learning addresses the fitting,
transformation, and characterization of a set of data points. To
interpret data in this way, most learning algorithms consider a family
of possible functions, and pick one by maximizing some predefined
measure of performance. In some of the more recent approaches, the
set of data points may also include rules, often provided by a
knowledgeable human expert. Despite intense and certainly not
unsuccessful research in the field of machine learning, the learning
abilities of humans still appear to be widely unmatched by those of
today's computer programs. Humans, for example, are often able to
learn complex concepts and skills from a strikingly small number of
training examples, despite the enormous complexity of the real
world. Today's machine learning approaches scale poorly to worlds of
similar complexity, as they typically require vast amounts of examples
to distinguish the relevant from the irrelevant.
When considering the discrepancy between human learning abilities and
those of today's machines, many explanations come to mind. Certainly,
we have not yet succeeded in understanding the role each individual
neuron plays in our brain, and the way they are interconnected.
Nor have we formed a precise enough idea of what information
is encoded in our genes, learned for us by evolution. Maybe it will
take 10^10 processing units to build a machine that generalizes
correctly from scarce data---which is approximately the number of
neurons in a human brain. Or maybe we need 600 million years of
worldwide, intense research to come up with such learning
algorithms---which is about the time it took nature to design
humans. Or maybe it is just that the typical problems faced by humans
differ from those studied in machine learning. For example, most of
today's machine learning approaches learn single functions in
isolation from an isolated set of data points. In fact, there is
reason to believe that the problem of fitting isolated points is
genuinely hard, and that our current algorithms already perform well
at it. Perhaps
generalizing from scarce data is easier for humans simply because we
do not tend to learn isolated functions from isolated datasets that
lack any context.
This book is the result of an attempt to broaden the scope of machine
learning. The framework proposed here, called lifelong learning,
addresses scenarios in which a learning algorithm faces a whole
collection of learning tasks. Instead of having just an isolated set
of data points, a lifelong learning algorithm can incrementally build
on previous learning experiences in order to generalize more
accurately. Consider, for example, the task of recognizing objects
from color camera images, which is one of the examples studied in this
book. When learning to recognize a new object, knowledge acquired in
previous object recognition tasks can aid the learner with a general
understanding of the invariances that apply to all object recognition
tasks (e.g., invariances with respect to translation, rotation, scaling,
varying illumination), and hence lead to improved recognition rates from
less training data. Lifelong learning addresses the question of
learning to learn. The acquisition, representation and transfer of
domain knowledge are the key scientific concerns that arise in
lifelong learning.
To approach the lifelong learning problem, this book describes a new
algorithm, called the explanation-based neural network learning
algorithm (EBNN). EBNN integrates two well-understood machine learning
paradigms: artificial neural network learning and explanation-based
learning. The neural network learning strategy enables EBNN to learn
from noisy data in the absence of prior learning experience. It also
allows it to learn domain-specific knowledge that can be transferred
to later learning tasks. The explanation-based strategy employs this
domain-specific knowledge to explain the data and thereby guide
generalization in a knowledgeable, domain-specific way. By doing so,
it reduces the need for training data, replacing it with previously
learned domain-specific knowledge.
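To make this idea concrete, the following is an illustrative sketch only, not the actual EBNN algorithm: it substitutes a polynomial least-squares fit for neural networks, and a hypothetical `domain_theory` function for the previously learned networks. It shows the central mechanism, though: the domain theory "explains" each training example, yielding slopes (derivatives) of the target function, and the learner then fits both the observed values and these slopes, so that a few examples constrain the fit far more than values alone would.

```python
import numpy as np

# Sketch of value-plus-slope fitting. The "domain theory" below is a
# hypothetical stand-in; in EBNN it would be a previously learned
# neural network, and the learner itself would also be a network.

def domain_theory(x):
    """Hypothetical previously learned model of the target function."""
    return np.sin(x)

def domain_theory_slope(x, eps=1e-5):
    """Slope of the domain theory, here via finite differences."""
    return (domain_theory(x + eps) - domain_theory(x - eps)) / (2 * eps)

def fit_with_slopes(x, y, degree=4, slope_weight=1.0):
    """Fit a polynomial to the values y AND to the slopes suggested by
    the domain theory, by stacking two least-squares systems."""
    powers = np.arange(degree + 1)
    V = x[:, None] ** powers                              # value equations
    D = powers * x[:, None] ** np.maximum(powers - 1, 0)  # slope equations
    s = domain_theory_slope(x)          # slopes obtained by "explanation"
    A = np.vstack([V, slope_weight * D])
    b = np.concatenate([y, slope_weight * s])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

# Only three noisy examples of the true function sin(x):
rng = np.random.default_rng(0)
x = np.array([-1.0, 0.3, 1.2])
y = np.sin(x) + 0.01 * rng.standard_normal(3)
coef = fit_with_slopes(x, y)
pred = (x[:, None] ** np.arange(5)) @ coef
```

Lowering `slope_weight` when the domain theory is known to be inaccurate mirrors EBNN's graceful degradation: with the weight at zero, the method reduces to ordinary inductive fitting of the data points.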
To elucidate the EBNN approach in practice, empirical results derived
in the context of supervised and reinforcement learning are also
reported. Experimental testbeds include an object recognition task,
several robot navigation and manipulation tasks, and the game of
chess. The main scientific result of these studies is that the
transfer of previously learned knowledge decreases the need for
training data. In all our experiments, EBNN generalizes significantly
more accurately than traditional methods if it has previously faced
other, related learning tasks. A second key result is that EBNN's
transfer mechanism is both effective and robust to errors in the
domain knowledge. If the learned domain knowledge is accurate, EBNN
compares well to other explanation-based methods. If this knowledge is
inaccurate and thus misleading, EBNN degrades gracefully to a
comparable inductive neural network algorithm. Whenever possible, I
have preferred real robot hardware over simulations, and
high-dimensional feature spaces over the low-dimensional ones
commonly used in artificial ``toy'' problems. The diversity of
experimental testbeds is intended to illustrate that EBNN is
applicable in a wide variety of circumstances and to a large class of
problems.
This book is purely technical in nature. Its central aim is to advance
the state-of-the-art in machine learning. In particular, it seeks to
provide a learning algorithm that generalizes more accurately from less
training data than conventional algorithms by exploiting domain
knowledge gathered in previous learning tasks. EBNN is adequate if the
learning algorithm faces multiple, related learning tasks; it will
fail to improve the learning results if a single, isolated set of data
points is all that is available for learning. This research
demonstrates that significantly superior results can be achieved by
going beyond the intrinsic limitations associated with learning single
functions in isolation. Hopefully, the book opens up more questions than it
provides answers, by pointing out potential research directions for
future work on machine learning.