Preface
The field of machine learning has, over the past thirty years,
produced a rich variety of algorithms that enable computers to
"learn" from examples. Machine learning algorithms adaptively
filter the noise in telephone data transmissions, allow computers to
recognize human speech, support medical decision making, and influence
our investment strategies. Soon they will take control of vehicles
on our highways, become an integral part of autonomous robots that
assist us in everyday life, and help us access information from the
many new on-line information sources that have only just begun to
emerge.
To study machine learning, it is sometimes beneficial to step back a
little and investigate the species for which the term ``learning'' was
first coined. Undoubtedly, learning is a key component of animal and
human intelligence. Without our ability to learn, we would be unable
to adapt to the world, to continuously extend our knowledge, to create
and to memorize solutions to complex problems, to gain wisdom, in
short: to survive. While the human learning problem is amazingly rich
and diverse---and yet still poorly understood---machine learning has
come up with its own, narrow definition of what it means for a
computer to learn. Put simply, machine learning addresses the fitting,
transformation, and characterization of a set of data points. To
interpret data in this way, most learning algorithms consider a family
of possible functions, and pick one by maximizing some predefined
measure of performance. In some of the more recent approaches, the
set of data points may also include rules, often provided by a
knowledgeable human expert. Despite intense and certainly not
unsuccessful research in the field of machine learning, the learning
abilities of humans still appear to be widely unmatched by those of
today's computer programs. Humans, for example, are often able to
learn complex concepts and skills from a strikingly small number of
training examples, despite the enormous complexity of the real
world. Today's machine learning approaches scale poorly to worlds of
similar complexity, as they typically require vast amounts of examples
to distinguish the relevant from the irrelevant.
When considering the discrepancy between human learning abilities and
those of today's machines, many explanations come to mind. Certainly,
we have not yet succeeded in understanding the role each individual
neuron plays in our brain, and the way they are interconnected.
Nor have we formed a precise enough idea of what information
is encoded in our genes, learned for us by evolution. Maybe it will
take 10^10 processing units to build a machine that generalizes
correctly from scarce data---which is approximately the number of
neurons in a human brain. Or maybe we need 600 million years of
worldwide, intense research to come up with such learning
algorithms---which is about the time it took nature to design
humans. Or maybe it is just that the typical problems faced by humans
differ from those studied in machine learning. For example, most of
today's machine learning approaches learn single functions in
isolation from an isolated set of data points. In fact, there is
reason to believe that the problem of fitting isolated points is
genuinely hard, and that our current algorithms already perform well
at it. Perhaps
generalizing from scarce data is easier for humans simply because we
do not tend to learn isolated functions from isolated datasets that
lack any context.
This book is the result of an attempt to broaden the scope of machine
learning. The framework proposed here, called lifelong learning,
addresses scenarios in which a learning algorithm faces a whole
collection of learning tasks. Instead of having just an isolated set
of data points, a lifelong learning algorithm can incrementally build
on previous learning experiences in order to generalize more
accurately. Consider, for example, the task of recognizing objects
from color camera images, which is one of the examples studied in this
book. When learning to recognize a new object, knowledge acquired in
previous object recognition tasks can aid the learner with a general
understanding of the invariances that apply to all object recognition
tasks (e.g., invariances with respect to translation, rotation, scaling,
varying illumination), and hence lead to improved recognition rates from
less training data. Lifelong learning addresses the question of
learning to learn. The acquisition, representation and transfer of
domain knowledge are the key scientific concerns that arise in
lifelong learning.
To approach the lifelong learning problem, this book describes a new
algorithm, called the explanation-based neural network learning
algorithm (EBNN). EBNN integrates two well-understood machine learning
paradigms: artificial neural network learning and explanation-based
learning. The neural network learning strategy enables EBNN to learn
from noisy data in the absence of prior learning experience. It also
allows it to learn domain-specific knowledge that can be transferred
to later learning tasks. The explanation-based strategy employs this
domain-specific knowledge to explain the data and thereby guide
generalization in a knowledgeable, domain-specific way. By doing so,
it reduces the need for training data, replacing it with previously
learned domain-specific knowledge.
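To make this idea concrete, the following is an illustrative sketch only, not the actual EBNN algorithm: it substitutes a polynomial least-squares fit for neural networks, and a hypothetical `domain_theory` function for the previously learned networks. It shows the central mechanism, though: the domain theory "explains" each training example, yielding slopes (derivatives) of the target function, and the learner then fits both the observed values and these slopes, so that a few examples constrain the fit far more than values alone would.

```python
import numpy as np

# Sketch of value-plus-slope fitting. The "domain theory" below is a
# hypothetical stand-in; in EBNN it would be a previously learned
# neural network, and the learner itself would also be a network.

def domain_theory(x):
    """Hypothetical previously learned model of the target function."""
    return np.sin(x)

def domain_theory_slope(x, eps=1e-5):
    """Slope of the domain theory, here via finite differences."""
    return (domain_theory(x + eps) - domain_theory(x - eps)) / (2 * eps)

def fit_with_slopes(x, y, degree=4, slope_weight=1.0):
    """Fit a polynomial to the values y AND to the slopes suggested by
    the domain theory, by stacking two least-squares systems."""
    powers = np.arange(degree + 1)
    V = x[:, None] ** powers                              # value equations
    D = powers * x[:, None] ** np.maximum(powers - 1, 0)  # slope equations
    s = domain_theory_slope(x)          # slopes obtained by "explanation"
    A = np.vstack([V, slope_weight * D])
    b = np.concatenate([y, slope_weight * s])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

# Only three noisy examples of the true function sin(x):
rng = np.random.default_rng(0)
x = np.array([-1.0, 0.3, 1.2])
y = np.sin(x) + 0.01 * rng.standard_normal(3)
coef = fit_with_slopes(x, y)
pred = (x[:, None] ** np.arange(5)) @ coef
```

Lowering `slope_weight` when the domain theory is known to be inaccurate mirrors EBNN's graceful degradation: with the weight at zero, the method reduces to ordinary inductive fitting of the data points.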
To elucidate the EBNN approach in practice, empirical results derived
in the context of supervised and reinforcement learning are also
reported. Experimental testbeds include an object recognition task,
several robot navigation and manipulation tasks, and the game of
chess. The main scientific result of these studies is that the
transfer of previously learned knowledge decreases the need for
training data. In all our experiments, EBNN generalizes significantly
more accurately than traditional methods if it has previously faced
other, related learning tasks. A second key result is that EBNN's
transfer mechanism is both effective and robust to errors in the
domain knowledge. If the learned domain knowledge is accurate, EBNN
compares well to other explanation-based methods. If this knowledge is
inaccurate and thus misleading, EBNN degrades gracefully to a
comparable inductive neural network algorithm. Whenever possible, I
have preferred real robot hardware over simulations, and
high-dimensional feature spaces over the low-dimensional ones
commonly used in artificial ``toy'' problems. The diversity of
experimental testbeds is intended to illustrate that EBNN is
applicable in a wide variety of circumstances and to a large class of
problems.
This book is purely technical in nature. Its central aim is to advance
the state-of-the-art in machine learning. In particular, it seeks to
provide a learning algorithm that generalizes more accurately from less
training data than conventional algorithms by exploiting domain
knowledge gathered in previous learning tasks. EBNN is adequate if the
learning algorithm faces multiple, related learning tasks; it will
fail to improve the learning results if a single, isolated set of data
points is all that is available for learning. This research
demonstrates that significantly superior results can be achieved by
going beyond the intrinsic limitations associated with learning single
functions in isolation. Hopefully, the book opens up more questions than it
provides answers, by pointing out potential research directions for
future work on machine learning.