Foreword by Tom M. Mitchell

Over the past thirty years the field of Machine Learning has developed a sequence of increasingly successful paradigms for automatically learning general laws from specific training data. Algorithms for learning neural networks and decision trees are now in widespread use in data mining applications such as learning to detect credit card fraud, in control applications such as optimizing manufacturing processes, and in sensor interpretation tasks such as learning to recognize human speech and human faces. While these algorithms demonstrate the practical importance of machine learning methods, researchers are actively pursuing yet more effective algorithms.

This manuscript describes research aimed at a new generation of machine learning methods -- methods that enable the computer to learn more accurately from less training data. The key to this new approach is to take advantage of other, previously acquired knowledge. To illustrate the idea, consider a mobile robot or process control system that must learn a control strategy to achieve a new type of goal (e.g., locating a new type of object) in a familiar environment (e.g., the building in which it has operated for some time). Because the robot has experience in this environment, it is likely to have previously acquired data or knowledge that can be helpful in learning the new task. It might, for example, have learned to predict the approximate effect of various robotic actions on subsequent sensor input. The Explanation-Based Neural Network (EBNN) learning algorithm presented here takes advantage of such prior knowledge, even if it is inexact, to significantly improve accuracy for the new learning task. Whereas earlier methods such as neural network and decision tree induction make use only of the training data for the current learning task, this monograph explores several settings in which previous experience in related tasks can be used to successfully bootstrap new learning.

While the specific EBNN learning algorithm presented here is interesting for its ability to use approximate prior knowledge to improve learning accuracy, the significance of this paradigm goes beyond this particular algorithm. The paradigm of lifelong learning -- using earlier learned knowledge to improve subsequent learning -- is a promising direction for a new generation of machine learning algorithms. Whereas recent theoretical results have shown fundamental bounds on the learning accuracy achievable with pure induction from input-output examples of the target function, the lifelong learning paradigm provides a new setting in which these theoretical bounds are sidestepped by the introduction of knowledge accumulated over a series of learning tasks. While it is too early to determine the eventual outcome of this line of research, it is an exciting and promising attempt to confront the issue of scaling up machine learning algorithms to more complex problems. Given the need for more accurate learning methods, it is difficult to imagine a future for machine learning that does not include this paradigm.

Tom M. Mitchell
November 1995