Publication: Efficient Learning from Faulty Data
Open/View Files
Date
1995
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
The Harvard community has made this article openly available. Please share how this access benefits you.
Citation
Decatur, Scott Evan. 1995. Efficient Learning from Faulty Data. Harvard Computer Science Group Technical Report TR-30-95.
Research Data
Abstract
Learning systems are often provided with imperfect or noisy data. Therefore, researchers have formalized various models of learning with noisy data, and have attempted to delineate the boundaries of learnability in these models. In this thesis, we describe a general framework for the construction of efficient learning algorithms in noise tolerant variants of Valiant's PAC learning model. By applying this frame-work, we also obtain many new results for specific learning problems in various settings with faulty data. The central tool used in this thesis is the specification of learning algorithms in Kearns' Statistical Query (SQ) learning model, in which statistics, as opposed to labelled examples, are requested by the learner. These SQ learning algorithms are then converted into PAC algorithms which tolerate various types of faulty data. We develop this framework in three major parts: 1.) We design automatic compilations of SQ algorithms into PAC algorithms which tolerate various types of data errors. These results include improvements to Kearns' classification noise compilation, and the first such compilations for malicious errors, attribute noise and new classes of "hybrid" noise composed of multiple noise types. 2.) We prove nearly tight bounds on the required complexity of SQ algorithms. The upper bounds are based on a constructive technique which allows one to achieve this complexity even when it is not initially achieved by a given SQ algorithm. 3.) We define and employ an improved model of SQ learning which yields noise tolerant PAC algorithms that are more efficient than those derived from standard SQ algorithms. Together, these results provide a unified and intuitive framework for noise tolerant learning that allows the algorithm designer to achieve efficient, and often optimal, fault tolerant learning.
Description
Other Available Sources
Keywords
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service