Publication:
Efficient Learning from Faulty Data

Thumbnail Image

Date

1995

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Decatur, Scott Evan. 1995. Efficient Learning from Faulty Data. Harvard Computer Science Group Technical Report TR-30-95.

Research Data

Abstract

Learning systems are often provided with imperfect or noisy data. Therefore, researchers have formalized various models of learning with noisy data, and have attempted to delineate the boundaries of learnability in these models. In this thesis, we describe a general framework for the construction of efficient learning algorithms in noise tolerant variants of Valiant's PAC learning model. By applying this frame-work, we also obtain many new results for specific learning problems in various settings with faulty data. The central tool used in this thesis is the specification of learning algorithms in Kearns' Statistical Query (SQ) learning model, in which statistics, as opposed to labelled examples, are requested by the learner. These SQ learning algorithms are then converted into PAC algorithms which tolerate various types of faulty data. We develop this framework in three major parts: 1.) We design automatic compilations of SQ algorithms into PAC algorithms which tolerate various types of data errors. These results include improvements to Kearns' classification noise compilation, and the first such compilations for malicious errors, attribute noise and new classes of "hybrid" noise composed of multiple noise types. 2.) We prove nearly tight bounds on the required complexity of SQ algorithms. The upper bounds are based on a constructive technique which allows one to achieve this complexity even when it is not initially achieved by a given SQ algorithm. 3.) We define and employ an improved model of SQ learning which yields noise tolerant PAC algorithms that are more efficient than those derived from standard SQ algorithms. Together, these results provide a unified and intuitive framework for noise tolerant learning that allows the algorithm designer to achieve efficient, and often optimal, fault tolerant learning.

Description

Other Available Sources

Keywords

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Referenced By

Related Stories