Previous: AIDA-98-04 Next: AIDA-99-02 Index 1999

Intellectics Group: Technical Report 99-01

Iterative Concept Learning from Noisy Data

Steffen Lange and Gunter Grieser

In the present paper, we study iterative learning of indexable concept classes from noisy data. We distinguish between learning from positive data only and learning from positive and negative data; synonymously, learning from text and informant, respectively. Following \cite{Stephan95}, a noisy text (a noisy informant) for some target concept contains every correct data item infinitely often while in addition some incorrect data is presented. In the text case, incorrect data is presented only finitely many times while, in the informant case, incorrect data can occur infinitely often.

An iterative learner successively takes as input one element of an information sequence about a target concept as well as its previously made hypothesis, and outputs a new hypothesis about the target concept. The sequence of hypotheses has to converge to a hypothesis correctly describing the target concept. In contrast to an unconstrained learning device, an iterative learner has only limited access to the input data provided.

We present characterizations of iterative learning from noisy data in terms being independent from learning theory. The relevant conditions are purely structural ones. Furthermore, iterative learning from noisy data is compared with standard learning models such as learning in the limit, conservative inference, and finite learning. Surprisingly, when learning from noisy data is concerned, iterative learners are exactly as powerful as unconstrained learning devices. This nicely contrasts the noise-free case, where iterative learners are strictly less powerful. Moreover, the exact location of iterative learning from noisy data within the hierarchy of the other models of learning indexable concept classes is established.

Additionally, we study iterative learning from redundant noise-free data. It turns out that iterative learners are able to exploit redundancy in the input data to overcome, to a certain extent, limitations in their accessibility.

Full Paper: Gzipped postscript BibTeX entry