Bootstrapping for text learning tasks

Publication Type Journal Article
School or College College of Engineering
Department Computing, School of
Creator Riloff, Ellen M.
Other Author Jones, Rosie; McCallum, Andrew; Nigam, Kamal
Title Bootstrapping for text learning tasks
Date 1999
Description When applying text learning algorithms to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents bootstrapping as an alternative approach to learning from large sets of labeled data. Instead of a large quantity of labeled data, this paper advocates using a small amount of seed information and a large collection of easily obtained unlabeled data. Bootstrapping initializes a learner with the seed information; it then iterates, applying the learner to calculate labels for the unlabeled data, and incorporating some of these labels into the training input for the learner. Two case studies of this approach are presented. Bootstrapping for information extraction provides 76% precision for a 250-word dictionary for extracting locations from web pages, when starting with just a few seed locations. Bootstrapping a text classifier from a few keywords per class and a class hierarchy provides accuracy of 66%, a level close to human agreement, when placing computer science research papers into a topic hierarchy. The success of these two examples argues for the strength of the general bootstrapping approach for text learning tasks.
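The iterative loop described in the abstract (initialize a learner from seed information, label the unlabeled data, fold some of those labels back into the training input) can be sketched as a generic self-training loop. This is a minimal illustration under stated assumptions, not the method of either case study: the toy per-class token-count learner, the margin-based confidence score, the choice to add only the `per_round` most confident predictions each iteration, and all names (`train`, `predict`, `bootstrap`, `per_round`) and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = List[str]                 # assumption: a document is a list of tokens
Model = Dict[str, Counter]      # class label -> token counts


def train(labeled: List[Tuple[Doc, str]]) -> Model:
    """Hypothetical toy learner: accumulate token counts per class."""
    model: Model = {}
    for doc, label in labeled:
        model.setdefault(label, Counter()).update(doc)
    return model


def predict(model: Model, doc: Doc) -> Tuple[str, float]:
    """Score each class by the summed relative frequency of the doc's tokens.

    Confidence (an assumption) is the margin between the best and runner-up scores.
    """
    scores = {}
    for label, counts in model.items():
        total = sum(counts.values())
        scores[label] = sum(counts[t] for t in doc) / total
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, best_score - runner_up


def bootstrap(seed, unlabeled, iterations=3, per_round=1):
    """Generic self-training loop (hypothetical sketch).

    Start from the seed labels, then repeatedly: train on the current labels,
    label the unlabeled pool, and move the `per_round` most confident
    predictions into the training set.
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = train(labeled)
        scored = []
        for i, doc in enumerate(pool):
            label, confidence = predict(model, doc)
            scored.append((confidence, i, label))
        scored.sort(reverse=True)  # most confident predictions first
        chosen = scored[:per_round]
        labeled.extend((pool[i], label) for _, i, label in chosen)
        chosen_idx = {i for _, i, _ in chosen}
        pool = [doc for j, doc in enumerate(pool) if j not in chosen_idx]
    return train(labeled), labeled


# Hypothetical example: two seed documents and four unlabeled ones.
seed = [(["planet", "orbit"], "astro"), (["gene", "cell"], "bio")]
unlabeled = [["orbit", "telescope"], ["cell", "protein"],
             ["telescope", "galaxy"], ["protein", "enzyme"]]
model, labeled = bootstrap(seed, unlabeled, iterations=4, per_round=1)
for doc, label in labeled[len(seed):]:
    print(doc, "->", label)
```

In this toy run, `["telescope", "galaxy"]` shares no token with either seed document; it is labeled `astro` only after `["orbit", "telescope"]` has been added to the training set in an earlier iteration, which is the effect the iterative loop relies on. Adding only a few high-confidence labels per round is one common way to limit how quickly early labeling errors propagate.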
Type Text
Publisher Association for the Advancement of Artificial Intelligence (AAAI)
First Page 1
Last Page 12
Subject Bootstrapping; Text learning algorithms; Seed information
Subject LCSH Bootstrap (Statistics)
Language eng
Bibliographic Citation Jones, R., McCallum, A., Nigam, K., & Riloff, E. (1999). Bootstrapping for text learning tasks. IJCAI-99 Workshop on Text Mining: Foundations, Techniques, and Applications, 1-12.
Rights Management (c) AAAI http://www.aaai.org/
Format Medium application/pdf
Format Extent 1,937,347 bytes
Identifier ir-main,12412
ARK ark:/87278/s6j399q2
Setname ir_uspace
ID 702981
Reference URL https://collections.lib.utah.edu/ark:/87278/s6j399q2