Mixed Membership Classification for Documents with Hierarchically Structured Labels

Authored by: Edoardo M. Airoldi, David M. Blei, Elena A. Erosheva, Stephen E. Fienberg, Frank Wood, Adler Perotte

Handbook of Mixed Membership Models and Their Applications

Print publication date: November 2014
Online publication date: November 2014

Print ISBN: 9781466504080
eBook ISBN: 9781466504097

DOI: 10.1201/b17520-20

Abstract

Placing documents within a hierarchical structure is a common task that can be viewed as multi-label classification with a hierarchically structured label space. Examples of such data include web pages and their placement in directories, product descriptions and their associated categories in product hierarchies, and free-text clinical records and their assigned diagnosis codes. We present hierarchically supervised latent Dirichlet allocation (HSLDA), a model for bag-of-words data that carry multiple, hierarchically organized labels. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-words data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the hierarchical structure of the labels substantially improves out-of-sample label prediction compared with models that ignore it.
