Model Selection Using Information Theory and the MDL Principle

Stine, Robert A

Files

Smr.pdf (750.91 KB)

Related Collections

Statistics Papers

Subject

Akaike information criterion (AIC)
Bayes information criterion (BIC)
risk inflation criterion (RIC)
cross-validation
model selection
stepwise regression
regression tree
Applied Mathematics
Statistical Methodology
Statistical Theory

Permalink

https://repository.upenn.edu/handle/20.500.14332/47868

Abstract

Information theory offers a coherent, intuitive view of model selection. This perspective arises from thinking of a statistical model as a code, an algorithm for compressing data into a sequence of bits. The description length is the length of this code for the data plus the length of a description of the model itself. The length of the code for the data measures the fit of the model to the data, whereas the length of the code for the model measures its complexity. The minimum description length (MDL) principle picks the model with the smallest description length, balancing fit against complexity. The conversion of a model into a code is flexible; one can represent a regression model, for example, with codes that reproduce the AIC and BIC as well as motivate other model selection criteria. Going further, information theory allows one to choose among various types of non-nested models, such as tree-based models and regressions identified from different sets of predictors. A running example that compares several models for the well-known Boston housing data illustrates the ideas.
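
The two-part description length described in the abstract can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not the codes developed in the paper: it assumes Gaussian errors, takes the data code length to be the negative log-likelihood at the least-squares fit (converted to bits, dropping a constant shared by all models), and charges a fixed number of bits per parameter, either 1/2 log2(n) (which reproduces the BIC ranking) or log2(e) (which reproduces the AIC ranking). The function names and the synthetic data are hypothetical.

```python
import math


def rss_mean_only(y):
    """Residual sum of squares for the intercept-only model."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def rss_simple_linear(x, y):
    """Residual sum of squares for a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def description_length_bits(rss, n, k, penalty="bic"):
    """Two-part code length in bits (assumed Gaussian errors).

    Data part: (n/2) * log2(RSS/n), the Gaussian negative log-likelihood
    at the fitted variance, omitting a constant common to all models.
    Model part: k parameters times a per-parameter charge.
    """
    data_bits = 0.5 * n * math.log2(rss / n)
    if penalty == "bic":
        per_param = 0.5 * math.log2(n)   # (k/2) log n nats, as in BIC
    elif penalty == "aic":
        per_param = math.log2(math.e)    # 1 nat per parameter, as in AIC
    else:
        raise ValueError("penalty must be 'aic' or 'bic'")
    return data_bits + k * per_param


# Hypothetical synthetic data: a line plus a small deterministic wiggle.
x = list(range(20))
noise = [((7 * i) % 5 - 2) * 0.1 for i in x]
y = [2.0 + 0.5 * a + e for a, e in zip(x, noise)]
n = len(y)

for penalty in ("aic", "bic"):
    dl_mean = description_length_bits(rss_mean_only(y), n, k=1, penalty=penalty)
    dl_line = description_length_bits(rss_simple_linear(x, y), n, k=2, penalty=penalty)
    print(f"{penalty}: mean-only {dl_mean:.1f} bits, linear {dl_line:.1f} bits")
```

The MDL choice is the model with the shorter total; here the extra slope parameter costs a few bits but shortens the data code by far more, so the linear model wins under either penalty.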

Publication date

2004-11-01

Journal title

Sociological Methods & Research

Collection

Articles