TY - JOUR
T1 - A Bayesian nonparametric approach for the analysis of multiple categorical item responses
AU - Waters, Andrew
AU - Fronczyk, Kassandra
AU - Guindani, Michele
AU - Baraniuk, Richard G.
AU - Vannucci, Marina
N1 - Publisher Copyright: © 2014 Published by Elsevier B.V.
PY - 2015/11/1
Y1 - 2015/11/1
AB - We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, therefore helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.
KW - Bayesian nonparametrics
KW - Cluster analysis
KW - Factor analysis
KW - Learning analytics
KW - Multinomial probit model
UR - http://www.scopus.com/inward/record.url?scp=85027941491&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027941491&partnerID=8YFLogxK
U2 - 10.1016/j.jspi.2014.07.004
DO - 10.1016/j.jspi.2014.07.004
M3 - Article
AN - SCOPUS:85027941491
SN - 0378-3758
VL - 166
SP - 52
EP - 66
JO - Journal of Statistical Planning and Inference
JF - Journal of Statistical Planning and Inference
ER -