A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses.

grade of membership model; identifiability; latent variable model; mixed membership model; singular value decomposition; spectral method; successive projection algorithm

Journal

Psychometrika
ISSN: 1860-0980
Abbreviated title: Psychometrika
Country: United States
NLM ID: 0376503

Publication information

Publication date:
15 Feb 2024
History:
received: 04 May 2023
accepted: 30 Dec 2023
medline: 16 Feb 2024
pubmed: 16 Feb 2024
entrez: 15 Feb 2024
Status: ahead of print

Abstract

Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
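The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the expected response matrix factors as Pi @ Theta.T, where Pi (subjects by K profiles) has rows on the probability simplex and Theta (items by K) holds item response probabilities, and that each profile has at least one "pure" subject. The function names `successive_projection` and `spectral_gom`, and the clipping used to keep estimates in range, are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def successive_projection(U, K):
    """Successive projection algorithm: pick K rows of U that act as simplex vertices."""
    R = U.astype(float).copy()
    picked = []
    for _ in range(K):
        # The row with the largest norm is a vertex of the (projected) simplex.
        i = int(np.argmax(np.sum(R ** 2, axis=1)))
        picked.append(i)
        v = R[i] / np.linalg.norm(R[i])
        # Project every row onto the orthogonal complement of the chosen vertex.
        R = R - np.outer(R @ v, v)
    return picked

def spectral_gom(R, K):
    """Hypothetical helper: estimate memberships Pi and item parameters Theta from R."""
    # Keep only the K leading singular triplets of the data matrix.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]
    # Rows of U lie (approximately) in a simplex whose vertices are pure subjects.
    S = successive_projection(U, K)
    # Each row of U is a convex combination of the vertex rows: U = Pi @ U[S].
    Pi = U @ np.linalg.inv(U[S])
    Pi = np.clip(Pi, 0.0, None)
    Pi = Pi / Pi.sum(axis=1, keepdims=True)
    # Pure subjects' expected responses give the item parameters: Theta = V diag(s) U[S]^T.
    Theta = np.clip((Vt.T * s) @ U[S].T, 0.0, 1.0)
    return Pi, Theta, S
```

With binary data one would pass the observed 0/1 matrix as `R`. The columns of the returned `Pi` and `Theta` are ordered by the pure subjects that the projection step happens to select, so estimates match the true parameters only up to a permutation of the profiles.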

Identifiers

pubmed: 38360980
doi: 10.1007/s11336-024-09951-y
pii: 10.1007/s11336-024-09951-y

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Division of Mathematical Sciences
ID: 2210796

Copyright information

© 2024. The Author(s), under exclusive licence to The Psychometric Society.

Authors

Ling Chen (L)

Department of Statistics, Columbia University, New York, NY, 10027, USA.

Yuqi Gu (Y)

Department of Statistics, Columbia University, New York, NY, 10027, USA. yuqi.gu@columbia.edu.
