Hierarchical Mixtures of Latent Trait Analyzers with concomitant variables for multivariate binary data
Dalila Failli; Maria Francesca Marino
2025
Abstract
Analyzing three-way data is challenging due to complex dependencies between observations, which must be accounted for to ensure reliable results. We focus on hierarchical, multivariate, binary data organized in a three-way data structure, where rows correspond to first-level units, columns to variables, and layers to second-level units within which the first-level units are nested. In this framework, model-based clustering methods can be effectively employed for dimensionality reduction, facilitating a clear understanding of the phenomenon under investigation. In this work, we propose a novel modeling tool for the hierarchical clustering of first- and second-level units. We extend the Mixture of Latent Trait Analyzers (MLTA) with concomitant variables by letting the prior component probabilities also depend on second-level-specific random effects. Parameter estimation is performed by means of a double EM algorithm based on a variational approximation of the model log-likelihood function, along with nonparametric maximum likelihood estimation of the second-level-specific random effect distribution. The latter approach yields an estimated discrete distribution that directly provides a clustering of second-level units. Within (that is, conditional on) each of these clusters, first-level units are partitioned through the MLTA specification. The proposal is applied to data from the European Social Survey to partition countries (second-level units) according to the baseline attitude of their residents (first-level units) toward digital technologies (variables). Within these clusters, residents are partitioned on the basis of their attitude toward specific digital skills. The influence of socio-economic factors on the identification of digitalization profiles is also taken into account via a concomitant variable approach.


