Modeling Through Latent Variables
In: Annual Review of Statistics and Its Application, Vol. 4, Issue 1, pp. 267-282
SSRN
Statistical models often extend beyond the data available. First, in coarse data, what is actually observed is less detailed than what might be, owing to incompleteness, censoring, grouping, or a combination thereof. Second, in augmented data, the observed data are hypothetically supplemented with random effects, latent variables/classes, or component membership in mixture distributions. The two settings together will be referred to as enriched data. Reasons for modelling enriched data encompass mathematical and computational convenience, advantages in interpretation, and substantive plausibility. Models for enriched data combine evidence coming from empirical data with unverifiable model components, resting entirely on assumptions. This has acute consequences for enriched data, but knowledge about this issue is somewhat scattered. We provide a unified framework for enriched data and show, generally and with a focus on incomplete-data models on the one hand and random-effects models on the other, that to any given model an entire class of models can be assigned, with all of its members producing the same fit to the observed data but arbitrary regarding the unobservable parts of the enriched data. The concepts developed are illustrated using a clinical trial in toenail dermatophyte onychomycosis and a developmental toxicity study conducted in mice. ; The authors gratefully acknowledge support from IAP Research Network P6/03 of the Belgian Government (Belgian Science Policy).
BASE
We analyse the problem of two clinically inseparable, repeatedly measured responses of ordinal type by also incorporating their missingness process. In our application these are the therapeutic effect and extent of side effects of fluvoxamine. In the case of a composite end point, the scientific questions addressed can be answered only when the responses are modelled jointly. As an extension of the methodology, several missingness not at random models were fitted to a set of observed data and shown to yield approximately the same result as their missingness at random counterparts, although it affects precision. In addition, the effect of various identifying restrictions on multiple imputation is investigated. An alternative numerical approximation method is suggested to reduce computational time. ; Financial support from the Interuniversity Attraction Pole research network P7/06 of the Belgian Government (Belgian Science Policy), the Flemish Supercomputer Project and the Institute for the Promotion of Innovation through Science and Technology in Flanders, in which Intel and Janssen Pharmaceutica are partners, is gratefully acknowledged. We are also grateful to Dr Kris Bogaerts of I-BioStat for his expert advice.
BASE
In many biomedical studies, one jointly collects longitudinal continuous, binary, and survival outcomes, possibly with some observations missing. Random-effects models, sometimes called shared-parameter models or frailty models, have received a lot of attention. In such models, the corresponding variance components can be employed to capture the association between the various sequences. In some cases, random effects are considered common to various sequences, perhaps up to a scaling factor; in others, there are different but correlated random effects. Even though a variety of data types has been considered in the literature, less attention has been devoted to ordinal data. For univariate longitudinal or hierarchical data, the proportional odds mixed model (POMM) is an instance of the generalized linear mixed model (GLMM; Breslow and Clayton, 1993). Ordinal data are conveniently replaced by a parsimonious set of dummies, which in the longitudinal setting leads to a repeated set of dummies. When ordinal longitudinal data are part of a joint model, the complexity increases further. This is the setting considered in this paper. We formulate a random-effects based model that, in addition, allows for overdispersion. Using two case studies, it is shown that the combination of random effects to capture association with further correction for overdispersion can improve the model's fit considerably and that the resulting models make it possible to answer research questions that could not be addressed otherwise. Parameters can be estimated in a fairly straightforward way, using the SAS procedure NLMIXED. ; The authors gratefully acknowledge the financial support from the IAP research network #P7/06 of the Belgian Government (Belgian Science Policy) and the Flemish Supercomputer Project.
BASE
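To make the proportional odds structure mentioned in the preceding abstract concrete, here is a minimal Python sketch. It is not the authors' implementation (they report using SAS NLMIXED); the function names, the sign convention logit P(Y <= k) = threshold_k - eta, and the example values are all assumptions chosen for illustration. It shows how an ordinal response can be recoded as a set of cumulative dummies and how category probabilities follow from increasing thresholds and a single linear predictor.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def cumulative_dummies(y, n_categories):
    """Recode an ordinal value y in 1..n_categories as the dummies I(y <= k), k = 1..n_categories-1.

    Hypothetical helper: one common way to replace an ordinal outcome by dummies.
    """
    return [1 if y <= k else 0 for k in range(1, n_categories)]


def proportional_odds_probs(thresholds, eta):
    """Category probabilities assuming logit P(Y <= k) = thresholds[k] - eta.

    `eta` stands for the linear predictor (fixed effects plus, in a mixed
    model, a random effect). The same eta shifts every cumulative logit,
    which is the proportional odds assumption.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [expit(a - eta) for a in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Illustrative values only: four categories, three thresholds.
example_probs = proportional_odds_probs([-1.0, 0.0, 1.0], eta=0.5)
example_dummies = cumulative_dummies(2, 4)
```

Under this sign convention, a larger `eta` moves probability mass toward the higher categories; other texts use the opposite sign, so the convention should be checked against the software at hand.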
Non-Gaussian outcomes are frequently modeled using members of the exponential family. In particular, the Bernoulli model for binary data and the Poisson model for count data are well known. Two reasons for extending this family are (1) the occurrence of overdispersion, implying that the variability in the data is not adequately described by the models, and (2) the incorporation of hierarchical structure in the data. These issues are routinely addressed separately, the first one through overdispersion models, the second one, for example, by means of random effects within the generalized linear mixed models framework. Molenberghs et al. (2007, 2010) introduced a so-called combined model that simultaneously addresses both. In these and subsequent papers, a lot of attention was given to binary outcomes, counts, and time-to-event responses. While common in practice, ordinal data have not been studied from this angle. In this paper, a model for ordinal repeated measures, subject to overdispersion, is formulated. It can be fitted without difficulty using standard statistical software. The model is exemplified using data from an epidemiological study in diabetic patients and using data from a clinical trial in psychiatric patients. ; Financial support from the IAP research network #P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged.
BASE
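The notion of overdispersion in the preceding abstract can be illustrated with a small Python sketch. It is a generic textbook example for counts, not the ordinal combined model of the paper: a Poisson count has variance equal to its mean, whereas multiplying the Poisson mean by a gamma-distributed effect with mean 1 (an assumption made here for illustration, as are all function and parameter names) yields a negative binomial count with variance lam + lam**2 / alpha. The code checks that closed form against a direct summation of the probability mass function.

```python
import math


def gamma_poisson_moments(lam, alpha):
    """Closed-form mean and variance of Y | theta ~ Poisson(theta * lam),
    with theta ~ Gamma(shape=alpha, rate=alpha), so E[theta] = 1."""
    return lam, lam + lam ** 2 / alpha


def gamma_poisson_pmf(y, lam, alpha):
    """Negative binomial pmf implied by the gamma-Poisson mixture (log scale for stability)."""
    log_p = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + lam))
        + y * math.log(lam / (alpha + lam))
    )
    return math.exp(log_p)


def numeric_moments(lam, alpha, upper=2000):
    """Mean and variance obtained by summing the pmf over 0..upper-1."""
    mean = sum(y * gamma_poisson_pmf(y, lam, alpha) for y in range(upper))
    second = sum(y * y * gamma_poisson_pmf(y, lam, alpha) for y in range(upper))
    return mean, second - mean ** 2


# Illustrative values only.
lam, alpha = 4.0, 2.0
closed_mean, closed_var = gamma_poisson_moments(lam, alpha)
num_mean, num_var = numeric_moments(lam, alpha)
poisson_var = lam  # a plain Poisson count would have variance equal to its mean
```

As `alpha` grows, the gamma effect concentrates around 1 and the variance approaches the Poisson value, so `alpha` acts as the overdispersion parameter in this sketch.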
The majority of the statistical literature for the joint modeling of longitudinal and time-to-event data has focused on the development of models that aim at capturing specific aspects of the motivating case studies. However, little attention has been given to the development of diagnostic and model-assessment tools. The main difficulty in using standard model diagnostics in joint models is the nonrandom dropout in the longitudinal outcome caused by the occurrence of events. In particular, the reference distribution of statistics, such as the residuals, in missing data settings is not directly available and complex calculations are required to derive it. In this article, we propose a multiple-imputation-based approach for creating multiple versions of the completed data set under the assumed joint model. Residuals and diagnostic plots for the complete data model can then be calculated based on these imputed data sets. Our proposals are exemplified using two real data sets. ; The authors gratefully acknowledge support from the IAP research network grant P6/03 of the Belgian government (Belgian Science Policy).
BASE
In: Communications in statistics. Simulation and computation, Vol. 51, Issue 4, pp. 1591-1615
ISSN: 1532-4141
A simple multiple imputation-based method is proposed to deal with missing data in exploratory factor analysis. Confidence intervals are obtained for the proportion of explained variance. Simulations and real data analysis are used to investigate and illustrate the use and performance of our proposal. ; Financial support from the IAP research network # P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged. The research leading to these results has also received funding from the European Seventh Framework programme FP7 2007 - 2013 under grant agreement Nr. 602552. We gratefully acknowledge support from the IWT-SBO ExaScience grant. We are grateful for suggestions made by anonymous referees, which have greatly helped to improve this manuscript.
BASE
Molenberghs, Verbeke, and Demetrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that lead to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event). ; Financial support from the IAP research network #P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged. This work was partially supported by CNPq, a Brazilian science funding agency.
BASE
Finite mixture models have been used to model population heterogeneity and to relax distributional assumptions. These models are also convenient tools for clustering and classification of complex data such as, for example, repeated-measurements data. The performance of model-based clustering algorithms is sensitive to influential and outlying observations. Methods for identifying outliers in a finite mixture model have been described in the literature. Approaches to identify influential observations are less common. In this paper, we apply local-influence diagnostics to a finite mixture model with known number of components. The methodology is illustrated on real-life data. ; The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support from the IAP research Network P7/06 of the Belgian government (Belgian Science Policy) is gratefully acknowledged.
BASE
We describe the application of statistical emulation to the outcomes of an agent-based model. The agent-based model simulates the mechanisms that might have linked the reversal of gender inequality in higher education with observed changes in educational assortative mating in Belgium. Using the statistical emulator as a computationally fast approximation to the expensive agent-based model, it is feasible to use a genetic algorithm to find the parameter values for which the corresponding agent-based model outcome is closest to known empirical output. These optimal parameter values are then interpreted sociologically. ; The second author has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement no. 312290 for the GENDERBALL project.
BASE
Longitudinal experiments often involve multiple outcomes measured repeatedly within a set of study participants. While many questions can be answered by modeling the various outcomes separately, some questions can only be answered in a joint analysis of all of them. In this article, we will present a review of the many approaches proposed in the statistical literature. Four main model families will be presented, discussed and compared. Focus will be on presenting advantages and disadvantages of the different models rather than on the mathematical or computational details. ; Geert Verbeke, Geert Molenberghs and Steffen Fieuws gratefully acknowledge support from IAP research Network P6/03 of the Belgian Government (Belgian Science Policy). The work of Marie Davidian was supported in part by NIH grants P01 CA142538, R37AI031789 and R01 CA085848.
BASE
In hierarchical data settings, be it of a longitudinal, spatial, multi-level, clustered, or otherwise repeated nature, often the association between repeated measurements attracts at least part of the scientific interest. Quantifying the association frequently takes the form of a correlation function, including but not limited to intraclass correlation. Vangeneugden et al. (2010) derived approximate correlation functions for longitudinal sequences of general data type, Gaussian and non-Gaussian, based on generalized linear mixed-effects models. Here, we consider the extended model family proposed by Molenberghs et al. (2010). This family flexibly accommodates data hierarchies, intrasequence correlation, and overdispersion. The family allows for closed-form means, variance functions, and correlation functions, for a variety of outcome types and link functions. Unfortunately, for binary data with logit link, closed forms cannot be obtained. This is in contrast with the probit link, for which such closed forms can be derived. We therefore concentrate on the probit case. It is of interest, not only in its own right, but also as an instrument to approximate the logit case, thanks to the well-known probit-logit 'conversion.' Next to the general situation, some important special cases such as exchangeable clustered outcomes receive attention because they produce insightful expressions. The closed-form expressions are contrasted with the generic approximate expressions of Vangeneugden et al. (2010) and with approximations derived for the so-called logistic-beta-normal combined model. A simulation study explores performance of the method proposed. Data from a schizophrenia trial are analyzed and correlation functions derived. ; Financial support from the IAP research network #P6/03 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged. The fourth author is supported by CNPq, a Brazilian Science Funding Agency.
BASE
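The probit-logit 'conversion' referred to in the preceding abstract can be sketched numerically in Python. The sketch assumes one commonly used scaling constant, c = 16*sqrt(3)/(15*pi), so that the logistic distribution function at x is approximated by the standard normal distribution function at c*x; the abstract does not state which constant the authors use, and the function names and grid below are illustrative choices.

```python
import math

# Assumed conversion constant (about 0.588); other constants such as 1/1.7 are also in use.
C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)


def expit(x):
    """Logistic distribution function (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-x))


def std_normal_cdf(z):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_abs_gap(grid, c=C):
    """Largest absolute difference between expit(x) and Phi(c * x) over the grid."""
    return max(abs(expit(x) - std_normal_cdf(c * x)) for x in grid)


# Evaluate the approximation on linear predictors from -8 to 8 in steps of 0.01.
grid = [i / 100.0 for i in range(-800, 801)]
gap = max_abs_gap(grid)
```

On this grid the two curves differ by roughly 0.01 in probability at most, which is why results derived in closed form under the probit link can serve as an approximation for the logit link after rescaling the linear predictor.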
In the analyses of incomplete longitudinal clinical trial data, there has been a shift, away from simple methods that are valid only if the data are missing completely at random, to more principled ignorable analyses, which are valid under the less restrictive missing at random assumption. The availability of the necessary standard statistical software nowadays allows for such analyses in practice. While the possibility of data missing not at random (MNAR) cannot be ruled out, it is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials. Rather than either forgetting about or blindly shifting to an MNAR framework, the optimal place for MNAR analyses is within a sensitivity-analysis context. One such route for sensitivity analysis is to consider, next to selection models, pattern-mixture models or shared-parameter models. The latter can also be extended to a latent-class mixture model, the approach taken in this article. The performance of the so-obtained flexible model is assessed through simulations and the model is applied to data from a depression trial. ; CB, GM, and GV gratefully acknowledge the financial support from the IAP research Network P6/03 of the Belgian Government (Belgian Science Policy).
BASE
Since the seminal paper by Cook and Weisberg [9: R.D. Cook and S. Weisberg, Residuals and Influence in Regression, Chapman & Hall, London, 1982], local influence, next to case deletion, has gained popularity as a tool to detect influential subjects and measurements for a variety of statistical models. For the linear mixed model the approach leads to easily interpretable and computationally convenient expressions, not only highlighting influential subjects, but also which aspect of their profile leads to undue influence on the model's fit [17: E. Lesaffre and G. Verbeke, Local influence in linear mixed models, Biometrics 54 (1998), pp. 570–582, doi:10.2307/3109764]. Ouwens et al. [24: M.J.N.M. Ouwens, F.E.S. Tan, and M.P.F. Berger, Local influence to detect influential data structures for generalized linear mixed models, Biometrics 57 (2001), pp. 1166–1172, doi:10.1111/j.0006-341X.2001.01166.x] applied the method to the Poisson-normal generalized linear mixed model (GLMM). Given the model's nonlinear structure, these authors did not derive interpretable components but rather focused on a graphical depiction of influence. In this paper, we consider GLMMs for binary, count, and time-to-event data, with the additional feature of accommodating overdispersion whenever necessary. For each situation, three approaches are considered, based on: (1) purely numerical derivations; (2) using a closed-form expression of the marginal likelihood function; and (3) using an integral representation of this likelihood. Unlike when case deletion is used, this leads to interpretable components, allowing not only to identify influential subjects, but also to study the cause thereof. The methodology is illustrated in case studies that range over the three data types mentioned. ; Financial support from the IAP research network #P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged.
BASE