author:"Arteaga Moreno, Francisco Javier" | Pollux - Fachinformationsdienst Politikwissenschaft

5 results

#1 | 2016 | Open Access

Assessment of maximum likelihood PCA missing data imputation

Folch Fortuny, Abel; Arteaga Moreno, Francisco Javier; Ferrer, Alberto

Maximum likelihood principal component analysis (MLPCA) was originally proposed to incorporate measurement error variance information in principal component analysis (PCA) models. MLPCA can be used to fit PCA models in the presence of missing data, simply by assigning very large variances to the non-measured values. An assessment of maximum likelihood missing data imputation is performed in this paper, analysing the algorithm of MLPCA and adapting several methods for PCA model building with missing data to its maximum likelihood version. In this way, known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR) methods are implemented within the MLPCA method to work as different imputation steps. Six data sets are analysed using several percentages of missing data, comparing the performance of the original algorithm, and its adapted regression-based methods, with other state-of-the-art methods.

Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02 and DPI2014-55276-C5-1R, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R.

Folch Fortuny, A.; Arteaga Moreno, FJ.; Ferrer, A. (2016). Assessment of maximum likelihood PCA missing data imputation. Journal of Chemometrics. 30(7):386-393. https://doi.org/10.1002/cem.2804

BASE

#2 | 2016 | Open Access

Assessment of maximum likelihood PCA missing data imputation

Folch Fortuny, Abel; Arteaga Moreno, Francisco Javier; Ferrer, Alberto

Maximum likelihood principal component analysis (MLPCA) was originally proposed to incorporate measurement error variance information in principal component analysis (PCA) models. MLPCA can be used to fit PCA models in the presence of missing data, simply by assigning very large variances to the non-measured values. An assessment of maximum likelihood missing data imputation is performed in this paper, analysing the algorithm of MLPCA and adapting several methods for PCA model building with missing data to its maximum likelihood version. In this way, known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR) methods are implemented within the MLPCA method to work as different imputation steps. Six data sets are analysed using several percentages of missing data, comparing the performance of the original algorithm, and its adapted regression-based methods, with other state-of-the-art methods.

Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02 and DPI2014-55276-C5-1R, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R.

Folch Fortuny, A.; Arteaga Moreno, FJ.; Ferrer, A. (2016). Assessment of maximum likelihood PCA missing data imputation. Journal of Chemometrics. 30(7):386-393. https://doi.org/10.1002/cem.2804

BASE

#3 | 2016 | Open Access

Missing Data Imputation Toolbox for MATLAB

Folch Fortuny, Abel; Arteaga Moreno, Francisco Javier; Ferrer, Alberto

Here we introduce a graphical user-friendly interface to deal with missing values called Missing Data Imputation (MDI) Toolbox. This MATLAB toolbox allows imputing missing values, following missing completely at random patterns, exploiting the relationships among variables. In this way, principal component analysis (PCA) models are fitted iteratively to impute the missing data until convergence. Different methods, using PCA internally, are included in the toolbox: trimmed scores regression (TSR), known data regression (KDR), KDR with principal component regression (KDR-PCR), KDR with partial least squares regression (KDR-PLS), projection to the model plane (PMP), iterative algorithm (IA), modified nonlinear iterative partial least squares regression algorithm (NIPALS) and data augmentation (DA). MDI Toolbox presents a general procedure to impute missing data, thus can be used to infer PCA models with missing data, to estimate the covariance structure of incomplete data matrices, or to impute the missing values as a preprocessing step of other methodologies.

Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02 and DPI2014-55276-C5-1R, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R.

Folch Fortuny, A.; Arteaga Moreno, FJ.; Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems. 154:93-100. https://doi.org/10.1016/j.chemolab.2016.03.019
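The iterative scheme mentioned in the abstract (fit a PCA model, re-estimate the missing cells from it, repeat until convergence) can be sketched roughly as follows. This is a minimal Python/NumPy illustration of a generic iterative PCA imputation, not the MDI Toolbox's MATLAB code; the function name `iterative_pca_impute`, its parameters, the column-mean starting values and the stopping rule are all assumptions made for the example.

```python
import numpy as np

def iterative_pca_impute(X, n_components, max_iter=1000, tol=1e-12):
    """Fill NaN cells of X by repeatedly fitting a PCA model (hypothetical helper).

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed starting point: replace each missing cell by its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Fit PCA on the currently completed matrix via SVD of the centered data.
        mean = filled.mean(axis=0)
        centered = filled - mean
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        loadings = vt[:n_components].T
        # Reconstruct the data from the retained components.
        recon = centered @ loadings @ loadings.T + mean
        # Only missing cells are updated; observed cells stay untouched.
        new_filled = np.where(missing, recon, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled
```

On data that is well described by a few components and has only a small share of missing cells, the imputed values should approach the PCA reconstruction of the complete data; the number of components is left to the caller.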

BASE

#4 | 2015 | Open Access

PCA model building with missing data: New proposals and a comparative study

Folch-Fortuny, Abel; Arteaga Moreno, Francisco Javier; Ferrer Riquelme, Alberto José

This paper introduces new methods for building principal component analysis (PCA) models with missing data: projection to the model plane (PMP), known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR). These methods are adapted from their PCA model exploitation version to deal with the more general problem of PCA model building when the training set has missing values. A comparative study is carried out comparing these new methods with the standard ones, such as the modified nonlinear iterative partial least squares (NIPALS), the iterative algorithm (IA), the data augmentation method (DA) and the nonlinear programming approach (NLP). The performance is assessed using the mean squared prediction error of the reconstructed matrix and the cosines between the actual principal components and the ones extracted by each method. Four data sets, two simulated and two real ones, with several percentages of missing data, are used to perform the comparison.

Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R. The authors gratefully acknowledge Salvador Garcia-Munoz for providing the Phi toolbox (version 1.7) to perform the nonlinear programming approach (NLP) method.

Folch-Fortuny, A.; Arteaga Moreno, FJ.; Ferrer Riquelme, AJ. (2015). PCA model building with missing data: New proposals and a comparative study. Chemometrics and Intelligent Laboratory Systems. 146:77-88. https://doi.org/10.1016/j.chemolab.2015.05.006
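The two performance measures named in the abstract (mean squared prediction error of the reconstructed matrix, and cosines between actual and extracted principal components) could be computed along these lines. This is a hedged Python/NumPy sketch, not the authors' code: the function names are hypothetical, the error is averaged over all cells (the abstract does not say whether only the missing cells are used), and the cosine is taken in absolute value on the assumption that the sign of a principal component is arbitrary.

```python
import numpy as np

def mean_squared_prediction_error(X_true, X_hat):
    """Average squared difference over all cells (assumed definition)."""
    X_true = np.asarray(X_true, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return float(np.mean((X_true - X_hat) ** 2))

def loading_cosines(P_true, P_est):
    """Absolute cosine between matching columns of two loading matrices.

    Each column is one principal component direction; a value of 1 means the
    estimated direction coincides with the actual one up to sign.
    """
    P_true = np.asarray(P_true, dtype=float)
    P_est = np.asarray(P_est, dtype=float)
    dots = np.abs(np.sum(P_true * P_est, axis=0))
    norms = np.linalg.norm(P_true, axis=0) * np.linalg.norm(P_est, axis=0)
    return dots / norms
```

A lower prediction error and cosines closer to 1 would both indicate that a method recovered the complete-data PCA model more faithfully.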

BASE

#5 | 2015 | Open Access

PCA model building with missing data: New proposals and a comparative study

Folch-Fortuny, Abel; Arteaga Moreno, Francisco Javier; Ferrer Riquelme, Alberto José

This paper introduces new methods for building principal component analysis (PCA) models with missing data: projection to the model plane (PMP), known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR). These methods are adapted from their PCA model exploitation version to deal with the more general problem of PCA model building when the training set has missing values. A comparative study is carried out comparing these new methods with the standard ones, such as the modified nonlinear iterative partial least squares (NIPALS), the iterative algorithm (IA), the data augmentation method (DA) and the nonlinear programming approach (NLP). The performance is assessed using the mean squared prediction error of the reconstructed matrix and the cosines between the actual principal components and the ones extracted by each method. Four data sets, two simulated and two real ones, with several percentages of missing data, are used to perform the comparison.

Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R. The authors gratefully acknowledge Salvador Garcia-Munoz for providing the Phi toolbox (version 1.7) to perform the nonlinear programming approach (NLP) method.

Folch-Fortuny, A.; Arteaga Moreno, FJ.; Ferrer Riquelme, AJ. (2015). PCA model building with missing data: New proposals and a comparative study. Chemometrics and Intelligent Laboratory Systems. 146:77-88. https://doi.org/10.1016/j.chemolab.2015.05.006

BASE
