Advanced Multivariable Statistical Analysis Interactive Tool for Handling
Missing Data and Confounding Covariates for Label-free LC-MS Proteomics
Experiments

Sudhir Srivastava; Michael L. Merchant; Craig J. McClain; Anil Rai; Krishna K. Chaturvedi; Ulavappa B. Angadi; Dwijesh C. Mishra; Shesh N. Rai

doi:10.2174/1574893618666230223150253

Abstract

Background: Detecting significant features (proteins or peptides) in LC-MS proteomics studies with multivariable regression analyses requires careful consideration. Missing values in proteomics data can arise from random errors, bad samples, features that fall below the detection limit in specific samples, and other causes. Expression data are also prone to heterogeneity from technical or biological sources. Both missing values and heterogeneity can confound important findings. In addition, these studies often include pre-clinical and clinical information (e.g., sex and exposure) that can be used to supplement the inference.

Methods: We introduce SATP (Statistical Analysis interactive Tool for label-free LC-MS Proteomics experiments), a user-friendly web application for differential expression analysis of proteomics data that scales to large clinical proteomic studies. The tool provides appropriate normalization and imputation methods. It also provides several statistical tests, such as the t-test, moderated t-test, linear fixed-effect model, and linear mixed model with adjustment for the effects of additional covariates.
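As a rough illustration of what covariate adjustment means in a linear fixed-effect model, the sketch below fits log2 intensity ~ group + sex for a single feature by ordinary least squares and reports the group coefficient with its t-statistic. It is not SATP's code: the function names `solve` and `fit_ols`, the balanced eight-sample design, and the intensity values are all hypothetical, and Python is used only for illustration.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(X, y):
    """Return (coefficients, t-statistics) for the least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * xi for b, xi in zip(beta, row)) for row, v in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (len(y) - p)  # residual variance
    t_stats = []
    for i in range(p):
        unit = [1.0 if j == i else 0.0 for j in range(p)]
        var_i = sigma2 * solve(xtx, unit)[i]  # sigma2 times the i-th diagonal of (X'X)^-1
        t_stats.append(beta[i] / math.sqrt(var_i))
    return beta, t_stats

# Hypothetical data for one feature: 8 samples, group (0 = control, 1 = exposed)
# and sex (0/1) as the covariate to adjust for.
group = [0, 0, 0, 0, 1, 1, 1, 1]
sex = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y = [20.0 + 1.5 * g + 0.8 * s + e for g, s, e in zip(group, sex, noise)]

X = [[1.0, float(g), float(s)] for g, s in zip(group, sex)]  # intercept, group, sex
beta, t_stats = fit_ols(X, y)
print("group effect (log2 scale):", beta[1], "t-statistic:", t_stats[1])
```

The group coefficient here is the estimated difference between groups after accounting for sex. A moderated t-test or a mixed model would change how the variance is estimated, but the design-matrix idea is the same.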

Results: Our intuitive tool has several advantages over the existing ones, including an extension to multiple factor comparisons after adjusting for covariates.

Conclusion: SATP is a comprehensive tool for analyzing complex experiments with multiple covariates, whereas most existing tools were developed for simple experiments, mostly two-group comparisons without covariates.

Availability: The tool is freely available at https://ulbbf.shinyapps.io/satp/.




Journal: Current Bioinformatics
DOI: https://dx.doi.org/10.2174/1574893618666230223150253
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher: Bentham Science Publisher
