2.3.2.2.1. Experimental Design

Quantitative measures of the proteome are influenced by many sources of variation, both inter-individual and experimental, in addition to the biological variation of interest. Differences between individuals may arise from the age or sex of the subject, or from factors such as diet or smoking habits. Experimental differences can be introduced during sample collection or handling prior to analysis, or during the analysis itself, for example through varying machine performance or differences in the quantity of sample analysed. Careful experimental design is therefore crucial at all stages of the study, including sample selection and sample handling, if false inferences are to be avoided.
Numerous studies have demonstrated the importance of sample handling. One of the first such studies was conducted in our Institute at the University of Leeds (Banks et al., 2005) and demonstrated the effect of clotting tube type, time to processing and other pre-analytic features on the profiles resulting from SELDI analysis. The effect was more evident for some chip types than for others. When cluster analysis is applied to the heights of peaks (see Section 2.3.2.2.2.2) from the spectra, the differences by tube type can be clearly seen (Figure 2.3.2.3).

Figure 2.3.2.3 Cluster analysis based on peak profiles from different tubes

Many proteomic studies comparing two or more groups of samples are essentially observational in nature, and the normal design principles of epidemiological studies should be adhered to wherever possible:

Samples to be compared (e.g. from cases with a particular disease versus controls) should be matched for potential confounders such as age and sex. In addition, if results are to be generalized, the samples should be as representative as possible of the groups from which they are drawn.
Systematic differences between the ways in which samples from the different groups or from different centres are handled should be avoided.
Analysis of samples in duplicate should be considered, as this will partially remove intra-sample variability and also provide a measure of it. However, if the main cost constraint is sample processing, these advantages must be weighed against the disadvantage of a reduced sample size.
Samples, including any replicates, should be assigned to chips (and spots within chips) at random, to eliminate bias that might arise through variations in performance.
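
The random assignment of samples and replicates to chips and spots, described in the last point above, could be sketched as follows. This is a minimal illustration rather than the procedure used in any of the studies cited: the function name assign_to_chips, the sample labels and the default of eight spots per chip are all assumptions made for the example.

```python
import random


def assign_to_chips(sample_ids, replicates=2, spots_per_chip=8, seed=None):
    """Randomly assign every sample replicate to a (chip, spot) position.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    # One entry per replicate run, e.g. ("case01", 1) and ("case01", 2).
    runs = [(sid, rep) for sid in sample_ids for rep in range(1, replicates + 1)]
    # Shuffling breaks any link between study group and chip/spot position.
    rng.shuffle(runs)
    layout = []
    for position, (sid, rep) in enumerate(runs):
        layout.append({
            "sample": sid,
            "replicate": rep,
            "chip": position // spots_per_chip + 1,  # chips are filled in order
            "spot": position % spots_per_chip + 1,   # spots numbered from 1
        })
    return layout


# Hypothetical study: 8 cases and 8 controls, each analysed in duplicate.
samples = [f"case{i:02d}" for i in range(1, 9)] + [f"ctrl{i:02d}" for i in range(1, 9)]
layout = assign_to_chips(samples, replicates=2, spots_per_chip=8, seed=2024)
for row in layout[:4]:
    print(row)
```

Recording the seed alongside the layout allows the allocation to be regenerated and audited later. Note that this simple scheme does not force cases and controls to be balanced within each chip.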

Attention to experimental design can also improve the efficiency of proteomic studies. For example, more sophisticated systems of randomization may be useful to remove known sources of variation (see, e.g., Oberg and Vitek, 2009). Sample size calculations should be carried out to maximize power within the constraints of the study; examples of such calculations are discussed further below (Section 2.3.2.2.2.4).
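
As a rough illustration of what a sample size calculation involves, the sketch below uses the standard normal-approximation formula for comparing the mean of a single peak intensity between two equally sized groups, n = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / delta^2 per group. It is not necessarily the method used in the section referred to above. The function name samples_per_group and the example values are assumptions, and the calculation ignores both replicate measurements and adjustment for testing many peaks.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate number of samples per group for a two-sided, two-group
    comparison of means (normal approximation, equal group sizes).

    delta: smallest difference in mean peak intensity worth detecting
    sd:    assumed standard deviation of the intensity within each group
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for the two-sided test
    z_power = z.inv_cdf(power)             # quantile matching the target power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)                    # round up to whole samples


# Hypothetical example: detect a difference of half a standard deviation.
print(samples_per_group(delta=0.5, sd=1.0))
```

Halving the detectable difference roughly quadruples the required sample size, which is why the choice of delta and a realistic estimate of sd matter more than the other inputs.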
