2.1.2. Class comparison (searching for differences between classes)

The majority of genomic/proteomic experiments are comparative in nature. In other words, when we conduct such an experiment, our aim is to compare genes/proteins in different situations in order to reveal which of them are active/present under different conditions/classes. This is called class comparison. The simplest and most common case of class comparison is the comparison of genes/proteins between two different classes. For example, we may want to know which genes are active in a group of patients who suffer from a particular disease in comparison with a group of control patients. Or we can conduct a microarray experiment to study the differences in protein expression of a particular bacterial strain cultivated under different conditions, for example aerobic versus semi-aerobic conditions. Another experiment can aim to compare the gene expression profiles of tumor samples at the time of diagnosis and during progression.
     As already mentioned, the simplest case is to compare two classes, but it is not unusual to compare three or more. For example, we can compare gene expression of lymphatic cells from three or more types of lymphoma. Or imagine another situation: if we have a group of patients with the same diagnosis, split into two subgroups according to their treatment, and a group of control patients, we can compare the response of each subgroup with respect to the control group.
     But how can all these comparisons be done? The simplest way to find differences among the studied groups is to compare the expression of each gene across all groups.

In general, three approaches can reveal these differences. The first is based on calculating effect sizes, the second on statistical hypothesis testing, and the third on regression strategies. Calculating effect sizes is the simplest approach and does not require any special statistical tools. However, it provides no information about the reliability of the observed expression changes. In contrast, statistical hypothesis testing and regression strategies do quantify the reliability of the result. The choice between these two strategies depends on whether we wish to explore the effect of more than one variable on the differences between groups. Univariate analysis can be performed successfully using statistical hypothesis testing. For multivariate analysis we should apply regression methods, although a univariate linear regression can also be performed and, for two classes, is equivalent to the t-test (described below).
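To make the three approaches concrete, here is a minimal sketch for a single gene measured in two classes. The expression values are invented for illustration, the function names are hypothetical, and the effect size is assumed to be the difference of class means on log2-transformed data (the log2 fold change). The t statistic uses the pooled-variance form, which assumes equal variances in both classes. The sketch computes only the test statistics, not p-values.

```python
import math
from statistics import mean, variance

# Hypothetical log2 expression values of one gene (made-up numbers).
control = [7.1, 6.8, 7.4, 7.0, 6.9]
disease = [8.2, 8.6, 7.9, 8.4, 8.1]


def effect_size(a, b):
    """Difference of class means; on log2 data this is the log2 fold change."""
    return mean(b) - mean(a)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def regression_slope_t(a, b):
    """Fit y = b0 + b1 * x by least squares, with x = 0 for class a and
    x = 1 for class b; return the slope and its t statistic."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


lfc = effect_size(control, disease)
t_test = pooled_t(control, disease)
slope, t_reg = regression_slope_t(control, disease)

print(f"effect size (log2 fold change): {lfc:.3f}")
print(f"t statistic, two-sample test:   {t_test:.3f}")
print(f"t statistic, regression slope:  {t_reg:.3f}")
```

The effect size alone says how large the difference is but not how reliable it is. The two t statistics coincide, and the regression slope equals the effect size, which illustrates why a univariate regression on a class indicator gives the same answer as the two-class t-test. In a real experiment the same calculation would be repeated for every gene.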

     In the following sections we present the calculation of effect sizes, the principles of statistical hypothesis testing, and selected regression strategies for comparing the studied classes and revealing differences among them.