Analysis of expression arrays

The complete human genome consists of around 3 billion base pairs and is estimated to contain 20,000-25,000 genes. Genes are information units that carry the instructions needed to produce proteins or functional RNA molecules (tRNA, rRNA, microRNA…). A protein-coding gene is transcribed (expressed) into mRNA, which is then translated into the sequence of amino acids that forms the protein. Genes are expressed and proteins are produced according to the needs of the individual cell or the whole organism, so not all genes are active and expressed at the same time. The information about which genes are active in a tissue or organism in a given state is called the gene expression profile. The activity of a gene is equivalent to its expression, and gene expression can be measured by the quantity of the corresponding mRNA in the sample. The amount of mRNA in a sample can be measured easily with DNA microarray technology, which was originally designed for this purpose. DNA microarrays that measure gene expression are called expression arrays.

Knowledge of the gene expression profile of a sample (cell/tissue/organism) helps to distinguish between different types of samples. Different expression profiles mean different behavior of the cell (tissue/organism) under the specified conditions. It is the activation of specific genes that helps the cell to respond to different stimuli from the environment, to adapt and to survive.
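
The comparison of two expression profiles can be illustrated with a minimal Python sketch. It is not a real microarray analysis pipeline: the gene names and intensity values are invented for illustration, and the log2 fold change with a two-fold cut-off is only one common, assumed way of flagging a gene as over- or under-expressed.

```python
import math

# Hypothetical mRNA signal intensities (arbitrary units) for four made-up genes.
# These values are illustrative only and do not come from any real experiment.
healthy = {"GENE_A": 200.0, "GENE_B": 800.0, "GENE_C": 400.0, "GENE_D": 100.0}
tumor = {"GENE_A": 1600.0, "GENE_B": 100.0, "GENE_C": 400.0, "GENE_D": 150.0}


def log2_fold_changes(sample, reference):
    """Return log2(sample / reference) for every gene present in both profiles."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference
    }


def classify(log2_fc, threshold=1.0):
    """Label a gene using an assumed cut-off of |log2 fold change| >= 1 (two-fold)."""
    if log2_fc >= threshold:
        return "over-expressed"
    if log2_fc <= -threshold:
        return "under-expressed"
    return "unchanged"


changes = log2_fold_changes(tumor, healthy)
calls = {gene: classify(fc) for gene, fc in changes.items()}

for gene in sorted(calls):
    print(f"{gene}: log2 fold change = {changes[gene]:+.2f} ({calls[gene]})")
```

With these made-up values, GENE_A is flagged as over-expressed and GENE_B as under-expressed in the tumor sample, while GENE_C and GENE_D stay within the assumed two-fold range.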
In medicine, the comparison of gene expression between different tissues helps to understand the epidemiology and causes of a disease. Over- or under-expression of a gene can result in the development of a disease, and may even be its only cause. This is particularly true in oncology: tumor cells are known for their genomic instability, which gives them an advantage over normal cells, allowing them to survive in a wide range of conditions, proliferate uncontrollably and invade tissues other than the one in which they originated. In a healthy organism, gene expression is very strictly controlled. The expression of a gene can deviate from its normal level through different mechanisms. The most common are distortion of the gene transcription regulation mechanism and gene aberrations.

Gene transcription regulation distortion

The first cause is the failure of the mechanism that regulates gene expression. Gene transcription is regulated by an ensemble of signaling pathways involving a number of molecules. These pathways can combine and/or alter each other.

Gene aberrations

Gene expression can be affected by changes in the structure or the number of copies of a gene. In a diploid organism, a gene is usually present in each cell in two copies, one from each parent. During replication, DNA is prone to insertion and deletion mutations. These mutations are normally corrected by the inherent proofreading capacity of DNA polymerase and by a group of genes involved in mismatch repair. However, if this DNA repair mechanism fails, the gene can be deleted (in one or both copies), amplified (present in more than two copies), or simply mutated so that its protein product is not functional or gains special properties that can directly cause the disease. In this case, the expression of the gene is affected even when the mechanism regulating gene expression works well. If for any reason the cell loses this control, another controlling mechanism directs the cell into programmed cell death, called apoptosis. However, sometimes this mechanism fails as well.
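
The copy-number cases described above (deletion of one or both copies, the normal two copies, amplification) can be summarized in a short Python sketch. The function name and the labels are hypothetical. The sketch assumes a diploid cell and looks only at the number of copies, not at mutations within the gene.

```python
def copy_number_state(copies):
    """Map the number of copies of a gene in a diploid cell to a descriptive label.

    Hypothetical helper for illustration; assumes the normal state is two copies.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return "deleted in both copies"
    if copies == 1:
        return "deleted in one copy"
    if copies == 2:
        return "normal (two copies)"
    return "amplified (more than two copies)"


# Example: label a few possible copy numbers.
for n in (0, 1, 2, 5):
    print(n, "->", copy_number_state(n))
```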