E-learning in analysis of genomic and proteomic data
3. Software
3.1. R and Bioconductor
R is an open-source statistical tool for data analysis. It is freely available at www.r-project.org.
To install R, please follow the instructions on the website.
The main advantage of R is its availability and the fact that it is open-source. Open-source means that the code of all functions is accessible and that anybody can create, change, adjust and publish functions free of charge. This has made R a popular and widely used statistical programming tool in all biostatistical fields.
R was originally developed as a free variant of S-PLUS. However, C++ or Perl programmers might find its structure, and especially its handling of variables and matrices, rather inefficient, which becomes apparent when working with large matrices. Fortunately, R can call functions written in C, Perl or Python, which helps to overcome this problem quite efficiently.
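The efficiency point can be illustrated with a minimal sketch. It assumes a small random matrix (the dimensions and the variable names are chosen only for this illustration) and compares an explicit R loop with the built-in rowSums(), which is implemented in compiled code. It does not show how to link your own C code; it only shows the kind of operation where compiled code tends to help.

```r
# Illustrative data: a 200 x 50 matrix of random values (sizes are arbitrary).
set.seed(1)
m <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Interpreted R loop: sum each row one at a time.
row_sums_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_sums_loop[i] <- sum(m[i, ])
}

# Built-in rowSums(), which runs in compiled code.
row_sums_builtin <- rowSums(m)

# Check that both approaches agree up to numerical tolerance.
print(all.equal(row_sums_loop, row_sums_builtin))
```

Wrapping each variant in system.time() on a much larger matrix is one way to see the difference in running time on your own machine.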
As already mentioned, R allows users to write their own scripts, which can be converted into functions. These functions can be shared either as plain-text .R script files or, more effectively, as so-called packages (libraries). A library
For readers interested in learning more about the structure of R, we refer to the book Data Manipulation with R by Phil Spector.
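Sharing functions through a .R script file, as described above, can be sketched as follows. The function name coef_var, the example values and the temporary file are assumptions made for this illustration; in practice the file would be one you received from a colleague or wrote yourself.

```r
# Write a small script to a temporary .R file (stands in for a shared script).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "# Hypothetical helper: coefficient of variation of a numeric vector",
  "coef_var <- function(x) {",
  "  sd(x) / mean(x)",
  "}"
), script_path)

# Anyone who receives the file can load its functions with source().
source(script_path)

# Use the loaded function on some made-up values.
expression_values <- c(2.1, 2.4, 1.9, 2.2)
cv <- coef_var(expression_values)
print(cv)
```

Functions distributed as a package are loaded with library(packageName) instead of source(), once the package has been installed.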
Bioconductor is an open-source, open-development software project for the analysis and comprehension of genomic data, based on R (www.bioconductor.org). It contains a wide range of packages focused on genomic analysis.
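In current Bioconductor releases, packages are usually installed through the BiocManager package from CRAN. The sketch below wraps this in a small helper. The helper name install_bioc_if_missing is hypothetical, and limma is used only as an example of a Bioconductor package. Please check www.bioconductor.org for the installation instructions that match your version of R.

```r
# Hypothetical helper: install a Bioconductor package only if it is missing.
install_bioc_if_missing <- function(pkg) {
  # BiocManager itself is distributed via CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install the requested package from Bioconductor if it is not yet available.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    BiocManager::install(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Example call (needs an internet connection, so it is left commented out):
# install_bioc_if_missing("limma")
# library(limma)
```

Checking with requireNamespace() first avoids reinstalling packages that are already present.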