Over the past decades, gene expression microarrays have been used extensively in biomedical research. However, these high-throughput experiments are affected by technical variation and biases introduced at different stages, such as mRNA processing, labeling, hybridization, scanning and/or imaging. Data preprocessing is therefore important to minimize these systematic errors so that actual biological changes can be identified. The aim of this study was to compare all possible combinations of two normalization, four summarization, and two background correction options, using two different foreground estimates. The results show that background correction of the raw median signal and the summarization methods used here have no impact on downstream analysis. In contrast, the choice of normalization method influences the results, with quantile normalization leading to better biological sensitivity. When the Agilent processed signal was used, more differentially expressed genes (DEG) were consistently identified than with the raw median signal, regardless of the summarization and normalization options. Nevertheless, the greater number of DEG did not improve the biological relevance of the results.


Microarray, Gene expression, Preprocessing, Agilent