Abstract
The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied in functional analyses that use large heterogeneous collections of transcriptomics data. In this thesis we show that using such large collections can hamper GBA functional analysis for genes whose expression is condition-specific. In these cases a smaller set of condition-related experiments should be used instead, but identifying such functionally relevant experiments in large collections from literature knowledge alone is impractical.
The study begins by discussing the basic principles underlying the definition of gene function and the use of large microarray collections for GBA-based gene function analyses. We examine the effects of condition-specific gene expression on GBA analyses from both a mathematical and a biological perspective. We show that calculating correlation over large microarray collections can mask the effectiveness of the GBA principle, and we suggest that using only those experiments relevant to the biological function under analysis can significantly improve GBA-based gene functional analyses.
We then present a semi-supervised algorithm that selects functionally relevant experiments from large collections of transcriptomics experiments. The algorithm can select experiments relevant to a given GO term, MIPS FunCat term or KEGG pathway. We test the algorithm extensively on large dataset collections for yeast and Arabidopsis. We demonstrate that: (i) using the selected experiments yields a statistically significant improvement both in the correlation between genes in the functional category of interest and in GBA-based function predictions; (ii) the effectiveness of the selected experiments increases with annotation specificity; (iii) the algorithm can be successfully applied to GBA-based pathway reconstruction.
We conclude by discussing the potential applications of our technique. We outline several developments that could be implemented in the future to improve the efficiency of the experiment selection procedure.
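The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's selection algorithm and uses no real expression data: it assumes two hypothetical genes that share a signal in only a few "relevant" experiments and carry independent noise everywhere else. The function names, sample sizes and noise levels are all illustrative assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulate_masking(n_total=1000, n_relevant=50, signal_sd=2.0, seed=0):
    """Return (r_all, r_relevant) for two simulated condition-specific genes.

    Assumption: the genes are co-regulated only in the first `n_relevant`
    experiments; in the remaining experiments they vary independently.
    """
    rng = random.Random(seed)
    # Shared signal, non-zero only in the condition-relevant experiments.
    signal = [rng.gauss(0.0, signal_sd) if i < n_relevant else 0.0
              for i in range(n_total)]
    # Each gene is the shared signal plus its own independent noise.
    gene_a = [s + rng.gauss(0.0, 1.0) for s in signal]
    gene_b = [s + rng.gauss(0.0, 1.0) for s in signal]
    r_all = pearson(gene_a, gene_b)
    r_relevant = pearson(gene_a[:n_relevant], gene_b[:n_relevant])
    return r_all, r_relevant


if __name__ == "__main__":
    r_all, r_relevant = simulate_masking()
    print(f"correlation over all experiments:      {r_all:.2f}")
    print(f"correlation over relevant experiments: {r_relevant:.2f}")
```

Under these assumptions the correlation computed over the relevant experiments is markedly higher than the one computed over the full collection, because the many unrelated experiments contribute only noise to the estimate.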
Original language | English |
---|---|
Qualification | Ph.D. |
Awarding Institution | |
Award date | 1 Mar 2012 |
Publication status | Unpublished - 2012 |