DESCRIPTION (provided by applicant):
Epistasis is the interaction between two or more genes to affect phenotype. It is now widely accepted that epistasis plays an important role in susceptibility to many common diseases. The advent of high-throughput technologies has enabled genome-wide association studies (GWAS or GWA studies). It is therefore important that we be able to detect epistasis using GWAS data. However, GWA studies have so far focused mainly on the association of a single gene or locus with a disease. The crucial challenge in analyzing epistasis using GWAS data is handling high-dimensional data sets efficiently, because the number of candidate gene combinations grows combinatorially with the number of loci. The only possible solution is to design efficient algorithms that find the most relevant epistatic relationships without an exhaustive investigation. To the Principal Investigator's knowledge, no current method can do this.
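To give a sense of the scale involved, the following minimal sketch counts how many SNP combinations an exhaustive search would have to examine. The figure of 500,000 SNPs is an assumption chosen only for illustration and does not come from this description; the function name is likewise hypothetical.

```python
from math import comb


def count_snp_subsets(num_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of the given size (interaction order)."""
    return comb(num_snps, order)


# Assumed, illustrative genome-wide panel size (not taken from the proposal).
num_snps = 500_000

pairs = count_snp_subsets(num_snps, 2)    # two-locus interactions
triples = count_snp_subsets(num_snps, 3)  # three-locus interactions

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

Under this assumption there are roughly 1.25 x 10^11 pairs and 2.1 x 10^16 triples, which suggests why exhaustive evaluation quickly becomes impractical as the interaction order increases.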
This career award will investigate this problem. The specific aims are as follows: (Aim 1) develop and evaluate efficient Bayesian network-based methods for learning, from GWAS data sets, candidate genes associated with diseases (such genes would provide candidates for follow-up biological studies); (Aim 2) implement the methods in a pilot GWAS system for researchers to use when conducting a GWAS; (Aim 3) develop simulated genome-wide data sets and evaluate the pilot system using these data sets; and (Aim 4) conduct GWA studies concerning breast cancer and lung cancer.
Aim 1 will be addressed by developing a succinct Bayesian network model representing epistasis, developing efficient algorithms tailored to investigating such models, integrating the algorithms into methods for learning epistasis, and using simulated data sets to test the effectiveness of the methods and compare their performance with that of other methods. Aim 2 will be met by implementing the methods in a pilot GWAS system. Aim 3 will be satisfied by developing synthetic data sets similar to those found in GWA studies and using them to evaluate the system. Aim 4 will be achieved by conducting GWA studies concerning breast and lung cancer. By conducting these studies, we can (1) substantiate previous results concerning the genetic basis of these diseases and (2) possibly obtain interesting new findings pertaining to these diseases.
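This description does not specify the Bayesian network model or the scoring criterion to be used. Purely as an illustrative sketch, the example below assumes a network in which candidate SNPs are parents of a binary disease node and scores each candidate parent set with the standard K2 (Cooper-Herskovits) marginal likelihood. The function name, the binary genotype coding, and the toy XOR-style data are all assumptions made for this example and are not the applicant's method or data.

```python
from collections import Counter
from math import lgamma


def k2_log_score(parent_rows, disease):
    """Log K2 score of a binary disease node given one parent configuration per sample.

    parent_rows: list of tuples, the genotypes of the candidate parent SNPs.
    disease:     list of 0/1 disease states, aligned with parent_rows.
    """
    r = 2  # number of disease states (control = 0, case = 1)
    joint = Counter(zip(parent_rows, disease))  # N_jk: count per (config, state)
    totals = Counter(parent_rows)               # N_j: count per parent config
    score = 0.0
    for config, n_j in totals.items():
        # log[(r - 1)! / (N_j + r - 1)!]
        score += lgamma(r) - lgamma(n_j + r)
        # sum over states of log(N_jk!)
        for state in range(r):
            score += lgamma(joint[(config, state)] + 1)
    return score


# Assumed toy data: disease status is the XOR of two binary SNPs, so neither
# SNP is informative alone but the pair determines the phenotype.
snp_a, snp_b, disease = [], [], []
for a in (0, 1):
    for b in (0, 1):
        for _ in range(25):
            snp_a.append(a)
            snp_b.append(b)
            disease.append(a ^ b)

null_score = k2_log_score([()] * len(disease), disease)      # no parents
single_score = k2_log_score([(a,) for a in snp_a], disease)  # one SNP parent
pair_score = k2_log_score(list(zip(snp_a, snp_b)), disease)  # two SNP parents

print(f"null:   {null_score:.2f}")
print(f"single: {single_score:.2f}")
print(f"pair:   {pair_score:.2f}")
```

On this toy data the two-SNP model scores higher than either the single-SNP or the no-parent model, which illustrates how a purely epistatic effect can be missed by single-locus analysis. A genome-wide method would still need a search strategy that avoids scoring every candidate parent set.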
The main hypothesis is that the proposed method will advance over existing methods by making it computationally feasible to learn epistatic relationships from genome-wide data, and that it will therefore yield better discovery performance than existing methods.