DESCRIPTION (provided by applicant): High-throughput experimental methods have accelerated biomedical research dramatically. Approaches such as microarray analysis, genome-wide association studies (GWAS), deep sequencing, and brain imaging reduce bottlenecks in data generation and collection. Understanding the biological significance of high-throughput data, however, remains a major challenge [1]. As pointed out by Bota and Swanson, it is now "far beyond the grasp of individual investigators, no matter how brilliant, to remember, evaluate, and synthesize the neuroscience literature, even in restricted domains like network structure, physiology, or chemistry" [2]. We argue that a key part of the problem is insufficient support for drawing high-dimensional functional relationships from high-throughput experimental data in the context of existing literature and data. Prevailing search solutions, such as PubMed and Google Scholar, are designed mainly for retrieving the most relevant information efficiently, not for exploratory hypothesis development. They lack several key functionalities, which our proposed system will provide, that are required for understanding the biology of high-throughput data through hypothesis-oriented exploration of the literature and databases:
Overview of Medline search results in familiar biological contexts to facilitate exploration: Presenting search results as graphical overviews that reflect the inherent biological relationships among the retrieved records will be more effective than presenting a linear list of potentially relevant records alone. Such overviews, ideally drawn from multiple biological contexts, should also support efficient interactive exploration of attribute data and pattern associations, so that users can derive non-obvious relationships from multiple perspectives.
Query support for different algorithms, biological entities and data sources: No single retrieval algorithm fits all situations. Medline queries need to support biological entities such as gene IDs and genomic locations. The Medline database also needs to be supplemented by external data sources such as ontologies, pathway resources, and databases of curated information derived from experimental data.
Open architecture for third-party plug-ins and cross-application function integration: Support for third-party data and function plug-ins is needed to enhance the functionality and adaptability of a solution. An open architecture will also enable the use of intermediate data and/or functions from other solutions.
Incorporating these functions, we propose to develop a system called PubViz that will more effectively support neurobiologists' needs for developing hypotheses on the molecular mechanisms underlying major mental disorders through integrated exploration of literature and data related to high-throughput experimental results. We will also conduct systematic needs assessments and user tests to ensure that the functions we develop match users' needs. Building on our existing prototypes of component functions, PubViz will provide a query and analysis environment that goes beyond other systems in helping scientists work toward formulating hypotheses. It will integrate Medline search results with data and information from external resources and situate relationships visually and interactively in multiple biological contexts that are both useful and usable. Creating these combined innovations and human-computer interface (HCI) designs is non-trivial but feasible, given our pilot work and our experience in developing visual Medline exploration solutions, in data analysis and integration, and in usability and usefulness studies. Additionally, focusing this project on neurobiology and mental disorders, a research domain in which we have extensive experience, will help us address critical user needs and functionalities more effectively. Moreover, the solution we develop should be adaptable to other biomedical research domains.