About HEFalMp

HEFalMp (Human Experimental/FunctionAL MaPper) is a tool developed by Curtis Huttenhower in Olga Troyanskaya's lab at Princeton University. It was created to allow interactive exploration of functional maps as described in our paper:

This work was extensively supported by Hilary Coller's lab as well, particularly by Erin Haley, who spearheaded all of the project's experimental work.

Functional maps?

A functional map is a focused analysis of functional relationship networks predicted from integration of many genome-scale experimental results. For example, HEFalMp contains information from roughly 15,000 microarray conditions, over 15,000 publications on genetic and physical protein interactions, and several types of DNA and protein sequence analyses. Clearly, this is too much data to visualize all at once! But information on your favorite gene could be contained in any or all of those experimental results, and good computational methods should be able to integrate the data collection and mine out the portions of interest to an investigator.

The first step of functional mapping is thus data integration, the process of predicting high-level functional relationships between genes/proteins based on many individual experimental results. A functional relationship can be a direct interaction (e.g. protein binding or transcriptional regulation), but it can also be a more general relationship (e.g. two proteins participating in the same pathway or regulating the same cellular processes). HEFalMp assigns each pair of genes in the human genome a probability of functional relationship, based on probabilistic integration of its experimental data collection. These probabilities can vary from process to process - two proteins might be related in carrying out the cell cycle, for example, but also perform other unrelated metabolic or structural functions. HEFalMp thus includes over 200 process-specific functional relationship networks, including a global, process-independent network capturing the most general functional relationships.
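As a rough illustration of what probabilistic integration can look like, the sketch below combines per-dataset evidence for a single gene pair into one probability. It assumes a naive Bayes combination in which datasets are treated as conditionally independent; the text above says only that the integration is probabilistic, so this is not necessarily HEFalMp's exact method. The function name, the prior, and the likelihood ratios are all hypothetical values chosen for illustration.

```python
from math import prod


def integrate_evidence(prior, likelihood_ratios):
    """Posterior probability of a functional relationship for one gene pair.

    prior: assumed prior probability that a random gene pair is related.
    likelihood_ratios: one value per dataset, defined as
        P(observation | related) / P(observation | unrelated).
    Assumption: datasets are conditionally independent (naive Bayes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidence for one gene pair from three datasets.
# A process-specific network can weight the same datasets differently,
# so the same pair can receive a different probability per process.
global_p = integrate_evidence(0.05, [4.0, 2.5, 0.8])
cell_cycle_p = integrate_evidence(0.05, [9.0, 2.5, 0.8])
print(round(global_p, 3), round(cell_cycle_p, 3))
```

In this toy case the first dataset is assumed to be more informative about the cell cycle than about function in general, so the pair scores higher in the cell cycle network than in the global one.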

These functional relationship networks can be mapped by investigating their contents for interesting associations among genes, pathways, or diseases. A functional association is the equivalent of a functional relationship for groups of genes. For example, if two genes perform similar tasks, they are functionally related. If two cellular pathways or processes share many regulators, have significant molecular cross-talk, or carry out similar biological tasks, they are functionally associated. Functional associations between known pathways, groups of disease-associated genes, or any gene set a user is interested in can be automatically discovered by using functional mapping techniques to mine relationship networks.
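One simple way to picture a functional association between two gene sets is to compare the average relationship probability between the sets with the average over the whole network. The sketch below does exactly that on a toy network. The scoring formula, the function names, and the gene labels are assumptions made for illustration and are not taken from HEFalMp itself.

```python
from itertools import product


def mean_between(network, set_a, set_b, default=0.0):
    """Average edge weight over all pairs with one gene in each set."""
    weights = [
        network.get(frozenset((a, b)), default)
        for a, b in product(set_a, set_b)
        if a != b
    ]
    return sum(weights) / len(weights) if weights else 0.0


def association_score(network, set_a, set_b):
    """Between-set mean divided by the network-wide mean edge weight.

    Values above 1 suggest the sets are more tightly linked than
    a typical gene pair; values below 1 suggest the opposite.
    """
    background = sum(network.values()) / len(network)
    return mean_between(network, set_a, set_b) / background


# Toy network: each key is an unordered gene pair, each value a
# hypothetical probability of functional relationship.
network = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "D")): 0.1,
    frozenset(("B", "C")): 0.7,
    frozenset(("B", "D")): 0.2,
    frozenset(("C", "D")): 0.1,
}

pathway = {"A", "B"}
linked = association_score(network, pathway, {"C"})
unlinked = association_score(network, pathway, {"D"})
print(round(linked, 2), round(unlinked, 2))
```

Here the set containing gene C scores above 1 against the pathway, while the set containing gene D scores well below 1.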

For example, consider a functional relationship network in which each edge represents two genes' relatedness based on their similarity in hundreds of datasets. This network might be specifically tailored to highlight relationships in the cell cycle. Functional mapping can reveal new genes predicted to be associated with cell cycle pathways, thus assisting in function assignment for uncharacterized genes. Or by looking for genes functionally associated with breast cancer, new causal or drug target genes might be identified. Or an experimenter investigating a particular cell cycle pathway might map his or her specific gene set of interest to find associations with other known pathways or diseases, or to discover new genes predicted to participate in the pathway. Functional mapping provides a way to perform all of these exploratory tasks efficiently while taking advantage of the thousands of publicly available genomic experimental results.
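The gene-prediction use case above can be sketched as a ranking problem: given a query gene set, order every other gene by its average relationship probability to the query. This is a minimal illustration with hypothetical gene labels and weights, and the scoring rule is an assumption rather than HEFalMp's published statistic.

```python
def rank_candidates(network, query, top=3):
    """Rank genes outside the query set by mean edge weight to the query."""
    query = set(query)
    genes = {gene for pair in network for gene in pair}
    scores = {}
    for gene in genes - query:
        weights = [network.get(frozenset((gene, q)), 0.0) for q in query]
        scores[gene] = sum(weights) / len(weights)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


# Toy process-specific network (hypothetical probabilities).
network = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "D")): 0.1,
    frozenset(("B", "C")): 0.7,
    frozenset(("B", "D")): 0.2,
    frozenset(("C", "D")): 0.1,
}

ranked = rank_candidates(network, {"A", "B"})
for gene, score in ranked:
    print(gene, round(score, 2))
```

With the query set {A, B}, gene C ranks ahead of gene D, making it the stronger candidate for participation in the same process.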

Data

HEFalMp contains information from over 30,000 experiments, collected into ~650 datasets of related experimental conditions; these cover the ~25,000 genes and ~300 million potential interactions of the human genome. Roughly 15,000 of these are microarrays, 15,000 are individual binding or interaction assays, and several more are DNA or protein sequence comparisons. The details are provided in our supplemental information; in summary:

Data type                                       Data points     Datasets  Publications  Experimental conditions
Interactions (physical and genetic)             11,244,053      14        >15,000       >15,000
Sequence comparisons (nucleotide and protein)   452,199,430     7         6             NA
Microarrays                                     27,248,177,875  635       417           14,671
All data                                        27,711,621,358  656       >15,500       ~30,000
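The totals in the summary table, and the figure of roughly 300 million potential interactions, can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted on this page; the gene count of 25,000 is the approximate value given above, so the pair count is likewise approximate.

```python
from math import comb

# (data points, datasets) per data type, as listed in the summary table.
rows = {
    "Interactions": (11_244_053, 14),
    "Sequence comparisons": (452_199_430, 7),
    "Microarrays": (27_248_177_875, 635),
}

total_points = sum(points for points, _ in rows.values())
total_datasets = sum(datasets for _, datasets in rows.values())

# Number of unordered gene pairs among ~25,000 genes.
gene_pairs = comb(25_000, 2)

print(f"{total_points:,}")    # 27,711,621,358
print(total_datasets)         # 656
print(f"{gene_pairs:,}")      # 312,487,500
```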

Biological processes

The 229 biological processes used for HEFalMp's process-specific analyses were chosen from the Gene Ontology using a procedure similar to that described in the GRIFn and MEFIT systems. Briefly, a panel of six biologists was asked, for each of ~10,000 GO terms, whether an annotation to that term would be sufficiently informative to direct experiments probing a gene's function. This excludes terms too general to be useful in a laboratory or function prediction setting. Any term with at least four votes was deemed informative, and all descendants of these terms in the GO hierarchy were also included as informative. Finally, the "upper fringe" of these informative terms was extracted by including any informative term with an uninformative parent. Restricting the resulting term set to those with at least 10 human gene annotations provided the 229 processes employed by HEFalMp. Note that while these GO terms formed the basis for our process-specific analyses, our gold standard for functional relationships included additional information from many other catalogs (KEGG, Reactome, HPRD, etc.). For details on the gold standard, see our Methods; for a list of the 229 analyzed processes, see our supplemental information.
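The selection steps above can be sketched on a toy hierarchy as follows. The thresholds (four votes, 10 annotations) come from the description; the function name, the data layout, and the term names are hypothetical, and the real Gene Ontology is far larger and uses identifiers such as GO accession numbers rather than these labels.

```python
from collections import deque


def select_processes(parents, votes, annotation_counts,
                     min_votes=4, min_genes=10):
    """Return the 'upper fringe' of informative terms with enough genes.

    parents: term -> set of parent terms (the hierarchy is a DAG).
    votes: term -> number of panel votes for 'informative'.
    annotation_counts: term -> number of annotated human genes.
    """
    # Invert the parent map so descendants can be walked top-down.
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(term)

    # Step 1: terms with enough votes are informative.
    informative = {t for t, v in votes.items() if v >= min_votes}

    # Step 2: every descendant of an informative term is informative too.
    queue = deque(informative)
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in informative:
                informative.add(child)
                queue.append(child)

    # Step 3: keep informative terms that have an uninformative parent.
    fringe = {
        t for t in informative
        if any(p not in informative for p in parents.get(t, ()))
    }

    # Step 4: require a minimum number of annotated genes.
    return {t for t in fringe if annotation_counts.get(t, 0) >= min_genes}


# Toy hierarchy with hypothetical term names.
parents = {
    "root": set(),
    "general": {"root"},
    "specific_a": {"general"},
    "specific_b": {"general"},
    "leaf_a": {"specific_a"},
}
votes = {"root": 0, "general": 1, "specific_a": 5,
         "specific_b": 4, "leaf_a": 2}
annotation_counts = {"specific_a": 40, "specific_b": 6, "leaf_a": 12}

selected = select_processes(parents, votes, annotation_counts)
print(sorted(selected))
```

In this example leaf_a becomes informative only as a descendant of specific_a, so it is not on the upper fringe, and specific_b is on the fringe but dropped for having fewer than 10 annotated genes. Only specific_a is selected.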

Disease-associated genes

The 147 genetic disorders included in HEFalMp were extracted from the OMIM database by including all disorders with at least five associated genes. For the resulting list of OMIM IDs, diseases, and associated genes, see this file.
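The inclusion rule amounts to a simple filter on the number of distinct genes per disorder. The sketch below assumes the OMIM associations have already been parsed into a dictionary; the function name, disorder labels, and gene labels are placeholders, not real OMIM entries.

```python
def select_disorders(disorder_genes, min_genes=5):
    """Keep disorders with at least min_genes distinct associated genes."""
    selected = {}
    for disorder, genes in disorder_genes.items():
        unique_genes = sorted(set(genes))
        if len(unique_genes) >= min_genes:
            selected[disorder] = unique_genes
    return selected


# Placeholder data; real input would come from the OMIM database.
disorder_genes = {
    "disorder_1": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "disorder_2": ["g1", "g2"],
    "disorder_3": ["g7", "g7", "g8", "g9", "g10"],  # only 4 distinct genes
}

kept = select_disorders(disorder_genes)
print(sorted(kept))
```

Counting distinct genes rather than raw entries is a design choice in this sketch, so that a duplicated association does not push a disorder over the threshold.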

Tools

We'd like to extend our thanks to several groups, software packages, projects, and data repositories that have helped to make HEFalMp possible:

Contact

Please feel free to contact HEFalMp's primary author (Curtis Huttenhower), primary experimentalist (Erin Haley), or principal investigators Hilary Coller and Olga Troyanskaya directly, or visit the Coller or Troyanskaya lab web pages.

HEFalMp?

[Image: E. H. Shepard's illustration of a heffalump]

For the project's name to be a little clearer, one has to take into account two potentially (probably) non-obvious facts: A) it's pronounced "heff-ah-lump", and B) our lab has a long history of magical and mythical project names. Before it was horrifically commercialized by Disney, the heffalump was a figment of Winnie the Pooh's imagination, not unlike a snark or snuffleupagus (the latter again before his premise was ruined by corporate interests). After considering dozens of imaginary and mythical beasts whose names could be used for our functional mapping project, we settled on the rather embarrassing acronymization "HEFalMp" for the mundane reason that it offered the necessary H (human), F (functional), and M (map) letters in the correct order. The rest is history, as is the original E. H. Shepard illustration of a heffalump shown to the left, taken from the canonical A. A. Milne Pooh books from 1926.