Mutated Genes, Pathways, Processes and protein Domains in tumours web application.

Methodology

Mutated gene data

Data on mutations and mutated genes were retrieved from 3 databases, 17 whole-genome resequencing studies and resequencing screens focusing on gene subsets (Sources). All genes carrying at least one non-synonymous mutation in their coding sequence, or a mutation at a splice site, were included and matched to the tumour type(s) in which these mutations have been described (following, when possible, the ICD10 topographic neoplasm classification).
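The gene selection rule can be sketched in Python as follows. This is a minimal illustration, not the application's code. The record layout (gene, tumour type, consequence) and the consequence labels in `QUALIFYING_CONSEQUENCES` are hypothetical, because each source database or study uses its own annotation vocabulary, which would need to be mapped first.

```python
from collections import defaultdict

# Hypothetical consequence labels counted as "non-synonymous or splice-site".
# Real sources each use their own vocabulary and would need mapping first.
QUALIFYING_CONSEQUENCES = {
    "missense", "nonsense", "frameshift", "inframe_indel",
    "start_lost", "stop_lost", "splice_site",
}


def mutated_genes_by_tumour(records):
    """Return {tumour_type: set_of_genes}, keeping qualifying mutations only.

    `records` is an iterable of (gene, tumour_type, consequence) tuples,
    a hypothetical flattened view of the merged databases and studies.
    """
    genes = defaultdict(set)
    for gene, tumour_type, consequence in records:
        if consequence in QUALIFYING_CONSEQUENCES:
            genes[tumour_type].add(gene)
    return dict(genes)


# Tumour types keyed by ICD10 topography codes (C50 breast, C34 bronchus and lung).
example = [
    ("TP53", "C50", "missense"),
    ("TP53", "C34", "nonsense"),
    ("GAPDH", "C50", "synonymous"),  # synonymous change: ignored
    ("KRAS", "C34", "missense"),
]
print({t: sorted(g) for t, g in mutated_genes_by_tumour(example).items()})
# {'C50': ['TP53'], 'C34': ['KRAS', 'TP53']}
```

A gene mutated in several tumour types is simply listed under each of them, which matches the per-tumour-type matching described above.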

Functional annotations

Functional annotations were retrieved from:
KEGG (http://www.genome.jp/kegg/)
BioCarta (http://www.biocarta.com/)
Reactome (http://www.reactome.org/)
Gene Ontology Biological Process (http://www.geneontology.org/)
InterPro protein domain database (http://www.ebi.ac.uk/interpro/)

Statistical test

A Fisher's exact test was performed with the R multtest package to identify, for each tumour type, the pathways, processes and protein domains significantly enriched in mutated genes. For a given tumour type and a given pathway (or process, or domain), the test measured the significance of the association between belonging to that pathway and being mutated in that tumour type, i.e. whether the pathway's genes are over-represented among the genes mutated in that tumour type. For each tumour type, only the set of genes that had actually been screened for mutations (for large-scale resequencing studies) was used as the background. All tests were adjusted for multiple testing with the Benjamini & Hochberg FDR-controlling procedure (Benjamini and Hochberg 1995).
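The test and correction described above can be sketched in plain Python. This is a minimal illustration standing in for the R workflow, not the application's code. It assumes a one-sided test for over-representation (the text does not state the sidedness, and R's fisher.test is two-sided by default), and the function names and the {term: genes} annotation dictionary are hypothetical.

```python
from math import comb


def fisher_enrichment_p(term_genes, mutated_genes, background):
    """One-sided Fisher's exact test for over-representation.

    Equivalent to the upper tail of a hypergeometric distribution: the
    probability of seeing at least k mutated genes in the term by chance.
    Both gene sets are first restricted to the screened background.
    """
    background = set(background)
    term = set(term_genes) & background
    mutated = set(mutated_genes) & background
    n_total, n_term, n_mut = len(background), len(term), len(mutated)
    k = len(term & mutated)  # mutated genes inside the term
    # Sum P(X = i) for i = k .. min(n_term, n_mut); math.comb returns 0
    # when asked for more items than exist, so impossible cases add 0.
    hits = sum(
        comb(n_term, i) * comb(n_total - n_term, n_mut - i)
        for i in range(k, min(n_term, n_mut) + 1)
    )
    return hits / comb(n_total, n_mut)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (Q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment_for_tumour(annotations, mutated_genes, background):
    """Map each annotation term to its BH-adjusted Q-value for one tumour type.

    `annotations` is a hypothetical {term: set_of_genes} dictionary.
    """
    terms = sorted(annotations)
    pvals = [fisher_enrichment_p(annotations[t], mutated_genes, background)
             for t in terms]
    return dict(zip(terms, benjamini_hochberg(pvals)))
```

For example, with a background of 10 screened genes, a pathway of 3 genes and 3 mutated genes, 2 of which fall in the pathway, the one-sided p-value is (3·7 + 1) / C(10, 3) = 22/120 ≈ 0.183. Restricting both sets to the screened background matters: genes never screened cannot be counted as "not mutated". As a cross-check, scipy.stats.fisher_exact with alternative="greater" on the corresponding 2×2 table, and R's p.adjust with method = "BH", should give the same values.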

Thresholds

We defined three Q-value thresholds, Q-value < 0.001 (very significant), Q-value < 0.01 (significant) and Q-value < 0.1 (medium), to associate each tumour type with the pathways, processes and domains containing a significant number of mutated genes.
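The three cut-offs amount to a simple classification of each Q-value. The sketch below is an illustration only: it assumes the strict inequalities as written, labels each term with the strictest threshold it passes, and uses hypothetical function names and a hypothetical {term: Q-value} input.

```python
# Thresholds from the text, checked from strictest to loosest.
THRESHOLDS = [
    (0.001, "very significant"),
    (0.01, "significant"),
    (0.1, "medium"),
]


def significance_level(q):
    """Return the label of the strictest threshold that q passes, or None."""
    for cutoff, label in THRESHOLDS:
        if q < cutoff:
            return label
    return None


def associations(qvalues):
    """Keep only terms passing at least the 'medium' threshold.

    `qvalues` is a hypothetical {term: Q-value} dictionary for one tumour type.
    """
    result = {}
    for term, q in qvalues.items():
        label = significance_level(q)
        if label is not None:
            result[term] = label
    return result
```

Because the thresholds are nested, a term labelled "very significant" also satisfies the other two cut-offs; reporting only the strictest label avoids listing it three times.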