Article number: 321. Published online: 3 November 2009

Edgetic perturbation models of human inherited disorders

Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
Department of Genetics, Harvard Medical School, Boston, MA, USA
Centre de Biophysique Moleculaire Numerique, Faculte Universitaire des Sciences Agronomiques de Gembloux, Gembloux, Wallonia, Belgium

Received 5 August 2009; Accepted 2 October 2009; Published online 3 November 2009

This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation or the creation of derivative works without specific permission.

Global computational analyses of 50 000 known causative mutations in human Mendelian disorders revealed a clear separation between mutations probably corresponding to node removal and those probably corresponding to edgetic perturbations. Edgetic network perturbation models might improve both the understanding of dissemination of disease alleles in human populations and the development of molecular therapeutic strategies.

Consequently, genotype-to-phenotype relationships in human genetic disorders are often modeled as: mutation in gene X leads to loss of gene product X, which leads to disease A. A single gene-loss model seems pertinent for many diseases (Botstein and Risch, 2003).
However, this model cannot be fully reconciled with the increasingly appreciated prevalence of complex genotype-to-phenotype associations, even for simple Mendelian disorders (Goh et al, 2007), particularly where: (i) a single gene can be associated with multiple disorders (allelic heterogeneity), (ii) a single disorder can be caused by mutations in any one of several genes (locus heterogeneity), (iii) only a subset of individuals carrying a mutation are affected by the disease (incomplete penetrance), or (iv) not all individuals with a given mutation are affected equally (variable expressivity).

Genes and gene products function not in isolation but as components of complex networks of macromolecules (DNA, RNA, or proteins) and metabolites linked through biochemical or physical interactions. In interactome network models, the macromolecules and metabolites are represented as nodes and the interactions as edges. Cellular networks seem to exhibit systems properties underlying phenotypic variations (Goh et al, 2007). Here we propose network-perturbation models to explain molecular dysfunctions underlying human disease.

(A) Schematic illustration of pleiotropic phenotypic outcomes resulting from distinct network perturbations upon complete loss of gene product (node removal, blue box) versus perturbation of specific molecular interactions (edgetic perturbation, red box). Edges are generally biophysical interactions, but could also be biochemical interactions. (B) Schematic illustration of distinct truncating versus in-frame mutations causing distinct molecular defects in proteins, leading to node removal versus edgetic perturbation.

We divided all disease alleles into two subsets that probably cause different molecular defects in proteins. The first subset (truncating alleles) comprises all mutations that lead to the synthesis of truncated gene products, including nonsense mutations, out-of-frame insertions or deletions, and defective splicing.
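The subdivision into truncating and in-frame alleles can be illustrated with a minimal Python sketch. It assumes that every mutation not listed as truncating (for example missense changes and insertions or deletions whose length is a multiple of three) counts as in-frame; the `Mutation` record, its field names, and the mutation type labels are hypothetical and do not reflect the HGMD schema.

```python
from dataclasses import dataclass

# Mutation types that the text lists as yielding truncated gene products.
ALWAYS_TRUNCATING = {"nonsense", "splicing"}
INDELS = {"insertion", "deletion"}


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record; field names are illustrative only."""
    kind: str               # e.g. "missense", "nonsense", "splicing", "insertion", "deletion"
    length_change: int = 0  # net change in coding nucleotides (indels only)


def classify(mutation: Mutation) -> str:
    """Return "truncating" or "in-frame" for a single mutation."""
    if mutation.kind in ALWAYS_TRUNCATING:
        return "truncating"
    if mutation.kind in INDELS:
        # An indel preserves the reading frame only if its length is a multiple of 3.
        return "in-frame" if mutation.length_change % 3 == 0 else "truncating"
    # Assumption: everything else (e.g. missense) is treated as in-frame.
    return "in-frame"


def in_frame_fraction(mutations) -> float:
    """Fraction of in-frame alleles among the mutations of one gene or disease."""
    labels = [classify(m) for m in mutations]
    if not labels:
        raise ValueError("at least one mutation is required")
    return labels.count("in-frame") / len(labels)
```

The per-disease in-frame fraction computed this way is the kind of quantity compared between modes of inheritance in the analyses that follow.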
Our hypothesis is that truncating and in-frame alleles probably cause distinct molecular defects in proteins, and are thus enriched in node removal and edgetic perturbations, respectively. Although exceptions may apply, our hypothesis predicts that truncating and in-frame alleles may distribute differently among diseases involving node removal versus edgetic perturbations.

(A) Subdivision of truncating versus in-frame mutations in the Human Gene Mutation Database (HGMD) (Stenson et al, 2003). (C) Distribution of autosomal recessive and dominant disease with respect to the associated in-frame mutations. Each data point represents the fraction of autosomal recessive (blue bar) or autosomal dominant (red bar) traits that have a fraction of in-frame mutations no less than the value on the x-axis. (D) Average fraction of in-frame mutations associated with autosomal dominant disease in transcription factors and structural proteins.

With the exception of haploinsufficiency, many established molecular explanations for dominance entail production of a mutated protein that interferes in some way with the function of the product of the normal allele. Autosomal dominant disease should therefore be more frequently associated with edgetic perturbation than with node removal (Figure 2B). Among genes affected solely by in-frame mutations, the proportion of autosomal dominant diseases is 10-fold higher than that of autosomal recessive diseases (Figure 2C). Mutations in cytoskeleton proteins frequently cause dominant-negative effects, in which incorporation of expressed abnormal molecules into multimeric assemblies of structural proteins disrupts the integrity and function of the complex (Wilkie, 1994).
Consistent with this distinction, a significantly higher fraction of in-frame mutations was found for autosomal dominant Mendelian disorders associated with structural proteins than with transcription factors (Figure 2D). Distinct global distributions of truncating versus in-frame mutations among diseases with distinct modes of inheritance, and in proteins probably associated with distinct molecular mechanisms of dominance, support our hypothesis that truncating and in-frame alleles are probably enriched in node removal and edgetic perturbations, respectively.

Our approach includes (i) Gateway recombinational cloning of mutations by PCR-based site-directed mutagenesis (Suzuki et al, 2005), (ii) high-throughput mapping of binary protein-protein interactions (Rual et al, 2005), (iii) high-throughput characterization of protein-protein interaction defects of all cloned disease-causing mutant proteins, and (iv) integration of network perturbations by disease-causing mutations with structural or functional information of disease proteins.

We selected disease proteins that have: (i) multiple mutations annotated in HGMD (Stenson et al, 2003), (ii) wild-type clones available in our human ORFeome collection, hORFeome 3. Given these criteria, we could apply our allele-profiling platform to one autosomal recessive disease protein (CBS), and to three autosomal dominant disease proteins with likely dominant-negative (ACTG1), abnormal activation (CDK4), or haploinsufficiency (PRKAR1A) molecular defects (Figure 3A). We included one additional autosomal recessive disease protein (HGD) that meets all criteria except that no protein-protein interaction data were available (Figure 3A).
Profiling interaction defects of 29 alleles associated with five distinct genetic disorders revealed three classes of interaction-defective alleles (Supplementary information and Figure 3B): (i) five alleles that behaved as null, eliminating all interactions, (ii) 16 edgetic alleles that lost specific interaction(s) while retaining other interactions, and (iii) eight alleles that behaved as pseudo-wild-type, retaining all currently available protein-protein interactions tested here. We propose that many disease-causing alleles scoring as pseudo-wild-type in the assay described here might still be true edgetic alleles. Further analysis with additional physical and biochemical interactors using additional assays should eventually settle that question.

Grossly disruptive mutations tend to affect buried residues of the protein, whereas mutations leading to loss or gain of specific interaction(s) tend to lie on the surface. Edgetic perturbation of some disease alleles revealed diverse molecular mechanisms of protein dysfunction (Supplementary information).

(C) Schematic illustration of distinct positions of truncating mutations with respect to protein domains probably causing node removal versus edgetic perturbation. Fold enrichment higher than one means that Pfam domains contain more mutations than expected at random, whereas enrichment between zero and one means that Pfam domains are depleted in mutations.

Allele-specific perturbations observed in PRKAR1A (Supplementary Figure S6) indicate that interaction-specific perturbation by truncations is also possible. Although disease-causing truncating mutations seem to exhibit a random distribution with respect to Pfam domains (enrichment: 1. This finding is consistent with the hypothesis that different truncating mutations may cause node removal or edgetic perturbations, giving rise to disease with distinct modes of inheritance.
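The three allele classes described above (null, edgetic, and pseudo-wild-type) amount to a set comparison between the interactions of the wild-type protein and those retained by a mutant. The following Python sketch illustrates that logic only; the function name and partner labels are hypothetical, gained interactions are ignored, and it is not the scoring procedure used in the study.

```python
def classify_allele(wild_type_partners, retained_partners) -> str:
    """Classify a mutant allele by which wild-type interactions it retains.

    wild_type_partners: interactors detected for the wild-type protein.
    retained_partners:  interactors still detected for the mutant protein.
    """
    wild_type = set(wild_type_partners)
    if not wild_type:
        raise ValueError("no wild-type interactions to compare against")

    # Only interactions that the wild-type protein also shows are considered;
    # newly gained interactions are ignored in this sketch.
    retained = set(retained_partners) & wild_type

    if not retained:
        return "null"              # all interactions eliminated
    if retained == wild_type:
        return "pseudo-wild-type"  # all tested interactions retained
    return "edgetic"               # specific interaction(s) lost, others kept


if __name__ == "__main__":
    partners = {"P1", "P2", "P3"}  # hypothetical interactor labels
    print(classify_allele(partners, set()))
    print(classify_allele(partners, {"P1", "P3"}))
    print(classify_allele(partners, {"P1", "P2", "P3"}))
```

As the text notes, a pseudo-wild-type call depends on the set of interactors tested, so an allele in that class could be reclassified as edgetic once further partners are assayed.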
In agreement with distinct molecular mechanisms of dominance (Figure 2B), we found a depletion of autosomal dominant truncating mutations in Pfam domains for structural proteins against an enrichment for transcription factors (Figure 4D), probably associated with dominant-negative effects versus haploinsufficiency, respectively.

Node removal versus edgetic perturbation in complex gene-disease associations

The complex patterns of disease mutations noted so far indicate that a substantial fraction of causative alleles in human genetic disorders may cause edgetic perturbations rather than node removal. Among 278 disease pairs, each associated with a single one of these 142 genes, we found 88 pairs (30%) for which the proportion of in-frame versus truncating mutations is significantly different between the two diseases (P<0. A noteworthy example involves the four types (I, II, III, and IV) of osteogenesis imperfecta (OI), with COL1A1 in-frame mutations causing strikingly more severe phenotypes (in type II, III, or IV) than the truncating mutations involved in type I (Hamosh et al, 2005; Figure 5B).

Each dot represents the fraction of in-frame mutations of a pair of distinct diseases associated with a common gene.

This finding further supports our hypothesis that in-frame and truncating mutations probably cause distinct network perturbations, giving rise to disease with distinct modes of inheritance (Figure 2). Edgetic interaction profiles of CBS and PRKAR1A mutant proteins (Figure 3) revealed possible connections between allele-specific interaction defects and differential treatment responses or phenotypic severity among patients (Supplementary information). There were nine proteins with at least two Pfam domains significantly enriched with in-frame mutations (P<0.
A compelling example is TP63 (van Bokhoven and Brunner, 2002), in which two clinically distinct developmental disorders, ectrodactyly ectodermal dysplasia (EEC) and ankyloblepharon ectodermal dysplasia (AEC), are caused by mutations in two separate domains, one predicted to bind DNA and the other to mediate protein-protein interaction(s) (Figure 6B). With more detailed structural and biochemical information available, more such allele-specific edgetic phenotype-to-genotype correlations should be uncovered.

Although the node-centered gene knockout or knockdown approaches are convenient and useful in determining effects of gross disruption of proteins in model organisms, an edge-centered allele-profiling approach, as carried out here and elsewhere (Dreze et al, in press), dissects the dynamics and complexities of biological systems, in which different interactions may occur independently, and in which a single protein may carry out different functions with different partners or in different biological contexts. In addition, edgetic network perturbation models might improve our understanding of why and how disease alleles have disseminated in human populations.

First, the current interactome network derived from Y2H analysis is probably incomplete. Many biologically relevant interactors remain to be tested and many may not be recovered by Y2H alone or by any other single protein interaction assay (Braun et al, 2009; Venkatesan et al, 2009).

Materials and methods

Database annotation

The lists of genes and associated phenotypes were downloaded from the HGMD website (Stenson et al, 2003) (June 2006). The corresponding gene IDs were retrieved from Entrez Gene (Maglott et al, 2005) (June 2006). By manual annotation we linked phenotypes associated with each mutation, as annotated in HGMD, to the corresponding disease in the OMIM database (Hamosh et al, 2005).
Profiling interaction defects of mutant proteins

Disease mutant clones were generated by PCR mutagenesis essentially as described previously (Suzuki et al, 2005). Forward and reverse internal primers used are listed (Supplementary Table 4). To test against wild-type interactors, the DB-ORF and AD-ORF clones for CBS, HGD, ACTG1, CDK4 and PRKAR1A mutant proteins were transformed into MATα MaV203 or MATa MaV103 yeast strains, respectively. Each interaction pair was tested for growth on SC-His+3AT (synthetic medium without leucine, tryptophan and histidine, containing 20 mM 3-amino-1,2,4-triazole) plates to confirm GAL1::HIS3 transcriptional activity, on yeast extract-peptone-dextrose (YPD) medium to determine GAL1::lacZ transcriptional activity using a β-galactosidase filter assay, and on SC-Ura plates (synthetic medium without leucine, tryptophan and uracil) to determine SPAL10::URA3 transcriptional activity. Interactions that lose expression of one reporter but still show expression of the other two reporters are scored as R.

For immunoblotting, yeast cells with AD-ORF fusions were cultured overnight at 30°C in synthetic medium without tryptophan and then grown in YPD medium to mid-exponential phase. Whole cell lysates were cleared by centrifugation at 14 000 g. Resulting supernatants were separated on NuPAGE acrylamide gels (Invitrogen) and electrophoretically transferred onto a PVDF membrane (Invitrogen).

Removal of redundant structures was achieved using the PISCES server (Wang and Dunbrack, 2005) with the following criteria: X-ray structures only; no structure with Cα only; resolution cutoff 3 Å; R-factor cutoff 0.3; sequence length between 40 and 10 000 amino acids; and maximum 90% of sequence identity between similar PDB structures. The relative accessibility of over 91 000 residues in all 249 structures was calculated using PSAIA (Mihel et al, 2008).
Among them, a total of 10 904 truncating mutations are used for the analysis shown in Figure 4D, including 6212 associated with autosomal dominant diseases and 4692 associated with autosomal recessive diseases. Statistics were generated on the sum of a particular mutation type that either fell into or out of any Pfam-A domain in its respective protein versus the total fraction of the Pfam-A domain sequences in the protein sequence.

Transcription factors and structural proteins

Information on genes encoding transcription factors was obtained from Gene Ontology (Harris et al, 2004) annotations (948 genes with the GO term of transcription factor activity) and predictions in the transcription factor database (DNA Binding Domain, DBD; Wilson et al, 2008a; 1467 genes). Structural protein coding genes were retrieved from Gene Ontology annotations of cytoskeleton (992 genes). Among them, 72 genes with at least one mutation in HGMD were used for Pfam analysis (Figure 4D), and 47 genes with five mutations or more were used for analysis of in-frame mutations (Figure 2D).

Significance of the observed difference in the distributions of in-frame versus truncating mutations in autosomal dominant and autosomal recessive disease, the greater proportions of in-frame mutations in structural proteins than in transcription factors, as well as the greater accessibility of residues mutated in autosomal dominant versus autosomal recessive diseases, was evaluated using the non-parametric Mann-Whitney U test. All statistics were computed using the R package (http://www.

KV was supported by an NIH NRSA training grant fellowship (T32-CA09361)…
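The Pfam fold enrichment described in the methods above can be sketched as an observed-over-expected ratio. This minimal Python example assumes that the expected number of in-domain mutations for each protein is its mutation count multiplied by the fraction of its sequence covered by Pfam-A domains; that reading of the statistic, the function name, and the input layout are assumptions, and the example values are invented for illustration.

```python
def pfam_fold_enrichment(proteins) -> float:
    """Observed / expected number of mutations falling inside Pfam-A domains.

    proteins: iterable of (protein_length, domain_ranges, mutation_positions),
              where domain_ranges holds 1-based inclusive (start, end) pairs
              and mutation_positions holds 1-based residue positions.
    """
    observed = 0
    expected = 0.0
    for length, domain_ranges, positions in proteins:
        # Residues covered by at least one domain (overlaps counted once).
        covered = set()
        for start, end in domain_ranges:
            covered.update(range(start, end + 1))

        observed += sum(1 for pos in positions if pos in covered)
        # Assumption: under a random model, mutations land in domains in
        # proportion to the fraction of the sequence the domains cover.
        expected += len(positions) * len(covered) / length

    if expected == 0:
        raise ValueError("no domain coverage or no mutations; enrichment undefined")
    return observed / expected


if __name__ == "__main__":
    # Invented example: two proteins with one Pfam-A domain each.
    example = [
        (100, [(1, 50)], [10, 20, 30, 80]),
        (200, [(101, 150)], [5, 120]),
    ]
    print(pfam_fold_enrichment(example))
```

A value above one indicates more in-domain mutations than the random model predicts, and a value between zero and one indicates depletion, matching the interpretation given for the fold-enrichment figure.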







