Gene superfamilies

Conotoxins, the disulfide-rich conopeptides, are classified according to three schemes: the similarity between the ER signal sequences of the conotoxin precursors (gene superfamilies), the cysteine patterns of the conotoxin mature peptide regions (cysteine frameworks), and the specificity for pharmacological targets (pharmacological families). This page gives a brief introduction to the gene superfamilies and lists those used in ConoServer. The two other classification schemes are detailed on separate pages accessible from the menu on the left. A more comprehensive discussion of the conopeptide classification schemes can be found in Kaas et al. Toxicon 2010 [1].

Conopeptides are expressed as precursor proteins, which are processed into mature peptide toxins in the endoplasmic reticulum (ER) and in the Golgi apparatus. The classical organisation of a conopeptide precursor is shown in Figure 1. During the maturation process, the ER signal sequence and then the N- and C-terminal pro-regions are cleaved and some amino acids can be post-translationally modified (see amino acid post-translational modifications).

Figure 1: Conopeptide protein precursor organization. The organization is exemplified using the sequence of the SmIVA precursor (P00021).

The sequence regions of the conopeptide precursors (Figure 1) have been shown to evolve at different rates [2]. The sequence of the mature peptide region is highly diverse, in keeping with the great variety of conopeptides, while the ER signal sequence is more conserved. Comparison of conopeptide ER signal sequences has made it possible to define several groups, the gene superfamilies, whose members share higher sequence similarity. Figure 2 shows a clustering analysis of the ER signal sequences in ConoServer together with the identification of the superfamilies. This analysis shows that, with a cut-off of 35% sequence identity, most of the superfamilies are well defined. The only exception is the sole member of the Y-superfamily, which shares around 40% identity with some members of the M-superfamily.

Figure 2: Clustering analysis of ER signal sequences found in ConoServer and identification of gene superfamilies. The percentage of identity between ER signal sequences was measured on a global alignment performed using ClustalW. The dissimilarity matrix was then submitted to the hierarchical clustering algorithm hclust implemented in the statistical program R. The gene superfamilies used in ConoServer are highlighted in different colors. The definitions of the gene superfamilies are provided in Table 1. Clicking on the graphic gives access to a high-resolution picture (986 Kb) on which the ConoServer protein identifier of each signal sequence is visible. This analysis was carried out with the data available in ConoServer on 25/08/2010 and is not automatically updated.
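
The procedure described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for ConoServer: it assumes the sequences are already globally aligned (the original analysis used ClustalW for that step), it uses complete linkage because that is the default method of hclust in R (the linkage actually used is not stated), and the four toy sequences and the function names percent_identity and cluster_by_identity are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percentage of identical columns between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns in which both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def cluster_by_identity(seqs, min_identity=35.0):
    """Complete-linkage agglomerative clustering, stopped at the cut-off.

    seqs maps a name to an aligned sequence. Clusters are merged as long as
    the two closest clusters are no further apart than 100 - min_identity.
    """
    max_dissimilarity = 100.0 - min_identity
    # Dissimilarity matrix: 100 minus the percentage of identity.
    dist = {
        frozenset((p, q)): 100.0 - percent_identity(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }
    clusters = [{name} for name in seqs]

    def linkage(c1, c2):
        # Complete linkage (assumed): the largest pairwise dissimilarity.
        return max(dist[frozenset((p, q))] for p in c1 for q in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_dissimilarity:
            break  # the closest pair is below the identity cut-off
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical, pre-aligned toy sequences (not taken from ConoServer).
toy_signals = {
    "toy1": "MKLTCVVIVA",
    "toy2": "MKLTCVLIVA",
    "toy3": "MRCLPVFVIL",
    "toy4": "MRCLPVFIIL",
}
clusters = cluster_by_identity(toy_signals, min_identity=35.0)
```

With these toy inputs, toy1 and toy2 end up in one group and toy3 and toy4 in another, because identity within each pair is 90% while identity between the pairs stays at or below 30%. A real analysis would start from a proper multiple alignment of full ER signal sequences.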

Table 1 provides the definitions of the 26 published gene superfamilies that are used in ConoServer. The relationships between the gene superfamilies and the other classification schemes, the cysteine frameworks and the pharmacological families, are complex. Up-to-date statistics on those relationships can be found in the ConoServer statistics pages and in Kaas et al. Toxicon 2010 [1]. Recently the gene superfamily classification was extended to the disulfide-poor conopeptides [3], and the gene superfamilies B and C have been introduced in ConoServer.

Table 1: Gene superfamilies with published references used in ConoServer. This table is automatically generated and therefore kept up-to-date with the content of ConoServer. The second column shows the mature peptide cysteine frameworks found in each gene superfamily. The third column gives the number of protein precursors for each gene superfamily (clicking on the number gives access to the list of those protein precursors). Some temporary names have been created in ConoServer to describe groups of conopeptide ER signal sequences whose names have not yet been published. Those temporary names are described in Table 2.
Gene superfamily | Cysteine frameworks | # protein precursors | Reference
A | I, II, IV, VI/VII, XIV, XXII | 275 | Santos,A.D. et al. (2004) J. Biol. Chem. 279:17596-17606
B1 |  | 18 | Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309
B2 | VIII | 2 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329
B3 | XXIV | 1 | Luo,S. et al. (2013) PLoS ONE 8
C |  | 4 | Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309
D | XX | 28 | Loughnan,M.L. et al. (2009) Biochemistry 48:3717-3729
E | XXII | 1 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329
F |  | 2 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329
G | XIII | 1 | Aguilar,M.B. et al. (2013) Peptides [ahead of print]
H | VI/VII | 10 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329
I1 | VI/VII, XI | 26 | Jimenez,E.C. et al. (2003) J. Neurochem. 85:610-621
I2 | XI, XII, XIV | 62 | Buczek,O. et al. (2005) FEBS J. 272:4178-4188
I3 | VI/VII, XI | 9 | Yuan,D.D. et al. (2009) Peptides 30:861-865
J | XIV | 30 | Imperial,J.S. et al. (2006) Biochemistry 45:8331-8340
K | XXIII | 4 | Ye,M. et al. (2012) J Biol Chem 287:14973-14983
L | XIV | 14 | Peng,C. et al. (2006) Peptides 27:2174-2181
M | I, II, III, IV, VI/VII, IX, XIV, XVI | 443 | Corpuz,G.P. et al. (2005) Biochemistry 44:8176-8186
N | XV | 4 | Dutertre,S. et al. (2013) Mol. Cell Proteomics 12:312-329
O1 | I, VI/VII, IX, XII, XIV, XVI | 574 | McIntosh,J.M. et al. (1995) J. Biol. Chem. 270:16796-16802
O2 | VI/VII, XIV, XV | 133 | Zhangsun et al. (2006) Chem Biol Drug Des. 68:256-265
O3 | VI/VII | 43 | Zhangsun et al. (2006) Chem Biol Drug Des. 68:256-265
P | IX, XIV | 12 | Lirazan,M.B. et al. (2000) Biochemistry 39:1583-1588
S | VIII | 21 | Liu,L. et al. (2008) Toxicon 51:1331-1337
T | I, V, X, XVI | 234 | Walker,C.S. et al. (1999) J. Biol. Chem. 274:30664-30671
V | XV | 2 | Peng,C. et al. (2008) Peptides 29:985-991
Y | XVII | 1 | Yuan,D.D. et al. (2008) Peptides 29:1521-1525

Phylogenetic analyses have classified cone snails into different groups, or clades, according to the homology of their 16S RNA sequence [4]. One clade, named "Early", is highly divergent from the others. In a recent study [5], a number of conopeptide precursors were sequenced from Conus californicus, a member of the Early clade, and those conopeptides do not correspond to any previously identified superfamily. Table 2 provides the temporary names that have been introduced in ConoServer to designate those superfamilies. The clustering analysis shown in Figure 2 indicates that those new superfamilies are distinct. The names of those superfamilies are only temporary and are likely to change once a definitive nomenclature is published in a peer-reviewed journal.

Table 2: Temporary gene superfamily names introduced in ConoServer to designate superfamilies recently identified in the early divergent clade species. This table is automatically generated and therefore kept up-to-date with the content of ConoServer. The conopeptide precursor sequences associated with those superfamilies have mainly been identified in the study of Biggs et al. 2010 [5]. The second column shows the mature peptide cysteine frameworks found in each gene superfamily. The third column gives the number of protein precursors for each gene superfamily (clicking on the number gives access to the list of those protein precursors).
Gene superfamily | Cysteine frameworks | # protein precursors
Divergent M---L-LTVA | VI/VII, IX, XIV | 9
Divergent MKFPLLFISL | VI/VII | 1
Divergent MKLCVVIVLL | XIV | 3
Divergent MKLLLTLLLG |  | 2
Divergent MKVAVVLLVS | XIV | 1
Divergent MRCLSIFVLL | XVI | 2
Divergent MRFLHFLIVA | VI/VII | 1
Divergent MRFYIGLMAA | I, V | 3
Divergent MSKLVILAVL | IX | 1
Divergent MSTLGMTLL- | IX, XIX, XXII | 6
Divergent MTAKATLLVL | XIV | 1
Divergent MTFLLLLVSV | IX | 1
Divergent MTLTFLLVVA | VI/VII | 1

[1] Kaas,Q. et al. (2010) Toxicon 55:1491-1509
[2] Woodward,S.R. et al. (1990) EMBO J. 9:1015-1020
[3] Puillandre,N. et al. (2012) J. Mol. Evol. 74:297-309
[4] Espiritu,D.J. et al. (2001) Toxicon 39:1899-1916
[5] Biggs,J.S. et al. (2010) Mol. Phylogenet. Evol. 56:1-12