Volume 121, Issue 4, Pages 847-852.e7, April 2008
Allergens are distributed into few protein families and possess a restricted number of biochemical functions
Background
Existing allergen databases classify their entries by source and route of exposure, thus lacking an evolutionary, structural, and functional classification of allergens.
Objective
We sought to build AllFam, a database of allergen families, and use it to extract common structural and functional properties of allergens.
Methods
Allergen data from the Allergome database and protein family definitions from the Pfam database were merged into AllFam, a database that is freely accessible on the Internet at http://www.meduniwien.ac.at/allergens/allfam/. A structural classification of allergens was established by matching Pfam families with families from the Structural Classification of Proteins database. Biochemical functions of allergens were extracted from the Gene Ontology Annotation database.
Results
Seven hundred seven allergens were classified by sequence into 134 AllFam families containing 184 Pfam domains (2% of 9318 Pfam families). A random set of 707 sequences with the same taxonomic distribution contained a significantly higher number of different Pfam domains (479 ± 17). Classifying allergens by structure revealed that 5% of 3012 Structural Classification of Proteins families contained allergens. The biochemical functions of allergens most frequently found were limited to hydrolysis of proteins, polysaccharides, and lipids; binding of metal ions and lipids; storage; and cytoskeleton association.
Conclusion
The small number of protein families that contain allergens and the narrow functional distribution of most allergens confirm the existence of yet unknown factors that render proteins allergenic.
Key words: Allergens, protein families, allergen structures, allergen databases
Abbreviations used: GO, Gene Ontology; nsLTP, Nonspecific lipid transfer protein; SCOP, Structural Classification of Proteins; TIM, Triosephosphate isomerase; UniProt, Universal Protein Resource
Since the identification and cloning of the first allergenic proteins in the late 1980s, hundreds of allergens have been identified and their sequences determined. A number of databases that provide molecular, biochemical, and clinical data of allergens were established, such as the Official List of Allergens issued by the International Union of Immunological Societies Allergen Nomenclature Sub-committee (http://www.allergen.org),1 the Allergome (http://www.allergome.org),2 the Food Allergy Research and Resource Program Allergen Database (http://www.allergenonline.com), and the InFormAll database (http://foodallergens.ifr.ac.uk/).
The growing number of available allergen sequences together with the advancements of bioinformatics tools and methods enabled scientists to shed light on evolutionary and structural relationships between allergens from different sources.3 In particular, protein family databases that are linked to protein sequence databases, such as the Pfam database,4 provided the basis of a novel classification of allergens. Several studies revealed that most allergens can be found in a limited number of protein families.5, 6, 7, 8, 9, 10
Records of most allergen databases are organized by type of allergen source and route of exposure. Likewise, allergen designations according to the official allergen nomenclature are derived from the scientific name of the allergen source species and a sequential number that in most cases does not reflect evolutionary relationships between allergens. To bring together allergen data stored in allergen databases and evolutionary and structural relationships between allergens established from protein family databases, we constructed AllFam, a database of allergen families. In the present study we used data extracted from AllFam to establish the protein family distribution of allergens and to elucidate common structural and biochemical features of allergens, thus shifting the focus from single allergens or allergen families to a systematic analysis of the complete range of known allergens.
Methods
Construction of the AllFam database
Data on allergens with known sequences (name, source, routes of exposure, and Universal Protein Resource [UniProt] accession numbers) were downloaded from Allergome,2 a database based on allergen data published in peer-reviewed journals. Data on routes of exposure from Allergome were merged into the following standardized categories: inhalation, ingestion, sting/bite, contact, iatrogenic, and autoallergen.
UniProt accession numbers were compared with SwissPfam, a database of precomputed protein domain architectures generated by comparing all entries of the UniProt protein sequence database with the Pfam database (version 22.0; July 2007; http://pfam.sanger.ac.uk).4 For entries that yielded no results, sequences were downloaded and compared with Pfam protein family definitions by using the hmmpfam program from the HMMER 2.3 package (http://hmmer.wustl.edu). The hmmpfam program compares a query sequence with all Pfam protein family definitions, which are stored as hidden Markov models. These models are probabilistic descriptions generated from multiple sequence alignments that yield, for each alignment position, the probabilities of occurrence of all amino acids, as well as of insertions and deletions.
Domain architectures of allergens were translated into AllFam allergen families by using the following criteria. For single-domain proteins, each Pfam family corresponded to an AllFam family. To avoid an artificially high number of allergen families because of counting domains of multidomain proteins as separate families, Pfam domains constituting multidomain proteins were merged into single AllFam families if the constituting domains exclusively occurred in members of a single protein family. Otherwise, each domain was treated as a separate AllFam family.
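The merging criterion can be read in more than one way. The following minimal sketch implements one possible reading and is not the procedure actually used to build AllFam: a multidomain protein is merged into a single family when it carries an "anchor" domain such that every other domain of the protein occurs only in proteins that also carry the anchor. The function name, the anchor concept, and the toy protein identifiers are assumptions made for illustration; the Pfam accessions are taken from Table E1.

```python
from collections import defaultdict

def assign_families(architectures):
    """Assign family labels under one hypothetical reading of the merge rule.

    architectures: dict mapping a protein id to the set of its Pfam domain ids.
    Returns a dict mapping each protein id to a set of family labels
    (one label if the protein was merged, one per domain otherwise).
    """
    # Index: which proteins carry each domain?
    proteins_with = defaultdict(set)
    for protein, domains in architectures.items():
        for domain in domains:
            proteins_with[domain].add(protein)

    families = {}
    for protein, domains in architectures.items():
        # An anchor is a domain whose carriers include every carrier of
        # every other domain in this protein (assumption, see lead-in).
        anchors = [
            a for a in sorted(domains)
            if all(proteins_with[d] <= proteins_with[a] for d in domains)
        ]
        if anchors:
            families[protein] = {anchors[0]}   # merged into a single family
        else:
            families[protein] = set(domains)   # each domain is its own family
    return families

# Toy architectures (protein ids are hypothetical).
toy = {
    "p1": {"PF00089"},                 # trypsin domain only
    "p2": {"PF00051", "PF00089"},      # kringle + trypsin
    "p3": {"PF02798"},                 # GST N-terminal domain only
    "p4": {"PF00043"},                 # GST C-terminal domain only
    "p5": {"PF02798", "PF00043"},      # both GST domains
}
result = assign_families(toy)
```

Under this reading, the kringle domain is absorbed into the trypsin family because it never occurs without the trypsin domain, whereas the two glutathione S-transferase domains stay separate because each also occurs on its own.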
The AllFam database is freely accessible at http://www.meduniwien.ac.at/allergens/allfam/. It can be queried for lists of allergen families filtered by source and route of exposure. In addition, for each family, the database contains a list of allergens and an allergen family fact sheet with information on biochemical properties and the allergologic significance of its allergenic members. AllFam is cross-linked with the Allergome database and regularly updated.
Protein family distribution of a random set of sequences
Random entries were downloaded from the UniProt database (http://www.expasy.org/uniprot/) and parsed for taxonomic group and Pfam domains. The procedure was repeated until the number of sequences that contained Pfam annotations from plants, animals, fungi, and bacteria reached the numbers of allergens with known protein family memberships in these kingdoms. The number of different Pfam domains found in these sequences was counted. Twenty independent runs of the program were performed. Significance of the difference between the number of protein families found among allergens and among random sequences was tested by using the 1-sample t test.
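The sampling procedure described above can be sketched as follows. This is an illustrative approximation, not the program used in the study: the in-memory `pool` stands in for randomly downloaded UniProt entries, sampling is done without replacement (the text does not say which was used), and all names are hypothetical.

```python
import random

def sample_distinct_domains(pool, quotas, rng):
    """Draw random entries until every kingdom quota is met, then count
    the distinct Pfam domains seen.

    pool:   list of (kingdom, set_of_pfam_ids) tuples (toy stand-in for UniProt).
    quotas: dict mapping kingdom -> required number of Pfam-annotated sequences.
    rng:    a random.Random instance, so that runs are reproducible.
    """
    remaining = dict(quotas)
    seen_domains = set()
    candidates = list(pool)
    rng.shuffle(candidates)
    for kingdom, domains in candidates:
        # Skip entries without Pfam annotation or from a kingdom whose quota is full.
        if not domains or remaining.get(kingdom, 0) == 0:
            continue
        remaining[kingdom] -= 1
        seen_domains |= domains
        if not any(remaining.values()):
            break
    if any(remaining.values()):
        raise ValueError("pool too small to satisfy the quotas")
    return len(seen_domains)

# Toy pool: every annotated entry carries one unique (made-up) domain id.
toy_pool = (
    [("plant", {f"PFP{i}"}) for i in range(50)]
    + [("animal", {f"PFA{i}"}) for i in range(50)]
    + [("plant", set())]  # an entry without Pfam annotation is ignored
)
toy_quotas = {"plant": 3, "animal": 2}
counts = [sample_distinct_domains(toy_pool, toy_quotas, random.Random(seed))
          for seed in range(20)]  # 20 independent runs, as in the study
```

With real data, the list `counts` would supply the mean and SD that are compared with the number of Pfam domains found among allergens.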
Structural and functional classification of allergens
Structures of allergens and allergen homologues were classified by using the Structural Classification of Proteins (SCOP) database (release 1.71, October 2006; http://scop.mrc-lmb.cam.ac.uk/scop/).11 AllFam families and SCOP families were matched by using the links to SCOP embedded in the Pfam database.
For a functional classification of allergens using standardized descriptions of biologic functions, all UniProt accession numbers of allergen sequences in AllFam were compared with the Gene Ontology (GO) Annotation Database (http://www.ebi.ac.uk/GOA/).
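A tally of this kind can be sketched in a few lines. The example below is a simplification under stated assumptions: the annotation mapping and accession labels are invented, each allergen is counted once per term, and annotations are not propagated to parent terms in the GO hierarchy, which a full analysis might need to do.

```python
from collections import Counter

def top_go_terms(annotations, n=15):
    """Return the n GO terms annotated to the most allergens.

    annotations: dict mapping an allergen accession to an iterable of GO term names.
    """
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # count each allergen at most once per term
    return counts.most_common(n)

# Toy annotations (accessions are hypothetical).
toy_annotations = {
    "A1": ["hydrolase activity", "extracellular region"],
    "A2": ["hydrolase activity", "hydrolase activity"],  # duplicate is collapsed
    "A3": ["hydrolase activity", "extracellular region"],
    "A4": ["transport"],
}
top_two = top_go_terms(toy_annotations, n=2)
```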
Sequence conservation within families of allergens
Sequences of representative allergens from the 4 most important families of allergens (prolamins, profilins, tropomyosins, and the EF-hand family) were aligned by using ClustalX 1.83.12 Sequence identity matrices and neighbor-joining phylogenetic trees were generated from these alignments with ClustalX and visualized with TreeView 1.6.6.13
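To make the notion of a sequence identity matrix concrete, the sketch below computes pairwise percent identity from sequences that are already aligned. It assumes one common convention, namely that columns with a gap in either sequence are ignored; ClustalX may use a different denominator, so the values are illustrative only. The toy sequences and names are invented.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns in which either sequence has a gap are excluded (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

def identity_matrix(aligned):
    """Build a dict keyed by (name, name) with pairwise percent identities."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}

# Toy alignment (sequences are made up).
toy_alignment = {"seq1": "ACDE-", "seq2": "ACDFG", "seq3": "ACDFG"}
matrix = identity_matrix(toy_alignment)
```

In this toy alignment, `seq1` and `seq2` share 3 of 4 comparable positions (75%), and `seq2` and `seq3` are identical.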
Results
Protein family distribution of allergens
The AllFam database (version of July 18, 2007) contained 847 allergens with known partial or total sequences (Table I). Of these, 707 allergens were classified into 134 AllFam families that contained 184 different Pfam domains. Thus allergens were found in only 2% of all 9318 families in the Pfam database. The list of AllFam families and associated Pfam families can be found in Table E1 in the Online Repository (available at www.jacionline.org). The distribution of allergens was highly biased toward a few protein families. Although the protein family with the highest number of allergens, the prolamin superfamily, contained 59 allergens (8% of all allergens with known protein family) and the 10 most abundant families contained 300 allergens (42%; Fig 1, A and B), there were 53 families that contained only a single allergen.
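The proportions quoted in this paragraph follow directly from the counts given in the text and in Table I, as this short check shows (the helper name is ours):

```python
def share(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)

pfam_share = share(184, 9318)        # Pfam domains that occur in allergens -> 2
top_family_share = share(59, 707)    # prolamin superfamily -> 8
top10_share = share(300, 707)        # 10 most abundant families -> 42
single_allergen_families = 134 - 81  # families with exactly 1 allergen -> 53
```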
Table I. Numbers of sequences and protein families of allergens in AllFam
| | Sequences | Sequences from known protein families | AllFam families | AllFam families with >1 allergen |
|---|---|---|---|---|
| All allergens | 847 | 707 | 134 | 81 |
| Sources | | | | |
| | 369 | 338 | 58 | 34 |
| | 305 | 268 | 60 | 36 |
| | 163 | 91 | 37 | 16 |
| | 10 | 10 | 5 | 1 |
| Routes of exposure | | | | |
| | 479 | 377 | 99 | 59 |
| | 257 | 240 | 48 | 29 |
| | 66 | 52 | 14 | 7 |
| | 58 | 50 | 35 | 10 |
| | 14 | 14 | 14 | 0 |
| | 11 | 10 | 7 | 2 |

Fig 1.
The 15 protein families with the highest number of allergens classified by source (A) and route of exposure (B). Numbers in Fig 1, B, differ because of multiple or missing routes of exposure for some allergens. C, Protein family distribution of randomly selected sequences. ox., Oxidase; oxred., oxidoreductase; PD, periplasmic domain; RuBisCo, ribulose-1,5-bisphosphate carboxylase/oxidase; TD, transmembrane domain; term., terminal; NADH, nicotinamide adenine dinucleotide.
Thirty-eight allergen families were grouped by structural similarity or common sequence motifs into 12 superfamilies (termed clans in the Pfam database; see Table E2 in the Online Repository at www.jacionline.org). The most important clan, which comprised 7 allergen families, was the triosephosphate isomerase (TIM) barrel glycosyl hydrolase superfamily that contained main allergen families from mites (chitinases from the glycoside hydrolase family 18), plants and fungi (α-amylases, β1,3-glucanases), and insect venoms (hyaluronidases).
Distribution of protein families among allergens of different sources and routes of exposure
Fig 1, A and B, shows the 15 most important families of allergens itemized by source and route of exposure. Most allergen families were confined to a single source kingdom, such as prolamins, profilins, and cupins from plants and tropomyosins, lipocalins, and caseins from animals. A minority of protein families, such as the EF-hand family and the pathogenesis-related proteins (PR-1), contained allergens from multiple kingdoms. A grouping of allergens by route of exposure yielded a different picture. Most protein families contained allergens that sensitize human subjects through different routes. Among these are allergens responsible for cross-reactivity between inhalative allergen sources and foods, such as profilins, Bet v 1–related allergens, and tropomyosins.
Protein family distribution of randomly selected sequences
A comparison of the protein family distribution of allergens with the distribution of random UniProt entries confirmed that the number of protein families among allergens was much smaller than expected from a random sample. A random selection of 707 sequences with the same proportions of plant, animal, fungal, and bacterial sequences as among allergens contained an average of 479 different Pfam domains (SD, 17), a number that was significantly higher (P < .001) than the 184 different Pfam domains among allergens.
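The size of this difference can be illustrated by recomputing the 1-sample t statistic from the reported summary values. The sketch assumes that the SD of 17 is the sample standard deviation over the 20 runs described in the Methods and that 184 is treated as the fixed reference value; the function name is ours.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom of a 1-sample t test from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Reported values: mean 479, SD 17, 20 runs; 184 Pfam domains among allergens.
t_stat, dof = one_sample_t(mean=479, sd=17, n=20, mu0=184)
```

Under these assumptions the statistic is about 77.6 with 19 degrees of freedom, far beyond any conventional significance threshold, which is consistent with the reported P < .001.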
The most abundant protein families found among these random sequences differed largely from those determined for allergens. A representative result is shown in Fig 1, C. Although the protein family distributions of different sequence sets showed considerable differences concerning minor protein families, the 15 most abundant protein families were largely identical for all runs. The only allergen-containing protein family in the top 15 was the protein kinase family (sixth rank; Fig 1, C), which contained a single allergen.
Structural classification of allergens
A structural classification of allergens whose 3-dimensional structures have been experimentally determined or inferred from sequence similarity showed a restricted distribution similar to the distribution of allergens into sequence-based Pfam families (Table II). Allergens were found in all structural classes, as defined by SCOP. However, all members of protein families that contained allergens could be grouped into only 138 structural families (5% of all families in the SCOP database).
Table II. Structural classes of all protein families that contain allergens
| SCOP class | All structures in SCOP: folds | All structures in SCOP: superfamilies | All structures in SCOP: families | Allergens and allergen homologs∗: folds | Allergens and allergen homologs∗: superfamilies | Allergens and allergen homologs∗: families |
|---|---|---|---|---|---|---|
| a: All α proteins | 226 | 392 | 645 | 19 (8%) | 20 (5%) | 25 (4%) |
| b: All β proteins | 149 | 300 | 549 | 22 (15%) | 24 (8%) | 36 (7%) |
| c: α and β proteins (a/b) | 134 | 221 | 661 | 14 (10%) | 18 (8%) | 29 (4%) |
| d: α and β proteins (a+b) | 286 | 424 | 753 | 28 (10%) | 29 (7%) | 31 (4%) |
| e: Multidomain proteins | 48 | 48 | 64 | 2 (4%) | 2 (4%) | 2 (3%) |
| f: Membrane and cell-surface proteins | 49 | 90 | 101 | 2 (4%) | 2 (2%) | 2 (2%) |
| g: Small proteins | 79 | 114 | 186 | 8 (10%) | 9 (8%) | 9 (5%) |
| h: Coiled coil proteins | 7 | 50 | 53 | 2 (29%) | 4 (8%) | 4 (8%) |
| Totals | 978 | 1639 | 3012 | 97 (10%) | 108 (7%) | 138 (5%) |
∗Percentage values are given relative to all folds, superfamilies, and families in SCOP, respectively.
A comparison of the numbers of structural families, superfamilies, and folds that contain allergens showed that structural allergen families did not cluster in certain folds. All 3012 families in the SCOP database were grouped into 1639 superfamilies and 978 folds, whereas the 138 structural families that contained allergens were grouped into 108 superfamilies and 97 folds (Table II). Twenty-one folds contained more than 1 allergen family (see Table E3 in the Online Repository at www.jacionline.org). The folds that contained the greatest numbers of allergen families were the TIM β/α-barrel fold (SCOP accession no. c.1), with 9 allergen families, and the concanavalin-like lectins/glucanases fold (SCOP accession no. b.29), with 7 allergen families.
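One simple way to look at the clustering question is to compare the average number of families per fold and per superfamily, using the totals from Table II. This is only a descriptive ratio computed by us, not an analysis from the study, and a small subset of families would be expected to spread over many folds by chance alone.

```python
# Totals from Table II.
scop_all = {"folds": 978, "superfamilies": 1639, "families": 3012}
scop_allergen = {"folds": 97, "superfamilies": 108, "families": 138}

def families_per(group, counts):
    """Average number of families per fold or per superfamily."""
    return counts["families"] / counts[group]

all_per_fold = families_per("folds", scop_all)                  # about 3.08
allergen_per_fold = families_per("folds", scop_allergen)        # about 1.42
all_per_superfamily = families_per("superfamilies", scop_all)   # about 1.84
allergen_per_superfamily = families_per("superfamilies", scop_allergen)  # about 1.28
```

Allergen-containing families average fewer than 1.5 families per fold, which matches the observation that most folds contain at most a single allergen family.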
Functional classification of allergens
The standardized, hierarchically organized terms of the GO database were used to determine the biologic functions most frequently found among allergens (Table III). Of the 847 allergens listed in AllFam, 644 contained GO annotations distributed among 351 different GO terms.
Table III. The 15 GO terms associated with the highest number of allergens in AllFam
| GO term | Allergens |
|---|---|
| Molecular function | |
| | 119 |
| | 56 |
| | 73 |
| | 56 |
| | 55 |
| | 53 |
| | 48 |
| Biologic process | |
| | 105 |
| | 45 |
| | 58 |
| | 45 |
| | 45 |
| Cellular component | |
| | 109 |
| | 76 |
| | 45 |
One sixth of all allergens in AllFam (119 allergens) were inferred to possess hydrolase activity. Half of them (58 allergens) were proteases, such as trypsin-like and subtilisin-like serine proteases (14 and 13 allergens, respectively) and papain-like cysteine proteases (10 allergens). Other hydrolytic enzymes included polygalacturonases (8 allergens), lipases (8 allergens), and ribosome-inactivating proteins (8 allergens).
Many allergens bound metal ions. These included calcium-binding allergens from the EF-hand family (32 allergens), serum albumins (12 allergens), globins (9 allergens), enolases (9 allergens), and Fe/Mn superoxide dismutases (7 allergens). Allergens with lipid-binding activity comprised nonspecific lipid transfer proteins (nsLTPs) from the prolamin superfamily (28 allergens), serum albumins (12 allergens), and lipocalins (9 allergens). Although not annotated in the GO database, lipid-binding activity was shown for allergens from several other families, such as Bet v 1–related allergens that bind plant steroids.14
The nonmetabolic biologic process associated with the greatest number of allergens was transport. This group of allergens comprised lipid-binding proteins, such as the nsLTPs (28 allergens) and lipocalins (21 allergens), as well as general carrier proteins, such as serum albumins (12 allergens) and caseins (12 allergens). Many allergens from the cupin and prolamin superfamilies (26 and 22 allergens, respectively) were annotated as nutrient reservoirs.
The GO terms from the category “cellular component” most frequently found in allergen sequence annotations were the general terms “extracellular region” and “cytoplasm.” In addition, 45 allergens (44 profilins and a single tropomyosin) were described as associated with the cytoskeleton.
Of the 203 allergens without GO annotations, 112 were not assigned to a protein family, in most cases because their sequences were too short. The remaining 91 sequences were grouped into 17 AllFam families, with tropomyosins (34 allergens), group 2 mite allergens (10 allergens), thaumatin-like proteins (9 allergens), Ole e 1–related proteins (9 allergens), and pectate lyases (9 allergens) as the prevailing families.
Sequence conservation within families of allergens
Fig 2 shows phylogenetic trees representing the degree of sequence conservation within 3 of the 4 most important protein families of allergens. The extent of sequence conservation among members of these families showed considerable differences. nsLTPs from the prolamin superfamily (Fig 2, A) showed a moderate degree of sequence identity between allergens from different plant families (25% to 67%) and considerable sequence conservation only among proteins from botanically related species (at least 69% sequence identity among nsLTPs from Rosaceae fruits). In contrast, sequence identities between 2S albumin allergens from different plant families were generally low (18% to 39%; Fig 2, A). 2S albumins shared only 7% to 25% of their sequences with nsLTPs.

Fig 2.
Sequence conservation among homologous allergens. Amino acid sequences of allergens from the prolamin superfamily (A), the tropomyosin family (B), and the EF-hand superfamily (C) were aligned, and neighbor-joining phylogenetic trees were generated. Percentage sequence identities to reference allergens (bold) are encoded by gray shades.
Sequences of allergens from the tropomyosin family (Fig 2, B) were well conserved, even beyond phylum boundaries, with identities of at least 50%. Members of the plant profilin family showed even higher sequence identities of greater than 70% (data not shown).
The EF-hand superfamily contained 2 important families of allergens, β-parvalbumins (major fish allergens) and polcalcins (ubiquitous pollen allergens; Fig 2, C). β-Parvalbumins from fish were well conserved, with at least 53% sequence identity between homologues from unrelated fish species. Polcalcins from different plant families showed sequence identities of at least 67%. They were related to a group of pollen allergens with 4 instead of 2 EF-hand domains that showed 24% to 57% sequence identity with polcalcins (Fig 2, C). Polcalcins and parvalbumins showed only low degrees of sequence identity between 10% and 27%.
Discussion
The identification of a large number of allergens from diverse sources has triggered the search for common properties of allergens. The discovery of such features would be a step toward the prediction of allergenicity from protein sequence, structure, or function, a procedure that is essential for risk assessment of novel foods. Knowledge of features that make proteins allergenic would also shed light on the mechanism of the initiation of an allergic immune response, thus paving the way for novel therapeutic concepts.
There is an ongoing discussion on whether common properties of allergens exist. One view claims that any protein that comes into contact with the immune system of an atopic individual in sufficient amounts and in the appropriate context can elicit an allergic immune response.15 Two results established in our study, however, support the view that allergens possess special features and not every protein can become allergenic: (1) the small number of protein families in which allergens were found and (2) the frequent occurrence of certain biochemical functions among allergens.
Allergens were found in only 2% of all sequence-based and 5% of all structural protein families (Table I, Table II). A restricted distribution of allergens was previously found for food allergens from plants8 and animals10 and for pollen allergens9 and is extended in this study to all allergens, irrespective of their source and route of exposure. The number of Pfam families that contain allergens was significantly smaller than the number of protein families found in a random sample of sequences with the same size and taxonomic distribution as the set of allergen sequences. In addition, the protein family distribution of allergens was highly different from the distribution of randomly selected sequences (Fig 1). Similar differences have been observed when comparing the protein family distributions of plant food and pollen allergens with the proteomes of Arabidopsis species and rice8 and with all seed plant proteins in the UniProt database.9
Biochemical functions of allergens showed a bias toward certain classes, such as hydrolysis of proteins, polysaccharides, and lipids; binding of metal ions and lipids; transport; storage; and cytoskeleton association (Table III). About one fourth of the allergen sequences contained no GO annotation. However, most of these sequences were either too short to allow a protein family assignment or they belonged to protein families that can be assigned biochemical functions related to the functions of annotated allergens: tropomyosins are, like profilins, actin-binding proteins16; group 2 mite allergens are thought to bind lipids17; and pectate lyases18 and some thaumatin-like proteins with β1,3-glucanase activity19 are, like many other allergens, involved in the degradation of polysaccharides.
A possible connection between biochemical function and allergenicity is best understood in the case of proteases. The major house dust mite allergen Der p 1, a cysteine protease, was shown to cleave the tight junction protein occludin, thus increasing epithelial permeability and facilitating its entry into the tissue.20 Furthermore, several studies demonstrated that Der p 1 acts directly on cells of the human immune system by cleaving cell-surface proteins, such as CD23, CD25, CD40, and dendritic cell–specific intercellular adhesion molecule–grabbing nonintegrin.21 The link between other biochemical functions and allergenicity is less clear. Interestingly, many families of allergens are involved in defense against pathogens and predators, such as several groups of plant pathogenesis–related proteins,6 cereal bifunctional inhibitors,6 and enzymes from insect venoms.22
The assumption that allergens do not possess special features that render them allergenic was based on the observation that allergens fold into highly diverse structures and no “allergenic” folds could be detected.23 With the much larger number of structures of allergens and allergen homologues available today, we showed that the structural repertoire of allergens was restricted to only 5% of all structural families, but most of these families were grouped into different superfamilies and folds (Table II). Thus most folds and superfamilies contained either no or only a single allergen family. These data argue against the hypothesis that it is a single structure that makes a protein allergenic. In contrast, common structural features have been established for food allergens that sensitize through the gastrointestinal tract.24 These features, such as high numbers of disulfide bonds, repetitive structures, binding of lipid or metal ions, and formation of stable oligomers, confer stability toward heat, acid, and proteolysis. However, these general features cannot be traced back to certain folds, making this observation compatible with the lack of specific allergenic folds.
The distribution of allergens into protein families and functional classes presented in this study does not represent a definitive data set because many new allergen families probably remain to be discovered. The number of different Pfam domains listed in AllFam grew from 179 to 184 between February and June 2007. However, all newly added families contained minor allergens (data not shown). Thus the assumption that most major allergens of all important sources are already identified is justified, as exemplified by grass pollen allergens, in which a combination of only 5 allergens is sufficient to detect nearly all sera that show IgE binding to a total grass pollen extract.25 Another bias is introduced by the fact that new members of families of cross-reactive allergens, such as profilins and tropomyosins, are easily identified, which explains the high rank of these families in the AllFam family list. To circumvent this bias, a database could be used that exclusively contains truly sensitizing allergens. Such a database does not exist and will be difficult to establish because the sensitizing potency of many allergens is still unknown. Apart from these concerns, changing the rank of some allergen families by removing nonsensitizing cross-reactive allergens will not detract from the main conclusion of this study (ie, the narrow distribution of allergens with respect to protein family membership and biochemical function).
The AllFam database can be used to test hypotheses on factors that determine allergenicity by comparing allergen-containing protein families with respect to the features in question. For instance, it was proposed that allergens are proteins that lack bacterial homologues.26 This hypothesis was based on database similarity searches using a sample of only 30 allergen sequences from an even smaller number of protein families. In contrast, an overview of the most important protein families of all allergens (Fig 1) shows that members of many of these families are found in bacteria, such as EF-hand proteins (fourth rank), cupins (fifth rank), PR-1 proteins (ninth rank), subtilisin-like serine proteases (tenth rank), and trypsin-like serine proteases (eleventh rank).
Sequence comparison of allergenic members of the 3 most important protein families of allergens showed a wide range of the degree of sequence conservation (Fig 2). Allergenic tropomyosins from invertebrates and profilins from higher plants show sequence identities between homologues from unrelated species of more than 50%. This sequence conservation is reflected by the high extent of IgE cross-reactivity observed within these families.16, 27 On the other end of the spectrum are the 2S albumins, important food allergens from legumes, nuts, and other seeds. 2S albumins from different plant families show sequence identities of less than 40%. Cross-reactivity was thought to be low or even absent between 2S albumins from different plant orders. Recently, considerable cross-reactivity between Ara h 2 from peanut and yet unidentified allergens from almond and Brazil nut was demonstrated.28 Furthermore, high sequence similarity between linear IgE epitopes, despite low global sequence similarity of 2S albumins from cashew and walnut, was shown.29 In a previous analysis of cross-reactivity and sequence similarity among homologous pollen allergens, we proposed sequence similarity as a suitable parameter for assessing potential IgE cross-reactivity.9 The situation seems to be different for food allergens, which come into contact with the human immune system after partial denaturation in the digestive tract, leading to significant IgE binding to linear epitopes. Thus global sequence similarity seems not to be suitable to predict cross-reactivity among these allergens.
In summary, we introduce here the AllFam database, an Internet resource for classifying allergens into protein families. Analysis of allergen families confirmed that allergens are distributed among a small number of protein families and possess a limited range of biologic functions. The answer to the question of what makes a protein an allergen will require additional in silico and wet lab research, such as extended comparisons of whole proteomes and allergomes of important allergen sources with respect to expression levels, stability, biochemical functions, and protein family memberships.
The classification of allergens supports the elucidation of factors that make proteins allergenic, thus possibly paving the way for novel therapeutic concepts.
Table E1.
AllFam families and associated Pfam families
| AllFam ID | AllFam name | Pfam ID | Pfam name |
|---|---|---|---|
| AF001 | Helix-loop-helix DNA-binding domain | PF00010 | Helix-loop-helix DNA-binding domain |
| AF002 | Heat shock proteins Hsp70 | PF00012 | Hsp70 protein |
| AF003 | Animal Kunitz serine protease inhibitors | PF00014 | Kunitz/Bovine pancreatic trypsin inhibitor domain |
| AF004 | Eukaryotic aspartyl proteases | PF00026 | Eukaryotic aspartyl protease |
| PF07966 | A1 propeptide | ||
| AF005 | Cystatins | PF00031 | Cystatin domain |
| AF006 | Cytochromes c | PF00034 | Cytochrome c |
| AF007 | EF-hand domain | PF00036 | EF-hand |
| PF01023 | S-100/ICaBP type calcium binding domain | ||
| AF008 | Intermediate filament proteins | PF00038 | Intermediate filament protein |
| AF009 | Globins | PF00042 | Globin |
| AF010 | Glutathione S-transferases, C-terminal domain | PF00043 | Glutathione-S-transferase, C-terminal domain |
| AF011 | Eukaryotic elongation factors 1 | PF00736 | EF-1 guanine nucleotide exchange domain |
| AF012 | Insulin family | PF00049 | Insulin/IGF/relaxin family |
| AF013 | Kazal-type serine protease inhibitors | PF00050 | Kazal-type serine protease inhibitor domain |
| PF07648 | Kazal-type serine protease inhibitor domain | ||
| AF014 | Lactate/malate dehydrogenases | PF00056 | Lactate/malate dehydrogenase, NAD binding domain |
| PF02866 | Lactate/malate dehydrogenase, α/β C-terminal domain | ||
| AF015 | Lipocalins | PF00061 | Lipocalin/cytosolic fatty acid–binding protein family |
| PF08212 | Lipocalin-like domain | ||
| AF016 | C-type lysozyme/α-lactalbumin family | PF00062 | C-type lysozyme/α-lactalbumin family |
| AF017 | Protein kinases | PF00069 | Protein kinase domain |
| AF018 | Serpin serine protease inhibitors | PF00079 | Serpin (serine protease inhibitor) |
| AF019 | Cu/Zn superoxide dismutases | PF00080 | Copper/zinc superoxide dismutase (SODC) |
| AF020 | Fe/Mn superoxide dismutases | PF00081 | Iron/manganese superoxide dismutases, α-hairpin domain |
| PF02777 | Iron/manganese superoxide dismutases, C-terminal domain | ||
| AF021 | Subtilisin-like serine proteases | PF00082 | Subtilase family |
| PF02225 | PA domain | ||
| PF05922 | Subtilisin N-terminal Region | ||
| AF022 | Glutathione-S-transferases, N-terminal domain | PF02798 | Glutathione-S-transferase, N-terminal domain |
| AF023 | Thioredoxins | PF00085 | Thioredoxin |
| AF024 | Trypsin-like serine proteases | PF00051 | Kringle domain |
| | | PF00089 | Trypsin |
| | | PF00431 | CUB domain |
| | | PF00594 | Vitamin K-dependent carboxylation/γ-carboxyglutamic (GLA) domain |
| | | PF02983 | α-Lytic protease prodomain |
| | | PF09396 | Thrombin light chain |
| AF025 | Tubulin/FtsZ family | PF00091 | Tubulin/FtsZ family, GTPase domain |
| | | PF03953 | Tubulin/FtsZ family, C-terminal domain |
| AF027 | Trypsin inhibitor–like domain | PF00093 | von Willebrand factor type C domain |
| | | PF00094 | von Willebrand factor type D domain |
| | | PF01826 | Trypsin Inhibitor like cysteine rich domain |
| | | PF08742 | C8 domain |
| AF028 | Short-chain dehydrogenases | PF00106 | short chain dehydrogenase |
| AF029 | Zn-containing dehydrogenases | PF00107 | Zinc-binding dehydrogenase |
| | | PF08240 | Alcohol dehydrogenase GroES-like domain |
| AF030 | Papain-like cysteine proteases | PF00112 | Papain family cysteine protease |
| | | PF08246 | Cathepsin propeptide inhibitor domain (I29) |
| AF031 | Enolases | PF00113 | Enolase, C-terminal TIM barrel domain |
| | | PF03952 | Enolase, N-terminal domain |
| AF032 | Triosephosphate isomerases | PF00121 | Triosephosphate isomerase |
| AF033 | α-Amylases | PF00128 | α-Amylase, catalytic domain |
| | | PF02806 | α-Amylase, C-terminal all-beta domain |
| | | PF07821 | α-Amylase C-terminal beta-sheet domain |
| | | PF09260 | Domain of unknown function (DUF1966) |
| AF034 | Legume lectins | PF00139 | Legume lectin domain |
| AF035 | Peroxidases | PF00141 | Peroxidase |
| AF036 | Calcineurin-like phosphoesterases | PF00149 | Calcineurin-like phosphoesterase |
| | | PF02872 | 5′-nucleotidase, C-terminal domain |
| AF037 | Lipases | PF00151 | Lipase |
| | | PF01477 | PLAT/LH2 domain |
| AF038 | Cyclophilins | PF00160 | Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD |
| AF039 | Ribosome inactivating proteins | PF00161 | Ribosome-inactivating protein |
| | | PF00652 | Ricin-type β-trefoil lectin domain |
| AF040 | Aldehyde dehydrogenases | PF00171 | Aldehyde dehydrogenase family |
| AF041 | Class I chitinases | PF00182 | Chitinase class I |
| AF042 | Heat shock proteins Hsp90 | PF00183 | Hsp90 protein |
| | | PF02518 | Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase |
| AF043 | Hevein-like domain | PF00187 | Chitin recognition protein |
| AF044 | PR-1 proteins | PF00188 | SCP-like extracellular protein |
| AF045 | Cupin superfamily | PF00190 | Cupin |
| | | PF04702 | Vicilin N terminal region |
| AF046 | Kunitz soybean trypsin inhibitor family | PF00197 | Trypsin and protease inhibitor |
| AF047 | Catalases | PF00199 | Catalase |
| | | PF06628 | Catalase-related |
| AF048 | ATP synthases | PF00213 | ATP synthase δ (OSCP) subunit |
| AF049 | ATP:guanido phosphotransferases | PF00217 | ATP:guanido phosphotransferase, C-terminal catalytic domain |
| | | PF02807 | ATP:guanido phosphotransferase, N-terminal domain |
| AF050 | Prolamin superfamily | PF00234 | Protease inhibitor/seed storage/LTP family |
| AF051 | Profilins | PF00235 | Profilin |
| AF052 | Glycoside hydrolase family 32 | PF00251 | Glycosyl hydrolases family 32 N terminal |
| | | PF08244 | Glycosyl hydrolases family 32 C terminal |
| AF053 | Flavodoxins | PF00258 | Flavodoxin |
| AF054 | Tropomyosins | PF00261 | Tropomyosin |
| AF055 | Calreticulin family | PF00262 | Calreticulin family |
| AF056 | Serum albumins | PF00273 | Serum albumin family |
| AF057 | Polygalacturonases | PF00295 | Glycosyl hydrolases family 28 |
| AF058 | Ribosomal proteins L3 | PF00297 | Ribosomal protein L3 |
| AF059 | γ-thionin family | PF00304 | γ-Thionin family |
| AF060 | Thaumatin-like proteins | PF00314 | Thaumatin family |
| AF061 | Prolyl oligopeptidase family | PF00326 | Prolyl oligopeptidase family |
| AF062 | Histidine acid phosphatases | PF00328 | Histidine acid phosphatase |
| AF063 | β-1,3-Glucanases | PF00332 | Glycosyl hydrolases family 17 |
| AF064 | X8 domain | PF07983 | X8 domain |
| AF065 | Caseins | PF00363 | Casein |
| AF066 | Haemocyanins | PF00372 | Hemocyanin, copper containing domain |
| | | PF03723 | Hemocyanin, ig-like domain |
| AF067 | Multicopper oxidases | PF00394 | Multicopper oxidase |
| | | PF07731 | Multicopper oxidase |
| | | PF07732 | Multicopper oxidase |
| AF068 | Transferrins | PF00405 | Transferrin |
| AF069 | Bet v 1–related proteins | PF00407 | Pathogenesis-related protein Bet v I family |
| AF070 | 60S acidic ribosomal proteins | PF00428 | 60s Acidic ribosomal protein |
| AF071 | Xylanases | PF00457 | Glycosyl hydrolases family 11 |
| AF072 | Chlorophyll binding proteins | PF00504 | Chlorophyll A-B binding protein |
| AF073 | Pectate lyases | PF00544 | Pectate lyase |
| AF074 | Gelsolin family | PF00626 | Gelsolin repeat |
| | | PF02209 | Villin headpiece domain |
| AF075 | SGNH-hydrolase family | PF00657 | GDSL-like lipase/acylhydrolase |
| AF076 | Glycoside hydrolase family 15 | PF00686 | Starch-binding domain |
| | | PF00723 | Glycosyl hydrolases family 15 |
| AF077 | Glycoside hydrolase family 18 | PF00704 | Glycosyl hydrolases family 18 |
| AF078 | Chitin-binding peritrophin-A domain | PF01607 | Chitin-binding peritrophin-A domain |
| AF079 | Glycoside hydrolase family 16 | PF00722 | Glycosyl hydrolases family 16 |
| AF080 | Glycoside hydrolase family 20 | PF00728 | Glycosyl hydrolase family 20, catalytic domain |
| AF081 | GMC oxidoreductases | PF00732 | GMC oxidoreductase |
| | | PF05199 | GMC oxidoreductase |
| AF082 | Glyoxalase superfamily | PF00903 | Glyoxalase/bleomycin resistance protein/dioxygenase superfamily |
| AF083 | Glycoside hydrolase family 3 | PF00933 | Glycosyl hydrolase family 3 N terminal domain |
| | | PF01915 | Glycosyl hydrolase family 3 C terminal domain |
| AF084 | Barwin family | PF00967 | Barwin family |
| AF085 | κ-Caseins | PF00997 | κ-Casein |
| AF086 | Staphylococcal/streptococcal toxins | PF01123 | Staphylococcal/streptococcal toxin, OB-fold domain |
| | | PF02876 | Staphylococcal/Streptococcal toxin, β-grasp domain |
| AF087 | Ole e 1–related proteins | PF01190 | Pollen proteins Ole e I family |
| AF088 | Casein kinase 2 regulatory subunit | PF01214 | Casein kinase II regulatory subunit |
| AF089 | Lipid-binding serum glycoproteins | PF01273 | LBP/BPI/CETP family, N-terminal domain |
| AF090 | Oleosins | PF01277 | Oleosin |
| AF091 | Diphtheria toxins | PF01324 | Diphtheria toxin, R domain |
| | | PF02763 | Diphtheria toxin, C domain |
| | | PF02764 | Diphtheria toxin, T domain |
| AF092 | Lipoproteins | PF01347 | Lipoprotein amino terminal region |
| AF093 | Expansins, C-terminal domain | PF01357 | Pollen allergen |
| AF094 | Expansins, N-terminal domain | PF03330 | Rare lipoprotein A (RlpA)-like double-psi β-barrel |
| AF095 | Melittins | PF01372 | Melittin |
| AF096 | β-Amylases | PF01373 | Glycosyl hydrolase family 14 |
| AF097 | Collagens | PF01391 | Collagen triple helix repeat (20 copies) |
| | | PF01410 | Fibrillar collagen C-terminal domain |
| AF098 | Pheromone and odorant binding proteins | PF01395 | PBP/GOBP family |
| AF099 | Berberine bridge enzymes | PF01565 | FAD-binding domain |
| | | PF08031 | Berberine and berberine like |
| AF100 | Myosin tail | PF01576 | Myosin tail |
| AF101 | Flavin containing amine oxidoreductases | PF01593 | Flavin containing amine oxidoreductase |
| AF102 | Group 5/6 grass pollen allergens | PF01620 | Ribonuclease (pollen allergen) |
| AF103 | Hyaluronidases | PF01630 | Hyaluronidase |
| AF104 | Patatin family | PF01734 | Patatin-like phospholipase |
| AF105 | Clostridial neurotoxins | PF01742 | Clostridial neurotoxin zinc protease |
| | | PF07951 | Clostridium neurotoxin, C-terminal receptor binding |
| | | PF07952 | Clostridium neurotoxin, translocation domain |
| | | PF07953 | Clostridium neurotoxin, N-terminal receptor binding |
| AF106 | Class 3 lipases | PF01764 | Lipase (class 3) |
| | | PF03893 | Lipase 3 N-terminal region |
| AF107 | NAC domain | PF00627 | UBA/TS-N domain |
| | | PF01849 | NAC domain |
| AF108 | DJ-1/PfpI family | PF01965 | DJ-1/PfpI family |
| AF109 | Fungalysin metalloproteases | PF02128 | Fungalysin metallopeptidase (M36) |
| | | PF07504 | Fungalysin/thermolysin propeptide motif |
| AF110 | Nuclear transport factor 2 | PF02136 | Nuclear transport factor 2 (NTF2) domain |
| AF111 | Group 2 mite allergens | PF02221 | ML domain |
| AF112 | Plastocyanin-like proteins | PF02298 | Plastocyanin-like domain |
| AF113 | Ribonucleases N1 and T1 | PF00545 | ribonuclease |
| AF114 | Animal haem peroxidases | PF03098 | Animal haem peroxidase |
| AF115 | High-molecular-weight glutenins | PF03157 | High-molecular-weight glutenin subunit |
| AF116 | SART-1 family | PF03343 | SART-1 family |
| AF117 | Endonuclease/exonuclease/phosphatase family | PF03372 | Endonuclease/exonuclease/phosphatase family |
| AF118 | Group 5 ragweed allergens | PF03913 | Amb V allergen |
| AF119 | Triabin family | PF03973 | Triabin |
| AF120 | Plant invertase/pectin methylesterase inhibitors | PF04043 | Plant invertase/pectin methylesterase inhibitor |
| AF121 | BCL7 family | PF04714 | BCL7, N-terminal conserver region |
| AF122 | Alliinases | PF04863 | Alliinase EGF-like domain |
| | | PF04864 | Allinase |
| AF123 | Isoflavone reductase family | PF05368 | NmrA-like family |
| AF124 | Apovitellenin I | PF05418 | Apovitellenin I (Apo-VLDL-II) |
| AF125 | Rubber elongation factor family | PF05755 | Rubber elongation factor protein (REF) |
| AF126 | Phospholipases A2 | PF05826 | Phospholipase A2 |
| AF127 | Group 1 cockroach allergens | PF06757 | Insect allergen-related repeat |
| AF128 | Proteins of unknown function (DUF1397) | PF07165 | Protein of unknown function (DUF1397) |
| AF129 | Cerato-platanins | PF07249 | Cerato-platanin |
| AF130 | Apolipophorin III | PF07464 | Apolipophorin-III precursor (apoLp-III) |
| AF131 | Redoxins | PF08534 | Redoxin |
| AF132 | Fibrinogen α-chain | PF08702 | Fibrinogen α-chain |
| AF133 | Alginate lyases | PF08787 | Alginate lyase |
| AF134 | Fel d 1 family | PF09252 | Allergen Fel d I-B chain |
| AF135 | Ole e 6 family | PF09253 | Pollen allergen ole e 6 |
Table E2.
Pfam clans that contain more than 1 family of allergens
| Pfam clan (ID) | Protein family name | Allergens |
|---|---|---|
| TIM barrel glycosyl hydrolase superfamily (CL0058) | | |
| | Glycoside hydrolase family 18 | 6 |
| | α-Amylases | 6 |
| | β-1,3-Glucanases | 6 |
| | Hyaluronidases | 5 |
| | Glycoside hydrolase family 20 | 1 |
| | Glycoside hydrolase family 3 | 1 |
| | β-Amylases | 1 |
| | Total | 26 |
| Thioredoxin-like (CL0172) | | |
| | Thioredoxins | 11 |
| | Redoxins | 6 |
| | Glutathione-S-transferases, N-terminal domain | 6 |
| | Total | 23 |
| Calycin superfamily (CL0116) | | |
| | Lipocalins | 21 |
| | Triabin family | 1 |
| | Total | 22 |
| Pectate lyase-like β-helix (CL0268) | | |
| | Pectate lyases | 9 |
| | Polygalacturonases | 8 |
| | Total | 17 |
| Double Psi β-barrel glucanase (CL0199) | | |
| | Expansins, N-terminal domain | 12 |
| | Cerato-platanins | 2 |
| | Barwin family | 1 |
| | Total | 15 |
| β-Trefoil superfamily (CL0066) | | |
| | Ribosome-inactivating proteins | 8 |
| | Kunitz soybean trypsin inhibitor family | 4 |
| | Clostridial neurotoxins | 1 |
| | Total | 13 |
| α/β hydrolase fold (CL0028) | | |
| | Lipases | 8 |
| | Prolyl oligopeptidase family | 3 |
| | Class 3 lipases | 1 |
| | Total | 12 |
| Lysozyme-like superfamily (CL0037) | | |
| | Class I chitinases | 6 |
| | C-type lysozyme/α-lactalbumin family | 4 |
| | Total | 10 |
| FAD/NAD(P)-binding Rossmann fold superfamily (CL0063) | | |
| | Isoflavone reductase family | 3 |
| | Short-chain dehydrogenases | 2 |
| | Lactate/malate dehydrogenases | 1 |
| | Zn-containing dehydrogenases | 1 |
| | GMC oxidoreductases | 1 |
| | Flavin containing amine oxidoreductases | 1 |
| | Total | 9 |
| Concanavalin-like lectin/glucanase superfamily (CL0004) | | |
| | Legume lectins | 3 |
| | Glycoside hydrolase family 16 | 2 |
| | Xylanases | 1 |
| | Total | 6 |
| Multicopper oxidase-like domain (CL0026) | | |
| | Plastocyanin-like proteins | 1 |
| | Multicopper oxidases | 1 |
| | Total | 2 |
| Peptidase clan MA (CL0126) | | |
| | Clostridial neurotoxins | 1 |
| | Fungalysin metalloproteases | 1 |
| | Total | 2 |
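
Each "Total" row in Table E2 is the sum of the allergen counts of the families listed under that clan. The following minimal Python sketch reproduces that arithmetic for three of the clans. The counts are copied from the table; the dictionary layout and the names `CLAN_FAMILIES`, `clan_totals`, and `families_in_several_clans` are illustrative assumptions and are not part of AllFam or Pfam.

```python
# Allergen counts per AllFam family, grouped by Pfam clan.
# Counts are copied from three clans of Table E2; the data layout and all
# identifiers below are hypothetical and serve only as an illustration.
CLAN_FAMILIES = {
    "CL0058": {  # TIM barrel glycosyl hydrolase superfamily
        "Glycoside hydrolase family 18": 6,
        "alpha-Amylases": 6,
        "beta-1,3-Glucanases": 6,
        "Hyaluronidases": 5,
        "Glycoside hydrolase family 20": 1,
        "Glycoside hydrolase family 3": 1,
        "beta-Amylases": 1,
    },
    "CL0066": {  # beta-Trefoil superfamily
        "Ribosome-inactivating proteins": 8,
        "Kunitz soybean trypsin inhibitor family": 4,
        "Clostridial neurotoxins": 1,
    },
    "CL0126": {  # Peptidase clan MA
        "Clostridial neurotoxins": 1,
        "Fungalysin metalloproteases": 1,
    },
}


def clan_totals(clans):
    """Sum the allergen counts of the member families of each clan."""
    return {clan: sum(families.values()) for clan, families in clans.items()}


def families_in_several_clans(clans):
    """Return the families that are listed under more than one clan."""
    seen = {}
    for clan, families in clans.items():
        for family in families:
            seen.setdefault(family, []).append(clan)
    return {family: ids for family, ids in seen.items() if len(ids) > 1}


totals = clan_totals(CLAN_FAMILIES)
shared = families_in_several_clans(CLAN_FAMILIES)
print(totals)
print(shared)
```

In this subset, the clostridial neurotoxins are listed under two clans (CL0066 and CL0126), so adding clan totals together would count that family's allergen once per clan rather than once overall.
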
Table E3.
SCOP folds that contain more than 1 family of allergens
| SCOP class | SCOP fold | SCOP superfamily | SCOP family | AllFam family |
|---|---|---|---|---|
| a: All α proteins | a.39: EF Hand-like | a.39.1: EF-hand | a.39.1.2: S100 proteins | AF007: EF-hand domain |
| | | | a.39.1.4: Parvalbumin | |
| | | | a.39.1.5: Calmodulin-like | |
| | | | a.39.1.10: Polcalcin | |
| | | a.39.2: Insect pheromone/odorant-binding proteins | a.39.2.1: Insect pheromone/odorant-binding proteins | AF098: Pheromone and odorant binding proteins |
| | a.52: Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin | a.52.1: Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin | a.52.1.1: Plant lipid-transfer and hydrophobic proteins | AF050: Prolamin superfamily |
| | | | a.52.1.2: Proteinase/α-amylase inhibitors | |
| | | | a.52.1.3: Seed storage protein, 2S albumin | |
| | a.93: Heme-dependent peroxidases | a.93.1: Heme-dependent peroxidases | a.93.1.1: CCP-like | AF035: Peroxidases |
| | | | a.93.1.2: Myeloperoxidase-like | AF114: Animal haem peroxidases |
| b: All β proteins | b.1: Immunoglobulin-like β-sandwich | b.1.8: Cu, Zn superoxide dismutase-like | b.1.8.1: Cu, Zn superoxide dismutase-like | AF019: Cu/Zn Superoxide dismutases |
| | | b.1.18: E set domains | b.1.18.7: ML domain | AF111: Group 2 mite allergens |
| | | | b.1.18.3: Arthropod hemocyanin, C-terminal domain | AF066: Haemocyanins (C-terminal domain) |
| | b.6: Cupredoxin-like | b.6.1: Cupredoxins | b.6.1.1: Plastocyanin/azurin-like | AF112: Plastocyanin-like proteins |
| | | | b.6.1.3: Multidomain cupredoxins | AF067: Multicopper oxidases |
| | b.29: Concanavalin A-like lectins/glucanases | b.29.1: Concanavalin A-like lectins/glucanases | b.29.1.1: Legume lectins | AF034: Legume lectins |
| | | | b.29.1.2: Glycosyl hydrolases family 16 | AF079: Glycoside hydrolase family 16 |
| | | | b.29.1.6: Clostridium neurotoxins, the second last domain | AF105: Clostridial neurotoxins (1st receptor binding domain) |
| | | | b.29.1.11: Xylanase/endoglucanase 11/12 | AF071: Xylanases |
| | | | b.29.1.12: Calnexin/calreticulin | AF055: Calreticulin family (N-terminal domain) |
| | | | b.29.1.18: Alginate lyase | AF133: Alginate lyases |
| | | | b.29.1.19: Glycosyl hydrolases family 32 C-terminal domain | AF052: Glycoside hydrolase family 32 (C-terminal domain) |
| | b.42: β-Trefoil | b.42.2: Ricin B-like lectins | b.42.2.1: Ricin B-like | AF039: Ribosome-inactivating proteins (β-trefoil domain) |
| | | b.42.4: STI-like | b.42.4.1: Kunitz (STI) inhibitors | AF046: Kunitz soybean trypsin inhibitor family |
| | | | b.42.4.2: Clostridium neurotoxins, C-terminal domain | AF105: Clostridial neurotoxins (2nd receptor binding domain) |
| | b.52: Double psi β-barrel | b.52.1: Barwin-like endoglucanases | b.52.1.2: Barwin | AF084: Barwin family |
| | | | b.52.1.3: Pollen allergen PHL P 1 N-terminal domain | AF094: Expansins, N-terminal domain |
| | b.60: Lipocalins | b.60.1: Lipocalins | b.60.1.1: Retinol binding protein-like | AF015: Lipocalins |
| | | | b.60.1.3: Thrombin inhibitor | AF119: Triabin family |
| | b.80: Single-stranded right-handed β-helix | b.80.1: Pectin lyase-like | b.80.1.1: Pectate lyase-like | AF073: Pectate lyases |
| | | | b.80.1.3: Galacturonase | AF057: Polygalacturonases |
| c: α and β proteins (a/b) | c.1: TIM β/α-barrel | c.1.1: Triosephosphate isomerase (TIM) | c.1.1.1: Triosephosphate isomerase (TIM) | AF032: Triosephosphate isomerases |
| | | c.1.8: (Trans)glycosidases | c.1.8.1: Amylase, catalytic domain | AF033: α-Amylases (catalytic domain) |
| | | | | AF096: β-Amylases |
| | | | c.1.8.3: β-glycanases | AF063: β-1,3-Glucanases |
| | | | c.1.8.5: Type II chitinase | AF077: Glycoside hydrolase family 18 |
| | | | c.1.8.6: β-N-acetylhexosaminidase catalytic domain | AF080: Glycoside hydrolase family 20 |
| | | | c.1.8.7: NagZ-like | AF083: Glycoside hydrolase family 3 (N-terminal domain) |
| | | | c.1.8.9: Bee venom hyaluronidase | AF103: Hyaluronidases |
| | | c.1.11: Enolase C-terminal domain-like | c.1.11.1: Enolase | AF031: Enolases (C-terminal domain) |
| | c.2: NAD(P)-binding Rossmann-fold domains | c.2.1: NAD(P)-binding Rossmann-fold domains | c.2.1.1: Alcohol dehydrogenase-like, C-terminal domain | AF029: Zn-containing dehydrogenases (C-terminal domain) |
| | | | c.2.1.2: Tyrosine-dependent oxidoreductases | AF123: Isoflavone reductase family |
| | | | | AF028: Short-chain dehydrogenases |
| | | | c.2.1.5: LDH N-terminal domain-like | AF014: Lactate/malate dehydrogenases (N-terminal domain) |
| | c.3: FAD/NAD(P)-binding domain | c.3.1: FAD/NAD(P)-binding domain | c.3.1.2: FAD-linked reductases, N-terminal domain | AF101: Flavin containing amine oxidoreductases (N-terminal domain) |
| | | | | AF081: GMC oxidoreductases (N-terminal domain) |
| | c.23: Flavodoxin-like | c.23.5: Flavoproteins | c.23.5.1: Flavodoxin-related | AF053: Flavodoxins |
| | | c.23.16: Class I glutamine amidotransferase-like | c.23.16.2: DJ-1/PfpI | AF108: DJ-1/PfpI family |
| | | c.23.11: β-D-glucan exohydrolase, C-terminal domain | c.23.11.1: β-D-glucan exohydrolase, C-terminal domain | AF083: Glycoside hydrolase family 3 (C-terminal domain) |
| | c.47: Thioredoxin fold | c.47.1: Thioredoxin-like | c.47.1.1: Thioltransferase | AF023: Thioredoxins |
| | | | c.47.1.5: Glutathione S-transferase (GST), N-terminal domain | AF022: Glutathione S-transferases, N-terminal domain |
| | | | c.47.1.10: Glutathione peroxidase-like | AF131: Redoxins |
| | c.69: α/β-Hydrolases | c.69.1: α/β-Hydrolases | c.69.1.4: Prolyl oligopeptidase, C-terminal domain | AF061: Prolyl oligopeptidase family (C-terminal domain) |
| | | | c.69.1.17: Fungal lipases | AF106: Class 3 lipases |
| | | | c.69.1.19: Pancreatic lipase, N-terminal domain | AF037: Lipases |
| d: α and β proteins (a+b) | d.2: Lysozyme-like | d.2.1: Lysozyme-like | d.2.1.1: Family 19 glycosidase | AF041: Class I chitinases |
| | | | d.2.1.2: C-type lysozyme | AF016: C-type lysozyme/α-lactalbumin family |
| | d.16: FAD-linked reductases, C-terminal domain | d.16.1: FAD-linked reductases, C-terminal domain | d.16.1.1: GMC oxidoreductases | AF081: GMC oxidoreductases (C-terminal domain) |
| | | | d.16.1.5: L-amino acid/polyamine oxidase | AF101: Flavin containing amine oxidoreductases (C-terminal domain) |
| | d.17: Cystatin-like | d.17.1: Cystatin/monellin | d.17.1.2: Cystatins | AF005: Cystatins |
| | | d.17.4: NTF2-like | d.17.4.2: NTF2-like | AF110: Nuclear transport factor 2 |
| g: Small proteins | g.3: Knottins (small inhibitors, toxins, lectins) | g.3.1: Plant lectins/antimicrobial peptides | g.3.1.1: Hevein-like agglutinin (lectin) domain | AF043: Hevein-like domain |
| | | g.3.7: Scorpion toxin-like | g.3.7.5: Plant defensins | AF059: γ-thionin family |
| h: Coiled coil proteins | h.1: Parallel coiled-coil | h.1.5: Tropomyosin | h.1.5.1: Tropomyosin | AF054: Tropomyosins |
| | | h.1.8: Fibrinogen coiled-coil and central regions | h.1.8.1: Fibrinogen coiled-coil and central regions | AF132: Fibrinogen α-chain |
| | | h.1.20: Intermediate filament protein, coiled coil region | h.1.20.1: Intermediate filament protein, coiled coil region | AF008: Intermediate filament proteins |
Supported by grant SFB-F01802 from the Austrian Science Fund (to H.B.) and an Austrian Programme for Advanced Research and Technology grant from the Austrian Academy of Sciences (to S.W.).
Disclosure of potential conflict of interest: A. Mari is the responsible administrative contact for Allergy Data Laboratories. The rest of the authors have declared that they have no conflict of interest.
PII: S0091-6749(08)00163-2
doi:10.1016/j.jaci.2008.01.025
© 2008 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
