The Journal of Allergy and Clinical Immunology
Volume 121, Issue 4, Pages 847-852.e7, April 2008

Allergens are distributed into few protein families and possess a restricted number of biochemical functions

  • Christian Radauer, PhD
    • Department of Pathophysiology, Center for Physiology, Pathophysiology and Immunology, Medical University of Vienna, Vienna, Austria
  • Merima Bublin, PhD
    • Department of Pathophysiology, Center for Physiology, Pathophysiology and Immunology, Medical University of Vienna, Vienna, Austria
  • Stefan Wagner, PhD
    • Department of Pathophysiology, Center for Physiology, Pathophysiology and Immunology, Medical University of Vienna, Vienna, Austria
  • Adriano Mari, MD
    • Allergy Data Laboratories, Latina, Italy
    • Center for Clinical and Experimental Allergology, IDI-IRCCS, Rome, Italy
  • Heimo Breiteneder, PhD
    • Department of Pathophysiology, Center for Physiology, Pathophysiology and Immunology, Medical University of Vienna, Vienna, Austria
    • Corresponding author. Reprint requests: Heimo Breiteneder, PhD, Department of Pathophysiology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria.

Received 29 October 2007; received in revised form 22 January 2008; accepted 23 January 2008.

Background

Existing allergen databases classify their entries by source and route of exposure, thus lacking an evolutionary, structural, and functional classification of allergens.

Objective

We sought to build AllFam, a database of allergen families, and use it to extract common structural and functional properties of allergens.

Methods

Allergen data from the Allergome database and protein family definitions from the Pfam database were merged into AllFam, a database that is freely accessible on the Internet at http://www.meduniwien.ac.at/allergens/allfam/. A structural classification of allergens was established by matching Pfam families with families from the Structural Classification of Proteins database. Biochemical functions of allergens were extracted from the Gene Ontology Annotation database.

Results

Seven hundred seven allergens were classified by sequence into 134 AllFam families containing 184 Pfam domains (2% of 9318 Pfam families). A random set of 707 sequences with the same taxonomic distribution contained a significantly higher number of different Pfam domains (479 ± 17). Classifying allergens by structure revealed that 5% of 3012 Structural Classification of Proteins families contained allergens. The biochemical functions of allergens most frequently found were limited to hydrolysis of proteins, polysaccharides, and lipids; binding of metal ions and lipids; storage; and cytoskeleton association.

Conclusion

The small number of protein families that contain allergens and the narrow functional distribution of most allergens confirm the existence of yet unknown factors that render proteins allergenic.

Key words: Allergens, protein families, allergen structures, allergen databases

Abbreviations used: GO, Gene Ontology; nsLTP, nonspecific lipid transfer protein; SCOP, Structural Classification of Proteins; TIM, triosephosphate isomerase; UniProt, Universal Protein Resource

 

Since the identification and cloning of the first allergenic proteins in the late 1980s, hundreds of allergens have been identified and their sequences determined. A number of databases that provide molecular, biochemical, and clinical data of allergens were established, such as the Official List of Allergens issued by the International Union of Immunological Societies Allergen Nomenclature Sub-committee (http://www.allergen.org),1 the Allergome (http://www.allergome.org),2 the Food Allergy Research and Resource Program Allergen Database (http://www.allergenonline.com), and the InFormAll database (http://foodallergens.ifr.ac.uk/).

The growing number of available allergen sequences together with the advancements of bioinformatics tools and methods enabled scientists to shed light on evolutionary and structural relationships between allergens from different sources.3 In particular, protein family databases that are linked to protein sequence databases, such as the Pfam database,4 provided the basis of a novel classification of allergens. Several studies revealed that most allergens can be found in a limited number of protein families.5, 6, 7, 8, 9, 10

Records of most allergen databases are organized by type of allergen source and route of exposure. Likewise, allergen designations according to the official allergen nomenclature are derived from the scientific name of the allergen source species and a sequential number that in most cases does not reflect evolutionary relationships between allergens. To bring together allergen data stored in allergen databases and evolutionary and structural relationships between allergens established from protein family databases, we constructed AllFam, a database of allergen families. In the present study we used data extracted from AllFam to establish the protein family distribution of allergens and to elucidate common structural and biochemical features of allergens, thus shifting the focus from single allergens or allergen families to a systematic analysis of the complete range of known allergens.

Methods 

Construction of the AllFam database 

Data of allergens with known sequences (name, source, routes of exposure, and Universal Protein Resource [UniProt] accession numbers) were downloaded from Allergome,2 a database based on allergen data published in peer-reviewed journals. Data on routes of exposure from Allergome were merged into the following standardized categories: inhalation, ingestion, sting/bite, contact, iatrogenic, and autoallergen.

UniProt accession numbers were compared with SwissPfam, a database of precomputed protein domain architectures generated by comparing all entries of the UniProt protein sequence database with the Pfam database (version 22.0; July 2007; http://pfam.sanger.ac.uk).4 For entries that yielded no results, sequences were downloaded and compared with Pfam protein family definitions by using the hmmpfam program from the HMMER 2.3 package (http://hmmer.wustl.edu). This hmmpfam program compares a query sequence with all Pfam protein family definitions, which are stored as hidden Markov models, probabilistic descriptions that are generated from multiple sequence alignments and yield the probabilities of occurrence of all amino acids, as well as of insertions and deletions for each alignment position.

Domain architectures of allergens were translated into AllFam allergen families by using the following criteria. For single-domain proteins, each Pfam family corresponded to an AllFam family. To avoid an artificially high number of allergen families because of counting domains of multidomain proteins as separate families, Pfam domains constituting multidomain proteins were merged into single AllFam families if the constituting domains exclusively occurred in members of a single protein family. Otherwise, each domain was treated as a separate AllFam family.
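The merging rule above can be read in more than one way. The following is a minimal sketch of one possible reading, not the authors' implementation: domains that co-occur in a protein are merged into one family, except for domains listed in `shared_domains`, a hypothetical curator-supplied set of domains known to occur in more than one protein family. The function name, the input format, and the toy protein names are assumptions made for illustration.

```python
from collections import defaultdict


def assign_families(architectures, shared_domains=frozenset()):
    """Group Pfam domains into families by co-occurrence.

    architectures: dict mapping a protein name to the list of Pfam domain
        IDs found in it (hypothetical input format).
    shared_domains: hypothetical set of domains that occur in more than
        one protein family; these are never merged with other domains.
    Returns a dict mapping each domain ID to its family, represented as a
    sorted tuple of the merged domain IDs.
    """
    parent = {}

    def find(domain):
        parent.setdefault(domain, domain)
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]  # path halving
            domain = parent[domain]
        return domain

    for domains in architectures.values():
        for domain in domains:
            find(domain)  # register every domain, merged or not
        mergeable = [d for d in domains if d not in shared_domains]
        for other in mergeable[1:]:
            parent[find(mergeable[0])] = find(other)

    groups = defaultdict(list)
    for domain in parent:
        groups[find(domain)].append(domain)
    return {d: tuple(sorted(members))
            for members in groups.values() for d in members}


# Toy architectures; the protein names are invented for illustration.
toy = {
    "toy_protease_1": ["PF00089"],
    "toy_protease_2": ["PF00089", "PF00431"],
    "toy_enolase": ["PF03952", "PF00113"],
    "toy_mixed": ["PF00036", "PF00069"],
}
families = assign_families(toy, shared_domains={"PF00036"})
```

In this toy input, the two protease domains end up in one family, as do the two enolase domains, while the domain marked as shared stays a family of its own.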

The AllFam database is freely accessible at http://www.meduniwien.ac.at/allergens/allfam/. It can be queried for lists of allergen families filtered by source and route of exposure. In addition, for each family, the database contains a list of allergens and an allergen family fact sheet with information on biochemical properties and the allergologic significance of its allergenic members. AllFam is cross-linked with the Allergome database and regularly updated.

Protein family distribution of a random set of sequences 

Random entries were downloaded from the UniProt database (http://www.expasy.org/uniprot/) and parsed for taxonomic group and Pfam domains. The procedure was repeated until the number of sequences that contained Pfam annotations from plants, animals, fungi, and bacteria reached the numbers of allergens with known protein family memberships in these kingdoms. The number of different Pfam domains found in these sequences was counted. Twenty independent runs of the program were performed. Significance of the difference between the number of protein families found among allergens and among random sequences was tested by using the 1-sample t test.
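The sampling procedure described above could be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: entries are assumed to be available locally as `(kingdom, set_of_pfam_ids)` tuples, each entry is drawn at most once, and the function and variable names are hypothetical. The quotas are the per-kingdom numbers of allergens with known protein family from Table I.

```python
import random
import statistics


def count_domains_in_matched_sample(entries, quotas, rng):
    """Draw random entries until each kingdom's quota is filled.

    entries: list of (kingdom, set_of_pfam_ids) tuples, a stand-in for
        annotated UniProt records (hypothetical input format).
    quotas: dict mapping kingdom to the number of sequences required.
    Returns the number of distinct Pfam domains in the accepted sample.
    """
    remaining = dict(quotas)
    seen_domains = set()
    pool = list(entries)
    rng.shuffle(pool)  # random order; each entry is drawn at most once
    for kingdom, domains in pool:
        if not any(remaining.values()):
            break
        # Entries without Pfam annotation do not count toward the quota.
        if domains and remaining.get(kingdom, 0) > 0:
            remaining[kingdom] -= 1
            seen_domains |= set(domains)
    if any(remaining.values()):
        raise ValueError("pool too small to fill every quota")
    return len(seen_domains)


def summarize_runs(entries, quotas, runs=20, seed=0):
    """Repeat the sampling and return mean and SD of the domain counts."""
    rng = random.Random(seed)
    counts = [count_domains_in_matched_sample(entries, quotas, rng)
              for _ in range(runs)]
    return statistics.mean(counts), statistics.stdev(counts)


# Allergens with known protein family per kingdom, from Table I.
ALLERGEN_QUOTAS = {"plants": 338, "animals": 268, "fungi": 91, "bacteria": 10}
```

Fixing the seed makes a run reproducible. The 20 runs reported in the study correspond to `runs=20` here.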

Structural and functional classification of allergens 

Structures of allergens and allergen homologues were classified by using the Structural Classification of Proteins (SCOP) database (release 1.71, October 2006; http://scop.mrc-lmb.cam.ac.uk/scop/).11 AllFam families and SCOP families were matched by using the links to SCOP embedded in the Pfam database.

For a functional classification of allergens using standardized descriptions of biologic functions, all UniProt accession numbers of allergen sequences in AllFam were compared with the Gene Ontology (GO) Annotation Database (http://www.ebi.ac.uk/GOA/).
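As an illustration of this comparison, the sketch below tallies how many allergens carry each GO term. It assumes the annotations have already been parsed into `(accession, GO ID)` pairs; the parsing step, the function name, and the placeholder accessions are assumptions, not part of the original workflow.

```python
from collections import Counter


def tally_go_terms(annotations, allergen_accessions):
    """Count, for each GO term, how many allergens carry it.

    annotations: iterable of (uniprot_accession, go_id) pairs
        (hypothetical, pre-parsed input).
    allergen_accessions: set of accessions of the allergens of interest.
    Returns (Counter of GO ID -> allergen count, number of annotated allergens).
    """
    terms_per_allergen = {}
    for accession, go_id in annotations:
        if accession in allergen_accessions:
            terms_per_allergen.setdefault(accession, set()).add(go_id)
    counts = Counter()
    for go_ids in terms_per_allergen.values():
        counts.update(go_ids)  # each allergen counts once per term
    return counts, len(terms_per_allergen)


# Placeholder accessions; the GO IDs are terms that appear in Table III.
pairs = [
    ("ACC1", "GO:0016787"),
    ("ACC1", "GO:0008233"),
    ("ACC2", "GO:0016787"),
]
counts, annotated = tally_go_terms(pairs, {"ACC1", "ACC2", "ACC3"})
```

Collecting terms into a set per allergen keeps duplicate annotation lines from inflating the counts.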

Sequence conservation within families of allergens 

Sequences of representative allergens from the 4 most important families of allergens (prolamins, profilins, tropomyosins, and the EF-hand family) were aligned by using ClustalX 1.83.12 Sequence identity matrices and neighbor-joining phylogenetic trees were generated from these alignments with ClustalX and visualized with TreeView 1.6.6.13
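For readers unfamiliar with identity matrices, the sketch below computes pairwise percentage identity from sequences that are already aligned. It is not ClustalX code. The choice of denominator (columns in which both sequences have a residue) is an assumption, and other tools may define it differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """alignment: dict of name -> aligned sequence; returns a nested dict."""
    names = list(alignment)
    return {x: {y: percent_identity(alignment[x], alignment[y]) for y in names}
            for x in names}


# Toy aligned sequences, invented for illustration.
matrix = identity_matrix({"seq1": "ACDE-G", "seq2": "ACDQFG"})
```

In the toy pair, five columns are compared and four match, so the identity is 80%.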

Results 

Protein family distribution of allergens 

The AllFam database (version of July 18, 2007) contained 847 allergens with known partial or total sequences (Table I). Of these, 707 allergens were classified into 134 AllFam families that contained 184 different Pfam domains. Thus allergens were found in only 2% of all 9318 families in the Pfam database. The list of AllFam families and associated Pfam families can be found in Table E1 in the Online Repository (available at www.jacionline.org). The distribution of allergens was highly biased toward a few protein families. Although the protein family with the highest number of allergens, the prolamin superfamily, contained 59 allergens (8% of all allergens with known protein family) and the 10 most abundant families contained 300 allergens (42%; Fig 1, A and B), there were 53 families that contained only a single allergen.

Table I. Numbers of sequences and protein families of allergens in AllFam

                    Sequences  Sequences from known  AllFam    AllFam families
                               protein families      families  with >1 allergen
All allergens       847        707                   134       81
Sources
  Plants            369        338                   58        34
  Animals           305        268                   60        36
  Fungi             163        91                    37        16
  Bacteria          10         10                    5         1
Routes of exposure
  Inhalation        479        377                   99        59
  Ingestion         257        240                   48        29
  Sting, bite       66         52                    14        7
  Contact           58         50                    35        10
  Autoallergen      14         14                    14        0
  Iatrogenic        11         10                    7         2

Fig 1. The 15 protein families with the highest number of allergens classified by source (A) and route of exposure (B). Numbers in Fig 1, B, differ because of multiple or missing routes of exposure for some allergens. C, Protein family distribution of randomly selected sequences. ox., Oxidase; oxred., oxidoreductase; PD, periplasmic domain; RuBisCo, ribulose-1,5-bisphosphate carboxylase/oxygenase; TD, transmembrane domain; term., terminal; NADH, nicotinamide adenine dinucleotide.

Thirty-eight allergen families were grouped by structural similarity or common sequence motifs into 12 superfamilies (termed clans in the Pfam database; see Table E2 in the Online Repository at www.jacionline.org). The most important clan, which comprised 7 allergen families, was the triosephosphate isomerase (TIM) barrel glycosyl hydrolase superfamily that contained main allergen families from mites (chitinases from the glycoside hydrolase family 18), plants and fungi (α-amylases, β1,3-glucanases), and insect venoms (hyaluronidases).

Distribution of protein families among allergens of different sources and routes of exposure 

Fig 1, A and B, shows the 15 most important families of allergens itemized by source and route of exposure. Most allergen families were confined to a single source kingdom, such as prolamins, profilins, and cupins from plants and tropomyosins, lipocalins, and caseins from animals. A minority of protein families, such as the EF-hand family and the pathogenesis-related proteins (PR-1), contained allergens from multiple kingdoms. A grouping of allergens by route of exposure yielded a different picture. Most protein families contained allergens that sensitize human subjects through different routes. Among these are allergens responsible for cross-reactivity between inhalant allergen sources and foods, such as profilins, Bet v 1–related allergens, and tropomyosins.

Protein family distribution of randomly selected sequences 

A comparison of the protein family distribution of allergens with the distribution of random UniProt entries confirmed that the number of protein families among allergens was much smaller than expected from a random sample. A random selection of 707 sequences with the same proportions of plant, animal, fungal, and bacterial sequences as among allergens contained an average of 479 different Pfam domains (SD, 17), a number that was significantly higher (P < .001) than the 184 different Pfam domains among allergens.
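The size of this difference can be illustrated with the 1-sample t statistic, computed from the reported summary values. This is a sketch, not the authors' calculation: it assumes that the SD of 17 is the sample standard deviation across the 20 runs described in the Methods, and the function name is hypothetical.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, reference_value):
    """t statistic for a sample mean against a fixed reference value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_value) / standard_error


# Reported values: 20 runs, mean 479, SD 17; allergens contain 184 domains.
t_statistic = one_sample_t(sample_mean=479, sample_sd=17, n=20,
                           reference_value=184)
degrees_of_freedom = 20 - 1
print(f"t = {t_statistic:.1f} on {degrees_of_freedom} degrees of freedom")
```

Under these assumptions the statistic is about 77.6 on 19 degrees of freedom, which is consistent with the reported P < .001.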

The most abundant protein families found among these random sequences differed largely from those determined for allergens. A representative result is shown in Fig 1, C. Although the protein family distributions of different sequence sets showed considerable differences concerning minor protein families, the 15 most abundant protein families were largely identical for all runs. The only allergen-containing protein family in the top 15 was the protein kinase family (sixth rank; Fig 1, C), which contained a single allergen.

Structural classification of allergens 

A structural classification of allergens whose 3-dimensional structures have been experimentally determined or inferred from sequence similarity showed a restricted distribution similar to the distribution of allergens into sequence-based Pfam families (Table II). Allergens were found in all structural classes, as defined by SCOP. However, all members of protein families that contained allergens could be grouped into only 138 structural families (5% of all families in the SCOP database).

Table II. Structural classes of all protein families that contain allergens

                                       All structures in SCOP          Structures of allergens and allergen homologs
SCOP class                             Folds  Superfamilies  Families  Folds     Superfamilies  Families
a: All α proteins                      226    392            645       19 (8%)   20 (5%)        25 (4%)
b: All β proteins                      149    300            549       22 (15%)  24 (8%)        36 (7%)
c: α and β proteins (a/b)              134    221            661       14 (10%)  18 (8%)        29 (4%)
d: α and β proteins (a+b)              286    424            753       28 (10%)  29 (7%)        31 (4%)
e: Multidomain proteins                48     48             64        2 (4%)    2 (4%)         2 (3%)
f: Membrane and cell-surface proteins  49     90             101       2 (4%)    2 (2%)         2 (2%)
g: Small proteins                      79     114            186       8 (10%)   9 (8%)         9 (5%)
h: Coiled coil proteins                7      50             53        2 (29%)   4 (8%)         4 (8%)
Totals                                 978    1639           3012      97 (10%)  108 (7%)       138 (5%)

Percentage values are given relative to all folds, superfamilies, and families in SCOP, respectively.

A comparison of the numbers of structural families, superfamilies, and folds that contain allergens showed that structural allergen families did not cluster in certain folds. All 3012 families in the SCOP database were grouped into 1639 superfamilies and 978 folds, whereas the 138 structural families that contained allergens were grouped into 108 superfamilies and 97 folds (Table II). Twenty-one folds contained more than 1 allergen family (see Table E3 in the Online Repository at www.jacionline.org). The folds that contained the greatest numbers of allergen families were the TIM β/α-barrel fold (SCOP accession no. c.1), with 9 allergen families, and the concanavalin-like lectins/glucanases fold (SCOP accession no. b.29), with 7 allergen families.
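One way to read these totals is as the average number of families per superfamily and per fold, computed here from the totals in Table II. This is a descriptive sketch added for illustration. A small subset of families would be expected to show lower ratios than the full database in any case, so the comparison is not a statistical test.

```python
# Totals taken from Table II.
scop_all = {"families": 3012, "superfamilies": 1639, "folds": 978}
scop_allergen = {"families": 138, "superfamilies": 108, "folds": 97}


def families_per_group(counts, level):
    """Average number of families per superfamily or per fold."""
    return counts["families"] / counts[level]


for level, label in (("superfamilies", "superfamily"), ("folds", "fold")):
    overall = families_per_group(scop_all, level)
    allergen = families_per_group(scop_allergen, level)
    print(f"families per {label}: all SCOP {overall:.2f}, "
          f"allergen-containing {allergen:.2f}")
```

The ratios are about 1.84 versus 1.28 families per superfamily and 3.08 versus 1.42 families per fold, so allergen-containing families are spread thinly across folds and do not pile up in a few of them.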

Functional classification of allergens 

The standardized, hierarchically organized terms of the GO database were used to determine the biologic functions most frequently found among allergens (Table III). Of the 847 allergens listed in AllFam, 644 contained GO annotations distributed among 351 different GO terms.

Table III. The 15 GO terms associated with the highest number of allergens in AllFam

GO term                                                  Allergens
Molecular function
  Hydrolase activity (GO:0016787)                        119
  Peptidase activity (GO:0008233)                        56
  Metal ion binding (GO:0046872)                         73
  Calcium ion binding (GO:0005509)                       56
  Nutrient reservoir activity (GO:0045735)               55
  Lipid binding (GO:0008289)                             53
  Actin binding (GO:0003779)                             48
Biologic process
  Transport (GO:0006810)                                 105
  Metabolic process (GO:0008152)                         45
  Proteolysis (GO:0006508)                               58
  Carbohydrate metabolic process (GO:0005975)            45
  Cytoskeleton organization and biogenesis (GO:0007010)  45
Cellular component
  Extracellular region (GO:0005576)                      109
  Cytoplasm (GO:0005737)                                 76
  Cytoskeleton (GO:0005856)                              45

One sixth of all allergens in AllFam (119 allergens) were inferred to possess hydrolase activity. Half of them (58 allergens) were proteases, such as trypsin-like and subtilisin-like serine proteases (14 and 13 allergens, respectively) and papain-like cysteine proteases (10 allergens). Other hydrolytic enzymes included polygalacturonases (8 allergens), lipases (8 allergens), and ribosome-inactivating proteins (8 allergens).

Many allergens bound metal ions. These included calcium-binding allergens from the EF-hand family (32 allergens), serum albumins (12 allergens), globins (9 allergens), enolases (9 allergens), and Fe/Mn superoxide dismutases (7 allergens). Allergens with lipid-binding activity comprised nonspecific lipid transfer proteins (nsLTPs) from the prolamin superfamily (28 allergens), serum albumins (12 allergens), and lipocalins (9 allergens). Although not annotated in the GO database, lipid-binding activity was shown for allergens from several other families, such as Bet v 1–related allergens that bind plant steroids.14

The nonmetabolic biologic process associated with the greatest number of allergens was transport. This group of allergens comprised lipid-binding proteins, such as the nsLTPs (28 allergens) and lipocalins (21 allergens), as well as general carrier proteins, such as serum albumins (12 allergens) and caseins (12 allergens). Many allergens from the cupin and prolamin superfamilies (26 and 22 allergens, respectively) were annotated as nutrient reservoirs.

The GO terms from the category “cellular component” most frequently found in allergen sequence annotations were the general terms “extracellular region” and “cytoplasm.” In addition, 45 allergens (44 profilins and a single tropomyosin) were described as associated with the cytoskeleton.

Of the 203 allergens without GO annotations, 112 were not assigned to a protein family, in most cases because their sequences were too short. The remaining 91 sequences were grouped into 17 AllFam families, with tropomyosins (34 allergens), group 2 mite allergens (10 allergens), thaumatin-like proteins (9 allergens), Ole e 1–related proteins (9 allergens), and pectate lyases (9 allergens) as the prevailing families.

Sequence conservation within families of allergens 

Fig 2 shows phylogenetic trees representing the degree of sequence conservation within 3 of the 4 most important protein families of allergens. The extent of sequence conservation among members of these families showed considerable differences. nsLTPs from the prolamin superfamily (Fig 2, A) showed a moderate degree of sequence identity between allergens from different plant families (25% to 67%) and considerable sequence conservation only among proteins from botanically related species (at least 69% sequence identity among nsLTPs from Rosaceae fruits). In contrast, sequence identities between 2S albumin allergens from different plant families were generally low (18% to 39%; Fig 2, A). 2S albumins shared only 7% to 25% of their sequences with nsLTPs.

Fig 2. Sequence conservation among homologous allergens. Amino acid sequences of allergens from the prolamin superfamily (A), the tropomyosin family (B), and the EF-hand superfamily (C) were aligned, and neighbor-joining phylogenetic trees were generated. Percentage sequence identities to reference allergens (bold) are encoded by gray shades.

Sequences of allergens from the tropomyosin family (Fig 2, B) were well conserved, even beyond phylum boundaries, with identities of at least 50%. Members of the plant profilin family showed even higher sequence identities of greater than 70% (data not shown).

The EF-hand superfamily contained 2 important families of allergens, β-parvalbumins (major fish allergens) and polcalcins (ubiquitous pollen allergens; Fig 2, C). β-Parvalbumins from fish were well conserved, with at least 53% sequence identity between homologues from unrelated fish species. Polcalcins from different plant families showed sequence identities of at least 67%. They were related to a group of pollen allergens with 4 instead of 2 EF-hand domains that showed 24% to 57% sequence identity with polcalcins (Fig 2, C). Polcalcins and parvalbumins showed only low degrees of sequence identity between 10% and 27%.

Discussion 

The identification of a large number of allergens from diverse sources has triggered the search for common properties of allergens. The discovery of such features would be a step toward the prediction of allergenicity from protein sequence, structure, or function, a procedure that is essential for risk assessment of novel foods. Knowledge of features that make proteins allergenic would also shed light on the mechanism of the initiation of an allergic immune response, thus paving the way for novel therapeutic concepts.

There is an ongoing discussion on whether common properties of allergens exist. One view claims that any protein that comes into contact with the immune system of an atopic individual in sufficient amounts and in the appropriate context can elicit an allergic immune response.15 Two results established in our study, however, support the view that allergens possess special features and not every protein can become allergenic: (1) the small number of protein families in which allergens were found and (2) the frequent occurrence of certain biochemical functions among allergens.

Allergens were found in only 2% of all sequence-based and 5% of all structural protein families (Table I, Table II). A restricted distribution of allergens was previously found for food allergens from plants8 and animals10 and for pollen allergens9 and is extended in this study to all allergens, irrespective of their source and route of exposure. The number of Pfam families that contain allergens was significantly smaller than the number of protein families found in a random sample of sequences with the same size and taxonomic distribution as the set of allergen sequences. In addition, the protein family distribution of allergens was highly different from the distribution of randomly selected sequences (Fig 1). Similar differences have been observed when comparing the protein family distributions of plant food and pollen allergens with the proteomes of Arabidopsis species and rice8 and with all seed plant proteins in the UniProt database.9

Biochemical functions of allergens showed a bias toward certain classes, such as hydrolysis of proteins, polysaccharides, and lipids; binding of metal ions and lipids; transport; storage; and cytoskeleton association (Table III). About one fourth of the allergen sequences contained no GO annotation. However, most of these sequences were either too short to allow a protein family assignment or they belonged to protein families that can be assigned biochemical functions related to the functions of annotated allergens: tropomyosins are, like profilins, actin-binding proteins16; group 2 mite allergens are thought to bind lipids17; and pectate lyases18 and some thaumatin-like proteins with β1,3-glucanase activity19 are, like many other allergens, involved in the degradation of polysaccharides.

A possible connection between biochemical function and allergenicity is best understood in the case of proteases. The major house dust mite allergen Der p 1, a cysteine protease, was shown to cleave the tight junction protein occludin, thus increasing epithelial permeability and facilitating its entry into the tissue.20 Furthermore, several studies demonstrated that Der p 1 acts directly on cells of the human immune system by cleaving cell-surface proteins, such as CD23, CD25, CD40, and dendritic cell–specific intercellular adhesion molecule–grabbing nonintegrin.21 The link between other biochemical functions and allergenicity is less clear. Interestingly, many families of allergens are involved in defense against pathogens and predators, such as several groups of plant pathogenesis–related proteins,6 cereal bifunctional inhibitors,6 and enzymes from insect venoms.22

The assumption that allergens do not possess special features that render them allergenic was based on the observation that allergens fold into highly diverse structures and no “allergenic” folds could be detected.23 With the much larger number of structures of allergens and allergen homologues available today, we showed that the structural repertoire of allergens was restricted to only 5% of all structural families, but most of these families were grouped into different superfamilies and folds (Table II). Thus most folds and superfamilies contained either no or only a single allergen family. These data argue against the hypothesis that it is a single structure that makes a protein allergenic. In contrast, common structural features have been established for food allergens that sensitize through the gastrointestinal tract.24 These features, such as high numbers of disulfide bonds, repetitive structures, binding of lipid or metal ions, and formation of stable oligomers, confer stability toward heat, acid, and proteolysis. However, these general features cannot be traced back to certain folds, making this observation compatible with the lack of specific allergenic folds.

The distribution of allergens into protein families and functional classes presented in this study is not a definitive data set because many allergen families probably remain to be discovered. The number of different Pfam domains listed in AllFam grew from 179 to 184 between February and June 2007. However, all newly added families contained minor allergens (data not shown). Thus the assumption that most major allergens of all important sources have already been identified is justified, as exemplified by grass pollen allergens, for which a combination of only 5 allergens is sufficient to detect nearly all sera that show IgE binding to a total grass pollen extract.25 Another bias arises because new members of families of cross-reactive allergens, such as profilins and tropomyosins, are easily identified, which explains the high rank of these families in the AllFam family list. To circumvent this bias, a database could be used that exclusively contains truly sensitizing allergens. Such a database does not exist and will be difficult to establish because the sensitizing potency of many allergens is still unknown. These concerns aside, changing the rank of some allergen families by removing nonsensitizing cross-reactive allergens would not undermine the main conclusion of this study (ie, the narrow distribution of allergens with respect to protein family membership and biochemical function).

The AllFam database can be used to test hypotheses on factors that determine allergenicity by comparing allergen-containing protein families with respect to the features in question. For instance, it was proposed that allergens are proteins that lack bacterial homologues.26 This hypothesis was based on database similarity searches using a sample of only 30 allergen sequences from an even smaller number of protein families. In contrast, an overview of the most important protein families of all allergens (Fig 1) shows that members of many of these families are found in bacteria, such as EF-hand proteins (fourth rank), cupins (fifth rank), PR-1 proteins (ninth rank), subtilisin-like serine proteases (tenth rank), and trypsin-like serine proteases (eleventh rank).

Sequence comparison of allergenic members of the 3 most important protein families of allergens showed a wide range of the degree of sequence conservation (Fig 2). Allergenic tropomyosins from invertebrates and profilins from higher plants show sequence identities between homologues from unrelated species of more than 50%. This sequence conservation is reflected by the high extent of IgE cross-reactivity observed within these families.16, 27 On the other end of the spectrum are the 2S albumins, important food allergens from legumes, nuts, and other seeds. 2S albumins from different plant families show sequence identities of less than 40%. Cross-reactivity was thought to be low or even absent between 2S albumins from different plant orders. Recently, considerable cross-reactivity between Ara h 2 from peanut and yet unidentified allergens from almond and Brazil nut was demonstrated.28 Furthermore, high sequence similarity between linear IgE epitopes, despite low global sequence similarity of 2S albumins from cashew and walnut, was shown.29 In a previous analysis of cross-reactivity and sequence similarity among homologous pollen allergens, we proposed sequence similarity as a suitable parameter for assessing potential IgE cross-reactivity.9 The situation seems to be different for food allergens, which come into contact with the human immune system after partial denaturation in the digestive tract, leading to significant IgE binding to linear epitopes. Thus global sequence similarity seems not to be suitable to predict cross-reactivity among these allergens.

In summary, we introduce here the AllFam database, an Internet resource for classifying allergens into protein families. Analysis of allergen families confirmed that allergens are distributed among a small number of protein families and possess a limited range of biologic functions. Answering the question of what makes a protein an allergen will require additional research, both in silico and in the wet lab, such as extended comparisons of whole proteomes and allergomes of important allergen sources with respect to expression levels, stability, biochemical functions, and protein family memberships.

Clinical implications

The classification of allergens supports the elucidation of factors that make proteins allergenic, thus possibly paving the way for novel therapeutic concepts.

Table E1. AllFam families and associated Pfam families

| AllFam ID | AllFam name | Pfam ID | Pfam name |
| AF001 | Helix-loop-helix DNA-binding domain | PF00010 | Helix-loop-helix DNA-binding domain |
| AF002 | Heat shock proteins Hsp70 | PF00012 | Hsp70 protein |
| AF003 | Animal Kunitz serine protease inhibitors | PF00014 | Kunitz/Bovine pancreatic trypsin inhibitor domain |
| AF004 | Eukaryotic aspartyl proteases | PF00026 | Eukaryotic aspartyl protease |
|  |  | PF07966 | A1 propeptide |
| AF005 | Cystatins | PF00031 | Cystatin domain |
| AF006 | Cytochromes c | PF00034 | Cytochrome c |
| AF007 | EF-hand domain | PF00036 | EF-hand |
|  |  | PF01023 | S-100/ICaBP type calcium binding domain |
| AF008 | Intermediate filament proteins | PF00038 | Intermediate filament protein |
| AF009 | Globins | PF00042 | Globin |
| AF010 | Glutathione S-transferases, C-terminal domain | PF00043 | Glutathione-S-transferase, C-terminal domain |
| AF011 | Eukaryotic elongation factors 1 | PF00736 | EF-1 guanine nucleotide exchange domain |
| AF012 | Insulin family | PF00049 | Insulin/IGF/relaxin family |
| AF013 | Kazal-type serine protease inhibitors | PF00050 | Kazal-type serine protease inhibitor domain |
|  |  | PF07648 | Kazal-type serine protease inhibitor domain |
| AF014 | Lactate/malate dehydrogenases | PF00056 | Lactate/malate dehydrogenase, NAD binding domain |
|  |  | PF02866 | Lactate/malate dehydrogenase, α/β C-terminal domain |
| AF015 | Lipocalins | PF00061 | Lipocalin/cytosolic fatty acid–binding protein family |
|  |  | PF08212 | Lipocalin-like domain |
| AF016 | C-type lysozyme/α-lactalbumin family | PF00062 | C-type lysozyme/α-lactalbumin family |
| AF017 | Protein kinases | PF00069 | Protein kinase domain |
| AF018 | Serpin serine protease inhibitors | PF00079 | Serpin (serine protease inhibitor) |
| AF019 | Cu/Zn superoxide dismutases | PF00080 | Copper/zinc superoxide dismutase (SODC) |
| AF020 | Fe/Mn superoxide dismutases | PF00081 | Iron/manganese superoxide dismutases, α-hairpin domain |
|  |  | PF02777 | Iron/manganese superoxide dismutases, C-terminal domain |
| AF021 | Subtilisin-like serine proteases | PF00082 | Subtilase family |
|  |  | PF02225 | PA domain |
|  |  | PF05922 | Subtilisin N-terminal Region |
| AF022 | Glutathione-S-transferases, N-terminal domain | PF02798 | Glutathione-S-transferase, N-terminal domain |
| AF023 | Thioredoxins | PF00085 | Thioredoxin |
| AF024 | Trypsin-like serine proteases | PF00051 | Kringle domain |
|  |  | PF00089 | Trypsin |
|  |  | PF00431 | CUB domain |
|  |  | PF00594 | Vitamin K-dependent carboxylation/γ-carboxyglutamic (GLA) domain |
|  |  | PF02983 | α-Lytic protease prodomain |
|  |  | PF09396 | Thrombin light chain |
| AF025 | Tubulin/FtsZ family | PF00091 | Tubulin/FtsZ family, GTPase domain |
|  |  | PF03953 | Tubulin/FtsZ family, C-terminal domain |
| AF027 | Trypsin inhibitor–like domain | PF00093 | von Willebrand factor type C domain |
|  |  | PF00094 | von Willebrand factor type D domain |
|  |  | PF01826 | Trypsin Inhibitor like cysteine rich domain |
|  |  | PF08742 | C8 domain |
| AF028 | Short-chain dehydrogenases | PF00106 | short chain dehydrogenase |
| AF029 | Zn-containing dehydrogenases | PF00107 | Zinc-binding dehydrogenase |
|  |  | PF08240 | Alcohol dehydrogenase GroES-like domain |
| AF030 | Papain-like cysteine proteases | PF00112 | Papain family cysteine protease |
|  |  | PF08246 | Cathepsin propeptide inhibitor domain (I29) |
| AF031 | Enolases | PF00113 | Enolase, C-terminal TIM barrel domain |
|  |  | PF03952 | Enolase, N-terminal domain |
| AF032 | Triosephosphate isomerases | PF00121 | Triosephosphate isomerase |
| AF033 | α-Amylases | PF00128 | α-Amylase, catalytic domain |
|  |  | PF02806 | α-Amylase, C-terminal all-beta domain |
|  |  | PF07821 | α-Amylase C-terminal beta-sheet domain |
|  |  | PF09260 | Domain of unknown function (DUF1966) |
| AF034 | Legume lectins | PF00139 | Legume lectin domain |
| AF035 | Peroxidases | PF00141 | Peroxidase |
| AF036 | Calcineurin-like phosphoesterases | PF00149 | Calcineurin-like phosphoesterase |
|  |  | PF02872 | 5′-nucleotidase, C-terminal domain |
| AF037 | Lipases | PF00151 | Lipase |
|  |  | PF01477 | PLAT/LH2 domain |
| AF038 | Cyclophilins | PF00160 | Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD |
| AF039 | Ribosome inactivating proteins | PF00161 | Ribosome-inactivating protein |
|  |  | PF00652 | Ricin-type β-trefoil lectin domain |
| AF040 | Aldehyde dehydrogenases | PF00171 | Aldehyde dehydrogenase family |
| AF041 | Class I chitinases | PF00182 | Chitinase class I |
| AF042 | Heat shock proteins Hsp90 | PF00183 | Hsp90 protein |
|  |  | PF02518 | Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase |
| AF043 | Hevein-like domain | PF00187 | Chitin recognition protein |
| AF044 | PR-1 proteins | PF00188 | SCP-like extracellular protein |
| AF045 | Cupin superfamily | PF00190 | Cupin |
|  |  | PF04702 | Vicilin N terminal region |
| AF046 | Kunitz soybean trypsin inhibitor family | PF00197 | Trypsin and protease inhibitor |
| AF047 | Catalases | PF00199 | Catalase |
|  |  | PF06628 | Catalase-related |
| AF048 | ATP synthases | PF00213 | ATP synthase δ (OSCP) subunit |
| AF049 | ATP:guanido phosphotransferases | PF00217 | ATP:guanido phosphotransferase, C-terminal catalytic domain |
|  |  | PF02807 | ATP:guanido phosphotransferase, N-terminal domain |
| AF050 | Prolamin superfamily | PF00234 | Protease inhibitor/seed storage/LTP family |
| AF051 | Profilins | PF00235 | Profilin |
| AF052 | Glycoside hydrolase family 32 | PF00251 | Glycosyl hydrolases family 32 N terminal |
|  |  | PF08244 | Glycosyl hydrolases family 32 C terminal |
| AF053 | Flavodoxins | PF00258 | Flavodoxin |
| AF054 | Tropomyosins | PF00261 | Tropomyosin |
| AF055 | Calreticulin family | PF00262 | Calreticulin family |
| AF056 | Serum albumins | PF00273 | Serum albumin family |
| AF057 | Polygalacturonases | PF00295 | Glycosyl hydrolases family 28 |
| AF058 | Ribosomal proteins L3 | PF00297 | Ribosomal protein L3 |
| AF059 | γ-thionin family | PF00304 | γ-Thionin family |
| AF060 | Thaumatin-like proteins | PF00314 | Thaumatin family |
| AF061 | Prolyl oligopeptidase family | PF00326 | Prolyl oligopeptidase family |
| AF062 | Histidine acid phosphatases | PF00328 | Histidine acid phosphatase |
| AF063 | β-1,3-Glucanases | PF00332 | Glycosyl hydrolases family 17 |
| AF064 | X8 domain | PF07983 | X8 domain |
| AF065 | Caseins | PF00363 | Casein |
| AF066 | Haemocyanins | PF00372 | Hemocyanin, copper containing domain |
|  |  | PF03723 | Hemocyanin, ig-like domain |
| AF067 | Multicopper oxidases | PF00394 | Multicopper oxidase |
|  |  | PF07731 | Multicopper oxidase |
|  |  | PF07732 | Multicopper oxidase |
| AF068 | Transferrins | PF00405 | Transferrin |
| AF069 | Bet v 1–related proteins | PF00407 | Pathogenesis-related protein Bet v I family |
| AF070 | 60S acidic ribosomal proteins | PF00428 | 60s Acidic ribosomal protein |
| AF071 | Xylanases | PF00457 | Glycosyl hydrolases family 11 |
| AF072 | Chlorophyll binding proteins | PF00504 | Chlorophyll A-B binding protein |
| AF073 | Pectate lyases | PF00544 | Pectate lyase |
| AF074 | Gelsolin family | PF00626 | Gelsolin repeat |
|  |  | PF02209 | Villin headpiece domain |
| AF075 | SGNH-hydrolase family | PF00657 | GDSL-like lipase/acylhydrolase |
| AF076 | Glycoside hydrolase family 15 | PF00686 | Starch-binding domain |
|  |  | PF00723 | Glycosyl hydrolases family 15 |
| AF077 | Glycoside hydrolase family 18 | PF00704 | Glycosyl hydrolases family 18 |
| AF078 | Chitin-binding peritrophin-A domain | PF01607 | Chitin-binding peritrophin-A domain |
| AF079 | Glycoside hydrolase family 16 | PF00722 | Glycosyl hydrolases family 16 |
| AF080 | Glycoside hydrolase family 20 | PF00728 | Glycosyl hydrolase family 20, catalytic domain |
| AF081 | GMC oxidoreductases | PF00732 | GMC oxidoreductase |
|  |  | PF05199 | GMC oxidoreductase |
| AF082 | Glyoxalase superfamily | PF00903 | Glyoxalase/bleomycin resistance protein/dioxygenase superfamily |
| AF083 | Glycoside hydrolase family 3 | PF00933 | Glycosyl hydrolase family 3 N terminal domain |
|  |  | PF01915 | Glycosyl hydrolase family 3 C terminal domain |
| AF084 | Barwin family | PF00967 | Barwin family |
| AF085 | κ-Caseins | PF00997 | κ-Casein |
| AF086 | Staphylococcal/streptococcal toxins | PF01123 | Staphylococcal/streptococcal toxin, OB-fold domain |
|  |  | PF02876 | Staphylococcal/Streptococcal toxin, β-grasp domain |
| AF087 | Ole e 1–related proteins | PF01190 | Pollen proteins Ole e I family |
| AF088 | Casein kinase 2 regulatory subunit | PF01214 | Casein kinase II regulatory subunit |
| AF089 | Lipid-binding serum glycoproteins | PF01273 | LBP/BPI/CETP family, N-terminal domain |
| AF090 | Oleosins | PF01277 | Oleosin |
| AF091 | Diphtheria toxins | PF01324 | Diphtheria toxin, R domain |
|  |  | PF02763 | Diphtheria toxin, C domain |
|  |  | PF02764 | Diphtheria toxin, T domain |
| AF092 | Lipoproteins | PF01347 | Lipoprotein amino terminal region |
| AF093 | Expansins, C-terminal domain | PF01357 | Pollen allergen |
| AF094 | Expansins, N-terminal domain | PF03330 | Rare lipoprotein A (RlpA)-like double-psi β-barrel |
| AF095 | Melittins | PF01372 | Melittin |
| AF096 | β-Amylases | PF01373 | Glycosyl hydrolase family 14 |
| AF097 | Collagens | PF01391 | Collagen triple helix repeat (20 copies) |
|  |  | PF01410 | Fibrillar collagen C-terminal domain |
| AF098 | Pheromone and odorant binding proteins | PF01395 | PBP/GOBP family |
| AF099 | Berberine bridge enzymes | PF01565 | FAD-binding domain |
|  |  | PF08031 | Berberine and berberine like |
| AF100 | Myosin tail | PF01576 | Myosin tail |
| AF101 | Flavin containing amine oxidoreductases | PF01593 | Flavin containing amine oxidoreductase |
| AF102 | Group 5/6 grass pollen allergens | PF01620 | Ribonuclease (pollen allergen) |
| AF103 | Hyaluronidases | PF01630 | Hyaluronidase |
| AF104 | Patatin family | PF01734 | Patatin-like phospholipase |
| AF105 | Clostridial neurotoxins | PF01742 | Clostridial neurotoxin zinc protease |
|  |  | PF07951 | Clostridium neurotoxin, C-terminal receptor binding |
|  |  | PF07952 | Clostridium neurotoxin, translocation domain |
|  |  | PF07953 | Clostridium neurotoxin, N-terminal receptor binding |
| AF106 | Class 3 lipases | PF01764 | Lipase (class 3) |
|  |  | PF03893 | Lipase 3 N-terminal region |
| AF107 | NAC domain | PF00627 | UBA/TS-N domain |
|  |  | PF01849 | NAC domain |
| AF108 | DJ-1/PfpI family | PF01965 | DJ-1/PfpI family |
| AF109 | Fungalysin metalloproteases | PF02128 | Fungalysin metallopeptidase (M36) |
|  |  | PF07504 | Fungalysin/thermolysin propeptide motif |
| AF110 | Nuclear transport factor 2 | PF02136 | Nuclear transport factor 2 (NTF2) domain |
| AF111 | Group 2 mite allergens | PF02221 | ML domain |
| AF112 | Plastocyanin-like proteins | PF02298 | Plastocyanin-like domain |
| AF113 | Ribonucleases N1 and T1 | PF00545 | ribonuclease |
| AF114 | Animal haem peroxidases | PF03098 | Animal haem peroxidase |
| AF115 | High-molecular-weight glutenins | PF03157 | High-molecular-weight glutenin subunit |
| AF116 | SART-1 family | PF03343 | SART-1 family |
| AF117 | Endonuclease/exonuclease/phosphatase family | PF03372 | Endonuclease/exonuclease/phosphatase family |
| AF118 | Group 5 ragweed allergens | PF03913 | Amb V allergen |
| AF119 | Triabin family | PF03973 | Triabin |
| AF120 | Plant invertase/pectin methylesterase inhibitors | PF04043 | Plant invertase/pectin methylesterase inhibitor |
| AF121 | BCL7 family | PF04714 | BCL7, N-terminal conserver region |
| AF122 | Alliinases | PF04863 | Alliinase EGF-like domain |
|  |  | PF04864 | Allinase |
| AF123 | Isoflavone reductase family | PF05368 | NmrA-like family |
| AF124 | Apovitellenin I | PF05418 | Apovitellenin I (Apo-VLDL-II) |
| AF125 | Rubber elongation factor family | PF05755 | Rubber elongation factor protein (REF) |
| AF126 | Phospholipases A2 | PF05826 | Phospholipase A2 |
| AF127 | Group 1 cockroach allergens | PF06757 | Insect allergen-related repeat |
| AF128 | Proteins of unknown function (DUF1397) | PF07165 | Protein of unknown function (DUF1397) |
| AF129 | Cerato-platanins | PF07249 | Cerato-platanin |
| AF130 | Apolipophorin III | PF07464 | Apolipophorin-III precursor (apoLp-III) |
| AF131 | Redoxins | PF08534 | Redoxin |
| AF132 | Fibrinogen α-chain | PF08702 | Fibrinogen α-chain |
| AF133 | Alginate lyases | PF08787 | Alginate lyase |
| AF134 | Fel d 1 family | PF09252 | Allergen Fel d I-B chain |
| AF135 | Ole e 6 family | PF09253 | Pollen allergen ole e 6 |
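
The mapping in Table E1 can be read as a lookup from Pfam accessions to AllFam families. The following minimal Python sketch illustrates this with a few rows copied from the table. The names `PFAM_TO_ALLFAM` and `allfam_families` are hypothetical and chosen for illustration only; they are not part of the AllFam database or its software.

```python
# Excerpt of Table E1: Pfam accession -> (AllFam ID, AllFam name).
# Only a few rows are included; the names used here are illustrative assumptions.
PFAM_TO_ALLFAM = {
    "PF00061": ("AF015", "Lipocalins"),
    "PF08212": ("AF015", "Lipocalins"),
    "PF00234": ("AF050", "Prolamin superfamily"),
    "PF00235": ("AF051", "Profilins"),
    "PF01357": ("AF093", "Expansins, C-terminal domain"),
    "PF03330": ("AF094", "Expansins, N-terminal domain"),
}


def allfam_families(pfam_hits):
    """Return the sorted, de-duplicated AllFam families for a list of Pfam accessions.

    Accessions that are not in the excerpt above are ignored.
    """
    families = {PFAM_TO_ALLFAM[acc] for acc in pfam_hits if acc in PFAM_TO_ALLFAM}
    return sorted(families)


if __name__ == "__main__":
    # Hypothetical input: a protein matching both expansin domains.
    print(allfam_families(["PF03330", "PF01357"]))
```

Because several Pfam families can belong to one AllFam family (for example, PF00061 and PF08212 both map to AF015), the sketch de-duplicates the result, whereas a protein with domains from two AllFam families is reported under both.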

Table E2. Pfam clans that contain more than 1 family of allergens

| Pfam ID | Protein family name | Allergens |

TIM barrel glycosyl hydrolase superfamily (CL0058)
| PF00704 | Glycoside hydrolase family 18 | 6 |
| PF00128 | α-Amylases | 6 |
| PF00332 | β-1,3-Glucanases | 6 |
| PF01630 | Hyaluronidases | 5 |
| PF00728 | Glycoside hydrolase family 20 | 1 |
| PF00933 | Glycoside hydrolase family 3 | 1 |
| PF01373 | β-Amylases | 1 |
| Total |  | 26 |

Thioredoxin-like (CL0172)
| PF00085 | Thioredoxins | 11 |
| PF08534 | Redoxins | 6 |
| PF02798 | Glutathione-S-transferases, N-terminal domain | 6 |
| Total |  | 23 |

Calycin superfamily (CL0116)
| PF00061, PF08212 | Lipocalins | 21 |
| PF03973 | Triabin family | 1 |
| Total |  | 22 |

Pectate lyase-like β-helix (CL0268)
| PF00544 | Pectate lyases | 9 |
| PF00295 | Polygalacturonases | 8 |
| Total |  | 17 |

Double Psi β-barrel glucanase (CL0199)
| PF03330 | Expansins, N-terminal domain | 12 |
| PF07249 | Cerato-platanins | 2 |
| PF00967 | Barwin family | 1 |
| Total |  | 15 |

β-Trefoil superfamily (CL0066)
| PF00652 | Ribosome-inactivating proteins | 8 |
| PF00197 | Kunitz soybean trypsin inhibitor family | 4 |
| PF07951 | Clostridial neurotoxins | 1 |
| Total |  | 13 |

α/β hydrolase fold (CL0028)
| PF00151 | Lipases | 8 |
| PF00326 | Prolyl oligopeptidase family | 3 |
| PF01764 | Class 3 lipases | 1 |
| Total |  | 12 |

Lysozyme-like superfamily (CL0037)
| PF00182 | Class I chitinases | 6 |
| PF00062 | C-type lysozyme/α-lactalbumin family | 4 |
| Total |  | 10 |

FAD/NAD(P)-binding Rossmann fold superfamily (CL0063)
| PF05368 | Isoflavone reductase family | 3 |
| PF00106 | Short-chain dehydrogenases | 2 |
| PF00056 | Lactate/malate dehydrogenases | 1 |
| PF00107 | Zn-containing dehydrogenases | 1 |
| PF00732 | GMC oxidoreductases | 1 |
| PF01593 | Flavin containing amine oxidoreductases | 1 |
| Total |  | 9 |

Concanavalin-like lectin/glucanase superfamily (CL0004)
| PF00139 | Legume lectins | 3 |
| PF00722 | Glycoside hydrolase family 16 | 2 |
| PF00457 | Xylanases | 1 |
| Total |  | 6 |

Multicopper oxidase-like domain (CL0026)
| PF02298 | Plastocyanin-like proteins | 1 |
| PF00394, PF07731, PF07732 | Multicopper oxidases | 1 |
| Total |  | 2 |

Peptidase clan MA (CL0126)
| PF01742 | Clostridial neurotoxins | 1 |
| PF02128 | Fungalysin metalloproteases | 1 |
| Total |  | 2 |

Table E3. SCOP folds that contain more than 1 family of allergens

| SCOP class | SCOP fold | SCOP superfamily | SCOP family | AllFam family |
| a: All α proteins | a.39: EF Hand-like | a.39.1: EF-hand | a.39.1.2: S100 proteins | AF007: EF-hand domain |
|  |  |  | a.39.1.4: Parvalbumin |  |
|  |  |  | a.39.1.5: Calmodulin-like |  |
|  |  |  | a.39.1.10: Polcalcin |  |
|  |  | a.39.2: Insect pheromone/odorant-binding proteins | a.39.2.1: Insect pheromone/odorant-binding proteins | AF098: Pheromone and odorant binding proteins |
|  | a.52: Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin | a.52.1: Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin | a.52.1.1: Plant lipid-transfer and hydrophobic proteins | AF050: Prolamin superfamily |
|  |  |  | a.52.1.2: Proteinase/α-amylase inhibitors |  |
|  |  |  | a.52.1.3: Seed storage protein, 2S albumin |  |
|  | a.93: Heme-dependent peroxidases | a.93.1: Heme-dependent peroxidases | a.93.1.1: CCP-like | AF035: Peroxidases |
|  |  |  | a.93.1.2: Myeloperoxidase-like | AF114: Animal haem peroxidases |
| b: All β proteins | b.1: Immunoglobulin-like β-sandwich | b.1.8: Cu, Zn superoxide dismutase-like | b.1.8.1: Cu, Zn superoxide dismutase-like | AF019: Cu/Zn Superoxide dismutases |
|  |  | b.1.18: E set domains | b.1.18.7: ML domain | AF111: Group 2 mite allergens |
|  |  |  | b.1.18.3: Arthropod hemocyanin, C-terminal domain | AF066: Haemocyanins (C-terminal domain) |
|  | b.6: Cupredoxin-like | b.6.1: Cupredoxins | b.6.1.1: Plastocyanin/azurin-like | AF112: Plastocyanin-like proteins |
|  |  |  | b.6.1.3: Multidomain cupredoxins | AF067: Multicopper oxidases |
|  | b.29: Concanavalin A-like lectins/glucanases | b.29.1: Concanavalin A-like lectins/glucanases | b.29.1.1: Legume lectins | AF034: Legume lectins |
|  |  |  | b.29.1.2: Glycosyl hydrolases family 16 | AF079: Glycoside hydrolase family 16 |
|  |  |  | b.29.1.6: Clostridium neurotoxins, the second last domain | AF105: Clostridial neurotoxins (1st receptor binding domain) |
|  |  |  | b.29.1.11: Xylanase/endoglucanase 11/12 | AF071: Xylanases |
|  |  |  | b.29.1.12: Calnexin/calreticulin | AF055: Calreticulin family (N-terminal domain) |
|  |  |  | b.29.1.18: Alginate lyase | AF133: Alginate lyases |
|  |  |  | b.29.1.19: Glycosyl hydrolases family 32 C-terminal domain | AF052: Glycoside hydrolase family 32 (C-terminal domain) |
|  | b.42: β-Trefoil | b.42.2: Ricin B-like lectins | b.42.2.1: Ricin B-like | AF039: Ribosome-inactivating proteins (β-trefoil domain) |
|  |  | b.42.4: STI-like | b.42.4.1: Kunitz (STI) inhibitors | AF046: Kunitz soybean trypsin inhibitor family |
|  |  |  | b.42.4.2: Clostridium neurotoxins, C-terminal domain | AF105: Clostridial neurotoxins (2nd receptor binding domain) |
|  | b.52: Double psi β-barrel | b.52.1: Barwin-like endoglucanases | b.52.1.2: Barwin | AF084: Barwin family |
|  |  |  | b.52.1.3: Pollen allergen PHL P 1 N-terminal domain | AF094: Expansins, N-terminal domain |
|  | b.60: Lipocalins | b.60.1: Lipocalins | b.60.1.1: Retinol binding protein-like | AF015: Lipocalins |
|  |  |  | b.60.1.3: Thrombin inhibitor | AF119: Triabin family |
|  | b.80: Single-stranded right-handed β-helix | b.80.1: Pectin lyase-like | b.80.1.1: Pectate lyase-like | AF073: Pectate lyases |
|  |  |  | b.80.1.3: Galacturonase | AF057: Polygalacturonases |
| c: α and β proteins (a/b) | c.1: TIM β/α-barrel | c.1.1: Triosephosphate isomerase (TIM) | c.1.1.1: Triosephosphate isomerase (TIM) | AF032: Triosephosphate isomerases |
|  |  | c.1.8: (Trans)glycosidases | c.1.8.1: Amylase, catalytic domain | AF033: α-Amylases (catalytic domain) |
|  |  |  |  | AF096: β-Amylases |
|  |  |  | c.1.8.3: β-glycanases | AF063: β-1,3-Glucanases |
|  |  |  | c.1.8.5: Type II chitinase | AF077: Glycoside hydrolase family 18 |
|  |  |  | c.1.8.6: β-N-acetylhexosaminidase catalytic domain | AF080: Glycoside hydrolase family 20 |
|  |  |  | c.1.8.7: NagZ-like | AF083: Glycoside hydrolase family 3 (N-terminal domain) |
|  |  |  | c.1.8.9: Bee venom hyaluronidase | AF103: Hyaluronidases |
|  |  | c.1.11: Enolase C-terminal domain-like | c.1.11.1: Enolase | AF031: Enolases (C-terminal domain) |
|  | c.2: NAD(P)-binding Rossmann-fold domains | c.2.1: NAD(P)-binding Rossmann-fold domains | c.2.1.1: Alcohol dehydrogenase-like, C-terminal domain | AF029: Zn-containing dehydrogenases (C-terminal domain) |
|  |  |  | c.2.1.2: Tyrosine-dependent oxidoreductases | AF123: Isoflavone reductase family |
|  |  |  |  | AF028: Short-chain dehydrogenases |
|  |  |  | c.2.1.5: LDH N-terminal domain-like | AF014: Lactate/malate dehydrogenases (N-terminal domain) |
|  | c.3: FAD/NAD(P)-binding domain | c.3.1: FAD/NAD(P)-binding domain | c.3.1.2: FAD-linked reductases, N-terminal domain | AF101: Flavin containing amine oxidoreductases (N-terminal domain) |
|  |  |  |  | AF081: GMC oxidoreductases (N-terminal domain) |
|  | c.23: Flavodoxin-like | c.23.5: Flavoproteins | c.23.5.1: Flavodoxin-related | AF053: Flavodoxins |
|  |  | c.23.16: Class I glutamine amidotransferase-like | c.23.16.2: DJ-1/PfpI | AF108: DJ-1/PfpI family |
|  |  | c.23.11: β-D-glucan exohydrolase, C-terminal domain | c.23.11.1: β-D-glucan exohydrolase, C-terminal domain | AF083: Glycoside hydrolase family 3 (C-terminal domain) |
|  | c.47: Thioredoxin fold | c.47.1: Thioredoxin-like | c.47.1.1: Thioltransferase | AF023: Thioredoxins |
|  |  |  | c.47.1.5: Glutathione S-transferase (GST), N-terminal domain | AF022: Glutathione S-transferases, N-terminal domain |
|  |  |  | c.47.1.10: Glutathione peroxidase-like | AF131: Redoxins |
|  | c.69: α/β-Hydrolases | c.69.1: α/β-Hydrolases | c.69.1.4: Prolyl oligopeptidase, C-terminal domain | AF061: Prolyl oligopeptidase family (C-terminal domain) |
|  |  |  | c.69.1.17: Fungal lipases | AF106: Class 3 lipases |
|  |  |  | c.69.1.19: Pancreatic lipase, N-terminal domain | AF037: Lipases |
| d: α and β proteins (a+b) | d.2: Lysozyme-like | d.2.1: Lysozyme-like | d.2.1.1: Family 19 glycosidase | AF041: Class I chitinases |
|  |  |  | d.2.1.2: C-type lysozyme | AF016: C-type lysozyme/α-lactalbumin family |
|  | d.16: FAD-linked reductases, C-terminal domain | d.16.1: FAD-linked reductases, C-terminal domain | d.16.1.1: GMC oxidoreductases | AF081: GMC oxidoreductases (C-terminal domain) |
|  |  |  | d.16.1.5: L-amino acid/polyamine oxidase | AF101: Flavin containing amine oxidoreductases (C-terminal domain) |
|  | d.17: Cystatin-like | d.17.1: Cystatin/monellin | d.17.1.2: Cystatins | AF005: Cystatins |
|  |  | d.17.4: NTF2-like | d.17.4.2: NTF2-like | AF110: Nuclear transport factor 2 |
| g: Small proteins | g.3: Knottins (small inhibitors, toxins, lectins) | g.3.1: Plant lectins/antimicrobial peptides | g.3.1.1: Hevein-like agglutinin (lectin) domain | AF043: Hevein-like domain |
|  |  | g.3.7: Scorpion toxin-like | g.3.7.5: Plant defensins | AF059: γ-thionin family |
| h: Coiled coil proteins | h.1: Parallel coiled-coil | h.1.5: Tropomyosin | h.1.5.1: Tropomyosin | AF054: Tropomyosins |
|  |  | h.1.8: Fibrinogen coiled-coil and central regions | h.1.8.1: Fibrinogen coiled-coil and central regions | AF132: Fibrinogen α-chain |
|  |  | h.1.20: Intermediate filament protein, coiled coil region | h.1.20.1: Intermediate filament protein, coiled coil region | AF008: Intermediate filament proteins |
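
The SCOP identifiers in Table E3 are hierarchical: a family identifier such as b.42.4.1 contains its class (b), fold (b.42), and superfamily (b.42.4) as prefixes. The following Python sketch, given only as an illustration, splits such identifiers and groups AllFam families by fold for a few rows copied from the table. The names `scop_levels`, `SCOP_TO_ALLFAM`, and `allfam_by_fold` are hypothetical and not taken from SCOP or AllFam software.

```python
from collections import defaultdict


def scop_levels(sccs):
    """Split a family-level SCOP identifier such as 'b.42.4.1' into its four levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a family-level identifier, got {sccs!r}")
    return {
        "class": parts[0],
        "fold": ".".join(parts[:2]),
        "superfamily": ".".join(parts[:3]),
        "family": sccs,
    }


# Excerpt of Table E3: SCOP family -> AllFam family ID (illustrative subset).
SCOP_TO_ALLFAM = {
    "b.42.2.1": "AF039",
    "b.42.4.1": "AF046",
    "b.42.4.2": "AF105",
    "b.60.1.1": "AF015",
    "b.60.1.3": "AF119",
    "h.1.5.1": "AF054",
}


def allfam_by_fold(scop_to_allfam):
    """Group AllFam family IDs by the SCOP fold of their assigned SCOP family."""
    folds = defaultdict(set)
    for sccs, allfam_id in scop_to_allfam.items():
        folds[scop_levels(sccs)["fold"]].add(allfam_id)
    return {fold: sorted(ids) for fold, ids in folds.items()}


if __name__ == "__main__":
    for fold, ids in allfam_by_fold(SCOP_TO_ALLFAM).items():
        print(fold, ids)
```

Folds for which the grouping returns more than one AllFam family in the complete data correspond to the folds listed in this table.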

 Supported by grant SFB-F01802 from the Austrian Science Fund (to H.B.) and an Austrian Programme for Advanced Research and Technology grant from the Austrian Academy of Sciences (to S.W.).

 Disclosure of potential conflict of interest: A. Mari is the responsible administrative contact for Allergy Data Laboratories. The rest of the authors have declared that they have no conflict of interest.

PII: S0091-6749(08)00163-2

doi:10.1016/j.jaci.2008.01.025
