Epitope-based vaccines are short, antigen-derived peptides (corresponding to T-cell epitopes) that are administered so that they are presented to T-cells in association with major histocompatibility complex (MHC) proteins. Peptide vaccination based on multiple T-cell epitopes can be used for the rational design of immunogens targeting well-defined ethnic populations. It also offers several potential benefits over traditional vaccines: precise control over the activation of the immune response, a focus on the most relevant antigen regions (conserved and immunodominant), and advantages in production and biosafety. CD4+ T-cell epitopes play a key role in epitope-based vaccine (EV) design, as the cognate help provided by these cells is essential for generating vigorous humoral and cytotoxic CD8+ T-cell responses. However, because the response to T-cell epitopes is restricted by HLA proteins, the HLA specificity of T-cell epitopes is a major consideration for EV design, where the aim is to induce broad immune responses in genetically diverse human populations. Two factors cause major problems in EV design: i) MHC class II alleles are expressed at dramatically different frequencies in different ethnicities, and ii) MHC class II genes are among the most polymorphic in the human genome (see Figure 1). As different allotypes (proteins) have different epitope repertoires (restriction), individuals are likely to react to different sets of peptides from a given pathogen. Because experimental screening of large sets of peptides is time-consuming and costly, in silico methods that can handle MHC class II polymorphism and map CD4+ T-cell epitopes on protein antigens are of paramount importance for EV development.



Figure 1. HLA class II polymorphism and three-dimensional structure. (A) Ribbon diagram
of the binding groove. Chain α, grey; chain β, red. Each chain consists of two domains
(α1, α2 and β1, β2, respectively). Only the ectodomains α1 and β1 shape the binding groove.
(B) As (A), but showing the entire molecule in a different orientation. (C) Three-dimensional representation of the
sequence variability in the binding groove, highlighting the location of specificity-determining residues (SDRs) in the β1 domain.
Variability is represented in a two-colour scale from blue to red, where blue indicates
non-variable positions and red indicates variable positions.

T-cell Epitope Mapping

Predivac is a pan-specific method based on the specificity-determining residue (SDR) concept; it covers 95% of human MHC (HLA) class II protein variants at the DR locus (Oyarzun et al., 2013). SDRs are a small set of structurally conserved positions in the peptide-binding interface that are responsible for specific recognition events. The approach was first described by our group for predicting the substrate specificity of protein kinases (Ellis and Kobe, 2011; Kobe and Boden, 2012; Saunders et al., 2008; Saunders and Kobe, 2008). Predivac predicts HLA class II peptide binding by establishing a correlation between the SDRs in the query HLA class II protein and the SDRs of HLA proteins of known specificity. The process involves the following steps: i) SDRs for each binding position are identified in the query HLA class II protein sequence; ii) PredivacDB, a purpose-built database of SDRs and high-affinity binding data (IC50 ≤ 50 nM), is queried, and amino acid frequencies and weights are calculated at each binding position from the peptide sequences associated with allotypes whose SDRs are similar to those of the query protein; and iii) a position-specific scoring matrix (PSSM) is built from these binding data. T-cell epitope mapping is carried out by parsing the query protein sequences into overlapping nonameric segments (peptides), each of which is assigned a binding score using the PSSM (sliding window technique). The output of Predivac is a relative score between 0 and 100. For a given HLA class II allotype (protein), a peptide is a better candidate to be an MHC class II high-affinity binder, and therefore a CD4+ T-cell epitope, if it scores higher than another peptide from the same protein.
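The PSSM and sliding-window steps can be illustrated with a minimal Python sketch. It is not Predivac's implementation: the log-odds weighting, the uniform background, the pseudocount and the min-max rescaling to 0-100 are all assumptions made for illustration, and the function names and the binder peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 9  # length of the nonameric segment scored at each step


def build_pssm(binders, pseudocount=1.0):
    """Build a log-odds PSSM from 9-mer binders (assumed weighting scheme)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(WINDOW):
        counts = Counter(peptide[pos] for peptide in binders)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_windows(protein, pssm):
    """Score every overlapping nonamer; return (start, peptide, score 0-100).

    Assumes the protein contains only the 20 standard residues and that the
    PSSM was built from at least one binder.
    """
    lowest = sum(min(column.values()) for column in pssm)
    highest = sum(max(column.values()) for column in pssm)
    results = []
    for start in range(len(protein) - WINDOW + 1):
        peptide = protein[start:start + WINDOW]
        raw = sum(pssm[i][aa] for i, aa in enumerate(peptide))
        # Assumption: rescale the raw score to the range the matrix allows.
        relative = 100.0 * (raw - lowest) / (highest - lowest)
        results.append((start, peptide, relative))
    return results


if __name__ == "__main__":
    # Hypothetical binders standing in for peptides retrieved from a database.
    toy_binders = ["YVKQNTLKL", "FVKQNAAAL", "YLKANTLKA"]
    matrix = build_pssm(toy_binders)
    ranked = sorted(score_windows("MSYVKQNTLKLATGG", matrix),
                    key=lambda hit: hit[2], reverse=True)
    for start, peptide, score in ranked[:3]:
        print(start, peptide, round(score, 1))
```

Sorting the windows by score mirrors how the relative score is meant to be read: it ranks peptides from the same protein against one another for one allotype and is not an absolute affinity.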


A specific problem that Predivac addresses is immunodominance, i.e., the restricted responsiveness of T-cells to a few selected epitopes from complex antigens. As a consequence of this property, most of the immune response in protein-based vaccination is mounted against a few (dominant) epitopes, despite the presence of many potential epitopes within a given antigen (immunogen). Vaccine formulations built on epitopes that do not dominate the immune response will not induce effective protection in the vaccinated organism; it is therefore important to identify immunodominant epitopes. Mounting evidence suggests that the kinetic stability of the peptide:MHC complex plays a central role in controlling the immunogenicity of MHC class II peptides. Predivac was consequently developed using high-affinity binding data, on the assumption that this positive bias captures underlying peptide features that correlate with promiscuity and immunodominance, two properties that are fundamental for EV design. Our results tend to confirm this assumption, as Predivac performed better than other pan-specific methods on a benchmark of immunodominant CD4+ T-cell epitopes, reaching 92% accuracy and the highest specificity (Oyarzun et al., 2013). The epitopes and their respective source protein sequences used in the benchmark are listed here:

CD4+ T-cell immunodominant epitopes

The ability of Predivac to identify promiscuous and immunodominant regions in antigens has been successfully tested using CD4+ T-cell epitope maps of the HIV Gag polyprotein, available in the Los Alamos HIV Molecular Immunology Database.

Epitope-based Vaccine Design

Predivac integrates CD4+ T-cell epitope prediction with population coverage calculation and epitope selection algorithms in order to identify putative epitopes and to determine the population coverage potentially afforded by a vaccine based on these peptides. The fraction of individuals in a given target population that would potentially be covered by the selected epitopes is determined with a previously reported algorithm (Bui et al., 2006). HLA class II allele frequency data for human populations are retrieved from the Allele Frequency Net Database (AFND) (Gonzalez-Galarza et al., 2011), which is the most comprehensive repository of immune gene frequencies of worldwide populations. HLA class II polymorphism is handled by favouring the selection of broadly recognized (promiscuous) and immunodominant CD4+ T-cell epitopes, which have the potential to trigger a broad CD4+ T-cell response in a high proportion of individuals expressing distinct HLA molecules (Figure 2). As the method accounts comprehensively for human ethnic diversity, it is particularly suited as a tool to aid EV design for virus-related emerging infectious diseases (EIDs), because the geographic distributions of the viruses are well defined and the ethnic background of the target population in need of vaccination can be determined. The predicted epitopes are suitable candidates for experimental testing, as they hold the potential to provide cognate help in vaccination settings in these particular geographic regions.
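The idea behind the population coverage calculation can be shown with a simplified, single-locus sketch in Python. It assumes Hardy-Weinberg proportions and counts an individual as covered when at least one of their two alleles restricts at least one selected epitope; the published algorithm (Bui et al., 2006) is more general, and the function name, allele frequencies and restriction sets below are hypothetical values chosen for illustration, not AFND data.

```python
def population_coverage(allele_freqs, epitope_restrictions):
    """Fraction of individuals carrying at least one covered allele.

    allele_freqs: allele name -> frequency in the target population.
    epitope_restrictions: epitope -> set of alleles predicted to present it.
    Assumes a single locus in Hardy-Weinberg equilibrium (simplification).
    """
    covered_alleles = set()
    for alleles in epitope_restrictions.values():
        covered_alleles |= set(alleles)
    covered_freq = sum(
        freq for allele, freq in allele_freqs.items()
        if allele in covered_alleles
    )
    covered_freq = min(covered_freq, 1.0)
    # An individual is missed only if both chromosomes carry uncovered alleles.
    return 1.0 - (1.0 - covered_freq) ** 2


if __name__ == "__main__":
    # Illustrative frequencies only; real values would come from AFND.
    freqs = {"DRB1*01:01": 0.10, "DRB1*03:01": 0.20, "DRB1*07:01": 0.30}
    restrictions = {
        "peptide_1": {"DRB1*01:01"},
        "peptide_2": {"DRB1*01:01", "DRB1*03:01"},
    }
    print(population_coverage(freqs, restrictions))
```

With these toy numbers the covered alleles have a combined frequency of 0.30, so about half of the population carries at least one of them. This is why promiscuous epitopes, which bind several allotypes, raise coverage quickly.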



Figure 2. Scheme illustrating the steps followed by Predivac to select promiscuous
CD4+ T-cell epitopes for EV design. Upon the user setting the target geographic
region, the program retrieves from the AFND all the HLA class II allele frequency
data available for population samples occurring in this region and searches the
input proteins for promiscuous epitopes restricted to those alleles.
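
One way to picture the selection of promiscuous epitopes for a target region is a greedy search, sketched below in Python. The greedy rule (pick the peptide that adds the largest uncovered allele frequency at each step) is an assumption for illustration only; the source does not specify how Predivac ranks or combines epitopes, and the function name and toy data are hypothetical.

```python
def select_promiscuous_epitopes(restrictions, allele_freqs, max_epitopes=3):
    """Greedily choose epitopes that together cover the most allele frequency.

    restrictions: peptide -> set of alleles predicted to present it.
    allele_freqs: allele -> frequency in the target region's populations.
    Returns the chosen peptides (in order) and the set of covered alleles.
    """
    covered = set()
    chosen = []
    while len(chosen) < max_epitopes:
        # Frequency each remaining peptide would add beyond current coverage.
        gains = {
            peptide: sum(allele_freqs.get(a, 0.0) for a in alleles - covered)
            for peptide, alleles in restrictions.items()
            if peptide not in chosen
        }
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0.0:
            break  # no remaining peptide adds new alleles
        chosen.append(best)
        covered |= restrictions[best]
    return chosen, covered


if __name__ == "__main__":
    # Illustrative values only, not AFND data or real predictions.
    freqs = {"DRB1*01:01": 0.10, "DRB1*03:01": 0.20, "DRB1*07:01": 0.25}
    predicted = {
        "peptide_1": {"DRB1*01:01", "DRB1*03:01"},
        "peptide_2": {"DRB1*03:01"},
        "peptide_3": {"DRB1*07:01"},
    }
    print(select_promiscuous_epitopes(predicted, freqs))
```

In this toy case the peptide restricted by two alleles is picked first, and the peptide whose only allele is already covered is never selected.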


* Oyarzun P, Ellis JJ, Boden M and Kobe B. (2013) PREDIVAC: CD4+ T-cell epitope prediction for vaccine design that covers 95% of HLA class II protein diversity. BMC Bioinformatics, 14:52. doi:10.1186/1471-2105-14-52

* Kobe B and Boden M. (2012) Computational modelling of linear motif-mediated protein interactions. Curr Top Med Chem, 12(14):1553-61.

* Ellis JJ and Kobe B. (2011) Predicting protein kinase specificity: Predikin update and performance in the DREAM4 challenge. PLoS One, 6:e21169.

* Saunders NF, et al. (2008) Predikin and PredikinDB: a computational framework for the prediction of protein kinase peptide specificity and an associated database of phosphorylation sites. BMC Bioinformatics, 9:245.

* Saunders NF and Kobe B. (2008) The Predikin webserver: improved prediction of protein kinase peptide specificity using structural information. Nucleic Acids Res, 36:W286-290.

* Bui HH, Sidney J, Dinh K, Southwood S, Newman MJ, et al. (2006) Predicting population coverage of T-cell epitope-based diagnostics and vaccines. BMC Bioinformatics, 7:153.

* Gonzalez-Galarza FF, Christmas S, Middleton D and Jones AR. (2011) Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Res, 39:D913-919.