File Formats Recognised by Predivac

  1. Protein sequence
  2. Peptide list
  3. Epitope dataset

Protein Sequence

Only sequences in FASTA format are accepted: the input must have a valid FASTA header line and contain exactly one sequence. Only the standard amino acids are accepted.

There must be no space between the start of the header line ('>') and the sequence ID.

The following is an example of a sequence in valid FASTA format:

>my_protein1
MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

The following are all INVALID FASTA files.

>my_protein2
MXITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

Contains a non-standard amino acid: the second amino acid is 'X'.


> my_protein3
MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

Space between start of header line and sequence ID.


>
MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

Missing sequence ID.


MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

Missing header line.


>my_protein1
MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN
>my_protein1
MPITINNFRYSDPVNNDTIIMMEPPYCKGLDIYYKAFKITDRIWIVPERYEFGTKPEDFN
PPSSLIEGASEYYDPNYLRTDSDKDRFLQTMVKLFNRIKNNVAGEALLDKIINAIPYLGN
SYSLLDKFDTNSNSVSFNLLEQDPSGATTKSAMLTNLIIFGPGPVLNKNEVRGIVLRVDN

Contains multiple sequences.
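
The rules in this section can be sketched as a small check in Python. This is an illustrative sketch only, not Predivac's own validation code: the function name validate_protein_fasta and the wording of its messages are hypothetical, and it assumes that the standard amino acids are the 20 upper-case one-letter codes (this page does not say whether lower case is accepted).

```python
# Assumption: "standard amino acids" means these 20 upper-case one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_protein_fasta(text):
    """Return a list of problems found; an empty list means the input passed."""
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return ["Missing header line."]

    problems = []
    header = lines[0]
    if header == ">":
        problems.append("Missing sequence ID.")
    elif header[1].isspace():
        problems.append("Space between start of header line and sequence ID.")

    # Any further header line means more than one sequence was supplied.
    if any(line.startswith(">") for line in lines[1:]):
        problems.append("Contains multiple sequences.")

    residues = "".join(line for line in lines[1:] if not line.startswith(">"))
    if not residues:
        problems.append("Missing sequence.")
    for position, residue in enumerate(residues, start=1):
        if residue not in STANDARD_AMINO_ACIDS:
            # Report only the first offending residue.
            problems.append(
                f"Non-standard amino acid {residue!r} at position {position}."
            )
            break
    return problems
```

Calling validate_protein_fasta on the valid example above returns an empty list, while each invalid example returns a message matching its caption.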

Peptide List

A peptide list may be given in FASTA format or as a simple list. Either every sequence must have a valid header line, or no sequence may have one.

FASTA Format

>p1
RHYLHTLWKAGILYK
>p2
CADARMYGVLPWNAFPGKVC
>p3
LYGALLLAEGFYTTGAVRQI
>p4
KPVSQMRMATPLLMRPM

Simple List

RHYLHTLWKAGILYK
CADARMYGVLPWNAFPGKVC
LYGALLLAEGFYTTGAVRQI
KPVSQMRMATPLLMRPM
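
The all-or-none header rule can be sketched in Python as follows. This is a hedged illustration, not Predivac's parser: the name parse_peptide_list is hypothetical, and the sketch assumes each peptide occupies a single line, as in the two examples above.

```python
def parse_peptide_list(text):
    """Return (identifier, peptide) pairs; identifier is None for a simple list."""
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header_count = sum(line.startswith(">") for line in lines)

    if header_count == 0:
        # Simple list: one peptide per line, no identifiers.
        return [(None, line) for line in lines]

    # FASTA: assumed to alternate strictly between header and peptide lines.
    headers, peptides = lines[0::2], lines[1::2]
    well_formed = (
        len(headers) == len(peptides)
        and all(h.startswith(">") for h in headers)
        and not any(p.startswith(">") for p in peptides)
    )
    if not well_formed:
        raise ValueError(
            "Either every peptide needs a header line or none may have one."
        )
    return [(h[1:], p) for h, p in zip(headers, peptides)]
```

A list that mixes the two styles, for example a header on the first peptide only, raises ValueError instead of being silently accepted.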

Epitope Dataset

gag171,DRB1*01:01,DRB1*15:01,DRB1*04:01,DRB1*04:05,DRB1*13:02,DRB1*07:01,DRB1*09:01,DRB5*01:01,DRB4*01:01
gag294,DRB1*01:01,DRB1*15:01,DRB1*04:05,DRB1*11:01,DRB1*13:02,DRB1*07:01,DRB1*08:02,DRB5*01:01,DRB4*01:01
gag298,DRB1*01:01,DRB1*15:01,DRB1*03:01,DRB1*04:01,DRB1*04:05,DRB1*11:01,DRB1*12:01,DRB1*13:02,DRB1*07:01,DRB1*08:02,DRB1*09:01,DRB5*01:01,DRB4*01:01
pol303,DRB1*01:01,DRB1*15:01,DRB1*03:01,DRB1*04:05,DRB1*07:01,DRB1*09:01
pol335,DRB1*01:01,DRB1*15:01,DRB1*04:05,DRB1*13:02,DRB1*07:01,DRB1*09:01
pol596,DRB1*01:01,DRB1*15:01,DRB1*04:01,DRB1*04:05,DRB1*11:01,DRB1*13:02,DRB1*07:01,DRB1*08:02,DRB1*09:01,DRB5*01:01
pol711,DRB1*01:01,DRB1*15:01,DRB1*04:01,DRB1*04:05,DRB1*11:01,DRB1*13:02,DRB1*07:01,DRB1*08:02,DRB1*09:01,DRB5*01:01
pol712,DRB1*01:01,DRB1*15:01,DRB1*04:01,DRB1*04:05,DRB1*11:01,DRB1*07:01,DRB1*08:02,DRB1*09:01,DRB5*01:01,DRB4*01:01
pol758,DRB1*01:01,DRB1*04:01,DRB1*04:05,DRB1*11:01,DRB1*07:01,DRB5*01:01
pol915,DRB1*01:01,DRB1*04:05,DRB1*11:01,DRB1*13:02
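
This section does not describe the format in words, so the following reading is inferred from the sample lines alone: each line appears to hold an epitope identifier followed by a comma-separated list of HLA allele names. Under that assumption, a minimal Python sketch for loading such a file might look like this (parse_epitope_dataset is a hypothetical name, not part of Predivac):

```python
def parse_epitope_dataset(text):
    """Map each epitope identifier to the list of alleles on its line."""
    dataset = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if not fields[0]:
            continue  # skip blank lines
        # Assumption: the first field is the identifier, the rest are alleles.
        identifier, alleles = fields[0], fields[1:]
        dataset[identifier] = alleles
    return dataset


sample = "pol915,DRB1*01:01,DRB1*04:05,DRB1*11:01,DRB1*13:02"
print(parse_epitope_dataset(sample))
```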