A comprehensive system for consistent numbering of HCV sequences, proteins and epitopes

Carla Kuiken, Christophe Combet, Jens Bukh, Tadasu Shin-I, Gilbert Deleage, Masashi Mizokami, Russell Richardson, Erwin Sablon, Karina Yusim, Jean Michel Pawlotsky, Peter Simmonds, Bette Korber, Werner Abfalterer, Charles Calef, Brian Foley, Robert Funkhouser, Brian Gaschen, Dorothy Lang, Thomas Leitner, James Szinger, Ming Zhang

Research output: Contribution to journal › Review article › peer-review

97 Scopus citations


This numbering proposal, using the AF009606 (isolate H77) sequence as a reference, should be able to unequivocally number all possible mutations in HCV, both natural and manmade. The HCV sequence databases [8] and the Los Alamos HCV immunology database [9] (as well as the Los Alamos HIV database) number positions and epitopes according to this system. Moreover, the databases' websites provide tools for finding stretches of sequence by their numbers, for assigning start and end coordinates to a sequence, and for converting between the various numbering systems. HCV nucleotide sequences are numbered by analogy to H77. The first step is to align your sequence to H77. If there is no length variation, the numbering is straightforward: nucleotide numbers run from 1 (start of the 5′ UTR) to 9646 (end of the 3′ UTR). Insertions relative to H77 are labeled with letters. Protein numbering works like nucleotide numbering, but starts at the start of the polyprotein. The sequence databases will support both systems, but use polyprotein numbering as a basis. Absolute numbering runs across the coding regions, whereas relative numbering starts over at every coding region. Relative numbering is used almost exclusively for proteins; polyprotein numbering is used mainly in immunology, and protein numbering in drug resistance research. The Los Alamos immunology database uses polyprotein numbering. The 5′ UTR numbering starts at 1 and ends at 341; the Core cds starts at 342. The numbering of the 3′ UTR starts at 9378 (after the stop codon), but complications arise from the variable length of the PPT. The 3′ UTR consists of three elements: a variable 5′ region, the PPT, and a conserved 3′ region, often called X. The first region is numbered 9378-9410. The PPT consists almost entirely of T's and therefore cannot be meaningfully aligned; it is numbered according to its length in H77, 9411-9545. The X region starts at 9546 (regardless of its actual location, which depends on the length of the PPT) and ends at 9646.
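The numbering rules described above can be illustrated with a minimal Python sketch. It is not the databases' own tooling: the function names (`region_of`, `polyprotein_position`, `number_aligned`) are hypothetical, the alignment to H77 is assumed to be supplied as two gapped rows of equal length, and the way insertion letters are attached (`3a`, `3b`, ...) is an assumption about the labeling format. The coordinates are the ones quoted in the abstract; the end of the coding region (9377) is inferred from the 3′ UTR starting at 9378.

```python
from string import ascii_lowercase

# H77 (AF009606) nucleotide coordinates quoted in the abstract.
REGIONS = [
    ("5' UTR", 1, 341),
    ("coding region", 342, 9377),  # inferred: ends just before the 3' UTR
    ("3' UTR variable region", 9378, 9410),
    ("PPT", 9411, 9545),           # numbered by its length in H77
    ("X region", 9546, 9646),      # always starts at 9546
]


def region_of(nt_pos: int) -> str:
    """Return the name of the region containing an H77 nucleotide number."""
    for name, start, end in REGIONS:
        if start <= nt_pos <= end:
            return name
    raise ValueError(f"{nt_pos} is outside the H77 range 1-9646")


def polyprotein_position(nt_pos: int) -> int:
    """Map an H77 nucleotide number to a polyprotein codon number.

    Assumes codon 1 starts at nucleotide 342 (start of the Core cds);
    the last codon in the range is the stop codon.
    """
    if not 342 <= nt_pos <= 9377:
        raise ValueError(f"{nt_pos} is not in the coding region")
    return (nt_pos - 342) // 3 + 1


def number_aligned(ref_row: str, query_row: str, start: int = 1):
    """Label each query base with its reference-based number.

    ref_row and query_row are gapped alignment rows ('-' marks a gap).
    Query bases opposite a reference gap are insertions and get the
    preceding reference number plus a letter (assumed format).
    """
    if len(ref_row) != len(query_row):
        raise ValueError("alignment rows must have equal length")
    labels = []
    pos = start - 1   # last reference position consumed
    inserted = 0      # insertions seen since that position
    for ref_base, query_base in zip(ref_row, query_row):
        if ref_base != "-":
            pos += 1
            inserted = 0
            if query_base != "-":      # a query gap is a deletion: skip
                labels.append((str(pos), query_base))
        elif query_base != "-":
            if inserted >= len(ascii_lowercase):
                raise ValueError("more than 26 inserted bases at one site")
            labels.append((f"{pos}{ascii_lowercase[inserted]}", query_base))
            inserted += 1
    return labels


if __name__ == "__main__":
    print(region_of(9500))
    print(polyprotein_position(345))
    print(number_aligned("ACG-T", "ACGAT"))
```

Because the X region is always numbered from 9546, a sequence with a longer or shorter PPT than H77 would still report its X-region positions in the 9546-9646 range; in this sketch that corresponds to calling `number_aligned` on the X-region alignment with `start=9546`.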

Original language: English (US)
Pages (from-to): 1355-1361
Number of pages: 7
Issue number: 5
State: Published - Nov 2006

ASJC Scopus subject areas

  • Hepatology

