TY - JOUR
T1 - A comprehensive system for consistent numbering of HCV sequences, proteins and epitopes
AU - Kuiken, Carla
AU - Combet, Christophe
AU - Bukh, Jens
AU - Shin-I, Tadasu
AU - Deleage, Gilbert
AU - Mizokami, Masashi
AU - Richardson, Russell
AU - Sablon, Erwin
AU - Yusim, Karina
AU - Pawlotsky, Jean Michel
AU - Simmonds, Peter
AU - Korber, Bette
AU - Abfalterer, Werner
AU - Calef, Charles
AU - Foley, Brian
AU - Funkhouser, Robert
AU - Gaschen, Brian
AU - Lang, Dorothy
AU - Leitner, Thomas
AU - Szinger, James
AU - Zhang, Ming
PY - 2006/11
Y1 - 2006/11
N2 - This numbering proposal, using the AF009606 (isolate H77) sequence as a reference, should be able to unequivocally number all possible mutations in HCV, both natural and manmade. The HCV sequence databases and the Los Alamos HCV immunology database (as well as the Los Alamos HIV database) number positions and epitopes according to this system. Moreover, the database websites provide tools for finding stretches of sequence by their numbers, for assigning start and end coordinates to a sequence, and for converting between the various numbering systems. Numbering HCV nucleotide sequences is done by analogy to H77. The first step is aligning your sequence to H77. If there is no length variation, the numbering is straightforward; nucleotide numbers run from 1 (start of 5′ UTR) to 9646 (end of 3′ UTR). Insertions relative to H77 are labeled with letters. Protein numbering works like the nucleotide numbering, but starts at the start of the polyprotein. The sequence databases will support both systems, but use polyprotein numbering as a basis. Absolute numbering moves across the coding regions; relative numbering starts over at every coding region. Relative numbering is used almost exclusively for proteins; polyprotein numbering is used mainly in immunology, and protein numbering in drug resistance research. The Los Alamos immunology database uses polyprotein numbering. The 5′ UTR numbering starts at 1 and ends at 341; the Core cds starts at 342. The numbering of the 3′ UTR starts at 9378 (after the stop codon), but complications arise due to the variable length of the PPT. The 3′ UTR consists of three elements: a variable 5′ region, the PPT, and a conserved 3′ region, often called X. The first region is numbered 9378-9410. The PPT consists almost entirely of T's and therefore cannot be meaningfully aligned; it is numbered according to its length in H77, 9411-9545. The X region starts at 9546 (regardless of its actual location, which depends on the length of the PPT) and ends at 9646.
AB - This numbering proposal, using the AF009606 (isolate H77) sequence as a reference, should be able to unequivocally number all possible mutations in HCV, both natural and manmade. The HCV sequence databases and the Los Alamos HCV immunology database (as well as the Los Alamos HIV database) number positions and epitopes according to this system. Moreover, the database websites provide tools for finding stretches of sequence by their numbers, for assigning start and end coordinates to a sequence, and for converting between the various numbering systems. Numbering HCV nucleotide sequences is done by analogy to H77. The first step is aligning your sequence to H77. If there is no length variation, the numbering is straightforward; nucleotide numbers run from 1 (start of 5′ UTR) to 9646 (end of 3′ UTR). Insertions relative to H77 are labeled with letters. Protein numbering works like the nucleotide numbering, but starts at the start of the polyprotein. The sequence databases will support both systems, but use polyprotein numbering as a basis. Absolute numbering moves across the coding regions; relative numbering starts over at every coding region. Relative numbering is used almost exclusively for proteins; polyprotein numbering is used mainly in immunology, and protein numbering in drug resistance research. The Los Alamos immunology database uses polyprotein numbering. The 5′ UTR numbering starts at 1 and ends at 341; the Core cds starts at 342. The numbering of the 3′ UTR starts at 9378 (after the stop codon), but complications arise due to the variable length of the PPT. The 3′ UTR consists of three elements: a variable 5′ region, the PPT, and a conserved 3′ region, often called X. The first region is numbered 9378-9410. The PPT consists almost entirely of T's and therefore cannot be meaningfully aligned; it is numbered according to its length in H77, 9411-9545. The X region starts at 9546 (regardless of its actual location, which depends on the length of the PPT) and ends at 9646.
UR - http://www.scopus.com/inward/record.url?scp=33750999948&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750999948&partnerID=8YFLogxK
U2 - 10.1002/hep.21377
DO - 10.1002/hep.21377
M3 - Review article
C2 - 17058236
AN - SCOPUS:33750999948
SN - 0270-9139
VL - 44
SP - 1355
EP - 1361
JO - Hepatology
JF - Hepatology
IS - 5
ER -