|
a |
|
b/docs/research/CFTR Annotations.txt |
|
|
1 |
All annotations courtesy of the National Library of Medicine |
|
|
2 |
Author's note: DNase hypersensitive loci and most enhancer sequences are not included because, at the current state of DNAnalyzer, they will not be of importance. |
|
|
3 |
|
|
|
4 |
|
|
|
5 |
Coding Sequence (mutations within this sequence are very likely to affect the protein structure negatively): atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc |
|
|
6 |
agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc |
|
|
7 |
ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag |
|
|
8 |
ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga |
|
|
9 |
tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc |
|
|
10 |
ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg |
|
|
11 |
atttatctag gcataggctt atgccttctc tttattgtga ggacactgct cctacaccca |
|
|
12 |
gccatttttg gccttcatca cattggaatg cagatgagaa tagctatgtt tagtttgatt |
|
|
13 |
tataagaaga ctttaaagct gtcaagccgt gttctagata aaataagtat tggacaactt |
|
|
14 |
gttagtctcc tttccaacaa cctgaacaaa tttgatgaag gacttgcatt ggcacatttc |
|
|
15 |
gtgtggatcg ctcctttgca agtggcactc ctcatggggc taatctggga gttgttacag |
|
|
16 |
gcgtctgcct tctgtggact tggtttcctg atagtccttg ccctttttca ggctgggcta |
|
|
17 |
gggagaatga tgatgaagta cagagatcag agagctggga agatcagtga aagacttgtg |
|
|
18 |
attacctcag aaatgattga aaatatccaa tctgttaagg catactgctg ggaagaagca |
|
|
19 |
atggaaaaaa tgattgaaaa cttaagacaa acagaactga aactgactcg gaaggcagcc |
|
|
20 |
tatgtgagat acttcaatag ctcagccttc ttcttctcag ggttctttgt ggtgttttta |
|
|
21 |
tctgtgcttc cctatgcact aatcaaagga atcatcctcc ggaaaatatt caccaccatc |
|
|
22 |
tcattctgca ttgttctgcg catggcggtc actcggcaat ttccctgggc tgtacaaaca |
|
|
23 |
tggtatgact ctcttggagc aataaacaaa atacaggatt tcttacaaaa gcaagaatat |
|
|
24 |
aagacattgg aatataactt aacgactaca gaagtagtga tggagaatgt aacagccttc |
|
|
25 |
tgggaggagg gatttgggga attatttgag aaagcaaaac aaaacaataa caatagaaaa |
|
|
26 |
acttctaatg gtgatgacag cctcttcttc agtaatttct cacttcttgg tactcctgtc |
|
|
27 |
ctgaaagata ttaatttcaa gatagaaaga ggacagttgt tggcggttgc tggatccact |
|
|
28 |
ggagcaggca agacttcact tctaatggtg attatgggag aactggagcc ttcagagggt |
|
|
29 |
aaaattaagc acagtggaag aatttcattc tgttctcagt tttcctggat tatgcctggc |
|
|
30 |
accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc |
|
|
31 |
atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt |
|
|
32 |
cttggagaag gtggaatcac actgagtgga ggtcaacgag caagaatttc tttagcaaga |
|
|
33 |
gcagtataca aagatgctga tttgtattta ttagactctc cttttggata cctagatgtt |
|
|
34 |
ttaacagaaa aagaaatatt tgaaagctgt gtctgtaaac tgatggctaa caaaactagg |
|
|
35 |
attttggtca cttctaaaat ggaacattta aagaaagctg acaaaatatt aattttgcat |
|
|
36 |
gaaggtagca gctattttta tgggacattt tcagaactcc aaaatctaca gccagacttt |
|
|
37 |
agctcaaaac tcatgggatg tgattctttc gaccaattta gtgcagaaag aagaaattca |
|
|
38 |
atcctaactg agaccttaca ccgtttctca ttagaaggag atgctcctgt ctcctggaca |
|
|
39 |
gaaacaaaaa aacaatcttt taaacagact ggagagtttg gggaaaaaag gaagaattct |
|
|
40 |
attctcaatc caatcaactc tatacgaaaa ttttccattg tgcaaaagac tcccttacaa |
|
|
41 |
atgaatggca tcgaagagga ttctgatgag cctttagaga gaaggctgtc cttagtacca |
|
|
42 |
gattctgagc agggagaggc gatactgcct cgcatcagcg tgatcagcac tggccccacg |
|
|
43 |
cttcaggcac gaaggaggca gtctgtcctg aacctgatga cacactcagt taaccaaggt |
|
|
44 |
cagaacattc accgaaagac aacagcatcc acacgaaaag tgtcactggc ccctcaggca |
|
|
45 |
aacttgactg aactggatat atattcaaga aggttatctc aagaaactgg cttggaaata |
|
|
46 |
agtgaagaaa ttaacgaaga agacttaaag gagtgctttt ttgatgatat ggagagcata |
|
|
47 |
ccagcagtga ctacatggaa cacatacctt cgatatatta ctgtccacaa gagcttaatt |
|
|
48 |
tttgtgctaa tttggtgctt agtaattttt ctggcagagg tggctgcttc tttggttgtg |
|
|
49 |
ctgtggctcc ttggaaacac tcctcttcaa gacaaaggga atagtactca tagtagaaat |
|
|
50 |
aacagctatg cagtgattat caccagcacc agttcgtatt atgtgtttta catttacgtg |
|
|
51 |
ggagtagccg acactttgct tgctatggga ttcttcagag gtctaccact ggtgcatact |
|
|
52 |
ctaatcacag tgtcgaaaat tttacaccac aaaatgttac attctgttct tcaagcacct |
|
|
53 |
atgtcaaccc tcaacacgtt gaaagcaggt gggattctta atagattctc caaagatata |
|
|
54 |
gcaattttgg atgaccttct gcctcttacc atatttgact tcatccagtt gttattaatt |
|
|
55 |
gtgattggag ctatagcagt tgtcgcagtt ttacaaccct acatctttgt tgcaacagtg |
|
|
56 |
ccagtgatag tggcttttat tatgttgaga gcatatttcc tccaaacctc acagcaactc |
|
|
57 |
aaacaactgg aatctgaagg caggagtcca attttcactc atcttgttac aagcttaaaa |
|
|
58 |
ggactatgga cacttcgtgc cttcggacgg cagccttact ttgaaactct gttccacaaa |
|
|
59 |
gctctgaatt tacatactgc caactggttc ttgtacctgt caacactgcg ctggttccaa |
|
|
60 |
atgagaatag aaatgatttt tgtcatcttc ttcattgctg ttaccttcat ttccatttta |
|
|
61 |
acaacaggag aaggagaagg aagagttggt attatcctga ctttagccat gaatatcatg |
|
|
62 |
agtacattgc agtgggctgt aaactccagc atagatgtgg atagcttgat gcgatctgtg |
|
|
63 |
agccgagtct ttaagttcat tgacatgcca acagaaggta aacctaccaa gtcaaccaaa |
|
|
64 |
ccatacaaga atggccaact ctcgaaagtt atgattattg agaattcaca cgtgaagaaa |
|
|
65 |
gatgacatct ggccctcagg gggccaaatg actgtcaaag atctcacagc aaaatacaca |
|
|
66 |
gaaggtggaa atgccatatt agagaacatt tccttctcaa taagtcctgg ccagagggtg |
|
|
67 |
ggcctcttgg gaagaactgg atcagggaag agtactttgt tatcagcttt tttgagacta |
|
|
68 |
ctgaacactg aaggagaaat ccagatcgat ggtgtgtctt gggattcaat aactttgcaa |
|
|
69 |
cagtggagga aagcctttgg agtgatacca cagaaagtat ttattttttc tggaacattt |
|
|
70 |
agaaaaaact tggatcccta tgaacagtgg agtgatcaag aaatatggaa agttgcagat |
|
|
71 |
gaggttgggc tcagatctgt gatagaacag tttcctggga agcttgactt tgtccttgtg |
|
|
72 |
gatgggggct gtgtcctaag ccatggccac aagcagttga tgtgcttggc tagatctgtt |
|
|
73 |
ctcagtaagg cgaagatctt gctgcttgat gaacccagtg ctcatttgga tccagtaaca |
|
|
74 |
taccaaataa ttagaagaac tctaaaacaa gcatttgctg attgcacagt aattctctgt |
|
|
75 |
gaacacagga tagaagcaat gctggaatgc caacaatttt tggtcataga agagaacaaa |
|
|
76 |
gtgcggcagt acgattccat ccagaaactg ctgaacgaga ggagcctctt ccggcaagcc |
|
|
77 |
atcagcccct ccgacagggt gaagctcttt ccccaccgga actcaagcaa gtgcaagtct |
|
|
78 |
aagccccaga ttgctgctct gaaagaggag acagaagaag aggtgcaaga tacaaggctt |
|
|
79 |
tag |
|
|
80 |
|
|
|
81 |
This should translate to: |
|
|
82 |
MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVD |
|
|
83 |
SADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLL |
|
|
84 |
GRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHHIGMQMRIAMFSLI |
|
|
85 |
YKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVALLMGLIWEL |
|
|
86 |
LQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYC |
|
|
87 |
WEEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILR |
|
|
88 |
KIFTTISFCIVLRMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEV |
|
|
89 |
VMENVTAFWEEGFGELFEKAKQNNNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIER |
|
|
90 |
GQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIF |
|
|
91 |
GVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLARAVYKDA |
|
|
92 |
DLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSS |
|
|
93 |
YFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTET |
|
|
94 |
KKQSFKQTGEFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVP |
|
|
95 |
DSEQGEAILPRISVISTGPTLQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAP |
|
|
96 |
QANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESIPAVTTWNTYLRYITVH |
|
|
97 |
KSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITSTSSYY |
|
|
98 |
VFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGI |
|
|
99 |
LNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLR |
|
|
100 |
AYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTAN |
|
|
101 |
WFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWA |
|
|
102 |
VNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKVMIIENSHVKKDDIW |
|
|
103 |
PSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLN |
|
|
104 |
TEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVAD |
|
|
105 |
EVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDP |
|
|
106 |
VTYQIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSL |
|
|
107 |
FRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEETEEEVQDTRL |
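
The translation claim above can be spot-checked with a short script. This is a minimal sketch, not DNAnalyzer's own implementation: it assumes the standard genetic code (NCBI translation table 1), and the names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for illustration. Unrecognized codons are reported as `X`, which is an arbitrary choice made here.

```python
from itertools import product

# Standard genetic code with codons enumerated in t, c, a, g order.
# "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Whitespace (such as the 10-base grouping used in this file) is ignored.
    """
    seq = "".join(dna.split()).lower()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" = unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 bases of the coding sequence listed in this file.
first_row = "atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc"
print(translate(first_row))  # MQRSPLEKASVVSKLFFSWT
```

Running the same function over the full coding sequence and comparing the result with the protein listed above is a quick way to catch copy errors in either block.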
|
|
108 |
|
|
|
109 |
The most common cystic fibrosis mutation (F508del) deletes a single phenylalanine: ENIIFGVSYDE -> ENIIGVSYDE |
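
One way a tool might flag this change is by looking for the two motifs directly in a translated protein. The sketch below is an illustration under stated assumptions, not a clinical test and not an existing DNAnalyzer feature: the function name `classify_f508` and its three labels are hypothetical, and only the exact motifs quoted above are recognized.

```python
# Motifs taken from the annotation above: the wild-type stretch and the
# same stretch with the single phenylalanine (F) removed.
WILD_TYPE_MOTIF = "ENIIFGVSYDE"
MUTANT_MOTIF = "ENIIGVSYDE"


def classify_f508(protein: str) -> str:
    """Label a protein by which of the two motifs it contains."""
    if WILD_TYPE_MOTIF in protein:
        return "wild-type"
    if MUTANT_MOTIF in protein:
        return "deletion"
    return "unknown"


# A short stretch of the reference protein around the motif.
reference = "PGTIKENIIFGVSYDEYRYRSV"
# Simulate the deletion by swapping the wild-type motif for the mutant one.
mutated = reference.replace(WILD_TYPE_MOTIF, MUTANT_MOTIF, 1)

print(classify_f508(reference))
print(classify_f508(mutated))
```

Exact substring matching is deliberately strict: any other change inside the motif yields `unknown` rather than a guess.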
|
|
110 |
CFTR Promoters: |
|
|
111 |
Basal Promoter (the site where the transcription complex assembles; located within the whole promoter region): gtagtaggtc tttggcatta ggagcttgag cccaga |
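
The statement that the basal promoter lies within the whole promoter can be checked by a plain substring search. This is a minimal sketch: `normalize` and `find_region` are hypothetical helper names, and `WHOLE_PROMOTER_START` holds only the opening bases of the whole-promoter entry as listed in this file.

```python
def normalize(seq: str) -> str:
    """Drop whitespace (the 10-base grouping) and lowercase the sequence."""
    return "".join(seq.split()).lower()


def find_region(region: str, within: str) -> int:
    """Return the 0-based offset of `region` inside `within`, or -1."""
    return normalize(within).find(normalize(region))


BASAL_PROMOTER = "gtagtaggtc tttggcatta ggagcttgag cccaga"
# Opening bases of the "Promoter (whole sequence)" entry in this file.
WHOLE_PROMOTER_START = (
    "gtagtaggtc tttggcatta ggagcttgag cccagacggc "
    "cctagcaggg accccagcgc ccgagagacc"
)

offset = find_region(BASAL_PROMOTER, WHOLE_PROMOTER_START)
print(offset)  # 0: the basal promoter is the first 36 bases
```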
|
|
112 |
|
|
|
113 |
Promoter (whole sequence): gtagtaggtc tttggcatta ggagcttgag cccagacggc cctagcaggg accccagcgc ccgagagacc |