a b/docs/research/CFTR Annotations.txt
All annotations courtesy of the National Library of Medicine
Author's note: DNase hypersensitive loci and most enhancer sequences are not included because, at the current state of DNAnalyzer, they will not be of importance.

Coding Sequence (mutations in this sequence that change an amino acid or the reading frame are likely to alter the protein structure, often harmfully): atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc
        agaccaattt tgaggaaagg atacagacag cgcctggaat tgtcagacat ataccaaatc
       ccttctgttg attctgctga caatctatct gaaaaattgg aaagagaatg ggatagagag
       ctggcttcaa agaaaaatcc taaactcatt aatgcccttc ggcgatgttt tttctggaga
       tttatgttct atggaatctt tttatattta ggggaagtca ccaaagcagt acagcctctc
       ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg
       atttatctag gcataggctt atgccttctc tttattgtga ggacactgct cctacaccca
       gccatttttg gccttcatca cattggaatg cagatgagaa tagctatgtt tagtttgatt
       tataagaaga ctttaaagct gtcaagccgt gttctagata aaataagtat tggacaactt
       gttagtctcc tttccaacaa cctgaacaaa tttgatgaag gacttgcatt ggcacatttc
       gtgtggatcg ctcctttgca agtggcactc ctcatggggc taatctggga gttgttacag
       gcgtctgcct tctgtggact tggtttcctg atagtccttg ccctttttca ggctgggcta
       gggagaatga tgatgaagta cagagatcag agagctggga agatcagtga aagacttgtg
       attacctcag aaatgattga aaatatccaa tctgttaagg catactgctg ggaagaagca
       atggaaaaaa tgattgaaaa cttaagacaa acagaactga aactgactcg gaaggcagcc
       tatgtgagat acttcaatag ctcagccttc ttcttctcag ggttctttgt ggtgttttta
       tctgtgcttc cctatgcact aatcaaagga atcatcctcc ggaaaatatt caccaccatc
      tcattctgca ttgttctgcg catggcggtc actcggcaat ttccctgggc tgtacaaaca
      tggtatgact ctcttggagc aataaacaaa atacaggatt tcttacaaaa gcaagaatat
      aagacattgg aatataactt aacgactaca gaagtagtga tggagaatgt aacagccttc
      tgggaggagg gatttgggga attatttgag aaagcaaaac aaaacaataa caatagaaaa
      acttctaatg gtgatgacag cctcttcttc agtaatttct cacttcttgg tactcctgtc
      ctgaaagata ttaatttcaa gatagaaaga ggacagttgt tggcggttgc tggatccact
      ggagcaggca agacttcact tctaatggtg attatgggag aactggagcc ttcagagggt
      aaaattaagc acagtggaag aatttcattc tgttctcagt tttcctggat tatgcctggc
      accattaaag aaaatatcat ctttggtgtt tcctatgatg aatatagata cagaagcgtc
      atcaaagcat gccaactaga agaggacatc tccaagtttg cagagaaaga caatatagtt
      cttggagaag gtggaatcac actgagtgga ggtcaacgag caagaatttc tttagcaaga
      gcagtataca aagatgctga tttgtattta ttagactctc cttttggata cctagatgtt
      ttaacagaaa aagaaatatt tgaaagctgt gtctgtaaac tgatggctaa caaaactagg
      attttggtca cttctaaaat ggaacattta aagaaagctg acaaaatatt aattttgcat
      gaaggtagca gctattttta tgggacattt tcagaactcc aaaatctaca gccagacttt
      agctcaaaac tcatgggatg tgattctttc gaccaattta gtgcagaaag aagaaattca
      atcctaactg agaccttaca ccgtttctca ttagaaggag atgctcctgt ctcctggaca
      gaaacaaaaa aacaatcttt taaacagact ggagagtttg gggaaaaaag gaagaattct
      attctcaatc caatcaactc tatacgaaaa ttttccattg tgcaaaagac tcccttacaa
      atgaatggca tcgaagagga ttctgatgag cctttagaga gaaggctgtc cttagtacca
      gattctgagc agggagaggc gatactgcct cgcatcagcg tgatcagcac tggccccacg
      cttcaggcac gaaggaggca gtctgtcctg aacctgatga cacactcagt taaccaaggt
      cagaacattc accgaaagac aacagcatcc acacgaaaag tgtcactggc ccctcaggca
      aacttgactg aactggatat atattcaaga aggttatctc aagaaactgg cttggaaata
      agtgaagaaa ttaacgaaga agacttaaag gagtgctttt ttgatgatat ggagagcata
      ccagcagtga ctacatggaa cacatacctt cgatatatta ctgtccacaa gagcttaatt
      tttgtgctaa tttggtgctt agtaattttt ctggcagagg tggctgcttc tttggttgtg
      ctgtggctcc ttggaaacac tcctcttcaa gacaaaggga atagtactca tagtagaaat
      aacagctatg cagtgattat caccagcacc agttcgtatt atgtgtttta catttacgtg
      ggagtagccg acactttgct tgctatggga ttcttcagag gtctaccact ggtgcatact
      ctaatcacag tgtcgaaaat tttacaccac aaaatgttac attctgttct tcaagcacct
      atgtcaaccc tcaacacgtt gaaagcaggt gggattctta atagattctc caaagatata
      gcaattttgg atgaccttct gcctcttacc atatttgact tcatccagtt gttattaatt
      gtgattggag ctatagcagt tgtcgcagtt ttacaaccct acatctttgt tgcaacagtg
      ccagtgatag tggcttttat tatgttgaga gcatatttcc tccaaacctc acagcaactc
      aaacaactgg aatctgaagg caggagtcca attttcactc atcttgttac aagcttaaaa
      ggactatgga cacttcgtgc cttcggacgg cagccttact ttgaaactct gttccacaaa
      gctctgaatt tacatactgc caactggttc ttgtacctgt caacactgcg ctggttccaa
      atgagaatag aaatgatttt tgtcatcttc ttcattgctg ttaccttcat ttccatttta
      acaacaggag aaggagaagg aagagttggt attatcctga ctttagccat gaatatcatg
      agtacattgc agtgggctgt aaactccagc atagatgtgg atagcttgat gcgatctgtg
      agccgagtct ttaagttcat tgacatgcca acagaaggta aacctaccaa gtcaaccaaa
      ccatacaaga atggccaact ctcgaaagtt atgattattg agaattcaca cgtgaagaaa
      gatgacatct ggccctcagg gggccaaatg actgtcaaag atctcacagc aaaatacaca
      gaaggtggaa atgccatatt agagaacatt tccttctcaa taagtcctgg ccagagggtg
      ggcctcttgg gaagaactgg atcagggaag agtactttgt tatcagcttt tttgagacta
      ctgaacactg aaggagaaat ccagatcgat ggtgtgtctt gggattcaat aactttgcaa
      cagtggagga aagcctttgg agtgatacca cagaaagtat ttattttttc tggaacattt
      agaaaaaact tggatcccta tgaacagtgg agtgatcaag aaatatggaa agttgcagat
      gaggttgggc tcagatctgt gatagaacag tttcctggga agcttgactt tgtccttgtg
      gatgggggct gtgtcctaag ccatggccac aagcagttga tgtgcttggc tagatctgtt
      ctcagtaagg cgaagatctt gctgcttgat gaacccagtg ctcatttgga tccagtaaca
      taccaaataa ttagaagaac tctaaaacaa gcatttgctg attgcacagt aattctctgt
      gaacacagga tagaagcaat gctggaatgc caacaatttt tggtcataga agagaacaaa
      gtgcggcagt acgattccat ccagaaactg ctgaacgaga ggagcctctt ccggcaagcc
      atcagcccct ccgacagggt gaagctcttt ccccaccgga actcaagcaa gtgcaagtct
      aagccccaga ttgctgctct gaaagaggag acagaagaag aggtgcaaga tacaaggctt
      tag
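
A coding sequence like the one above can be sanity-checked before it is translated. The following is a minimal sketch, not part of DNAnalyzer: the function name `check_cds` and the short toy input are hypothetical, chosen only to show the checks (length divisible by three, `atg` start, stop codon at the end, only a/c/g/t characters).

```python
# Hypothetical helper: basic structural checks for a coding sequence (CDS).
STOP_CODONS = {"taa", "tag", "tga"}

def check_cds(dna: str) -> dict:
    # Drop the spaces and line breaks used to group bases into blocks of ten.
    seq = "".join(dna.lower().split())
    return {
        "length_multiple_of_3": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("atg"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "only_acgt": set(seq) <= set("acgt"),
    }

# Toy input (an assumption for illustration, not the full CFTR sequence).
print(check_cds("atgcagaggt ag"))
```

Pasting the full annotated sequence into `check_cds` should, if the annotation is intact, return `True` for all four checks.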

This should translate to the following amino acid sequence:
             MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVD
                     SADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLL
                     GRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHHIGMQMRIAMFSLI
                     YKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVALLMGLIWEL
                     LQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYC
                     WEEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILR
                     KIFTTISFCIVLRMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEV
                     VMENVTAFWEEGFGELFEKAKQNNNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIER
                     GQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIF
                     GVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLARAVYKDA
                     DLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSS
                     YFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTET
                     KKQSFKQTGEFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVP
                     DSEQGEAILPRISVISTGPTLQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAP
                     QANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESIPAVTTWNTYLRYITVH
                     KSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITSTSSYY
                     VFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGI
                     LNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLR
                     AYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTAN
                     WFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWA
                     VNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKVMIIENSHVKKDDIW
                     PSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLN
                     TEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVAD
                     EVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDP
                     VTYQIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSL
                     FRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEETEEEVQDTRL
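
The translation above can be reproduced with the standard genetic code. The sketch below is one way to do it and is not taken from DNAnalyzer: the names `CODON_TABLE` and `translate` are assumptions, and only the first 60 bases of the coding sequence are used as input to keep the example short.

```python
from itertools import product

# Standard genetic code, listed in t/c/a/g order for each codon position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    # Drop the spaces and line breaks used to group bases into blocks of ten.
    seq = "".join(dna.lower().split())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)

# First line of the coding sequence from this file.
cds_start = "atgcagaggt cgcctctgga aaaggccagc gttgtctcca aacttttttt cagctggacc"
print(translate(cds_start))  # MQRSPLEKASVVSKLFFSWT
```

The output matches the first 20 residues of the protein listed above. Running `translate` on the whole coding sequence gives a direct way to compare against the full listing.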

The most common cystic fibrosis mutation is the deletion of phenylalanine 508 (F508del): ENIIFGVSYDE -> ENIIGVSYDE
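
A single-residue deletion of this kind can be located by comparing the wild-type and mutant windows. This is a minimal sketch under the assumption that exactly one residue is missing; `find_single_deletion` is a hypothetical name, not a DNAnalyzer function.

```python
def find_single_deletion(wild_type: str, mutant: str):
    """Return (0-based index, residue) of the one residue missing from mutant, or None."""
    if len(wild_type) != len(mutant) + 1:
        return None
    for i in range(len(wild_type)):
        # Remove residue i from the wild type and compare with the mutant.
        if wild_type[:i] + wild_type[i + 1:] == mutant:
            return i, wild_type[i]
    return None

print(find_single_deletion("ENIIFGVSYDE", "ENIIGVSYDE"))  # (4, 'F')
```

The index is relative to the 11-residue window, not to the full protein. When a deleted residue sits next to an identical one, the function reports the first matching position.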
CFTR Promoters: 
    Basal Promoter (the core region that recruits the transcription initiation complex; it lies within the whole promoter sequence): gtagtaggtc tttggcatta ggagcttgag cccaga
    
    Promoter (whole sequence): gtagtaggtc tttggcatta ggagcttgag cccagacggc cctagcaggg accccagcgc ccgagagacc
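
The relationship between the two promoter annotations can be checked directly: the basal promoter should occur inside the whole promoter sequence. The sketch below uses the two sequences exactly as listed in this file; the helper name `normalize` is an assumption.

```python
def normalize(seq: str) -> str:
    """Lower-case the sequence and drop the spaces between blocks of ten bases."""
    return "".join(seq.lower().split())

basal = normalize("gtagtaggtc tttggcatta ggagcttgag cccaga")
promoter = normalize(
    "gtagtaggtc tttggcatta ggagcttgag cccagacggc cctagcaggg accccagcgc ccgagagacc"
)

# str.find returns the 0-based start of the first match, or -1 if absent.
offset = promoter.find(basal)
print(offset, len(basal), len(promoter))  # 0 36 70
```

An offset of 0 means the basal promoter is the first 36 bases of the 70-base promoter as annotated here.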