{ "cells": [ { "cell_type": "markdown", "id": "ffe44327", "metadata": { "id": "ffe44327" }, "source": [ "# Common instructions\n", "\n", "- Aim to attempt all the mandatory questions (marked with a *) in the problem set. \n", "\n", "\n", "- Attempt optional questions only after you have attempted all the mandatory questions. \n", "\n", "\n", "- More credit will be given if you have successfully attempted all the mandatory questions, even if you do not attempt a single optional question, as opposed to missing even one mandatory question while attempting all the optional questions.\n", "\n", "\n", "- With the above caveat, attempt as many questions as possible within the time period. Partially attempted questions will get partial credit.\n", "\n", "\n", "- Normally, you should work through the problem set in ascending order (Q1 -> Q4).\n", "\n", "\n", "- Clean, labeled plots and clear data interpretation will boost your score. So will the use of functions, meaningful variable names, and readable code.\n", "\n", "\n", "- You have a maximum of 2 days to work on the assignment. We will not consider assignments submitted after the deadline. You are free to search the internet, but you may not discuss the assignment with others in any way or form, on pain of immediate disqualification.\n", "\n", "\n", "- Report the websites used to obtain help. Before the deadline, create a single .zip file with all your code and submit it via the submission link provided to you in the email. DO NOT include data in your zip file. \n", "\n", "\n", "- You can use any programming language of your choice to solve all or part of the questions, preferably in a notebook such as Jupyter Notebook or Google Colab. We should be able to execute your program(s) to generate the required data and plots.\n", "\n", "\n", "- In case you are unable to complete some parts, clearly indicate how you would go about the task. 
List the steps you would try.\n" ] }, { "cell_type": "markdown", "id": "f82f0100", "metadata": { "id": "f82f0100" }, "source": [ "# Input data\n", "\n", "- The TSV file `SampleData.tsv` has the following columns:\n", "\n", " - Sample: Sample IDs (S01, S02, S03...)\n", " - Treatment: Information on sample type \n", " \n", " - HF+: Blood plasma samples collected from coronary disease patients post major surgery who developed heart failure within 3 years of surgery\n", " - HF-: Blood plasma samples collected from coronary disease patients post major surgery who recovered post surgery without heart failure\n", " - HVOL: Blood plasma samples collected from individuals without any discernible coronary disease\n", " \n", " \n", " \n", "- The gzipped file `GSE208194_RawTPM.csv.gz` contains gene expression information for the samples listed in `SampleData.tsv`. The file structure looks like this:\n", "\n", "ENSEMBL ID |S01 |S02 |S03 |S04\n", ":-----------------|:--------|:--------|:--------|:--------\n", "ENSG00000000419.12|2.398878 |12.157726|1.40211 |7.667875\n", "ENSG00000000938.13|3.324077 |13.971038|1.917631 |10.225812\n", "ENSG00000001629.10|12.037059|1.453811 |12.596614|15.799738\n", "ENSG00000001631.15|1.287932 |5.842868 |1.412257 |1.526812\n", "\n", "Each row is a feature/gene (n=4150) and each column is a sample in which the features are measured.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "655809dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | sample | treatment |
|---|---|---|
| 0 | S01 | HF- |
| 1 | S02 | HF- |
| 2 | S03 | HF- |
| 3 | S04 | HF- |
| 4 | S05 | HF- |
| ... | ... | ... |
| 87 | S88 | HVOL |
| 88 | S89 | HVOL |
| 89 | S90 | HVOL |
| 90 | S95 | HF- |
| 91 | S96 | HVOL |

92 rows × 2 columns

| | ENSEMBL ID | S01 | S02 | S03 | S04 | S05 | S06 | S07 | S08 | S09 | ... | S83 | S84 | S85 | S86 | S87 | S88 | S89 | S90 | S95 | S96 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSG00000000419.12 | 2.398878 | 12.157726 | 1.402110 | 7.667875 | 4.198525 | 4.069709 | 5.573672 | 1.263016 | 0.000000 | ... | 13.014835 | 4.824927 | 2.999915 | 7.365408 | 5.487723 | 25.222543 | 17.734630 | 5.057123 | 0.968332 | 3.757043 |
| 1 | ENSG00000000938.13 | 3.324077 | 13.971038 | 1.917631 | 10.225812 | 2.847621 | 1.744719 | 5.592130 | 7.549874 | 8.696941 | ... | 13.342512 | 1.015803 | 9.047389 | 4.224286 | 5.467746 | 17.525907 | 10.903512 | 1.257853 | 1.025641 | 8.876226 |
| 2 | ENSG00000001629.10 | 12.037059 | 1.453811 | 12.596614 | 15.799738 | 17.246205 | 7.376407 | 7.310507 | 7.439008 | 23.898468 | ... | 15.474715 | 24.929106 | 15.789732 | 22.172867 | 15.370133 | 89.995748 | 32.951226 | 12.263058 | 5.968981 | 14.964376 |
| 3 | ENSG00000001631.15 | 1.287932 | 5.842868 | 1.412257 | 1.526812 | 7.261625 | 4.530184 | 0.000000 | 5.598395 | 5.140090 | ... | 4.337378 | 4.804729 | 3.221380 | 3.440060 | 10.331736 | 7.899247 | 11.006974 | 6.447497 | 2.786651 | 5.343678 |
| 4 | ENSG00000002549.12 | 2.914606 | 32.566404 | 2.231871 | 35.444900 | 9.212083 | 5.219885 | 4.155680 | 8.677364 | 29.230862 | ... | 14.737366 | 8.268794 | 7.478753 | 6.658792 | 9.647899 | 28.691720 | 33.123814 | 6.709470 | 1.214385 | 3.023686 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4145 | ENSG00000286834.1 | 1.446729 | 101.312666 | 17.106360 | 109.490990 | 33.096990 | 4.787487 | 30.199864 | 17.304912 | 29.437844 | ... | 42.816479 | 6.306396 | 11.959351 | 4.344150 | 13.036536 | 23.370109 | 34.169966 | 5.444215 | 4.262073 | 13.917467 |
| 4146 | ENSG00000287080.1 | 8.543461 | 48.482781 | 11.492971 | 40.763492 | 23.798352 | 8.696239 | 37.818536 | 18.158117 | 34.491123 | ... | 21.921759 | 6.095591 | 1.362975 | 5.928143 | 0.704842 | 2.688918 | 9.531408 | 2.823952 | 0.107894 | 12.110910 |
| 4147 | ENSG00000287160.1 | 3.392143 | 57.089139 | 7.379778 | 67.437197 | 17.519094 | 2.840147 | 10.407136 | 8.292886 | 5.282977 | ... | 45.515759 | 3.203770 | 2.802123 | 1.286441 | 7.848437 | 23.094887 | 19.040575 | 1.324381 | 4.329463 | 9.970802 |
| 4148 | ENSG00000287825.1 | 8.543410 | 26.198111 | 9.731645 | 5.939315 | 13.911370 | 14.807156 | 12.884268 | 0.353449 | 16.483560 | ... | 0.000000 | 30.601166 | 18.953867 | 19.686259 | 20.540434 | 7.114563 | 4.664343 | 20.376778 | 1.601362 | 9.273306 |
| 4149 | ENSG00000288560.1 | 8.361870 | 3.218659 | 2.560132 | 4.123530 | 5.092169 | 7.478186 | 9.504376 | 1.776422 | 1.402694 | ... | 10.143018 | 7.053033 | 19.710067 | 1.791298 | 24.242870 | 4.563028 | 10.941191 | 7.220862 | 1.568828 | 7.706704 |

4150 rows × 90 columns
\n", "