[bd22c4]: / eda / KAO / README.R

Download this file

338 lines (304 with data), 11.9 kB

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
##### README ###### 

## "0_pathway_toolkit.R"
# description: This script contains 2 function useful for pathway 
#   analysis in R. Priciple is categorical terms are used to make 
#   a master list of id-to-category relationships. Then 2nd function
#   uses the mater list (AKA reference set) when performing enrichment
#   analysis usign fisher exact test, outputs the enrichemnt score/pvalue
#   and adjusted p-value for the categorical terms. This code was originally
#   produced for the dental informatics project. 
# issue: #9
# date created: 11/07/2017
# date last modified: 05/30/2020


## "01_KAO_Establishing_connection_to_db_extracting_timeStamp.R"
# description: Establishes DB connection using RSQLite package and 
#     fetches time stamp information for Raw files.
# Relevant Issue(s): 
# date created: 5/12/20
# date last modified: 5/12/20
# input:
#  - Covid-19 Study DB.sqlite
# output:
#  - 

## "X1_KAO_Updating_GC_keep_status_in_db.R"
# description: Sets 1:10 split GC files keep column to 0 (FALSE)
# date created: 5/13/20
# date last modified: 5/13/20
# input: 
# - Covid-19 Study DB.sqlite
# output: 
# - Covid-19 Study DB.sqlite (modified)

## "02_KAO_Runorder_correction_for_GC_metabolomics_data.R"
# description: Performs run order correction of the GC data and explores results.
#   this analysis is exploratory only and does not modify the db.
# date created: 5/13/20
# date last modified: 5/14/20
# input:
# - Covid-19 Study DB.sqlite

## "X2_KAO_GC_metabolomics_runtime_correction.R"
# description: Performs run order correction and modifies db. 
# date created: 5/14/20
# date last modified: 5/15/20
# input:
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite (modified metabolite_measurements table)

## "X3_KAO_Updating_GC_metabolite_tier_in_DB.R"
# description: extracts the mean tier information by molecule. This tier
#   information is useful for filtering out poor quality metabolites and 
#   is added to the sqlite db metadata table
# data created: 5/15/20
# date last modified: 5/15/20
# input:
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite (modified metadata table)

## "03_KAO_Exploring_GC_feature_quality.R"
# description: Explores 4 metrics of GC-metabolomics feature quality -
#   1) duplicate moleucles, 2) mean tier quality, 3) RSDs of QC sampels 
#   within and between batches, 4) dynamic range. 
# date created: 5/16/20
# date last modified: 5/27/20
# input:
# - Covid-19 Study DB.sqlite

## "X4_KAO_updating_biomolecules_keep_column_GC_metabolites.R"
# description: modifies DB to update metabolite keep column to 
#   denote features which should be excluded from downstream analysis.
#   5/27/20 added more filter - tier information. 
# CAUTION: script iterates and caution should be used when executing. 
# date created: 5/18/20
# date last modified: 5/27/20 
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite (biomolecules 'keep' colulmn updated)


## "04_KAO_Exploring_GC_data_after_feature_filtering.R"
# description: looks at GC data by PCA after features have been
# filtered.
# date created: 5/18/2020
# date last modified: 5/19/2020
# input: 
# - Covid-19 Study DB.sqlite

## "05_KAO_Batch_effects_in_lipidomics_data.R"
# description: looks at batch effect of lipidomics data 
#   and in doing so, catches an initial error in the db 
#   entries for lipidomics features due to the way features 
#   were named resulting in duplicate identifiers. I will 
#   work with Dain to update the lipidomics values. 
# date created 5/19/2020
# date last modified: 5/20/2020
# input:
# - Covid-19 Study DB.sqlite
# - Lipidomics/Lipidomics_quant_results/Final_Results.csv


## X5_KAO_creating_new_lipidomics_table_to_match_original.R
# description: In file 05_KAO_Batch_effects_in_lipidomics_data.R,
#   I found that the biomolecule ids did not match up across 
#   the tables in the data frame. This code creates a csv that 
#   looks like the lipidomics_measurements table, but with 
#   updated biomolecule ids (no duplicates) and batch correction
#   to the lipiomics data - run-time correction similar to the
#   GC metabolomics data. Lipid standardized names are also updated
#   in this document. 
# issue: #7
# date created: 5/20/2020
# date last modified: 5/26/2020
# input:
# - Covid-19 Study DB.sqlite
# - Lipidomics/Lipidomics_quant_results/Final_Results.csv
# output:
# -  "../../data/lipidomics_measurements_20200523.csv"
# - above csv file was used to modify db

## "X6_KAO_Creating_pvalues_table.R"
# description: this script contains a function for likelyhood ratio testing between
#   two linear regression models. This script runs this function on all metabolomics,
#   and proteomics measurements. creates a p_value and then a q_value. These data
#   were added to the databse as pvalues table.
# issue: #4
# date created: 5/26/2020
# date last modified: 6/16/2020
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite, added table pvalues 

## "06_KAO_exploring_pvalue_histograms.R
# description: this is an exploratory data analysis of pvalues 
# generated by LR test. This looks at the overall effect 
# of confounders. 
# date created: 5/27/2020
# date last modified: 5/27/2020
# input: 
# - Covid-19 Study DB.sqlite

## "X7_KAO_updating_metadata_biomolecule_id.R"
# description: updates metadata table with non-duplicate biomolecule ids 
#   for lipidomics features. Also see file:  "X5_KAO_creating_new_lipidomics
#   _table_to_match_original.R" 
# date created: 5/30/2020
# date last modified: 5/30/2020
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite, modified metadata biomolecule_id 

## "07_KAO_Exploring_lipidomics_feature_quality.R"
# description: explore aspects of lipidomics feature quality, updates
#   keep column in the biomolecules table. 
# issue: #7 
# date created: 5/30/2020
# date last modified: 6/1/2020
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite, modified biomolecules keep  

## "08_KAO_Crossomes_correlations.R"
# description: Generates a heatmap proteins x metabolites-lipids 
#   correlations > 0.4 or < -0.4 
# issue: #9
# date created: 6/2/2020
# date last modified: 6/11/2020
# input:
# - Covid-19 Study DB.sqlite 
# - 'P:/All_20200428_COVID_plasma_multiomics/Correlation/cor_4omes_kendall.RData'
# output:
# - "heatmap_cross_ome_correlations_kendall_KAO_v2.pdf

## 09_KAO_crossome_correltions_pearson.R
# description: Generates a heatmap proteins x metabolites-lipids with
#   significant pearson correlation coefficient. 
# issue #9
# date created: 6/3/2020
# date last modified: 6/3/2020
# input:
# - Covid-19 Study DB.sqlite 
# - 'P:/All_20200428_COVID_plasma_multiomics/Correlation/cor_4omes_pearson.RData'
# output:
# - "heatmap_cross_ome_correlations_pearson_KAO_v2.pdf

## X8_KAO_transcriptomics_table_upload.R
# desciptiom: appends transcriptomics data to the db. 
# date created: 6/4/2020
# date last modified: 6/5/2020
# input: 
# - Covid-19 Study DB.sqlite
# - 'P:/All_20200428_COVID_plasma_multiomics/Transcriptomics/genes.l2ec.no_hg.norm.tsv'
# - 'P:/All_20200428_COVID_plasma_multiomics/Transcriptomics/genes.ec.no_hg.norm.tsv'
# output: 
# - Covid-19 Study DB.sqlite modified to include transciptomics_runs and transcriptomics_measurements 

## 10_KAO_hospital_free_days_ANOVA_gelsolin.Rmd
# description: For the gelsolin story, wanted to explore the effect of confounders
#  on hostpital free days. 
# date created: 6/5/20
# date last modified: 6/5/20
# input: 
# - Covid-19 Study DB.sqlite

## 11_KAO_Looking_at_effect_of_DM_status.R
# description: This script uses linear regressaion with response factor of hospital 
#  free days at 45 to see if diabetes (DM) status has any effect. There does not apprear
#  to be any significant effect with diabetes.
# date create: 6/5/20
# date last modified: 6/5/20
# input:
# - Covid-19 Study DB.sqlite

## X9_KAO_Adding_Yuchens_pvalues_into_DB.R
# description: Yuchen performed analysis on HFD for each biomolecue. 
# linear regression stats - anova(lm(biomolecule abundance ~ Hopsital_free_days_45))
# These data are found in Rdata files in regression folder and were added to the 
# pvalues table. 
# issue: #4
# date create: 6/8/20
# date last modified: 6/17/20
# input:
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite modified pvalues table. 

## X10_KAO_adding_GO_terms_to_db.R
# description: Anji extracted GO terms based on uniprot ID. this script adds that 
#  data into the db. 
# date created: 6/8/20
# date last modified: 6/8/20
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite modified metadata table. 

## 12_KAO_GO_term_enrichment_for_significant_p_values.R
# description: GO term encrichmetn for significant p_values 
# date created: 6/9/20
# date last modified: 6/9/20
# input: 
# - Covid-19 Study DB.sqlite
# output: 
# - plots/

## 13_KAO_calculating_FC_for_COVID.R
# description: calculates FC for COVID vs. non-COVID in the same 
#   manner as Ian's webtool (also see dash/plots.py)
# date created: 6/25/20
# date last modified: 6/25/20
# input:
# - Covid-19 Study DB.sqlite
# output: 
# - data/COVID_fc_by_biomolecule_ID.csv

## 14_KAO_figure_2_version_1.R
# description: This is a script which is intended to combine different 
#  omes data into one (or multiple) biological stories. Presents high level
#  view of the data and does GO enrichment. 
# date created: 6/24/20
# date last modified: 7/9/20
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - plots/ 

## X11_Adding_GO_terms_for_transcripts_into_db.R
# decription: I used Uniprot to collect GO terms for transcripts. this script adds
#   those GO terms for biological processes into the db. 
# date created: 6/26/20
# date last modified: 6/26/20
# input:
# - "data/uniprot-genelist.tab", generated 2020-06-25 from uniprot webtool
# -  Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite with modified metadata table 

## 15_Volcano_plots_for_Trent_for_Fig3.R 
# description: Trent provided me a list of proteomics features that were important 
# to specific pathways (coagulation, etc), this script plots those features relative 
# the rest of the proteome in a volcano plot for COVID status. 
# date created: 6/27/20
# date last modified: 7/9/20
# input: 
# - "data/Proteins grouped for Fig 3 Volcano Plots.csv"
# - Covid-19 Study DB.sqlite
# - "data/COVID_fc_by_biomolecule_ID.csv"
# output:
# - plots/

## 16_KAO_merging_CD3.1_results_with_Lipidex_output.R 
# description: for supplementary table with unknown matches to CD3.1 searching. 
#   this script connects CD3.1 results by mz and RT to the lipids unknowns table.
# date created: 7/9/20
# date last modified: 7/9/20
# input:
# - "P:/All_20200428_COVID_plasma_multiomics/Lipidomics/CD3_all_discovery_metabolomics_filtered.csv"
# - Covid-19 Study DB.sqlite
# output:
# - "data/Sup_table_2_merge_unknowns.csv"

## 17_KAO_Dynamic_range_for_each_ome.R
# description: Plotting the distributions of each omic data set. 
# date created: 7/19/20
# date last modified: 7/19/20
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - plot 

## 18_KAO_Comparing_WHO_score_to_HFD.R
# description: A reviewer resquested we incorporate WHO ordianl score into the database. 
# This script looks at how the HFD-45 outcome metric compares to the WHO at 28 days. 
# date created: 8/31/20
# date last modified: 8/31/20
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - plot 

## X6_KAO_Creating_pvalues_table_response_to_reviewer.R
# description: A reviewer asked about the validity of linear regression models given
# that outliers can strongly effect the fits of models. This script provides additional 
# pvalue calculation using a robust linear regression and adds to the database
# date created: 8/31/20
# date last _modifed: 8/31/20
# input: 
# - Covid-19 Study DB.sqlite
# output:
# - Covid-19 Study DB.sqlite, modified pvalues table