Diff of /eda/KAO/README.R [000000] .. [bd22c4]

Switch to side-by-side view

--- a
+++ b/eda/KAO/README.R
@@ -0,0 +1,338 @@
+##### README ###### 
+
+## "0_pathway_toolkit.R"
+# description: This script contains 2 function useful for pathway 
+#   analysis in R. Priciple is categorical terms are used to make 
+#   a master list of id-to-category relationships. Then 2nd function
+#   uses the mater list (AKA reference set) when performing enrichment
+#   analysis usign fisher exact test, outputs the enrichemnt score/pvalue
+#   and adjusted p-value for the categorical terms. This code was originally
+#   produced for the dental informatics project. 
+# issue: #9
+# date created: 11/07/2017
+# date last modified: 05/30/2020
+
+
+## "01_KAO_Establishing_connection_to_db_extracting_timeStamp.R"
+# description: Establishes DB connection using RSQLite package and 
+#     fetches time stamp information for Raw files.
+# Relevant Issue(s): 
+# date created: 5/12/20
+# date last modified: 5/12/20
+# input:
+#  - Covid-19 Study DB.sqlite
+# output:
+#  - 
+
+## "X1_KAO_Updating_GC_keep_status_in_db.R"
+# description: Sets 1:10 split GC files keep column to 0 (FALSE)
+# date created: 5/13/20
+# date last modified: 5/13/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output: 
+# - Covid-19 Study DB.sqlite (modified)
+
+## "02_KAO_Runorder_correction_for_GC_metabolomics_data.R"
+# description: Performs run order correction of the GC data and explores results.
+#   this analysis is exploratory only and does not modify the db.
+# date created: 5/13/20
+# date last modified: 5/14/20
+# input:
+# - Covid-19 Study DB.sqlite
+
+## "X2_KAO_GC_metabolomics_runtime_correction.R"
+# description: Performs run order correction and modifies db. 
+# date created: 5/14/20
+# date last modified: 5/15/20
+# input:
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite (modified metabolite_measurements table)
+
+## "X3_KAO_Updating_GC_metabolite_tier_in_DB.R"
+# description: extracts the mean tier information by molecule. This tier
+#   information is useful for filtering out poor quality metabolites and 
+#   is added to the sqlite db metadata table
+# data created: 5/15/20
+# date last modified: 5/15/20
+# input:
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite (modified metadata table)
+
+## "03_KAO_Exploring_GC_feature_quality.R"
+# description: Explores 4 metrics of GC-metabolomics feature quality -
+#   1) duplicate moleucles, 2) mean tier quality, 3) RSDs of QC sampels 
+#   within and between batches, 4) dynamic range. 
+# date created: 5/16/20
+# date last modified: 5/27/20
+# input:
+# - Covid-19 Study DB.sqlite
+
+## "X4_KAO_updating_biomolecules_keep_column_GC_metabolites.R"
+# description: modifies DB to update metabolite keep column to 
+#   denote features which should be excluded from downstream analysis.
+#   5/27/20 added more filter - tier information. 
+# CAUTION: script iterates and caution should be used when executing. 
+# date created: 5/18/20
+# date last modified: 5/27/20 
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite (biomolecules 'keep' colulmn updated)
+
+
+## "04_KAO_Exploring_GC_data_after_feature_filtering.R"
+# description: looks at GC data by PCA after features have been
+# filtered.
+# date created: 5/18/2020
+# date last modified: 5/19/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+
+## "05_KAO_Batch_effects_in_lipidomics_data.R"
+# description: looks at batch effect of lipidomics data 
+#   and in doing so, catches an initial error in the db 
+#   entries for lipidomics features due to the way features 
+#   were named resulting in duplicate identifiers. I will 
+#   work with Dain to update the lipidomics values. 
+# date created 5/19/2020
+# date last modified: 5/20/2020
+# input:
+# - Covid-19 Study DB.sqlite
+# - Lipidomics/Lipidomics_quant_results/Final_Results.csv
+
+
+## X5_KAO_creating_new_lipidomics_table_to_match_original.R
+# description: In file 05_KAO_Batch_effects_in_lipidomics_data.R,
+#   I found that the biomolecule ids did not match up across 
+#   the tables in the data frame. This code creates a csv that 
+#   looks like the lipidomics_measurements table, but with 
+#   updated biomolecule ids (no duplicates) and batch correction
+#   to the lipiomics data - run-time correction similar to the
+#   GC metabolomics data. Lipid standardized names are also updated
+#   in this document. 
+# issue: #7
+# date created: 5/20/2020
+# date last modified: 5/26/2020
+# input:
+# - Covid-19 Study DB.sqlite
+# - Lipidomics/Lipidomics_quant_results/Final_Results.csv
+# output:
+# -  "../../data/lipidomics_measurements_20200523.csv"
+# - above csv file was used to modify db
+
+## "X6_KAO_Creating_pvalues_table.R"
+# description: this script contains a function for likelyhood ratio testing between
+#   two linear regression models. This script runs this function on all metabolomics,
+#   and proteomics measurements. creates a p_value and then a q_value. These data
+#   were added to the databse as pvalues table.
+# issue: #4
+# date created: 5/26/2020
+# date last modified: 6/16/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite, added table pvalues 
+
+## "06_KAO_exploring_pvalue_histograms.R
+# description: this is an exploratory data analysis of pvalues 
+# generated by LR test. This looks at the overall effect 
+# of confounders. 
+# date created: 5/27/2020
+# date last modified: 5/27/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+
+## "X7_KAO_updating_metadata_biomolecule_id.R"
+# description: updates metadata table with non-duplicate biomolecule ids 
+#   for lipidomics features. Also see file:  "X5_KAO_creating_new_lipidomics
+#   _table_to_match_original.R" 
+# date created: 5/30/2020
+# date last modified: 5/30/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite, modified metadata biomolecule_id 
+
+## "07_KAO_Exploring_lipidomics_feature_quality.R"
+# description: explore aspects of lipidomics feature quality, updates
+#   keep column in the biomolecules table. 
+# issue: #7 
+# date created: 5/30/2020
+# date last modified: 6/1/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite, modified biomolecules keep  
+
+## "08_KAO_Crossomes_correlations.R"
+# description: Generates a heatmap proteins x metabolites-lipids 
+#   correlations > 0.4 or < -0.4 
+# issue: #9
+# date created: 6/2/2020
+# date last modified: 6/11/2020
+# input:
+# - Covid-19 Study DB.sqlite 
+# - 'P:/All_20200428_COVID_plasma_multiomics/Correlation/cor_4omes_kendall.RData'
+# output:
+# - "heatmap_cross_ome_correlations_kendall_KAO_v2.pdf
+
+## 09_KAO_crossome_correltions_pearson.R
+# description: Generates a heatmap proteins x metabolites-lipids with
+#   significant pearson correlation coefficient. 
+# issue #9
+# date created: 6/3/2020
+# date last modified: 6/3/2020
+# input:
+# - Covid-19 Study DB.sqlite 
+# - 'P:/All_20200428_COVID_plasma_multiomics/Correlation/cor_4omes_pearson.RData'
+# output:
+# - "heatmap_cross_ome_correlations_pearson_KAO_v2.pdf
+
+## X8_KAO_transcriptomics_table_upload.R
+# desciptiom: appends transcriptomics data to the db. 
+# date created: 6/4/2020
+# date last modified: 6/5/2020
+# input: 
+# - Covid-19 Study DB.sqlite
+# - 'P:/All_20200428_COVID_plasma_multiomics/Transcriptomics/genes.l2ec.no_hg.norm.tsv'
+# - 'P:/All_20200428_COVID_plasma_multiomics/Transcriptomics/genes.ec.no_hg.norm.tsv'
+# output: 
+# - Covid-19 Study DB.sqlite modified to include transciptomics_runs and transcriptomics_measurements 
+
+## 10_KAO_hospital_free_days_ANOVA_gelsolin.Rmd
+# description: For the gelsolin story, wanted to explore the effect of confounders
+#  on hostpital free days. 
+# date created: 6/5/20
+# date last modified: 6/5/20
+# input: 
+# - Covid-19 Study DB.sqlite
+
+## 11_KAO_Looking_at_effect_of_DM_status.R
+# description: This script uses linear regressaion with response factor of hospital 
+#  free days at 45 to see if diabetes (DM) status has any effect. There does not apprear
+#  to be any significant effect with diabetes.
+# date create: 6/5/20
+# date last modified: 6/5/20
+# input:
+# - Covid-19 Study DB.sqlite
+
+## X9_KAO_Adding_Yuchens_pvalues_into_DB.R
+# description: Yuchen performed analysis on HFD for each biomolecue. 
+# linear regression stats - anova(lm(biomolecule abundance ~ Hopsital_free_days_45))
+# These data are found in Rdata files in regression folder and were added to the 
+# pvalues table. 
+# issue: #4
+# date create: 6/8/20
+# date last modified: 6/17/20
+# input:
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite modified pvalues table. 
+
+## X10_KAO_adding_GO_terms_to_db.R
+# description: Anji extracted GO terms based on uniprot ID. this script adds that 
+#  data into the db. 
+# date created: 6/8/20
+# date last modified: 6/8/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite modified metadata table. 
+
+## 12_KAO_GO_term_enrichment_for_significant_p_values.R
+# description: GO term encrichmetn for significant p_values 
+# date created: 6/9/20
+# date last modified: 6/9/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output: 
+# - plots/
+
+## 13_KAO_calculating_FC_for_COVID.R
+# description: calculates FC for COVID vs. non-COVID in the same 
+#   manner as Ian's webtool (also see dash/plots.py)
+# date created: 6/25/20
+# date last modified: 6/25/20
+# input:
+# - Covid-19 Study DB.sqlite
+# output: 
+# - data/COVID_fc_by_biomolecule_ID.csv
+
+## 14_KAO_figure_2_version_1.R
+# description: This is a script which is intended to combine different 
+#  omes data into one (or multiple) biological stories. Presents high level
+#  view of the data and does GO enrichment. 
+# date created: 6/24/20
+# date last modified: 7/9/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - plots/ 
+
+## X11_Adding_GO_terms_for_transcripts_into_db.R
+# decription: I used Uniprot to collect GO terms for transcripts. this script adds
+#   those GO terms for biological processes into the db. 
+# date created: 6/26/20
+# date last modified: 6/26/20
+# input:
+# - "data/uniprot-genelist.tab", generated 2020-06-25 from uniprot webtool
+# -  Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite with modified metadata table 
+
+## 15_Volcano_plots_for_Trent_for_Fig3.R 
+# description: Trent provided me a list of proteomics features that were important 
+# to specific pathways (coagulation, etc), this script plots those features relative 
+# the rest of the proteome in a volcano plot for COVID status. 
+# date created: 6/27/20
+# date last modified: 7/9/20
+# input: 
+# - "data/Proteins grouped for Fig 3 Volcano Plots.csv"
+# - Covid-19 Study DB.sqlite
+# - "data/COVID_fc_by_biomolecule_ID.csv"
+# output:
+# - plots/
+
+## 16_KAO_merging_CD3.1_results_with_Lipidex_output.R 
+# description: for supplementary table with unknown matches to CD3.1 searching. 
+#   this script connects CD3.1 results by mz and RT to the lipids unknowns table.
+# date created: 7/9/20
+# date last modified: 7/9/20
+# input:
+# - "P:/All_20200428_COVID_plasma_multiomics/Lipidomics/CD3_all_discovery_metabolomics_filtered.csv"
+# - Covid-19 Study DB.sqlite
+# output:
+# - "data/Sup_table_2_merge_unknowns.csv"
+
+## 17_KAO_Dynamic_range_for_each_ome.R
+# description: Plotting the distributions of each omic data set. 
+# date created: 7/19/20
+# date last modified: 7/19/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - plot 
+
+## 18_KAO_Comparing_WHO_score_to_HFD.R
+# description: A reviewer resquested we incorporate WHO ordianl score into the database. 
+# This script looks at how the HFD-45 outcome metric compares to the WHO at 28 days. 
+# date created: 8/31/20
+# date last modified: 8/31/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - plot 
+
+## X6_KAO_Creating_pvalues_table_response_to_reviewer.R
+# description: A reviewer asked about the validity of linear regression models given
+# that outliers can strongly effect the fits of models. This script provides additional 
+# pvalue calculation using a robust linear regression and adds to the database
+# date created: 8/31/20
+# date last _modifed: 8/31/20
+# input: 
+# - Covid-19 Study DB.sqlite
+# output:
+# - Covid-19 Study DB.sqlite, modified pvalues table
\ No newline at end of file