{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f63519ce-ca5a-4f64-93a5-c5491b58d46b",
   "metadata": {},
   "source": [
    "<img src=\"images/RIINBRE-Logo.jpg\" width=\"400\" height=\"400\"><img src=\"images/MIC_Logo.png\" width=\"600\" height=\"600\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc55d591-83ea-4db3-9c26-23831e09008b",
   "metadata": {},
   "source": [
    "# Analysis of Biomedical Data for Biomarker Discovery\n",
    "<a id=\"top5\"></a>\n",
    "## Submodule 5: Rat Renal Ischemia Reperfusion Injury Case Study\n",
    "### Dr. Christopher L. Hemme\n",
    "### Director, [RI-INBRE Molecular Informatics Core](https://web.uri.edu/riinbre/mic/)\n",
    "### The University of Rhode Island College of Pharmacy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ca07ca3-7e42-42aa-ac12-36f85cd8cd4a",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56673148",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This Jupyter Notebook introduces a case study on rat renal ischemia-reperfusion injury (IRI) to explore biomarker discovery principles.  It provides background on kidney function, common biomarkers like serum creatinine (SCr) and blood urea nitrogen (BUN), and the mechanisms of renal IRI and its role in acute kidney injury (AKI).  The notebook also discusses the potential of treprostinil as a therapeutic agent for renal IRI.  The core of the notebook focuses on creating an \"experimental object\" in R, a structured list containing raw data, metadata, and placeholders for future analysis results.  It demonstrates loading SCr and BUN data, experimental metadata, and proteomic data into this object.  The proteomic data is preprocessed by removing rows with excessive missing values (using a custom `filtered_matrix` function) and log2 transformation. Finally, the populated experimental object is saved for use in subsequent analyses.  The notebook also includes embedded quizzes and references for further reading."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9ff7fac",
   "metadata": {},
   "source": [
    "## Learning Objectives\n",
    "\n",
    "1. **Understanding Renal Ischemia Reperfusion Injury (IRI):** The notebook introduces the biological context of renal IRI, its role in acute kidney injury (AKI), and the current challenges in treating it. It covers the physiological processes involved in IRI, including mitochondrial dysfunction and apoptosis.  It also discusses the potential of drug repurposing, specifically using Treprostinil, as a therapy for renal IRI.\n",
    "\n",
    "2. **Introduction to Biomarkers in Kidney Injury:** The notebook explains the role of established biomarkers like serum creatinine (SCr) and blood urea nitrogen (BUN) in assessing kidney function and the need for more specific biomarkers for identifying disease progression and types of kidney injuries.\n",
    "\n",
    "3. **Creating and Working with Experimental Objects in R:** A key objective is to teach how to create and populate an \"experimental object\" (an R list) to store and organize diverse data types associated with an experiment, including raw data, metadata, and analysis results.  This promotes consistent data handling and facilitates reproducible research.\n",
    "\n",
    "4. **Data Loading and Preprocessing in R:** The notebook demonstrates practical data manipulation skills in R, including:\n",
    "    * Loading data from CSV files using `read_csv`.\n",
    "    * Converting data types (e.g., to factors).\n",
    "    * Handling missing data by filtering rows with a custom function.\n",
    "    * Log2 transforming data for normalization.\n",
    "    * Subsetting and manipulating data frames.\n",
    "\n",
    "5. **Using R Packages for Data Science:** The notebook introduces and uses key R packages like `tidyverse` and `magrittr` for data manipulation and piping, illustrating their utility in data science workflows.\n",
    "\n",
    "6. **Saving R Objects:**  The notebook demonstrates how to save the created experimental object as an RDS file for later use, emphasizing good data management practices."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e9ee66e",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "1. **Create a Vertex AI Notebooks instance:** Choose your machine type and other configurations.\n",
    "2. **Select the R kernel:**  Make sure you choose the appropriate R environment when creating or opening the notebook.\n",
    "3. **Install R Packages:** The notebook itself handles the installation of the necessary R packages (`tidyverse`, `magrittr`, `IRdisplay`)\n",
    "4. **Data Storage:**  Upload your data to a Cloud Storage bucket and ensure your service account has access to read the data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fad852c-746f-4665-80ad-bf34dc68798d",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a21578cb-4a82-4409-8d0b-287086bb4d61",
   "metadata": {},
   "source": [
    "For this module, we will use as our case study a rat renal ischemia and reperfusion injury (IRI) model that consists of established metabolic biomarkers commonly used in clinical studies and proteomic data generated for the study.  This data for this module is provided by Dr. Nisanne Ghonem, Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island.  In this chapter we will explore the rat renal IRI model and load our initial data into a custom experimental object (in our case, an R list) which we will then use in later chapters for analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3440e627-8984-4f92-9637-41bd479222c0",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>&#9995; Tip:</b> Blue boxes will indicate helpful tips.</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de503954-faa3-4caf-a187-ade2d6ac46c7",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>&#127891; Note:</b> Used for interesting asides or notes.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d37299fd-ac17-44b8-8af3-64ab55d55559",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>&#9997; Reference:</b> This box indicates a reference for an attached figure or table.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59fff243-9787-4690-8924-ecf71b2c7fa9",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-danger\">\n",
    "<b>&#128721; Caution:</b> A red box indicates potential hazards or pitfalls you may encounter.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1a533d6-5fa4-48f2-bae3-23220b03d75f",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6d28305-05ea-462a-ab9b-fbe8948c116c",
   "metadata": {},
   "source": [
    "### Rat Renal IRI Model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f59ca0c-20ca-4bc6-b395-b350e3b0d89a",
   "metadata": {},
   "source": [
    "### Background"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f3bd6e7-1351-4750-a0ab-279f08838ab5",
   "metadata": {},
   "source": [
    "The kidney is a remarkable organ that carries out a variety of functions in vertebrates.  The primary functions of the kidney is to regulate various bodily fluids, to maintain electrolyte balance in the blood, and to filter toxins out of the blood.  For this reason, blood tests are commonly used to quickly and easily assay the functions of the kidneys and the liver.  Several serum biomarkers have been identified to indicate liver or kidney damage.  For kidney injury, two common serum biomarkers are serum creatinine (SCr) and blood urea nitrogen (BUN).  In healthy individuals, concentrations of SCr and BUN are 0.4-0.6 mg/dL and 8-18 mg/dL, respectively.  SCr concentrations >1 mg/dL are generally considered indicative of kidney dysfunction.  While these biomarkers are indicative of kidney injury, they alone do not indicate the type of injury that has occurred.  For this reason, there is an interest in identifying additional biomarkers that may be more precise in identifying disease progression and which may be more strongly correlated to specific types of kidney injuries or diseases."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06a8a369-3e45-4148-8b61-89f48786822b",
   "metadata": {},
   "source": [
    "<img src=\"images/Kidney_Disease.png\" width=\"600\" height=\"600\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1141ae67-f7cf-4c50-b086-69f6679fd5b7",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>&#9997; Reference:</b> Creatinine (<a href=\"https://pubchem.ncbi.nlm.nih.gov/compound/588\">PubChem CID:588</a>)<br>\n",
    "Created in <a href=\"https://biorender.com/\">BioRender</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac45ccd1-d337-47d3-a192-ecb8b7c429fe",
   "metadata": {},
   "source": [
    "### Renal Ischemia Reperfusion Injury (IRI) and Its Role in Acute Kidney Injury (AKI)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a4fe893-7d3c-4623-834f-c2d77640c9d5",
   "metadata": {},
   "source": [
    "Over time, kidneys can lose the ability to filter toxins from the blood.  This damage can be caused by a variety of factors, including chronic hypertension, e.g., high blood pressure, diabetes, and several kidney-specific diseases.  When the kidney has lost 90% of its function, the patient is said to be in end-stage renal disease and requires dialysis support to remove toxins from the blood.  At this point, kidney transplantation from a healthy donor is the only way to restore normal kidney function.  However, kidney transplantation itself puts great stress on the donor kidney.  In addition to the risks of rejection by the host immune system, a kidney from a healthy donor is temporarily removed from the supply of oxygenated blood (ischemia).  When the kidney graft is reattached in the new host, the sudden flow of oxygenated blood (reperfusion) can cause inflammation and oxidative stress to the transplanted organ, leading to acute kidney injury (AKI).  The inflammatory cascade in AKI can include increases in neutrophil and leukocyte activation and increased concentrations of reactive oxygen species, pro-inflammatory cytokines, and chemokines.  Severe cases of AKI can lead to mitochondrial dysfunction, apoptosis (programmed cell death), or total organ failure.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e860919f-20c4-4a54-be51-6ef8dcd54c56",
   "metadata": {},
   "source": [
    "<img src=\"images/Renal_IRI.png\" width=\"600\" height=\"600\"><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21f81fe5-7fe7-4926-a159-fdc055c362df",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>&#9997; Reference:</b> Created in <a href=\"https://biorender.com/\">BioRender</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e8a5cd8-b3d8-4506-8edc-80ef2e52b979",
   "metadata": {},
   "source": [
    "In healthy cells, mitochondria produce the majority of ATP using oxidative phosphorylation employ a variety of quality control functions to ensure a stable number of healthy mitochondria and adequate energy reserves.  While we often think of mitochondria as individual, isolated organelles, they actually form a complex network regulated through biogenesis (new organelles), fission (organelle division) and fusion (organelle combination).  This allows mitochondria to share resources, ensure a stable number of healthy mitochondria, and remove damaged mitochondria.  During ischaemia, ATP stores are depleted, resulting a variety of effects.  Mitochondrial network balance is disrupted as the lack of ATP inhibits mitochondria biogenesis, and the balance in the mitochondrial network shifts to fission resulting in fragmentation of the network which can impair kidney repair.  This in turn leads to mitochondrial outer membrane permeabilization (MOMP) which allows release of cytochrome c into the cytosol.  This, coupled with disruption of Na<sup>2</sup> gradients and increased concentrations of reactive oxygen species (ROS), triggers damage to the mitochondria and apoptosis (programmed cell death) of the renal cell. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3048b92-74c7-485b-9755-89d2af563e54",
   "metadata": {},
   "source": [
    "<img src=\"images/Mitochondria.png\" width=\"600\" height=\"600\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c757be3-3ba5-46f8-8385-9d14e2e2af4c",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>&#9997; Reference:</b> Created in <a href=\"https://biorender.com/\">BioRender</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f23edda2-6b9e-47d0-8db9-6fd7c7e7841e",
   "metadata": {},
   "source": [
    "### Drug Repurposing: Treprostinil as a Possible Therapy for Renal IRI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ff4ac1e-190f-40b4-a102-da41be44677b",
   "metadata": {},
   "source": [
    "At this time, there is no approved pharmacological treatment for renal IRI.  The development of new drugs is an expensive and arduous process, and gaining regulatory approval adds more time and expense to the process.  For this reason, many researchers search for currently available drugs which have been FDA approved for use and which can be repurposed to treat conditions for which they weren't originally designed.\n",
    "\n",
    "Prostacyclin (PGI<sub>2</sub>) is a member of the prostaglandin family and is a potent vasodilator.  PGI<sub>2</sub> increases blood flow during renal failure, preserves intrarenal oxygenation during AKI, reduces pro-inflammatory cytokines and adhesion molecules during AKI, and inhibits AKI-induced proximal tubular cell apoptosis.  For this reason, PGI<sub>2</sub> analogs have been tested as possible therapeutic treatments for reducing renal IRI.  One such drug, treprostinil (Remodulin®), is FDA approved, chemically stable at room temperature, and has an increased potency and longer elimination half-life than other commercially available PGI<sub>2</sub> analogs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c665aadc-45cf-42e8-aaa6-491210722691",
   "metadata": {},
   "source": [
    "<img src=\"images/prosticyclin_treprostinil.png\" width=\"500\" height=\"500\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "208f678e-d02d-48f9-b930-ac6aefd212c7",
   "metadata": {},
   "source": [
    "In this chapter, we will utilize rat renal IRI data generated by Dr. Ghonem's laboratory, which explores the use of treprostinil as a therapeutic agent for treatment of renal IRI.  The data includes the quantitative biomarkers SCr and BUN, which are well-established biomarkers for kidney injury, and will also use proteomic data to identify potential biomarkers in the kidney cells themselves.  We will use a variety of methods to compare the biomarkers including exploratory analysis (PCA, clustering, heatmaps, etc.), linear and logistic regression, and machine learning.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6d039d9-1aa5-47ff-8c77-36e4b36edfc2",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>&#9997; Reference:</b> Creatinine (<a href=\"https://pubchem.ncbi.nlm.nih.gov/compound/588\">PubChem CID:588</a>)<br>\n",
    "Created in <a href=\"https://biorender.com/\">BioRender</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c84cba8d-36fe-4572-9074-83fbbd9b4fa7",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0155e8c7-2f00-409a-b61a-cb98e3dd5142",
   "metadata": {},
   "source": [
    "## Creating an Experimental Object"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59f02050-fbf0-4332-8c0d-ee8ef921cf8d",
   "metadata": {},
   "source": [
    "### What is an Experimental Object?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86c09f9b-5a16-438d-b331-708820d81cd4",
   "metadata": {},
   "source": [
    "A scientific experiment includes a variety of different types of data, which may include:\n",
    "<ul>\n",
    "    <li>Generated experimental data</li>\n",
    "    <li>Normalized or batch corrected data</li>\n",
    "    <li>Metadata (also called phenotypic data) about the experiment</li>\n",
    "    <li>Experimental protocols (e.g. instrumentation used to generate the data, dates data was generated, etc.)</li>\n",
    "    <li>Results of analyses (e.g. regression tables, etc.)</li>\n",
    "    <li>Plots, graphs, and other visualizations</li>\n",
    "</ul>\n",
    "\n",
    "When working with related data, it is good practice to collect the data into a single data structure that we will call an experimental object.  By collecting our experimental data into an experimental object, we ensure that it's in a consistently formatted data structure and that multiple experimental objects can be analyzed in the same way (such as in an automated pipeline) which minimizes human error.  You can even go one step further and collect related experimental objects into a project-type data structure (similar to the NCBI BioProjects concept).  BioConductor offers custom data structures such as the <i>SummarizedExperiment</i> object, which is designed for omics-style data.  On its most basic level though, an experimental object is just a simple list.  Using a list allows us to customize how our experimental object is organized.  For example, we might want a simple experimental list that only contains the raw data and metadata, or we might want a complex list that holds all of the experimental output of our analyses.  As with all lists, it is up to you to remember how the data is organized."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ca0af03-49ff-403a-960a-3f3fd8687a4c",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>&#127891; Note:</b> For those of you who are programmers, our experimental object is not strictly speaking an object as you would use in object-oriented programming, but it's conceptually the same idea in that we're trying to gather related data into a single collection.  There's no reason, though, why you couldn't modify the experimental object into a true OOP-style object. \n",
    "</div>"
   ]
  },
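  {
   "cell_type": "markdown",
   "id": "gr-sketch-s3-object",
   "metadata": {},
   "source": [
    "As a minimal sketch of that last point, a plain list can be given an S3 class with a constructor and a print method.  The names `new_exp_obj` and `toy_obj`, the class name, and the values are hypothetical and are not used elsewhere in this module.\n",
    "\n",
    "```r\n",
    "# Hypothetical constructor: wraps a plain list in an S3 class\n",
    "new_exp_obj <- function(biomarkers = NA, metadata = NA) {\n",
    "  obj <- list(\n",
    "    data = list(biomarkers = biomarkers),\n",
    "    metadata = metadata\n",
    "  )\n",
    "  class(obj) <- \"exp_obj\"\n",
    "  obj\n",
    "}\n",
    "\n",
    "# Print method: dispatched automatically when an object of class exp_obj is printed\n",
    "print.exp_obj <- function(x, ...) {\n",
    "  cat(\"Experimental object with data slots:\", paste(names(x$data), collapse = \", \"), \"\\n\")\n",
    "  invisible(x)\n",
    "}\n",
    "\n",
    "# Toy values, for illustration only\n",
    "toy_obj <- new_exp_obj(biomarkers = data.frame(SCr = c(0.5, 1.2)))\n",
    "print(toy_obj)\n",
    "```\n",
    "\n",
    "The rest of this notebook keeps the plain list, which is enough for our purposes."
   ]
  },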
  {
   "cell_type": "markdown",
   "id": "82277579-f0d3-448a-ac63-3d27cf74f045",
   "metadata": {},
   "source": [
    "We'll begin by loading some required packages.  *tidyverse* is a series of packages built around the idea of tidy data and is often used in data science to complement or replace base R.  *magittr* is a package that implements piping functionality, that is, it allows functions to be chained together such that the output for one operation becomes the input for a second.  This greatly simplifies many operations in R such as data transformation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e37d2199-b5e4-405f-9ba3-eebfd20097cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "packages <- c(\"tidyverse\", \"magrittr\")\n",
    "installed_packages <- packages %in% rownames(installed.packages())\n",
    "if (any(installed_packages == FALSE)) {install.packages(packages[!installed_packages])}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5180b1d-1f75-4378-b5be-d2f82292da97",
   "metadata": {},
   "outputs": [],
   "source": [
    "require(\"tidyverse\")\n",
    "require(\"magrittr\")"
   ]
  },
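  {
   "cell_type": "markdown",
   "id": "gr-sketch-pipe",
   "metadata": {},
   "source": [
    "To make the piping idea concrete, here is a small sketch using a toy vector that is not part of the case study data.  The nested call and the piped call below perform the same three operations; only the reading order differs.\n",
    "\n",
    "```r\n",
    "library(magrittr)\n",
    "\n",
    "# Toy values, for illustration only\n",
    "x <- c(1, 4, 9, 16)\n",
    "\n",
    "# Nested call: read from the inside out\n",
    "nested <- round(mean(sqrt(x)), 1)\n",
    "\n",
    "# Piped call: read from left to right\n",
    "piped <- x %>% sqrt() %>% mean() %>% round(1)\n",
    "\n",
    "# Both forms give the same result\n",
    "identical(nested, piped)\n",
    "```"
   ]
  },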
  {
   "cell_type": "markdown",
   "id": "f54e4e22-5f1a-4bda-b564-0596d463f8f2",
   "metadata": {},
   "source": [
    "Let's create a structure for our list that we can later populate with data.  You can examine the structure of any R data type using the <i>str</i> function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "098c13b5-718b-4494-855c-23d1b6cf0239",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build empty experimental object list\n",
    "exp_obj <- list(\n",
    "    data = list(\n",
    "        biomarkers = NA,\n",
    "        proteomics = list(\n",
    "            log2 = NA,\n",
    "            norm = NA\n",
    "        )\n",
    "    ),\n",
    "    metadata = NA\n",
    ")\n",
    "# Examine the structure of the list\n",
    "str(exp_obj)"
   ]
  },
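  {
   "cell_type": "markdown",
   "id": "gr-sketch-list-access",
   "metadata": {},
   "source": [
    "The sketch below shows two ways to reach a nested slot in a list of this shape.  It uses a separate toy list (`toy`, a hypothetical name) so that the experimental object built above is left untouched.\n",
    "\n",
    "```r\n",
    "# Toy list with the same shape as a nested experimental object\n",
    "toy <- list(\n",
    "  data = list(\n",
    "    biomarkers = NA,\n",
    "    proteomics = list(log2 = NA, norm = NA)\n",
    "  ),\n",
    "  metadata = NA\n",
    ")\n",
    "\n",
    "# Fill a placeholder by walking the names with `$`\n",
    "toy$data$proteomics$log2 <- matrix(1:4, nrow = 2)\n",
    "\n",
    "# `[[` with a character vector indexes recursively, which helps when\n",
    "# the slot names are stored in variables\n",
    "log2_slot <- toy[[c(\"data\", \"proteomics\", \"log2\")]]\n",
    "dim(log2_slot)\n",
    "\n",
    "# A slot that is still a single NA has not been populated yet\n",
    "is.na(toy$metadata)\n",
    "```"
   ]
  },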
  {
   "cell_type": "markdown",
   "id": "bf2fa03c-5fd5-4c86-b5de-7c77e2927d56",
   "metadata": {},
   "source": [
    "Now we'll start loading data.  First, we will load data for two continuous variable biomarkers, serum creatinine (SCr) and blood urea nitrogen (BUN).  Second, we will load the experiment metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88ff3915-4c10-40de-85ae-bc20c6b19a4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load biomarker data\n",
    "\n",
    "# Download data directory from storage bucket\n",
    "system(\"gsutil cp -r gs://nigms-sandbox/nosi-uri/data .\")\n",
    "\n",
    "exp_obj$data$biomarkers <- read_csv(\"data/Renal_IRI_Proteomics/IRI_Biomarkers.csv\", show_col_types=FALSE)\n",
    "# Load metadata and convert covariate columns to factors\n",
    "exp_obj$metadata <- read_csv(\"data/Renal_IRI_Proteomics/metadata.csv\", col_types=\"cfff\")\n",
    "# Reset levels for the Treatment covariate\n",
    "exp_obj$metadata$Treatment <- factor(exp_obj$metadata$Treatment, levels=c('CTRL', 'SHAM', 'PLB', 'TRE'))\n",
    "str(exp_obj)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6b51d9f-04a2-4a03-92b8-3d2b49fddb8e",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "    <b>&#9995; Tip:</b> <i>read_csv</i> is the <i>readr</i> version of the base R <i>read.csv</i> function.  <i>read_csv</i> reads comma separated data into a tibble and tries to predict the column data types.  In our case, most of our columns will be covariates in regression models and will need to be assigned as factors, so we manually do that with the <i>col_types</i> argument (\"cfff\" means \"character factor factor factor\" for our four columns).</div>"
   ]
  },
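  {
   "cell_type": "markdown",
   "id": "gr-sketch-col-types",
   "metadata": {},
   "source": [
    "A small sketch of how the compact <i>col_types</i> string and factor levels behave.  The file contents and the column names `Sample` and `Batch` are made up for illustration and do not describe the real metadata file.\n",
    "\n",
    "```r\n",
    "library(readr)\n",
    "\n",
    "# Write a toy CSV to a temporary file (hypothetical columns and values)\n",
    "toy_file <- tempfile(fileext = \".csv\")\n",
    "writeLines(c(\"Sample,Treatment,Batch\", \"S1,TRE,1\", \"S2,CTRL,1\", \"S3,PLB,2\"), toy_file)\n",
    "\n",
    "# \"cff\": first column character, remaining two columns factors\n",
    "toy_meta <- read_csv(toy_file, col_types = \"cff\")\n",
    "\n",
    "# With no levels supplied, factor levels follow order of appearance in the file\n",
    "levels(toy_meta$Treatment)\n",
    "\n",
    "# Resetting the levels puts the control group first\n",
    "toy_meta$Treatment <- factor(toy_meta$Treatment, levels = c(\"CTRL\", \"PLB\", \"TRE\"))\n",
    "levels(toy_meta$Treatment)\n",
    "```\n",
    "\n",
    "The order of the levels matters later because modelling functions such as `lm` treat the first level of a factor as the reference group by default."
   ]
  },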
  {
   "cell_type": "markdown",
   "id": "95617da8-db45-4c62-a587-faf0ba9878b8",
   "metadata": {},
   "source": [
    "Next we will load the proteomics data so that we can save it for later.  R can be picky about how columns are named, so to avoid any issues, we will load sample names that will keep R happy.  We will also remove some of the extra columns from our data frame so that we only have the <b>Genes</b> and sample columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b49e2a0c-e296-443c-ab33-c5680de99e5e",
   "metadata": {},
   "outputs": [],
   "source": [
    "prot_df <- read_csv(\"data/Renal_IRI_Proteomics/IRI_Proteomics_2019.csv\", show_col_types=FALSE)\n",
    "sample_names <- readRDS(file = \"data/Renal_IRI_Proteomics/IRI_sample_names.rds\")\n",
    "prot_df <- prot_df[, c(\"Genes\", sample_names)]\n",
    "prot_df <- prot_df %>%\n",
    "   mutate(across(all_of(sample_names), as.numeric))\n",
    "prot_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae19c0b7-a2c7-4df8-8b31-eb320e07deec",
   "metadata": {},
   "source": [
    "Not every row will contain useful information.  Missing data is always a problem, so we will remove low information rows using a custom function created for this analysis called <i>filtered_matrix</i>.  The function will take our original matrix, a cutoff value (between 0 and 100, in our case 80) and our id column (<b>Genes</b>) and will return a matrix with rows removed that contain more than 80% missing data.  In this way, we can set the stringency of our filtering using a single function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61cbeac2-dc9e-4e4c-8aba-b841637038c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "filtered_matrix <- function(data_df, data_cols, cutoff, id_col) {\n",
    "    \n",
    "  data_matrix <- function(data_df, id_col) {\n",
    "    data_mat <- as.matrix(data_df[-1])\n",
    "    rownames(data_mat) <- data_df[, id_col][[1]]\n",
    "    data_mat\n",
    "  }\n",
    "    \n",
    "  data_df[data_df == 0] <- NA\n",
    "  filtered_rows <- 100*rowSums(is.na(data_df[, data_cols]))/length(data_cols) > cutoff\n",
    "  data_mat <- data_matrix(data_df[!filtered_rows,], id_col)\n",
    "  data_mat\n",
    "}"
   ]
  },
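  {
   "cell_type": "markdown",
   "id": "gr-sketch-missing-filter",
   "metadata": {},
   "source": [
    "To see what the cutoff does, the sketch below repeats the percent-missing arithmetic by hand on a toy table.  The gene names, sample names, and values are hypothetical; the point is only that a row is dropped when its missing percentage is strictly greater than the cutoff.\n",
    "\n",
    "```r\n",
    "# Toy abundance table (hypothetical values); zeros are treated as missing\n",
    "toy_df <- data.frame(\n",
    "  Genes = c(\"GeneA\", \"GeneB\", \"GeneC\"),\n",
    "  S1 = c(10, 0, NA),\n",
    "  S2 = c(12, 0, NA),\n",
    "  S3 = c(11, 5, NA),\n",
    "  S4 = c(NA, 0, NA),\n",
    "  S5 = c(9, 0, NA)\n",
    ")\n",
    "sample_cols <- c(\"S1\", \"S2\", \"S3\", \"S4\", \"S5\")\n",
    "\n",
    "# Recode zeros as NA in the sample columns only\n",
    "vals <- as.matrix(toy_df[, sample_cols])\n",
    "vals[!is.na(vals) & vals == 0] <- NA\n",
    "\n",
    "# Percentage of missing values per row\n",
    "pct_missing <- 100 * rowSums(is.na(vals)) / length(sample_cols)\n",
    "names(pct_missing) <- toy_df$Genes\n",
    "pct_missing\n",
    "\n",
    "# Keep rows at or below the cutoff; only rows strictly above it are dropped\n",
    "keep <- pct_missing <= 80\n",
    "toy_df$Genes[keep]\n",
    "```\n",
    "\n",
    "Here GeneB sits exactly at 80% missing and is kept, while GeneC is entirely missing and is removed."
   ]
  },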
  {
   "cell_type": "markdown",
   "id": "d03a42f4-ce36-4719-8c96-19f128e8ab23",
   "metadata": {},
   "source": [
    "Finally, we will log2 transform the data.  Log transformation is common for proteomics and metabolomics data because the concentrations of individual features (e.g. proteins) can vary across a wide range and often don't follow a normal distribution.  We will perform additional normalization of the data in the proteomics chapters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "962cdf6c-944f-4f17-ba09-ea997643ef17",
   "metadata": {},
   "outputs": [],
   "source": [
    "prot_mat <- filtered_matrix(prot_df, sample_names, 80, \"Genes\")\n",
    "prot_mat <- log2(prot_mat)\n",
    "prot_mat"
   ]
  },
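  {
   "cell_type": "markdown",
   "id": "gr-sketch-log2",
   "metadata": {},
   "source": [
    "The effect of the log2 transformation can be seen on a few toy intensities (hypothetical values chosen so that each is 16 times the previous one).\n",
    "\n",
    "```r\n",
    "# Toy intensities spanning several orders of magnitude\n",
    "intensity <- c(250, 4000, 64000, 1024000)\n",
    "\n",
    "# On the raw scale the largest value dominates the range\n",
    "range(intensity)\n",
    "\n",
    "# On the log2 scale, a difference of 1 corresponds to a doubling,\n",
    "# so equal fold changes become equal steps\n",
    "log2_intensity <- log2(intensity)\n",
    "diff(log2_intensity)\n",
    "\n",
    "# Missing values stay missing after the transformation\n",
    "log2(c(8, NA))\n",
    "\n",
    "# A zero becomes -Inf, so zeros need to be handled before transforming\n",
    "log2(0)\n",
    "```"
   ]
  },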
  {
   "cell_type": "markdown",
   "id": "e1b489bd-8831-4961-8fde-926367488955",
   "metadata": {},
   "source": [
    "Let's store our log2 transformed data and take one last look at our final experimental object one more time before we finish."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "814dba75-4ed0-45ab-9f95-4955ef8732c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "exp_obj$data$proteomics$log2 <- prot_mat\n",
    "str(exp_obj)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f65b213-ed37-4c7e-a12f-564c760df4eb",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09837367-9b84-4456-bd86-fa1b20393335",
   "metadata": {},
   "source": [
    "Now let's save our experimental object as a file so that we can load it in later chapters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c1024fc-c082-4f41-a642-71ee587085f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "system(\"mkdir ./data/Saved_Data\")\n",
    "saveRDS(exp_obj, file = \"data/Saved_Data/exp_obj.rds\")"
   ]
  },
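  {
   "cell_type": "markdown",
   "id": "gr-sketch-rds-roundtrip",
   "metadata": {},
   "source": [
    "As a quick check of what an RDS file preserves, the sketch below saves and reloads a toy list using a temporary file, so nothing in the data directory is touched.  The object and its values are hypothetical.\n",
    "\n",
    "```r\n",
    "# Toy object containing a numeric vector and a factor\n",
    "toy_obj <- list(\n",
    "  data = list(values = c(1.5, 2.5)),\n",
    "  metadata = factor(c(\"CTRL\", \"TRE\"))\n",
    ")\n",
    "rds_path <- tempfile(fileext = \".rds\")\n",
    "\n",
    "# Write the object to disk and read it back\n",
    "saveRDS(toy_obj, file = rds_path)\n",
    "restored <- readRDS(rds_path)\n",
    "\n",
    "# Structure and types, including factor levels, survive the round trip\n",
    "identical(toy_obj, restored)\n",
    "```\n",
    "\n",
    "Unlike a CSV export, the reloaded object does not need its column types or factor levels to be set again."
   ]
  },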
  {
   "cell_type": "markdown",
   "id": "96715b74-d62d-4ed1-9480-d7190ece8a39",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dfcf247",
   "metadata": {},
   "source": [
    "<p><span style=\"font-size: 30px\"><b>Quizzes</b></span> <span style=\"float : inline;\">(run the command below to display the quizzes)</span> </p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afa3a195",
   "metadata": {},
   "outputs": [],
   "source": [
    "IRdisplay::display_html('<iframe src=\"quizes/Chapter5_Quizes.html\" width=100% height=450></iframe>')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b95072cf",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca87a632",
   "metadata": {},
   "outputs": [],
   "source": [
    "sessionInfo()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac5169b0",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac5a6453",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This notebook introduced the biological context of renal ischemia reperfusion injury (IRI), its implications in acute kidney injury (AKI), and the potential of treprostinil as a therapeutic intervention.  We established a foundational understanding of the mechanisms involved in renal IRI, highlighting the roles of mitochondrial dysfunction, inflammation, and oxidative stress.  Furthermore, we constructed a custom experimental object in R, efficiently organizing the biomarker and proteomic data, along with essential metadata, which will streamline downstream analyses in subsequent chapters.  This structured approach ensures data consistency and reproducibility as we explore the data using various statistical and machine learning techniques to identify potential biomarkers and evaluate the efficacy of treprostinil in mitigating renal IRI."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eae1b2ea",
   "metadata": {},
   "source": [
    "## Clean up\n",
    "\n",
    "Remember to move to the next notebook or shut down your instance if you are finished."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c1808d8-c522-463a-8cc6-eac5fd2290d2",
   "metadata": {},
   "source": [
    "<div style=\"display: flex; justify-content: center; margin-top: 20px; width: 100%;\"> \n",
    "    <div style=\"display: flex; justify-content: space-between; width: 50%;\"> \n",
    "        <div> \n",
    "            <a href=https://github.com/NIGMS/Analysis-of-Biomedical-Data-for-Biomarker-Discovery/blob/master/GoogleCloud/Submodule04_Intro_to_Exploratory_Analysis.ipynb#overview>Previous section</a>                                            \n",
    "        </div> \n",
    "        <div> \n",
    "            <a href=\"#top5\">Top of this page</a>                                                      \n",
    "        </div> \n",
    "        <div> \n",
    "            <a href=https://github.com/NIGMS/Analysis-of-Biomedical-Data-for-Biomarker-Discovery/blob/master/GoogleCloud/Submodule06_Linear_and_Logistic_Regression_Biomarkers.ipynb#overview>Next section</a>\n",
    "        </div> \n",
    "    </div>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99f80099-1044-4d04-a212-5d84b4a22d15",
   "metadata": {},
   "source": [
    "## References"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8b4a3ad-acf8-4c8b-94f2-11f2c0c30284",
   "metadata": {},
   "source": [
    "[Shiva N, Sharma N, Kulkarni YA, Mulay SR, Gaikwad AB. Renal ischemia/reperfusion injury: An insight on in vitro and in vivo models. Life Sci. 2020 Sep 1;256:117860. doi: 10.1016/j.lfs.2020.117860. Epub 2020 Jun 11. PMID: 32534037.][shiva]<br>\n",
    "[Hou J, Tolbert E, Birkenbach M, Ghonem NS. Treprostinil alleviates hepatic mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Nov;143:112172. doi: 10.1016/j.biopha.2021.112172. Epub 2021 Sep 21. PMID: 34560548; PMCID: PMC8550798.][hou]<br>\n",
    "[Ding M, Tolbert E, Birkenbach M, Gohh R, Akhlaghi F, Ghonem NS. Treprostinil reduces mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Sep;141:111912. doi: 10.1016/j.biopha.2021.111912. Epub 2021 Jul 15. PMID: 34328097; PMCID: PMC8429269.][ding]<br>\n",
    "[Mayo Clinic Kidney Transplant FAQs][mayo]<br>\n",
    "[Malek M, Nematbakhsh M. Renal ischemia/reperfusion injury; from pathophysiology to treatment. J Renal Inj Prev. 2015 Jun 1;4(2):20-7. doi: 10.12861/jrip.2015.06. PMID: 26060833; PMCID: PMC4459724.][malek]<br>\n",
    "[Tang C, Cai J, Yin XM, Weinberg JM, Venkatachalam MA, Dong Z. Mitochondrial quality control in kidney injury and repair. Nat Rev Nephrol. 2021 May;17(5):299-318. doi: 10.1038/s41581-020-00369-0. Epub 2020 Nov 24. PMID: 33235391; PMCID: PMC8958893.][tang]\n",
    "\n",
    "[ding]: https://pubmed.ncbi.nlm.nih.gov/34328097/ \"Ding M, Tolbert E, Birkenbach M, Gohh R, Akhlaghi F, Ghonem NS. Treprostinil reduces mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Sep;141:111912. doi: 10.1016/j.biopha.2021.111912. Epub 2021 Jul 15. PMID: 34328097; PMCID: PMC8429269.\"\n",
    "[hou]: https://pubmed.ncbi.nlm.nih.gov/34560548/ \"Hou J, Tolbert E, Birkenbach M, Ghonem NS. Treprostinil alleviates hepatic mitochondrial injury during rat renal ischemia-reperfusion injury. Biomed Pharmacother. 2021 Nov;143:112172. doi: 10.1016/j.biopha.2021.112172. Epub 2021 Sep 21. PMID: 34560548; PMCID: PMC8550798.\"\n",
    "[shiva]: https://pubmed.ncbi.nlm.nih.gov/32534037/ \"Shiva N, Sharma N, Kulkarni YA, Mulay SR, Gaikwad AB. Renal ischemia/reperfusion injury: An insight on in vitro and in vivo models. Life Sci. 2020 Sep 1;256:117860. doi: 10.1016/j.lfs.2020.117860. Epub 2020 Jun 11. PMID: 32534037.\"\n",
    "[mayo]: https://www.mayoclinic.org/tests-procedures/kidney-transplant/about/pac-20384777 \"Mayo Clinic Kidney Transplant FAQs\"\n",
    "[malek]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4459724/ \"Malek M, Nematbakhsh M. Renal ischemia/reperfusion injury; from pathophysiology to treatment. J Renal Inj Prev. 2015 Jun 1;4(2):20-7. doi: 10.12861/jrip.2015.06. PMID: 26060833; PMCID: PMC4459724.\"\n",
    "[tang]: https://pubmed.ncbi.nlm.nih.gov/33235391/ \"Tang C, Cai J, Yin XM, Weinberg JM, Venkatachalam MA, Dong Z. Mitochondrial quality control in kidney injury and repair. Nat Rev Nephrol. 2021 May;17(5):299-318. doi: 10.1038/s41581-020-00369-0. Epub 2020 Nov 24. PMID: 33235391; PMCID: PMC8958893.\""
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R (Local)",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.4.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}