b/tutorials/2_Labeling.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "43f4d50c-4e7b-4652-9701-be9366ff70c4",
+   "metadata": {},
+   "source": [
+    "# Labeling\n",
+    "\n",
+    "A core component of FEMR is labeling patients.\n",
+    "\n",
+    "Labels within FEMR follow the [label schema within MEDS](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L70).\n",
+    "\n",
+    "Per MEDS, each label consists of three attributes:\n",
+    "\n",
+    "* `patient_id` (int64): The identifier for the patient to predict on\n",
+    "* `prediction_time` (datetime.datetime): The timestamp for when the prediction should be made. This indicates what features are allowed to be used for prediction.\n",
+    "* `boolean_value` (bool): The target to predict\n",
+    "\n",
+    "Additional types of labels will be added to MEDS over time, and then supported here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "c6ac5c41-bc99-4731-ad82-7152274c67e1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import shutil\n",
+    "import os\n",
+    "\n",
+    "TARGET_DIR = 'trash/tutorial_2'\n",
+    "\n",
+    "if os.path.exists(TARGET_DIR):\n",
+    "    shutil.rmtree(TARGET_DIR)\n",
+    "\n",
+    "os.mkdir(TARGET_DIR)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e98dd85",
+   "metadata": {},
+   "source": [
+    "# Demonstration of some example labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8d9e2ccd-71c2-4ae0-897b-7ec022f9fdf4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/esteinberg/miniconda3/envs/debug_document_femr/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    }
+   ],
+   "source": [
+    "# We can construct these labels manually\n",
+    "\n",
+    "import femr.labelers\n",
+    "import datetime\n",
+    "import meds\n",
+    "\n",
+    "# Predict False on March 2nd, 1994\n",
+    "example_label = {'patient_id': 100, 'prediction_time': datetime.datetime(1994, 3, 2), 'boolean_value': False}\n",
+    "\n",
+    "# Predict True on March 2nd, 2009\n",
+    "example_label2 = {'patient_id': 100, 'prediction_time': datetime.datetime(2009, 3, 2), 'boolean_value': True}\n",
+    "\n",
+    "\n",
+    "# Multiple labels are stored using a list\n",
+    "labels = [example_label, example_label2]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e77b1bfc-8d2d-4f79-b855-f90b3a73736e",
+   "metadata": {},
+   "source": [
+    "# Generating labels programatically within FEMR\n",
+    "\n",
+    "One core feature of FEMR is the ability to algorithmically generate labels through the use of a labeling function class.\n",
+    "\n",
+    "The core for FEMR's labeling code is the abstract base class [Labeler](https://github.com/som-shahlab/femr/blob/main/src/femr/labelers/core.py#L40).\n",
+    "\n",
+    "Labeler has one abstract methods:\n",
+    "\n",
+    "```python\n",
+    "def label(self, patient: meds.Patient) -> List[meds.Label]:\n",
+    "    Generate a list of labels for a patient\n",
+    "```\n",
+    "\n",
+    "Note that the patient is assumed to be the [MEDS Patient schema](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L18).\n",
+    "\n",
+    "Once this method is implemented, the apply function becomes available for generating labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "9ac22dbe-ef34-468a-8ab3-673e58e5a920",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 3040.98 examples/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'patient_id': 100, 'prediction_time': datetime.datetime(1992, 7, 15, 0, 0), 'boolean_value': False}\n",
+      "{'patient_id': 101, 'prediction_time': datetime.datetime(1992, 8, 20, 0, 0), 'boolean_value': False}\n",
+      "{'patient_id': 102, 'prediction_time': datetime.datetime(1991, 4, 13, 0, 0), 'boolean_value': True}\n",
+      "{'patient_id': 103, 'prediction_time': datetime.datetime(1990, 10, 19, 0, 0), 'boolean_value': False}\n",
+      "{'patient_id': 104, 'prediction_time': datetime.datetime(1990, 6, 15, 0, 0), 'boolean_value': True}\n",
+      "{'patient_id': 105, 'prediction_time': datetime.datetime(1990, 6, 29, 0, 0), 'boolean_value': True}\n",
+      "{'patient_id': 106, 'prediction_time': datetime.datetime(1992, 5, 25, 0, 0), 'boolean_value': True}\n",
+      "{'patient_id': 107, 'prediction_time': datetime.datetime(1992, 5, 29, 0, 0), 'boolean_value': False}\n",
+      "{'patient_id': 108, 'prediction_time': datetime.datetime(1991, 10, 20, 0, 0), 'boolean_value': True}\n",
+      "{'patient_id': 109, 'prediction_time': datetime.datetime(1991, 6, 25, 0, 0), 'boolean_value': True}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from typing import List\n",
+    "import femr.pat_utils\n",
+    "import datasets\n",
+    "\n",
+    "class IsMaleLabeler(femr.labelers.Labeler):\n",
+    "    # Dummy labeler to predict gender at birth\n",
+    "    \n",
+    "    def label(self, patient: meds.Patient) -> List[meds.Label]:\n",
+    "        is_male = any('Gender/M' == measurement['code'] for event in patient['events'] for measurement in event['measurements'])\n",
+    "        return [{\n",
+    "            'patient_id': patient['patient_id'], \n",
+    "            'prediction_time': femr.pat_utils.get_patient_birthdate(patient),\n",
+    "            'boolean_value': is_male,\n",
+    "        }]\n",
+    "    \n",
+    "dataset = datasets.Dataset.from_parquet(\"input/meds/data/*\")\n",
+    "\n",
+    "labeler = IsMaleLabeler()\n",
+    "labeled_patients = labeler.apply(dataset)\n",
+    "\n",
+    "for i in range(10):\n",
+    "    print(labeled_patients[100 + i])\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "20bd7859",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We can use pyarrow to save these labels to a csv\n",
+    "import pyarrow\n",
+    "import pyarrow.csv\n",
+    "\n",
+    "table = pyarrow.Table.from_pylist(labeled_patients, schema=meds.label)\n",
+    "pyarrow.csv.write_csv(table, \"trash/tutorial_2/labels.csv\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}