# Labeling

A core component of FEMR is labeling patients.

Labels within FEMR follow the [label schema within MEDS](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L70).

Per MEDS, each label consists of three attributes:

* `patient_id` (int64): The identifier for the patient to predict on
* `prediction_time` (datetime.datetime): The timestamp at which the prediction should be made. This determines which features may be used for the prediction.
* `boolean_value` (bool): The target to predict

Additional types of labels will be added to MEDS over time, and then supported here.
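The three attributes listed above can be checked with a small helper before labels are passed on to other tools. The following is a minimal sketch using only the standard library; `REQUIRED_FIELDS` and `is_valid_label` are hypothetical names chosen for illustration and are not part of FEMR or MEDS.

```python
import datetime

# Hypothetical helper, not part of FEMR or MEDS: maps each required
# label field to the Python type this sketch expects for it.
REQUIRED_FIELDS = {
    "patient_id": int,
    "prediction_time": datetime.datetime,
    "boolean_value": bool,
}


def is_valid_label(label: dict) -> bool:
    """Return True if `label` has the three required fields with the expected types."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in label:
            return False
        value = label[name]
        # bool is a subclass of int in Python, so reject booleans used as patient ids.
        if expected_type is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected_type):
            return False
    return True


good = {"patient_id": 100, "prediction_time": datetime.datetime(1994, 3, 2), "boolean_value": False}
bad = {"patient_id": 100, "prediction_time": "1994-03-02", "boolean_value": False}

print(is_valid_label(good), is_valid_label(bad))
```

The check only covers Python-level types; it does not enforce the int64 range or any other constraint that the MEDS schema itself may impose.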

In [1]:
import shutil
import os

TARGET_DIR = 'trash/tutorial_2'

if os.path.exists(TARGET_DIR):
    shutil.rmtree(TARGET_DIR)

# makedirs also creates the parent 'trash' directory if it does not exist yet
os.makedirs(TARGET_DIR)

# Demonstration of some example labels

In [2]:
# We can construct these labels manually

import femr.labelers
import datetime
import meds

# Predict False on March 2nd, 1994
example_label = {'patient_id': 100, 'prediction_time': datetime.datetime(1994, 3, 2), 'boolean_value': False}

# Predict True on March 2nd, 2009
example_label2 = {'patient_id': 100, 'prediction_time': datetime.datetime(2009, 3, 2), 'boolean_value': True}


# Multiple labels are stored using a list
labels = [example_label, example_label2]


# Generating labels programmatically within FEMR

One core feature of FEMR is the ability to algorithmically generate labels through the use of a labeling function class.

The core for FEMR's labeling code is the abstract base class [Labeler](https://github.com/som-shahlab/femr/blob/main/src/femr/labelers/core.py#L40).

Labeler has one abstract method:

```python
def label(self, patient: meds.Patient) -> List[meds.Label]:
    """Generate a list of labels for a patient."""
```

Note that the patient is assumed to follow the [MEDS Patient schema](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L18).

Once `label` is implemented, the inherited `apply` method can be used to generate labels for an entire dataset.
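The relationship between the abstract `label` method and `apply` can be illustrated with a standalone sketch. This is not FEMR's implementation: `SketchLabeler` and `HasMultipleEventsLabeler` are hypothetical classes, patients are plain dictionaries, and the `time` key on each event is an assumption made for this example.

```python
import abc
import datetime
from typing import Any, Dict, Iterable, List


class SketchLabeler(abc.ABC):
    """Hypothetical stand-in for a labeler base class; not FEMR's implementation."""

    @abc.abstractmethod
    def label(self, patient: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Generate a list of labels for one patient."""

    def apply(self, patients: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Call `label` on every patient and concatenate the results."""
        all_labels: List[Dict[str, Any]] = []
        for patient in patients:
            all_labels.extend(self.label(patient))
        return all_labels


class HasMultipleEventsLabeler(SketchLabeler):
    """Toy labeler: at the first event, predict whether more events follow."""

    def label(self, patient: Dict[str, Any]) -> List[Dict[str, Any]]:
        events = patient["events"]
        if not events:
            # No events means there is no meaningful prediction time.
            return []
        return [{
            "patient_id": patient["patient_id"],
            "prediction_time": events[0]["time"],  # assumed key for this sketch
            "boolean_value": len(events) > 1,
        }]


patients = [
    {"patient_id": 1, "events": [
        {"time": datetime.datetime(2000, 1, 1), "measurements": []},
        {"time": datetime.datetime(2001, 1, 1), "measurements": []},
    ]},
    {"patient_id": 2, "events": []},
]

sketch_labels = HasMultipleEventsLabeler().apply(patients)
print(sketch_labels)
```

Subclasses only describe how to label a single patient; iterating over the dataset lives in the base class, so every labeler gets it for free.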

In [3]:
from typing import List
import femr.pat_utils
import datasets

class IsMaleLabeler(femr.labelers.Labeler):
    # Dummy labeler to predict gender at birth
    
    def label(self, patient: meds.Patient) -> List[meds.Label]:
        is_male = any(
            measurement['code'] == 'Gender/M'
            for event in patient['events']
            for measurement in event['measurements']
        )
        return [{
            'patient_id': patient['patient_id'], 
            'prediction_time': femr.pat_utils.get_patient_birthdate(patient),
            'boolean_value': is_male,
        }]
    
dataset = datasets.Dataset.from_parquet("input/meds/data/*")

labeler = IsMaleLabeler()
labeled_patients = labeler.apply(dataset)

for i in range(10):
    print(labeled_patients[100 + i])




{'patient_id': 100, 'prediction_time': datetime.datetime(1992, 7, 15, 0, 0), 'boolean_value': False}
{'patient_id': 101, 'prediction_time': datetime.datetime(1992, 8, 20, 0, 0), 'boolean_value': False}
{'patient_id': 102, 'prediction_time': datetime.datetime(1991, 4, 13, 0, 0), 'boolean_value': True}
{'patient_id': 103, 'prediction_time': datetime.datetime(1990, 10, 19, 0, 0), 'boolean_value': False}
{'patient_id': 104, 'prediction_time': datetime.datetime(1990, 6, 15, 0, 0), 'boolean_value': True}
{'patient_id': 105, 'prediction_time': datetime.datetime(1990, 6, 29, 0, 0), 'boolean_value': True}
{'patient_id': 106, 'prediction_time': datetime.datetime(1992, 5, 25, 0, 0), 'boolean_value': True}
{'patient_id': 107, 'prediction_time': datetime.datetime(1992, 5, 29, 0, 0), 'boolean_value': False}
{'patient_id': 108, 'prediction_time': datetime.datetime(1991, 10, 20, 0, 0), 'boolean_value': True}
{'patient_id': 109, 'prediction_time': datetime.datetime(1991, 6, 25, 0, 0), 'boolean_value': 




In [4]:
# We can use pyarrow to save these labels to a CSV file
import pyarrow
import pyarrow.csv

table = pyarrow.Table.from_pylist(labeled_patients, schema=meds.label)
# Write into the TARGET_DIR created in the first cell
pyarrow.csv.write_csv(table, os.path.join(TARGET_DIR, "labels.csv"))
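
Labels stored as CSV lose their Python types, so they need to be parsed again when read back. The sketch below shows one such round trip using only the standard library and an in-memory buffer. It writes its own CSV rather than reading the file produced above, and the ISO timestamp format and the `true`/`false` encoding are assumptions of this example; the file written by pyarrow may format these columns differently.

```python
import csv
import datetime
import io

labels = [
    {"patient_id": 100, "prediction_time": datetime.datetime(1994, 3, 2), "boolean_value": False},
    {"patient_id": 100, "prediction_time": datetime.datetime(2009, 3, 2), "boolean_value": True},
]

# Write the labels as text; newline="" lets the csv module control line endings.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["patient_id", "prediction_time", "boolean_value"])
writer.writeheader()
for label in labels:
    writer.writerow({
        "patient_id": label["patient_id"],
        "prediction_time": label["prediction_time"].isoformat(),
        "boolean_value": "true" if label["boolean_value"] else "false",
    })

# Read the rows back and convert each column to its original type.
buffer.seek(0)
restored = [
    {
        "patient_id": int(row["patient_id"]),
        "prediction_time": datetime.datetime.fromisoformat(row["prediction_time"]),
        "boolean_value": row["boolean_value"] == "true",
    }
    for row in csv.DictReader(buffer)
]

print(restored == labels)
```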