This README_pm_transloc_data.txt file was generated on 02-14-2022 by Christopher S. Dunham

GENERAL INFORMATION

1. Dataset Title: Data from: Pacemaker translocations and power laws in 2D stem cell-derived cardiomyocyte cultures.

2. Author Information:
        Corresponding Author
                Name: Christopher S. Dunham
                Institution: University of California, Los Angeles
                Email: csdunham@chem.ucla.edu

        Co-author 1
                Name: Madelynn E. Mackenzie
                Institution: University of California, Los Angeles

        Co-author 2
                Name: Dr. Haruko Nakano
                Institution: University of California, Los Angeles

        Co-author 3
                Name: Alexis R. Kim
                Institution: University of California, Los Angeles

        Co-author 4
                Name: Michal B. Juda
                Institution: University of California, Los Angeles

        Co-author 5
                Name: Prof. Atsushi Nakano
                Institution: University of California, Los Angeles

        Co-author 6 (and Corresponding Author 2)
                Name: Dr. Adam Z. Stieg
                Institution: California NanoSystems Institute, University of California, Los Angeles
                Email: stieg@cnsi.ucla.edu

        Co-author 7
                Name: Prof. James K. Gimzewski
                Institution: University of California, Los Angeles

3. Date of data collection: 2015-2021

4. Recommended citation for this dataset: Dunham, Christopher et al. (2022), Data from: Pacemaker translocations and power laws in 2D stem cell-derived cardiomyocyte cultures, Dryad, Dataset, https://doi.org/10.5068/D1PD72

DATA & FILE OVERVIEW

1. Dataset Description

These data were generated to investigate stem cell-derived cardiomyocyte pacemaker translocations. Pacemaker translocations are defined as instances of spatiotemporal instability in the position of the pacemaker region, i.e., movement of the pacemaker region within a cardiomyocyte culture, as determined by analysis using microelectrode arrays (MEAs). Once the quiescent period of each pacemaker translocation had been determined (that is, how long the culture’s identifiable pacemaker region remains situated in one position), the quiescent periods were analyzed using the “powerlaw” Python module, which is available on PyPI and can be installed with pip.

This dataset contains only pacemaker translocation quiescent periods recorded across 30 MEA recordings representing 3 distinct differentiations. The recordings, which are not technically necessary to replicate the findings presented in this publication, are available in a separate repository located on Box at the following link: https://ucla.box.com/v/pacemaker-transloc-data. Should this link not work for any reason, please contact either Christopher S. Dunham or Adam Z. Stieg via the email addresses provided in the Author Information section.

2. File List
        Required Files
        File 1 Name: MEA_Translocations_Dictionary.txt
        File 1 Description: This file contains a Python dictionary variable that you may enter into either a Jupyter Notebook or a terminal application, according to your preference. Each key in the dictionary represents a dataset from the optional file, untreated_MEA120_only.xlsx, described below. The keys are formatted as follows: “Key_Recording#_#OfBeatsInRecording”. Thus, the first entry in the dictionary, “Key_2_613”, holds the translocation quiescent periods detected in the second recording in the batch, in which 613 beats were detected. Each value is a Python list (array) containing the quiescent period lengths, in beats, measured using the algorithm described in the original article. The list (array) form of the data is also provided as the “MEA120_translocs” variable below the dictionary.

        Optional Files
        File 2 Name: PLoS_Pacemaker_Translocations_Parameter_List.xlsx
        File 2 Description: This (optional) spreadsheet contains the file names of each recording associated with each differentiation, as they are listed in the Box directory. The spreadsheet lists the minimum peak height and minimum peak distance used by the ‘find_peaks’ algorithm (see the documentation for SciPy’s ‘find_peaks’) to identify beats. The number of channels excluded from analysis for each 120-electrode MEA is shown in the “# Excluded Channels” column, and the age of the differentiated cells at the time of plating, in days, is shown in the “Plate Age” column. Finally, the “Comments” column includes information about silenced electrodes and whether a given recording was unusable. This column also notes whether a recording needed to be truncated, i.e., whether only a narrower window of the recording was analyzed.

        File 3 Name: untreated_MEA120_only.xlsx
        File 3 Description: Batch analysis file. This file serves as a concise summary of the parameter file (File 2) and also as a batch file for use with custom free and open-source software that will be introduced as part of a later publication. This README will be updated following the release of the software.
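
The key format described for File 1 can be unpacked programmatically. The following is a minimal sketch, assuming every key follows the “Key_Recording#_#OfBeatsInRecording” pattern exactly. The helper name parse_key and the example values are hypothetical and are not part of the dataset.

```python
# Hypothetical helper: split a dictionary key of the form
# "Key_<recording number>_<number of beats>" into its two integers.
def parse_key(key: str) -> tuple:
    prefix, recording, beats = key.split("_")
    if prefix != "Key":
        raise ValueError(f"Unexpected key format: {key!r}")
    return int(recording), int(beats)


# Placeholder dictionary; the quiescent periods here are made up
# for illustration and are NOT taken from the dataset.
example_dict = {"Key_2_613": [3, 5, 3, 12]}

for key, quiescent_periods in example_dict.items():
    recording, beats = parse_key(key)
    print(f"Recording {recording}: {beats} beats, "
          f"{len(quiescent_periods)} quiescent periods")
```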

METHODOLOGICAL INFORMATION

The results presented in the article can be replicated using Python 3, a Jupyter Notebook, and the Matplotlib, NumPy, SciPy, and powerlaw libraries (installed using pip or conda).

After importing the modules and assigning the dictionary provided in the MEA_Translocations_Dictionary.txt file, the dictionary must first be flattened to yield a single list containing all of the translocation quiescent periods. This can be achieved by writing a function to flatten the dictionary, such as the following:

# Flatten a dictionary of lists, return a one-dimensional list.
def flatten_dict(data_dict: dict) -> list:
    # Each dictionary value is a list of quiescent periods. Iterate through
    # the values and collect every element of every nested list into a
    # single one-dimensional list (array).
    flattened_dict_as_list = [
        val for sublist in data_dict.values() for val in sublist
    ]
    # Return these values to the variable that called the function.
    return flattened_dict_as_list
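
As a quick check of the flattening step, the sketch below applies the same idea to a small placeholder dictionary. It uses itertools.chain from the standard library as an alternative to the list comprehension in the function above. The dictionary contents are invented for illustration and are not taken from the dataset.

```python
from itertools import chain

# Placeholder data in the same shape as the dataset dictionary:
# each key maps to a list of quiescent periods (in beats).
# These numbers are invented for illustration only.
example_dict = {
    "Key_1_100": [2, 2, 7],
    "Key_2_150": [4, 2],
}

# Concatenate all of the per-recording lists into one flat list.
flattened = list(chain.from_iterable(example_dict.values()))
print(flattened)  # [2, 2, 7, 4, 2]
```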

Once you are working with the flattened values (as a one-dimensional list, or array, with all of the translocation quiescent periods contained inside), single-occurrence values can be either retained or eliminated, depending on your preference. In the article, we chose to remove single-occurrence quiescent periods. This can be done using a function like the one below:

import numpy as np

# Removes events that occur only once in data.
# Returns a new list of events occurring more than once.
def remove_single_events(data) -> list:
    # Get the unique values and the counts of those values.
    unique_batch_vals, unique_batch_counts = np.unique(data, return_counts=True)
    # Keep only those values whose count is greater than 1.
    repeat_vals_only = set(unique_batch_vals[unique_batch_counts > 1].tolist())
    # Generate a new list, in the original order, of those values that
    # occur more than once.
    data2 = [val for val in data if val in repeat_vals_only]
    # Return these values to the variable that called the function.
    return data2
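
The same filtering step can also be written with the standard library alone. The sketch below is an alternative that uses collections.Counter instead of np.unique; the function name remove_single_events_counter is hypothetical and the example values are invented, not taken from the dataset.

```python
from collections import Counter

# Alternative sketch: drop values that occur only once, using
# collections.Counter rather than np.unique.
def remove_single_events_counter(data) -> list:
    counts = Counter(data)
    # Keep every occurrence of a value that appears more than once,
    # preserving the original order of the data.
    return [val for val in data if counts[val] > 1]


# Invented example values, not taken from the dataset.
example = [2, 2, 7, 4, 2, 4, 9]
print(remove_single_events_counter(example))  # [2, 2, 4, 2, 4]
```

In this example, 7 and 9 each occur once and are removed, while every occurrence of 2 and 4 is retained.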

These data can now be used with the ‘powerlaw’ module. First, create an instance of the module’s Fit class and assign it to a variable (the module is assumed here to have been imported with “import powerlaw as pl”), such as:

        PL_data = pl.Fit(data, discrete=True, xmax=xmax)

Here, “data” refers to the flattened array containing only those values that occur more than once. The “discrete” parameter of Fit must be used for data that are expected to take on only discrete values; hence, it is set to “True” here. Finally, the “xmax” parameter can be either set to None or given a value. If you are assessing the data for a doubly truncated power law, as we did, then you must provide a value for xmax. If you do not, the fitted values may be inaccurate. See the original article for our xmax value and its justification.
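
To illustrate why xmax matters, the sketch below writes out a doubly truncated discrete power law by hand: probabilities proportional to x^(-alpha), normalized over the integers from xmin to xmax only. This is a plain-Python illustration under that assumption, not the powerlaw module’s implementation, and the alpha, xmin, and xmax values are arbitrary placeholders rather than the values used in the article.

```python
# Probability mass function of a discrete power law truncated to
# the integers xmin..xmax (inclusive): p(x) is proportional to x**(-alpha).
def truncated_power_law_pmf(alpha: float, xmin: int, xmax: int) -> dict:
    weights = {x: x ** (-alpha) for x in range(xmin, xmax + 1)}
    norm = sum(weights.values())
    return {x: w / norm for x, w in weights.items()}


# Arbitrary placeholder parameters, chosen only for illustration.
pmf_50 = truncated_power_law_pmf(alpha=2.0, xmin=1, xmax=50)
pmf_500 = truncated_power_law_pmf(alpha=2.0, xmin=1, xmax=500)

# The probabilities sum to 1 over xmin..xmax, so moving the upper
# cutoff changes the normalization of every p(x).
print(round(sum(pmf_50.values()), 6))
print(pmf_50[1] > pmf_500[1])
```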

At this point, we recommend that you read through the powerlaw documentation for the rest of the analysis, including how to generate PDF and CCDF plots directly from the library and how to perform the distribution comparisons.
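
For orientation, a CCDF (complementary cumulative distribution function) describes the fraction of observations at or above each value. The sketch below computes an empirical CCDF by hand, taken here as P(X >= x), only to show what the plotted quantity represents. The powerlaw library provides its own plotting functions, whose exact conventions should be checked in its documentation. The function name empirical_ccdf is hypothetical and the example values are invented.

```python
from collections import Counter

# Empirical CCDF, defined here as P(X >= x) for each distinct x.
def empirical_ccdf(data) -> dict:
    n = len(data)
    counts = Counter(data)
    ccdf = {}
    remaining = n  # number of observations >= the current value
    for x in sorted(counts):
        ccdf[x] = remaining / n
        remaining -= counts[x]
    return ccdf


# Invented example values, not taken from the dataset.
example = [2, 2, 4, 2, 4]
print(empirical_ccdf(example))  # {2: 1.0, 4: 0.4}
```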

For ease of access, the powerlaw documentation can be found here: https://pythonhosted.org/powerlaw/

Additionally, and again for ease of access, the powerlaw publication can be found here:
https://doi.org/10.1371/journal.pone.0085777