Diff of /CONTRIBUTING.md [000000] .. [6c353a]

--- a
+++ b/CONTRIBUTING.md
@@ -0,0 +1,79 @@
+# Contributing to medaCy
+MedaCy seeks to create a unified platform that streamlines research efforts in medical text mining while also providing an 
+interface for easily applying models to real-world problems. As a result, contributions to medaCy are often direct 
+by-products of active research projects. However, without the contributions, bug fixes, bug reports, 
+and suggestions of practitioners, medaCy could not grow and thrive.
+
+This contribution guide is designed to inform:
+
+1. **Researchers** on how they can efficiently use medaCy to make their work more accessible to practitioners.
+2. **Practitioners** on how they can tune medaCy's cutting-edge functionality to their specific applications.
+
+## Table of contents
+1. [Issues and Bug Reports](#issues-and-bug-reports)
+2. [Development Environment Setup](#development-environment-setup)
+3. [Running Unit Tests](#running-unit-tests)
+
+## Issues And Bug Reports
+Please search existing issues before posting an issue or bug report; your problem may already be solved! If your search 
+comes up empty, congratulations: you may have something to contribute!
+
+## Development Environment Setup
+At its most basic, you can fork medaCy, clone your fork, and develop in your favorite text editor. 
+However, some up-front setup effort goes a long way towards streamlining the contribution process and staying organized.
+This section details a suggested setup for efficient development, testing, and experimentation with medaCy using 
+[PyCharm](https://www.jetbrains.com/pycharm/).
+
+**Assumptions of this section:**
+-  You are working in a UNIX-based operating system.
+-  Part 2 assumes you have PyCharm Professional installed; PyCharm Professional is provided with the JetBrains 
+University License. This is not strictly necessary, but the useful Remote Host feature is not available in the Community Edition.
+
+**Part 1: Development Installation**
+1. If you are shaky with Git, [this link](https://nvie.com/posts/a-successful-git-branching-model/) provides an 
+excellent description of the branching model medaCy follows to organize contributions.
+2. Fork medaCy and copy the clone link.
+3. On your machine, ensure you have Python 3 installed. Set up a [virtual environment](https://docs.python.org/3/library/venv.html) 
+and activate it.
+4. Run the bash commands `python --version` and `pip list`. If pip suggests an upgrade, upgrade it to the latest version. 
+Your Python version should be above 3.4 and your installed packages should be few in number; if either condition 
+does not hold, return to *Step 3*.
+5. In a directory separate from the one created by the virtual environment setup command, clone your fork of medaCy.
+6. While inside your cloned fork, ensure you are on the *development* branch or on a branch created from *development*. 
+You can verify this by running `git status`, and you can switch branches with `git checkout <branch-name>`.
+7. Run `pip install -e .` to install medaCy in editable mode inside your virtual environment. This will take 
+several minutes to install dependencies; medaCy stands on the shoulders of giants! Errors you are likely to encounter 
+here involve the installation of SciPy and NumPy. Search for the error messages online; they are usually fixed by installing 
+a few extra dependencies. Most likely, your Python installation is missing C headers required by SciPy.
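+
+Steps 3 and 4 can also be checked from Python itself. The script below is a minimal sketch and not part of medaCy. The
+function name `check_environment` and the 20-package cut-off standing in for "few in number" are hypothetical choices.
+The script itself needs Python 3.8+ because it uses `importlib.metadata`.
+
+```python
+import sys
+from importlib import metadata  # standard library since Python 3.8
+
+
+def check_environment(max_packages=20):
+    """Return a list of problems with the running interpreter (empty if none).
+
+    max_packages is a hypothetical cut-off for "few in number"; adjust to taste.
+    """
+    problems = []
+    # Step 4: the Python version should be above 3.4.
+    if sys.version_info < (3, 5):
+        problems.append("Python %d.%d is too old" % sys.version_info[:2])
+    # Step 3: a venv interpreter reports a sys.prefix different from sys.base_prefix.
+    if sys.prefix == sys.base_prefix:
+        problems.append("no virtual environment is active")
+    # Step 4: a freshly created virtual environment contains only a handful of packages.
+    names = {dist.metadata["Name"] for dist in metadata.distributions()}
+    if len(names) > max_packages:
+        problems.append("%d packages installed; expected a fresh environment" % len(names))
+    return problems
+
+
+if __name__ == "__main__":
+    for problem in check_environment():
+        print("WARNING:", problem)
+```
+
+Run it with the virtual environment's `python`. If it prints nothing, both checks passed.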
+
+**Part 2: Developing with PyCharm**
+
+PyCharm can streamline development efforts, especially if you develop locally and run medaCy on a remote 
+machine for model building.
+
+**Part 3: Logging**
+
+MedaCy uses the [logging](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial) module to give users 
+insight into how medaCy is handling their data. Ensure that any functionality you implement logs its critical steps 
+at the appropriate logging levels so that users can easily debug.
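+
+Here is a minimal sketch of this convention. It uses one module-level logger named after the module and reserves each level
+for a different kind of message. The function `select_text_files` is hypothetical and is not part of medaCy's API.
+
+```python
+import logging
+
+# One logger per module, named after the module. Handlers and output levels are
+# configured by the application (or by pytest's --log-cli-level), not here.
+logger = logging.getLogger(__name__)
+
+
+def select_text_files(paths):
+    """Hypothetical helper that keeps only .txt paths, logging each critical step."""
+    logger.info("Selecting text files from %d paths", len(paths))
+    selected = []
+    for path in paths:
+        if not path.endswith(".txt"):
+            # A recoverable problem the user should know about: WARNING.
+            logger.warning("Skipping %s: expected a .txt file", path)
+            continue
+        # Detail that is only useful while debugging: DEBUG.
+        logger.debug("Keeping %s", path)
+        selected.append(path)
+    # A summary of the completed step: INFO.
+    logger.info("Selected %d of %d paths", len(selected), len(paths))
+    return selected
+
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.DEBUG)
+    select_text_files(["a.txt", "b.ann", "c.txt"])
+```
+
+The sketch passes values as arguments to the logger rather than pre-formatting the message. This defers string formatting until a
+record is actually emitted at the configured level.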
+
+## Running Unit Tests
+All components of medaCy have associated unit tests. Please ensure they all pass before submitting a pull request. 
+When medaCy runs unit tests, it first automatically installs the [END dataset](https://github.com/NanoNLP/medaCy_dataset_end) 
+and then uses it to test various functionality of the package. Some tests build a model over the dataset, so these 
+may take some time to complete.
+
+After installing medaCy for development, make sure that `pytest` is installed. Then:
+
+1) For quick testing of the whole framework, run: \
+    `python setup.py test`.
+1) For more fine-grained testing on individual files with colorful log output run: \
+    `pytest -s tests/tools/test_data_manager.py -o log_cli=True --log-cli-level=INFO`.
+
+    This shows log output during tests and allows you to adjust the logging level for the test file being run. 
+    See the pytest documentation for details.
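+
+A new component's tests can also check that it logs its critical steps. Below is a minimal sketch using the standard
+library's `unittest.TestCase.assertLogs`; pytest also collects `unittest.TestCase` subclasses. The function `tokenize` is a
+hypothetical stand-in for the code under test, not a medaCy function.
+
+```python
+import logging
+import unittest
+
+logger = logging.getLogger(__name__)
+
+
+def tokenize(text):
+    """Hypothetical function under test: whitespace tokenization that logs its result."""
+    tokens = text.split()
+    logger.info("Tokenized %d tokens", len(tokens))
+    return tokens
+
+
+class TestTokenize(unittest.TestCase):
+    def test_returns_tokens_and_logs_count(self):
+        # assertLogs captures records at INFO and above emitted through `logger`.
+        with self.assertLogs(logger, level="INFO") as captured:
+            tokens = tokenize("aspirin 81 mg daily")
+        self.assertEqual(tokens, ["aspirin", "81", "mg", "daily"])
+        self.assertEqual(captured.output, ["INFO:%s:Tokenized 4 tokens" % logger.name])
+```
+
+Saved as a `test_*.py` file, this module is collected by pytest's default discovery rules.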
+
+Note that some of the unit tests require knowledge about the configuration of your machine; those tests will
+be skipped if the corresponding settings are not specified in the `config.json` file. These settings include
+the location of a MetaMap binary file on your machine, which GPU core to use for certain tests, and the location
+of a word embeddings file. Your contributions may not affect functionality that depends on these features;
+however, all pull requests will be tested against the full unit test suite.
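+
+Skipping a test when a machine-specific setting is missing can be sketched with `unittest.skipUnless`. Two details here are
+hypothetical placeholders: the file location (`config.json` relative to the working directory) and the key `metamap_path`.
+Check medaCy's own configuration handling for the real names.
+
+```python
+import json
+import os
+import unittest
+
+
+def load_config(path="config.json"):
+    """Return the parsed configuration, or an empty dict if the file does not exist."""
+    if not os.path.exists(path):
+        return {}
+    with open(path) as config_file:
+        return json.load(config_file)
+
+
+# Hypothetical key; an absent or empty value causes the tests below to be skipped.
+METAMAP_PATH = load_config().get("metamap_path")
+
+
+@unittest.skipUnless(METAMAP_PATH, "metamap_path is not set in config.json")
+class TestMetaMapFeatures(unittest.TestCase):
+    def test_metamap_binary_exists(self):
+        self.assertTrue(os.path.isfile(METAMAP_PATH))
+```
+
+Test runners report these tests as skipped rather than failed. pytest offers the same pattern through `pytest.mark.skipif`.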