# Predicted risk of lung cancer based on the UK Biobank risk prediction model |
|
|
This repository provides Stata code to calculate an individual's predicted risk of lung cancer from supplied covariate information, using the UK Biobank risk prediction model.
|
|
## How to use this code |
|
|
You will need a copy of Stata, version 12 or higher, with the user-written commands stpm2 and rcsgen installed (`ssc install stpm2`, `ssc install rcsgen`). Open Stata and change directory to the root of this repository. Running

```
do predict_lca_risk.do
```

will calculate the predicted risk of lung cancer. By default, this takes covariate information from the file 'input.csv', calculates the 2-year cumulative probability of lung cancer for each record in it, and saves the results in the file 'output.csv'. The default input file, output file, and time horizon can be changed by editing the parameters in the configuration block at the beginning of 'predict\_lca\_risk.do'.
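
The setup steps above can also be scripted. This is a minimal sketch rather than code shipped with the repository: it uses `capture which` to install stpm2 and rcsgen from SSC only when they are not already available, and the path passed to `cd` is a hypothetical placeholder.

```stata
* Sketch (not part of this repository): install the required
* user-written commands from SSC only if Stata cannot find them
foreach cmd in stpm2 rcsgen {
    capture which `cmd'
    if _rc {
        ssc install `cmd'
    }
}

* Hypothetical path: replace with the location of your copy of this repository
cd "/path/to/this/repository"

* Run the prediction using the settings in the configuration block
do predict_lca_risk.do
```

Because of the `capture which` check, the script can be re-run without reinstalling packages that are already present.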
|
|
## Input specification |
|
|
The input file must be a plain-text (ASCII) CSV file and must contain the following variables:

variable name | description | type | valid values
--------------|:------------|:-----|:-------------
age | age in years | real | [40, 70]
smoke\_stat | smoking status | integer | 0 (never smoker), 1 (former smoker), 2 (current smoker)
male | male sex | integer | 0 (female), 1 (male)
previous\_cancer | previously diagnosed with an invasive cancer | integer | 0 (no), 1 (yes)
fhist\_lungca | one or more first-degree relatives diagnosed with lung cancer | integer | 0 (no), 1 (yes)
emph\_bronch | history of emphysema or bronchitis | integer | 0 (no), 1 (yes)
allergy | history of hay fever, allergic rhinitis, or eczema | integer | 0 (no), 1 (yes)
fev1\_max | forced expiratory volume in one second (FEV1), in litres | real | [0.5, 10]
smkfmr\_quitage | former smokers only: age in years at which they quit smoking | real | less than current age
ncig | current and former smokers only: average number of cigarettes smoked per day | real | (0, 80]
stop1day\_easy | current smokers only: how difficult it would be to go without smoking for one day | integer | 0 (difficult or very difficult), 1 (easy or very easy)

Other variables may be included in the input file; the program ignores them.
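
As an illustration, the following sketch builds a small 'input.csv' in Stata and checks it against the valid values in the table before saving it. The two records are hypothetical, and it is an assumption that non-applicable fields (for example, quit age for a current smoker) can be left as Stata missing values (`.`). `outsheet` is used because it is available in Stata 12; Stata 13 and later also provide `export delimited`.

```stata
* Sketch: two hypothetical records following the input specification.
* Assumption: non-applicable fields are left as Stata missing (.)
clear
input age smoke_stat male previous_cancer fhist_lungca emph_bronch allergy fev1_max smkfmr_quitage ncig stop1day_easy
58 2 1 0 1 0 0 3.1  .  20 0
63 1 0 0 0 1 1 2.4 50  15 .
end

* Check the documented valid values before writing the file
assert inrange(age, 40, 70)
assert inlist(smoke_stat, 0, 1, 2)
foreach v of varlist male previous_cancer fhist_lungca emph_bronch allergy {
    assert inlist(`v', 0, 1)
}
assert inrange(fev1_max, 0.5, 10)

* Former smokers: quit age must be present and below current age
* (missing compares greater than any number, so it fails this check)
assert smkfmr_quitage < age if smoke_stat == 1

* Current and former smokers: cigarettes per day must lie in (0, 80]
assert ncig > 0 & ncig <= 80 if smoke_stat > 0

* Current smokers: the one-day abstinence item must be 0 or 1
assert inlist(stop1day_easy, 0, 1) if smoke_stat == 2

* Write the default input file (overwrites any existing input.csv)
outsheet using "input.csv", comma replace
```

With the default configuration, running `do predict_lca_risk.do` afterwards would read this file.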
|
|
## Output file |
|
|
The output file contains the same variables as the input file, plus the variable 'cif\_lung', which holds the predicted probability of lung cancer for each input observation: the cumulative incidence function (CIF) evaluated at the chosen time horizon (2 years by default).
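
A minimal sketch for inspecting the results in Stata, assuming the default output file name 'output.csv'. It is not part of the repository; `insheet` is chosen for Stata 12 compatibility (Stata 13 and later also offer `import delimited`), and listing the highest-risk records is only one example of what can be done with 'cif\_lung'.

```stata
* Sketch: load the results written by predict_lca_risk.do
* (assumes the default output file name)
insheet using "output.csv", comma clear

* cif_lung is a probability, so non-missing values should lie in [0, 1]
assert inrange(cif_lung, 0, 1) if !missing(cif_lung)
count if missing(cif_lung)

* Summarise the predicted risks
summarize cif_lung, detail

* Sort from highest to lowest risk (missing values go last)
* and list up to five records
gsort -cif_lung
list age smoke_stat male cif_lung if _n <= 5
```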