# Predicted risk of lung cancer based on the UK Biobank risk prediction model
This repository provides Stata code to calculate the predicted risk of
lung cancer based on supplied information and the UK Biobank risk
prediction model.

## How to use this code
You will need a copy of Stata, version 12 or higher, with the
user-written commands stpm2 and rcsgen installed (`ssc install stpm2`,
`ssc install rcsgen`). Open Stata and change directory to the root of
this repository. Running
```
do predict_lca_risk.do
```
will calculate the predicted risk of lung cancer. By default this will
take covariate information from the file 'input.csv', calculate the
2-year cumulative probability of lung cancer for each record therein,
and save the results in the file 'output.csv'. The default input file,
output file, and time horizon can be changed by editing the parameters in
the configuration block at the beginning of the file
'predict\_lca\_risk.do'.

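
As a rough sketch of the setup steps in this section, the Stata session below installs the two user-written commands only if they are missing and then runs the script. The repository path is a placeholder, and the `capture which` check is a convenience assumed here rather than something the repository itself requires.

```stata
* Install the user-written commands only if Stata cannot already find them
capture which stpm2
if _rc != 0 {
    ssc install stpm2
}
capture which rcsgen
if _rc != 0 {
    ssc install rcsgen
}

* Placeholder path: replace with the location of your copy of this repository
cd "path/to/this/repository"

* With the default configuration this reads 'input.csv' and writes 'output.csv'
do predict_lca_risk.do
```
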
## Input specification
The input file must be an ASCII plain text CSV file, and must contain
the following variables:

variable name | description | type | valid values
--------------|:------------|:-----|:-------------
age | age in years | real | [40,70]
smoke\_stat | smoking status | integer | 0 (never smoker), 1 (former smoker), 2 (current smoker)
male | male sex | integer | 0 (female), 1 (male)
previous\_cancer | previously diagnosed with an invasive cancer | integer | 0 (no), 1 (yes)
fhist\_lungca | one or more first-degree relatives who have been diagnosed with lung cancer | integer | 0 (no), 1 (yes)
emph\_bronch | history of emphysema or bronchitis | integer | 0 (no), 1 (yes)
allergy | history of hay fever, allergic rhinitis, or eczema | integer | 0 (no), 1 (yes)
fev1\_max | forced expiratory volume in one second (FEV1) | real | [0.5,10]
smkfmr\_quitage | for former smokers only, the age at which they quit | real | less than current age
ncig | for current and former smokers only, the average number of cigarettes smoked per day | real | (0,80]
stop1day\_easy | for current smokers only, how difficult would it be to not smoke for one day | integer | 0 (difficult or very difficult), 1 (easy or very easy)

Other variables can be included in the input file, but they are
ignored by the program.

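
The valid values listed in the input specification can be checked before running the model. The following is a minimal sketch, not part of the repository: it assumes the file is named 'input.csv' and that every listed column is read in as numeric. It uses `insheet` so that it also runs on Stata 12; on Stata 13 or later, `import delimited` could be used instead.

```stata
* Load the input file (assumed name: input.csv)
insheet using "input.csv", comma clear

* Fields required for every record
assert inrange(age, 40, 70)
assert inlist(smoke_stat, 0, 1, 2)
foreach v of varlist male previous_cancer fhist_lungca emph_bronch allergy {
    assert inlist(`v', 0, 1)
}
assert inrange(fev1_max, 0.5, 10)

* Conditional fields: checked only for the smokers they apply to
assert smkfmr_quitage < age if smoke_stat == 1
assert ncig > 0 & ncig <= 80 if inlist(smoke_stat, 1, 2)
assert inlist(stop1day_easy, 0, 1) if smoke_stat == 2
```

Each `assert` stops with an error at the first check that fails and reports how many records contradict it. Because Stata treats missing values as larger than any number, a missing value in a field that applies to a record also fails these checks.
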
## Output file
The output file contains the same variables as the input file, with
the addition of the variable 'cif\_lung', which contains the predicted
probability (the cumulative incidence function evaluated at the chosen
time horizon) for each input observation.
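
One way to inspect the results, sketched here under the assumption that the default output file name 'output.csv' is in use, is to load the file back into Stata and summarise 'cif\_lung'. The range check skips missing predictions rather than assuming how the program handles incomplete records.

```stata
* Load the results (on Stata 13 or later, import delimited works as well)
insheet using "output.csv", comma clear

* Distribution of the predicted risks
summarize cif_lung, detail

* Predicted probabilities should lie between 0 and 1 where present
assert inrange(cif_lung, 0, 1) if !missing(cif_lung)

* Show up to the first five records alongside their predicted risk
if _N > 0 {
    local n = min(5, _N)
    list age smoke_stat cif_lung in 1/`n'
}
```
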