# Pension Dataset Description

The `pension` dataset is used to study how 401(k) retirement plans relate to savings and financial assets. For each individual, it records 401(k) eligibility and contribution status, net total financial assets, and a set of demographic and financial attributes.

## Variables

### 1. **D**: Contribution to 401(k) Plan (`pension$p401`)

- Binary indicator variable.
- `1`: Individual contributes to a 401(k) retirement plan.
- `0`: Individual does not contribute to a 401(k) retirement plan.

### 2. **Z**: Eligibility for 401(k) Plan (`pension$e401`)

- Binary indicator variable.
- `1`: Individual is eligible for a 401(k) retirement plan.
- `0`: Individual is not eligible for a 401(k) retirement plan.

### 3. **Y**: Net Total Financial Assets (`pension$net_tfa`)

- Continuous variable.
- The individual's total financial assets net of liabilities; it can be negative when liabilities exceed assets.

### 4. **X**: Covariates

- A matrix of individual-level demographic and financial features with the following columns:

| Variable            | `pension` column | Description                                                          |
|---------------------|------------------|----------------------------------------------------------------------|
| **Age**             | `age`            | Age of the individual, in years.                                     |
| **Benefit pension** | `db`             | Binary indicator for having a defined benefit pension.               |
| **Education**       | `educ`           | Years of education completed.                                        |
| **Family size**     | `fsize`          | Number of family members.                                            |
| **Home owner**      | `hown`           | Binary indicator for home ownership.                                 |
| **Income**          | `inc`            | Annual income (continuous).                                          |
| **Male**            | `male`           | Binary indicator for gender (`1` = male).                            |
| **Married**         | `marr`           | Binary indicator for being married.                                  |
| **IRA**             | `pira`           | Binary indicator for having an Individual Retirement Account (IRA).  |
| **Two earners**     | `twoearn`        | Binary indicator for a two-earner (dual-income) household.           |
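
The covariate matrix mixes binary indicators and continuous variables. The following is a minimal sketch that uses a small synthetic data frame, `toy`. Its values are hypothetical and are not drawn from the real data, but its column names match `pension`. The sketch shows that `model.matrix(~ 0 + ...)` returns one column per covariate with no intercept column, and how the binary columns listed in the table above can be identified programmatically.

```R
# Hypothetical toy data with the same column names as `pension`;
# the values are made up purely for illustration.
toy <- data.frame(
  age     = c(30, 45, 52, 38),
  db      = c(0, 1, 1, 0),
  educ    = c(12, 16, 14, 12),
  fsize   = c(2, 4, 3, 1),
  hown    = c(0, 1, 1, 0),
  inc     = c(28000, 61000, 47000, 35000),
  male    = c(1, 0, 1, 0),
  marr    = c(0, 1, 1, 0),
  pira    = c(0, 1, 0, 0),
  twoearn = c(0, 1, 1, 0)
)

# `~ 0 + ...` drops the intercept, so each numeric covariate becomes one column.
X_toy <- model.matrix(~ 0 + age + db + educ + fsize + hown + inc +
                        male + marr + pira + twoearn, data = toy)

# Treat a column as binary if every entry is 0 or 1.
is_binary <- apply(X_toy, 2, function(col) all(col %in% c(0, 1)))
print(names(is_binary)[is_binary])   # "db" "hown" "male" "marr" "pira" "twoearn"
print(names(is_binary)[!is_binary])  # "age" "educ" "fsize" "inc"
```

A value-based check like this only reflects the data it is given. On the real data, it is worth comparing its result against the table rather than relying on it alone.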

## Data Structure

The dataset contains:

- Binary variables for 401(k) contributions and eligibility (`D` and `Z`).
- A continuous variable for financial assets (`Y`).
- A set of covariates (`X`) covering demographic and financial information.
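
Before fitting models, it can help to confirm that `D`, `Z`, and `Y` have the structure described above. The following is a minimal sketch. `check_pension_vars()` is a hypothetical helper that is not part of any package, and the toy vectors are made up. The last check assumes that only eligible individuals can contribute (`D = 1` implies `Z = 1`). That assumption should be verified on the actual data rather than taken for granted.

```R
# Hypothetical helper (not from any package): returns a named vector of TRUE/FALSE checks.
check_pension_vars <- function(D, Z, Y) {
  is_01 <- function(v) all(v %in% c(0, 1))  # NA values also fail this check
  c(
    D_binary    = is_01(D),
    Z_binary    = is_01(Z),
    Y_numeric   = is.numeric(Y) && !anyNA(Y),
    same_length = length(D) == length(Z) && length(Z) == length(Y),
    # Assumption: contributing (D = 1) requires eligibility (Z = 1).
    contrib_needs_elig = all(D <= Z)
  )
}

# Hypothetical toy vectors, made up for illustration only.
D_toy <- c(1, 0, 0, 1)
Z_toy <- c(1, 1, 0, 1)
Y_toy <- c(15000, -2000, 3500, 42000)
print(check_pension_vars(D_toy, Z_toy, Y_toy))

# With the real data (not run here):
# check_pension_vars(pension$p401, pension$e401, pension$net_tfa)
```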

## Usage

The dataset can be used for:

- Analyzing the relationship between 401(k) eligibility/contribution and financial assets.
- Studying the effects of demographic factors on retirement savings behavior.
- Building models to predict financial asset accumulation based on demographic features.
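
The following is a minimal sketch illustrating the first use. It makes two simple comparisons on hypothetical toy vectors. The first is the difference in mean `Y` between eligible and ineligible individuals. The second is the Wald ratio, which divides that difference by the difference in contribution rates and so treats eligibility `Z` as an instrument for contribution `D`. This is one standard textbook estimator shown under that assumption. It is not necessarily the method used with this dataset, and it ignores the covariates `X`. `wald_estimate()` is a hypothetical helper.

```R
# Hypothetical helper: eligibility contrasts and the Wald ratio
#   wald = (mean(Y | Z = 1) - mean(Y | Z = 0)) / (mean(D | Z = 1) - mean(D | Z = 0))
wald_estimate <- function(Y, D, Z) {
  itt_y <- mean(Y[Z == 1]) - mean(Y[Z == 0])  # difference in assets by eligibility
  itt_d <- mean(D[Z == 1]) - mean(D[Z == 0])  # difference in contribution rates by eligibility
  if (itt_d == 0) stop("Z does not shift D; the Wald ratio is undefined.")
  c(itt_y = itt_y, itt_d = itt_d, wald = itt_y / itt_d)
}

# Hypothetical toy vectors, made up for illustration only.
Y_toy <- c(10, 20, 30, 40, 5, 15)
D_toy <- c(1, 1, 0, 0, 0, 0)
Z_toy <- c(1, 1, 1, 1, 0, 0)

est <- wald_estimate(Y_toy, D_toy, Z_toy)
print(est)  # itt_y = 15, itt_d = 0.5, wald = 30
```

With the real data, `wald_estimate(pension$net_tfa, pension$p401, pension$e401)` would apply the same computation. Adjusting for `X` requires a richer model than this unconditional ratio.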

## Example R Code

Here is an example of how to load and prepare the data:

```R
# Load the data; the hdm package also ships a `pension` dataset (library(hdm)).
data(pension)

# Contribution (D), eligibility (Z), and outcome (Y)
D <- pension$p401     # 401(k) contribution indicator (0/1)
Z <- pension$e401     # 401(k) eligibility indicator (0/1)
Y <- pension$net_tfa  # net total financial assets

# Covariate matrix with no intercept column (`~ 0 + ...`)
X <- model.matrix(~ 0 + age + db + educ + fsize + hown + inc + male + marr + pira + twoearn,
                  data = pension)
var_nm <- c("Age", "Benefit pension", "Education", "Family size", "Home owner",
            "Income", "Male", "Married", "IRA", "Two earners")
colnames(X) <- var_nm
```