--- a
+++ b/data/README.md
@@ -0,0 +1,61 @@
+# Pension Dataset Description
+
+The `pension` dataset is used to analyze factors that influence retirement savings and financial assets. Each row describes one individual through a set of demographic and financial variables.
+
+## Variables
+
+### 1. **D**: Contribution to 401(k) Plan (`pension$p401`)
+   - Binary indicator variable.
+   - `1`: Individual contributes to a 401(k) retirement plan.
+   - `0`: Individual does not contribute to a 401(k) retirement plan.
+
+### 2. **Z**: Eligibility for 401(k) Plan (`pension$e401`)
+   - Binary indicator variable.
+   - `1`: Individual is eligible for a 401(k) retirement plan.
+   - `0`: Individual is not eligible for a 401(k) retirement plan.
+
+### 3. **Y**: Net Total Financial Assets (`pension$net_tfa`)
+   - Continuous variable.
+   - Represents the individual's total financial assets, adjusted for liabilities.
+
+### 4. **X**: Covariates
+   - A matrix of individual-level demographic and financial features. The variables included are:
+
+| Variable            | Description                                                         |
+|---------------------|---------------------------------------------------------------------|
+| **Age**             | Age of the individual.                                              |
+| **Benefit pension** | Binary indicator for a defined benefit pension.                     |
+| **Education**       | Years of education completed.                                       |
+| **Family size**     | Number of family members.                                           |
+| **Home owner**      | Binary indicator for home ownership.                                |
+| **Income**          | Annual income (continuous variable).                                |
+| **Male**            | Binary indicator for gender (`1` = male).                           |
+| **Married**         | Binary indicator for marital status (`1` = married).                |
+| **IRA**             | Binary indicator for having an Individual Retirement Account (IRA). |
+| **Two earners**     | Binary indicator for dual-income households.                        |
+
+## Data Structure
+The dataset contains:
+- Binary variables for 401(k) contributions and eligibility (`D` and `Z`).
+- A continuous variable for financial assets (`Y`).
+- A set of covariates (`X`) covering demographic and financial information.
+
+## Usage
+The dataset can be used for:
+- Analyzing the relationship between 401(k) eligibility/contribution and financial assets.
+- Studying the effects of demographic factors on retirement savings behavior.
+- Building models to predict financial asset accumulation based on demographic features.
+
+## Example R Code
+Here is an example of how to load and prepare the data:
+
+```R
+data(pension)  # assumes a package that provides `pension` (e.g. hdm) is already attached
+
+D = pension$p401     # 401(k) contribution indicator
+Z = pension$e401     # 401(k) eligibility indicator
+Y = pension$net_tfa  # net total financial assets
+X = model.matrix(~ 0 + age + db + educ + fsize + hown + inc + male + marr + pira + twoearn, data = pension)
+var_nm = c("Age","Benefit pension","Education","Family size","Home owner","Income","Male","Married","IRA","Two earners")
+colnames(X) = var_nm  # readable labels, in the same order as the formula terms
+