Assignment 4 (20%)
Your task is to produce a short report analysing some data, using a Quarto template with a targets workflow. You may use any data you choose, provided it includes at least 1000 observations. Possible sources of data include:
- Human Mortality Database
- Human Fertility Database
- UN Data
- IMF Data
- ABS Census Data
- Australian Electoral Commission data
- US General Social Survey
- HILDA Survey
- Johns Hopkins COVID data
The source data must be as close to raw form as possible, e.g., CSV or XLSX files obtained from the data custodians. You are not to use data that is already in an R package.
Your analysis should include the following elements:
- Reading the raw data into R.
- Cleaning and wrangling the data into a form suitable for analysis.
- At least two plots highlighting interesting features of the data.
- At least one statistical model fitted to the data (or a subset of the data). This could be a linear model, a GLM, a GAM, a GAMM, a time series model, or any other model you think is appropriate. No marks are awarded for model complexity; you should use a model that is appropriate for the data.
- A discussion of the results of the model and how they relate to the plots.
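As an illustration only, the elements above could be organised as small functions, one per step. This is a minimal sketch, not a required structure: the function names, the file name R/functions.R, and the columns `year` and `value` are all hypothetical placeholders for whatever your chosen data contains. It shows a single plot, so a real analysis would need at least one more.

```r
# R/functions.R: hypothetical helpers. The column names `year` and `value`
# are placeholders and will differ for your data.

# Read the raw file exactly as supplied by the data custodian.
read_raw <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Keep the columns of interest, coerce their types, and drop incomplete rows.
clean_data <- function(raw) {
  out <- raw[, c("year", "value")]
  out$year <- as.integer(out$year)
  out$value <- as.numeric(out$value)
  out[complete.cases(out), , drop = FALSE]
}

# One example plot: the response against time (requires ggplot2 installed).
plot_trend <- function(data) {
  ggplot2::ggplot(data, ggplot2::aes(x = year, y = value)) +
    ggplot2::geom_point(alpha = 0.5) +
    ggplot2::labs(x = "Year", y = "Value")
}

# A deliberately simple model: a linear trend in the response.
fit_model <- function(data) {
  lm(value ~ year, data = data)
}
```

Keeping each step in its own function makes it straightforward to test the steps separately and to wire them into a workflow later.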
You must use the targets package to manage the workflow and renv to manage the package environment, and you must describe the analysis in a Quarto report.
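One way these three tools can fit together is sketched below as a `_targets.R` file. Treat it as a rough outline under stated assumptions: the paths `data-raw/source.csv` and `report.qmd`, the target names, and the helper functions `read_raw()`, `clean_data()`, `plot_trend()` and `fit_model()` are hypothetical and would be defined by you in scripts under `R/`. The `tar_quarto()` function comes from the tarchetypes package, a companion to targets.

```r
# _targets.R: a minimal sketch. All file names, target names, and helper
# functions here are hypothetical placeholders.
library(targets)
library(tarchetypes)

# Source every script in the R/ directory, where the helpers are defined.
tar_source()

list(
  # Track the raw file so that downstream targets rerun when it changes.
  tar_target(raw_file, "data-raw/source.csv", format = "file"),
  tar_target(raw, read_raw(raw_file)),
  tar_target(cleaned, clean_data(raw)),
  tar_target(trend_plot, plot_trend(cleaned)),
  tar_target(model, fit_model(cleaned)),
  # Render the Quarto report; inside it, results can be loaded with tar_read().
  tar_quarto(report, "report.qmd")
)
```

With a file like this in place, `renv::init()` sets up the project library, `renv::snapshot()` records package versions in `renv.lock`, and `targets::tar_make()` runs the pipeline, rebuilding only the targets that are out of date.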
Due: 24 May 2024