- Categorical data - notebook
- Simpson's paradox - notebook
- Contingency tables - notebook
- Discrete distributions - notebook
- Maximum likelihood estimation - notebook
- Goodness of fit - notebook
- Linear regression - notebook
- Marginal effects - notebook
- GLM: Count data - notebook
- GLM: Logistic regression - notebook
[TBA]
| # | Topic | R | Python | Julia | Jupyter |
|---|---|---|---|---|---|
| 1 | Categorical data | .R | .py | .jl | .ipynb |
| 2 | Simpson's paradox | .R | -- | -- | -- |
| 3 | Contingency tables | .R | .py | .jl | .ipynb |
| 4 | Discrete distributions | .R | .py | .jl | .ipynb |
| 5 | Maximum likelihood estimation | .R | .py | .jl | .ipynb |
| 6 | Optimization methods | -- | -- | -- | .ipynb |
| 7 | Goodness of fit | .R | .py | .jl | .ipynb |
| 8 | Linear regression | .R | .py | .jl | .ipynb |
| 9 | Interactions and scaling | -- | -- | -- | .ipynb |
| 10 | Marginal effects | .R | .py | .jl | .ipynb |
| 11 | GLM: Count data | .R | .py | .jl | .ipynb |
| 12 | GLM: Logistic regression | -- | -- | -- | .ipynb |
Submit solutions as a single HTML file via Moodle.
| # | Topic | HTML | QMD | Jupyter | Deadline |
|---|---|---|---|---|---|
| 1 | Vacancy analysis (categorical data, distributions, MLE) | html | qmd | ipynb | 2026-03-31 23:59 |
| 2 | Retail store analysis (GOF, linear regression, marginal effects) | html | qmd | -- | TBA |
| 3 | TBA | -- | -- | -- | TBA |
- R:
distributions3, maxLik, rootSolve, vcd, fitdistrplus, marginaleffects, modelsummary, car, see, performance, patchwork, geepack
- Python:
scipy, numpy, pandas, pingouin, matplotlib, statsmodels
- Julia:
Distributions.jl, DataFrames.jl, Optim.jl, Roots.jl, HypothesisTests.jl, StatsBase.jl, FreqTables.jl, CSV.jl, Effects.jl, GLM.jl
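Before opening the Python notebooks, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of the course materials, and the check only tests whether each package can be found, not which version is installed.

```python
from importlib.util import find_spec

# Python packages named in the list above.
PYTHON_PACKAGES = ["scipy", "numpy", "pandas", "pingouin", "matplotlib", "statsmodels"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported here."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


status = check_packages(PYTHON_PACKAGES)
missing = sorted(name for name, found in status.items() if not found)
print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with whichever package manager the environment uses.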
Source:
- id -- company identifier
- woj -- region (województwo) id (02, 04, ..., 32)
- public -- whether the company is public (1) or private (0)
- size -- size of the company (small = up to 9 employees, medium = 10 to 49, large = over 49)
- nace -- NACE (PKD) sections (1 letter)
- nace_division -- NACE (PKD) division (2 digits, https://www.biznes.gov.pl/pl/klasyfikacja-pkd)
- vacancies -- number of vacancies the company reported
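The column descriptions above suggest one practical point when loading the data: `woj` and `nace_division` are codes with leading zeros (for example `08`), so they should stay strings, while `public` and `vacancies` are numeric. The following is a minimal sketch using only the standard library. The ten rows are copied by hand from the sample printed in this README. The comma-separated layout and the helper `load_rows` are assumptions for illustration, since the actual file format is not stated here.

```python
import csv
import io
from collections import Counter

# Ten rows copied from the sample in this README; the CSV layout is an assumption.
SAMPLE = """id,woj,public,size,nace,nace_division,vacancies
27350,14,1,Large,O,84,2
26705,14,1,Large,O,84,1
257456,24,1,Large,O,84,2
183657,16,1,Medium,O,84,0
200042,18,1,Medium,O,84,0
244800,08,1,Medium,P,85,0
62309,08,1,Medium,R,93,0
106708,08,0,Medium,B,08,0
62264,08,0,Medium,B,08,0
255865,08,0,Small,C,23,0
"""


def load_rows(handle):
    """Read records, converting only the numeric columns to int."""
    rows = []
    for rec in csv.DictReader(handle):
        rec["public"] = int(rec["public"])
        rec["vacancies"] = int(rec["vacancies"])
        # woj and nace_division stay as strings so codes like "08" keep their zero.
        rows.append(rec)
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# A small size-by-public contingency table, stored as counts per (size, public) pair.
size_by_public = Counter((r["size"], r["public"]) for r in rows)
total_vacancies = sum(r["vacancies"] for r in rows)

for (size, public), n in sorted(size_by_public.items()):
    print(size, public, n)
print("total vacancies:", total_vacancies)
```

With the real dataset, the same idea carries over to pandas by passing a string dtype for the code columns when reading the file.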
Sample rows from the dataset
id woj public size nace nace_division vacancies
1: 27350 14 1 Large O 84 2
2: 26705 14 1 Large O 84 1
3: 257456 24 1 Large O 84 2
4: 183657 16 1 Medium O 84 0
5: 200042 18 1 Medium O 84 0
---
57476: 244800 08 1 Medium P 85 0
57477: 62309 08 1 Medium R 93 0
57478: 106708 08 0 Medium B 08 0
57479: 62264 08 0 Medium B 08 0
57480: 255865 08 0 Small C 23 0
R version 4.4.2 (2024-10-31)
Python 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 08:22:19) [Clang 14.0.6 ]
Julia Version 1.11.3
Commit d63adeda50d (2025-01-21 19:42 UTC)