Codecov Report

Attention: Patch coverage is

Additional details and impacted files: Coverage Diff

| | main | #38 | +/- |
|---|---|---|---|
| Coverage | 72.81% | 38.53% | -34.28% |
| Files | 3 | 3 | |
| Lines | 103 | 205 | +102 |
| Hits | 75 | 79 | +4 |
| Misses | 28 | 126 | +98 |
@john-p-ryan What was the
I just left it at None, which defaults to scipy's 'scott', a common rule of thumb.
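For reference, here is a minimal sketch of what that default means in `scipy.stats.gaussian_kde`. The data are synthetic and only for illustration; the variable names are not from the PR.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic "incomes" for illustration only (not the PR's data).
income = rng.lognormal(mean=10.5, sigma=0.8, size=2_000)

kde_default = gaussian_kde(income)                   # bw_method=None
kde_scott = gaussian_kde(income, bw_method="scott")  # Scott's rule, explicit

# For 1-D unweighted data, Scott's factor is n ** (-1/5); the kernel
# standard deviation is this factor times the sample standard deviation.
scott_factor = income.size ** (-1.0 / 5.0)
print(kde_default.factor, kde_scott.factor, scott_factor)
```

Passing a scalar as `bw_method` sets the factor directly, which is one way to experiment with a wider or narrower kernel.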
We want to estimate the distribution of income $z$ by combining a kernel density estimate (KDE) with a Pareto distribution fitted to the upper tail. The catch is that the density estimate needs to be twice continuously differentiable. The final density estimate is as follows:
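One piecewise form consistent with the description of $A$, $B$, $s$, $t_1$ and $t_2$ in this comment, written here as an assumption rather than as the exact formula from the PR:

$$
f(z) =
\begin{cases}
A \, f_{KDE}(z) & z \leq t_1 \\
\bigl(1 - s(z)\bigr) \, A \, f_{KDE}(z) + s(z) \, B \, f_{Pareto}(z) & t_1 < z < t_2 \\
B \, f_{Pareto}(z) & z \geq t_2
\end{cases}
$$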
Where $A$ and $B$ are scaling factors and $s$ is a smoothing function that satisfies $s(t_1) = 0$, $s(t_2) = 1$, $s'(t_1) = s'(t_2) = s''(t_1) = s''(t_2) = 0$, and $s'(z) \geq 0$ for $z \in [t_1, t_2]$. For example, $s(z) = 6y^5 - 15y^4 + 10y^3$ with $y = \frac{z - t_1}{t_2 - t_1}$. The estimation algorithm is as follows:
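Separately from the algorithm steps, the quintic example of $s$ can be checked numerically against the boundary conditions listed above. This is a small sketch: the function names and the cutoff values are hypothetical, and the derivatives are written out by hand with the chain-rule factor $1/(t_2 - t_1)$.

```python
import numpy as np

def _unit(z, t1, t2):
    """Map z to y = (z - t1) / (t2 - t1), clipped to [0, 1]."""
    return np.clip((np.asarray(z, dtype=float) - t1) / (t2 - t1), 0.0, 1.0)

def s(z, t1, t2):
    """Quintic smoothing function: 0 at or below t1, 1 at or above t2."""
    y = _unit(z, t1, t2)
    return 6 * y**5 - 15 * y**4 + 10 * y**3

def s_prime(z, t1, t2):
    """First derivative with respect to z: 30 y^2 (1 - y)^2 / (t2 - t1)."""
    y = _unit(z, t1, t2)
    return 30 * y**2 * (1 - y) ** 2 / (t2 - t1)

def s_second(z, t1, t2):
    """Second derivative with respect to z: 60 y (1 - y)(1 - 2y) / (t2 - t1)^2."""
    y = _unit(z, t1, t2)
    return 60 * y * (1 - y) * (1 - 2 * y) / (t2 - t1) ** 2

t1, t2 = 100_000.0, 200_000.0  # hypothetical cutoffs
ends = np.array([t1, t2])
print(s(ends, t1, t2), s_prime(ends, t1, t2), s_second(ends, t1, t2))
```

Note that $s'$ scales with $1/(t_2 - t_1)$ and $s''$ with $1/(t_2 - t_1)^2$, so a narrow transition interval makes both derivatives large inside the interval even though they vanish at the endpoints.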
One way to do step 4 computationally is to first set $A = 1$ and compute $B = \frac{f_{KDE}(t_1)}{f_{Pareto}(t_1)}$, then divide the whole thing by the integral of $f$ over the whole interval.
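A rough sketch of that normalization, under several assumptions that are not from the PR: the data are synthetic, the cutoffs are taken as the 90th and 95th percentiles, the Pareto tail is fitted by maximum likelihood with scale $t_1$, the transition is a convex blend weighted by the quintic $s$, and the integral is taken with the trapezoid rule on a truncated grid.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
z_data = rng.lognormal(mean=10.5, sigma=0.8, size=2_000)  # synthetic incomes
t1, t2 = np.quantile(z_data, [0.90, 0.95])                # hypothetical cutoffs

kde = gaussian_kde(z_data)  # default bandwidth (Scott's rule)

# Assumed tail fit: Pareto MLE for the shape, with the scale fixed at t1.
tail = z_data[z_data >= t1]
alpha = tail.size / np.sum(np.log(tail / t1))

def f_pareto(z):
    return alpha / t1 * (z / t1) ** (-(alpha + 1))

def s(z):
    y = np.clip((z - t1) / (t2 - t1), 0.0, 1.0)
    return 6 * y**5 - 15 * y**4 + 10 * y**3

# Start with A = 1 and choose B so the two pieces agree in level at t1.
A0 = 1.0
B0 = A0 * kde([t1])[0] / f_pareto(t1)

def f_unnormalized(z):
    w = s(z)  # 0 below t1 (pure KDE), 1 above t2 (pure Pareto)
    return (1 - w) * A0 * kde(z) + w * B0 * f_pareto(z)

# Integrate on a finite grid; the tail beyond 50 * t2 is ignored here.
grid = np.linspace(z_data.min(), 50 * t2, 10_001)
total = trapezoid(f_unnormalized(grid), grid)

A, B = A0 / total, B0 / total            # final scaling factors
f_grid = f_unnormalized(grid) / total    # density on the grid
print(A, B, trapezoid(f_grid, grid))
```

Dividing by `total` rescales $A$ and $B$ by the same constant, so the level match at $t_1$ is preserved after normalization.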
This is what we get for the distribution:
Here is $f'$:
You can just make out the transition in $f'$, but it looks pretty smooth. However, the small dip in $f'$ causes a big dip in the resulting weights:
I tried playing with the cutoffs as well as the KDE bandwidth, but this seems to be a persistent issue. I am trying to figure out what's causing it and whether there's another way to smooth the transition, perhaps more forcefully.
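One possible way to look at the dip, assuming the density on $[t_1, t_2]$ is the convex blend $f(z) = (1 - s(z)) \, A f_{KDE}(z) + s(z) \, B f_{Pareto}(z)$ (an assumption about the transition, not something confirmed above), is to differentiate the blend directly:

$$
f'(z) = \bigl(1 - s\bigr) A f_{KDE}' + s \, B f_{Pareto}' + s' \, \bigl(B f_{Pareto} - A f_{KDE}\bigr)
$$

$$
f''(z) = \bigl(1 - s\bigr) A f_{KDE}'' + s \, B f_{Pareto}'' + 2 s' \, \bigl(B f_{Pareto}' - A f_{KDE}'\bigr) + s'' \, \bigl(B f_{Pareto} - A f_{KDE}\bigr)
$$

The last term of $f'$ is the level gap between the two pieces multiplied by $s'$. For the quintic, $s'(z) = \frac{30 y^2 (1 - y)^2}{t_2 - t_1}$, which peaks at $\frac{15}{8 (t_2 - t_1)}$ at the midpoint. Matching $B$ at $t_1$ only closes the gap at that one point, so if the KDE and the Pareto tail drift apart inside the interval, this term could produce a dip in $f'$, and a narrower interval would amplify it. If the weights depend on $f'/f$ or on $f''$, the $s''$ and $2s'$ terms in $f''$ would magnify the same mismatch further. Under this reading, the boundary conditions on $s$ guarantee smoothness at $t_1$ and $t_2$ but do not limit the size of the derivatives in between.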