Statistical-Inference/03-estimation.qmd at master · ShKlinkenberg/Statistical-Inference · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
```{r setup, include=FALSE}
source("R_setup.R")
```

# Estimating a Parameter {#sec-param-estim}

> Which Population Values Are Plausible?

> Key concepts: point estimate, interval estimate, confidence (level), precision, standard error, critical value, confidence interval.


### Summary {-}

::: {.callout-important appearance="simple"}

Given our sample, what are plausible population values?

:::

In this chapter, we set out to make educated guesses of a population value (parameter, often called "the true value") based on our sample. This type of guessing is called _estimation_. Our first guess will be a single value for the population value. We merely guess that the population value is equal to the value of the sample statistic. This guess is the most precise guess that we can make, but, most likely, it is wrong.

Our second guess uses the sampling distribution to make a statement about the approximate population value. In essence, we calculate an interval that we are confident will contain the population value. We can increase our confidence by widening the interval, but this decreases the precision of our guess.

## Point Estimate

If we have to name one value for the population value, our best guess is the value of the sample statistic. For example, if 18% of the candies in our sample bag are yellow, our best guess for the proportion of yellow candies in the population of all candies from which this bag was filled, is .18. What other number can we give if we only have our sample? This type of guess is called a _point estimate_ and we use it a lot.

The sample statistic is the best estimate of the population value only if the sample statistic is an unbiased estimator of the population value. As we have learned in @sec-unbiased-est, the true population value is equal to the mean of the sampling distribution for an unbiased estimator. The mean of the sampling distribution is the expected value for the sample.

In other words, an unbiased estimator neither systematically overestimates the population value, nor does it systematically underestimate the population value. With an unbiased estimator, then, there is no reason to prefer a value higher or lower than the sample value as our estimate of the population value.

Even though the value of the statistic in the sample is our best guess, it is very unlikely that our sample statistic is exactly equal to the population value (parameter). The recurrent theme in our discussion of random samples is that a random sample differs from the population because of chance during the sampling process. The precise population value is highly unlikely to actually appear in our sample.

The sample statistic value is our best point estimate but it is nearly certain to be wrong. It may be slightly or far off the mark but it will hardly ever be spot on. For this reason, it is better to estimate a range within which the population value falls. Let us turn to this in the next section.

## Interval Estimate for the Sample Statistic

The sampling distribution of a continuous sample statistic tells us the probability of finding a range of scores for the sample statistic in a random sample. For example, the average weight of candies in a sample bag is a continuous random variable. The sampling distribution tells us the probability of drawing a sample with average candy weight between 2.0 and 3.6 grams. We can use this range as our _interval estimate_.

Note that we are reasoning from sampling distribution to sample now. This is not what we want to do in actual research, where we want to reason from sample to sampling distribution to population. We get to that in @sec-ci-parameter. For now, assume that we know the true sampling distribution.

Remember that the average or expected value of a sampling distribution is equal to the population value if the estimator is unbiased. For example, the mean weight of yellow candies averaged over a very large number of samples is equal to the mean weight of yellow candies in the population. For an interval estimate, we now select the sample statistic values that are closest to the average of the sampling distribution.

Between which boundaries do we find the sample statistic values that are closest to the population value? Of course, we have to specify what we mean by "closest". Which part of all samples do we want to include? A popular proportion is 95%, so we want to know the boundary values that include 95% of all samples that are closest to the population value. For example, between which boundaries is average candy weight situated for 95% of all samples that are closest to the average candy weight in the population?

::: {#fig-ci-borders}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/ci-borders/" width="100%" height="290px" style="border:none;">
</iframe>
```
Within which interval do we find the sample results that are closest to the population value?
:::

@fig-ci-borders shows the sampling distribution of average sample candy weight.


Say, for instance, that 95% of all possible samples in the middle of the sampling distribution have an average candy weight ranging from 1.6 to 4.0 grams. The proportion .95 can be interpreted as a probability. Our sampling distribution tells us that we have 95% probability that the average weight of yellow candies lies between 1.6 and 4.0 grams in a random sample that we draw from this population.

We now have boundary values, that is, a range of sample statistic values, and a probability of drawing a sample with a statistic falling within this range. The probability shows our _confidence_ in the estimate. It is called the _confidence level_ of an interval estimate.


## Precision, Standard Error, and Sample Size {#sec-precisionsesamplesize}

The width of the estimated interval represents the _precision_ of our estimate. The wider the interval, the less precise our estimate. With a less precise interval estimate, we will have to take into account a wider variety of outcomes in our sample.

::: {#fig-interval-level}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/interval-level/" width="100%" height="310px" style="border:none;">
</iframe>
```
How does the confidence level affect the precision of an interval estimate?
:::


If we want to predict something, we value precision. We would rather conclude that the average weight of candies in the next sample we draw is between 2.0 and 3.6 grams than between 1.6 and 4.0 grams. If we would be satisfied with a very imprecise estimate, we need not do any research at all. With relatively little knowledge about the candies that we are investigating, we could straightaway predict that the average candy weight is between zero and ten grams. The goal of our research is to find a more precise estimate.

There are several ways to increase the precision of our interval estimate, that is, to obtain a narrower interval for our estimate. The easiest and least useful way is to decrease our confidence that our estimate is correct. If we lower the confidence that we are right, we can discard a large number of other possible sample statistic outcomes and focus on a narrower range of sample outcomes around the true population value.

This method is not useful because we sacrifice our confidence that the range includes the outcome in the sample that we are going to draw. What is the use of a more precise estimate if we are less certain that it predicts correctly? Therefore, we usually do not change the confidence level and leave it at 95% or thereabouts (90%, 99%). It is important to be quite sure that our prediction will be right.

### Sample sizes {#sec-sample-sizes}

A less practical but very useful method of narrowing the interval estimate is increasing sample size. If we buy a larger bag containing more candies, we get a better idea of average candy weight in the population and a better idea of the averages that we should expect in our sample.

::: {#fig-interval-size}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/interval-size/" width="100%" height="310px" style="border:none;">
</iframe>
```
How does sample size affect the precision of an interval estimate?
:::

@fig-interval-size shows a sampling distribution of average candy weight in candy sample bags. The size of the horizontal arrow represents the precision of the interval estimate: the shorter the arrow, the more precise the interval estimate.


As you may have noticed while playing with @fig-interval-size, a larger sample yields a narrower, that is, more precise interval. You may have expected intuitively that larger samples give more precise estimates because they offer more information. This intuition is correct.

In a larger sample, an observation above the mean is more likely to be compensated by an observation below the mean. Just because there are more observations, it is less likely that we sample relatively high scores but no or considerably fewer scores that are relatively low.

The larger the sample, the more the distribution of scores for a variable in the sample will resemble the distribution of scores for this variable in the population. As a consequence, a sample statistic value will be closer to the population value for this statistic.

Larger samples resemble the population more closely, and therefore large samples drawn from the same population are more similar. The result is that the sample statistic values in the sampling distribution are less varied and more similar. They are more concentrated around the true population value. The middle 95% of all sample statistic values are closer to the centre, so the sampling distribution is more peaked.

### Standard error {#sec-standard-error}

The concentration of sample statistic values, such as average candy weight in a sample bag, around the centre (mean) of the sampling distribution is expressed by the standard deviation of the sampling distribution. Up until now, we have only paid attention to the centre of the sampling distribution, its mean, because it is the expected value in a sample and it is equal to the population value if the estimator is unbiased.

Now, we start looking at the standard deviation of the sampling distribution as well, because it tells us how precise our interval estimate is going to be. The sampling distribution's standard deviation is so important that it has received a special name: the _standard error_.

::: {#fig-se-point-est}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/se-point-est/" width="100%" height="310px" style="border:none;">
</iframe>
```
How does sample size affect the standard error?
:::


The word _error_ reminds us that the standard error represents the size of the error that we are likely to make (on average under many repetitions) if we use the value of the sample statistic as a point estimate for the population value.

Let us assume, for instance, that the standard error of the average weight of candies in samples is 0.6. Loosely stated, this means that the average difference between true average candy weight and average candy weight in a sample is 0.6 if we draw very many samples from the same population.

The smaller the standard error, the more the sample statistic values resemble the true population value, and the more precise our interval estimate is with a given confidence level, for instance, 95%. Because we like more precise interval estimates, we prefer small standard errors over high standard errors.

It is easy to obtain smaller standard errors: just increase sample size. See @fig-interval-size, where larger samples yield more peaked sampling distributions. In a peaked distribution, values are closer to the mean and the standard error is smaller. In our example, average candy weights in larger sample bags are closer to the average candy weight in the population.

In practice, however, it is both time-consuming and expensive to draw a very large sample. Usually, we want to settle on the optimal size of the sample, namely a sample that is large enough to have interval estimates at the confidence level and precision that we need but as small as possible to save on time and expenses. We return to this matter in @sec-power.

The standard error may also depend on other factors, such as the variation in population scores. In our example, more variation in the weight of candies in the population produces a larger standard error for average candy weight in a sample bag. If there are more very heavy candies and very light candies, it is easier to draw a sample with several heavy candies or with several very light candies. Average weight in these sample bags will be too high or too low. We cannot influence the variation in candy weights in the population, so let us ignore this factor influencing the standard error.


## Critical Values {#sec-crit-values}

In the preceding section, we learned that the standard error is related to the precision of the interval estimate. A larger standard error yields a less precise estimate, that is, with a wider interval estimate.

We are interested in the interval that includes a particular percentage of all samples that can be drawn, usually the 95% of all samples that are closest to the population value. In our current example, the 95% of all samples with average candy weight that is closest to average candy weight in the population (2.8 grams).

In theoretical probability distributions like the normal distribution, the percentage of samples is related to the standard error. If we know the standard error, we know the interval within which we find the 95% of samples that are closest to the population value.

::: {#fig-crit-values}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/crit-values/" width="100%" height="360px" style="border:none;">
</iframe>
```
Standardized sample outcomes and the standard error.
:::

@fig-crit-values shows the sampling distribution of average candy weight per sample bag. It contains two horizontal axes, one with average candy weight in grams (bottom) and one with average candy weight in standard errors, also called _z_ scores (top).


In @fig-crit-values, we approximate the sampling distribution with a theoretical probability distribution, namely the normal distribution. The theoretical probability distribution links probabilities (areas under the curve) to sample statistic outcome values (scores on the horizontal axis). For example, we have 2.5% probability of drawing a sample bag with average candy weight below 1.2 grams or 2.5% probability of drawing a sample bag with average candy weight over 4.4 grams.

### Standardization and _z_ scores

The average candy weights that are associated with 2.5% and 97.5% probabilities in @fig-crit-values depend on the sample that we have drawn. As you may notice while playing with @fig-interval-size, changing the size of the sample also changes the average candy weights that mark the 2.5% and 97.5% probabilities.

We can simplify the situation if we _standardize_ the sampling distribution: Subtract the mean of the sampling distribution from each sample mean in this distribution, and divide the result by the standard error. Thus, we transform the sampling distribution into a distribution of standardized scores. The mean of the new standardized variable is always zero.

If we use the normal distribution for standardized scores, which is called the _standard-normal distribution_ or _z distribution_, there is a single _z_ value that marks the boundary between the top 2.5% and the bottom 97.5% of any sample. This _z_ value is 1.96. If we combine this value with -1.96, separating the bottom 2.5% of all samples from the rest, we obtain an interval [-1.96, 1.96] containing 95% of all samples that are closest to the mean of the sampling distribution.

In a standard-normal or _z_ distribution, 1.96 is called a _critical value_. Together with its negative (-1.96), it separates the 95% sample statistic outcomes that are closest to the parameter, hence that are most likely to appear, from the 5% that are furthest away and least likely to appear. There are also critical _z_ values for other probabilities, for instance, 1.64 for the middle 90% of all samples and 2.58 for the middle 99% in a standard-normal distribution.

### Interval estimates from critical values and standard errors {#sec-int-est-sample-mean}

Critical values in a theoretical probability distribution tell us the boundaries, or range, of the interval estimate expressed in standard errors. In a normal distribution, 95% of all sample means are situated no more than 1.96 standard errors from the population mean.

If the standard error is 0.5 and the population mean is 2.8 grams, we have 95% probability that the mean candy weight in a sample that we draw from this population lies between 1.82 grams (this is 1.96 times 0.5 subtracted from 2.8) and 3.78 grams.

Critical values make it easy to calculate an interval estimate if we know the standard error. Just take the population value and add the critical value times the standard error to obtain the upper limit of the interval estimate. Subtract the critical value times the standard error from the population value to obtain the lower limit.

::: {.callout-important appearance="simple"}

- Lower limit of the interval estimate = population value -- critical value * standard error.
- Upper limit of the interval estimate = population value + critical value * standard error.

:::

(Standard) normal distributions make life easier for us, because there is a fixed critical value for each probability, such as 1.96 for 95% probability, which is well-worth memorizing.


## Confidence Interval for a Parameter {#sec-ci-parameter}

Working through the preceding sections, you may have realized that estimating the value of a statistic in a new sample with a specific precision and probability is not our ultimate goal, as it does not fully represent reality. In reality, we do not know the population parameter, and the primary objective of statistics is to estimate this unknown population parameter.

For example, we don't care much about the average weight of candies in our sample bag or in the next sample bag that we may buy. We want to say something about the average weight of candies in the population. How can we do this?

In addition, you may have realized that, if we want to construct the sampling distribution of sample means, we first need to know the precise population value, for instance, average candy weight in the population. After all, the average of the sampling distribution is equal to the population mean for an unbiased estimator. In the preceding paragraphs, we acted as if we knew the sampling distribution of sample means.

::: {#fig-exactapproachfigure}

```{r exactapproachfigure, eval=TRUE, echo=FALSE, out.width="300px", fig.pos='H', fig.align='center'}
knitr::include_graphics("figures/exactapproach.png")
```
Probabilities of a sample with a particular number of yellow candies if 20 per cent of the candies are yellow in the population.
:::

In the exact approach to the sampling distribution of the proportion of yellow candies in a sample bag (@fig-exactapproachfigure), for instance, we first need to know the proportion of yellow candies in the population. If we know the population proportion, we can calculate the exact probability of getting a sample bag with a particular proportion of yellow candies. But we don't know the population proportion of yellow candies; we want to estimate it.

In the candy weight example, we first need to know the mean of population candy weight, then we can construct a theoretical probability distribution of sample means. But we do not know the population mean; we want to estimate it.

```{r normal-param, eval=FALSE, echo=FALSE, fig.pos='H', fig.align='center', fig.cap="Which population statistics do we need to know if we want to use the standard normal distribution as sampling distribution?"}
# show an X axis labeled 'sample mean' with a scale from 0 to 5 grams ; add sample mean as labelled dot at 3.4 ; add a normal curve with population mean 2.8, a standard error 0.1 (equivalent to sample standard deviation of 1.0 and a sample size of 25) ; show checkboxes for sample size, sample mean, sample standard deviation, standard error, population mean, and population standard error with their (fixed) values ; select all checkboxes ; if the user deselects a checkbox, either nothing happens or a series of 5 normal curves is shown for different values of the statistic associated with the checkbox.
# - sample size: only needed if standard error is deselected
# - sample mean: not necessary
# - sample standard deviation: if population standard deviation or standard error is selected, the sample standard deviations is not needed ; if both standard deviations and the standard error are not selected, show normal curves with different standard deviations
# - standard error: not needed if sample size and sample or population standard deviation are selected ; otherwise, show normal curves with different standard deviations
# - population mean: if deselected, show normal curves for different values
# - population standard deviation: see sample standard deviation

![Example layout](figures/CI_2.png)

@fig-normal-param shows a standard normal distribution that we could use to approximate the sampling distribution of average candy weight in a sample (bag) of candies. This theoretical distribution can be used only if some of the statistics (see the check boxes) are known. Uncheck a box if you think this statistic is _not_ needed to use the standard normal distribution. If you are right, nothing happens. If you are wrong, you will see a series of normal curves for different values of the statistic. Then, check the statistic again and try another statistic.

1. Which mean is required for using the standard normal distribution? Which characteristic of the distribution is fixed by this statistic?

2. Do you have to know the standard deviation of the sample as well as that  of the population?

3. Which values do you have to know _only if_ you don't know the standard error? In other words, which statistics affect the standard error (you have learned about before)? Deselect the standard error and experiment with the other options.
```

A theoretical probability distribution can only be used as an approximation of a sampling distribution if we know some characteristics of the population. We know that the sampling distribution of sample means always has the bell shape of a normal distribution or _t_ distribution. However, knowing the shape is not sufficient for using the theoretical distribution as an approximation of the sampling distribution.

We must also know the population mean because it specifies where the centre of the sampling distribution is located. So, we must know the population mean to use a theoretical probability distribution to estimate the population mean.

By the way, we also need the standard error to know how peaked or flat the bell shape is. The standard error can usually be estimated from the data in our sample. But let us not worry about how the standard error is being estimated and focus on estimating the population mean.

### Reverse reasoning from one sample mean  {#sec-fixed-pop-values}

In the previous chapters, we were reasoning from population mean to sampling distribution of sample means, then to a single sample mean. Based on the known population mean and standard error, we could draw an interval estimate around the population mean of the sampling distribution (@fig-crit-values). 95% of all sample means would fall within this interval estimate around the known population mean.

In practice though, the population mean is unknown. We only have the sample mean derived from our collected data. Using this sample mean, we would like to estimate the population mean.

::: {#fig-pop-ci-sampling}

```{=html}
<iframe src="https://sharon-klinkenberg.shinyapps.io/pop-ci-sampling/" width="100%" height="630px" style="border:none;">
</iframe>
```
Confidence intervals from replications
:::

Instead of checking whether a sample mean is inside or outside of the interval estimate of the **population mean** (@sec-crit-values), we use the interval estimate around that one **sample mean**, to check whether this interval catches the population mean. These two ways are equivalent, but the second way does not require us to know the population mean. This is because the interval estimate around the sample means catches the population mean in 95% of the times, regardless if the population mean is known or not. We call such an interval estimate around the sample mean a **95% confidence interval**.

The middle plot of @fig-pop-ci-sampling, shows the 95% confidence interval for a single sample, from the population distribution in the top graph. The green line indicates that the interval estimate around the sample mean catches the population mean. The average candy weight in this samples of $N=30$ is 2.95 grams and the lower and upper boundary for 95% confidence interval are 2.59 grams and 3.32 grams. We use the blue vertical dashed line to indicate the population mean, which in reality we do not know. Though, in this simulation, we can see that the green 95% confidence interval catches the population mean.

Now, increase the number of times of samples (replications) by adjusting the slider in @fig-pop-ci-sampling. The middle plot now shows how many of the samples with a 95% confidence intervals catch the population mean, indicated by the green lines. The red lines indicate that the 95% confidence interval does not catch the population mean. The percentage that catch the population mean approaches 95% as the number of samples gets higher.

Now, also increase the sample size in @fig-pop-ci-sampling by moving the slider. We see that, in the middle plot, all confidence intervals become narrower. We therefore are much more confident in our estimation of the population mean with a larger sample size.

Note that the confidence intervals are not of the same width, and that the upper and lower bound for each sample is different. This is because the sample mean is different for each sample, and the standard error is calculated from the sample. It is therefore incorrect to say that you are 95% confident that the population mean is within the specific lower and upper bound of your sample. Instead, you are 95% confident that the interval estimate around the sample mean catches the population mean. This is a subtle difference, but indicates that only repeated sampling is the rationale for the confidence that the population mean is within the interval estimate.

::: {.callout-important appearance="simple"}

> If we were to repeat the experiment over and over, then 95% of the time the confidence intervals contain the true mean.
>
> --- @hoekstra2014robust

It is very important that we understand that the confidence level 95% is NOT the probability that the population parameter has a particular value, or that it falls within the interval.

In classic statistics (so called "Frequentists"), the population parameter is _not_ a random variable but a fixed, unknown number, which does not have a probability.

A confidence interval around a sample statistics from solely one data collection either catches (100%) or does not catch (0%) the population parameter. We just do not know which one is the case. That's why we cannot make any conclusion about the population mean based on one confidence interval.

Only when we replicate data collection many many times, we do know that about 95% of all 95% confidence intervals will catch the population mean.

:::

Now imagine that you have a large sample for your research project. Looking at @fig-pop-ci-sampling, this would mean that if you would run the same research a hundred times, you would find the population mean within the 95% confidence interval in 95 of these hundred times. That is very reasuring, isn't it?

In the bottom plot of @fig-pop-ci-sampling, we revisit @sec-samp-dist, to illustrate how these 95% confidence intervals around the sample means are related to the sampling distribution. Each confidence interval in the middle plot is an approximation of the width of the sampling distribution in the bottom plot. The larger the  samples size, the narrower the confidence intervals are, and the narrower the sampling distribution becomes in the bottom plot. The histogram represents the sample means from the number replications. Each sample mean comes from one replication.

We also see the theoretical approximation of this sampling distribution (as was discussed in @sec-theoretical-approx) of sample means, which is a normal distribution. As the number of replications and sample size increases, the shape of the histogram gets closer to the shape of the theoretical approximation of the sampling distribution (green line), which in turn also gets narrower.

This sampling distribution is theoretically approximated by a normal distribution whose mean is the population mean and standard deviation is the standard error of the sample. To make our life easier, we can convert the sampling distribution, which is a normal distribution, to the standard normal distribution by converting to a z-score. The critical z value 1.96 and -1.96 together marks the upper and lower limit of the interval containing 95% of all samples with means closest to the population mean.

As a consequence, we are able to calculate 95% confidence interval around a sample mean by adding and subtracting 1.96 standard errors from that sample mean.

::: {.callout-important appearance="simple"}

- Confidence interval lower limit = sample value -- critical value * standard error.
- Confidence interval upper limit = sample value + critical value * standard error.

For example, the 95%-confidence interval for a sample mean:

- Lower limit = sample mean - 1.96 * standard error.
- Upper limit = sample mean + 1.96 * standard error.

:::


Haven't we seen this calculation before? Yes we did, in @sec-int-est-sample-mean, where we estimated the interval around population mean for sample means. We now simply reverse the application, using the interval of sample mean to estimate the population mean instead of the other way around.

```{block2, type='rmdneyman'}
Jerzy Neyman introduced the concept of a confidence interval in 1937:

"In what follows, we shall consider in full detail the problem of estimation by interval. We shall show that it can be solved entirely on the ground of the theory of probability as adopted in this paper, without appealing to any new principles or measures of uncertainty in our judgements". [@RefWorks:3929: 347]

Photo of Jerzy Neyman by Ohonik, Commons Wikimedia, CC BY-SA 4.0]
```

### Confidence intervals with bootstrapping {#sec-bootstrap-confidenceinterval}

If we approximate the sampling distribution with a theoretical probability distribution such as the normal (_z_) or _t_ distribution, critical values and the standard error are used to calculate the confidence interval (see @sec-fixed-pop-values).

There are theoretical probability distributions that do not work with a standard error, such as the _F_ distribution or chi-squared distribution. If we use those distributions to approximate the sampling distribution of a continuous sample statistic, for instance, the association between two categorical variables, we cannot use the formula for a confidence interval (@sec-fixed-pop-values) because we do not have a standard error. We must use bootstrapping to obtain a confidence interval.

```{r bootstrap-ci, eval=FALSE, echo=FALSE, fig.pos='H', fig.align='center', fig.cap="How do we construct confidence intervals with bootstrapping?"}
# Adapt app bootstrapping: Show Initial Sample and Sampling Distribution, buttons Bootstrap 1000 samples and Draw new initial sample. Increase vertical size of sampling distribution but reduce range from 0 to 0.5.
# Add vertical lines for 95% confidence interval limits calculated from percentiles in the bootstrapped sampling distribution and limits calculated from the standard error (= standard deviation of the bootstrapped sampling distribution) times the critical value 1.96 (using a normal approximation).
# Finally, display the average of the bootstrapped sampling distribution as well as the true population proportion (0.2).

# In questions, point out the symmetry of the normal approximation to the confidence interval versus the asymmetry of the percentile-based confidence interval.
```

As you might remember from @sec-boot-approx, we simulate a sampling distribution if we bootstrap a statistic, for instance median candy weight in a sample bag. We can use this sampling distribution to construct a confidence interval. For example, we take the values separating the bottom 2.5% and the top 2.5% of all samples in the bootstrapped sampling distribution as the lower and upper limits of the 95% confidence interval. We will encounter the bootstrapping method for confidence intervals around regression coefficient of mediator again in @sec-mediation.

It is also possible to construct the entire sampling distribution in exact approaches to the sampling distribution. Both the standard error and percentiles can be used to create confidence intervals. This can be very demanding in terms of computer time, so exact approaches to the sampling distribution usually only report _p_ values (see @sec-pvalue), not confidence intervals.


## Confidence Intervals in SPSS {#sec-SPSS-CI}

Now that you have learned about confidence intervals, it is useful to know how to add these confidence intervals to all analyses that you will execute in the future.

### Instruction

In the video below we will show you how to set confidence intervals in SPSS for several analyses. Think about t-tests (@sec-probmodels), ANOVA's (@sec-anova) and regression analyses (@sec-moderationcat, -@sec-moderationcont, -@sec-confounder, and -@sec-mediation). From now on, we will ask of you to always report the confidence intervals of your results in addition to the test statistic, p-value and effect size. This provides a more complete picture of the research findings, we will elaborate more on this in the next Chapter.

When we execute a t-test the 95% confidence interval is already set as default option. We would only have to change this setting if we would like a confidence interval with a different confidence level (e.g., 90%). In @fig-interval-level we showcased the effect of the confidence level on the precision of the interval.

In the case of an ANOVA, the confidence interval is not necessary for the F-statistic. However if we conduct a post-hoc test we do automatically get a confidence interval of 95%. This time we can change the confidence interval by adjusting the significance level in the `Post Hoc` dialog. When we use General Linear Model, which we do for two-way ANOVA, we can find the significance level option under the `options` dialog.

When we conduct a regression analysis, we have to specifically ask for and turn on the option of confidence intervals. This can be done in the `Statistics` dialog. We have the opportunity to change the confidence level.

Lastly, when we want a confidence interval for a correlation coefficient we need to use the option `bootstrap`. Bootstrapping has been introduced in @sec-boot-approx. Be aware that in every analysis we have the opportunity to bootstrap the intervals, how to add bootstrapping is shown in this video: @vid-SPSSbootstrap1 in @sec-boot-spss.

_Note_ The confidence level is the relative width of the interval in terms of percentages, e.g. 90%, 95% or 99%. The confidence interval is the absolute width of the interval, thus the values belonging to the percentages, e.g. 4.3 to 5.6.

Watch the video below for the step by step instructions, details and additional information.

::: {#vid-SPSSconflevel}

{{< video https://www.youtube.com/embed/GoGrsHpfIWM
    width="100%"
    height="315"
>}}

Displays the 95% confidence intervals in SPSS.
:::


```{r, echo=FALSE, eval=FALSE}
# SPSS usually displays the 95% confidence interval automatically if you use the standard-normal (_z_) or _t_ distribution.
# * t tests on one or two means: 95% CI displayed. Adjust confidence level under options.
# * post-hoc tests with 1WAY-ANOVA: 95% CI by default, adjust significance level (under Post-Hoc) to change confidence level (1 - significance level)
# * post-hoc tests with 2WAY-ANOVA: 95% CI by default, adjust significance level (under Options) to change confidence level (1 - significance level)
# * t test on regression coefficient: ask for CI under Statistics, level can be adjusted there.
# Bootstrapping: always a confidence interval
# * example: media with Frequencies
# * correlation: only with bootstrapping
# No confidence intervals:
# * non-parametric tests (just categorical variables - irrelevant)
```


<!-- ## Test Your Understanding

```{r estimation, fig.pos='H', fig.align='center', fig.cap="Point and interval estimates, confidence intervals.", echo=FALSE, out.width="775px", screenshot.opts = list(delay = 5), dev="png"}
# Combination of apps interval-level, interval-size, and crit-values: A normal curve (M = 2.8, SE = 0.1 (SD population = 0.5 and sample size = 25)) as sampling distribution with two vertical lines marking interval limits (initially set at 2.5%/95%/2.5%) ; percentages indicating the area under the curve within and outside the limits ; double arrow from mean to interval limits indicating the interval estimate ; double x-axis: the first in grams and the second in standard errors, both axes labelled ; 2 sliders: confidence level and sample size ; any slider change will change the position of the interval limits (to represent the selected confidence level) ; changing confidence level also changes the percentages ; changing sample size also changes the scale of the x-axis in standard errors ; the value of the standard error is shown to highlight its relation with sample size.
knitr::include_app("https://sharon-klinkenberg.shinyapps.io/estimation/", height="258px")
```

@fig-estimation shows the sampling distribution of average candy weight in a sample bag, which is a normal distribution.


-->

## Take-Home Points

* If a sample statistic is an unbiased estimator, we can use it as a point estimate for the value of the statistic in the population.

* A point estimate may come close to the population value but it is almost certainly not correct.

* A 95% confidence interval is an interval estimate of the population value. We are 95% confident that the population value lies within this interval. Note that confidence is not a probability!

* A larger sample or a lower confidence level yields a narrower, that is, a more precise confidence interval.

* A larger sample yields a smaller standard error, which yields a more precise confidence interval because the limits of a 95% confidence interval fall one standard error times the critical value below and above the value of the sample statistic.