-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathtest_data.Rmd
More file actions
117 lines (90 loc) · 2.69 KB
/
test_data.Rmd
File metadata and controls
117 lines (90 loc) · 2.69 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Testing the data from the CliRes database"
output:
html_document:
theme: cosmo
toc: true
editor_options:
chunk_output_type: console
css: style.css
---
```{r general_options, include = FALSE}
knitr::knit_hooks$set(
margin = function(before, options, envir) {
if (before) par(mgp = c(1.5, .5, 0), bty = "n", plt = c(.105, .97, .13, .97))
else NULL
},
prompt = function(before, options, envir) {
options(prompt = if (options$engine %in% c("sh", "bash")) "$ " else "> ")
})
knitr::opts_chunk$set(margin = TRUE, prompt = TRUE, comment = "",
collapse = TRUE, cache = FALSE, autodep = TRUE,
dev.args = list(pointsize = 11), fig.height = 3.5,
fig.width = 4.24725, fig.retina = 2, fig.align = "center")
options(width = 250)
```
## Packages
Installing the required packages:
```{r}
required <- c("dplyr", "magrittr", "readr")
to_install <- setdiff(required, row.names(installed.packages()))
if (length(to_install)) install.packages(to_install)
```
Loading `magrittr`:
```{r}
library(magrittr)
```
## Loading the data
```{r}
viparc <- readr::read_csv("https://raw.githubusercontent.com/viparc/clires_data/master/data/viparc.csv",
col_types = paste(c("cii", rep("l", 6), rep("d", 45), "lil"), collapse = ""))
```
## Testing for missing weeks and missing values
No missing weeks:
```{r}
viparc %>%
dplyr::group_by(farm, flock) %>%
dplyr::arrange(week) %>%
dplyr::summarise(a = length(unique(diff(week)))) %>%
dplyr::ungroup() %>%
dplyr::filter(a > 1)
```
Almost all the variables have missing values:
```{r}
sapply(viparc, function(x) any(is.na(x)))
```
But missing values concern only 2 weeks:
```{r}
weeks <- sort(unique(unlist(lapply(viparc, function(x) which(is.na(x))))))
viparc[weeks, c("farm", "flock", "week")]
```
## Checking for potential outliers
There does not seem te be any outliers in the number of chicken:
```{r}
hist(viparc$nb_chicken, col = "grey", main = NA, xlab = "number of chicken", ylab = "number of weeks")
```
There seems to be an error on the amount of oxytetracycline during one of the
weeks of observation:
```{r}
viparc %>%
dplyr::select(dplyr::ends_with("_g")) %>%
unlist() %>%
{.[. > 0]} %>%
sort(TRUE) %>%
head(20)
```
It concerns the following week:
```{r}
viparc %>%
dplyr::filter(oxytetracycline_g > 200) %>%
dplyr::select(farm, flock, week)
```
With this data point removed, it seems to make a little bit more sense:
```{r}
viparc %>%
dplyr::select(dplyr::ends_with("_g")) %>%
unlist() %>%
{.[. > 0]} %>%
{.[. < 300]} %>%
hist(0:200, col = "grey", main = NA, xlab = "antibiotic use (g)", ylab = "number of weeks")
```