-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathProjectMysql.sql
More file actions
386 lines (293 loc) · 10.1 KB
/
ProjectMysql.sql
File metadata and controls
386 lines (293 loc) · 10.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
-- tool for data preparing. The goal is to replace '' by NULL in order to avoid future issues
UPDATE covid_19.patient
SET
source = NULL
WHERE
source = '';
-----------------------------------------------------------------------------------------------------------------
-- Display the Dataset :
SELECT
*
FROM
covid_19.patient;
-----------------------------------------------------------------------------------------------------------------
-- 1) Data Informations
-- this dataset was submitted on 15th of March 2020.
-- (https://www.kaggle.com/lperez/coronavirus-france-dataset)
-- That's why we're going to analyze the situation for the day 15/03 .
-- there are a lot of NULL values in this dataset,
-- it can be avoid to have good results and there are severals column unusable
-- like released date (very important indicator)
-- what's the birth year range ?
SELECT
MIN(birth_year), MAX(birth_year), ROUND(avg(birth_year))
FROM
covid_19.patient;
-- Birth year is between 1922 and 2019 with average 1966.
-- what's the age range ?
SELECT
2020 - MAX(birth_year) AS age_min,
2020 - MIN(birth_year) AS age_max,
2020 - ROUND(AVG(birth_year)) AS avg_age
FROM
covid_19.patient;
-- Age is between 1 and 98 with average 54 years old.
-- how much patient in this dataset?
SELECT
COUNT(id)
FROM
covid_19.patient;
-- In this dataset, there is 2072 patients.
-----------------------------------------------------------------------------------------------------------------
-- 2) Data analyze
-- how much case by region on the 15/03?
SELECT
region, COUNT(region) AS numberOfCases
FROM
covid_19.patient
GROUP BY region
ORDER BY COUNT(region) DESC;
-- Grand-Est (461 cases) and Ile-de-France (440 cases) regions are
-- the most affected by this epidemic.
-- display the region most affected by this epidemic
SELECT
region, MAX(numberOfCases)
FROM
(SELECT
region, COUNT(region) AS numberOfCases
FROM
covid_19.patient
GROUP BY region
ORDER BY COUNT(region) DESC) AS previousQuery;
-- Grand-Est with 461 cases
-- what's the different infection reason?
SELECT DISTINCT
(infection_reason)
FROM
covid_19.patient;
-- we can see that infection reasons are multiple.
-- What's the count for each infection_reason on the 15/03?
SELECT
infection_reason, COUNT(infection_reason)
FROM
covid_19.patient
GROUP BY infection_reason
ORDER BY COUNT(infection_reason) DESC;
-- we can see that contact with patient and
-- visit to Mulhouse religious gathering was the main epidemic spreading reasons.
-- then, there is visit to Italy and Egypt as infection reasons.
-- We will now try to retrace epidemic spreading :
SELECT
confirmed_date, infection_reason
FROM
covid_19.patient
WHERE
infection_reason IS NOT NULL
ORDER BY confirmed_date;
-- the first case was confirmed the 27th of january due to an Italy travel.
-- so we can see that confirmed case came from Italy in first time.
-- the epidemic quickly became locally present in other regions.
-- this is mainly due to the religious meeting in Mulhouse between the 17th & 21th of february.
-- when the first cases for the religious gathering was confirmed ?
SELECT
confirmed_date, infection_reason
FROM
covid_19.patient
WHERE
infection_reason LIKE '%Mulhouse%'
ORDER BY confirmed_date;
-- The first case for the religious gathering was confirmed the 4th of march,
-- 11 days after the end of the gathering.
-- We can see that the first symptoms take time to be felt.
-- We now understand why it was already too late to contain the progression of the virus.
-- How much patient exactly visit Italy before being sick ?
SELECT
infection_reason, COUNT(infection_reason)
FROM
covid_19.patient
WHERE
infection_reason LIKE '%Italy%'
OR '%Milan%'
OR '%Lombardy%'
OR 'Italian'
ORDER BY confirmed_date;
-- There is exactly 23 persons who visited Italy before being sick.
-- how much patient exactly visited Mulhouse religious meeting ?
SELECT
infection_reason, COUNT(infection_reason)
FROM
covid_19.patient
WHERE
infection_reason LIKE '%Mulhouse%'
ORDER BY confirmed_date;
-- There is exactly 51 persons who visited the Mulhouse religious meeting before being sick.
-- how much new case for each day until 15/03?
SELECT
confirmed_date, COUNT(confirmed_date)
FROM
covid_19.patient
GROUP BY confirmed_date
ORDER BY confirmed_date;
-- This confirm that Mulhouse meeting is the biggest trigger of the epidemic
-- because after 4th of March the number of new cases indradays jumped from 45
-- to several hundred.
-- this dataset was submitted on the 15th of March 2020.
-- So, we can't have the current situation.
-- On the 15/03 what was the number of cases for each statues ?
SELECT
SUM(CASE
WHEN status IS NULL THEN 1
END) AS nullValue,
SUM(CASE
WHEN status = 'hospital' THEN 1
END) AS hospital,
SUM(CASE
WHEN status = 'home isolation' THEN 1
END) AS home_isolation,
SUM(CASE
WHEN status = 'deceased' THEN 1
END) AS deceased,
SUM(CASE
WHEN status = 'released' THEN 1
END) AS released
FROM
covid_19.patient;
-- What was the percentage of hospitalization ?
SELECT
(SUM(CASE
WHEN status = 'hospital' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_hospitalization
FROM
covid_19.patient;
-- The result is 75 %. I think this result is not true. I think it's due to
-- high number of NULL values.
-- What was the percentage of deceased ?
SELECT
(SUM(CASE
WHEN status = 'deceased' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_deceased
FROM
covid_19.patient;
-- Oficial values (4th of May) annonce a deceased rate less than 3 % in France.
-- This Dataset gives 4.39 % . This is close to reality.
-- This is difficult to have the good rate because of NULL values...
-- What about health cases ?
SELECT
SUM(CASE
WHEN health IS NULL THEN 1
END) AS nullValue,
SUM(CASE
WHEN health = 'good' THEN 1
END) AS good,
SUM(CASE
WHEN health = 'critical' THEN 1
END) AS critical,
SUM(CASE
WHEN health = 'deceased' THEN 1
END) AS deceased,
SUM(CASE
WHEN health = 'cured' THEN 1
END) AS cured
FROM
covid_19.patient;
-- On the 15/03 patient are mostly in good shape.
-- Difficult to give other conclusions because of NULL values.
-- What was the percentage of deceased cases who had more than 60 Years old ?
SELECT
(SUM(CASE
WHEN status = 'deceased' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_old_deceased
FROM
covid_19.patient
WHERE
birth_year <= 2020 - 60;
-- the percentage of deceased cases for people aged of 60 Years old or more is 36 % .
-- What was the percentage of deceased cases who had between 60 & 65 Years old ?
SELECT
(SUM(CASE
WHEN status = 'deceased' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_old_deceased
FROM
covid_19.patient
WHERE
birth_year BETWEEN 2020 - 65 AND 2020 - 60;
-- the percentage of deceased cases for people between 60 & 65 Years old is 33 % .
-- What was the percentage of deceased cases who had more than 80 Years old ?
SELECT
(SUM(CASE
WHEN status = 'deceased' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_old_deceased
FROM
covid_19.patient
WHERE
birth_year <= 2020 - 80;
-- the percentage of deceased cases for people aged of 80 Years old or more is 62 % .
-- we can see a strong correlation between deceased rate and age.
-- what was the percentage of old people (>=60 years old) in hospital ?
SELECT
(SUM(CASE
WHEN status = 'hospital' THEN 1
END) / COUNT(status)) * 100 AS percentage_of_old_in_hospital
FROM
covid_19.patient
WHERE
birth_year <= 2020 - 60;
-- In hospital, we can observe that 53 % have more than 60 years old.
-- what was the percentage of male in hospital ?
SELECT
(SUM(CASE
WHEN sex = 'male' THEN 1
END) / COUNT(sex)) * 100 AS percentage_of_male_in_hospital
FROM
covid_19.patient
WHERE
status = 'hospital';
-- Studies annonce that man are mostly affected than woman. There is no evidence
-- of this on the Dataset. We will try an other thing later.
-- What was the percentage of people with good health in hospital ?
SELECT
(SUM(CASE
WHEN health = 'good' THEN 1
END) / COUNT(health)) * 100 AS percentage_of_good_in_hospital
FROM
covid_19.patient
WHERE
status = 'hospital';
-- The percentage of people with good health in hospital was 88 %
-- What was the percentage of people with critical health in hospital ?
SELECT
(SUM(CASE
WHEN health = 'critical' THEN 1
END) / COUNT(health)) * 100 AS percentage_of_critical_in_hospital
FROM
covid_19.patient
WHERE
status = 'hospital';
-- In this dataset, there is 9 % of critical cases in hospital for covid 19.
-- What was the percentage of male in hopital with critical health ?
SELECT
(SUM(CASE
WHEN sex = 'male' THEN 1
END) / COUNT(sex))*100 as hospital_critical_percentage_of_male
FROM
covid_19.patient
WHERE
status = 'hospital'
AND health = 'critical';
-- result is 50 %
-- There is a lack of data, so we can't confirm that males are more than females
-- in a critical situation at hospital.
-- Conclusion:
-- To conclude there are a lot of true facts resulting of this Dataset.
-- This dataset is not perfect and old (published 2 month ago),
-- that's why we can't draw precise result but we have the general trend.
-- So, in this conclusion I will try to resume the main points with certainties.
-- 1) Principle factors of pandemic trigger are : contact with patient, visit to Italy,
-- and Mulhouse gethering.
-- 2) The most affected region of France are: Grand-Est & Ile-De-France.
-- 3) The percentage of deceased cases for people aged of 60 Years old or more is higher
-- than normal.
-- 4) The percentage of deceased cases for people aged of 80 Years old is very high.
-- So, deceased risk increase with age.
-- 5) There is 50 % old (> 60 y) and 50 % young (< 60 y) in hospital.
-- 6) People are mostly with good status in hospital.