-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathminiProject2.Rmd
More file actions
263 lines (225 loc) · 10.1 KB
/
miniProject2.Rmd
File metadata and controls
263 lines (225 loc) · 10.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
---
title: "Percentage Difference in Contributions in and out of the State"
author: "Christina Lyu & Sicong Li (using Github^[Github link: https://github.com/ChristinaLyu/miniProject2.git])"
date: "start: 3/7/2018;
last modified: 3/23/2018"
output:
html_document:
code_folding: hide
---
The amount of campaign funding can indicate a candidate’s ability to find, communicate with and cultivate supporters. In the 2012 presidential election, President Obama, the Democratic Party, and supportive outside groups raised more than $1.1 billion in his victorious campaign. In recent years, political engagement across state lines has increased dramatically, and only Alaska and Hawaii impose any limits on out-of-state contributions, while there is no state limits out-of-state expenditures.
Therefore, we want to analyze the difference between inflow and outflow of the contributions in each state and the total contribtions in each state.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyverse)
load("house_elections.rda")
load("candidates.rda")
load("committees.rda")
load("contributions.rda")
library(ggplot2)
library(ggthemes)
library(maps)
```
```{r percentage, warning = FALSE, message = FALSE}
#join two data frames: committees and contributions
joined <-
committees %>%
#use left join to keep the information in contributions and put the information in committees in the left
left_join(contributions, by = "cmte_id")
#create new dataframe group_id with the comittees grouped by the name and state
group_id <-
joined %>%
#group the committees by name, and if they have the same name group them by their state
group_by(cmte_id, cmte_state) %>%
#summarise the dataframe
summarise(
#number of contributions made
amount = n(),
#get the number of contributions made to inside of the state
same = sum(ifelse(state == cmte_state, 1, 0)),
#contributions to outside of the state
diff = amount - same)
```
```{r warning = FALSE, message = FALSE}
group_summary <-
group_id %>%
#select the rows where same is not NA
filter(!is.na(same)) %>%
#put committees for each state together
group_by(cmte_state) %>%
#summarise the groups with the total contributions, and the inState, outState contributions and calculate the percentage
summarise(total = sum(amount),
inState = sum(same),
outState = sum(diff),
percentageIn = inState / total,
percentageOut = outState / total)
group_summary
group_summary %>%
#summarise the previous dataframe by the mean of the percentage in State and out of State
summarise(all = sum(total),
inS = sum(inState),
outS = sum(outState),
meanIn = inS / all,
meanOut = outS / all
)
group_summary %>%
#select the rows with the fewest or the most inState contributions
filter(percentageIn == max(percentageIn) |
percentageOut == max(percentageOut))
```
From the result of the dataset, among the states we are analyzing, the District of Columbia has the largest contribution in and out of the total. Overall, there is a mean of 15.61% contributions to committees in the same state and a mean of 84% contributions to other states. Therefore, most states contribute to committees out of state. From the dataset, we can also get the information that most of the contributions in West Virginia are made to committees within the state, and all of the contributions in Puerto Rico are made to committees outside the state.
```{r amount, warning = FALSE, message = FALSE}
percentage <-
joined %>%
#select rows in which transaction_amt is not NA
filter(!is.na(transaction_amt)) %>%
#group the dataframe by the state
group_by(cmte_state) %>%
#add a colomn called total representing the total amount of transactions
mutate(total = sum(transaction_amt))
getPercentage <- function(data) {
data %>%
#add columns of sumOut representing the total amount of transaction outside the state and its percentage in the total
mutate(sumName = sum(transaction_amt),
percentName = sumName / total) %>%
select(cmte_state, total, sumName, percentName)
}
percentageIn <-
percentage %>%
filter(cmte_state == state) %>%
getPercentage() %>%
select(cmte_state, sumName, percentName)
colnames(percentageIn) <- c("cmte_state", "sumIn", "percentIn")
#remove duplicate rows
percentageIn <-
percentageIn[!duplicated(percentageIn), ]
percentageOut <-
percentage %>%
#select the rows where the donation is made to the different states as the committee
filter(cmte_state != state) %>%
getPercentage()
colnames(percentageOut) <- c("cmte_state", "total", "sumOut", "percentOut")
#remove duplicate rows
percentageOut <-
percentageOut[!duplicated(percentageOut), ]
#join the two dataframe
percentage_all <- percentageIn %>%
full_join(percentageOut, by = "cmte_state")
```
```{r, message = FALSE, warning = FALSE, fig.width=15}
#making a bar plot for each state and the percentage of transmissions within the state
bar_plot <-
percentage_all %>%
ggplot(aes(x = factor(cmte_state))) +
geom_bar(aes(x = factor(cmte_state), y = percentIn), stat = "identity", fill = "red") +
ggtitle("Percentage of Contribution Inflow")+
#make the font and the bold of the title
theme(plot.title = element_text(family = "Century Gothic", color="Black", face="bold", size=34, hjust=0))
bar_plot
```
To analyze the data, we joined dataset *committees* and *contributions* together by committees ID. Then we calculated the sum of transaction inside and outside each state. Looking at the bar plot, we can see that except four regions, which are Arizona, Hawaii, Idaho, and the Northern Mariana Islands, most of the regions have contributions inflow smaller than 50%. This indicates that over half of the money transaction in most of the states is out-of-state transactions. Considering there are 55 regions in total in our dataset, the in-state transaction is the highest among the all, which makes sense as people cares more about their rights in their states.
There is no transaction inflow in Puerto Rico. We think this could be explained by the political voting right of Puerto Rico. Residents in Puerto Rico are not entitled to electoral votes for President. In the U.S. House of Representatives, Puerto Rico is entitled to a Resident Commissioner, a delegate who is not allowed to vote on the floor of the House.
Among all the states who do contribute to committees within the states, Iowa has the lowest rate. We found it is noteworthy as Iowa is one of the swing states.
Also, among the top ten regions with the largest percentIn, four of them (Hawaii, Maine, Nebraska and California) are leaning or solid Democratic. Three of them (Arizona, Idaho and North Dakota) are leaning or solid Republican. Two of them (Guam and the Northern Mariana Islands) are regions that can't vote (Commonwealth/Territory), and Ohio is a swing state.
```{r}
#make a list of state names
state_list <- c("Alabama",
"Alaska",
"Arizona",
"Arkansas",
"California",
"Colorado",
"Connecticut",
"Delaware",
"Florida",
"Georgia",
"Hawaii",
"Idaho",
"Illinois",
"Indiana",
"Iowa",
"Kansas",
"Kentucky",
"Louisiana",
"Maine",
"Maryland",
"Massachusetts",
"Michigan",
"Minnesota",
"Mississippi",
"Missouri",
"Montana",
"Nebraska",
"Nevada",
"New Hampshire",
"New Jersey",
"New Mexico",
"New York",
"North Carolina",
"North Dakota",
"Ohio",
"Oklahoma",
"Oregon",
"Pennsylvania",
"Rhode Island",
"South Carolina",
"South Dakota",
"Tennessee",
"Texas",
"Utah",
"Vermont",
"Virginia",
"Washington",
"West Virginia",
"Wisconsin",
"Wyoming")
#get the states map from map_data
states <- map_data("state")
```
```{r, message = FALSE, warning = FALSE, fig.width = 15, fig.height = 12}
#change all the state name to lowercase to corresponde with the state names in the dataset, because I found the list from the internet and copied it to the code
i <- 1
stateLower <- list()
for (state in state_list){
lower <- tolower(state)
stateLower[[i]] <- lower
i <- i + 1
}
#change the state name to teh abbreviation of the state as appeared in cmte_state
for(i in seq_along(stateLower)){
states[ , 5 ][ states[ , 5 ] == stateLower[i] ] <- state.abb[grep(state_list[i], state.name)]
}
#add the column to the states dataset
states$cmte_state <- states$region
map_per <- inner_join(states, percentage_all, by = "cmte_state")
#remove duplicate rows
map_per <-
map_per[!duplicated(map_per), ]
#make a function to draw maps with color density
mapGraph <- function(fillV, color1, color2, title) {
ggplot() +
#add a theme
theme(legend.position="none") +
#draw the map and fill it with color
geom_map(data=map_per, map=map_per, aes(map_id=cmte_state, x = long, y = lat, fill=fillV)) +
#make the color the same color indicated when calling the function
scale_fill_gradient(low = color1, high = color2, guide = "colourbar") +
coord_equal() +
ggtitle(title)+
#make the font and the bold of the title
theme(plot.title = element_text(family = "Century Gothic", color="Black", face="bold", size=45, hjust=0))
}
#draw the percentageIn map
gg <-
mapGraph(map_per$percentIn, "yellow", "red", "Percentage of Contribution Inflow")
gg
```
We then mapped the percentIn on a geographical map. It shows that Maine and states in the west coast overall have relatively higher percentIn, while states in the central America have relatively lower percentIn. However, among those states, North Dakota stands out from the states surrounded with the percentIn of 44%.
```{r, message = FALSE, warning = FALSE, fig.height = 12, fig.width=15}
#draw the total map
gg2 <-
mapGraph(map_per$total, "yellow", "black", "Total Contribution to Campaign Committees")
gg2
```
In this graph, we mapped the total contributions to political committees. The five states with the largest total contributions are District of Columbia, Virginia, Texas, California and New York, while the five states with the least total contributions are Northern Mariana Islands, Guam, Virgin Islands, Maine, and Alaska. We figured out that developed states are more likely to make more contributions, while regions of territories or not well-developed are less likely to contribute to US political activities.