---
output: github_document
editor_options:
  markdown:
    wrap: 72
---
<!-- README.md is generated from README.Rmd. Please edit that file and knit again -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  warning = FALSE,   # avoid warnings and messages in the output
  message = FALSE,
  collapse = TRUE,
  fig.width = 5,
  fig.height = 6,
  dpi = 96,
  comment = "#>",
  fig.path = "man/figures/README-"
)
par(mar=c(3,3,1,1)+.1)
```
```{r, echo=FALSE}
library(candisc)
```
<!-- badges: start -->
[](https://lifecycle.r-lib.org/articles/stages.html#stable)
[](https://cran.r-project.org/package=candisc)
[](https://friendly.r-universe.dev)
[](https://cran.r-project.org/package=candisc)
[](https://friendly.github.io/candisc/)
<!-- badges: end -->
# candisc <img src="man/figures/logo.png" align="right" height="160px" />
**Visualizing Generalized Canonical Discriminant and Canonical Correlation Analysis**
<!-- Version 1.1.0 -->
Version `r getNamespaceVersion("candisc")`
## Description
This package includes functions for computing and visualizing
generalized canonical discriminant analyses
and canonical correlation analysis
for a multivariate linear model (MLM). The goal is to provide ways of visualizing
such models in a low-dimensional space corresponding to dimensions
(linear combinations of the response variables) of maximal relationship
to the predictor variables.
Traditional canonical discriminant analysis is restricted to a one-way MANOVA
design and is equivalent to canonical correlation analysis between a set of quantitative
response variables and a set of dummy variables coded from the factor variable.
The `candisc` package generalizes this to multi-way MANOVA designs
for all terms in a multivariate linear model (i.e., an `mlm` object),
computing canonical scores and vectors for each term (giving a `"candiscList"` object).
For `mlm`s with more than a few response variables, these methods often provide a
much simpler interpretation of the nature of effects in low-D _canonical space_ than
heplots for pairs of responses or an HE plot matrix of all responses in _variable space_.
The `candisc` package originated as a low-D cousin of the
[heplots package](https://friendly.github.io/heplots/index.html), designed to
provide visualization methods in low-dimensional space.
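
As a rough sketch of the multi-way case described above, the following fits a two-way MANOVA and computes a canonical analysis for each term. It assumes the `Grass` dataset shipped with `candisc` and its response columns `N1` to `N243`; the object names are only illustrative.

```r
library(candisc)

# Two-way multivariate linear model: five responses, two factors
grass.mod <- lm(cbind(N1, N9, N27, N99, N243) ~ Block + Species, data = Grass)

# One canonical discriminant analysis per model term
grass.canL <- candiscList(grass.mod)
names(grass.canL)       # one element per term in the model

# The analysis for a single term is an ordinary "candisc" object
grass.canL$Species
```

Each element can then be passed to the same `plot()` and `heplot()` methods used for a single-term analysis.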
### Visualization methods
The graphic functions are designed to provide low-rank (1D, 2D, 3D) visualizations of
terms in an `"mlm"` via the `plot.candisc()` method, which plots the observations
in _canonical space_, together with **variable vectors** showing the relations of the
response `y` variables to the canonical variables `Can1`, `Can2`. This is the same idea
as that of a **biplot** (Gabriel, 1971).
The HE plot methods `heplot.candisc()` and `heplot3d.candisc()` use a similar framework,
but replace the observations and groupwise data ellipses in the plot with
representations of the **H** ellipsoid, representing between-group variation
in the means, and the **E** ellipsoid, reflecting the pooled within-group variation.
Analogously, a multivariate linear (regression) model with quantitative predictors can also be
represented in a reduced-rank space by means of a canonical correlation
transformation of the Y and X variables to _uncorrelated_ **canonical variates**, named with the prefixes
`Ycan` and `Xcan`. Computation for this analysis is provided by `cancor()`
and related methods. Visualization of these results in canonical space
is provided by the `plot.cancor()`, `heplot.cancor()`
and `heplot3d.cancor()` methods.
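
A minimal sketch of this workflow follows. The choice of the built-in `LifeCycleSavings` data and the split of its columns into two sets are assumptions made purely for illustration, as are the set names.

```r
library(candisc)

# Two sets of quantitative variables from a built-in dataset
X <- as.matrix(LifeCycleSavings[, c("pop15", "pop75")])    # population structure
Y <- as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")]) # savings and income

# Canonical correlation analysis; candisc::cancor() masks stats::cancor()
cc <- candisc::cancor(X, Y, set.names = c("Population", "Economy"))
cc           # canonical correlations with tests of each dimension
cc$cancor    # the canonical correlations themselves

# Scores on the first pair of canonical variates
plot(cc)
```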
### Discriminant analysis
Some of these visualization methods have now been extended to linear and quadratic discriminant analysis,
using `MASS::lda()` or `MASS::qda()`.

* `predict_discrim()` provides a simplified interface to prediction.
* `plot_discrim()`, a new plotting method, provides `ggplot2` plots of the classification regions and decision boundaries in data space and in discriminant space.
* `cor_lda()` calculates correlations between the observed variables and the discriminant dimensions.
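
For orientation, here is a small sketch of the `MASS` models these helpers build on, using only `MASS` and base R. It does not call the `candisc` helpers listed above, whose arguments are not shown in this README; see their help pages for the actual interfaces.

```r
# Linear and quadratic discriminant analysis of the iris species
iris.lda <- MASS::lda(Species ~ ., data = iris)
iris.qda <- MASS::qda(Species ~ ., data = iris)

# Standard prediction interface: predicted classes and discriminant scores
pred <- predict(iris.lda)
table(observed = iris$Species, predicted = pred$class)
head(pred$x)   # scores on the discriminant dimensions LD1, LD2
```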
### Variable ordering
The relations among response variables in linear models can also be
useful for "effect ordering" (Friendly & Kwan, 2003)
of *variables* in other multivariate data displays, such as heatmaps or
"corrgrams" (Friendly, 2002) of correlations, to make the
displayed relationships more coherent. The function `varOrder()`
implements a collection of these methods.
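
A brief sketch of how this might be used with the `iris` model. It assumes that `varOrder()` accepts an `mlm` object and that `names = TRUE` returns the variable names rather than their indices; check `?varOrder` for the available ordering methods.

```r
library(candisc)

iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species,
               data = iris)

# Order the responses by their positions in canonical space
ord <- varOrder(iris.mod, names = TRUE)
ord

# Use that ordering to arrange the variables in another display
round(cor(iris[, ord]), 2)
```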
## Installation
The current official release of the `candisc` package can be installed from CRAN. The most recent
development version can be installed from R-universe or this GitHub repo.
| | |
|---------------------|----------------------------------------------------------------------------|
| CRAN version | `install.packages("candisc")` |
| R-universe | `install.packages('candisc', repos = c('https://friendly.r-universe.dev'))` |
| GitHub version | `remotes::install_github("friendly/candisc")` |
## Vignettes
* A new vignette, `vignette("diabetes", package="candisc")`, illustrates some of the methods of this package, with a dataset
on forms of diabetes.
* Another vignette, `vignette("painters", package="candisc")`, applies these methods to a dataset on the aesthetic ratings of classical
painters.
* A more comprehensive collection of examples, illustrating multivariate regression and MANOVA methods, is contained in the vignettes for the `heplots` package. Use `browseVignettes(package = "heplots")` to see them all.
+ [HE plot MANOVA Examples](https://friendly.github.io/heplots/articles/HE_manova.html)
+ [HE plot MMRA Examples](https://friendly.github.io/heplots/articles/HE_mmra.html)
* [Datasets in the heplots package](https://friendly.github.io/heplots/articles/datasets.html)
## Datasets
In addition to the datasets in the `heplots` package, `candisc` includes a few more related to the
statistical and graphical methods implemented here.
The table below classifies these with
**method tags**. Their names are linked to their documentation, with graphical output, on the
`pkgdown` website, <http://friendly.github.io/candisc>.
```{r datasets, echo=FALSE}
library(here)
library(dplyr)
library(tinytable)
#dsets <- read.csv(here::here("extra", "datasets.csv"))
dsets <- read.csv("https://raw.githubusercontent.com/friendly/candisc/master/extra/datasets.csv")
# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/candisc/reference/"
dsets <- dsets |>
  mutate(dataset = glue::glue("[{dataset}]({refurl}{dataset}.html)"))
tinytable::tt(dsets)
```
## Examples
These examples will get you started.
### Iris data
Using the `iris` data, fit the multivariate model for `Species`. You can feed this to `candisc()` to get the
canonical discriminant equivalent.
```{r iris-candisc}
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species, data=iris)
car::Anova(iris.mod)
iris.can <- candisc(iris.mod, data=iris)
iris.can
```
This says that although there are 4 dimensions in the `iris` data, only two are needed to fully account for the differences
among the group means relative to within-group variation. Both are significant by the likelihood ratio tests, but the first accounts for 99.1% of these
differences.
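
The dimension count and the percentages can be checked directly. This sketch assumes the fitted object exposes `$eigenvalues` and `$pct` components, as documented for `"candisc"` objects.

```r
library(candisc)

iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species,
               data = iris)
iris.can <- candisc(iris.mod, data = iris)

# At most min(p, g - 1) dimensions: p = 4 responses, g = 3 species
min(4, nlevels(iris$Species) - 1)

# Percent of between-group variation carried by each canonical dimension
round(iris.can$pct, 1)

# The same shares recomputed from the eigenvalues
ev <- iris.can$eigenvalues
round(100 * ev / sum(ev), 1)
```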
Methods are available to extract coefficients (`coef()`) and scores of observations on the canonical dimensions
(`scores()`).
```{r iris-coef}
# get coefficients
coef(iris.can)
scores(iris.can) |> str()
```
Correlations between the observed variables and the canonical dimensions are given by the `$structure` component of the
object.
```{r}
iris.can$structure
```
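
As a check on what these numbers mean, the sketch below recomputes them as ordinary correlations between the responses and the canonical scores. It assumes the scores are stored in `iris.can$scores` under the column names `Can1` and `Can2`, and that the structure coefficients are total-sample correlations.

```r
library(candisc)

resp <- c("Petal.Length", "Sepal.Length", "Petal.Width", "Sepal.Width")
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species,
               data = iris)
iris.can <- candisc(iris.mod, data = iris)

# Correlate each observed response with each canonical variate
can.scores <- iris.can$scores[, c("Can1", "Can2")]
manual <- cor(iris[, resp], can.scores)
round(manual, 3)

# Compare with the stored structure coefficients
all.equal(unname(manual), unname(iris.can$structure))
```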
### Plotting methods
The basic plot for a `"candisc"` object is a scatterplot of the `scores()` for the observations on the canonical variates,
which are linear combinations of the observed variables. For ease of interpretation, it also plots vectors
for the responses in the model, showing their structure correlations with the canonical dimensions.
The following plot illustrates some of the options.
```{r iris-canplot}
#-- assign colors and symbols corresponding to species
iris.colors <- c("red", "darkgreen", "blue")
iris.pch <- 15:17
plot(
  iris.can,
  col = iris.colors,
  pch = iris.pch,
  ellipse = TRUE,
  var.lwd = 2,
  cex.lab = 1.4)
```
The `heplot()` method for a `"candisc"` object replaces the individual scores by an **H** ellipse showing variation of the
means and an **E** ellipse reflecting the pooled within-group variation.
```{r iris-heplot}
heplot(
  iris.can,
  fill = TRUE, fill.alpha = 0.1,
  prefix = "Canonical dimension",
  var.col = "black",
  var.lwd = 2,
  scale = 35,
  lab.cex = 1.25)
```
## Citation
To cite package `candisc` in publications use:
Friendly M., Fox J. (2025). _candisc: Visualizing Generalized Canonical Discriminant and Canonical Correlation
Analysis_. R package version 1.0.0, <https://CRAN.R-project.org/package=candisc>.
For the theory on which these methods are based, also cite:
Friendly, M. (2007). “HE plots for Multivariate General Linear Models.” _Journal of Computational and Graphical
Statistics_, *16*(2), 421-444. <https://doi.org/10.1198/106186007X208407>.
## References
Friendly, M. (2002). Corrgrams: Exploratory displays for correlation matrices. _The American Statistician_, **56**(4), 316–324. <https://doi.org/10.1198/000313002533>
Friendly, M., & Kwan, E. (2003). Effect Ordering for Data Displays. _Computational Statistics and Data Analysis_, **43**(4), 509–539. <https://doi.org/10.1016/S0167-9473(02)00290-6>
Gabriel, K. R. (1971). The Biplot Graphic Display of Matrices with Application to Principal Components Analysis. _Biometrika_, **58**(3), 453–467.