
Commit 7a7bc00

Merge branch 'main' of https://github.com/MoenMi/notes
2 parents 8f375ef + d54f633 commit 7a7bc00

3 files changed

Lines changed: 112 additions & 2 deletions


_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ parts:
  - file: classes/cs483/2-probability-univariate-models
  - file: classes/cs483/3-probability-multivariate-models
  - file: classes/cs483/4-statistics
+ - file: classes/cs483/9-linear-discriminant-analysis
  - file: classes/cs483/13-neural-networks-tabular
  - file: classes/cs483/16-exemplar-based-methods
  - file: classes/cs491/overview

classes/cs483/1-intro.md

Lines changed: 40 additions & 2 deletions
@@ -72,6 +72,36 @@ This can be minimized to compute the **maximum likelihood estimate (MLE)**.

### 1.2.2 - Regression

If we want to predict a real-valued quantity $y \in \mathbb{R}$ instead of a class label $y \in \{ 1, \dots, C \}$, this is known as **regression**.

Regression is very similar to classification, but we need to use a different loss function. The most common choice is quadratic loss:

$$ \ell_2(y, \hat{y}) = (y - \hat{y})^2 $$

The empirical risk when using quadratic loss is equal to the **mean squared error (MSE)**:

$$ \text{MSE}(\boldsymbol{\theta}) = \frac{1}{N} \sum^N_{n=1} (y_n - f(\boldsymbol{x}_n; \boldsymbol{\theta}))^2 $$

In regression problems, we typically assume that the output distribution is normal.
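As a concrete check of this formula, the MSE can be computed directly; a minimal sketch, where the data and parameter values are invented for illustration:

```python
import numpy as np

# Hypothetical data and a hand-picked linear model f(x; theta) = theta0 + theta1 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
theta = (0.0, 2.0)  # (theta0, theta1), chosen by hand for the example

def f(x, theta):
    return theta[0] + theta[1] * x

# Empirical risk under quadratic loss = mean squared error
mse = np.mean((y - f(x, theta)) ** 2)
```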

#### Linear Regression

A **simple linear regression (SLR)** model takes the following form:

$$ f(x; \boldsymbol{\theta}) = \beta_0 + \beta_1 x $$

We can adjust $\beta_0$ and $\beta_1$ to find the values that minimize the sum of squared errors.

If we have multiple input features, we can use a **multiple linear regression (MLR)** model:

$$ f(\boldsymbol{x}; \boldsymbol{\theta}) = \beta_0 + \beta_1 x_1 + \dots + \beta_D x_D $$
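The least-squares estimates have a simple closed form in the SLR case, and the MLR case can be solved via least squares on a design matrix; a minimal NumPy sketch, with data invented for the example:

```python
import numpy as np

# Hypothetical data, roughly following y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Closed-form least-squares estimates for simple linear regression:
# beta1 = cov(x, y) / var(x), beta0 = mean(y) - beta1 * mean(x)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# The MLR case: stack a column of ones (for beta0) into a design matrix
# and solve the least-squares problem directly
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta == [beta0, beta1]
```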

#### Polynomial Regression

#### Deep Neural Networks

### 1.2.3 - Overfitting and generalization
@@ -158,11 +188,19 @@ $$ \text{TFIDF}_{ij} = \log(\text{TF}_{ij} + 1) \times \text{IDF}_i $$

#### Word embeddings

**Word embeddings** map each sparse one-hot vector, $\boldsymbol{x}_{nt} \in \{0, 1\}^V$, to a lower-dimensional dense vector, $\boldsymbol{e}_{nt} \in \mathbb{R}^K$, using $\boldsymbol{e}_{nt} = \textbf{E} \boldsymbol{x}_{nt}$, where $\textbf{E} \in \mathbb{R}^{K \times V}$ is learned such that semantically similar words are placed close together. Once we have an embedding matrix, we can represent a variable-length text document as a **bag of word embeddings**. We can then convert this to a fixed-length vector by summing the embeddings:

$$ \bar{\boldsymbol{e}}_n = \sum^T_{t=1} \boldsymbol{e}_{nt} = \textbf{E} \tilde{\boldsymbol{x}}_n $$

where $\tilde{\boldsymbol{x}}_n$ is the bag of words representation. We can use this inside a logistic regression classifier. The overall model has the form

$$ p(y = c | \boldsymbol{x}_n, \boldsymbol{\theta}) = \text{softmax}_c(\textbf{WE} \tilde{\boldsymbol{x}}_n) $$
We often use a **pre-trained word embedding** matrix $\textbf{E}$, in which case the model is linear in $\textbf{W}$, which simplifies parameter estimation.
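A minimal sketch of this model; the dimensions are toy values and the matrices are random stand-ins, not a trained classifier or a real pre-trained embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, C = 10, 4, 3           # vocabulary size, embedding dim, number of classes (toy values)
E = rng.normal(size=(K, V))  # stand-in for a (pre-trained) embedding matrix
W = rng.normal(size=(C, K))  # classifier weights, learned in practice

# Bag-of-words count vector for a toy "document" containing word 2 twice and word 7 once
x_tilde = np.zeros(V)
x_tilde[2] += 2
x_tilde[7] += 1

# p(y = c | x) = softmax_c(W E x_tilde)
logits = W @ E @ x_tilde          # shape (C,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()              # softmax over classes
```

Note that $\textbf{E} \tilde{\boldsymbol{x}}_n$ is just the sum of the embeddings of the words present, weighted by their counts.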
#### Dealing with novel words

If the model encounters a novel word at test time, it is known as **out of vocabulary (OOV)**. A standard heuristic is to replace all novel words with the special symbol **UNK**. This loses information, since we may be able to deduce meaning from suffixes and root words. To address this, we can break words down into their substructure and use **subword units** or **wordpieces**. These are often created using a method called **byte-pair encoding (BPE)**, which is a form of data compression that creates new symbols to represent common substrings.
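A minimal sketch of the byte-pair encoding idea: repeatedly merge the most frequent adjacent pair of symbols into a new symbol. The toy corpus is invented for the example; real BPE implementations also weight pairs by word frequency and handle word boundaries:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn merge rules from a toy corpus of words (each split into characters)."""
    words = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Replace every occurrence of the pair with the merged symbol
        merged = best[0] + best[1]
        for i, w in enumerate(words):
            out, j = [], 0
            while j < len(w):
                if j < len(w) - 1 and (w[j], w[j + 1]) == best:
                    out.append(merged)
                    j += 2
                else:
                    out.append(w[j])
                    j += 1
            words[i] = out
    return merges, words

# "low" appears as a prefix in all three words, so "l"+"o" then "lo"+"w" get merged,
# and "lowest" is segmented into the subword units ["low", "e", "s", "t"]
merges, segmented = bpe_merges(["lower", "lowest", "low"], num_merges=2)
```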

### 1.5.5 - Handling missing data
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# 9 - Linear Discriminant Analysis

## 9.1 - Introduction

In this chapter, we consider models of the following form:

$$ p(y = c | \boldsymbol{x}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{x} | y = c, \boldsymbol{\theta})p(y = c | \boldsymbol{\theta})}{\sum_{c'} p(\boldsymbol{x} | y = c', \boldsymbol{\theta}) p(y = c' | \boldsymbol{\theta})} $$

The term $p(y = c | \boldsymbol{\theta})$ is the prior over class labels, and the term $p(\boldsymbol{x} | y = c, \boldsymbol{\theta})$ is called the **class conditional density** for class $c$.
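As a numeric sketch of this formula, with the prior and class-conditional density values invented for illustration:

```python
import numpy as np

# Hypothetical two-class problem: priors p(y = c) and class-conditional
# densities p(x | y = c) evaluated at one particular observation x
prior = np.array([0.6, 0.4])
likelihood = np.array([0.2, 0.5])  # p(x | y = 0), p(x | y = 1) at this x

# Posterior p(y = c | x): multiply prior by likelihood, normalize over classes
joint = prior * likelihood
posterior = joint / joint.sum()
```

Even though class 0 has the larger prior, the larger likelihood under class 1 shifts the posterior toward class 1.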

## 9.2 - Gaussian discriminant analysis

### 9.2.1 - Quadratic decision boundaries

### 9.2.2 - Linear decision boundaries

### 9.2.3 - The connection between LDA and logistic regression

### 9.2.4 - Model fitting

### 9.2.5 - Nearest centroid classifier

### 9.2.6 - Fisher’s linear discriminant analysis *

## 9.3 - Naive Bayes classifiers

### 9.3.1 - Example models

### 9.3.2 - Model fitting

### 9.3.3 - Bayesian naive Bayes

### 9.3.4 - The connection between naive Bayes and logistic regression

## 9.4 - Generative vs discriminative classifiers

### 9.4.1 - Advantages of discriminative classifiers

### 9.4.2 - Advantages of generative classifiers

### 9.4.3 - Handling missing features

## 9.5 - Exercises