Learning

Probably Approximately Correct Learning (PAC)

Key assumptions:

  • Training and testing data are generated i.i.d. according to a fixed but unknown distribution $D$.
  • The "quality" of a hypothesis (classification function) $h \in H$ should take the unknown distribution $D$ into account.
  • The risk functional (the "average error" made by $h \in H$):

$$\text{err}_D(h) = D\{(x, y) \in X \times \{-1, 1\} \mid h(x) \neq y\}$$
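
As a concrete illustration, here is a minimal Python sketch that estimates $\text{err}_D(h)$ for one fixed hypothesis by sampling from a synthetic distribution; the distribution, the weight vector, and all names are illustrative assumptions, not part of the PAC model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_D(n):
    # Illustrative stand-in for the unknown distribution D: x ~ N(0, I) in R^2,
    # y = sign of the first coordinate, flipped with probability 0.1 (label noise).
    X = rng.normal(size=(n, 2))
    y = np.sign(X[:, 0])
    y[rng.random(n) < 0.1] *= -1
    return X, y

def h(X):
    # A fixed hypothesis h in H: a linear classifier x -> sign(w . x).
    return np.sign(X @ np.array([1.0, 0.5]))

# err_D(h) = D{(x, y) : h(x) != y}, estimated by the error frequency on a large sample.
X, y = sample_D(100_000)
print("estimated err_D(h) ~", np.mean(h(X) != y))
```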

Generalization Error of the PAC Model

  • Let $S = ((x_1, y_1), \ldots, (x_l, y_l))$ be a set of $l$ training examples drawn according to $D$.
  • Treat the generalization error $\text{err}_D(h_S)$ as a random variable depending on the random selection of $S$.
  • Find a bound on the tail of the distribution of the random variable $\text{err}_D(h_S)$ of the form $\varepsilon = \varepsilon(l, H, \delta)$.
  • $\varepsilon = \varepsilon(l, H, \delta)$ is a function of $l$, $H$ and $\delta$, where $1 - \delta$ is the confidence level of the error bound, chosen by the learner.

Probably Approximately Correct

$$\Pr\left(\left\{\, \text{err}_D(h_S) > \varepsilon(l, H, \delta) \,\right\}\right) < \delta$$

  • $\varepsilon(l, H, \delta)$ does not depend on the unknown distribution $D$.
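
The two lists above can be made concrete with a small simulation: draw many training sets $S$, learn $h_S$ from each, and estimate the tail probability $\Pr(\text{err}_D(h_S) > \varepsilon)$ directly. Everything in this sketch — the finite threshold class, the noisy synthetic distribution, the choice of ERM as the learner, the sample sizes — is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite hypothesis class H: threshold classifiers h_t(x) = sign(x - t) on the real line.
thresholds = np.linspace(-2, 2, 41)

def sample_D(n):
    # Synthetic stand-in for the unknown D: true threshold at 0, 10% label noise.
    x = rng.normal(size=n)
    y = np.sign(x)
    y[rng.random(n) < 0.1] *= -1
    return x, y

def erm(x, y):
    # h_S: the hypothesis in H with the smallest training error on S.
    errs = [np.mean(np.sign(x - t) != y) for t in thresholds]
    return thresholds[int(np.argmin(errs))]

def true_err(t, n_test=50_000):
    x, y = sample_D(n_test)
    return np.mean(np.sign(x - t) != y)

# err_D(h_S) is a random variable (it depends on the draw of S); estimate its tail probability.
l, eps, trials = 50, 0.2, 500
tail = np.mean([true_err(erm(*sample_D(l))) > eps for _ in range(trials)])
print(f"estimated Pr(err_D(h_S) > {eps}) ~ {tail}")
```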

Find the Minimum Expected Risk

  • The expected misclassification error made by $h \in H$ is

$$R[h] = \int_{X \times \{-1, 1\}} \frac{1}{2} \, |h(x) - y| \; dp(x, y)$$
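
Since $y$ and $h(x)$ both take values in $\{-1, 1\}$, the integrand $\frac{1}{2}|h(x) - y|$ is exactly the 0–1 loss (it equals $1$ when $h(x) \neq y$ and $0$ otherwise), so the expected risk is just the misclassification probability under $p$:

$$R[h] = \Pr_{(x, y) \sim p}\left( h(x) \neq y \right),$$

which coincides with $\text{err}_D(h)$ above when $p$ is the unknown distribution $D$.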

The Ideal Hypothesis

  • The ideal hypothesis $h_{\text{opt}}$ should have the smallest expected risk:

$$R[h_{\text{opt}}] \leq R[h], \quad \forall h \in H$$

Empirical Risk Minimization (ERM)

  • Replace the expected risk over $p(x, y)$ by an average over the training examples, i.e. the empirical risk $R_{\text{emp}}[h] = \frac{1}{l} \sum_{i=1}^{l} \frac{1}{2} |h(x_i) - y_i|$.
  • The ERM hypothesis $h_{\text{emp}}$ minimizes the empirical risk:

$$R_{\text{emp}}[h_{\text{emp}}] \leq R_{\text{emp}}[h], \quad \forall h \in H$$

  • Focusing only on the empirical risk will cause overfitting (see the sketch below).
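
A minimal sketch of the overfitting point, under the same toy distribution as above (the distribution and both hypotheses are illustrative assumptions): an ERM threshold from a small class $H$ versus a 1-nearest-neighbour "memorizer" that drives the empirical risk to zero but generalizes worse.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_D(n):
    # Synthetic stand-in for the unknown D: noisy threshold at 0, 10% label noise.
    x = rng.normal(size=n)
    y = np.sign(x)
    y[rng.random(n) < 0.1] *= -1
    return x, y

x_tr, y_tr = sample_D(30)          # small training sample S
x_te, y_te = sample_D(50_000)      # large sample to approximate the true risk R[h]

def emp_risk(pred, x, y):
    # Average 0-1 loss of the predictor on the given examples.
    return np.mean(pred(x) != y)

# ERM over a small class H = {sign(x - t)}: pick the threshold with minimal training error.
thresholds = np.linspace(-2, 2, 41)
t_erm = thresholds[int(np.argmin([emp_risk(lambda x, t=t: np.sign(x - t), x_tr, y_tr)
                                  for t in thresholds]))]

def nn_memorizer(x):
    # 1-nearest-neighbour "memorizer" (a very rich class): zero empirical risk on S.
    idx = np.argmin(np.abs(x[:, None] - x_tr[None, :]), axis=1)
    return y_tr[idx]

for name, h in [("ERM threshold", lambda x: np.sign(x - t_erm)), ("memorizer", nn_memorizer)]:
    print(f"{name:>14}: R_emp = {emp_risk(h, x_tr, y_tr):.2f}, R (approx.) = {emp_risk(h, x_te, y_te):.3f}")
```

On a typical run the memorizer attains zero training error but a noticeably higher approximate true risk than the simple threshold, which is exactly the overfitting effect the bullet describes.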

VC Confidence

  • The following inequality holds with probability $1 - \delta$, where $v$ denotes the VC dimension of $H$:

$$R[h] \leq R_{\text{emp}}[h] + \sqrt{\frac{v \left( \log(2l/v) + 1 \right) - \log(\delta/4)}{l}}$$

Let

$$E(v, l, \delta) = \sqrt{\frac{v \left( \log(2l/v) + 1 \right) - \log(\delta/4)}{l}}$$
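
A direct transcription of $E(v, l, \delta)$ as a small Python helper (a sketch; the values in the example call are arbitrary):

```python
import numpy as np

def vc_confidence(v, l, delta):
    # E(v, l, delta) = sqrt(( v * (log(2l/v) + 1) - log(delta/4) ) / l)
    return np.sqrt((v * (np.log(2 * l / v) + 1) - np.log(delta / 4)) / l)

# Example: VC dimension 10, 1000 training examples, 95% confidence (delta = 0.05).
print(vc_confidence(v=10, l=1_000, delta=0.05))
```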

Remarks:

  • As $l \to \infty$, $E(v, l, \delta) \to 0$.
  • The larger the hypothesis space (the larger the VC dimension $v$), the greater the chance of overfitting, since there are many more hypotheses to select from $H$; correspondingly, $E(v, l, \delta)$ increases with $v$.
  • $E(v, l, \delta)$ increases as $\delta$ decreases, i.e. as the required confidence level $1 - \delta$ becomes larger.
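
The three remarks can be checked numerically; this self-contained snippet restates $E$ and the particular values of $v$, $l$ and $\delta$ are arbitrary.

```python
import numpy as np

E = lambda v, l, delta: np.sqrt((v * (np.log(2 * l / v) + 1) - np.log(delta / 4)) / l)

print([round(E(10, l, 0.05), 3) for l in (100, 1_000, 10_000, 100_000)])  # shrinks toward 0 as l grows
print([round(E(v, 1_000, 0.05), 3) for v in (5, 20, 80)])                 # grows with the VC dimension v
print([round(E(10, 1_000, d), 3) for d in (0.1, 0.01, 0.001)])            # grows as delta decreases
```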

Structural Risk Minimization (SRM)

  • The structural risk is less than or equal to the empirical risk (training error) plus the VC confidence (error bound): $R[h] \leq R_{\text{emp}}[h] + E(v, l, \delta)$.
  • For linear classifiers, $\|w\|_2^2 \propto$ VC bound.
  • $\min \text{VC bound} \Rightarrow \min \|w\|_2^2 \Rightarrow \text{max margin}$ (see the sketch below).
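
To see the $\min \|w\|_2^2 \Rightarrow$ max-margin connection in code, here is a sketch using scikit-learn's linear SVM; the toy data and the large-$C$ approximation of a hard margin are assumptions of the sketch, not part of the notes above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Toy linearly separable data in R^2.
X = np.vstack([rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2)),
               rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

# A hard-margin linear SVM (approximated with a large C) minimises ||w||_2^2 subject to
# y_i (w . x_i + b) >= 1; the geometric margin it attains is 2 / ||w||_2.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_.ravel()
print("||w||_2^2 =", w @ w, " margin 2/||w|| =", 2 / np.linalg.norm(w))
```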

Shattered and VC dimension

Shattered

  • A given training set $S$ is shattered by $H$ if and only if, for every labeling of $S$, there exists some $h \in H$ consistent with that labeling (see the brute-force sketch below).
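
A brute-force sketch of this definition for $H = $ oriented hyperplanes: enumerate every labeling and test whether some linear classifier realizes it. Using a large-$C$ linear SVM as the separability test is an assumption of the sketch, not part of the definition.

```python
import numpy as np
from itertools import product
from sklearn.svm import SVC

def is_shattered(points):
    # S is shattered by oriented hyperplanes iff every one of the 2^m labelings is
    # realised by some linear classifier; separability of a labeling is tested by
    # whether a (near) hard-margin linear SVM fits it exactly.
    X = np.asarray(points, dtype=float)
    for labels in product([-1, 1], repeat=len(X)):
        y = np.array(labels)
        if len(set(labels)) == 1:      # all-positive / all-negative labelings are trivially realisable
            continue
        clf = SVC(kernel="linear", C=1e6).fit(X, y)
        if np.any(clf.predict(X) != y):
            return False
    return True

print(is_shattered([[0, 0], [1, 0], [0, 1]]))            # True: a triangle in R^2 is shattered
print(is_shattered([[0, 0], [1, 0], [0, 1], [1, 1]]))    # False: the XOR labeling is not separable
```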

Theorem 1:

Consider a set of $m$ points in $\mathbb{R}^n$ and choose any one of them as the origin. Then the $m$ points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining points are linearly independent.
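
Theorem 1 gives a much cheaper test than the brute-force check above. A sketch, picking the first point as the origin (which the theorem allows):

```python
import numpy as np

def shatterable_by_hyperplanes(points):
    # Theorem 1: choose one of the m points as origin; the set can be shattered by
    # oriented hyperplanes iff the position vectors of the remaining m - 1 points
    # (relative to that origin) are linearly independent.
    pts = np.asarray(points, dtype=float)
    rest = pts[1:] - pts[0]
    return np.linalg.matrix_rank(rest) == len(pts) - 1

print(shatterable_by_hyperplanes([[0, 0], [1, 0], [0, 1]]))           # True
print(shatterable_by_hyperplanes([[0, 0], [1, 0], [2, 0]]))           # False: collinear
print(shatterable_by_hyperplanes([[0, 0], [1, 0], [0, 1], [1, 1]]))   # False: rank can be at most 2 in R^2
```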

VC-dimension

The Vapnik–Chervonenkis dimension $VC(H)$ of a hypothesis space $H$ defined over the input space $X$ is the size of the largest finite subset of $X$ that can be shattered by $H$ (if such a largest subset exists).

Example:

$$H = \{\text{all oriented hyperplanes in } \mathbb{R}^n\} \implies VC(H) = n + 1$$
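
A quick numerical sanity check of $VC(H) = n + 1$ using the Theorem 1 criterion; the choice $n = 3$ and the particular points are arbitrary.

```python
import numpy as np

n = 3
# The n + 1 vertices of a simplex in R^n satisfy the Theorem 1 criterion, hence are shattered:
simplex = np.vstack([np.zeros(n), np.eye(n)])
print(np.linalg.matrix_rank(simplex[1:] - simplex[0]) == n)        # True -> these n + 1 points are shattered
# No set of n + 2 points can satisfy it, since at most n difference vectors
# in R^n can be linearly independent:
extra = np.vstack([simplex, np.ones(n)])
print(np.linalg.matrix_rank(extra[1:] - extra[0]) == n + 1)        # False -> n + 2 points are never shattered
```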