Clustering
- Basically, these are my notes from the NTCU open course on Machine Learning. I keep the key points here for my own reference.
EM-Algorithm
E-step
- Compute the probability that point $n$ is generated by distribution $k$. There are $N$ points and $K$ clusters in total.
- The EM algorithm is an efficient iterative procedure for computing the Maximum Likelihood (ML) estimate in the presence of missing or hidden data.
- In soft K-means, we do not know the proportions by which each instance belongs to each cluster.
- In Maximum Likelihood estimation, we wish to estimate the model parameter(s) for which the observed data are most likely.
- Each iteration of the EM algorithm consists of two processes:
- E-step: the missing data are estimated given the observed data and the current estimate of the model parameters.
- M-step: the likelihood function is maximized under the assumption that the missing data are known.
\[P^{(i)}(k|n)=\frac{P_k^{(i)}\left(\sigma_k^{(i)}\right)^{-D}\exp\left(-\frac{\Vert x_n-m_k^{(i)}\Vert^2}{2\left(\sigma_k^{(i)}\right)^2}\right)}{\sum\limits_{k'=1}^{K}P_{k'}^{(i)}\left(\sigma_{k'}^{(i)}\right)^{-D}\exp\left(-\frac{\Vert x_n-m_{k'}^{(i)}\Vert^2}{2\left(\sigma_{k'}^{(i)}\right)^2}\right)}\]where $P_k=P(X=k)$
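A minimal NumPy sketch of this E-step, assuming isotropic Gaussian clusters; the names `e_step`, `X`, `m`, `sigma`, and `pk` are my own, not from the course:

```python
import numpy as np

def e_step(X, m, sigma, pk):
    """Compute responsibilities P(k|n) for N points under K isotropic Gaussians.

    X: (N, D) data, m: (K, D) means, sigma: (K,) widths, pk: (K,) priors.
    """
    N, D = X.shape
    # Squared distances ||x_n - m_k||^2, shape (N, K)
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    # log of P_k * N(x_n; m_k, sigma_k^2 I), dropping constants shared by all k
    log_r = np.log(pk) - D * np.log(sigma) - d2 / (2 * sigma ** 2)
    log_r -= log_r.max(axis=1, keepdims=True)  # subtract max for numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)    # each row sums to 1
```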
M-step
\[\begin{aligned} m_k^{(i+1)}& =\frac{\sum\limits_{n=1}^{N}P^{(i)}(k|n)x_n}{\sum\limits_{n=1}^{N}P^{(i)}(k|n)} \\ \sigma_k^{(i+1)}&=\sqrt{\frac{1}{D}\frac{\sum\limits_{n=1}^{N}P^{(i)}(k|n)\Vert x_n-m_k^{(i+1)} \Vert^2}{\sum\limits_{n=1}^{N}P^{(i)}(k|n)}}\\ P_k^{(i+1)}&=\frac{1}{N}\sum\limits_{n=1}^{N}P^{(i)}(k|n) \end{aligned}\]
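A matching sketch of these M-step updates, continuing the E-step snippet above (same assumed shapes and `numpy as np` import; `m_step` is my own naming):

```python
def m_step(X, r):
    """Update means, widths, and priors from responsibilities r of shape (N, K)."""
    N, D = X.shape
    Nk = r.sum(axis=0)                       # effective cluster sizes, shape (K,)
    m = (r.T @ X) / Nk[:, None]              # responsibility-weighted means, (K, D)
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    sigma = np.sqrt((r * d2).sum(axis=0) / (D * Nk))  # per-cluster width
    pk = Nk / N                              # mixing proportions
    return m, sigma, pk
```

Iterating `e_step` and `m_step` until the parameters stop changing gives the full EM loop.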
EM and K-means
- Notice the similarity between EM for normal mixtures and K-means.
- The expectation step corresponds to the cluster assignment.
- The maximization step corresponds to the update of the centres.
- K-means is a simplified EM.
- K-means makes a hard decision while EM makes a soft decision when updating the parameters of the model (see the sketch below).
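One way to see the hard-versus-soft distinction, again as my own illustration built on the snippets above: a K-means-style update collapses each row of responsibilities to a one-hot argmax before the M-step, while EM keeps the full distribution.

```python
def hard_assign(r):
    """K-means-style hard decision: keep only the most likely cluster per point."""
    hard = np.zeros_like(r)
    hard[np.arange(r.shape[0]), r.argmax(axis=1)] = 1.0
    return hard

# Soft update (EM):          m, sigma, pk = m_step(X, e_step(X, m, sigma, pk))
# Hard update (K-means-like): m, sigma, pk = m_step(X, hard_assign(e_step(X, m, sigma, pk)))
```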