# Softmax Classification

## What is softmax?

Softmax maps a vector of real-valued scores $(a_1, \dots, a_C)$ to a probability distribution:

$$y_{i} = \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}} \quad \forall i \in 1,\dots,C$$
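As a quick sanity check, softmax can be computed directly from this definition (a minimal sketch using NumPy; the input vector `a` is an arbitrary example):

```python
import numpy as np

def softmax(a):
    """Softmax over a 1-D array: y_i = exp(a_i) / sum_k exp(a_k)."""
    e = np.exp(a)
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])
y = softmax(a)
# The outputs are positive and sum to 1, so they form a probability distribution.
```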

### The softmax gradient

Differentiating the softmax function means computing
$$\frac{\partial{y_{i}}}{\partial{a_{j}}}$$

$$\frac{\partial y_i}{\partial a_j}=\frac{\partial \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}}}{\partial a_j}$$

Apply the quotient rule

$$f(x) = \frac{g(x)}{h(x)}, \qquad f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{[h(x)]^2}$$

with

$$g(x) = e^{a_i}, \qquad h(x) = \sum_{k=1}^{C}e^{a_k}$$

Differentiating $e^{a_i}$ (i.e. $g(x)$) with respect to $a_j$ requires two cases:

1. If $i = j$, the derivative is $e^{a_i}$.

2. If $i \neq j$, the derivative is $0$.

For $i = j$ (writing $\sum$ for $\sum_{k=1}^{C}e^{a_k}$):

$$\frac{\partial y_i}{\partial a_j}=\frac{\partial \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}}}{\partial a_j}=\frac{e^{a_i}\sum - e^{a_i}e^{a_j}}{\sum ^{2}}=\frac{e^{a_i}}{\sum } \frac{\sum -e^{a_j}}{\sum }={y_i}({1-y_j})$$

For $i \neq j$:

$$\frac{\partial y_i}{\partial a_j}=\frac{\partial \frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}}}{\partial a_j}=\frac{0-e^{a_i}e^{a_j}}{\sum ^{2}}=-\frac{e^{a_i}}{\sum }\frac{e^{a_j}}{\sum }=-{y_i}{y_j}$$
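The two cases combine into the Jacobian $J_{ij} = y_i(\delta_{ij} - y_j)$. A sketch that builds this Jacobian and checks it against a finite-difference gradient (function and variable names are illustrative):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def softmax_jacobian(a):
    # J[i, j] = y_i * (delta_ij - y_j): merges the i == j and i != j cases
    y = softmax(a)
    return np.diag(y) - np.outer(y, y)

a = np.array([0.5, -1.0, 2.0])
J = softmax_jacobian(a)

# Finite-difference check: column j is dy/da_j ~ (y(a + eps*e_j) - y(a)) / eps
eps = 1e-6
J_num = np.column_stack([
    (softmax(a + eps * np.eye(3)[j]) - softmax(a)) / eps
    for j in range(3)
])
```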

I was asked to derive the softmax gradient two years ago, in the first round of a WeChat campus-recruitment interview for a basic-research position; it is a fairly fundamental question.

### Softmax computation and numerical stability

Multiplying the numerator and denominator by a constant $E > 0$ leaves $y_i$ unchanged:

$${y_i}=\frac{e^{a_i}}{\sum_{k=1}^{C}e^{a_k}}=\frac{Ee^{a_i}}{\sum_{k=1}^{C}Ee^{a_k}}=\frac{e^{a_i+log(E)}}{\sum_{k=1}^{C}e^{a_k+log(E)}}=\frac{e^{a_i+F}}{\sum_{k=1}^{C}e^{a_k+F}}$$

Choosing

$$F = -max(a_1, a_2, \dots, a_C)$$

shifts the largest exponent to $0$, so the exponentials can no longer overflow.
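The shift by $F = -max(a)$ matters in practice, because `exp` overflows for large logits. A sketch comparing the naive and shifted versions (the input values are arbitrary, chosen large enough to overflow):

```python
import numpy as np

def softmax_naive(a):
    e = np.exp(a)            # overflows to inf once a_k exceeds ~709
    return e / e.sum()

def softmax_stable(a):
    shifted = a - a.max()    # F = -max(a): the largest exponent becomes 0
    e = np.exp(shifted)
    return e / e.sum()

a = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over='ignore', invalid='ignore'):
    y_naive = softmax_naive(a)   # inf / inf -> nan entries
y_stable = softmax_stable(a)     # identical in exact arithmetic, finite here
```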

## Linear model

$$x=\begin{pmatrix} {x_1} \\ {\vdots} \\ {x_n} \end{pmatrix}$$

$$\theta_i = (\theta_{i1},\cdots,\theta_{in})$$

$$score_i= \theta_ix=\theta_{i1}x_1 +\cdots+\theta_{in}x_n$$

$$i=1,\dots,N \quad (\text{assuming } N \text{ classes})$$
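Stacking the $N$ weight vectors $\theta_i$ into a matrix $\Theta \in \mathbb{R}^{N \times n}$, all class scores come from one matrix-vector product (a sketch; the dimensions and random values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 3, 4                          # N classes, n input features
Theta = rng.standard_normal((N, n))  # row i holds (theta_i1, ..., theta_in)
x = rng.standard_normal(n)

scores = Theta @ x                   # score_i = theta_i . x, for i = 1..N
# Class probabilities then follow by applying softmax to these scores.
```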

## Model with a kernel function

### Kernel function

$$k(x,c) = e^{-\frac{ {\left\| x-c \right\|}^2}{2} }$$

### Scoring the kernel model

$$score_i={\theta_{i}}\begin{pmatrix} k(x,{x^{(1)}}) \\ k(x,{x^{(2)}}) \\ \vdots \\ k(x,{x^{(m)}}) \end{pmatrix}$$

$$\theta_i = (\theta_{i1},\cdots,\theta_{im})$$

$$i=1,\dots,N \quad (\text{assuming } N \text{ classes})$$
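Under the Gaussian kernel above, the score of class $i$ is a weighted sum of kernel values between $x$ and the $m$ training points $x^{(1)}, \dots, x^{(m)}$ (a sketch with made-up data and dimensions):

```python
import numpy as np

def gaussian_kernel(x, c):
    # k(x, c) = exp(-||x - c||^2 / 2)
    return np.exp(-np.sum((x - c) ** 2) / 2)

rng = np.random.default_rng(1)
N, m, d = 3, 5, 2                     # classes, training points, input dim
X_train = rng.standard_normal((m, d))
Theta = rng.standard_normal((N, m))   # theta_i = (theta_i1, ..., theta_im)
x = rng.standard_normal(d)

k_vec = np.array([gaussian_kernel(x, xc) for xc in X_train])
scores = Theta @ k_vec                # score_i = sum_j theta_ij * k(x, x^(j))
```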

## Training

### Maximum likelihood estimation

For training examples $(x_1, y_1), \dots, (x_m, y_m)$, the likelihood of the parameters $\theta$ is

$$L(\theta)=p(y_1 |\ x_1,\theta)\cdot p(y_2 |\ x_2,\theta)\cdots p(y_m |\ x_m,\theta)$$

$$\hat{\theta} = \arg\max_{\theta}\ log(L(\theta))$$

$$\hat{\theta} = \arg\min_{\theta}\left(-\sum_{i=1}^mlog(\ p(y_{i} |\ x_i,\theta)\ )\right)$$

$$Loss = -\sum_{i=1}^mlog(\ p(y_{i} |\ x_i,\theta)\ )$$
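With softmax probabilities, this loss is just the negative log of the probability assigned to each true label (a sketch; the logits and labels are made up):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nll_loss(logits, labels):
    # Loss = -sum_i log p(y_i | x_i, theta)
    probs = softmax(logits)
    return -np.sum(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 3.0, 0.2]])
labels = np.array([0, 1])        # true class index for each example
loss = nll_loss(logits, labels)
```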

### KL divergence and cross-entropy

Equivalently, treat each label as a one-hot distribution $p_i$ and the softmax output as $q_i$; the loss is the total KL divergence over the $m$ training examples:

$$Loss = \sum_{i=1}^m KL(p_i||q_i)$$

$$Loss = \sum_{i=1}^m \left(H(p_i,q_i) - H(p_i)\right)$$

Each $p_i$ is one-hot (a single component equals 1 and the rest are 0), so $H(p_i)\equiv0$ and the loss reduces to the cross-entropy $H(p_i,q_i)$ alone:

$$Loss = \sum_{i=1}^m H(p_i,q_i)$$

$$Loss = -\sum_{i=1}^m p_{it}log(q_{it}) = -\sum_{i=1}^m log(q_{it})$$

where $t$ denotes the component of $p_i$ that equals 1.

This is exactly the negative log-likelihood loss from before:

$$Loss = -\sum_{i=1}^mlog(\ p(y_{i} |\ x_i,\theta)\ )$$

#### Gradient of the loss function

Over $n$ samples with one-hot targets $t_k$ and softmax outputs $y_k$, the total loss is

$$L = -\sum_{k = 1}^{n}\sum_{i = 1}^{C}t_{ki} log(y_{ki})$$

For a single sample, the per-sample loss $l_{CE}$ differentiates with respect to the logit $a_j$ as

$$\frac{\partial l_{CE}}{\partial a_j}=-\sum_{i=1}^{C}\frac{\partial t_ilog(y_i)}{\partial a_j}=-\sum_{i=1}^{C}{t_i}\frac{\partial log(y_i)}{\partial a_j}=-\sum_{i=1}^{C}{t_i}\frac{1}{y_i}\frac{\partial y_i}{\partial a_j}$$

Splitting off the $i=j$ term and substituting the two softmax derivatives:

\begin{align}
-\sum_{i=1}^{C}{t_i}\frac{1}{y_i}\frac{\partial y_i}{\partial a_j} &= -\frac{t_j}{y_j}\frac{\partial y_j}{\partial a_j}-\sum_{i\neq j}^{C}\frac{t_i}{y_i}\frac{\partial y_i}{\partial a_j} \\
&=-\frac{t_j}{y_j}{y_j}(1-{y_j})-\sum_{i\neq j}^{C}\frac{t_i}{y_i}(-{y_i}{y_j}) \\
&=-{t_j}+{t_j}{y_j}+\sum_{i\neq j}^{C}{t_i}{y_j}=-{t_j}+\sum_{i=1}^{C}{t_i}{y_j} \\
&=-{t_j}+{y_j}\sum_{i=1}^{C}{t_i}={y_j}-{t_j}
\end{align}
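The result $\frac{\partial l_{CE}}{\partial a_j} = y_j - t_j$ is what makes softmax plus cross-entropy so cheap to backpropagate. A numerical check of this identity (a sketch; the logits and one-hot target are arbitrary):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def cross_entropy(a, t):
    # l_CE = -sum_i t_i * log(y_i), with y = softmax(a)
    return -np.sum(t * np.log(softmax(a)))

a = np.array([0.3, -0.7, 1.5])
t = np.array([0.0, 0.0, 1.0])     # one-hot target

grad_analytic = softmax(a) - t    # the closed form: y - t

# Finite-difference gradient of the composed loss
eps = 1e-6
grad_num = np.array([
    (cross_entropy(a + eps * np.eye(3)[j], t) - cross_entropy(a, t)) / eps
    for j in range(3)
])
```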