# Wide and Deep Learning for Recommender Systems

A deep neural network recommender system used for Google Play. The core idea is to combine Memorization (the Wide model) and Generalization (the Deep model) so that each complements the other's weaknesses. See the paper *Wide & Deep Learning for Recommender Systems*.

# Overview of System

• User features
e.g., country, language, demographics
• Contextual features
e.g., device, hour of the day, day of the week
• Impression features
e.g., app age, historical statistics of an app

WDL is used in the ranking stage of the recommendation system.

# Wide and Deep Learning

## Wide Model

Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.

The linear model is the familiar one:

$$y = w^Tx+b$$

$x = [x_1, x_2, \dots, x_d]$ is a vector of d features, $w = [w_1, w_2, \dots, w_d]$ are the model parameters, and b is the bias. The features include both the raw input features and cross-product transformation features; the cross-product transformation is defined as:

$$\phi_k(x)=\prod^d_{i=1}x_i^{c_{ki}}$$

$c_{ki}$ is a boolean variable: it is 1 if the i-th feature is part of the k-th transformation $\phi_k$, and 0 otherwise. Its effect:

This captures the interactions between the binary features, and adds nonlinearity to the generalized linear model.
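As a minimal sketch of the cross-product transformation above (the feature vectors and selection matrix here are hypothetical, not from the paper): for binary features, $x_i^{c_{ki}}$ is 1 whenever $c_{ki}=0$, so each $\phi_k$ reduces to an AND over the selected features.

```python
import numpy as np

def cross_product_transform(x, c):
    """phi_k(x) = prod_i x_i^{c_ki}.

    x: binary feature vector of length d.
    c: boolean matrix of shape (k, d); c[k, i] = 1 if feature i
       belongs to the k-th transformation phi_k.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    # x_i^0 == 1, so features with c_ki = 0 drop out of the product
    return np.prod(np.power(x, c), axis=1)

# hypothetical one-hot features, e.g. AND(gender=female, language=en)
x = [1, 1, 0]
c = [[1, 1, 0],   # phi_0 fires only if features 0 AND 1 fire
     [1, 0, 1]]   # phi_1 fires only if features 0 AND 2 fire
print(cross_product_transform(x, c))  # → [1. 0.]
```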

## Deep Model

Generalization is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past.

$$a^{(l+1)}=f(W^{(l)}a^{(l)}+b^{(l)})$$

f is the activation function (usually ReLU), and l is the layer index.
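The layer recurrence above can be sketched as a plain forward pass; the layer sizes here are made up for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(a0, params):
    """a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)}) with f = ReLU."""
    a = a0
    for W, b in params:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# hypothetical tower: 16 -> 8 -> 4 hidden units
params = [(rng.standard_normal((8, 16)), np.zeros(8)),
          (rng.standard_normal((4, 8)), np.zeros(4))]
a = forward(rng.standard_normal(16), params)
print(a.shape)  # (4,)
```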

## Joint Training

Joint Training vs Ensemble

• Joint Training trains the wide and deep models simultaneously; the optimized parameters include each model's own parameters as well as the weights of their sum
• In an Ensemble, the models are trained separately and independently; they are only combined at prediction time
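A minimal sketch of how joint training combines the two towers (the weights and feature values below are hypothetical): both towers feed a single logistic output, so one backward pass through the shared loss updates the wide weights and the deep weights together, unlike an ensemble where each model is fit on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_predict(x_wide, a_deep, w_wide, w_deep, b):
    # Single logistic output fed by both towers: gradients of the one
    # loss flow into w_wide, w_deep, and the deep tower upstream.
    return sigmoid(w_wide @ x_wide + w_deep @ a_deep + b)

x_wide = np.array([1.0, 0.0, 1.0])   # raw + cross-product features
a_deep = np.array([0.3, -0.2])       # final hidden activations of the deep tower
p = wide_deep_predict(x_wide, a_deep,
                      w_wide=np.array([0.5, -0.1, 0.2]),
                      w_deep=np.array([1.0, 0.5]),
                      b=0.0)
print(p)  # a probability in (0, 1)
```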

# System Implementation

The pipeline is shown in the figure below.

## Data Generation

Label: the criterion is app acquisition — 1 if the user installed the app, 0 otherwise.
Vocabularies: map categorical features to integer IDs; continuous real-valued features are first normalized to [0, 1] via the cumulative distribution function (CDF), then discretized into buckets.

Continuous real-valued features are normalized to [0, 1] by mapping a feature value x to its cumulative distribution function P(X ≤ x), divided into $n_q$ quantiles. The normalized value is $\frac{i-1}{n_q-1}$ for values in the i-th quantile.
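A sketch of this quantile normalization under my reading of the formula (the quantile edges come from the training distribution; the bucket count and data here are made up):

```python
import numpy as np

def quantile_normalize(train_values, x, n_q=10):
    """Map x to (i-1)/(n_q-1), where i is the index (1..n_q) of the
    quantile bucket containing x, approximating the CDF P(X <= x)."""
    # bucket boundaries estimated from the training distribution
    edges = np.quantile(train_values, np.linspace(0.0, 1.0, n_q + 1))
    # locate each value's bucket; clip keeps out-of-range values in 1..n_q
    i = np.clip(np.searchsorted(edges, x, side="right"), 1, n_q)
    return (i - 1) / (n_q - 1)

train = np.arange(1000)  # hypothetical training values
out = quantile_normalize(train, [0, 500, 999], n_q=10)
print(out)  # smallest value maps to 0.0, largest to 1.0
```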