Read a paper: Implicit Look-Alike Modeling in Display Ads
Zhang, W., Chen, L., Wang, J.: Implicit Look-alike Modelling in Display Ads: Transfer Collaborative Filtering to CTR Estimation. arXiv:1601.02377
Background
- The look-alike modeling consists of two stages:
- profiling
- explicitly build user profiles and detect their interest segments from their online behaviour
- targeting
- detect the users with similar interests to the known customers
- However, this profiling-and-targeting mechanism is not optimal:
- potentially correlated segments are regarded as separate
- segment building is performed independently of its later use for ad response prediction
- difficult to update over time
Contribution
- They propose a framework to implicitly and jointly learn the users’ profiles from both web browsing behaviours and ad response behaviours
- directly maps each user, webpage and ad into a latent space
- jointly learns the users’ profiles on general browsing and ad response behaviour
- improves the prediction of the users’ ad response with knowledge transferred from the user browsing behaviours
Implicit Look-Alike Modeling
- We commonly have two types of observations about underlying user behaviours
- web browsing behaviours prediction (collaborative filtering task)
- dataset $D^{c}$, data instance as $(\mathbf{x}^{c}, y^{c})$
- ad response behaviours prediction (CTR task)
- dataset $D^{r}$, data instance as $(\mathbf{x}^{r}, y^{r})$
- the two tasks are trained by maximising a joint conditional likelihood over $D^{c}$ and $D^{r}$ (a sketch follows at the end of this section)
- There are two prediction tasks:
- Web Browsing Prediction (CF task)
- data $\mathbf{x}^{c} \equiv (\mathbf{x}^{u},\mathbf{x}^{p})$: each user’s online browsing behaviour
- $\mathbf{x}^{u} \in \mathbb{R}^{I^{c}}$ - the set of features for a user
- $\mathbf{x}^{p} \in \mathbb{R}^{J^{c}}$ - the set of features for a publisher
- each $\mathbf{x}_{i}^{u},\mathbf{x}_{j}^{p}$ is associated with a $K$-dimensional latent vector $\mathbf{v}_{i}^{c},\mathbf{v}_{j}^{c}$
- thus the latent matrix $\mathbf{V}^{c} \in \mathbb{R}^{(I^{c}+J^{c})\times K}$
- the target is to predict $y^{c}$
- whether the user is interested in visiting any given new publisher
- Ad Response Prediction (CTR task)
- data $\mathbf{x}^{r} \equiv (\mathbf{x}^{u},\mathbf{x}^{p},\mathbf{x}^{a})$: each user’s online ad feedback behaviour
- $\mathbf{x}^{u} \in \mathbb{R}^{I^{r}}$ - the set of features for a user
- $\mathbf{x}^{p} \in \mathbb{R}^{J^{r}}$ - the set of features for a publisher
- $\mathbf{x}^{a} \in \mathbb{R}^{L^{r}}$ - the set of features for an advertiser
- each $\mathbf{x}_{i}^{u},\mathbf{x}_{j}^{p},\mathbf{x}_{l}^{a}$ is associated with a $K$-dimensional latent vector $\mathbf{v}_{i}^{r},\mathbf{v}_{j}^{r},\mathbf{v}_{l}^{r}$
- thus the latent matrix $\mathbf{V}^{r} \in \mathbb{R}^{(I^{r}+J^{r}+L^{r})\times K}$
- the target is to predict $y^{r}$
- how likely it is that the user will click a specific ad impression
- Dual-Task Bridge
- the weights of the user features and publisher features in the CTR task are assumed to be generated from those in the CF task (as a prior): $\mathbf{w}^{r} \sim N(\mathbf{w}^{c}, \sigma^{2}I)$
- the users’ interest towards webpages is relatively general, and an ad can be regarded as a special kind of webpage content
- the user interests learned for ad response can therefore be regarded as a modification or derivative of the general interests learned from browsing behaviours
- they add a hyperparameter $\alpha$ to balance the relative importance of the two tasks
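Putting the pieces above together: a minimal sketch of the joint objective, assuming the $\alpha$-weighting applies to the two conditional log-likelihoods and the bridge prior enters as a quadratic penalty tying only the shared user/publisher weights (the paper's exact regularisation terms are not reproduced here):

$$
\max_{\mathbf{w}^{c},\mathbf{V}^{c},\mathbf{w}^{r},\mathbf{V}^{r}} \;
\alpha \sum_{(\mathbf{x}^{c},y^{c}) \in D^{c}} \log P(y^{c} \mid \mathbf{x}^{c})
\;+\; (1-\alpha) \sum_{(\mathbf{x}^{r},y^{r}) \in D^{r}} \log P(y^{r} \mid \mathbf{x}^{r})
\;-\; \frac{1}{2\sigma^{2}} \lVert \mathbf{w}^{r} - \mathbf{w}^{c} \rVert^{2}
$$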
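And a short NumPy sketch of how such a joint model could be scored, assuming an FM-style prediction over one-hot features (my assumption; the paper's exact model form is not restated in these notes). `fm_predict`, `joint_loss`, and the index-for-index alignment of the shared user/publisher weights are illustrative choices, not the authors' code:

```python
# Minimal sketch: FM-style predictions for the CF and CTR tasks plus the
# alpha-weighted joint loss with the w^r ~ N(w^c, sigma^2 I) bridge penalty.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_predict(active_idx, w, V, b=0.0):
    """Score one sparse instance given the indices of its active one-hot features
    (e.g. (user, publisher) for the CF task, (user, publisher, ad) for CTR)."""
    idx = list(active_idx)
    linear = w[idx].sum()                                          # first-order weights
    vs = V[idx]                                                    # (n_active, K) latent vectors
    pairwise = 0.5 * ((vs.sum(0) ** 2) - (vs ** 2).sum(0)).sum()   # FM pairwise interactions
    return sigmoid(b + linear + pairwise)

def logloss(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def joint_loss(cf_batch, ctr_batch, w_c, V_c, w_r, V_r, alpha, sigma2):
    """alpha-weighted negative log-likelihood of both tasks plus the bridge term
    ||w_r - w_c||^2 / (2 sigma^2); only the shared user/publisher weights are tied,
    assumed here to occupy the leading indices of both weight vectors."""
    nll_c = np.mean([logloss(fm_predict(x, w_c, V_c), y) for x, y in cf_batch])
    nll_r = np.mean([logloss(fm_predict(x, w_r, V_r), y) for x, y in ctr_batch])
    shared = min(len(w_c), len(w_r))
    bridge = np.sum((w_r[:shared] - w_c[:shared]) ** 2) / (2.0 * sigma2)
    return alpha * nll_c + (1 - alpha) * nll_r + bridge
```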
Evaluation
- Dataset
- $\mathbf{x}^{u}$: user_cookie, hour, browser, os, user_agent, screen_size
- $\mathbf{x}^{p}$: domain, url, exchange, ad_slot, slot_size
- $\mathbf{x}^{a}$: advertiser, campaign
- Experiments
- 1st
- only focus on user_cookie and domain as a baseline
- to check whether the users’ webpage browsing behaviours lead to better ad click modeling
- 2nd
- append various features to the first setting
- to observe the performance change
- to check which features lead to better transfer learning
- a larger $\alpha$ means more weight is allocated to the CF task (a toy $\alpha$ sweep is sketched at the end of these notes)
- if a large $\alpha$ leads to the optimal estimation performance, the feature contributes to transfer learning
- if a low $\alpha$ leads to the optimal estimation performance, the feature has no effect on transfer learning
- Compared Models
- Base
- only CTR task
- DisJoint
- train the CF task model first, then train the CTR task model with the trained CF parameters kept fixed
- DisJointLR
- proposed in another paper
- Joint
- their proposed model
- Result
- 1st
- Joint consistently outperforms Base and DisJoint on both AUC and RMSE
- demonstrates the effectiveness of transfer learning
- Joint still outperforms Base and DisJoint at $\alpha=0$ (where the CF-side model $\mathbf{w}^{c}$ does not learn)
- attributed to the different prior placed on $\mathbf{w}^{r}$ and $\mathbf{V}^{r}$
- both the AUC and RMSE of Joint are 0.5 at $\alpha=1$ (the CTR task receives zero weight, so its prediction is effectively untrained)
- 2nd
- the user browsing hour and the ad slot position are the most valuable features for transfer learning
- the user screen size does not bring any transfer value
- the basic user_cookie and domain features provide an overall positive transfer-learning value
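A rough illustration of the $\alpha$ sweep used in the second experiment: retrain the joint model over a grid of $\alpha$ values for a given feature set and see where the CTR-side AUC/RMSE are best. `train_joint_model` and `predict_ctr` are hypothetical placeholders (not from the paper); only the scikit-learn metric calls are real:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error

def sweep_alpha(cf_data, ctr_train, ctr_test, alphas=np.linspace(0.0, 1.0, 11)):
    """Retrain the joint model for each alpha and report CTR-side AUC/RMSE.

    A peak at a large alpha suggests the appended feature carries transfer value
    from the CF (browsing) task; a peak near alpha = 0 suggests it does not.
    """
    results = []
    for alpha in alphas:
        model = train_joint_model(cf_data, ctr_train, alpha=alpha)   # hypothetical trainer
        y_true = [y for _, y in ctr_test]
        y_pred = [predict_ctr(model, x) for x, _ in ctr_test]        # hypothetical predictor
        auc = roc_auc_score(y_true, y_pred)
        rmse = mean_squared_error(y_true, y_pred) ** 0.5
        results.append((alpha, auc, rmse))
        print(f"alpha={alpha:.1f}  AUC={auc:.4f}  RMSE={rmse:.4f}")
    return results
```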