Started restoring some old images & posts; changed themes to notepadium

This commit is contained in:
Chris Hodapp
2020-02-02 14:02:10 -05:00
parent 94f678534a
commit 7f98cee1da
84 changed files with 126 additions and 71 deletions


@@ -852,7 +852,7 @@ C & =M^\top M \\
D &= \left((M^\top U)^\top - M^\top U\right) /\ \textrm{max}(1, M^\top M)
\end{align}
$$
</pre>
</div>
where $/$ is Hadamard (i.e. elementwise) division, and $\textrm{max}$ is elementwise maximum with 1. Then, the below gives the prediction for how user $u$ will rate movie $j$:
@@ -860,7 +860,7 @@ where $/$ is Hadamard (i.e. elementwise) division, and $\textrm{max}$ is element
$$
P(u)_j = \frac{[M_u \odot (C_j > 0)] \cdot (D_j + U_u) - U_{u,j}}{M_u \cdot (C_j > 0)}
$$
</pre>
</div>
$D_j$ and $C_j$ are row $j$ of $D$ and $C$, respectively. $M_u$ and $U_u$ are row $u$ of $M$ and $U$, respectively. $\odot$ is elementwise multiplication.
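Taken literally, the two formulas above translate almost line-for-line into NumPy. A minimal sketch on made-up toy data (here $M$ and $U$ are users &times; movies, with 0 standing in for "unrated"; the deviation matrix is built so that `D[j, i]` holds $\textrm{dev}\_{j,i}$, as the prediction's use of row $j$ requires):

```python
import numpy as np

# Toy data: 3 users x 3 movies; 0 means "not rated".
U = np.array([[5., 3., 2.],
              [3., 4., 0.],
              [0., 2., 5.]])
M = (U > 0).astype(float)            # binary rated/not-rated indicator

C = M.T @ M                          # C[i, j] = number of users who rated both i and j
A = M.T @ U                          # A[i, j] = sum of movie-j ratings by raters of movie i
D = (A.T - A) / np.maximum(1.0, C)   # D[j, i] = dev_{j,i}; max(1, .) avoids division by zero

def predict(u, j):
    """Predicted rating of movie j by user u, per the vectorized formula."""
    mask = M[u] * (C[j] > 0)         # movies u rated that share at least one rater with j
    num = mask @ (D[j] + U[u]) - U[u, j]
    den = M[u] @ (C[j] > 0)
    return num / den
```

For example, `predict(2, 0)` here agrees with summing $\textrm{dev}\_{j,i} + u\_i$ by hand over user 3's two rated movies.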
@@ -891,7 +891,7 @@ S_{j,i}(\chi)} u_j - u_i = \frac{1}{card(S_{j,i}(\chi))}\left(\sum_{u
\in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i\right)
\end{split}
$$
</pre>
</div>
where:
@@ -930,7 +930,7 @@ matrix multiplication:
<div>
$$C=M^\top M$$
</pre>
</div>
since $C\_{i,j}=card(S\_{j,i}(\chi))$ is the dot product of row $i$ of $M^\top$ (which is column $i$ of $M$) and column $j$ of $M$.
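This identity is easy to check numerically; a small sketch with a made-up indicator matrix:

```python
import numpy as np

# Made-up indicator: M[u, i] = 1 iff user u rated movie i (4 users x 3 movies).
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1],
              [0, 1, 0]])

C = M.T @ M
# C[i, j] counts users with a 1 in both column i and column j of M,
# i.e. card(S_{j,i}(chi)); the diagonal counts the raters of each movie.
print(C)
```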
@@ -940,7 +940,7 @@ We still need the other half:
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i$$
</pre>
</div>
We can apply a similar trick here. Consider first what $\sum\_{u \in
S\_{j,i}(\chi)} u\_j$ means: It is the sum of only those ratings of
@@ -958,7 +958,7 @@ $M\_j$ (consider the definition of $M\_j$) computes this, and so:
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j = M_i \cdot U_j$$
</pre>
</div>
and as with $C$, since we want every pairwise dot product, this summation just
equals element $(i,j)$ of $M^\top U$. The other half of the summation,
@@ -967,13 +967,13 @@ the transpose of this matrix:
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i = M^\top U - (M^\top U)^\top = M^\top U - U^\top M$$
</pre>
</div>
So, finally, we can compute an entire deviation matrix at once (transposing the numerator so that the target movie $j$ indexes the rows) like:
<div>
$$D = \left((M^\top U)^\top - M^\top U\right) /\ M^\top M$$
</pre>
</div>
where $/$ is Hadamard (i.e. elementwise) division, and $D\_{j,i} = \textrm{dev}\_{j,i}$.
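As a sanity check, the vectorized $D$ can be compared against a brute-force loop over the definition of $\textrm{dev}\_{j,i}$. A sketch on toy data (users &times; movies, 0 = unrated, `D[j, i]` oriented to hold $\textrm{dev}\_{j,i}$):

```python
import numpy as np

U = np.array([[5., 3., 2.],            # toy ratings, 0 = unrated
              [3., 4., 0.],
              [0., 2., 5.]])
M = (U > 0).astype(float)

A = M.T @ U
D = (A.T - A) / np.maximum(1.0, M.T @ M)   # vectorized deviation matrix

def dev(j, i):
    """Brute-force dev_{j,i} straight from the definition."""
    common = (M[:, j] > 0) & (M[:, i] > 0)  # S_{j,i}(chi): users who rated both
    if not common.any():
        return 0.0
    return (U[common, j] - U[common, i]).mean()

# Every entry of the vectorized D matches the looped definition.
for j in range(3):
    for i in range(3):
        assert np.isclose(D[j, i], dev(j, i))
```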
@@ -987,7 +987,7 @@ Finally, the paper gives the formula to predict how user $u$ will rate movie $j$
$$
P(u)_j = \frac{1}{card(R_j)}\sum_{i\in R_j} \left(\textrm{dev}_{j,i}+u_i\right) = \frac{1}{card(R_j)}\sum_{i\in R_j} \left(D_{j,i} + U_{u,i} \right)
$$
</pre>
</div>
where $R\_j = \{i | i \in S(u), i \ne j, card(S\_{j,i}(\chi)) > 0\}$, and $S(u)$ is the set of movies that user $u$ has rated. To unpack the paper's somewhat dense notation, the summation is over every movie $i$ that user $u$ rated and that at least one other user rated, except movie $j$.
@@ -995,7 +995,7 @@ We can apply the usual trick yet one more time with a little effort. The summati
<div>
$$P(u)_j = \frac{[M_u \odot (C_j > 0)] \cdot (D_j + U_u) - U_{u,j}}{M_u \cdot (C_j > 0)}$$
</pre>
</div>
#### 5.2.2.4. Approximation
@@ -1003,7 +1003,7 @@ The paper also gives a formula that is a suitable approximation for larger data
<div>
$$p^{S1}(u)_j = \bar{u} + \frac{1}{card(R_j)}\sum_{i\in R_j} \textrm{dev}_{j,i}$$
</pre>
</div>
where $\bar{u}$ is user $u$'s average rating. This doesn't change the formula much; we can compute every $\bar{u}$ at once as the row means of $U$, taken only over the movies each user actually rated.
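A sketch of this approximation on the same kind of toy setup (note that $\bar{u}$ must average only over the entries user $u$ actually rated, not over the zeros standing in for missing ratings):

```python
import numpy as np

U = np.array([[5., 3., 2.],              # toy ratings, 0 = unrated
              [3., 4., 0.],
              [0., 2., 5.]])
M = (U > 0).astype(float)

C = M.T @ M
A = M.T @ U
D = (A.T - A) / np.maximum(1.0, C)       # D[j, i] = dev_{j,i}

ubar = U.sum(axis=1) / M.sum(axis=1)     # each user's mean over rated movies only

def predict_approx(u, j):
    # R_j: movies u rated (other than j) that share at least one rater with j
    mask = (M[u] > 0) & (C[j] > 0)
    mask[j] = False
    return ubar[u] + D[j, mask].mean()
```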
@@ -1169,7 +1169,7 @@ In that sense, $P$ and $Q$ give us a model in which ratings are an interaction b
<div>
$$\hat{r}_{ui}=q_i^\top p_u$$
</pre>
</div>
However, some things aren't really interactions. Some movies are just (per the ratings) overall better or worse. Some users just tend to rate everything higher or lower. We need some sort of bias built into the model to capture this.
@@ -1177,7 +1177,7 @@ Let's call $b_i$ the bias for movie $i$, $b_u$ the bias for user $u$, and $\mu$
<div>
$$\hat{r}_{ui}=\mu + b_i + b_u + q_i^\top p_u$$
</pre>
</div>
This is the basic model we'll implement, and the same one described in the references at the top.
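As a sketch of this model's forward pass (all dimensions and initial values here are made up for illustration; the biases start at zero and the factors at small random values):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 4, 5, 3

mu = 3.5                                        # global mean rating
b_u = np.zeros(n_users)                         # per-user bias
b_i = np.zeros(n_movies)                        # per-movie bias
P = rng.normal(scale=0.1, size=(n_users, k))    # user factors; p_u = P[u]
Q = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors; q_i = Q[i]

def predict(u, i):
    """r_hat_{ui} = mu + b_i + b_u + q_i . p_u"""
    return mu + b_i[i] + b_u[u] + Q[i] @ P[u]
```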
@@ -1187,7 +1187,7 @@ More formally, the prediction model is:
<div>
$$\hat{r}_{ui}=\mu + b_i + b_u + q_i^\top p_u$$
</pre>
</div>
where:
@@ -1215,7 +1215,7 @@ $$
\frac{\partial E}{\partial b_i} &= 2 \sum_{r_{ui}} \left(\lambda b_i - \left(r_{ui} - \hat{r}_{ui}\right)\right)
\end{split}
$$
</pre>
</div>
Gradient with respect to $p_u$ proceeds similarly:
@@ -1229,7 +1229,7 @@ p_u}q_i^\top p_u \right) + 2 \lambda p_u \\
\frac{\partial E}{\partial p_u} &= 2 \sum_{r_{ui}} \lambda p_u - \left(r_{ui} - \hat{r}_{ui}\right)q_i
\end{split}
$$
</pre>
</div>
The gradient with respect to $b\_u$ has the same form as that for $b\_i$, and the gradient with respect to $q\_i$ the same form as that for $p\_u$, with the variables swapping places. The full gradients then have the standard form for gradient descent, i.e. a summation of a gradient term for each individual data point, so they turn easily into per-parameter update rules (which match the ones in the Surprise link) after absorbing the leading 2 into the learning rate $\gamma$ and separating out the summation over data points. That's given below, with $e\_{ui}=r\_{ui} - \hat{r}\_{ui}$:
@@ -1242,7 +1242,7 @@ $$
\frac{\partial E}{\partial q_i} &= 2 \sum_{r_{ui}} \lambda q_i - e_{ui}p_u\ \ \ &\longrightarrow q_i' &= q_i - \gamma\frac{\partial E}{\partial q_i} &= q_i + \gamma\left(e_{ui}p_u - \lambda q_i \right) \\
\end{split}
$$
</pre>
</div>
The code below is a direct implementation of this by simply iteratively applying the above equations for each data point - in other words, stochastic gradient descent.