Fix some MathJax annoyances

Chris Hodapp 2020-04-30 17:52:44 -04:00
parent 64b3ddb238
commit ca556b1243
3 changed files with 186 additions and 126 deletions

View File

@ -171,7 +171,7 @@ image.
Here's another comparison, this time a 1:1 crop from the center of an
image (shot at 40mm with [this lens][12-40mm], whose Amazon price
mysteriously is now $146 instead of the $23
I actually paid). Click the preview for a lossless PNG view, as JPEG
might eat some of the finer details, or [here][leaves-full] for the
full JPEG file (including raw, if you want to look around).

View File

@ -85,6 +85,7 @@ Below is just to inspect that data appears to be okay:
ml.info()
{{< / highlight >}}
{{< rawhtml >}}
<pre class="result">
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000263 entries, 0 to 20000262
@ -96,7 +97,7 @@ ml.info()
dtypes: datetime64[ns](1), float32(1), int32(2)
memory usage: 381.5 MB
</pre>
{{< /rawhtml >}}
{{<highlight python>}}
ml.describe()
@ -106,7 +107,6 @@ ml.describe()
| | user_id | movie_id | rating
|----|---------|----------|-------
count|2.000026e+07|2.000026e+07|2.000026e+07
@ -117,7 +117,6 @@ min|1.000000e+00|1.000000e+00|5.000000e-01
50%|6.914100e+04|2.167000e+03|3.500000e+00
75%|1.036370e+05|4.770000e+03|4.000000e+00
max|1.384930e+05|1.312620e+05|5.000000e+00
@ -131,7 +130,6 @@ ml[:10]
| | user_id | movie_id | rating | time
|----|--------|---------|-------|-----
0|1|2|3.5|2005-04-02 23:53:47
@ -144,7 +142,6 @@ ml[:10]
7|1|223|4.0|2005-04-02 23:46:13
8|1|253|4.0|2005-04-02 23:35:40
9|1|260|4.0|2005-04-02 23:33:46
@ -159,12 +156,13 @@ max_user, max_movie, max_user * max_movie
{{< rawhtml >}}
<pre class="result">
(138494, 131263, 18179137922)
</pre>
{{< /rawhtml >}}
Computing what percent we have of all 'possible' ratings (i.e. every single movie & every single user), this data is rather sparse:
@ -174,12 +172,13 @@ Computing what percent we have of all 'possible' ratings (i.e. every single movi
print("%.2f%%" % (100 * ml.shape[0] / (max_user * max_movie)))
{{< / highlight >}}
{{< rawhtml >}}
<pre class="result">
0.11%
</pre>
{{< /rawhtml >}}
## 3.1. Aggregation
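The aggregation step itself is elided from this hunk; as a rough, hypothetical sketch of how a table like `movie_stats` could be built with pandas (assuming `names` is a DataFrame of `movie_title` indexed by `movie_id`, as the later joins suggest):
{{<highlight python>}}
# Hypothetical sketch - not the post's actual code, which is not shown here.
# Count and average ratings per movie, then attach titles from `names`.
movie_stats = (
    ml.groupby("movie_id")["rating"]
      .agg(num_ratings="count", avg_rating="mean")
      .join(names)
)
movie_stats.sort_values("num_ratings", ascending=False)[:25]
{{< / highlight >}}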
@ -214,7 +213,6 @@ movie_stats.sort_values("num_ratings", ascending=False)[:25]
| movie_id | movie_title | num_ratings | avg_rating
|---------|------------|------------|-----------
296|Pulp Fiction (1994)|67310.0|4.174231
@ -242,7 +240,6 @@ movie_stats.sort_values("num_ratings", ascending=False)[:25]
608|Fargo (1996)|43272.0|4.112359
47|Seven (a.k.a. Se7en) (1995)|43249.0|4.053493
380|True Lies (1994)|43159.0|3.491149
@ -267,9 +264,9 @@ examples of this, check out section 11.3.2 in [MMDS](http://www.mmds.org/).)
In a utility matrix, each row represents one user, each column represents
one item (a movie, in our case), and each element represents a user's
rating of an item. If we have \\(n\\) users and \\(m\\) movies, then this is an
\\(n \times m\\) matrix \\(U\\) for which \\(U_{k,i}\\) is user \\(k\\)'s rating for
movie \\(i\\) - assuming we've numbered our users and our movies.
Users have typically rated only a fraction of movies, and so most of
the elements of this matrix are unknown. Algorithms represent this
@ -315,13 +312,14 @@ ml_mat_train
{{< rawhtml >}}
<pre class="result">
<138494x131263 sparse matrix of type '<class 'numpy.float32'>'
with 15000197 stored elements in Compressed Sparse Column format>
</pre>
{{< /rawhtml >}}
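The construction of `ml_mat_train` is also elided here; a minimal sketch of one way to build such a CSC matrix from `ml_train` (assuming `user_id` and `movie_id` can be used directly as row and column indices, which `max_user` and `max_movie` above suggest):
{{<highlight python>}}
# Hypothetical sketch - the post's actual construction is not shown in this hunk.
import numpy as np
import scipy.sparse as sp

ml_mat_train = sp.csc_matrix(
    (ml_train["rating"].astype(np.float32),         # the stored ratings
     (ml_train["user_id"], ml_train["movie_id"])),  # row = user, column = movie
    shape=(max_user, max_movie))
{{< / highlight >}}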
To demonstrate that the matrix and dataframe have the same data:
@ -335,7 +333,6 @@ ml_train[:10]
| | user_id | movie_id | rating | time
|----|--------|---------|-------|-----
13746918|94976|7371|4.5|2009-11-04 05:51:26
@ -348,7 +345,6 @@ ml_train[:10]
15311014|105846|4226|4.5|2004-07-30 18:12:26
8514776|58812|1285|4.0|2000-04-24 20:39:46
3802643|25919|3275|2.5|2010-06-18 00:48:40
@ -361,12 +357,13 @@ list(ml_train.iloc[:10].rating)
{{< rawhtml >}}
<pre class="result">
[4.5, 3.0, 3.0, 4.5, 4.0, 2.5, 5.0, 4.5, 4.0, 2.5]
</pre>
{{< /rawhtml >}}
@ -379,12 +376,13 @@ movie_ids = list(ml_train.iloc[:10].movie_id)
{{< rawhtml >}}
<pre class="result">
[4.5, 3.0, 3.0, 4.5, 4.0, 2.5, 5.0, 4.5, 4.0, 2.5]
</pre>
{{< /rawhtml >}}
Okay, enough of that; we can begin with some actual predictions.
@ -457,7 +455,6 @@ names.merge(ml_train[ml_train.user_id == target_user], right_on="movie_id", left
| | movie_title | user_id | movie_id | rating | time
|----|------------|--------|---------|-------|-----
4229884|Jumanji (1995)|28812|2|5.0|1996-09-23 02:08:39
@ -471,7 +468,6 @@ names.merge(ml_train[ml_train.user_id == target_user], right_on="movie_id", left
4229957|Independence Day (a.k.a. ID4) (1996)|28812|780|5.0|1996-09-23 02:09:02
4229959|Phenomenon (1996)|28812|802|5.0|1996-09-23 02:09:02
4229960|Die Hard (1988)|28812|1036|5.0|1996-09-23 02:09:02
@ -488,12 +484,13 @@ names[names.index == target_movie]
{{< rawhtml >}}
<pre class="result">
| movie_title | movie_id |
|------------|---------|-
586|Home Alone (1990)
</pre>
{{< /rawhtml >}}
@ -513,7 +510,6 @@ users_df
| | movie_id_x | user_id | rating_x | rating_y
|----|-----------|--------|---------|---------
0|329|17593|3.0|4.0
@ -527,8 +523,6 @@ users_df
522688|2|126271|3.0|4.0
522689|595|82760|2.0|4.0
522690|595|18306|4.5|5.0
@ -544,7 +538,6 @@ users_df
| | movie_id_x | user_id | rating_x | rating_y | rating_dev
|----|-----------|--------|---------|---------|-----------
0|329|17593|3.0|4.0|1.0
@ -558,7 +551,6 @@ users_df
522688|2|126271|3.0|4.0|1.0
522689|595|82760|2.0|4.0|2.0
522690|595|18306|4.5|5.0|0.5
@ -574,9 +566,6 @@ names.join(rating_dev, how="inner").sort_values("rating_dev")
| movie_id | movie_title | rating_dev
|---------|------------|-----------
318|Shawshank Redemption, The (1994)|-1.391784
@ -600,8 +589,6 @@ names.join(rating_dev, how="inner").sort_values("rating_dev")
173|Judge Dredd (1995)|0.518570
19|Ace Ventura: When Nature Calls (1995)|0.530155
160|Congo (1995)|0.559034
@ -620,8 +607,6 @@ df.join(names, on="movie_id").sort_values("movie_title")
| | user_id | movie_id | rating | rating_adj | movie_title
|----|--------|---------|-------|-----------|------------
4229920|28812|344|3.0|3.141987|Ace Ventura: Pet Detective (1994)
@ -645,7 +630,6 @@ df.join(names, on="movie_id").sort_values("movie_title")
4229892|28812|50|3.0|1.683520|Usual Suspects, The (1995)
4229903|28812|208|3.0|3.250881|Waterworld (1995)
4229919|28812|339|4.0|3.727966|While You Were Sleeping (1995)
@ -660,12 +644,13 @@ df["rating_adj"].mean()
{{< rawhtml >}}
<pre class="result">
4.087520122528076
</pre>
{{< /rawhtml >}}
As mentioned above, we also happen to have the user's actual rating on *Home Alone* in the test set (i.e. we didn't train on it), so we can compare here:
@ -678,12 +663,13 @@ ml_test[(ml_test.user_id == target_user) & (ml_test.movie_id == target_movie)]["
{{< rawhtml >}}
<pre class="result">
4.0
</pre>
{{< /rawhtml >}}
That's quite close - though that may just be luck. It's hard to say from one point.
@ -702,7 +688,6 @@ names.join(num_ratings, how="inner").sort_values("num_ratings")
| movie_id | movie_title | num_ratings
|---------|------------|------------
802|Phenomenon (1996)|3147
@ -726,7 +711,6 @@ names.join(num_ratings, how="inner").sort_values("num_ratings")
593|Silence of the Lambs, The (1991)|12120
480|Jurassic Park (1993)|13546
356|Forrest Gump (1994)|13847
@ -757,7 +741,6 @@ df
| | user_id | movie_id | rating | rating_adj | num_ratings | rating_weighted
|----|--------|---------|-------|-----------|------------|----------------
4229918|28812|329|4.0|3.767164|6365|23978.000326
@ -781,7 +764,6 @@ df
4229912|28812|296|4.0|2.883755|11893|34296.500678
4229884|28812|2|5.0|4.954595|7422|36773.001211
4229953|28812|595|4.0|3.515051|9036|31761.999825
@ -794,12 +776,13 @@ df["rating_weighted"].sum() / df["num_ratings"].sum()
{{< rawhtml >}}
<pre class="result">
4.02968199025023
</pre>
{{< /rawhtml >}}
It changes the answer, but only very slightly.
@ -818,8 +801,9 @@ eyes glaze over, you can probably just skip this section.
### 5.2.1. Short Answer
Let \\(U\\) be the utility matrix. Let \\(M\\) be a binary matrix for which \\(M_{i,j}=1\\) if user \\(i\\) rated movie \\(j\\), otherwise 0. Compute the model's matrices with:
{{< rawhtml >}}
<div>
$$
\begin{align}
@ -828,35 +812,40 @@ D &= \left(M^\top U - (M^\top U)^\top\right) /\ \textrm{max}(1, M^\top M)
\end{align}
$$
</div>
{{< /rawhtml >}}
where \\(/\\) is Hadamard (i.e. elementwise) division, and \\(\textrm{max}\\) is elementwise maximum with 1. Then, the below gives the prediction for how user \\(u\\) will rate movie \\(j\\):
{{< rawhtml >}}
<div>
$$
P(u)_j = \frac{[M_u \odot (C_j > 0)] \cdot (D_j + U_u) - U_{u,j}}{M_u \cdot (C_j > 0)}
$$
</div>
{{< /rawhtml >}}
\\(D_j\\) and \\(C_j\\) are row \\(j\\) of \\(D\\) and \\(C\\), respectively. \\(M_u\\) and \\(U_u\\) are column \\(u\\) of \\(M\\) and \\(U\\), respectively. \\(\odot\\) is elementwise multiplication.
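To make the short answer concrete, here is a toy, dense NumPy sketch of these formulas (my own illustration, not the post's implementation; rows of `U` and `M` are users, columns are movies). Note that with \\(D\\) computed exactly as above, \\(\textrm{dev}\_{j,i}\\) lands at index \\((i,j)\\), so the prediction reads a column of `D`:
{{<highlight python>}}
# Toy illustration of the short-answer formulas on a 3-user, 3-movie example.
import numpy as np

U = np.array([[5., 3., 0.],   # 0 means "not rated"
              [4., 0., 1.],
              [0., 2., 5.]])
M = (U > 0).astype(float)     # the mask of known ratings

C = M.T @ M                                      # co-rating counts
D = (M.T @ U - (M.T @ U).T) / np.maximum(1, C)   # deviations; dev_{j,i} sits at D[i, j]

def predict(u, j):
    usable = M[u] * (C[j] > 0)   # movies user u rated that co-occur with movie j
    return (usable @ (D[:, j] + U[u]) - U[u, j]) / (M[u] @ (C[j] > 0))

predict(0, 2)   # 4.0: dev(2,0) = -3 and dev(2,1) = +3, so ((5 - 3) + (3 + 3)) / 2
{{< / highlight >}}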
### 5.2.2. Long Answer
First, we need to have our data encoded as an \\(n \times m\\) utility
matrix (see a [few sections above](#Utility-Matrix) for the definition
of *utility matrix*).
As noted, most elements of this matrix are unknown as users have rated
only a fraction of movies. We can represent this with another
\\(n \times m\\) matrix (specifically a binary matrix), a 'mask' \\(M\\) in
which \\(M_{k,i}\\) is 1 if user \\(k\\) supplied a rating for movie \\(i\\), and
otherwise 0.
#### 5.2.2.1. Deviation Matrix
I mentioned *deviation* above and gave an informal definition of it.
The paper gives a formal but rather terse definition below of the
average deviation of item \\(i\\) with respect to item \\(j\\), and I
then separate out the summation a little:
{{< rawhtml >}}
<div>
$$
\begin{split}
@ -867,128 +856,154 @@ S_{j,i}(\chi)} u_j - u_i = \frac{1}{card(S_{j,i}(\chi))}\left(\sum_{u
\end{split}
$$
</div>
{{< /rawhtml >}}
where:
- \\(u_j\\) and \\(u_i\\) mean: user \\(u\\)'s ratings for movies \\(j\\) and \\(i\\), respectively
- \\(u \in S_{j,i}(\chi)\\) means: all users \\(u\\) who, in the dataset we're
training on, provided a rating for both movie \\(i\\) and movie \\(j\\)
- \\(card\\) is the cardinality of that set, i.e.
\\({card(S_{j,i}(\chi))}\\) is how many users rated both \\(i\\) and
\\(j\\).
#### 5.2.2.2. Cardinality/Counts Matrix
Let's start with computing \\({card(S_{j,i}(\chi))}\\), the number of
users who rated both movie \\(i\\) and movie \\(j\\). Consider column \\(i\\) of
the mask \\(M\\). For each value in this column, it equals 1 if the
respective user rated movie \\(i\\), or 0 if they did not. Clearly,
simply summing up column \\(i\\) would tell us how many users rated movie
\\(i\\), and the same applies to column \\(j\\) for movie \\(j\\).
Now, suppose we take element-wise logical AND of columns \\(i\\) and \\(j\\).
The resultant column has a 1 only where both corresponding elements
were 1 - where a user rated both \\(i\\) and \\(j\\). If we sum up this
column, we have exactly the number we need: the number of users who
rated both \\(i\\) and \\(j\\). Some might notice that "elementwise logical
AND" is just "elementwise multiplication", thus "sum of elementwise
logical AND" is just "sum of elementwise multiplication", which is:
dot product. That is,
\\({card(S_{j,i}(\chi))}=M_j \cdot M_i\\) if we use \\(M_i\\) and \\(M_j\\) for
columns \\(i\\) and \\(j\\) of \\(M\\).
However, we'd like to compute deviation as a matrix for all \\(i\\) and
\\(j\\), so we'll likewise need \\({card(S_{j,i}(\chi))}\\) for every single
combination of \\(i\\) and \\(j\\) - that is, we need a dot product between
every single pair of columns from \\(M\\). This is incidentally just
matrix multiplication:
{{< rawhtml >}}
<div>
$$C=M^\top M$$
</div>
{{< /rawhtml >}}
since \\(C\_{i,j}=card(S\_{j,i}(\chi))\\) is the dot product of row \\(i\\) of \\(M^\top\\) - which is column
\\(i\\) of \\(M\\) - and column \\(j\\) of \\(M\\).
That was the first half of what we needed for \\(\textrm{dev}_{j,i}\\).
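A quick way to convince yourself of this with NumPy (a standalone check, not part of the post's code):
{{<highlight python>}}
import numpy as np

M = (np.random.rand(6, 4) > 0.5).astype(float)   # a random mask: 6 users, 4 movies
C = M.T @ M
# Entry (1, 3) of C is exactly the dot product of columns 1 and 3 of M,
# i.e. how many users rated both movie 1 and movie 3.
assert C[1, 3] == M[:, 1] @ M[:, 3]
{{< / highlight >}}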
We still need the other half:
{{< rawhtml >}}
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i$$
</div>
{{< /rawhtml >}}
We can apply a similar trick here. Consider first what
\\(\sum\_{u \in S\_{j,i}(\chi)} u\_j\\) means: It is the sum of only those ratings of
movie \\(j\\) that were done by a user who also rated movie \\(i\\).
Likewise, \\(\sum\_{u \in S\_{j,i}(\chi)} u\_i\\) is the sum of only those
ratings of movie \\(i\\) that were done by a user who also rated movie
\\(j\\). (Note the symmetry: it's over the same set of users, because
it's always the users who rated both \\(i\\) and \\(j\\).)
Let's call the utility matrix \\(U\\), and use \\(U\_i\\) and \\(U\_j\\) to refer
to columns \\(i\\) and \\(j\\) of it (just as in \\(M\\)). \\(U\_i\\) has each rating
of movie \\(i\\), but we want only the sum of the ratings done by a user
who also rated movie \\(j\\). Like before, the dot product of \\(U\_i\\) and
\\(M\_j\\) (consider the definition of \\(M\_j\\)) computes this, and so:
{{< rawhtml >}}
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j = M_i \cdot U_j$$
</div>
{{< /rawhtml >}}
and as with \\(C\\), since we want every pairwise dot product, this summation just
equals element \\((i,j)\\) of \\(M^\top U\\). The other half of the summation,
\\(\sum\_{u \in S_{j,i}(\chi)} u\_i\\), equals \\(M\_j \cdot U\_i\\), which is just
the transpose of this matrix:
{{< rawhtml >}}
<div>
$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i = M^\top U - (M^\top U)^\top = M^\top U - U^\top M$$
</div>
{{< /rawhtml >}}
So, finally, we can compute an entire deviation matrix at once like:
{{< rawhtml >}}
<div>
$$D = \left(M^\top U - (M^\top U)^\top\right) /\ M^\top M$$
</div>
{{< /rawhtml >}}
where \\(/\\) is Hadamard (i.e. elementwise) division, and \\(D\_{j,i} = \textrm{dev}\_{j,i}\\).
By convention and to avoid division by zero, we treat the case where the denominator and numerator are both 0 as just equaling 0. This comes up only where no ratings exist for there to be a deviation - hence the `np.maximum(1, counts)` below.
#### 5.2.2.3. Prediction
Finally, the paper gives the formula to predict how user \\(u\\) will rate movie \\(j\\), and I write this in terms of our matrices:
{{< rawhtml >}}
<div>
$$
P(u)_j = \frac{1}{card(R_j)}\sum_{i\in R_j} \left(\textrm{dev}_{j,i}+u_i\right) = \frac{1}{card(R_j)}\sum_{i\in R_j} \left(D_{j,i} + U_{u,i} \right)
$$
</div>
{{< /rawhtml >}}
where \\(R\_j = \{i | i \in S(u), i \ne j, card(S\_{j,i}(\chi)) > 0\}\\), and \\(S(u)\\) is the set of movies that user \\(u\\) has rated. To unpack the paper's somewhat dense notation, the summation is over every movie \\(i\\) that user \\(u\\) rated and that at least one other user rated, except movie \\(j\\).
We can apply the usual trick yet one more time with a little effort. The summation already goes across a row of \\(U\\) and \\(D\\) (that is, user \\(u\\) is held constant), but covers only certain elements. This is equivalent to a dot product with a mask representing \\(R\_j\\). \\(M\_u\\), row \\(u\\) of the mask, already represents \\(S(u)\\), and \\(R\_j\\) is just \\(S(u)\\) with some more elements removed - which we can mostly represent with \\(M\_u \odot (C\_j > 0)\\) where \\(\odot\\) is elementwise product (i.e. Hadamard), \\(C\_j\\) is column/row \\(j\\) of \\(C\\) (it's symmetric), and where we abuse some notation to say that \\(C\_j > 0\\) is a binary vector. Likewise, \\(D\_j\\) is row \\(j\\) of \\(D\\). The one correction still required is that we subtract \\(u\_j\\) to cover for the \\(i \ne j\\) part of \\(R\_j\\). To abuse some more notation:
{{< rawhtml >}}
<div>
$$P(u)_j = \frac{[M_u \odot (C_j > 0)] \cdot (D_j + U_u) - U_{u,j}}{M_u \cdot (C_j > 0)}$$
</div>
{{< /rawhtml >}}
#### 5.2.2.4. Approximation
The paper also gives a formula that is a suitable approximation for larger data sets:
{{< rawhtml >}}
<div>
$$p^{S1}(u)_j = \bar{u} + \frac{1}{card(R_j)}\sum_{i\in R_j} \textrm{dev}_{j,i}$$
</div>
{{< /rawhtml >}}
where \\(\bar{u}\\) is user \\(u\\)'s average rating. This doesn't change the formula much; we can compute \\(\bar{u}\\) simply as column means of \\(U\\).
## 5.3. Implementation
I left out another detail: the formulas above can't really be implemented exactly as written on this dataset (though they work fine for the much smaller [ml-100k](https://grouplens.org/datasets/movielens/100k/)) because they use entirely too much memory.
While \\(U\\) and \\(M\\) can be sparse matrices, \\(C\\) and \\(D\\) sort of must be dense matrices, and for this particular dataset they are a bit too large to work with in memory in this form. Some judicious optimization, attention to datatypes, use of \\(C\\) and \\(D\\) being symmetric and skew-symmetric respectively, and care to avoid extra copies could probably work around this - but I don't do that here.
However, if we look at the \\(P(u)_j\\) formula above, it refers only to row \\(j\\) of \\(C\\) and \\(D\\), and the formulas for \\(C\\) and \\(D\\) make it easy to compute them by row if needed, or by blocks of rows according to what \\(u\\) and \\(j\\) we need. This is what I do below.
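As a rough sketch of the row-block idea (hypothetical names, not the post's code, which follows), rows \\(J\\) of \\(C\\) and \\(D\\) can be computed from the sparse \\(M\\) and \\(U\\) without ever materializing the full matrices:
{{<highlight python>}}
import numpy as np

def slope_one_rows(U, M, J):
    """Compute rows J of C and D from sparse U and M (users x movies)."""
    C_rows = (M[:, J].T @ M).toarray()                      # rows J of M^T M
    num    = (M[:, J].T @ U - (M.T @ U[:, J]).T).toarray()  # rows J of M^T U - (M^T U)^T
    D_rows = num / np.maximum(1, C_rows)
    return C_rows, D_rows
{{< / highlight >}}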
{{<highlight python>}}
@ -1020,12 +1035,13 @@ To show that it actually gives the same result as above, and that the approximat
{{< rawhtml >}}
<pre class="result">
(4.0875210502743862, 4.0875210502743862)
</pre>
{{< /rawhtml >}}
This computes training error on a small part (1%) of the data, since doing it over the entire thing would be horrendously slow:
@ -1102,13 +1118,14 @@ print("Training error: MAE={:.3f}, RMSE={:.3f}".format(err_mae_train, err_rms_t
print("Testing error: MAE={:.3f}, RMSE={:.3f}".format(err_mae_test, err_rms_test))
{{< / highlight >}}
{{< rawhtml >}}
<pre class="result">
Training error: MAE=0.640, RMSE=0.834
Testing error: MAE=0.657, RMSE=0.856
</pre>
{{< /rawhtml >}}
# 6. "SVD" algorithm
@ -1126,33 +1143,39 @@ References on this model are in a few different places:
## 6.2. Motivation
We again start from the \\(n \times m\\) utility matrix \\(U\\). As \\(m\\) and \\(n\\) tend to be quite large, \\(U\\) has a lot of degrees of freedom. If we want to be able to predict anything at all, we must assume some fairly strict constraints - and one form of this is assuming that we don't *really* have that many degrees of freedom, and that there are actually some much smaller latent factors controlling everything.
One common form of this is assuming that the rank of matrix \\(U\\) - its *actual* dimensionality - is much lower. Let's say its rank is \\(r\\). We could then represent \\(U\\) as the matrix product of smaller matrices, i.e. \\(U=P^\top Q\\) where \\(P\\) is an \\(r \times n\\) matrix and \\(Q\\) is \\(r \times m\\).
If we can find dense matrices \\(P\\) and \\(Q\\) such that \\(P^\top Q\\) equals, or approximately equals, \\(U\\) for the corresponding elements of \\(U\\) that are known, then \\(P^\top Q\\) also gives us predictions for the unknown elements of \\(U\\) - the ratings we don't know, but want to predict. Of course, \\(r\\) must be small enough here to prevent overfitting.
(What we're talking about above is [matrix completion](https://en.wikipedia.org/wiki/Matrix_completion) using low-rank [matrix decomposition/factorization](https://en.wikipedia.org/wiki/Matrix_decomposition). These are both subjects unto themselves. See the [matrix-completion-whirlwind](https://github.com/asberk/matrix-completion-whirlwind/blob/master/matrix_completion_master.ipynb) notebook for a much better explanation on that subject, and an implementation of [altMinSense/altMinComplete](https://arxiv.org/pdf/1212.0467).)
Ordinarily, we'd use something like SVD directly if we wanted to find matrices \\(P\\) and \\(Q\\) (or if we wanted to do any of about 15,000 other things, since SVD is basically magical matrix fairy dust). We can't really do that here due to the fact that large parts of \\(U\\) are unknown, and in some cases because \\(U\\) is just too large. One approach for working around this is the UV-decomposition algorithm that section 9.4 of [MMDS](http://www.mmds.org/) describes.
What we'll do below is a similar approach to UV decomposition that follows a common method: define a model, define an error function we want to minimize, find that error function's gradient with respect to the model's parameters, and then use gradient-descent to minimize that error function by nudging the parameters in the direction that decreases the error, i.e. the negative of their gradient. (More on this later.)
Matrices \\(Q\\) and \\(P\\) have some other neat properties too. Note that \\(Q\\) has \\(m\\) columns, each one \\(r\\)-dimensional - one column per movie. \\(P\\) has \\(n\\) columns, each one \\(r\\)-dimensional - one column per user. In effect, we can look at each column \\(i\\) of \\(Q\\) as the coordinates of movie \\(i\\) in "concept space" or "feature space" - a new \\(r\\)-dimensional space where each axis corresponds to something that seems to explain ratings. Likewise, we can look at each column \\(u\\) of \\(P\\) as how much user \\(u\\) "belongs" to each axis in concept space. "Feature vectors" is a common term to see.
In that sense, \\(P\\) and \\(Q\\) give us a model in which ratings are an interaction between properties of a movie, and a user's preferences. If we're using \\(U=P^\top Q\\) as our model, then every element of \\(U\\) is just the dot product of the feature vectors of the respective movie and user. That is, if \\(p_u\\) is column \\(u\\) of \\(P\\) and \\(q_i\\) is column \\(i\\) of \\(Q\\):
{{< rawhtml >}}
<div>
$$\hat{r}_{ui}=q_i^\top p_u$$
</div>
{{< /rawhtml >}}
However, some things aren't really interactions. Some movies are just (per the ratings) overall better or worse. Some users just tend to rate everything higher or lower. We need some sort of bias built into the model to comprehend this.
Let's call \\(b_i\\) the bias for movie \\(i\\), \\(b_u\\) the bias for user \\(u\\), and \\(\mu\\) the overall average rating. We can just add these into the model:
{{< rawhtml >}}
<div>
$$\hat{r}_{ui}=\mu + b_i + b_u + q_i^\top p_u$$
</div>
{{< /rawhtml >}}
This is the basic model we'll implement, and the same one described in the references at the top.
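In code, the prediction itself is tiny. A hypothetical sketch (the post's actual `SVDModel` class appears further down):
{{<highlight python>}}
# q: r x num_movies, p: r x num_users; b_i and b_u are per-movie / per-user biases.
def predict(mu, b_i, b_u, q, p, i, u):
    return mu + b_i[i] + b_u[u] + q[:, i] @ p[:, u]
{{< / highlight >}}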
@ -1160,28 +1183,32 @@ This is the basic model we'll implement, and the same one described in the refer
More formally, the prediction model is:
{{< rawhtml >}}
<div>
$$\hat{r}_{ui}=\mu + b_i + b_u + q_i^\top p_u$$
</div>
{{< /rawhtml >}}
where:
- \\(u\\) is a user
- \\(i\\) is an item
- \\(\hat{r}_{ui}\\) is user \\(u\\)'s predicted rating for item \\(i\\)
- \\(\mu\\) is the overall average rating
- our model parameters are:
    - \\(b_i\\), a per-item deviation for item \\(i\\)
    - \\(b_u\\), a per-user deviation for user \\(u\\)
    - \\(q_i\\) and \\(p_u\\), feature vectors for item \\(i\\) and user \\(u\\), respectively
The error function that we need to minimize is just sum-of-squared error between predicted and actual rating, plus \\(L\_2\\) regularization to prevent the biases and coordinates in "concept space" from becoming too huge:
$$E=\sum\_{r\_{ui} \in R\_{\textrm{train}}} \left(r\_{ui} - \hat{r}\_{ui}\right)^2 + \lambda\left(b\_i^2+b\_u^2 + \lvert\lvert q\_i\rvert\rvert^2 + \lvert\lvert p\_u\rvert\rvert^2\right)$$
## 6.4. Gradients & Gradient-Descent Updates
This error function is easily differentiable with respect to model parameters \\(b_i\\), \\(b_u\\), \\(q_i\\), and \\(p_u\\), so a normal approach for minimizing it is gradient descent. Finding the gradient with respect to \\(b_i\\) is straightforward:
{{< rawhtml >}}
<div>
$$
\begin{split}
@ -1191,9 +1218,12 @@ $$
\end{split}
$$
</div>
{{< /rawhtml >}}
Gradient with respect to \\(p_u\\) proceeds similarly:
{{< rawhtml >}}
<div>
$$
\begin{split}
@ -1205,9 +1235,12 @@ p_u}q_i^\top p_u \right) + 2 \lambda p_u \\
\end{split}
$$
</div>
{{< /rawhtml >}}
Gradient with respect to \\(b\_u\\) is identical in form to \\(b\_i\\), and gradient with respect to \\(q\_i\\) is identical in form to \\(p\_u\\), except that the variables switch places. The full gradients then have the standard form for gradient descent, i.e. a summation of a gradient term for each individual data point, so they turn easily into update rules for each parameter (which match the ones in the Surprise link) after absorbing the leading 2 into learning rate \\(\gamma\\) and separating out the summation over each data point. That's given below, with \\(e\_{ui}=r\_{ui} - \hat{r}\_{ui}\\):
{{< rawhtml >}}
<div>
$$
\begin{split}
@ -1218,6 +1251,8 @@ $$
\end{split}
$$
</div>
{{< /rawhtml >}}
The code below is a direct implementation of this by simply iteratively applying the above equations for each data point - in other words, stochastic gradient descent.
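As a standalone sketch of what one such per-rating update looks like (hypothetical names; the post's own implementation is what actually gets trained):
{{<highlight python>}}
def sgd_step(r, i, u, mu, b_i, b_u, q, p, gamma, lam):
    # Error for this single observed rating, using the current parameters.
    err = r - (mu + b_i[i] + b_u[u] + q[:, i] @ p[:, u])
    b_i[i] += gamma * (err - lam * b_i[i])
    b_u[u] += gamma * (err - lam * b_u[u])
    q_i_old = q[:, i].copy()
    q[:, i] += gamma * (err * p[:, u] - lam * q[:, i])
    p[:, u] += gamma * (err * q_i_old - lam * p[:, u])
{{< / highlight >}}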
@ -1352,6 +1387,7 @@ svd40 = SVDModel(max_movie, max_user, ml["rating"].mean(), num_factors=num_facto
svd40.train(movies_train, users_train, ratings_train, epoch_callback=at_epoch)
{{< / highlight >}}
{{< rawhtml >}}
<pre class="result">
6982/s 8928/s 10378/s 12877/s 15290/s 11574/s 13230/s
@ -1396,7 +1432,7 @@ svd40.train(movies_train, users_train, ratings_train, epoch_callback=at_epoch)
Epoch 20/20; Training: MAE=0.549 RMSE=0.717, Testing: MAE=0.600 RMSE=0.787
</pre>
{{< /rawhtml >}}
{{<highlight python>}}
@ -1416,6 +1452,7 @@ svd4 = SVDModel(max_movie, max_user, ml["rating"].mean(), 4)
svd4.train(ml_train["movie_id"].values, ml_train["user_id"].values, ml_train["rating"].values, epoch_callback=at_epoch)
{{< / highlight >}}
{{< rawhtml >}}
<pre class="result">
48199/s 33520/s 16937/s 13842/s 13607/s 15574/s 15431/s
@ -1460,7 +1497,7 @@ svd4.train(ml_train["movie_id"].values, ml_train["user_id"].values, ml_train["ra
Epoch 20/20; Training: MAE=0.599 RMSE=0.783, Testing: MAE=0.618 RMSE=0.809
</pre>
{{< /rawhtml >}}
To limit the data, we can use just the top movies (by number of ratings):
@ -1567,7 +1604,6 @@ latent_factor_grid(svd4.q[:2,:])
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
|----|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----
0||||||||||||||||
@ -1586,7 +1622,6 @@ latent_factor_grid(svd4.q[:2,:])
13||||||||Sound of Music; Spy Kids 2: The Island of Lost...|Bring It On; Legally Blonde|Fly Away Home; Parent Trap|Sense and Sensibility; Sex and the City|||||
14|||||||Babe; Babe: Pig in the City||||Twilight|||||
15||||||||||||||||
@ -1604,7 +1639,6 @@ latent_factor_grid(svd4.q[2:,:])
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
|----|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----
0||||||||||||||||
@ -1623,7 +1657,6 @@ latent_factor_grid(svd4.q[2:,:])
13||||||Nightmare on Elm Street 4: The Dream Master; F...|Wes Craven's New Nightmare (Nightmare on Elm S...|Friday the 13th; Exorcist III|Candyman; Texas Chainsaw Massacre 2|Mars Attacks!; Halloween|Evil Dead II (Dead by Dawn); Re-Animator|Night of the Living Dead; Dead Alive (Braindead)||Eraserhead||
14|||||||Nightmare on Elm Street 3: Dream Warriors; Fre...|Hellbound: Hellraiser II|Nightmare on Elm Street|||||||
15|||||||Bride of Chucky (Child's Play 4)||||Texas Chainsaw Massacre|||||
@ -1643,9 +1676,6 @@ bias.iloc[:10]
| movie_id | movie_title | num_ratings | avg_rating | bias
|---------|------------|------------|-----------|-----
318|Shawshank Redemption, The (1994)|63366.0|4.446990|1.015911
@ -1658,7 +1688,6 @@ bias.iloc[:10]
50|Usual Suspects, The (1995)|47006.0|4.334372|0.910651
102217|Bill Hicks: Revelations (1993)|50.0|3.990000|0.900622
527|Schindler's List (1993)|50054.0|4.310175|0.898633
@ -1672,7 +1701,6 @@ bias.iloc[:-10:-1]
| movie_id | movie_title | num_ratings | avg_rating | bias
|---------|------------|------------|-----------|-----
8859|SuperBabies: Baby Geniuses 2 (2004)|209.0|0.837321|-2.377202
@ -1684,7 +1712,6 @@ bias.iloc[:-10:-1]
4775|Glitter (2001)|685.0|1.124088|-2.047287
31698|Son of the Mask (2005)|467.0|1.252677|-2.022763
5739|Faces of Death 6 (1996)|174.0|1.261494|-2.004086
@ -1732,7 +1759,6 @@ pd.DataFrame.from_records(
| | Library | Algorithm | MAE (test) | RMSE (test)
|----|--------|----------|-----------|------------
0||Slope One|0.656514|0.856294
@ -1740,7 +1766,6 @@ pd.DataFrame.from_records(
2|Surprise|Random|1.144775|1.433753
3|Surprise|Slope One|0.704730|0.923331
4|Surprise|SVD|0.694890|0.900350

View File

@ -0,0 +1,35 @@
<!-- Copied from hugo-notepadium in order to:
- Remove dollar signs from inlineMath because it breaks too much
(I can't have dollar signs twice in one paragraph, even escaped
like \$).
-->
{{- if or (eq site.Params.math.enable true) (eq .Params.math true) -}}
{{- $use := "katex" -}}
{{- with site.Params.math -}}
{{- if and (isset . "use") (eq (.use | lower) "mathjax") -}}
{{- $use = "mathjax" -}}
{{- end -}}
{{- end -}}
{{- if eq $use "mathjax" -}}
{{- $url := "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -}}
{{- $hash := "sha384-e/4/LvThKH1gwzXhdbY2AsjR3rm7LHWyhIG5C0jiRfn8AN2eTN5ILeztWw0H9jmN" -}}
<script defer type="text/javascript" src="{{- $url -}}" integrity="{{- $hash -}}" crossorigin="anonymous"></script>
<script
type="text/x-mathjax-config">MathJax.Hub.Config({ tex2jax: { inlineMath: [/*['$','$'], */['\\(','\\)']] } });</script>
{{- else -}}
{{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" -}}
{{- $hash := "sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" -}}
<link rel="stylesheet" href="{{- $url -}}" integrity="{{- $hash -}}" crossorigin="anonymous">
{{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" -}}
{{- $hash := "sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" -}}
<script defer src="{{- $url -}}" integrity="{{- $hash -}}" crossorigin="anonymous"></script>
{{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" -}}
{{- $hash := "sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" -}}
<script defer src="{{- $url -}}" integrity="{{- $hash -}}" crossorigin="anonymous"
onload="renderMathInElement(document.body);"></script>
{{- end -}}
{{- end -}}