diff --git a/content/posts/2016-10-12-pi-pan-tilt-3.md b/content/posts/2016-10-12-pi-pan-tilt-3.md
index 04909fa..1fe9d04 100644
--- a/content/posts/2016-10-12-pi-pan-tilt-3.md
+++ b/content/posts/2016-10-12-pi-pan-tilt-3.md
@@ -171,7 +171,7 @@ image.
Here's another comparison, this time a 1:1 crop from the center of
an image (shot at 40mm with [this lens][12-40mm], whose Amazon price
-mysteriously is now $146 instead of the $23
+mysteriously is now $146 instead of the $23
I actually paid). Click the preview for a lossless PNG view, as JPEG
might eat some of the finer details, or [here][leaves-full] for the
full JPEG file (including raw, if you want to look around).
diff --git a/content/posts/2018-04-08-recommender-systems-1/index.md b/content/posts/2018-04-08-recommender-systems-1/index.md
index b5cb42a..97dcb45 100644
--- a/content/posts/2018-04-08-recommender-systems-1/index.md
+++ b/content/posts/2018-04-08-recommender-systems-1/index.md
@@ -85,6 +85,7 @@ Below is just to inspect that data appears to be okay:
ml.info()
{{< / highlight >}}

+{{< rawhtml >}}
RangeIndex: 20000263 entries, 0 to 20000262
@@ -96,7 +97,7 @@ ml.info()
dtypes: datetime64[ns](1), float32(1), int32(2)
memory usage: 381.5 MB
-
+{{< /rawhtml >}}
{{
| user_id | movie_id | rating
|---------|----------|-------
count|2.000026e+07|2.000026e+07|2.000026e+07
@@ -117,7 +117,6 @@ min|1.000000e+00|1.000000e+00|5.000000e-01
50%|6.914100e+04|2.167000e+03|3.500000e+00
75%|1.036370e+05|4.770000e+03|4.000000e+00
max|1.384930e+05|1.312620e+05|5.000000e+00
-
@@ -131,7 +130,6 @@ ml[:10]
-
| user_id | movie_id | rating | time
|--------|---------|-------|-----
0|1|2|3.5|2005-04-02 23:53:47
@@ -144,7 +142,6 @@ ml[:10]
7|1|223|4.0|2005-04-02 23:46:13
8|1|253|4.0|2005-04-02 23:35:40
9|1|260|4.0|2005-04-02 23:33:46
-
@@ -159,12 +156,13 @@ max_user, max_movie, max_user * max_movie
+{{< rawhtml >}}
(138494, 131263, 18179137922)
-
+{{< /rawhtml >}}
Computing what percent we have of all 'possible' ratings (i.e. every single movie & every single user), this data is rather sparse:
@@ -174,12 +172,13 @@ Computing what percent we have of all 'possible' ratings (i.e. every single movi
print("%.2f%%" % (100 * ml.shape[0] / (max_user * max_movie)))
{{< / highlight >}}
+{{< rawhtml >}}
0.11%
-
+{{< /rawhtml >}}
## 3.1. Aggregation
@@ -214,7 +213,6 @@ movie_stats.sort_values("num_ratings", ascending=False)[:25]
-
| movie_title | num_ratings | avg_rating | movie_id | | |
|------------|------------|-----------|---------|-|-|-
296|Pulp Fiction (1994)|67310.0|4.174231
@@ -242,7 +240,6 @@ movie_stats.sort_values("num_ratings", ascending=False)[:25]
608|Fargo (1996)|43272.0|4.112359
47|Seven (a.k.a. Se7en) (1995)|43249.0|4.053493
380|True Lies (1994)|43159.0|3.491149
-
@@ -267,9 +264,9 @@ examples of this, check out section 11.3.2 in [MMDS](http://www.mmds.org/).)
In a utility matrix, each row represents one user, each column represents
one item (a movie, in our case), and each element represents a user's
-rating of an item. If we have $n$ users and $m$ movies, then this is a
-$n \times m$ matrix $U$ for which $U_{k,i}$ is user $k$'s rating for
-movie $i$ - assuming we've numbered our users and our movies.
+rating of an item. If we have \\(n\\) users and \\(m\\) movies, then this is an
+\\(n \times m\\) matrix \\(U\\) for which \\(U_{k,i}\\) is user \\(k\\)'s rating for
+movie \\(i\\) - assuming we've numbered our users and our movies.
Users have typically rated only a fraction of movies, and so most of
the elements of this matrix are unknown. Algorithms represent this
@@ -315,13 +312,14 @@ ml_mat_train
+{{< rawhtml >}}
<138494x131263 sparse matrix of type '<class 'numpy.float32'>'
with 15000197 stored elements in Compressed Sparse Column format>
-
+{{< /rawhtml >}}
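The compressed sparse format above only stores the ratings that actually exist. As a minimal sketch of how such a utility matrix can be assembled with scipy (toy data; the variable names here are illustrative, not the post's exact code):

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy stand-ins for the user_id / movie_id / rating columns used above.
user_ids = np.array([1, 1, 2, 3])
movie_ids = np.array([2, 29, 2, 47])
ratings = np.array([3.5, 3.5, 4.0, 5.0], dtype=np.float32)

# Rows are users, columns are movies; unrated cells are simply not stored.
n_users, n_movies = user_ids.max() + 1, movie_ids.max() + 1
mat = csc_matrix((ratings, (user_ids, movie_ids)), shape=(n_users, n_movies))

print(mat.shape)  # matrix dimensions
print(mat.nnz)    # number of stored (known) ratings
print(mat[1, 2])  # user 1's rating of movie 2
```

Only the `nnz` known entries take memory, which is what makes the 138,494 x 131,263 matrix above feasible at all.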
To demonstrate that the matrix and dataframe have the same data:
@@ -335,7 +333,6 @@ ml_train[:10]
-
| user_id | movie_id | rating | time
|--------|---------|-------|-----
13746918|94976|7371|4.5|2009-11-04 05:51:26
@@ -348,7 +345,6 @@ ml_train[:10]
15311014|105846|4226|4.5|2004-07-30 18:12:26
8514776|58812|1285|4.0|2000-04-24 20:39:46
3802643|25919|3275|2.5|2010-06-18 00:48:40
-
@@ -361,12 +357,13 @@ list(ml_train.iloc[:10].rating)
+{{< rawhtml >}}
[4.5, 3.0, 3.0, 4.5, 4.0, 2.5, 5.0, 4.5, 4.0, 2.5]
-
+{{< /rawhtml >}}
@@ -379,12 +376,13 @@ movie_ids = list(ml_train.iloc[:10].movie_id)
+{{< rawhtml >}}
[4.5, 3.0, 3.0, 4.5, 4.0, 2.5, 5.0, 4.5, 4.0, 2.5]
-
+{{< /rawhtml >}}
Okay, enough of that; we can begin with some actual predictions.
@@ -457,7 +455,6 @@ names.merge(ml_train[ml_train.user_id == target_user], right_on="movie_id", left
-
| movie_title | user_id | movie_id | rating | time
|------------|--------|---------|-------|-----
4229884|Jumanji (1995)|28812|2|5.0|1996-09-23 02:08:39
@@ -471,7 +468,6 @@ names.merge(ml_train[ml_train.user_id == target_user], right_on="movie_id", left
4229957|Independence Day (a.k.a. ID4) (1996)|28812|780|5.0|1996-09-23 02:09:02
4229959|Phenomenon (1996)|28812|802|5.0|1996-09-23 02:09:02
4229960|Die Hard (1988)|28812|1036|5.0|1996-09-23 02:09:02
-
@@ -488,12 +484,13 @@ names[names.index == target_movie]
+{{< rawhtml >}}
| movie_title | movie_id |
|------------|---------|-
586|Home Alone (1990)
-
+{{< /rawhtml >}}
@@ -513,7 +510,6 @@ users_df
-
| movie_id_x | user_id | rating_x | rating_y
|-----------|--------|---------|---------
0|329|17593|3.0|4.0
@@ -527,8 +523,6 @@ users_df
522688|2|126271|3.0|4.0
522689|595|82760|2.0|4.0
522690|595|18306|4.5|5.0
-
-
@@ -544,7 +538,6 @@ users_df
-
| movie_id_x | user_id | rating_x | rating_y | rating_dev
|-----------|--------|---------|---------|-----------
0|329|17593|3.0|4.0|1.0
@@ -558,7 +551,6 @@ users_df
522688|2|126271|3.0|4.0|1.0
522689|595|82760|2.0|4.0|2.0
522690|595|18306|4.5|5.0|0.5
-
@@ -574,9 +566,6 @@ names.join(rating_dev, how="inner").sort_values("rating_dev")
-
-
-
| movie_title | rating_dev
|------------|-----------
318|Shawshank Redemption, The (1994)|-1.391784
@@ -600,8 +589,6 @@ names.join(rating_dev, how="inner").sort_values("rating_dev")
173|Judge Dredd (1995)|0.518570
19|Ace Ventura: When Nature Calls (1995)|0.530155
160|Congo (1995)|0.559034
-
-
@@ -620,8 +607,6 @@ df.join(names, on="movie_id").sort_values("movie_title")
-
-
| user_id | movie_id | rating | rating_adj | movie_title
|--------|---------|-------|-----------|------------
4229920|28812|344|3.0|3.141987|Ace Ventura: Pet Detective (1994)
@@ -645,7 +630,6 @@ df.join(names, on="movie_id").sort_values("movie_title")
4229892|28812|50|3.0|1.683520|Usual Suspects, The (1995)
4229903|28812|208|3.0|3.250881|Waterworld (1995)
4229919|28812|339|4.0|3.727966|While You Were Sleeping (1995)
-
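The rating_adj values above come from shifting the target user's own ratings by the average deviation other users show between each movie and the target movie. A toy sketch of that deviation idea, with made-up numbers (the column names are assumptions, not the post's exact code):

```python
import pandas as pd

# Each row: a movie the target user rated, some other user's rating of that
# movie, and that same other user's rating of the target movie.
pairs = pd.DataFrame({
    "movie_id": [329, 329, 2, 595],
    "rating_other": [3.0, 4.0, 3.0, 2.0],   # their rating on this movie
    "rating_target": [4.0, 4.5, 4.0, 4.0],  # their rating on the target movie
})

# Per movie: how much higher, on average, people rate the target movie.
dev = (pairs["rating_target"] - pairs["rating_other"]).groupby(pairs["movie_id"]).mean()

# Shift the target user's own ratings by those deviations and average.
my_ratings = pd.Series({329: 3.0, 2: 3.5, 595: 2.0})
prediction = (my_ratings + dev).mean()
print(round(prediction, 3))
```

pandas aligns `my_ratings + dev` on the movie-id index, so each of the user's ratings gets its own movie's deviation before averaging.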
@@ -660,12 +644,13 @@ df["rating_adj"].mean()
+{{< rawhtml >}}
4.087520122528076
-
+{{< /rawhtml >}}
As mentioned above, we also happen to have the user's actual rating on *Home Alone* in the test set (i.e. we didn't train on it), so we can compare here:
@@ -678,12 +663,13 @@ ml_test[(ml_test.user_id == target_user) & (ml_test.movie_id == target_movie)]["
+{{< rawhtml >}}
4.0
-
+{{< /rawhtml >}}
That's quite close - though that may just be luck. It's hard to say from one point.
@@ -702,7 +688,6 @@ names.join(num_ratings, how="inner").sort_values("num_ratings")
-
| movie_title | num_ratings
|------------|------------
802|Phenomenon (1996)|3147
@@ -726,7 +711,6 @@ names.join(num_ratings, how="inner").sort_values("num_ratings")
593|Silence of the Lambs, The (1991)|12120
480|Jurassic Park (1993)|13546
356|Forrest Gump (1994)|13847
-
@@ -757,7 +741,6 @@ df
-
| user_id | movie_id | rating | rating_adj | num_ratings | rating_weighted
|--------|---------|-------|-----------|------------|----------------
4229918|28812|329|4.0|3.767164|6365|23978.000326
@@ -781,7 +764,6 @@ df
4229912|28812|296|4.0|2.883755|11893|34296.500678
4229884|28812|2|5.0|4.954595|7422|36773.001211
4229953|28812|595|4.0|3.515051|9036|31761.999825
-
@@ -794,12 +776,13 @@ df["rating_weighted"].sum() / df["num_ratings"].sum()
+{{< rawhtml >}}
4.02968199025023
-
+{{< /rawhtml >}}
It changes the answer, but only very slightly.
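The weighted version is just sum(rating_adj × num_ratings) / sum(num_ratings), so heavily-rated movies pull the prediction harder. A small sketch with invented numbers:

```python
import pandas as pd

# Hypothetical miniature version of the df used above.
df = pd.DataFrame({
    "rating_adj": [3.8, 4.9, 3.5],
    "num_ratings": [6365, 7422, 9036],
})

# Weight each adjusted rating by how many ratings that movie has.
weighted = (df["rating_adj"] * df["num_ratings"]).sum() / df["num_ratings"].sum()
print(round(weighted, 4))
```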
@@ -818,8 +801,9 @@ eyes glaze over, you can probably just skip this section.
### 5.2.1. Short Answer
-Let $U$ be the utility matrix. Let $M$ be a binary matrix for which $M_{i,j}=1$ if user $i$ rated movie $j$, otherwise 0. Compute the model's matrices with:
+Let \\(U\\) be the utility matrix. Let \\(M\\) be a binary matrix for which \\(M_{i,j}=1\\) if user \\(i\\) rated movie \\(j\\), otherwise 0. Compute the model's matrices with:
+{{< rawhtml >}}
(4.0875210502743862, 4.0875210502743862)
-
+{{< /rawhtml >}}
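With \\(U\\) and \\(M\\) defined as above, per-user quantities reduce to elementwise sums: dividing \\(U\\)'s row sums by \\(M\\)'s row sums gives each user's mean over only the movies they actually rated. A toy sketch of that masked-mean idea (illustrative only, not the post's elided formula):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy utility matrix U (0 = unrated) and its binary mask M.
U = csr_matrix(np.array([[4.0, 0.0, 3.5],
                         [0.0, 2.0, 0.0]]))
M = (U != 0).astype(np.float64)

# Each user's mean over only the movies they actually rated:
# sum of their known ratings divided by how many ratings they made.
means = np.asarray(U.sum(axis=1)).ravel() / np.asarray(M.sum(axis=1)).ravel()
print(means)
```

Dividing by `M`'s row sums (rather than the number of columns) is what keeps the unrated zeros from dragging the means down.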
This computes training error on a small part (1%) of the data, since doing it over the entire thing would be horrendously slow:
@@ -1102,13 +1118,14 @@ print("Training error: MAE={:.3f}, RMSE={:.3f}".format(err_mae_train, err_rms_t
print("Testing error: MAE={:.3f}, RMSE={:.3f}".format(err_mae_test, err_rms_test))
{{< / highlight >}}
+{{< rawhtml >}}
Training error: MAE=0.640, RMSE=0.834
Testing error: MAE=0.657, RMSE=0.856
-
+{{< /rawhtml >}}
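MAE is the mean of the absolute errors, and RMSE is the square root of the mean squared error; a quick sketch of both metrics on invented numbers:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: the average size of a miss.
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root-mean-square error: penalizes large misses more heavily.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([4.0, 3.5, 5.0, 2.0])
y_pred = np.array([3.6, 3.5, 4.2, 2.8])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the residuals, it is always at least as large as MAE, which matches the error numbers reported above.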
# 6. "SVD" algorithm
@@ -1126,33 +1143,39 @@ References on this model are in a few different places:
## 6.2. Motivation
-We again start from the $n \times m$ utility matrix $U$. As $m$ and $n$ tend to be quite large, $U$ has a lot of degrees of freedom. If we want to be able to predict anything at all, we must assume some fairly strict constraints - and one form of this is assuming that we don't *really* have that many degrees of freedom, and that there are actually some much smaller latent factors controlling everything.
+We again start from the \\(n \times m\\) utility matrix \\(U\\). As \\(m\\) and \\(n\\) tend to be quite large, \\(U\\) has a lot of degrees of freedom. If we want to be able to predict anything at all, we must assume some fairly strict constraints - and one form of this is assuming that we don't *really* have that many degrees of freedom, and that there are actually some much smaller latent factors controlling everything.
-One common form of this is assuming that the rank of matrix $U$ - its *actual* dimensionality - is much lower. Let's say its rank is $r$. We could then represent $U$ as the matrix product of smaller matrices, i.e. $U=P^\top Q$ where $P$ is a $r \times n$ matrix and $Q$ is $r \times m$.
+One common form of this is assuming that the rank of matrix \\(U\\) - its *actual* dimensionality - is much lower. Let's say its rank is \\(r\\). We could then represent \\(U\\) as the matrix product of smaller matrices, i.e. \\(U=P^\top Q\\) where \\(P\\) is an \\(r \times n\\) matrix and \\(Q\\) is \\(r \times m\\).
-If we can find dense matrices $P$ and $Q$ such that $P^\top Q$ equals, or approximately equals, $U$ for the corresponding elements of $U$ that are known, then $P^\top Q$ also gives us predictions for the unknown elements of $U$ - the ratings we don't know, but want to predict. Of course, $r$ must be small enough here to prevent overfitting.
+If we can find dense matrices \\(P\\) and \\(Q\\) such that \\(P^\top Q\\) equals, or approximately equals, \\(U\\) for the corresponding elements of \\(U\\) that are known, then \\(P^\top Q\\) also gives us predictions for the unknown elements of \\(U\\) - the ratings we don't know, but want to predict. Of course, \\(r\\) must be small enough here to prevent overfitting.
(What we're talking about above is [matrix completion](https://en.wikipedia.org/wiki/Matrix_completion) using low-rank [matrix decomposition/factorization](https://en.wikipedia.org/wiki/Matrix_decomposition). These are both subjects unto themselves. See the [matrix-completion-whirlwind](https://github.com/asberk/matrix-completion-whirlwind/blob/master/matrix_completion_master.ipynb) notebook for a much better explanation on that subject, and an implementation of [altMinSense/altMinComplete](https://arxiv.org/pdf/1212.0467).)
-Ordinarily, we'd use something like SVD directly if we wanted to find matrices $P$ and $Q$ (or if we wanted to do any of about 15,000 other things, since SVD is basically magical matrix fairy dust). We can't really do that here due to the fact that large parts of $U$ are unknown, and in some cases because $U$ is just too large. One approach for working around this is the UV-decomposition algorithm that section 9.4 of [MMDS](http://www.mmds.org/) describes.
+Ordinarily, we'd use something like SVD directly if we wanted to find matrices \\(P\\) and \\(Q\\) (or if we wanted to do any of about 15,000 other things, since SVD is basically magical matrix fairy dust). We can't really do that here due to the fact that large parts of \\(U\\) are unknown, and in some cases because \\(U\\) is just too large. One approach for working around this is the UV-decomposition algorithm that section 9.4 of [MMDS](http://www.mmds.org/) describes.
What we'll do below is a similar approach to UV decomposition that follows a common method: define a model, define an error function we want to minimize, find that error function's gradient with respect to the model's parameters, and then use gradient-descent to minimize that error function by nudging the parameters in the direction that decreases the error, i.e. the negative of their gradient. (More on this later.)
-Matrices $Q$ and $P$ have some other neat properties too. Note that $Q$ has $m$ columns, each one $r$-dimensional - one column per movie. $P$ has $n$ columns, each one $r$-dimensional - one column per user. In effect, we can look at each column $i$ of $Q$ as the coordinates of movie $i$ in "concept space" or "feature space" - a new $r$-dimensional space where each axis corresponds to something that seems to explain ratings. Likewise, we can look at each column $u$ of $P$ as how much user $u$ "belongs" to each axis in concept space. "Feature vectors" is a common term to see.
+Matrices \\(Q\\) and \\(P\\) have some other neat properties too. Note that \\(Q\\) has \\(m\\) columns, each one \\(r\\)-dimensional - one column per movie. \\(P\\) has \\(n\\) columns, each one \\(r\\)-dimensional - one column per user. In effect, we can look at each column \\(i\\) of \\(Q\\) as the coordinates of movie \\(i\\) in "concept space" or "feature space" - a new \\(r\\)-dimensional space where each axis corresponds to something that seems to explain ratings. Likewise, we can look at each column \\(u\\) of \\(P\\) as how much user \\(u\\) "belongs" to each axis in concept space. "Feature vectors" is a common term to see.
-In that sense, $P$ and $Q$ give us a model in which ratings are an interaction between properties of a movie, and a user's preferences. If we're using $U=P^\top Q$ as our model, then every element of $U$ is just the dot product of the feature vectors of the respective movie and user. That is, if $p_u$ is column $u$ of $P$ and $q_i$ is column $i$ of $Q$:
+In that sense, \\(P\\) and \\(Q\\) give us a model in which ratings are an interaction between properties of a movie, and a user's preferences. If we're using \\(U=P^\top Q\\) as our model, then every element of \\(U\\) is just the dot product of the feature vectors of the respective movie and user. That is, if \\(p_u\\) is column \\(u\\) of \\(P\\) and \\(q_i\\) is column \\(i\\) of \\(Q\\):
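That dot-product model, trained by nudging the factor columns against the error gradient as described, can be sketched generically like this (an illustration of the idea only, not the implementation the post builds below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, r = 50, 40, 4

# P: r x n user factors; Q: r x m movie factors, as in U ≈ Pᵀ Q.
P = rng.normal(scale=0.1, size=(r, n_users))
Q = rng.normal(scale=0.1, size=(r, n_movies))

def predict(u, i):
    # A rating estimate is the dot product of user u's and movie i's columns.
    return P[:, u] @ Q[:, i]

# SGD on squared error over the known (user, movie, rating) triples.
known = [(0, 1, 4.0), (0, 3, 3.5), (2, 1, 2.0), (5, 7, 5.0)]
lr = 0.05
for _ in range(500):
    for u, i, rating in known:
        err = rating - predict(u, i)
        # Nudge both factor columns opposite the error gradient.
        P[:, u] += lr * err * Q[:, i]
        Q[:, i] += lr * err * P[:, u]

print(predict(0, 1))  # should approach the known rating 4.0
```

The fitted factors then also yield predictions for (user, movie) pairs that never appeared in `known` - exactly the matrix-completion behaviour described above.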
+{{< rawhtml >}}
6982/s 8928/s 10378/s 12877/s 15290/s 11574/s 13230/s
@@ -1396,7 +1432,7 @@ svd40.train(movies_train, users_train, ratings_train, epoch_callback=at_epoch)
Epoch 20/20; Training: MAE=0.549 RMSE=0.717, Testing: MAE=0.600 RMSE=0.787
-
+{{< /rawhtml >}}
{{
48199/s 33520/s 16937/s 13842/s 13607/s 15574/s 15431/s
@@ -1460,7 +1497,7 @@ svd4.train(ml_train["movie_id"].values, ml_train["user_id"].values, ml_train["ra
Epoch 20/20; Training: MAE=0.599 RMSE=0.783, Testing: MAE=0.618 RMSE=0.809
-
+{{< /rawhtml >}}
To limit the data, we can use just the top movies (by number of ratings):
@@ -1567,7 +1604,6 @@ latent_factor_grid(svd4.q[:2,:])
-
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
|--|--|--|--|--|--|--|--|--|--|---|---|---|---|---|---
0||||||||||||||||
@@ -1586,7 +1622,6 @@ latent_factor_grid(svd4.q[:2,:])
13||||||||Sound of Music; Spy Kids 2: The Island of Lost...|Bring It On; Legally Blonde|Fly Away Home; Parent Trap|Sense and Sensibility; Sex and the City|||||
14|||||||Babe; Babe: Pig in the City||||Twilight|||||
15||||||||||||||||
-
@@ -1604,7 +1639,6 @@ latent_factor_grid(svd4.q[2:,:])
-
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
|--|--|--|--|--|--|--|--|--|--|---|---|---|---|---|---
0||||||||||||||||
@@ -1623,7 +1657,6 @@ latent_factor_grid(svd4.q[2:,:])
13||||||Nightmare on Elm Street 4: The Dream Master; F...|Wes Craven's New Nightmare (Nightmare on Elm S...|Friday the 13th; Exorcist III|Candyman; Texas Chainsaw Massacre 2|Mars Attacks!; Halloween|Evil Dead II (Dead by Dawn); Re-Animator|Night of the Living Dead; Dead Alive (Braindead)||Eraserhead||
14|||||||Nightmare on Elm Street 3: Dream Warriors; Fre...|Hellbound: Hellraiser II|Nightmare on Elm Street|||||||
15|||||||Bride of Chucky (Child's Play 4)||||Texas Chainsaw Massacre|||||
-
@@ -1643,9 +1676,6 @@ bias.iloc[:10]
-
-
-
| movie_title | num_ratings | avg_rating | bias | movie_id | | | |
|------------|------------|-----------|-----|---------|-|-|-|-
318|Shawshank Redemption, The (1994)|63366.0|4.446990|1.015911
@@ -1658,7 +1688,6 @@ bias.iloc[:10]
50|Usual Suspects, The (1995)|47006.0|4.334372|0.910651
102217|Bill Hicks: Revelations (1993)|50.0|3.990000|0.900622
527|Schindler's List (1993)|50054.0|4.310175|0.898633
-
@@ -1672,7 +1701,6 @@ bias.iloc[:-10:-1]
-
| movie_title | num_ratings | avg_rating | bias | movie_id | | | |
|------------|------------|-----------|-----|---------|-|-|-|-
8859|SuperBabies: Baby Geniuses 2 (2004)|209.0|0.837321|-2.377202
@@ -1684,7 +1712,6 @@ bias.iloc[:-10:-1]
4775|Glitter (2001)|685.0|1.124088|-2.047287
31698|Son of the Mask (2005)|467.0|1.252677|-2.022763
5739|Faces of Death 6 (1996)|174.0|1.261494|-2.004086
-
@@ -1732,7 +1759,6 @@ pd.DataFrame.from_records(
-
| Library | Algorithm | MAE (test) | RMSE (test)
|--------|----------|-----------|------------
0||Slope One|0.656514|0.856294
@@ -1740,7 +1766,6 @@ pd.DataFrame.from_records(
2|Surprise|Random|1.144775|1.433753
3|Surprise|Slope One|0.704730|0.923331
4|Surprise|SVD|0.694890|0.900350
-
diff --git a/layouts/partials/math.html b/layouts/partials/math.html
new file mode 100644
index 0000000..ba9eacc
--- /dev/null
+++ b/layouts/partials/math.html
@@ -0,0 +1,35 @@
+
+{{- if or (eq site.Params.math.enable true) (eq .Params.math true) -}}
+  {{- $use := "katex" -}}
+
+  {{- with site.Params.math -}}
+    {{- if and (isset . "use") (eq (.use | lower) "mathjax") -}}
+      {{- $use = "mathjax" -}}
+    {{- end -}}
+  {{- end -}}
+
+  {{- if eq $use "mathjax" -}}
+    {{- $url := "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -}}
+    {{- $hash := "sha384-e/4/LvThKH1gwzXhdbY2AsjR3rm7LHWyhIG5C0jiRfn8AN2eTN5ILeztWw0H9jmN" -}}
+    <script src="{{ $url }}" integrity="{{ $hash }}" crossorigin="anonymous"></script>
+
+  {{- else -}}
+    {{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" -}}
+    {{- $hash := "sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" -}}
+    <link rel="stylesheet" href="{{ $url }}" integrity="{{ $hash }}" crossorigin="anonymous">
+
+    {{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" -}}
+    {{- $hash := "sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" -}}
+    <script defer src="{{ $url }}" integrity="{{ $hash }}" crossorigin="anonymous"></script>
+
+    {{- $url := "https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" -}}
+    {{- $hash := "sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" -}}
+    <script defer src="{{ $url }}" integrity="{{ $hash }}" crossorigin="anonymous" onload="renderMathInElement(document.body);"></script>
+  {{- end -}}
+{{- end -}}