Using UMAP instead of t-SNE #5

Open
TomLucidor opened this issue Sep 15, 2024 · 2 comments

Comments

@TomLucidor

Currently, Sprout uses t-SNE, which makes it harder to see how niche anime relate to others. UMAP might be better at preserving distances and at revealing clusters. I'm also not sure whether PaCMAP, LargeVis, and Isomap can do the same thing (probably not TriMAP and ForceAtlas2).
P.S. For reference, here are other "MAL maps": https://github.com/igfod13/MALmap https://github.com/platers/MAL-Map

@Ameobea
Owner

Ameobea commented Sep 15, 2024

I did try out UMAP for building the atlas visualization. However, for me the results weren't that good or interpretable compared to t-SNE.

Most of the embeddings I wound up with looked like this one: https://github.com/igfod13/MALmap

A big, dense, homogeneous blob without much structure or interest. Although t-SNE does seem to trade off some accuracy, I found that it produced a much better-looking embedding that was more interesting to browse around in.
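One common way to put a number on the "accuracy" a 2D layout gives up is k-nearest-neighbor preservation: for each title, check how many of its k nearest neighbors in the original high-dimensional embedding are still among its k nearest neighbors in the 2D map. The thread doesn't say whether either author measured this; the sketch below is a brute-force illustration (fine for a few thousand sampled points), and the function names `knn_indices` and `knn_preservation` are hypothetical, not part of Sprout.

```python
import math
from typing import Sequence


def knn_indices(points: Sequence[Sequence[float]], i: int, k: int) -> set[int]:
    """Indices of the k nearest neighbors of points[i] by Euclidean distance, excluding i.

    Ties are broken by index so the result is deterministic.
    """
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return {j for _, j in dists[:k]}


def knn_preservation(
    high: Sequence[Sequence[float]],
    low: Sequence[Sequence[float]],
    k: int = 5,
) -> float:
    """Mean fraction of each point's k nearest neighbors in `high` that are also
    among its k nearest neighbors in `low` (1.0 means fully preserved)."""
    if len(high) != len(low):
        raise ValueError("high and low must describe the same points in the same order")
    n = len(high)
    total = 0.0
    for i in range(n):
        total += len(knn_indices(high, i, k) & knn_indices(low, i, k)) / k
    return total / n


if __name__ == "__main__":
    # Toy data: two tight pairs of points far apart in 3D.
    high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
    faithful = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    scrambled = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (11.0, 0.0)]
    print(knn_preservation(high, faithful, k=1))   # 1.0
    print(knn_preservation(high, scrambled, k=1))  # 0.0
```

Running this on the same sample of titles for a t-SNE layout and a UMAP layout would give one rough, comparable score, though it says nothing about how browsable or interesting each map looks, which is the trade-off described above.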

which makes it harder to see the similarity of niche anime to others

This is true, but idk if it's t-SNE's fault per se. I set up my node sizing code back when I first built the atlas a couple of years ago. Since then, I've collected much more data but the sizing code hasn't been changed. This causes the most popular anime to be even larger on the visualization and drown out others.
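The sizing problem described here, where the most popular titles grow large enough to drown out everything else, is often handled by mapping popularity to size on a log scale instead of a linear one. The thread doesn't show Sprout's actual sizing code, so this is only a sketch under assumptions: `node_radius`, the member-count input, and the `min_r`/`max_r` bounds are all hypothetical.

```python
import math


def node_radius(members: int, max_members: int, min_r: float = 2.0, max_r: float = 20.0) -> float:
    """Hypothetical helper: map a popularity count to a node radius on a log scale.

    log1p compresses the range so a title with a few thousand members is still
    visible next to one with millions; the result is clamped to [min_r, max_r].
    """
    if max_members <= 0:
        return min_r
    t = math.log1p(max(members, 0)) / math.log1p(max_members)
    t = min(max(t, 0.0), 1.0)
    return min_r + (max_r - min_r) * t


if __name__ == "__main__":
    # A niche title with 1,000 members vs. a most-popular title with 3,000,000.
    linear = 2.0 + 18.0 * 1_000 / 3_000_000
    print(f"linear radius: {linear:.2f}, log radius: {node_radius(1_000, 3_000_000):.2f}")
```

On a linear scale the niche node ends up barely above the minimum radius, while the log scale places it near the middle of the range; a square-root scale is a milder alternative if log sizing flattens popularity too much.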

I'll look into the scaling of the vis and the sizing of individual nodes when I deploy the next version of the site. I actually trained a whole new model with data up to a couple of months ago, but I was having some issues with the quality of the model compared to what's currently live.

If you want to try it out, it's here: https://anime-preview.ameo.dev/

I'd be interested to hear your thoughts on recommendations for your own profile compared to live if you had time to try it out.

@TomLucidor
Author

Here is one scenario: a person doesn't have a MAL account, so they pick the first few anime that come to mind (especially niche ones) to see what else gets recommended. Those anime are specifically "good" or "amazing" to them, but they should be able to rate the first algorithmically recommended anime at the top of the list as "meh" or "bad", since the recommendations are often biased toward popular, middle-of-the-road stuff.
I think a partial solution for the recommendation engine, other than just adding a negative weight based on popularity, is probably to distill factors from the anime. Amplifying the differences between genres would help with niche genres and topics. I may be biased toward buzzwords like ICA and sparse matrices, since niche anime often gets lumped in with other niche anime, but clusters can still have general dimensions.
P.S. Have you tried other visualization techniques for finding a sweet spot that emphasizes differences, rather than just lumping the most popular stuff in the center?
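The simpler baseline mentioned in this comment, adding a negative weight based on popularity, could look roughly like the re-ranking sketch below. It assumes each candidate already has a model score and a popularity count; `Candidate`, `rerank_with_popularity_penalty`, and the `alpha` strength parameter are hypothetical names, not anything from Sprout's codebase.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    score: float   # hypothetical: model's predicted affinity, higher is better
    members: int   # hypothetical popularity proxy, e.g. number of list entries


def rerank_with_popularity_penalty(cands: list[Candidate], alpha: float = 0.5) -> list[Candidate]:
    """Sort candidates by score minus alpha times their normalized log-popularity.

    Popularity is normalized to [0, 1] against the most popular candidate, so
    alpha=0 reproduces the original ranking and larger alpha favors niche titles.
    """
    if not cands:
        return []
    max_log = max(math.log1p(c.members) for c in cands) or 1.0  # avoid dividing by zero

    def adjusted(c: Candidate) -> float:
        return c.score - alpha * math.log1p(c.members) / max_log

    return sorted(cands, key=adjusted, reverse=True)


if __name__ == "__main__":
    cands = [Candidate("Popular", 0.90, 2_000_000), Candidate("Niche", 0.80, 5_000)]
    print([c.title for c in rerank_with_popularity_penalty(cands, alpha=0.5)])
```

A fixed penalty like this only shifts the ranking; it doesn't capture the genre structure that the factor-based approach suggested above is aiming for.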
