Using UMAP instead of t-SNE #5
Comments
I did try out UMAP for building the atlas visualization. However, for me the results weren't as good or as interpretable as t-SNE's. Most of the embeddings I wound up with looked like this one: https://github.com/igfod13/MALmap, a big, dense, homogeneous blob without much structure or interest. Although t-SNE does seem to trade off some accuracy, I found that it resulted in a much better-looking embedding that was more interesting to browse around in.
This is true, but idk if it's t-SNE's fault per se. I set up my node sizing code back when I first built the atlas a couple of years ago. Since then, I've collected much more data, but the sizing code hasn't been changed. This causes the most popular anime to be even larger on the visualization and drown out the others. I'll look into the scaling of the vis and the sizing of individual nodes when I deploy the next version of the site. I actually trained a whole new model with data up to a couple of months ago, but I was having some issues with the quality of the model compared to what's currently live. If you want to try it out, it's here: https://anime-preview.ameo.dev/ I'd be interested to hear your thoughts on the recommendations for your own profile compared to live, if you have time to try it out.
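To make the sizing point concrete, here is a minimal sketch of one common way to keep the most popular nodes from drowning out the rest: scale the radius by the logarithm of the popularity count and clamp it to a fixed range. This is not the atlas's actual sizing code; `node_radius`, its parameters, and the default radii are all hypothetical.

```python
import math


def node_radius(rating_count, max_count, min_radius=2.0, max_radius=30.0):
    """Map a popularity count to a node radius on a log scale.

    Hypothetical helper: the names and default radii are assumptions,
    not values taken from the atlas.
    """
    if max_count <= 0:
        return min_radius
    # log1p compresses the long tail, so a title with 100x the ratings
    # is drawn noticeably larger but not 100x larger.
    t = math.log1p(max(rating_count, 0)) / math.log1p(max_count)
    # Clamp in case rating_count exceeds a stale max_count.
    t = min(t, 1.0)
    return min_radius + (max_radius - min_radius) * t
```

With a log scale, recomputing `max_count` as the dataset grows changes the relative sizes only slightly, which might help with the "sizing code written for a smaller dataset" problem described above.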
Here is one: suppose a person doesn't have a MAL account and picks the first few anime that come to mind (especially the niche ones) to see what else would be recommended. Such anime would be specifically "good" or "amazing" to them, but they should be able to rate the first algorithmically recommended anime at the top of the list as "meh" or "bad", since the recommendations often do have a bias toward popular middle-of-the-road stuff.
Currently, Sprout uses t-SNE, which makes it harder to see how similar niche anime are to others. UMAP might be better at preserving distances and at discovering clusters. I'm also not sure whether PaCMAP, LargeVis, and Isomap can do the same thing (probably not TriMap or ForceAtlas2).
P.S. For reference, other "MAL maps": https://github.com/igfod13/MALmap and https://github.com/platers/MAL-Map
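One way to compare how well two layouts (say, a t-SNE one and a UMAP one) keep similar anime close together is to check how many of each point's nearest neighbours in the original embedding space are still its nearest neighbours in the 2D layout. The sketch below uses only the Python standard library and brute-force distances, so it is only practical for small samples. The function names are hypothetical, and the 2D coordinates are assumed to come from whichever t-SNE or UMAP implementation is being evaluated.

```python
import math


def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        # Brute-force Euclidean distances to every other point.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append({j for _, j in dists[:k]})
    return result


def neighbour_preservation(high_dim, low_dim, k=5):
    """Mean fraction of each point's k nearest neighbours in high_dim
    that are also among its k nearest neighbours in low_dim (0.0 to 1.0)."""
    if len(high_dim) != len(low_dim):
        raise ValueError("both layouts must contain the same points")
    if not 0 < k < len(high_dim):
        raise ValueError("k must be between 1 and len(points) - 1")
    hi = knn(high_dim, k)
    lo = knn(low_dim, k)
    shared = sum(len(a & b) for a, b in zip(hi, lo))
    return shared / (k * len(high_dim))
```

Running `neighbour_preservation(embeddings, layout, k)` once per layout on the same sample gives a rough number to compare. A higher score means local neighbourhoods survive the projection better, though it says nothing about how pleasant the map is to browse.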