
Temporarily use JupyterLite for high-traffic links to mybinder.org #513

Closed
minrk opened this issue Dec 20, 2021 · 26 comments · Fixed by #682

@minrk
Member

minrk commented Dec 20, 2021

The first couple of links on https://jupyter.org/try drive a huge amount of the overall traffic on mybinder.org. We're currently dealing with a bit of a funding crisis, so we might try shifting the highest traffic links to JupyterLite demos, instead, to reduce the gap funding we need to cover while we figure this out.

Maybe someday JupyterLite will be in a place where this is the right thing to do in the long term, but probably now wouldn't be the time if we don't need it. This isn't ideal, but the alternative would be for the links to just stop working, which seems worse. Plus, an exciting demo for Jupyter Lite!

@bollwyvl

I actually have some days off coming up, so if we have to spike this, I can support it. I think everything's working basically as well as it ever has at this point, even though we're still in alpha, nominally.

It's really just a matter of what we want:

  • content
  • extra experiences (e.g. tour)
  • build chain (e.g. straight lite, sphinx)
  • host (e.g. RTD, gh-pages... I've had some success with gitlab)

If we need some more bespoke labextensions, that's also very doable.

At present, there is no google analytics or anything: probably doable.

@choldgraf
Collaborator

@bollwyvl what do you think is the biggest bottleneck that we would need to worry about, if there were a large spike in people clicking a link that served a JupyterLite?

@SylvainCorlay
Member

I will see if we can help with that.

@bollwyvl

> biggest bottleneck

There aren't any, really. Once deployed, provided the host (whatever it is) has a CDN in front of it, it can be entirely self-sufficient and cached (just GET requests), including mathjax, pyodide, and whatever PyPI wheels we might want to pre-load, with the fallback to "big pypi" (which is also behind a CDN).

@choldgraf
Collaborator

I am definitely in favor of this, provided that the JupyterLite team is confident that this wouldn't send a lot of negative attention their way once a lot more people were clicking those links. For generic and straightforward Jupyter experiences, like the ones that try.jupyter.org is meant to serve, I think JupyterLite is a much better solution than a full-blown image setup like mybinder.org. It would save cloud spend for the Binder team, and I think it would provide a nice, more lightweight solution to complement repo2docker.

@bollwyvl

> negative attention

Yeah, there are a number of known issues that are going to be surprising to people, but yeah, in this case, it's a demo.

Some gotchas:

  • all over
  • in lab
    • you really can't have more than one or two kernels running
  • in retro
    • the urls won't be exactly like one remembers
  • in pyodide
    • a fair amount of stdlib is missing (what the hell is a process?)
    • no requests.get (gotta use js.fetch)
    • gotta await piplite.install everything before using it
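The last two Pyodide gotchas can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper names `ensure_installed` and `fetch_text` are made up for the example, and it assumes a JupyterLite Pyodide kernel where `piplite` and the `js` module are importable. Outside the browser it falls back to the standard library so the same file still runs.

```python
import asyncio
import sys

# Pyodide reports its platform as "emscripten"; anywhere else we assume regular CPython.
IN_PYODIDE = sys.platform == "emscripten"


async def ensure_installed(*packages: str) -> None:
    """Install packages in the browser before importing them; a no-op outside Pyodide."""
    if not IN_PYODIDE:
        return
    import piplite  # only available inside a JupyterLite Pyodide kernel

    for name in packages:
        await piplite.install(name)


async def fetch_text(url: str) -> str:
    """Fetch a URL as text, using the browser's fetch() when running in Pyodide."""
    if IN_PYODIDE:
        from js import fetch  # the browser's fetch, proxied into Python by Pyodide

        response = await fetch(url)
        return await response.text()

    # Outside the browser, fall back to the standard library in a worker thread.
    import urllib.request

    def _blocking_get() -> str:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")

    return await asyncio.get_running_loop().run_in_executor(None, _blocking_get)
```

In a notebook cell, where top-level `await` is available, this would be used as `await ensure_installed("some-package")` followed by `text = await fetch_text("some-url")`; both arguments are placeholders.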

@choldgraf
Collaborator

choldgraf commented Dec 20, 2021

Just to put some back-of-the-envelope numbers to this, I grabbed the latest 30 days of activity on mybinder.org from @betatim's binderlytics analyzer.

The top 20 repositories make up about 75% of Binder traffic. If we split those repositories into ones that are served by try.jupyter.org and all the rest, we get something like this:

[image: launches from try.jupyter.org repositories vs. all other top-20 repositories]

About 600,000 launches are from try.jupyter.org, and about 120,000 are from other repositories. This means that if we were serving try.jupyter.org via JupyterLite, it would reduce mybinder.org's launch load by more than 50%, which would save a lot of resources.

That said, from a CDN standpoint, could JupyterLite be realistically served to users 600,000 / 30 = 20,000 times a day? I don't know as much about the size/complexity/etc, just making sure that we wouldn't hit an unexpected order-of-magnitude problem
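The arithmetic behind the "more than 50%" estimate can be spelled out in a few lines. This is only a sketch using the rounded figures quoted above, and it assumes the 600,000 and 120,000 counts together cover the whole top 20.

```python
# Figures quoted in the comment above (30 days of mybinder.org launches).
TRY_LAUNCHES = 600_000      # launches coming from try.jupyter.org links
OTHER_LAUNCHES = 120_000    # launches from the rest of the top 20 repositories
TOP20_SHARE = 0.75          # top 20 repositories as a share of all Binder traffic
DAYS = 30

share_of_top20 = TRY_LAUNCHES / (TRY_LAUNCHES + OTHER_LAUNCHES)
share_of_all = share_of_top20 * TOP20_SHARE
per_day = TRY_LAUNCHES / DAYS

print(f"share of top-20 launches: {share_of_top20:.1%}")
print(f"share of all launches:    {share_of_all:.1%}")
print(f"launches per day:         {per_day:,.0f}")
```

Under that assumption, try.jupyter.org accounts for roughly 83% of top-20 launches and about 62% of all launches, which is consistent with the "more than 50%" figure.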

@bollwyvl

From a cold start, the jupyterlite launcher page is (gzipped) 3.1mb vs 1.8mb for the jupyter.org home page, all fronted by cloudflare. At that point, it's already got a lot of features like markdown rendering, but doesn't yet have mathjax or any kernels.

By the time the baseline pyodide notebook loads with a pyolite kernel, it's up to 31.5mb, with almost all of that coming from pyodide on jsdelivr... but at that point has already loaded matplotlib, numpy and a lot of other stuff. Meanwhile, the /widgets page is 14.9mb.

So a "bounced" lite load is ~2x home page loads, "full" interactive lite load is ~10x home page loads, and ~2x widget page loads, which seems like it's not going to break anything too badly.
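As a quick check on those multipliers, the ratios can be recomputed from the sizes quoted above. The worst-case daily transfer line is an added assumption, not a figure from the thread: it supposes that every one of the roughly 20,000 daily visits estimated earlier loads the full notebook with no cache hits.

```python
# Transfer sizes in MB, as quoted in the comment above.
HOME_PAGE = 1.8        # jupyter.org home page
LITE_LAUNCHER = 3.1    # JupyterLite launcher, cold start, no kernel yet
LITE_NOTEBOOK = 31.5   # notebook with the pyolite kernel loaded
WIDGETS_PAGE = 14.9    # the /widgets page

ratios = {
    "launcher vs home page": LITE_LAUNCHER / HOME_PAGE,
    "notebook vs home page": LITE_NOTEBOOK / HOME_PAGE,
    "notebook vs widgets page": LITE_NOTEBOOK / WIDGETS_PAGE,
}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x")

# Assumption: ~20,000 visits a day, each loading the full notebook, nothing cached.
DAILY_LOADS = 20_000
worst_case_gb_per_day = DAILY_LOADS * LITE_NOTEBOOK / 1000
print(f"worst-case transfer: {worst_case_gb_per_day:.0f} GB/day")
```

With these rounded inputs the full notebook load comes out nearer 17x the home page than 10x, which is still the same order of magnitude as the estimate above.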

I've also been toying with shifting some of the heavy to p2p tech... we could likely fit the relevant parts of lab, retro, and the demo wheels within the free tier of many of the IPFS pinning services, and then cloudflare would also host that.

@choldgraf
Collaborator

Just read your response and want to note how cool jupyterlite is lol

@bollwyvl

Note, of course, that of try.jupyter.org's hero badges, we'll only be able to support lab and python directly, and retro as a stand-in for classic.

There are a few other kernels ready to go (p5, lua, wren), a few on deck (robotframework) but r, julia, kotlin, scheme, ruby, voila, are not happening by next week.

@choldgraf
Collaborator

choldgraf commented Dec 20, 2021

I don't think that'd be a huge issue for now. By far the most common repositories are:

  • ipython in depth, which has no non-Python code and is the most heavily used
  • the jupyterlab demo, which does have non-Python stuff (it is kind of a "jack of all trades" repository). But in my opinion, we'd be fine collapsing that to just Python for this particular try.jupyter.org link, and adding another repository / demo that included the non-Python stuff

@choldgraf
Collaborator

choldgraf commented Dec 21, 2021

Proposal

Since we are not sure exactly what will happen once we open the floodgates of people clicking Jupyter Lite links instead of mybinder.org links, what if we did the following:

  • Created a JupyterLite demo (maybe the one that's currently at the Jupyter Lite docs?) that was a lightweight introduction to Jupyter Lab. It would also discuss what Jupyter Lite is, and explain that it was a lightweight Jupyter Lab environment.
  • Bumped the current Jupyter Lab demo to the second row so that it has a bit less visibility.

This would slightly reduce the load on Binder, and might be a way for us to identify any issues that popped up in Jupyter Lite as a result. All the old Binder links would still be there, just wouldn't be the first two options on the page. Then after a few weeks if all is well, we could swap the other link (ipython in depth) or do the same "bump to below" approach described above.

Reasoning: Those try.jupyter.org buttons are for people who are learning about Jupyter for the first time. They don't need to be "in-depth" style tutorials - those are better designed for somebody that already knows about Jupyter and has the time/energy to dive in further, so it's OK if it takes more searching for them to find those tutorials. Better to keep the "above the fold" buttons to go towards quick, lightweight, "I have 30 seconds of attention span" experiences, that then direct users to another space if they want to learn further (like a full-blown Binder tutorial).

@choldgraf
Collaborator

choldgraf commented Dec 21, 2021

I made a short PR to JupyterLite to add what I think are some minimal "getting started" notebooks to the JupyterLite documentation, so that we can feel more comfortable sending first-time users there if we want to use try.jupyter.org for this: jupyterlite/jupyterlite#432

If folks are generally OK with this plan, how about we try adding the following link as the second button on try.jupyter.org, and bumping the current JupyterLab button below the fold.

If this works out, we can do a similar thing with the IPython link as well:

@bollwyvl

Mentioned over on the PR: while I'll stand behind lite as a way to build an ownable site, the docs for lite itself might have a different focus (for builders, etc.) than what try might be trying to do.

Also, basically all the cache goes right out the door when you go to different domains.

So basically, I'm imagining whatever repo is driving try would actually have a lite site on it which was actively tested before being deployed. This would allow for tuning the selection of extensions, messaging, analytics, etc.

@choldgraf
Collaborator

Totally agreed that's a better path forward. This was mostly an attempt at mitigating the immediate fire for binder, but we have a little bit of breathing room there now anyway. I think we should take whatever path you recommend 👍

@bollwyvl

Yeah, can dig.

So is this just a GitHub Pages site with Jekyll? Can we put python stuff in the build? Or would it make more sense to have a lite branch and submodule that into gh-pages? Does that even work?

@choldgraf
Collaborator

Yep just a plain old Jekyll site, though I think it builds with netlify

@krassowski
Member

Or just create a dedicated try repo?

@bollwyvl

> plain old Jekyll site

Cool. Looks like submodules would work.

> dedicated try repo?

Right. Either way. Some additional considerations:

  • on the lite site we are trying to self-host as much as possible (partially to demonstrate that it can be done)
    • mathjax
    • a fixed set of all of the wheels used in demos not provided by...
  • pyodide, which we load from their CDN
    • while the entrypoint js and python wasm are small-ish (~10mb)
      • the entire distribution is.... large (~200mb)
    • i'd probably make this a sub-submodule

So we'd want to figure out just how much we'd want to check in to this future repo, whatever it's called, and how it would be coordinated to update to a new version as links into it need to be updated. There are some pointy entropy edges around upgrades that can require significant cache-refreshing.

@choldgraf
Collaborator

This all sounds great to me. A dedicated try repo makes sense as well if the repo will require some bespoke configuration.

The only thing I'd urge is that we stick to a minimal set of content and examples at first, so that we don't make this problem harder than it needs to be 🙂

The first two content links at try.jupyter.org are already too complex IMO, and I think our lives will be easier if we think in terms of "a 2 minute experience that gives people an idea of what jupyter is and where to learn more" rather than an in depth treatment or a kitchen sink.

@krassowski
Member

> the entire distribution is.... large (~200mb)

Is this because of some libraries/packages? Can we cut something to reduce it? Users in some countries won't be able to afford to visit a website with such requirements.

@bollwyvl

> libraries/packages

These are compiled stand-ins for pip packages, e.g. numpy, scikit-learn, etc. If, as a user, you don't import them, you don't pay for them. Further, this all happens in RAM, as all of the browser-based storage mechanisms (IndexedDB) are way too slow to use for a POSIX filesystem.

At present, the baseline pyolite kernel does install up to matplotlib (hence that 30mb), by default, which does drive the number up fairly quickly: we're going to revisit our ability to patch things at import time, but would really like to get some of that complexity into a separate repo, packaged like any other extension.

Part of not making pyodide special: there would also be the potential for having an even lighter-weight python kernel, as we did on jyve, based on brython or similar, but these have even more quirks than pyodide, which really is python, and really does run IPython.

@jtpio
Member

jtpio commented Dec 23, 2021

I would also recommend setting up a dedicated repo / deployment for the lite site of try.jupyter.org. https://jupyterlite.rtfd.io/en/latest/try/lab is like a nightly build on the main branch and things might still change quickly.
Also one of the main points of the jupyterlite project is being able to deploy your own lite website easily.

There is also the demo repo showing how to deploy to GitHub Pages, available here for reference: https://github.com/jupyterlite/demo.
Although for try.jupyter.org we would probably want heavier customization with jupyter_lite_config.json and jupyter-lite.json than just a plain requirements.txt.
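To give a feel for what the two files might contain, here is a minimal sketch that writes a build-time `jupyter_lite_config.json` and a run-time `jupyter-lite.json`. The helper `write_lite_config`, the `notebooks` and `_output` folder names, and the `appName` value are all hypothetical, and the keys shown are only a small subset that should be checked against the JupyterLite documentation for the version in use.

```python
import json
from pathlib import Path


def write_lite_config(site_dir: Path) -> None:
    """Write example build-time and run-time JupyterLite config files into site_dir."""
    # Build-time options, read when the lite site is generated.
    build_config = {
        "LiteBuildConfig": {
            "contents": ["notebooks"],  # hypothetical folder of demo notebooks
            "output_dir": "_output",    # hypothetical build output folder
        }
    }
    # Run-time options, served alongside the built site.
    runtime_config = {
        "jupyter-lite-schema-version": 0,
        "jupyter-config-data": {
            "appName": "Try Jupyter",   # hypothetical site name
        },
    }
    site_dir.mkdir(parents=True, exist_ok=True)
    (site_dir / "jupyter_lite_config.json").write_text(json.dumps(build_config, indent=2))
    (site_dir / "jupyter-lite.json").write_text(json.dumps(runtime_config, indent=2))
```

Calling `write_lite_config(Path("some-folder"))` produces both files in that folder; in practice they would more likely be written by hand and checked in next to `requirements.txt`.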

@palewire palewire added this to the Streamline the site milestone Dec 23, 2021
@choldgraf
Collaborator

That sounds great to me. I am happy to review work making that happen, or to update our docs to point to a new jupyterlite-backed try page. But I lack the knowledge / skills / time to figure out how to deploy a new jupyterlite in a repository; it is all black magic to me. It'd be great if somebody else can spearhead that, and I'll support however I can!

@jtpio
Member

jtpio commented Dec 24, 2021

> how to deploy a new jupyterlite in a repository, it is all black magic to me

Hopefully the docs will be improved soon: jupyterlite/jupyterlite#393

One way to think about jupyterlite is as just another static website generator. Probably the challenge here (as mentioned in the comments above) is to make the jekyll static website cohabit with the jupyterlite static website under the same domain.
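One way the cohabitation could work, since both tools just emit folders of static files, is to nest the JupyterLite output under a sub-path of the Jekyll output before deploying. The sketch below is an assumption about how that might be wired up, not what the site actually does; the function name `merge_sites` and the `lite` sub-path are made up for the example.

```python
import shutil
from pathlib import Path


def merge_sites(jekyll_out: Path, lite_out: Path, combined: Path,
                lite_prefix: str = "lite") -> Path:
    """Copy the Jekyll output to `combined`, then nest the JupyterLite output under a sub-path."""
    if combined.exists():
        shutil.rmtree(combined)  # start from a clean deploy folder
    shutil.copytree(jekyll_out, combined)

    lite_target = combined / lite_prefix
    if lite_target.exists():
        # Refuse to overwrite a Jekyll page that already lives at the same path.
        raise FileExistsError(f"{lite_target} already exists in the Jekyll output")
    shutil.copytree(lite_out, lite_target)
    return lite_target
```

Serving both from one folder keeps everything on a single domain, which matters for the caching concern raised earlier in the thread.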

@palewire palewire removed this from the Streamline the site milestone Jan 14, 2022
@palewire palewire added the bug label Jan 14, 2022
@jtpio
Member

jtpio commented Feb 23, 2022

FYI I opened an early draft PR for visibility and feedback: #682

7 participants