Identify credits for the next year of gke.mybinder.org
#463
Comments
Update: conversation with Karan and the GCP Research team
@consideRatio and I had a conversation with Karan from Google Cloud. He said that he was hopeful they'd be able to fund In particular, they care about things that demonstrate diverse and worldwide impact, like:
I've updated the top comment with some next steps about putting together this 2-pager |
I am going to try putting together a 2-pager ASAP that we can send to Karan, because 1.5 months is not that much time for us to get another round of funding. I would really appreciate any suggestions or help from others! Here are a few things that could be useful to help:
|
I'll work on gathering some analytics |
@minrk I did some work here https://gist.github.com/MridulS/5accc696311c4f381c05cb70922d3624 |
Nice! I'll look at getting region data from matomo |
I wonder if we can deploy this to the federation too, to make info gathering easier in the future? https://github.com/bitnik/binder-launches (Unfortunately, the link to the instance at GESIS no longer seems to be up) |
Hey all - thanks for these very helpful graphs! I tried to reproduce some of them but I cannot figure out how to get the Matomo secret to access that data (here's an issue I opened as a result: #473 (comment)). Can anybody help me get access to the Matomo data so that we can include country information in this report? |
Update: draft is ready
Hey all - I took some of the plots here (some directly, some as inspiration) and put together the 2-pager at the link below: https://docs.google.com/document/d/1DvW8TYgEVWYvsgZKlr4JrmuhLQoYC-jTie0okgnIjp0/edit?usp=sharing I'd love feedback from folks if they think this looks OK. The goal of the 2-pager is to demonstrate the impact and usage of Binder, but it doesn't need to go into a ton of detail. Also note, I couldn't figure out how to get Matomo data myself, so I just went with copy/pasting Google Analytics images, but I'm happy (and would prefer) to use Matomo data if somebody can help me get access to it. I've uploaded some archive launch data + the notebooks to visualize it here: https://github.com/choldgraf/binder-meta If people would like to make any changes etc. to those notebooks, PRs are welcome! |
Update: sent to Karan for feedback
I know that this is a short turnaround, but we only have about a month before gke.mybinder.org runs out of credits, so I have sent the two-pager above to Karan for some feedback to see if we need to add anything to the 2-pager before he submits internally. I've cc'ed @minrk (as team lead) and @consideRatio (since he's been helping with the GCP Binder move lately) on the email. Will report back with relevant information. |
Thanks a lot for putting the numbers together and adding words! I think it is good enough that we could send it already, so now we have a bit of time to make it even better. I read the draft and left a few comments. Most of them are suggestions/nitpicks. One thing I was wondering is if we can show/say something about an exciting/new area that is growing in terms of mybinder.org usage. The prime example that comes to my mind is things like executable books where we provide a crucial bit of infrastructure for courses/educational books from around the world that lets them do something that is otherwise super hard to do (executable sections in a text book). But I am not deeply enough into the executable books project/user base to know if there are a handful of neat projects we could point to. Not in detail but as a "this is a new area that is growing and super cool!" |
Hey all - I have still not heard any specific response from Google, and so I want to start contingency planning for what to do if we do not get new credits in time. Here is what I propose:
Timeline for running out of credits
This will reduce mybinder.org's capacity by about 75%, but I think that's just the reality that we face. Unless somebody has access to a large amount of Google Cloud credits that we can hook into gke.mybinder.org, I'm not sure what else we can do. |
Sad times. We will also need to find a new host for https://github.com/jupyterhub/mybinder.org-deploy/tree/master/images/federation-redirect and tweak its configuration so it will continue to work without GKE as the "prime". I think it makes sense to use the OVH deployment as the new prime site. I think these tasks need to happen to make the move:
I think we can run two instances of the federation proxy in parallel without weird stuff happening. This means it shouldn't be a huge interruption to users. Where/how should I give feedback on the blogpost draft? Should we now tweet about the upcoming change? If we do, we give people (who read the tweet) only about 48h notice, which isn't a lot. But hopefully there aren't too many people who rely on gke.mybinder.org explicitly or were planning big demos or some such. I think it would be a good idea to do so. |
We are scrambling a bit to see if we can make up any extra funding from a different source. I also hope to have a more definitive answer from GCP by the end of day US/Pacific. There are two potential other funding sources we might be able to use in a stop-gap fashion.
Either case is not a long-term solution, more like a 1-month stopgap to keep the lights on. Here's my proposed plan: (REMOVED here and added to the top comment above.) I'll update the top comment with this plan for visibility. |
What does "deploy with Pangeo funding" mean? Switching billing accounts or deploying to a new cluster or something third? For anything beyond "Switch billing accounts" I think we should start moving the federation proxy as it will be good to have that somewhere else in either case. And it is something we can start doing instead of waiting for the clock to tick down. The closer we get to the lights going out the more hectic things will get, the more hectic things get the more mistakes we will make, the more mistakes we make the more hectic it will get, etc :D So I think starting to move now is worth it. |
@betatim yep, Pangeo has some grant funds parked at Columbia which are earmarked for a Binder deployment, and we can realistically say it is in-scope for that grant to pay for a short time of mybinder.org. However it'd require setting up a new project under the Columbia.edu cloud org, and re-deploying gke.mybinder.org there. This is why it is the last preferred option |
(sorry I edited my last comment above for a long time without clicking "save") |
Has anyone asked the current members of the federation how much spare capacity we have there? Maybe we can increase our allocations there to make up for the lost capacity at GKE. cc @MridulS for gesis, @sgibson91 for Turing (can you tag the right new person please?) and @mael-le-gal for OVH |
Making the disks smaller sounds like a good plan. One thing I have at the back of my mind is that IO performance is linked to disk size. So maybe that was the reason for having such large disks (and we seem to end up with spare credits at the end of the year anyway -> the time limit of the credits is a bigger factor than the amount). Worse performance is better than no performance though, so yay to smaller disks. We could also ditch the "two disk" approach and use only the main disk to save even more money. I think OVH has been running in that mode for a while now. It needs a bit of a reconfiguration of the image GC to use an absolute size and not an inode-based threshold. |
The local SSD is a relatively small cost, so I'm not sure it's an optimization worth making right now. Since the same capacity on the PD SSD is 5x as expensive, merging the two probably doesn't make sense. It would be interesting if we could get the host docker onto another local SSD and lose the PD-SSD altogether. That would save tons at a cost of fixed capacity per node. Not sure if that's possible, though. |
I just got off the phone with the Google OSPO office. They are working to find stop-gap funding (maybe 6 weeks or so) in order to keep Binder running through January (though, if we can bring down the costs, we might be able to extend this a few months). That would buy us some time to work out a longer-term solution that is more sustainable for us (and for them) than this "every 1 year we frantically email people we know at Google" approach thus far. No promises from them, but I'm hopeful we'll work something out and will report back here as I learn more. |
New quota increases from federation members have greatly reduced load on GKE prod. I've helped encourage scale-down a little with some cordoning. But I think between (ongoing) stale image deletions and load redistribution, we're looking at at least a few thousand dollars saved today on the monthly bill. |
One thing we probably still need to do to run in a cost-conscious way is help scale-down with manual cordoning of low-occupancy user nodes. We had 7 42-day-old nodes, which is capacity for ~600 users at our lowest traffic times, I think? That's definitely more than we needed. |
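To make the manual cordoning idea a bit more concrete, here is a rough sketch of how one might pick candidate nodes. It is not what mybinder.org-deploy actually runs: the function names, the `max_users` threshold, and the per-node user counts are all hypothetical, and the counts would have to come from somewhere like `kubectl get pods -o wide`. The only real command referenced is `kubectl cordon`.

```python
from typing import Dict, List


def nodes_to_cordon(
    users_per_node: Dict[str, int],
    max_users: int,
    keep_at_least: int = 1,
) -> List[str]:
    """Pick low-occupancy nodes to cordon so they can drain and scale down.

    users_per_node: hypothetical mapping of node name -> current user pods.
    max_users: only nodes with at most this many users are candidates.
    keep_at_least: number of (fullest) nodes that are never cordoned.
    """
    # Rank nodes from emptiest to fullest; break ties by name for stable output.
    ranked = sorted(users_per_node, key=lambda n: (users_per_node[n], n))
    # Leave the fullest `keep_at_least` nodes schedulable.
    candidates = ranked[: max(0, len(ranked) - keep_at_least)]
    return [n for n in candidates if users_per_node[n] <= max_users]


def cordon_commands(nodes: List[str]) -> List[str]:
    """Render the kubectl commands an operator would review and run by hand."""
    return [f"kubectl cordon {n}" for n in nodes]


if __name__ == "__main__":
    # Made-up occupancy numbers, purely for illustration.
    occupancy = {"user-node-a": 2, "user-node-b": 50, "user-node-c": 0}
    for cmd in cordon_commands(nodes_to_cordon(occupancy, max_users=5)):
        print(cmd)
```

Printing the commands instead of running them keeps a human in the loop, which seems sensible while the scale-down behaviour is still being debugged.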
For the downscaling we should investigate the custom scheduler we use and if something has changed there. It used to work well :-/ An alternative we've discussed is using node preferences (similar to what we do for "sticky" build pods) where we work out the "least busy node" and then add an anti-preference for that node to a pod when creating it. |
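A minimal sketch of the anti-preference idea, assuming we already have pod counts per node from somewhere. The helper names and the input mapping are hypothetical; the affinity structure itself uses the standard Kubernetes `nodeAffinity` fields (`preferredDuringSchedulingIgnoredDuringExecution`, the `NotIn` operator, and the `kubernetes.io/hostname` node label). How this would be wired into pod creation in BinderHub/KubeSpawner is not shown here.

```python
from typing import Any, Dict


def least_busy_node(pods_per_node: Dict[str, int]) -> str:
    """Return the node with the fewest pods (ties broken by node name)."""
    return min(sorted(pods_per_node), key=pods_per_node.get)


def anti_preference(node_name: str, weight: int = 100) -> Dict[str, Any]:
    """Build a soft node affinity that steers a new pod away from `node_name`.

    This is a preference, not a hard rule: the scheduler can still place the
    pod on that node if nothing else fits.
    """
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # 1-100, higher means stronger preference
                    "preference": {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "NotIn",
                                "values": [node_name],
                            }
                        ]
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # Made-up pod counts, purely for illustration.
    counts = {"node-1": 40, "node-2": 3, "node-3": 25}
    print(anti_preference(least_busy_node(counts)))
```

The effect would be that new pods avoid the emptiest node, so it empties out over time and the autoscaler can remove it, without needing a custom scheduler.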
If absolutely necessary and of help, GESIS would also be able to contribute ~$5k via Linode. For the image repository, we could use our existing ones. |
Somewhat off-topic but also not: does anyone know why https://jupyter.org/try links to https://mybinder.org/v2/gh/jupyterlab/jupyterlab-demo/HEAD?urlpath=lab/tree/demo which currently doesn't build? I thought this must have been a recent commit that broke it but turns out the last commit was in mid October. Maybe no one will notice/complain if we don't provide a seamless transition/switch it over to something a bit different (jupyterlite) given that it seems to have been broken for a while now? |
@betatim nope I don't know who controls that repo or that link, but definitely agree that this is another reason to just use JupyterLite. Here's the issue @minrk brought up to discuss that: jupyter/jupyter.github.io#513 To try and prep for that, I just made a PR to the JupyterLite docs to add a more "introductory" notebook for their links: jupyterlite/jupyterlite#432 |
Looks like we got some interim credits from Google just in time, so we have a little slack while we work on the more permanent solution. |
I think a dependency must have updated out from under it. When I tested, it ran fine on GKE and didn't need a build. This is probably in our top 2 most popular images, so I think folks would notice. It must be getting assigned to turing more often with the recent changes, where it wasn't in the cache already. pyyaml 6.0 recently dropped support for the long-deprecated, but still widely used |
As @minrk mentioned - we just got $10,000 of Google Cloud credits deposited into the same GCP billing account, and they expire in 6 months. This means that we don't need to make any changes and the service will keep running. I'd like to write a short blog post about this episode to give transparency to our user community about what happened and what we're doing to try and improve things in the future. I imagine something like:
Does anybody object to that plan? I'll try to get a draft ASAP while the experience is still fresh. |
Hadn't thought about dependencies changing :-/ I think a blog post is a good idea. I would lead with (5) though instead of a chronological/experience report order. My reasoning is that (5) is the most important thing out of all this for the reader and what we'd like the reader to help us with. This in turn makes me wonder what we are looking for and if we can express that in a couple of sentences. Some properties that I think we want: (0) someone who enjoys fundraising (1) multi year (2) GKE and (3) <$10000 per month for sure, maybe $5000 per month in credits. (4) a call to action "contact us via this thing" if you can help with funding or the effort to secure funding. Other things we could be doing: onboard more federation members, increase the capacity at existing federation members. But I would put them in a separate blog post or further down in this post to focus on the above five points as the one thing people remember. I would lead with our ask/next steps because no one reads stuff on the internet and even those who read should get the most important point first. Before writing I think we should sharpen what we are asking for and how people can reach us so that we have a very concrete idea for both. This will allow us to write a clear article with a concrete call to action. It also means we have alignment on it in the team. |
Following on from @betatim's last point, also work out how much time (if any) we can devote to cases where compute can be contributed but not people (thinking of jupyterhub/mybinder.org-deploy#1772) |
I think for things like jupyterhub/mybinder.org-deploy#1772 we will not find out if they are a net good/bad without trying it. But someone has to have time and drive to keep moving it forward. |
Regarding a blog post, I was thinking of just a minimal "what happened, what we did to resolve it, and thank you Google" post rather than a more future-looking post. My reasoning is that I worry raising the bar too much will increase the likelihood that no post will happen at all. I definitely agree it's a good idea to come up with an action plan, call to action, etc but do we have bandwidth to do this? Is anybody willing to champion this? I do think the Google team is interested in meeting further in January to find a more sustainable solution, which is where I'm going to put my cycles if I have them. |
Is there a way to help with that? More generally I am wondering what the next steps are here. |
Thanks for following up @betatim - a few updates:
Blog post / CTA
Yesterday I put together a short draft that tried to incorporate some of the ideas shared above. I added a section for "what we need / what you can do" but I suspect it'll need a bit of iteration: https://docs.google.com/document/d/1A2TDXlQ1ap1dM7ek2gRRfSL9O6xudgNgOQqNW3LUwPo/edit?usp=sharing If folks are interested in having a dedicated brainstorm to think about sustainable pathways forward, I'd be happy to do so.
Google credits
I got in touch with Karan yesterday to check up on the status of the credit request we had originally put in. He said they'd like to meet on Thursday to discuss more sustainable pathways forward. He put a meeting on my calendar for
Update from conversation with Google
Just had a quick meeting with a few folks from Google Research. Here are the notes: https://docs.google.com/document/d/1W5q3WLeT_sviLrW0zhpmo5DfPww8sA0w7tlBCeeAa6o/edit?usp=sharing
They're going to start the process for 1 right now, and for 2 we'll need to do two things:
I like the idea of defining for ourselves what sponsorship means, and then reaching out to Google (or others) for feedback and requests to sponsor us at particular levels. I think that might be a way that we can grow the network of sponsors beyond just Google. What do others think? |
We've had a few conversations here and in the Matrix channel about sustainability opportunities to explore. Rather than ballooning this issue into a long thread about sustainability, I decided to update the top comment of #430 so that it captures a few of the ideas we've discussed for longer-term sustainability efforts. Are folks OK taking the long-term sustainability conversation there, so we can focus this thread around extending our credit runway with Google in the short-term? |
I'm going to close this one, as we have a resolution in jupyterhub/mybinder.org-deploy#2138 and we've also got a longer-term plan for credits in these two issues: |
Debrief from Mary @ Google
I had a quick phone call to understand from one of the OSPO people what happened this time around. Here are some quick takeaways from that conversation:
I'll set a personal reminder to start asking Google for credits again in September, and will also connect Karan with her so that they can do some information sharing as well. Just wanted to update y'all! |
Super useful, thanks for chasing that down, @choldgraf! |
Proposed change
Our annual allotment of credits for gke.mybinder.org runs out in late December (I believe, December 22nd 2021). We won't spend the credits down to 0 at that time, but they will expire on that date.
We need to identify where another round of funding for gke.mybinder.org will come from.
Draft of two pager
See here for a draft two-pager to send to Karan
Action plan
We have 3 known options to power gke.mybinder.org:
Jupyter Meets the Earth grant, if we can get approval by @fperez depending on whether they're in scope - this is likely not an option because it only runs on AWS
Here's the current plan:
gke.mybinder.org (unless somebody wants to foot the bill, I cannot pay for this deployment from my credit card again). This will reduce mybinder.org's capacity by about 75%, but I think that's just the reality that we face. Unless somebody has access to a large amount of Google Cloud credits that we can hook into gke.mybinder.org, I'm not sure what else we can do.
Regardless of all this, we need to release a blog post about the current situation, because it is clearly unsustainable (at least, for me it is unsustainable, and I assume for others as well)
Tasks to complete