
docs: add readme #8

Draft
wants to merge 1 commit into base: main

Conversation

maartenbreddels
Member

cc @havok2063

I think it would be good if we explained how to use this package with voila. This example is very much focused on the use case at STScI (for @havok2063), but at least it is a start.

@havok2063

@maartenbreddels I'm now starting to dig into this. I've been following the instructions in your readme and got everything set up, but when I get to the step `curl http://localhost:8866/ > /dev/null` to profile the notebook, I see the following traceback in the voila console. Both the voila and curl terminals are running in the hotpot-km-test conda environment.

[Voila] Voila is running at:
http://localhost:8866/
[Voila] WARNING | Notebook test.ipynb is not trusted
[Voila] Kernel started: f5b21a2e-466e-48f5-859e-534b8b097ad7
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/bcherinka/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/voila/threading.py", line 25, in run
    return ioloop_in_thread.run_until_complete(self._run())
  File "/Users/bcherinka/anaconda3/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/voila/threading.py", line 28, in _run
    async for item in self.fn(*self.args, **self.kwargs):
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/async_generator/_impl.py", line 366, in step
    return await ANextIter(self._it, start_fn, *args)
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/async_generator/_impl.py", line 197, in __next__
    return self._invoke(first_fn, *first_args)
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/voila/exporter.py", line 119, in async_jinja_generator
    for output in self.template.generate(nb=nb_copy, resources=resources, **extra_context):
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 148, in template
    self._template_cached = self._load_template()
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/nbconvert/exporters/templateexporter.py", line 355, in _load_template
    return self.environment.get_template(template_file)
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/jinja2/environment.py", line 883, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/jinja2/environment.py", line 857, in _load_template
    template = self.loader.load(self, name, globals)
  File "/Users/bcherinka/anaconda3/lib/python3.7/site-packages/jinja2/loaders.py", line 429, in load
    raise TemplateNotFound(name)
jinja2.exceptions.TemplateNotFound: voila.tpl

[IPKernelApp] WARNING | Unknown error in handling startup files:

@maartenbreddels
Member Author

voila.tpl is from a very old voila; it could be that you are using an old template that still references it. How did you start voila?

@havok2063

Ahh yeah, thanks for the reminder. A `which voila` pointed to /Users/bcherinka/anaconda3/bin/voila, which was the problem. I refreshed the conda environment and now things look like they're working.

@havok2063

havok2063 commented Apr 26, 2021

So after first tests it seems to be working. Loading a 3D cube into Jdaviz via voila-embed on my local system reduced the loading time from 10-12 seconds down to ~4 seconds, and a similar reduction occurs for 1D spectral data. I sat here refreshing the browser over and over: most of the time I see no issues, but occasionally the traceback below pops up in the voila terminal window. Is this just an issue with a missing dependency in my environment, or something else?

If everything checks out, the next step will be to deploy this in our test environment for more performance tests. We previously ran performance tests with the original voila, simulating users hitting the voila server and spinning up kernels (I don't think any notebooks run in these tests). Here are the results from those tests:

| service | parameters | n_samples | avg response time | status |
| --- | --- | --- | --- | --- |
| Voila | 1 user, 10 requests | 10 | 4 seconds | ok |
| Voila | 10 users, 10 requests each | 100 | 16 seconds | ok |
| Voila | 30 users, 10 requests each | 300 | 130 seconds | 37% errors - 500 |

For example, 1 user submitting 10 requests to the voila server had an average response time of 4 seconds. When we scaled that up to 30 users with 10 requests each, the average response time jumped to 130 seconds and 37% of the responses errored out with a 500 status code. This is with a dockerized Voila setup running 3 server instances, so I'll be interested to see how the hotpot improvement affects this table. Maybe you have some thoughts on these numbers.
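
For anyone who wants to reproduce a rough version of this kind of test locally, here is a hypothetical Python sketch (not the harness used for the numbers above): N simulated users each issue R sequential GET requests against a local Voilà endpoint, and the script reports the mean response time and error count. The URL and the user/request counts are placeholders.

```python
# Rough, hypothetical load-test sketch (not the harness behind the table above):
# N "users" each issue R sequential GET requests; report mean latency and errors.
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8866/"  # placeholder: point at the voila deployment under test
USERS, REQUESTS = 10, 10

def one_user():
    times, errors = [], 0
    for _ in range(REQUESTS):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(URL) as resp:
                resp.read()  # drain the rendered HTML
        except urllib.error.URLError:
            errors += 1
        times.append(time.perf_counter() - start)
    return times, errors

with ThreadPoolExecutor(max_workers=USERS) as executor:
    results = list(executor.map(lambda _: one_user(), range(USERS)))

all_times = [t for times, _ in results for t in times]
total_errors = sum(e for _, e in results)
print(f"avg response: {sum(all_times) / len(all_times):.1f} s, "
      f"errors: {total_errors}/{len(all_times)}")
```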

Traceback

ERROR:tornado.application:Uncaught exception GET /voila/render/jdaviz_jwst.ipynb (::1)
HTTPServerRequest(protocol='http', host='localhost:8000', method='GET', uri='/voila/render/jdaviz_jwst.ipynb', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/voila/handler.py", line 154, in get
    async for html_snippet, resources in self.exporter.generate_from_notebook_node(notebook, resources=resources, extra_context=extra_context):
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/voila/exporter.py", line 100, in generate_from_notebook_node
    async for output in self.template.generate_async(nb=nb_copy, resources=resources, **extra_context, static_url=self.static_url):
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 35, in generate_async
    yield self.environment.handle_exception()
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/share/jupyter/voila/templates/embed/index.html.j2", line 1, in top-level template code
    {%- set kernel_id = kernel_start(nb) -%}
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 174, in auto_await
    return await value
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/voila/handler.py", line 169, in _jinja_kernel_start
    env=self.kernel_env,
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/nbclient/util.py", line 85, in ensure_async
    result = await obj
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/pooled.py", line 151, in start_kernel
    kernel_id = await self._pop_pooled_kernel(kernel_name, kwargs)
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/pooled.py", line 142, in _pop_pooled_kernel
    return  await self._update_kernel(kernel_name, fut, kwargs)
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/pooled.py", line 226, in _update_kernel
    kernel_id = await kernel_id_future
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/pooled.py", line 27, in _wait_before
    return await aw
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/pooled.py", line 282, in _initialize
    await client.execute(code)
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/client_helper.py", line 322, in execute
    self._check_raise_for_error(exec_reply)
  File "/Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/hotpot_km/client_helper.py", line 417, in _check_raise_for_error
    raise ExecutionError.from_msg(exec_reply_content)
hotpot_km.client_helper.ExecutionError: ---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-b627620c4905> in <module>
    480 for mod in modules:
    481     if not mod.startswith("setuptools"):
--> 482         importlib.import_module(mod)

~/anaconda3/envs/hotpot-km-test/lib/python3.7/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128
    129

~/anaconda3/envs/hotpot-km-test/lib/python3.7/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/anaconda3/envs/hotpot-km-test/lib/python3.7/importlib/_bootstrap.py in _find_and_load(name, import_)

~/anaconda3/envs/hotpot-km-test/lib/python3.7/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'sip'

@havok2063

Another thought: when I start up Voila, I see the initial spin-up of the 3 kernels in the pool. However, when I refresh the page with jdaviz loaded via voila-embed, I see voila initialize new kernels rather than use the ones in the pool. Is this the expected behaviour? And if so, are those new kernels also being pre-warmed? My guess is yes, since I am seeing faster loading times, but I just want to double check.

@maartenbreddels
Member Author

if not mod.startswith("setuptools"):
--> 482 importlib.import_module(mod)

ModuleNotFoundError: No module named 'sip'

This is a module-not-found error; what you can do is put a try/except around the import, since these imports do indeed come from my environment.
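
A minimal sketch of what that try/except could look like in the warm-up code (the module list here is hypothetical; use whatever your notebooks actually import):

```python
# Minimal sketch: skip warm-up imports that are not installed in this
# environment instead of failing kernel initialization. The module list
# below is hypothetical.
import importlib

modules = ["numpy", "ipywidgets", "sip"]

for mod in modules:
    if mod.startswith("setuptools"):
        continue
    try:
        importlib.import_module(mod)
    except ModuleNotFoundError:
        pass  # not available here; skip it rather than abort the warm-up
```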

You might sometimes see these because voila connects to the kernel really early on, I think, so it still 'sees' this exception.

However when I refresh the page with jdaviz loaded via voila-embed, I'm seeing voila initialize new kernels as opposed to using the ones in the pool.

That does not seem right; it should spin up 1 kernel to refill the pool. And yes, they are pre-warmed.

@havok2063

Comparing the terminal output and the browser console, it looks like it initially spins up the number of kernels specified in the config (e.g. 3). voila-embed connects to one of those three kernels initially but also spins up a new kernel and adds it to the pool. On each page refresh, voila-embed connects to a kernel in the pool but also adds a new kernel to the pool, even if it connects to a different kernel. Is there a way to see how many kernels are in the pool at any given moment?

For example one page refresh displays this in the dev console

Starting WebSocket: ws://localhost:8000/api/kernels/7a62a935-ca79-4624-98a6-3c200a753b93
voila.js:393 Kernel: connected (7a62a935-ca79-4624-98a6-3c200a753b93)

but at the same time you get this in the voila terminal

[Voila] Kernel started: b75f2ec9-dff5-43ab-91db-bb475cbafe4d
[Voila] Initializing kernel: b75f2ec9-dff5-43ab-91db-bb475cbafe4d
WARNING:tornado.access:404 GET /voila/nbextensions/bqplot/index.js.map (::1) 1.78ms
WARNING:tornado.access:404 GET /voila/nbextensions/bqplot-image-gl/index.js.map (::1) 1.93ms

In this example, when I shut down voila it shut down 9 kernels in total, but I can't tell whether they all belong to the same pool.

@havok2063

Currently for production, in our use of voila-embed, we've been culling inactive kernels so the number of kernels does not grow very large. I think here we might want to do something similar for kernels in or out of the pool. I noticed that in mapping.py you disable culling of kernels in the pool, but maybe you want a case where kernels in the pool can get culled, only down to the default number you started with, so you always have X persistent kernels.

@vidartf
Collaborator

vidartf commented May 4, 2021

Currently for production, in our use of voila-embed, we've been culling inactive kernels so the number of kernels does not grow very large. I think here we might want to do something similar for kernels in or out of the pool. I noticed that in mapping.py you disable culling of kernels in the pool, but maybe you want a case where kernels in the pool can get culled, only down to the default number you started with, so you always have X persistent kernels.

@havok2063 Normal culling rules should not be used for the pool. There shouldn't really be any cases where the pool grows too large (if you can replicate such a state, that would be a bug). If you change the pool size during run time somehow, there is logic to scale down the current pool here:

```python
def unfill_as_needed(self):
    """Kills extra kernels in pool"""
    tasks = []
    loop = ensure_event_loop()
    for name, target in self.kernel_pools.items():
        pool = self._pools.get(name, [])
        self._pools[name] = pool
        for i in range(len(pool) - target):
            task = loop.create_task(await_then_kill(self, pool.pop(0)))
            self._discarded.append(task)
```

@vidartf
Collaborator

vidartf commented May 4, 2021

On each page refresh, voila-embed connects to a kernel in the pool but also adds a new kernel to the pool, even if it connects to a different kernel.

What you describe seems to me to be the intended behavior. If that is not what you would expect, would you mind outlining what you expected, and maybe adding some details about why such a behavior would be preferable?

@havok2063

@vidartf I apologize, as I'm not that familiar with the pooling manager. Is the pool meant to grow indefinitely or stay at a fixed size? In my tests the pool grew from 3 kernels to 10 very quickly. Our use case has lots of users navigating to a website with voila embedded in it. Our biggest worry for production is excessive pool growth: kernels spun up that linger around for a while, taking up system resources. We were using the culling parameters to control how those kernels get killed off.

Naively I expected voila-embed to use the existing kernels in the pool rather than spin up new ones, but, as you say, I think the behavior where the pool scales up is ok, and probably desirable. In that case I would prefer one of two options: either a setting to control the maximum number of kernels in the pool, e.g. 100, or an option where the pool scales itself back down to X kernels. So for example, perhaps the pool starts with 3 pre-warmed kernels, scales up to 100 kernels during some peak, and after a period of inactivity scales itself back down to 10 kernels.

Maybe this behavior is already covered by unfill_as_needed, fill_if_needed, and elsewhere? In that case, some documentation on how the pool works, how to control scaling, what the configuration parameters are, etc., would be very helpful.

@vidartf
Collaborator

vidartf commented May 13, 2021

@havok2063 Maybe it would be good to agree on some wording so we do not talk past each other. For me:

  • Pool means: Kernels that have been started, but have not run a notebook.
  • Active kernels: Kernels that are currently running a notebook, or have completed running the notebook and are still in-use by the user (not shut down).

With these definitions, kernels go: pool -> active -> shut down.

The number of kernels that are in the pool should not increase over time, but the number of active kernels might (especially if you are not careful about shutting down / culling kernels after they are no longer in use). If you are seeing an increase in the number of kernels in the pool, this is a bug, and we should try to get a way to reproduce it. If you are seeing an increase in the number of active kernels, I would recommend considering:

  • Are there actually a lot of concurrent users, or are these old kernels that have not been shut down correctly?
  • Do you have kernel culling configured? Should you use different culling timeouts?
  • If you want to prevent limitless growth, set the max_kernels trait and see if this helps. Note: if max_kernels is reached, the app will simply fail to load for users, so this should be a last resort for protecting the app from overusing resources (see the config sketch below).
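
A hedged configuration sketch tying these knobs together. The trait names cull_idle_timeout, cull_interval, and max_kernels are the ones mentioned in this thread; the manager class names and the way the config is loaded are assumptions and may differ between voila/hotpot_km versions.

```python
# Hedged traitlets-config sketch (e.g. a config file passed to voila, or the
# equivalent --Class.trait=value command-line flags). Class names below are
# assumptions; only the trait names come from this thread.
c.MappingKernelManager.cull_idle_timeout = 120    # cull active kernels idle for 2 minutes
c.MappingKernelManager.cull_interval = 60         # check for idle kernels every minute
c.PooledMappingKernelManager.max_kernels = 100    # last resort: hard cap on total kernels
```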

@havok2063

havok2063 commented May 13, 2021

@vidartf Ok, so using the defined language ... we've been using https://github.com/mariobuikhuizen/voila-embed to embed our notebooks into webpages. In the original voila, there are no pooled or active kernels when the server is started. Each site visit activates a new kernel with a one-to-one mapping between the client and the server.

Start Voila

 voila
[Voila] Using /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T to store connection files
[Voila] Storing connection files in /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T/voila_xadgwaw1.
[Voila] Serving static files from /Users/bcherinka/anaconda3/envs/havokve/lib/python3.8/site-packages/voila/static.
[Voila] Voilà is running at:
http://localhost:8000/

Loading the browser page activates a kernel, and the browser console displays Starting WebSocket: ws://localhost:8000/api/kernels/f0a2a872-c858-46e2-a0be-de3197eb6895. Listing the directory contents of the "stored connection files" shows 1 kernel JSON file.

Another site visit produces Starting WebSocket: ws://localhost:8000/api/kernels/97b6f39c-ac65-4dc4-bd67-ad82cf1f41c2, which matches the server terminal output [Voila] Kernel started: 97b6f39c-ac65-4dc4-bd67-ad82cf1f41c2, and the directory now has 2 kernel JSONs. Both are active kernels: one notebook is running, the other is no longer running (but not yet shut down). To limit the number of active kernels we use

MappingKernelManager
  .cull_idle_timeout = 120
  .cull_interval = 60

to check for inactive kernels every minute and cull them if they've been idle for 2 mins.

When I switch to using the hotpot, I see that 3 kernels get pooled on voila startup. These are in the pool, as they've been started but have not run a notebook.

 voila
[Voila] Using /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T to store connection files
[Voila] Storing connection files in /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T/voila_wxlv4l_4.
[Voila] Serving static files from /Users/bcherinka/anaconda3/envs/hotpot-km-test/lib/python3.7/site-packages/voila/static.
[Voila] Kernel started: f6278106-8c38-458b-b6d8-8ee4212496f9
[Voila] Initializing kernel: f6278106-8c38-458b-b6d8-8ee4212496f9
[Voila] Kernel started: 19caf5d1-b54c-4022-8568-d0e3ce04069d
[Voila] Initializing kernel: 19caf5d1-b54c-4022-8568-d0e3ce04069d
[Voila] Kernel started: e7dc3a7a-a650-4cb2-9272-a0e478db8c8c
[Voila] Initializing kernel: e7dc3a7a-a650-4cb2-9272-a0e478db8c8c
[Voila] Voilà is running at:
http://localhost:8000/

Listing the file directory shows 3 kernel JSON files, in the pool.

ll /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T/voila_wxlv4l_4

 263B May 13 13:58 kernel-f6278106-8c38-458b-b6d8-8ee4212496f9.json
 263B May 13 13:58 kernel-19caf5d1-b54c-4022-8568-d0e3ce04069d.json
 263B May 13 13:58 kernel-e7dc3a7a-a650-4cb2-9272-a0e478db8c8c.json

Now when a user visits the website, we see a discrepancy between the client and the server. The browser displays Starting WebSocket: ws://localhost:8000/api/kernels/f6278106-8c38-458b-b6d8-8ee4212496f9 while the voila server terminal displays

[Voila] Kernel started: 21669019-0cf0-4498-9bc1-891df7cb0064
[Voila] Initializing kernel: 21669019-0cf0-4498-9bc1-891df7cb0064

So I think the browser is activating one of the original kernels in the pool, but it seems like it's also creating a new separate kernel. I don't know if this kernel (2166..) is considered active or simply in the pool, but presumably it is being added to the pool since the browser is actively connecting to a different kernel. The file directory now displays 4 kernel JSON files

ll /var/folders/js/s2zfzbv16019vjh0dft_l_w0000263/T/voila_wxlv4l_4
total 32
 263B May 13 13:58 kernel-f6278106-8c38-458b-b6d8-8ee4212496f9.json
 263B May 13 13:58 kernel-19caf5d1-b54c-4022-8568-d0e3ce04069d.json
 263B May 13 13:58 kernel-e7dc3a7a-a650-4cb2-9272-a0e478db8c8c.json
 263B May 13 13:58 kernel-21669019-0cf0-4498-9bc1-891df7cb0064.json

A new page visit, or a second page refresh, does the same thing: the browser shows Starting WebSocket: ws://localhost:8000/api/kernels/19caf5d1-b54c-4022-8568-d0e3ce04069d while the server again creates a new kernel

[Voila] Kernel started: f441f2d8-2486-4011-b601-f95fbe19cf80
[Voila] Initializing kernel: f441f2d8-2486-4011-b601-f95fbe19cf80

This seems like a bug to me, but I don't know if it's in hotpot or voila-embed. So the number of kernels is growing. The browser always activates kernels in the order of the "pool"; for example, after 5 page visits the browser has connected to:

Starting WebSocket: ws://localhost:8000/api/kernels/f6278106-8c38-458b-b6d8-8ee4212496f9
Starting WebSocket: ws://localhost:8000/api/kernels/19caf5d1-b54c-4022-8568-d0e3ce04069d
Starting WebSocket: ws://localhost:8000/api/kernels/e7dc3a7a-a650-4cb2-9272-a0e478db8c8c
Starting WebSocket: ws://localhost:8000/api/kernels/21669019-0cf0-4498-9bc1-891df7cb0064
Starting WebSocket: ws://localhost:8000/api/kernels/f441f2d8-2486-4011-b601-f95fbe19cf80

You mention in a previous comment that we should not use normal culling rules with the pool. I took this to mean that we should not use the cull_interval, etc. parameters. Is that the case? And if so, what is the suggested way of culling inactive kernels? We will have both lots of concurrent users and lots of inactive kernels (once those users close the browser tab).

@vidartf
Collaborator

vidartf commented May 14, 2021

Thanks for the details. So when you are connecting to the server with the pool, this happens:

  • One kernel is taken out of the pool, and it becomes an active kernel for the user connecting.
  • Another kernel is prewarmed and added to the pool to replace the kernel that was taken out of the pool.
  • In this way, the number of kernels in the pool stays the same (see the toy sketch below).
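
As a toy illustration of that pop-and-refill behaviour (this is not hotpot_km's actual implementation, just a sketch of the bookkeeping described above):

```python
# Toy sketch (not hotpot_km's real code): each user connection takes a warm
# kernel out of the pool and a replacement is started, so the pool size stays
# constant while the number of active kernels grows with the number of users.
from collections import deque
from itertools import count

_ids = count(4)

def start_prewarmed_kernel():
    # Stand-in for launching and pre-warming a real kernel.
    return f"kernel-{next(_ids)}"

pool = deque(["kernel-1", "kernel-2", "kernel-3"])  # pre-warmed pool
active = []

def connect_user():
    kernel_id = pool.popleft()             # user gets a warm kernel -> "active"
    active.append(kernel_id)
    pool.append(start_prewarmed_kernel())  # refill the pool to its target size
    return kernel_id

connect_user()
connect_user()
print("active:", active)      # ['kernel-1', 'kernel-2']
print("pool:  ", list(pool))  # ['kernel-3', 'kernel-4', 'kernel-5'] -> still 3
```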

You mention in a previous comment that we should not use normal culling rules with the pool. I took this to mean that we should not use the cull_interval, etc. parameters. Is that the case?

Active kernels should be culled, but the prewarmed kernels in the pool should not (by definition they will be idle until a user connects). Therefore, we have added code to this package so that the culling rules do not apply to the pool, meaning you should use the culling config as normal:

```python
async def cull_kernel_if_idle(self, kernel_id):
    # Ensure we don't cull pooled kernels:
    # (this logic assumes the init time is shorter than the cull time)
```
@havok2063

@vidartf Thanks for the clarification, that's helpful. I did a quick test where I started a new voila instance with cull_interval: 60 and cull_idle_timeout: 120, which spun up 3 kernels in the pool. I then simulated users visiting the site, creating up to 10 active kernels, and then closed them down. All 10 kernels were culled and none were left in the pool; all kernel JSON files were removed from the directory of stored connections. Simulating another user accessing the site fails to activate another kernel and generates an error when attempting to run the notebook. Maybe this is a bug in how voila-embed interacts with hotpot?

@vidartf
Collaborator

vidartf commented May 18, 2021

And this was with the latest release (0.2.2)? Would you mind capturing the output with --debug?

@havok2063

Ahh no, I just checked and I was still running 0.1.1. I updated and re-tested, and it looks like things are working as expected. I started the pool with 3 kernels, activated up to 10, and it culled kernels back down to 4.

@marscher

Is the environment for the pooled kernels hard-coded? Somehow the manager/ipykernel tries to execute Python from a "test" environment, which has not been configured anywhere in this document.
