
add doc to explain multithreading #1154

Open
wants to merge 10 commits into
base: main
Changes from 6 commits
167 changes: 167 additions & 0 deletions content/develop/concepts/architecture/threading.md
@@ -0,0 +1,167 @@
---
title: Threading in Streamlit
slug: /develop/concepts/architecture/threading
---

# Threading in Streamlit

Building with Streamlit may feel like magic, but underneath it is still plain Python. This means threads can still be used to improve performance and responsiveness in a Streamlit app. However, starting additional threads from your own code can be tricky. This guide is meant to help you use threads correctly in Streamlit.

Before reading on, we recommend reading [architecture](/develop/concepts/architecture/architecture) and [session-state](/develop/concepts/architecture/session-state) first.

## Threads created by Streamlit

A `streamlit run` process creates 2 types of threads:

- Main thread: runs the web (HTTP + WebSocket) server
- Script thread: runs page code when triggered (by page view or UI interactivity)
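
One rough way to tell which thread some code runs on is to look at the current thread object. The sketch below uses only the standard `threading` module and no Streamlit APIs. The helper name `describe_current_thread` and the thread name `example-worker` are made up for illustration, and this guide does not specify the names Streamlit gives its own threads.

```python
import threading

def describe_current_thread():
    """Return the current thread's name and whether it is the process's main thread."""
    current = threading.current_thread()
    return current.name, current is threading.main_thread()

# Collect the description from a freshly started thread for comparison.
seen = []
worker = threading.Thread(
    target=lambda: seen.append(describe_current_thread()),
    name="example-worker",  # hypothetical name, chosen for this sketch
)
worker.start()
worker.join()

print(describe_current_thread())
print(seen[0])  # ('example-worker', False): a started thread is never the main thread
```

Going by the description above, calling `describe_current_thread()` from page code under `streamlit run` should report a thread other than the main one, because the main thread is occupied by the server.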

The following is an oversimplified and inaccurate illustration of how Streamlit creates threads:

```py
from threading import Thread
from streamlit.somewhere import WebSocketServer, ScriptRunContext

# created once per process, runs on main thread
class StreamlitServer(WebSocketServer):
    def on_websocket_connection(self, conn):
        # assuming 1 connection is bound to exactly 1 session
        session = Session()
        conn.on_page_run_message(lambda message: session.on_page_run_message(conn, message))


# created for each session
class Session:
    def on_page_run_message(self, conn, message):
        script_thread = ScriptThread(conn=conn, page_file=message.page_to_run, session=self)
        # attach the context object,
        # it can be used inside script thread like getattr(current_thread(), "secret..")
        setattr(script_thread, "secret_runner_context", ScriptRunContext(self))
        script_thread.start()


# created for each page run
class ScriptThread(Thread):
    def __init__(self, conn, page_file, session):
        super().__init__()
        self.conn = conn
        self.page_file = page_file
        self.session = session

    def run(self):
        with open(self.page_file) as f:
            page_code = f.read()
        ui_state = eval(page_code)
        self.conn.send_ui_state(ui_state)
        # on the other end of WebSocket connection,
        # frontend receives the state and updates UI


StreamlitServer().listen()
```

## `missing ScriptRunContext!` or `streamlit.errors.NoSessionContext`

Since you are reading this page, chances are that you have already seen one of these messages.

Many Streamlit APIs, including `st.session_state` and several built-in widgets, expect to run on a script thread. Such APIs typically depend on internal state that is kept per session or per page run.

In the happy path, such code finds the `ScriptRunContext` object attached to the current thread (as in the illustrative code above). When it cannot find one, it issues such a warning or error.
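
The lookup described above can be imitated in plain Python. This is a toy model and not Streamlit's implementation: `FakeContext`, `attach_context`, `require_context`, `try_lookup`, and the attribute name `_toy_ctx` are all invented for this sketch. It only shows why the same function succeeds on one thread and fails on another.

```python
import threading

class FakeContext:
    """Stand-in for a per-session context object (hypothetical)."""
    def __init__(self, session_id):
        self.session_id = session_id

def attach_context(thread, ctx):
    # Store the context as a plain attribute on the thread object.
    setattr(thread, "_toy_ctx", ctx)

def require_context():
    # Look for a context on whichever thread is calling this function.
    current = threading.current_thread()
    ctx = getattr(current, "_toy_ctx", None)
    if ctx is None:
        raise RuntimeError("missing context on " + current.name)
    return ctx

def try_lookup(out):
    # Record either the session id or the error message.
    try:
        out.append(require_context().session_id)
    except RuntimeError as exc:
        out.append(str(exc))

results_with, results_without = [], []
t_with = threading.Thread(target=try_lookup, args=(results_with,), name="with-ctx")
t_without = threading.Thread(target=try_lookup, args=(results_without,), name="without-ctx")
attach_context(t_with, FakeContext("session-1"))  # only one thread gets a context
for t in (t_with, t_without):
    t.start()
for t in (t_with, t_without):
    t.join()

print(results_with)     # ['session-1']
print(results_without)  # ['missing context on without-ctx']
```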

## Custom threads

An effective way to reduce delay is to create threads and let them work concurrently. This works especially well for IO-heavy operations such as remote queries or data loading.

However, for the reasons described so far, interacting with Streamlit code from your own thread can be quirky. This section introduces 2 patterns for letting different threads work together.

Note: these are only patterns, not complete solutions. Treat them as ideas to start from. For example, one could extend pattern 1 to use a `concurrent.futures.ThreadPoolExecutor` thread pool.

### 1. Only call Streamlit code from the script thread

Python's `threading` module provides ways to start a thread, wait for it to finish, and collect its result. If custom threads are kept isolated from Streamlit APIs, everything should just work.

In the following example page, `main` runs on the script thread and creates 2 custom `WorkerThread` instances. After the worker threads have run concurrently, `main` collects their results and updates the UI.

```py
import streamlit as st
import time
from threading import Thread

class WorkerThread(Thread):
    def __init__(self, delay):
        super().__init__()
        self.delay = delay
        self.return_value = None

    def run(self):
        # runs in custom thread, touches no Streamlit APIs
        start_time = time.time()
        time.sleep(self.delay)
        end_time = time.time()
        self.return_value = f"start: {start_time}, end: {end_time}"

st.header("t1")
result_1 = st.empty()
st.header("t2")
result_2 = st.empty()

def main():
    t1 = WorkerThread(5)
    t2 = WorkerThread(5)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    # main() runs in script thread, and can safely call Streamlit APIs
    result_1.write(t1.return_value)
    result_2.write(t2.return_value)

main()
```
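
As mentioned earlier, this pattern can be extended to a thread pool. The following is a minimal sketch of that idea using `concurrent.futures.ThreadPoolExecutor` from the standard library. The function names `slow_task` and `run_tasks` are hypothetical, the delays are shortened, and the Streamlit calls are left as a comment so that the snippet runs on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(delay):
    # Runs on a pool thread and touches no Streamlit APIs.
    start_time = time.time()
    time.sleep(delay)
    end_time = time.time()
    return f"start: {start_time}, end: {end_time}"

def run_tasks(delays):
    # Leaving the `with` block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(slow_task, d) for d in delays]
        # result() blocks until that task is done and returns its value.
        return [f.result() for f in futures]

results = run_tasks([0.1, 0.1])
# In a Streamlit page, the script thread would now display the results,
# for example with result_1.write(results[0]) and result_2.write(results[1]).
print(results)
```

As in the example page above, only the code that runs after the workers have finished would touch Streamlit, so the workers never need a `ScriptRunContext`.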

### 2. Expose context object to custom thread

Alternatively, one can give a custom thread access to the `ScriptRunContext` attached to the script thread. This pattern is also used by Streamlit standard widgets like [st.spinner](https://github.com/streamlit/streamlit/blob/develop/lib/streamlit/elements/spinner.py).

@jokester (Author) commented on Sep 19, 2024:

Though I saw people are already going this way (in various GH issues), I'm not really sure about this pattern

  1. it exposes internal object to page writers

  2. it is less guaranteed. I don't know enough to say the probability where a ScriptRunContext suffices. The other pattern just looks "safer" to me because it assumes less from Streamlit.

Contributor:

Officially, adding your own threads isn't supported. We don't want to include unsupported patterns in the main concepts area. This will likely go out as a Knowledge Base (KB) article. There is a subset of this information that can live in concepts, but it will need to be very carefully separated. For now, we can move this page to the KB so that we can get it published faster and I'll likely follow up with moving some of it back into concepts. (I still haven't read through any of it yet, so I still expect a few weeks before I get to this. Just a heads up about what will likely happen.)

In the longer run, the plan is to properly support multithreading and async tasks, but that work isn't currently on the schedule this quarter or next, so it wouldn't likely happen until next year at the earliest.

Author:

That makes sense. I don't really want to promote the hack either.

When you come back, feel free to change or move things or ask me to do so 👍🏽


**Caution** this may not work with all Streamlit code. The previous pattern is safer in this respect.

**Caution** `get_script_run_ctx` is meant to be called from a script thread, not from the main thread or a custom thread.

**Caution** when using this pattern, make sure that a custom thread using `ScriptRunContext` does not outlive the script thread. A leaked `ScriptRunContext` may cause subtle bugs.
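
One way to lower the risk named in the last caution is to make the code that starts the workers also responsible for waiting on them. Below is a minimal sketch using only the standard library. `run_and_wait` is a hypothetical helper, not a Streamlit API, and it does not handle workers that never finish.

```python
import threading
import time

def run_and_wait(threads):
    """Start every thread, then block until all started threads have finished."""
    started = []
    try:
        for t in threads:
            t.start()
            started.append(t)
    finally:
        # Join even if starting a later thread raised, so that no
        # started worker keeps running after this function returns.
        for t in started:
            t.join()

workers = [threading.Thread(target=time.sleep, args=(0.1,)) for _ in range(2)]
run_and_wait(workers)
print([t.is_alive() for t in workers])  # [False, False]
```

If such a helper is called from page code, the script thread stays blocked until every worker is done, so the workers cannot outlive it on this code path.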

In the following example page, a custom thread with `ScriptRunContext` attached can call `st.write` without a warning. (Remove the calls to `add_script_run_ctx()` and you will see a `streamlit.errors.NoSessionContext`.)

```py
import streamlit as st
from streamlit.runtime.scriptrunner import add_script_run_ctx, get_script_run_ctx
import time
from threading import Thread

class WorkerThread(Thread):
    def __init__(self, delay, target):
        super().__init__()
        self.delay = delay
        self.target = target

    def run(self):
        # runs in custom thread, but can call Streamlit APIs
        start_time = time.time()
        time.sleep(self.delay)
        end_time = time.time()
        self.target.write(f"start: {start_time}, end: {end_time}")

st.header("t1")
result_1 = st.empty()
st.header("t2")
result_2 = st.empty()

def main():
    t1 = WorkerThread(5, result_1)
    t2 = WorkerThread(5, result_2)
    # obtain the ScriptRunContext of the current script thread, and assign to worker threads
    add_script_run_ctx(t1, get_script_run_ctx())
    add_script_run_ctx(t2, get_script_run_ctx())
    t1.start()
    t2.start()
    t1.join()
    t2.join()

main()
```