
Revert threads usage; "add executor to validate_payload" #676

Closed
wants to merge 1 commit

Conversation

@astrand (Contributor) commented Nov 14, 2024

run_in_executor() is in practice just a fancy name for creating threads. However, this library is based on asyncio, not threads. Introducing threads into an application has some serious consequences. For example, it is no longer safe to use os.fork() (or subprocess with preexec_fn). Thus, with threads, this library can be incompatible with many existing applications, preventing updates.
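(For illustration, a minimal standalone script, not part of ocpp, showing that the default executor behind run_in_executor() is a thread pool whose worker stays alive after the call:)

import asyncio
import threading
import time

async def main() -> None:
    loop = asyncio.get_running_loop()
    print("threads before:", threading.active_count())  # typically 1
    # None selects the loop's default executor, a concurrent.futures.ThreadPoolExecutor.
    await loop.run_in_executor(None, time.sleep, 0.1)
    print("threads after:", threading.active_count())   # > 1: the pool worker thread remains

asyncio.run(main())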

@astrand (Contributor, Author) commented Nov 14, 2024

Also note that version is currently 2.0.0-rc.2 - introducing threads between two release candidates does not make sense at all. Such a change would rather mandate a major version increase, like 3.Y.Z.

@jainmohit2001 (Collaborator) commented:

Hi @astrand, correct me if I am wrong.
Python doesn't support non-blocking file I/O. The only way to do this is by using threads. Even aiofiles does this by delegating the work to a thread pool. I understand your concern about introducing threads in an async Python application. If we were tasked with introducing non-blocking file I/O, we would have to work with threads or processes either way.
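(For context, this is roughly the pattern aiofiles follows internally; the helper names below are purely illustrative and are not part of ocpp or aiofiles:)

import asyncio

def _read_blocking(path: str) -> str:
    # Ordinary blocking file IO.
    with open(path, encoding="utf-8") as f:
        return f.read()

async def read_file(path: str) -> str:
    loop = asyncio.get_running_loop()
    # Delegate the blocking call to the default thread pool executor.
    return await loop.run_in_executor(None, _read_blocking, path)

# usage: text = asyncio.run(read_file("example.txt"))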

@drc38 (Contributor) commented Nov 14, 2024

> run_in_executor() is in practice just a fancy name for creating threads. However, this library is based on asyncio, not threads. Introducing threads into an application has some serious consequences. For example, it is no longer safe to use os.fork() (or subprocess with preexec_fn). Thus, with threads, this library can be incompatible with many existing applications, preventing updates.

Hi @astrand, can you explain where the issue arises in using the documented asyncio method for dealing with blocking IO? https://docs.python.org/3/library/asyncio-dev.html

@astrand (Contributor, Author) commented Nov 14, 2024

> Hi @astrand, correct me if I am wrong. Python doesn't support non-blocking file I/O. The only way to do this is by using threads. Even aiofiles does this by delegating the work to a thread pool. I understand your concern about introducing threads in an async Python application. If we were tasked with introducing non-blocking file I/O, we would have to work with threads or processes either way.

Python is a language that ideally should support all concurrency mechanisms supported by the platform. In particular, it should support Linux mechanisms such as subprocesses and select-based IO, without using threads.

As I mentioned, I think the problem we are trying to solve needs to be better described. open() typically does not block.
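(As an illustration of select-based IO without threads, here is a minimal asyncio sketch, assuming a POSIX platform, that waits for a pipe to become readable via loop.add_reader(); none of this is ocpp code:)

import asyncio
import os

async def main() -> None:
    loop = asyncio.get_running_loop()
    r, w = os.pipe()
    os.set_blocking(r, False)

    done = loop.create_future()

    def on_readable() -> None:
        loop.remove_reader(r)
        done.set_result(os.read(r, 1024))

    loop.add_reader(r, on_readable)   # readiness via the event loop's selector, no threads
    os.write(w, b"hello")
    print(await done)                 # b'hello'
    os.close(r)
    os.close(w)

asyncio.run(main())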

@astrand (Contributor, Author) commented Nov 14, 2024

> run_in_executor() is in practice just a fancy name for creating threads. However, this library is based on asyncio, not threads. Introducing threads into an application has some serious consequences. For example, it is no longer safe to use os.fork() (or subprocess with preexec_fn). Thus, with threads, this library can be incompatible with many existing applications, preventing updates.

> Hi @astrand, can you explain where the issue arises in using the documented asyncio method for dealing with blocking IO? https://docs.python.org/3/library/asyncio-dev.html

Just because an API is documented does not mean there are no issues with it. See https://docs.python.org/3/library/os.html:

> Even in code that appears to work, it has never been safe to mix threading with os.fork() on POSIX platforms.

Also note that the asyncio example is a CPU-bound calculation, not a simple open().

There's a reason for not being able to use normal file descriptors with UNIX/Linux select() - it is typically not necessary, since the local file system is considered fast IO.
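(As a small aside, Linux's epoll, which asyncio's default selector uses on Linux, refuses regular files outright; a sketch, assuming Linux:)

import selectors
import tempfile

sel = selectors.DefaultSelector()  # EpollSelector on Linux
with tempfile.TemporaryFile() as f:
    try:
        sel.register(f, selectors.EVENT_READ)
    except PermissionError as exc:
        # epoll_ctl() returns EPERM for regular files: they are treated as
        # always "ready" and are not meant to be multiplexed.
        print("epoll refuses regular files:", exc)
sel.close()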

@jainmohit2001 (Collaborator) commented:

Can we get a sample repro that would help us understand how os.fork() or any other call might break validate_payload?

@astrand (Contributor, Author) commented Nov 14, 2024

> Can we get a sample repro that would help us understand how os.fork() or any other call might break validate_payload?

It is the other way around. The use of run_in_executor() prevents using os.fork() and preexec_fn. With this patch to the examples:

+++ b/examples/v16/central_system.py
@@ -26,6 +26,14 @@ class ChargePoint(cp):
     def on_boot_notification(
         self, charge_point_vendor: str, charge_point_model: str, **kwargs
     ):
+        import os
+        # If there are threads, os.fork() on Python 3.12 will raise
+        # DeprecationWarning, and cannot use subprocess.Popen with
+        # preexec_fn
+        child = os.fork()
+        if child == 0:
+            # Child
+            os._exit(0)
         return call_result.BootNotificationPayload(
             current_time=datetime.utcnow().isoformat(),
             interval=10,

You will get:
/home/astrand/tmp/ocpp/examples/v16/central_system.py:33: DeprecationWarning: This process (pid=732646) is multi-threaded, use of fork() may lead to deadlocks in the child.
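(For completeness, a standalone repro of the same warning without ocpp, assuming Python 3.12+ on a POSIX system:)

import os
import threading
import time
import warnings

warnings.simplefilter("always", DeprecationWarning)  # make sure the warning is shown

# A lingering worker thread, similar to what run_in_executor()'s thread pool leaves behind.
t = threading.Thread(target=time.sleep, args=(2,))
t.start()

pid = os.fork()   # Python 3.12+: DeprecationWarning, process is multi-threaded
if pid == 0:
    os._exit(0)   # child exits immediately
os.waitpid(pid, 0)
t.join()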

@drc38 (Contributor) commented Nov 14, 2024

Thanks @astrand, here is an interesting discussion by the Python dev team on the issue with using os.fork(): https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555/14

@drc38 (Contributor) commented Nov 15, 2024

> Thanks @astrand, here is an interesting discussion by the Python dev team on the issue with using os.fork(): https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555/14

I see Python 3.13 uses spawn as the default start method for multiprocessing; I assume this is because of the issue with fork.
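(The default start method varies by platform and Python version; a small standalone sketch to check it and force spawn, with nothing here specific to ocpp:)

import multiprocessing as mp

def worker() -> None:
    print("child is running")

if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    ctx = mp.get_context("spawn")   # request spawn explicitly, regardless of the default
    p = ctx.Process(target=worker)
    p.start()
    p.join()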

@astrand (Contributor, Author) commented Nov 17, 2024

Thanks for the links. This is a tricky issue, and I don't think upstream Python is paying enough attention to it.

  • As pointed out above, there are issues with using both threads and fork().
  • Most large UNIX applications need to call other executables, so subprocesses and fork() must be used.
    This means that the only solution in such cases that allows fine-grained control over subprocesses is to avoid threads. Python seems to go in the opposite direction and is more or less forcing the use of threads. This is unfortunate.

In any case, I investigated the issue with the validator. It is not the open() call that takes time, it is this call in messages.py:

validator.validate(message.payload)

On my test CS, the call takes up to 150 ms with some fairly normal OCPP communication. I guess this could be a problem in some cases. For reference, the default value of loop.slow_callback_duration is 100 ms. With the product I am working on, however, we have a separate process for time-critical code, so we prefer that the validation is synchronous.
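(Slow synchronous steps like this can be surfaced with asyncio's debug mode; a standalone sketch, with the 0.15 s sleep standing in for the validator call:)

import asyncio
import time

async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1   # seconds; 0.1 is also the default
    time.sleep(0.15)                    # stand-in for a slow validator.validate() call
    # When this step of the coroutine ends, the loop logs something like:
    # "Executing <Task ...> took 0.150 seconds"

asyncio.run(main(), debug=True)         # debug mode enables slow-callback reporting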

If jsonschema had an API which allowed incremental validation, the problem could be solved by yielding to the event loop during the validation. But apparently it does not.

Anyhow, it should be possible to disable the use of threads for the validation, if this feature is to be kept at all. At least two solutions are possible:

  • Move the run_in_executor call to a common method like async_validate_payload in messages.py (Don't Repeat Yourself - it is currently called in 4 different places!). Then we can add a configuration mechanism to enable/disable the use of run_in_executor, for example a module variable that indicates whether run_in_executor should be used. Another idea is to extend _skip_schema_validation so that it can have a third value (in addition to True and False) which indicates that async validation should be used. One argument for this is that the time it takes to do the validation depends on which message you are dealing with. (A rough sketch of this first idea follows below, after the executor example.)

  • Another solution, outside the ocpp library, is to change the default executor to a synchronous one. Again, it is strange that upstream Python does not provide this, but it is easy to create one:

import asyncio
from concurrent.futures import Future, ThreadPoolExecutor


class SyncExecutor(ThreadPoolExecutor):
    """Executor that runs submitted callables synchronously, without worker threads."""

    def __init__(self, *args, **kwargs):
        super().__init__()

    def submit(self, fun, *args, **kwargs):
        # Run the callable immediately in the calling thread and return an
        # already-resolved Future, so no pool worker thread is ever started.
        future = Future()
        try:
            future.set_result(fun(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


loop = asyncio.get_running_loop()
loop.set_default_executor(SyncExecutor())

We could perhaps even include this in our example code.
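(And for reference, a rough sketch of the first idea above; async_validate_payload and USE_EXECUTOR_FOR_VALIDATION are invented names for illustration, not existing ocpp API:)

import asyncio

# Module-level switch that an application could flip to keep validation synchronous.
USE_EXECUTOR_FOR_VALIDATION = True


async def async_validate_payload(validator, payload):
    """Single place that decides whether schema validation is off-loaded."""
    if USE_EXECUTOR_FOR_VALIDATION:
        loop = asyncio.get_running_loop()
        # Off-load the potentially slow jsonschema call to the default executor.
        await loop.run_in_executor(None, validator.validate, payload)
    else:
        # Run synchronously on the event loop (may block for ~100-150 ms).
        validator.validate(payload)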

@drc38 (Contributor) commented Nov 17, 2024

Thanks @astrand, I'd be comfortable with the first solution you've proposed.

@astrand (Contributor, Author) commented Nov 18, 2024

New suggested solution at #678.

@astrand closed this Nov 18, 2024

jainmohit2001 pushed a commit that referenced this pull request on Dec 7, 2024:
See discussion at #676

---------

Co-authored-by: Peter Astrand <[email protected]>
Co-authored-by: Patrick Roelke <[email protected]>