Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal Python error: Segmentation fault on 0.11.0.post2 #128

Closed
hnrhonkala88 opened this issue Oct 20, 2021 · 9 comments · Fixed by #130
Closed

Fatal Python error: Segmentation fault on 0.11.0.post2 #128

hnrhonkala88 opened this issue Oct 20, 2021 · 9 comments · Fixed by #130

Comments

@hnrhonkala88
Copy link

Hi,

My python tests started to fail after update to hidapi 0.11.0.post2 -version. version 0.10.1 seems to work fine.

Initially the problem was detected on our CI, where docker dependencies were automatically updated and suddenly all our CI runs were just jammed (did not exit at all) but the root cause finding took some time for us. Anyway, it seems that it's hidapi v0.11.0.post2 that caused this issue.

Now I was finally able to get some logs on my local virtual machine setup on :
ubuntu 20.04.1
python 3.7.10

Main problems:

  • Test run with docker jamms forever with never going out without killing the process (Some thread obviously stays running there and won't stop)
  • Local machine giving a Segmentation fault

We are using the Yepkit's pykush tool, https://github.com/Yepkit/pykush
Failure that we receive is:

Fatal Python error: Segmentation fault

Thread 0x00007f8d191d2180 (most recent call first):
  File "..../pykush.py", line 101 in __del__ ---> (            self._devhandle.close()  in pykush.py)

Segmentation fault (core dumped)

I was able to find following failure with gdb:

Core was generated by `.......venv/bin/python3.7 ....../venv/bin/pyt'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8d1953424b in ?? ()
[Current thread is 1 (LWP 4698)]

Please note that pykush version has been same all the time.

@prusnak
Copy link
Member

prusnak commented Oct 20, 2021

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8d1953424b in ?? ()

You need to compile with debug symbols (-O0 -ggdb) in order to produce more meaningful stacktrace.

@bearsh
Copy link
Contributor

bearsh commented Nov 1, 2021

I also get a segfault using any version >= 0.11. If I revert b94c8a0 everything works. The problem is the atexit.register() call.
I suggest to partly revert that commit...

@prusnak
Copy link
Member

prusnak commented Nov 1, 2021

@bearsh did you call the close() method on all your devices before exit was called?

@prusnak
Copy link
Member

prusnak commented Nov 1, 2021

@hnrhonkala88 same here - https://github.com/Yepkit/pykush/blob/1ee65a7646acc14680321ea9d0a69f22a60dcd69/pykush/pykush.py#L223-L226 - the del destructor is not guaranteed to be called in python; so I guess you need to call the close method explicitly

@bearsh
Copy link
Contributor

bearsh commented Nov 1, 2021

@bearsh did you call the close() method on all your devices before exit was called?

no I did not. once I close the device it doesn't segfault anymore.
Anyway, what's the benefit of atexit.register(hidapi_exit)? Once the script exits, the application is likely to exit as well, so the the allocated memory gets freed anyway. furthermore it breaks existing applications...

@prusnak
Copy link
Member

prusnak commented Nov 1, 2021

I think it is better to fix the faulty applications that do not close the descriptors.

Since the segfault occurs at the end of the application run, it is not harmful anyway and as you say, the memory gets freed anyway.

@prusnak prusnak closed this as completed Nov 1, 2021
@jonasmalacofilho
Copy link
Contributor

Since the segfault occurs at the end of the application run, it is not harmful anyway and as you say, the memory gets freed anyway.

I see your point, as well as the point of the commit that introduced the change.

However, while a segfault is fairly harmless, this breaks memory safety. And the segfault isn't guaranteed.

Another thing to consider is that the Python interpreter may have been embedded in another software. In that case, atexit might not run just when the (rest of the) application is about to exit.

jonasmalacofilho added a commit to jonasmalacofilho/cython-hidapi that referenced this issue Nov 2, 2021
Replace atexit.register() with weakref.finalize(), since that should
guarantee that there are no remaining references to the module (and,
thus, HIDAPI) when it runs.

Related: trezor#128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
jonasmalacofilho added a commit to jonasmalacofilho/cython-hidapi that referenced this issue Nov 2, 2021
hid_exit() is only safe to call once every hid_device has been closed.

Fixes: trezor#128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
@jonasmalacofilho
Copy link
Contributor

jonasmalacofilho commented Nov 2, 2021

@bearsh, @hnrhonkala88,

I tried to improve the situation in PR #130. Can you give it a try, with and, more importantly, without explicitly calling close()?


Essentially, we cannot call hid_exit() unless we know that all devices have been closed and that no [strong] reference to the module remains. This is particularly serious when the LibUSB backend is being used.

The only simple way I found to fix the issue is to use weakref.finalize (Python 3.4+) on both device instances and the module itself. This makes calling close() optional (although still recommend in many scenarios).

I have also turned our public hidapi_exit() API into a no-op: it had been added to expose hid_exit(), but incorrectly calling it breaks Python's memory safety, and this type of concern is not something cython-hidapi callers should have to deal with.

I also think we should hide hidapi_exit(): calling it incorrectly breaks Python's memory safety, and this type of concern is not something cython-hidapi callers should have to deal with. But I haven't done that just yet.[^1]

[^1]: Technically, removing hidapi_exit() would be a breaking change. But IMO it is a necessary change to fix a safety bug, and I don't expect more than a few people to be using that function already.

@hnrhonkala88
Copy link
Author

hnrhonkala88 commented Nov 2, 2021

Thanks everyone about having this discussion and let me understand it more.

@jonasmalacofilho Thanks, I will try to give it a try as soon as I can.

EDIT: The fix in your PR seems to work!

jonasmalacofilho added a commit to jonasmalacofilho/cython-hidapi that referenced this issue Nov 2, 2021
Calling hid_exit() can lead to memory bugs if care is not taken.  This
has been observed as segmentation faults with the LibUSB backend, where
hid_exit() calls libusb_exit(), which in turn has a well documented
contract:

> Deinitialize libusb. Should be called after closing all open devices
> and before your application terminates.

While the Linux HIDRAW backend appears to be more tolerant, there may be
other backends that behave like LibUSB, or even have additional
prerequisites for calling hid_exit().

On its part, HIDAPI is not clear about the requirements of hid_exit():

> This function frees all of the static data associated with HIDAPI. It
> should be called at the end of execution to avoid memory leaks.

Because of these reasons, there is no alternative but to threat
hid_exit() as a very unsafe function.

As such, any wrapper we provide for it – in practice, hidapi_exit() –
would also have to be made safe.  Unfortunately, doing that is
non-trivial, and possibly quite expensive.

Instead, turn hidapi_exit() into a no-op for now.  The HIDAPI library
will still be cleanly finalized before the program terminates, through
another – automatic – mechanism.

Related: trezor#128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
jonasmalacofilho added a commit to jonasmalacofilho/cython-hidapi that referenced this issue Nov 2, 2021
It was not supposed to be used explicitly in the first place (see
comments on the PR this commit is a part of).  The HIDAPI library will
still be cleanly finalized before the program terminates, through
another – automatic – mechanism.

Additionally, the only currently known caller is using it as a
workaround for another issue (trezor#114), and that use case would not be
supported by the no-op compatibility shim anyway.

Related: trezor#114 ("Cannot reenumarate devices in same python process")
Related: trezor#128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
prusnak pushed a commit that referenced this issue Nov 2, 2021
Replace atexit.register() with weakref.finalize(), since that should
guarantee that there are no remaining references to the module (and,
thus, HIDAPI) when it runs.

Related: #128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
prusnak pushed a commit that referenced this issue Nov 2, 2021
hid_exit() is only safe to call once every hid_device has been closed.

Fixes: #128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
prusnak pushed a commit that referenced this issue Nov 2, 2021
Calling hid_exit() can lead to memory bugs if care is not taken.  This
has been observed as segmentation faults with the LibUSB backend, where
hid_exit() calls libusb_exit(), which in turn has a well documented
contract:

> Deinitialize libusb. Should be called after closing all open devices
> and before your application terminates.

While the Linux HIDRAW backend appears to be more tolerant, there may be
other backends that behave like LibUSB, or even have additional
prerequisites for calling hid_exit().

On its part, HIDAPI is not clear about the requirements of hid_exit():

> This function frees all of the static data associated with HIDAPI. It
> should be called at the end of execution to avoid memory leaks.

Because of these reasons, there is no alternative but to threat
hid_exit() as a very unsafe function.

As such, any wrapper we provide for it – in practice, hidapi_exit() –
would also have to be made safe.  Unfortunately, doing that is
non-trivial, and possibly quite expensive.

Instead, turn hidapi_exit() into a no-op for now.  The HIDAPI library
will still be cleanly finalized before the program terminates, through
another – automatic – mechanism.

Related: #128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
prusnak pushed a commit that referenced this issue Nov 2, 2021
It was not supposed to be used explicitly in the first place (see
comments on the PR this commit is a part of).  The HIDAPI library will
still be cleanly finalized before the program terminates, through
another – automatic – mechanism.

Additionally, the only currently known caller is using it as a
workaround for another issue (#114), and that use case would not be
supported by the no-op compatibility shim anyway.

Related: #114 ("Cannot reenumarate devices in same python process")
Related: #128 ("Fatal Python error: Segmentation fault on 0.11.0.post2")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants