Fix Python tests on MacOS platform #57

Open
dsuhinin opened this issue Oct 10, 2024 · 2 comments

@dsuhinin
Collaborator

No description provided.

@dsuhinin dsuhinin converted this from a draft issue Oct 10, 2024
@nojaf
Collaborator

nojaf commented Nov 7, 2024

This might be low hanging fruit at this point.

@nojaf
Collaborator

nojaf commented Dec 16, 2024

This is no low-hanging fruit.

There’s an issue with the native Python SQLite code when our Go binary is used. It’s difficult to pinpoint why this is happening, as our Go code utilizes its own SQLite connector.

The problem in Python stems from a pointer being freed twice. We observe the following stack trace:

* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
 * frame #0: 0x0000000103c7225c libpython3.12.dylib`sqlite3DbNNFreeNN + 152
  frame #1: 0x0000000103c720ac libpython3.12.dylib`sqlite3VdbeClearObject + 76
  frame #2: 0x0000000103c342c8 libpython3.12.dylib`sqlite3VdbeDelete + 40
  frame #3: 0x0000000103c3bb40 libpython3.12.dylib`sqlite3_finalize + 160
  frame #4: 0x000000010404a044 libpython3.12.dylib`stmt_dealloc + 52
  frame #5: 0x00000001040927c4 libpython3.12.dylib`lru_list_elem_dealloc + 184
  frame #6: 0x000000010387bdb0 libpython3.12.dylib`lru_cache_tp_clear + 420
  frame #7: 0x000000010387bbbc libpython3.12.dylib`lru_cache_dealloc + 80
  frame #8: 0x0000000104042548 libpython3.12.dylib`connection_clear + 52
  frame #9: 0x000000010373104c libpython3.12.dylib`gc_collect_main.llvm.7237410993583088352 + 3628
  frame #10: 0x000000010386c1b4 libpython3.12.dylib`finalize_modules.llvm.8252630758059695344 + 4336
  frame #11: 0x0000000103869754 libpython3.12.dylib`Py_FinalizeEx + 244
  frame #12: 0x00000001038e2470 libpython3.12.dylib`Py_Exit + 20
  frame #13: 0x00000001038e2450 libpython3.12.dylib`handle_system_exit + 32
  frame #14: 0x00000001038e2134 libpython3.12.dylib`_PyErr_PrintEx.llvm.5721313084846900902 + 52
  frame #15: 0x0000000103889a3c libpython3.12.dylib`_PyRun_SimpleFileObject + 464
  frame #16: 0x00000001038e42b0 libpython3.12.dylib`_PyRun_AnyFileObject + 80
  frame #17: 0x00000001038e2e94 libpython3.12.dylib`pymain_run_file_obj + 164
  frame #18: 0x00000001038e2a6c libpython3.12.dylib`pymain_run_file + 72
  frame #19: 0x00000001038e0d40 libpython3.12.dylib`Py_RunMain + 1124
  frame #20: 0x000000010385b8d0 libpython3.12.dylib`pymain_main + 456
  frame #21: 0x000000010385b6fc libpython3.12.dylib`Py_BytesMain + 40
  frame #22: 0x00000001923c7154 dyld`start + 2476
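
For anyone picking this up, a minimal diagnostic sketch (not a fix) follows. The trace shows the crash happening while cached statements are finalized during interpreter shutdown (`Py_FinalizeEx` down to `sqlite3_finalize`). One way to probe that path is to close connections explicitly, so that cleanup should happen before shutdown rather than during it. The function name `run_query` and the table `t` are hypothetical and only serve the illustration. Running the script with `python -X faulthandler` should also dump a Python-level traceback if the process segfaults.

```python
import sqlite3
from contextlib import closing


def run_query(db_path: str = ":memory:") -> list[tuple]:
    """Open a connection, run a few statements, and close it explicitly."""
    # closing() calls conn.close() on exit, so the connection is torn down
    # here instead of being left to garbage collection at interpreter exit.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1), (2)")
        return conn.execute("SELECT x FROM t ORDER BY x").fetchall()


if __name__ == "__main__":
    # Version of the SQLite library linked into this Python build.
    print("sqlite", sqlite3.sqlite_version)
    print(run_query())
```

If the crash disappears or moves when connections are closed this way, that would point at shutdown-time cleanup; if it does not, the explicit close is at least ruled out as a variable.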

In this PR and this PR, we attempted to re-enable the Mac tests, but without success. (Technically, the Mac tests have always been running with continue-on-error: true.)

Here are some thoughts:

  • We suspected that uv run was a factor and tried running pytest directly from the venv. This turned out to be unrelated: running the tests with .venv/bin/pytest still produced the 5405 Segmentation fault: 11 output.
  • We hoped that newer Python versions would not exhibit this issue. This was not the case: we encountered the problem with 3.10, 3.11, 3.12, and others.
  • The issue only arises when we override the TrackingStore. Various combinations or disabling some overrides in conftest.py did not resolve the problem.
  • When we took over the GitHub Action runner using tmate, we noticed that the exact same test command no longer had the issue. It appears the problem only occurs the first time after a version of Python is installed on the machine, matching the behavior we observed when running the tests locally. It seems to be a one-time issue.
  • When we ran the vanilla MLflow tests for Mac, they always succeeded. Therefore, the problem must be related to the combination of Python SQLite and our Go SQLite.
  • Before noticing the Segmentation fault, we suspected an issue with pytest itself. However, after following the insights in this discussion, this appears not to be the case.
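
Because the failure seems to occur only on the first run after a Python install, it may help to run the same command twice and record how each run ended. The sketch below is one way to do that on POSIX systems, where a negative return code from `subprocess.run` means the child was killed by a signal. The helper name `run_and_classify` is hypothetical, and the placeholder command should be replaced with the real test invocation (for example `.venv/bin/pytest`).

```python
import signal
import subprocess
import sys


def run_and_classify(cmd: list[str]) -> str:
    """Run cmd and report whether it exited normally or died from a signal."""
    result = subprocess.run(cmd)
    if result.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exit code {result.returncode}"


if __name__ == "__main__":
    # Placeholder command; substitute the actual test command here.
    cmd = [sys.executable, "-c", "print('ok')"]
    for attempt in (1, 2):
        print(f"run {attempt}: {run_and_classify(cmd)}")
```

A run that reports `killed by SIGSEGV` on the first attempt and `exit code 0` on the second would match the one-time behavior described above.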

To anyone reading this, please feel free to investigate further and submit a PR with a fix. We would greatly appreciate catching issues on Mac during CI. Unfortunately, at this time, we still need to ignore these failures.

@nojaf nojaf removed their assignment Dec 16, 2024
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

3 participants