Begin Step slower on large meshes (Benchmarking Adios) #4254
Hi Abishek. OK, let's try to sort out what is going on. Probably we should talk through what BeginStep does, and maybe that will tell us something. We're talking about the SST reader-side BeginStep here, I think, so the first thing it does is wait until there is a timestep available for it to read. It looks like your code calls BeginStep with the default timeout parameter, which means that it will block until a new incoming timestep is available. Upon receipt of timestep metadata by reader rank 0 (or immediately, if it has already arrived), rank 0 then broadcasts the metadata to the other ranks. This may be an issue for you if your receiver is part of a large collective and the metadata is large.

The last thing that BeginStep does is to "install" the received metadata on each reader rank. This is a pretty quick process in BP5, but it could be expensive if your metadata is complex. (That is, if you have a lot of variables, blocks, etc. Big data doesn't necessarily mean big metadata. You can write one 1 KB block or one 100 GB block and the metadata size is pretty much the same, but if you write 100,000 1 KB blocks or 100,000 variables, this is different.) I haven't gone through your code enough to guess how big your metadata might be, but that's maybe something you can tell me off the top of your head.

I'm guessing that at the scales you're probably running at, the MPI broadcast is probably not the problem, so I'd be inclined to think about whether or not BeginStep is waiting for data. It looks like you have a number of input engines and you're calling BeginStep with the default timeout, so when you call engine_dict[sim_id].begin_step(), that call will block until engine_dict[sim_id] has finished whatever it is doing, called end_step(), and sent us metadata. There is a timeout of 0.5 sec commented out there, and based upon the way check_data is used, you might want to uncomment that, or even reduce it to 0 sec. Zero sec will essentially do a poll().
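The three phases of the reader-side BeginStep described above (wait for metadata, broadcast it, install it) can be modelled in a few lines of plain Python to see where the time goes. This is a toy sketch, not the ADIOS2 implementation: `ToyReader`, `demo`, and the metadata layout are hypothetical, the "ranks" are list slots, and the "broadcast" is a plain copy rather than an MPI collective. `StepStatus` is a stand-in for `adios2.StepStatus`.

```python
import queue
import threading
import time
from enum import Enum


class StepStatus(Enum):
    # Stand-in for adios2.StepStatus; only the values this model needs.
    OK = 0
    NotReady = 1


class ToyReader:
    """Hypothetical single-process model of an SST reader's BeginStep."""

    def __init__(self, n_ranks):
        self.n_ranks = n_ranks
        # Filled by the writer side (in SST, by a background network thread).
        self.incoming = queue.Queue()
        self.installed = [None] * n_ranks

    def begin_step(self, timeout=-1.0):
        # Phase 1: rank 0 waits for the next step's metadata.
        # A negative timeout blocks indefinitely; 0 only checks and returns.
        try:
            if timeout < 0:
                metadata = self.incoming.get()
            else:
                metadata = self.incoming.get(timeout=timeout)
        except queue.Empty:
            return StepStatus.NotReady
        # Phase 2: rank 0 "broadcasts" the metadata to every reader rank.
        received = [metadata for _ in range(self.n_ranks)]
        # Phase 3: each rank "installs" the metadata. The work scales with the
        # number of variables and blocks listed, not with the data volume.
        for rank, md in enumerate(received):
            self.installed[rank] = {name: list(blocks) for name, blocks in md.items()}
        return StepStatus.OK


def demo(writer_delay=0.05):
    reader = ToyReader(n_ranks=4)
    first = reader.begin_step(timeout=0)  # nothing has arrived yet
    # Hypothetical metadata: variable name -> list of (offset, count) blocks.
    step_md = {"mesh": [(0, 500), (500, 1000)], "pressure": [(0, 1000)]}
    # The "writer" delivers the step's metadata after writer_delay seconds.
    writer = threading.Timer(writer_delay, reader.incoming.put, args=(step_md,))
    writer.start()
    t0 = time.monotonic()
    second = reader.begin_step()  # default timeout: blocks until metadata arrives
    waited = time.monotonic() - t0
    writer.join()
    return first, second, waited, reader.installed
```

In this model `demo()` gets `NotReady` from the zero-timeout call and `OK` from the blocking call, and roughly `writer_delay` seconds are spent in phase 1. That time is charged to `begin_step()` even though the reader does no work while waiting, which is the effect suspected in the comment.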
Metadata arrives asynchronously (it is received by a background network-handler thread over TCP, not RDMA, because it is usually smaller than the data), and begin_step() with a zero timeout will simply check whether the metadata has already arrived and return StepStatus.NotReady if it hasn't. You can check again later to see whether it has arrived by then, and in the meantime you can check your other incoming connections. I'm not quite sure I understand everything that is going on in get_data() yet, but let's tackle the begin_step() problem first. Does what I've said above make sense? Possibly at the larger scales your writers are taking longer to produce the data, so the reader has to wait longer for it, and that shows up in begin_step() because that's where the waiting happens. That waiting doesn't really consume CPU on the reader, but you might need to tweak the timeout parameter to fix it. At least that's my take, given my limited understanding of what you're doing; if I've misunderstood, let's dig further.
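The zero-timeout polling described above can be sketched as follows. This is a minimal, self-contained sketch, not the ADIOS2 API: `StepStatus` is a local enum that only mirrors the names of ADIOS2's step statuses, and `FakeEngine` is a hypothetical stand-in that reports `NotReady` until the metadata for its next step has "arrived". With the real Python bindings the call would be along the lines of the low-level `BeginStep(StepMode.Read, 0.0)` or the timeout argument of `begin_step()`; check the exact signature for your ADIOS2 version.

```python
import enum
import time


class StepStatus(enum.Enum):
    # Local mirror of ADIOS2's StepStatus names so the sketch runs standalone.
    OK = 0
    NotReady = 1
    EndOfStream = 2
    OtherError = 3


class FakeEngine:
    """Hypothetical reader engine: each step only becomes available after
    `polls_per_step` zero-timeout polls (metadata not yet arrived)."""

    def __init__(self, n_steps, polls_per_step):
        self.n_steps = n_steps
        self.polls_per_step = polls_per_step
        self.step = 0
        self._polls = 0

    def begin_step(self, timeout=0.0):
        if self.step >= self.n_steps:
            return StepStatus.EndOfStream
        self._polls += 1
        if self._polls < self.polls_per_step:
            return StepStatus.NotReady
        self._polls = 0
        return StepStatus.OK

    def end_step(self):
        self.step += 1


def poll_all(engines, handle_step, idle_sleep=0.001):
    """Round-robin over all streams with zero-timeout begin_step() calls,
    so a slow writer never blocks the reader from serving the others."""
    active = dict(engines)
    while active:
        progressed = False
        for sim_id, engine in list(active.items()):
            status = engine.begin_step(timeout=0.0)
            if status is StepStatus.OK:
                handle_step(sim_id, engine)  # get() calls would go here
                engine.end_step()
                progressed = True
            elif status is StepStatus.EndOfStream:
                del active[sim_id]  # writer finished; stop polling it
                progressed = True
            elif status is StepStatus.OtherError:
                raise RuntimeError(f"stream {sim_id} failed")
            # NotReady: metadata not here yet, move on to the next stream
        if not progressed:
            time.sleep(idle_sleep)  # avoid busy-spinning when nothing is ready


received = []
engines = {
    "sim0": FakeEngine(n_steps=3, polls_per_step=1),  # fast writer
    "sim1": FakeEngine(n_steps=2, polls_per_step=4),  # slow writer
}
poll_all(engines, lambda sim_id, eng: received.append((sim_id, eng.step)))
print(received)  # sim0's steps are consumed while sim1 is still NotReady
```

The short sleep only when no stream made progress is a design choice: the reader does not stall on one slow writer, yet it also does not spin at full CPU while every stream is idle.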
Hi again, thanks for the detailed answer. You are right; it is better to have it non-blocking. Here is the code for the writer side:
Summary of the writer-side code:
Below, I am printing the solver and send durations per iteration. In the case of Adios, the solver time increases to ~4.5 seconds/timestep, even though we are using the same solver that is used with zmq. The only thing that changes is sending the mesh with Adios.

| Iteration | Adios run: solver (s) | Adios run: send (s) | Only data transfer commented out: solver (s) | Both commented out: solver (s) |
|---|---|---|---|---|
| 0 | 4.626270 | 0.002057 | 4.195277 | 2.670431 |
| 1 | 4.567420 | 0.001293 | 4.234625 | 2.643374 |
| 2 | 4.568082 | 0.000520 | 4.261027 | 2.622191 |
| 3 | 4.547276 | 0.001365 | 4.025121 | 2.622029 |
| 4 | 4.544529 | 0.000505 | 4.312538 | 2.622040 |
| 5 | 4.530054 | 0.001410 | 4.288451 | 2.622391 |

The MPI communicator might have something to do with this, because even after commenting out the mesh-sending part of the loop, the solver time did not decrease by much. So, I think something's up after

There was another issue, probably because of the communicator. I posted an issue we were getting when meshes become large: on bigger meshes, the same conjgrad Fortran subroutine (from the file above) was deadlocking on
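The logs above can be summarized with a few lines of arithmetic. This only restates the posted numbers as means over the six iterations; it does not identify a cause.

```python
from statistics import mean

# Per-iteration durations (seconds) copied from the logs above.
solver_full = [4.626270, 4.567420, 4.568082, 4.547276, 4.544529, 4.530054]
send_full = [0.002057, 0.001293, 0.000520, 0.001365, 0.000505, 0.001410]
solver_no_transfer = [4.195277, 4.234625, 4.261027, 4.025121, 4.312538, 4.288451]
solver_no_both = [2.670431, 2.643374, 2.622191, 2.622029, 2.622040, 2.622391]

full = mean(solver_full)
no_transfer = mean(solver_no_transfer)
baseline = mean(solver_no_both)

overhead_total = full - baseline          # extra solver time per step
overhead_transfer = full - no_transfer    # part removed by dropping the transfer
overhead_rest = no_transfer - baseline    # part that remains without the transfer

print(f"mean send duration:             {mean(send_full):.6f} s")
print(f"total extra solver time:        {overhead_total:.3f} s/step")
print(f"removed by dropping transfer:   {overhead_transfer:.3f} s/step")
print(f"remaining without transfer:     {overhead_rest:.3f} s/step "
      f"({100 * overhead_rest / overhead_total:.0f}% of the total)")
```

On these numbers, roughly 1.59 of the ~1.93 s/step of extra solver time (about 82%) remains even with the data transfer commented out, which is consistent with the suspicion that something other than the send itself is involved.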
We, Melissa, are currently trying to shift from our zeromq implementation to Adios2 to leverage its RDMA support. After several issues we faced when using Adios SST, we were finally able to run it on Jean-zay, which supports the OmniPath network, by setting some of the MPI variables as follows:
These flags ensure that `DataTransport=rdma` is set properly and that no runtime errors are thrown.

On jean-zay, the OpenMPI PML is configured with the `cm` (default) and `ob1` options. If we use `cm` and pass `DataTransport=rdma`, Adios overrides this choice to `evpath`. When using `ob1`, it keeps `rdma`. So, we know RDMA is being used.

Setup
We are now benchmarking Adios' communication latency against our zeromq implementation, using a heat PDE solver that sends a 1000x1000 mesh for 100 timesteps.
We are running this benchmark on 3 nodes
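Relating to the PML observation above (SST kept `rdma` only with `ob1`), here is a minimal sketch of pinning the PML from Python before MPI starts. It assumes Open MPI's standard `OMPI_MCA_pml` environment variable; the helper name `configure_for_sst_rdma` is hypothetical, and the actual MPI variables used on jean-zay are not shown in this issue.

```python
import os


def configure_for_sst_rdma(env=os.environ):
    """Hypothetical helper: select Open MPI's ob1 PML and return SST engine
    parameters requesting the RDMA data plane.

    Call it before MPI is initialized (e.g. before `from mpi4py import MPI`),
    because Open MPI reads OMPI_MCA_* variables during MPI_Init.
    """
    previous = env.get("OMPI_MCA_pml")
    if previous not in (None, "ob1"):
        print(f"overriding OMPI_MCA_pml={previous!r} with 'ob1'")
    # With the default "cm" PML, SST was observed to fall back to evpath.
    env["OMPI_MCA_pml"] = "ob1"
    # Parameters for the SST engine, e.g. passed to io.SetParameters(...).
    return {"DataTransport": "rdma"}


# Usage with a plain dict standing in for the process environment:
fake_env = {"OMPI_MCA_pml": "cm"}
params = configure_for_sst_rdma(fake_env)
print(fake_env, params)
```

In a batch job the same choice is more commonly made outside Python, for example with `export OMPI_MCA_pml=ob1` or `mpirun --mca pml ob1`, which avoids any ordering issue with MPI initialization.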
Problem
Strangely, Adios seems slower than the zmq implementation. The total time for the entire program is:

Whatever post-reception processing is done is exactly the same in both zmq and Adios. So, I did some profiling on the reader's receiving loop and found that the `begin_step` call is extremely slow in our case when the mesh is as large as 1000x1000, and it takes up most of the execution time.

These are the logs of `memory_profiler` for functions within the `receive()` call, sorted by their total execution time.

I am assuming that a major chunk of the time inside `begin_step()` is due to some MPI collective calls, but it seems absurdly large (0.8 seconds per call) for RDMA. In the `SstVerbose` outputs, Adios is picking the `hfi_*` OmniPath interfaces, so we know that, at least, it chooses RDMA correctly.

I did the same run with a 100x100 mesh, and the `begin_step` time is negligible.

I am not really sure whether this is hardware-specific, but can you help us investigate what could potentially be a bottleneck inside the `begin_step` calls?

Here is our reader-side code file, `base_server.py`:
- `receive()`: the receiver loop, going over all currently active SST files until all writers have finished.
- `check_data()`: calls `begin_step()`.
- `get_data()`: calls `get()` and `end_step()`.
Extra
To calculate the communication time for each timestep, I am doing the following:
Should the `begin_step` call also be included in the communication time? zmq is a bit different in reception: we send partial timesteps and then stitch the partial data together before processing it. So, the zmq communication time starts when the first partial timestep is received and ends when the entire timestep has been stitched together.
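One way to make the question above concrete is to record the `begin_step()` interval and the `get()`/`end_step()` interval separately, then decide afterwards which to count as communication. The sketch below is a generic wall-clock phase timer; `PhaseTimer` is a hypothetical helper, and the `time.sleep` calls merely stand in for the real engine calls. Per the maintainer's explanation above, the `begin_step()` phase includes waiting for the writer, and with ADIOS2's default deferred `get()` the data is typically moved at `end_step()`, which is why `get()` and `end_step()` are grouped together here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time and call counts per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        for name, total in self.totals.items():
            print(f"{name:>10}: {total:.4f} s over {self.counts[name]} calls")


timer = PhaseTimer()
for _ in range(3):
    with timer.phase("wait"):       # around engine.begin_step(...)
        time.sleep(0.01)            # stand-in for waiting on metadata
    with timer.phase("transfer"):   # around engine.get(...) + engine.end_step()
        time.sleep(0.002)           # stand-in for moving the data
timer.report()
```

Reporting both totals would let you compare against the zmq measurement (first partial timestep to fully stitched timestep) with or without the waiting time included.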