Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue Create info module for each DSO #15

Merged
merged 9 commits into from
Mar 14, 2021
Merged

Conversation

mdavidsaver
Copy link
Member

Extension of #14. Add SONAME to info file and attempt to test. The new test is currently failing on CI due to omission of the generated info file. Which is strange since my local builds with pip wheel and setup.py bdist_wheel include it. Not sure what is going on.

Local testing with

python3 -m virtualenv -p python3 env
. env/bin/activate
pip install -e .
cd example
pip wheel -v -w dist .

The last shows:

...
  adding 'dsodemo/lib/__init__.py'
  adding 'dsodemo/lib/demo_dso.py'
  adding 'dsodemo/lib/libdemo.so'
  adding 'dsodemo/lib/libdemo.so.1.0'
...

The CI logs show

...
  adding 'dsodemo/lib/__init__.py'
  adding 'dsodemo/lib/libdemo.so'
  adding 'dsodemo/lib/libdemo.so.1.0'
...

@ktbarrett

Generates an info module per DSO built with information on the original
dot separated name, the stem and full path to the DSO. The info module
also adds the directory containing the DSO to the DLL path on Windows
once the info module is loaded.

The name for the info module is configurable per DSO and defaults to
`"{stem}_dso.py"`. Setting the name to `None` turns off info moduel
generation for the DSO.

The change does not attempt to create proper package structure to ensure
the info modules are importable, that is up to the user.
@mdavidsaver mdavidsaver self-assigned this Feb 26, 2021
@ktbarrett
Copy link
Contributor

ktbarrett commented Feb 26, 2021

@mdavidsaver There is a pyproject.toml, which means pip is installing in PEP518 mode, with build isolation. setuptools_dso is picked up from PyPi, not the local install. You'll need to add --no-build-isolation to test it this way. Are you using a very old version of pip? Following those commands you posted I do not see demo_dso.py making it into the wheel in my install (up to date miniconda install on Ubuntu 20.04).

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@mdavidsaver
Copy link
Member Author

add --no-build-isolation

Thank you. This has tripped me up before... I'll have another go tomorrow.

@mdavidsaver
Copy link
Member Author

I think I have things straightened out now, and the tests are passing. (and I've cleanup up my old virtualenvs) I needed to add an additional case to the generated info file to handle py <3.8 on windows (before os.add_dll_directory()).

New local build testing recipe

python3 -m virtualenv -p python3 env
. env/bin/activate
pip update -U pip
pip download -d dist setuptools wheel
python setup.py sdist --formats=gztar
# build from source tar to check completeness
pip install -v --no-index -f dist setuptools_dso
cd example
pip install -v --no-index -f ../dist .

The added testing uses ctypes to load the SO qualified library as this is the one which should be loaded implicitly by the extension. The test is to compare a global variable address to ensure that only once instance of the DSO is loaded (important for my uses cases).

@mdavidsaver mdavidsaver marked this pull request as ready for review February 27, 2021 19:42
@mdavidsaver mdavidsaver changed the title More info file Continue Create info module for each DSO Feb 27, 2021
@mdavidsaver
Copy link
Member Author

mdavidsaver commented Feb 27, 2021

$ cat example/src/dsodemo/lib/demo_dso.py 

# generated by setuptools_dso
import sys, os

# on windows, extend DLL search path to include
# the directory containing this file
def fixpath():
    libdir = os.path.dirname(__file__)

    if hasattr(os, 'add_dll_directory'): # py >= 3.8
        os.add_dll_directory(libdir)

    elif sys.platform == "win32":
        path = os.environ.get('PATH', '').split(os.pathsep)
        path.append(libdir)
        os.environ['PATH'] = os.pathsep.join(path)

fixpath()
del fixpath

dsoname = 'dsodemo.lib.demo'
libname = 'libdemo.so'
soname = 'libdemo.so.1.0'
dir = os.path.dirname(__file__)
filename = os.path.join(dir, libname)
sofilename = os.path.join(dir, soname)
del dir
__all__ = ("dsoname", "libname", "soname", "filename", "sofilename")

@mdavidsaver
Copy link
Member Author

While not a breaking change, beginning to automatically injecting code into dependent packages is something which might surprise users (it would surprise me). I can't think of a way to communicate this other than through the version number, so this change will trigger a 2.0.

@navytux @mtikekar Any thoughts or objections?

@ktbarrett
Copy link
Contributor

We could also have this be opt-in behavior by having the name default to None? I'm happy either way.

@navytux
Copy link
Contributor

navytux commented Mar 3, 2021

@mdavidsaver, thanks for asking. Maybe I'm not understanding something correctly, but let me try to share my thoughts.

The problem we are trying to fix is this: if a DSO-A is built in project A and another project B needs to use it (by e.g. linking from DSO-B) there is a problem in how OS dynamic linker should find DSO-A when loading DSO-B. CTypes case is also related: we want to import a DSO but given py-import-like DSO path we don't know the correct OS-level path to DSO file.

If this is indeed the problem, let me share that I actually already hit it in practice with my wendelin.core + libgolang work: wendelin.core (project A) builds wendelin.wcfs.client.libwcfs DSO (https://lab.nexedi.com/kirr/wendelin.core/blob/b5fc98bb/setup.py#L312-317) which is linked to golang.runtime.libgolang (= DSO-B https://lab.nexedi.com/nexedi/pygolang/blob/4f28dddf/golang/pyx/build.py#L122-155). And when wendelin.wcfs.client.libwcfs is loaded by dynamic linker, without special preparations, the linker cannot load it because it cannot find libgolang.so that is coming from golang.runtime.libgolang.

For now I've worked it around by explicitly preloading golang.runtime.libgolang (https://lab.nexedi.com/kirr/wendelin.core/blob/b5fc98bb/bigfile/__init__.py#L24-27), but of course a more proper fix is needed.

So about the solution - my thoughs are this:

  • we are converging for a DSO to be discoverable by python import path. I would actually make it always to be like this.

  • for this to work, and importing to reuse python-level import machinery, it is indeed required (unless we want to reimplement whole import with all its corner cases) to emit a py-level file. I would suggest to call it as _<dsoname>_dsoinfo.py.

  • Then, to actually import a DSO, in general case, e.g. for DSO-B -> DSO-A case shown above, what would be required to import DSO B is:

    • import B's dsoinfo
    • see that B depends on DSO A
    • import A's dsoinfo
    • load A dso into the process address space
    • only then try to load B dso into the process address space

    because without first checking A's dsoinfo and preloading A dso, loading dso B will fail.

  • This suggests that exposing particular semantic of what is stored in _dsoinfo.py to users is asking users to implement the above process themselves.

  • Which would be more logically implemented by setuptools_dso.runtime in e.g. import_dso function.

  • That import_dso would implement the above dependency resolution process and interact with OS-level linker in OS-specific way.

  • It could also return something - e.g. a handle - that could be passed to ctypes somehow. The idea here is that after import_dso('my.dso.library') one should be able to resolve the result to a CDLL eventually with a call or lazy attribute.

  • This scheme also suggests that we shold start naming actual DSO files with their fully-qualified names, e.g. my/dso/my.dso.library.so instead of my/dso/library.so, because there would be a confusion and runtime error in case A.lib and B.lib DSOs are both used together in the same project.

Appologize for sharing it a bit unstructured. It is hard to find time for me right now to think in more details about this, but I would suggest not to rush and to contemplate this whole topic until reasonable understanding that covers all practical cases is reached.

Kirill

@ktbarrett
Copy link
Contributor

ktbarrett commented Mar 3, 2021

The problem we are trying to fix is this: if a DSO-A is built in project A and another project B needs to use it (by e.g. linking from DSO-B) there is a problem in how OS dynamic linker should find DSO-A when loading DSO-B. CTypes case is also related: we want to import a DSO but given py-import-like DSO path we don't know the correct OS-level path to DSO file.

I actually think these are two different problems. Solved the same way. But the current solution only addresses the second point: dynamically opening a library with ctypes.CDLL requires an absolute path since the path to the DSO is not known to ld. I see how the rest could be very useful, but would be a rather large change. Perhaps it should be a follow on?

Actionable comments for this PR:

I would suggest to call it as _<dsoname>_dsoinfo.py.

I agree with renaming to _{}_dsoinfo.py, we are calling them "info" modules, but it's not in the default name.

This scheme also suggests that we shold start naming actual DSO files with their fully-qualified names, e.g. my/dso/my.dso.library.so instead of my/dso/library.so, because there would be a confusion and runtime error in case A.lib and B.lib DSOs are both used together in the same project.

I'm not sure I follow this? Does the loader really consider two different objects with the same library name (e.g. test1/libA.so and test2/libA.so) to be the same library and won't load them both? This does seem like a third problem. Some kind of automated runtime system to abstract out the DSO name / handle would be useful here.

Do we need to step back and consider the whole system together? Or can we piecemeal the problem and offer some of the features today?

@mdavidsaver
Copy link
Member Author

... when wendelin.wcfs.client.libwcfs is loaded by dynamic linker, without special preparations, the linker cannot load it because it cannot find libgolang.so ...

Is this windows, and making some additional os.add_dll_directory() calls? If not, this might be worth exploring separately in an issue ticket.

... I would actually make it always to be like this.

I'm also inclined towards emitting info files by default.

... because without first checking A's dsoinfo and preloading A dso, loading dso B will fail.

I hadn't considered this. You are right though, strictly speaking this process should be recursive to make all the needed os.add_dll_directory() calls. Also worth including the dependency DSO names in the info file.

* start naming actual DSO files with their fully-qualified names, e.g. `my/dso/my.dso.library.so` instead of `my/dso/library.so`, because there would be a confusion and runtime error in case `A.lib` and `B.lib` DSOs are both used together in the same project.

I'm also a bit confused by this. Can you elaborate on a specific situation? It may be that I'm making some assumption like eg. unqualified library names (really SONAMEs) will be unique in any given dependency chain.

@mdavidsaver
Copy link
Member Author

Do we need to step back and consider the whole system together? Or can we piecemeal the problem and offer some of the features today?

No piece of software is ever done :) So I won't be waiting for perfection. My immediate concern is with external interface. If possible, I'd like to avoid having to make a breaking change in another month. Right now, I see this mainly as the generated file name. eg. my/package/_mylib_dsoinfo.py works for me, so I think this is settled. Adding additional info and code to *_dsoinfo.py can likely be done compatibly.

Copy link
Contributor

@ktbarrett ktbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably pedantic, but I prefer it.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
example/src/dsodemo/cli.py Outdated Show resolved Hide resolved
example/src/dsodemo/cli.py Outdated Show resolved Hide resolved
example/src/dsodemo/ext/__init__.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/setuptools_dso/dsocmd.py Outdated Show resolved Hide resolved
@navytux
Copy link
Contributor

navytux commented Mar 3, 2021

@mdavidsaver, @ktbarrett, thanks for feedback. My point is too about not exposing a public interface we will want to change later and it will be hard to change because the information / API is already exposed and used by users. I mean we should keep _{}_dsoinfo.py as internal detail and also keep the code that processes to load them (e.g. into ctypes.CDLL) in setuptools_dso runtime.

I'm on Linux.

Regarding fully qualified name: if a library is linked into libA.so and there are two dir1/libA.so and dir2/libA.so and have the same SONAME, the ld.so dynamic loader will use the first that is loaded into the process, or the first that is found via $LD_LIBRARY_PATH.

See e.g. here:

kirr@deco:~$ objdump -p /bin/ls|grep NEEDED
  NEEDED               libselinux.so.1
  NEEDED               libc.so.6

kirr@deco:~$ ldd /bin/ls
        linux-vdso.so.1 (0x00007ffded8d3000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fdaac4be000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdaac2f9000)
        libpcre2-8.so.0 => /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fdaac261000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdaac25b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdaac546000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdaac239000)

/bin/ls is linked to libselinux.so.1. Dynamic linker finds that at runtime at /lib/x86_64-linux-gnu/libselinux.so.1. Imagine /bin/ls being linked, or dlopening libA.so with RTLD_GLOBAL. That will make dynamic linker to remember that particular libA.so for libA name. If later dir2/libA is tried to be imported from another module by way of just adding dir2 to dynamic linker path, even though dir2 is now on path, loading libA will be satisified by already loaded dir1/libA. That's why I suggested considering making all SONAMEs unique.

In summary, I suggest:

  • introduce setuptools.import_dso function
  • it should return something with .CDLL attribute or function that provides ctypes.DLL for imported DSO
  • emit into _{}_dsoinfo.py only information about that DSO, not code.
  • the code that uses that information should be inside import_dso() function.

Then we can step-by-step incrementally adjust it for handling DSO -> DSO dependencies in the future.
At least this is what comes offhand to mind.

Hope it clarifies things a bit and sorry for in-hurry writings,
Kirill

@mdavidsaver
Copy link
Member Author

@ktbarrett Your comments above are quite specific. It you've already made the relevant changes, please push to #14.

@ktbarrett
Copy link
Contributor

@mdavidsaver Addressed. Also fixed up the test script.

@mdavidsaver
Copy link
Member Author

mdavidsaver commented Mar 6, 2021

introduce setuptools.import_dso function

I've made an attempt at this. The results are new functions setuptools_dso.prepare_dso() and .find_dso(). The former do only the os.add_dll_directory() calls, with the later also returning the DSO file name.

So example/src/dsodemo/ext/__init__.py becomes:

from setuptools_dso import prepare_dso
prepare_dso('..lib.demo')

and a dynamic lookup would be:

demolib = ctypes.CDLL(setuptools_dso.find_dso('dsodemo.lib.demo', so=True), ctypes.RTLD_GLOBAL)

@mdavidsaver
Copy link
Member Author

... are two dir1/libA.so and dir2/libA.so and have the same SONAME ...

Should these two really have the same SONAME? The convention is the identical SONAME implies ~equivalent content. While ugly, would it be workable for you to encode package name in the SONAME? eg. dir1/libA.so.0.dir1 Though this is probably only a partial solution, which won't work on windows.

@navytux
Copy link
Contributor

navytux commented Mar 11, 2021

@mdavidsaver, thanks.

While ugly, would it be workable for you to encode package name in the SONAME?

My point was that such encoding should be done automatically by setuptools_dso by naming DSO file with fully-qualified name of DSO. Then it will work universally and bridge gap in between python and dynamic linker namespaces. This is what Proposal #2 in bpo-36085 (the issue that added os.add_dll_directory) talks about).

Question: why do we need to distinguish filename and sofilename? I mean in which use-cases find_dso would be called with so=False? To me it looks reasonable to always operate on full sofilename, but maybe I'm missing something.

I also suggest to consider renaming prepare_dso to -> dylink_prepare_dso to emphasize that it configures dynamic linker.

@mdavidsaver
Copy link
Member Author

... done automatically by setuptools_dso by naming DSO file with fully-qualified name of DSO. ...

I'm having trouble articulating exactly why, but this seems a strange solution to me. Maybe I'm being overly conservative. More immediately, it would be an incompatible change, so I don't want to make this a default at present. Making this an option would be fine though. Maybe DSO('a.b.c', basename='a.b.c', ...) could create a/b/liba.b.c.so. Use of info files would make this sort of change ~transparent in future.

Question: why do we need to distinguish filename and sofilename? ...

I haven't been able to think of a good use case for a ctypes user to use so=False. That's why so=True is the default. I'm inclined to keep the option in case someone has a better imagination.

I also suggest to consider renaming prepare_dso to -> dylink_prepare_dso to emphasize that it configures dynamic linker.

Ok, I think I see the potential for confusion. The runtime is being prepared, not the DSO itself.

FYI. I'm planning to merge this PR soon, and will then upload a pre-release to pypi.org (eg. 2.0a1) to facilitate testing/changes to dependent packages. I'm also working to expand and reformat the documentation.

Since _dsoinfo.py is "standard" convention,
only allow specific override (or omission).
@mdavidsaver mdavidsaver merged commit 912a8e1 into master Mar 14, 2021
@mdavidsaver
Copy link
Member Author

Ok. 2.0a1 is uploaded, also expanded documentation.

@ktbarrett Thanks for your contribution (ideas and code).

Two final changes. Renamed prepare_dso to -> dylink_prepare_dso as suggested. DSO(..., dso_info_module_name= became DSO(..., dso_info= which does away with the string formatting to instead accept: True (emit *_dsoinfo.py), False (emit nothing), or a string to override the exact filename.

@navytux
Copy link
Contributor

navytux commented Mar 26, 2021

Thanks, @mdavidsaver; thanks @ktbarrett.

@navytux
Copy link
Contributor

navytux commented Mar 26, 2021

Is this windows, and making some additional os.add_dll_directory() calls? If not, this might be worth exploring separately in an issue ticket.

I followed up on #11 (comment)

navytux added a commit to navytux/pygolang that referenced this pull request Mar 26, 2021
setuptools_dso 2 started to emit those autogenerated files.
See epics-base/setuptools_dso#15 for details.
NexediGitlab pushed a commit to Nexedi/wendelin.core that referenced this pull request Mar 26, 2021
setuptools_dso 2 started to emit those autogenerated files.
See epics-base/setuptools_dso#15 for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants