Allow to find include files / modules in CPATH environment variable #444

Closed
wants to merge 2 commits into from


awvwgk (Member) commented Apr 16, 2021

This should make it easier for GFortran-based builds to find modules in non-standard locations.

awvwgk requested a review from milancurcic on April 16, 2021 17:24
awvwgk linked an issue on Apr 16, 2021 that may be closed by this pull request

awvwgk (Member Author) commented Apr 16, 2021

This will not resolve the issue that GFortran does not search the C compiler's default system include directories (like /usr/include), where system-provided modules might be installed. Determining the C compiler's default include directories is trickier (it requires invoking the C compiler), but guessing the paths is unreliable.

Any ideas on how to force GFortran to actually look for system-installed module files?

awvwgk added the wontfix (This will not be worked on) label on Apr 17, 2021

awvwgk (Member Author) commented Apr 17, 2021

We could try to parse the cpp include paths from the output of

❯ gfortran -E -v - < /dev/null
Using built-in specs.
COLLECT_GCC=gfortran
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC) 
COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1 -E -quiet -v - -mtune=generic -march=x86-64
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include
 /usr/local/include
 /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include-fixed
 /usr/include
End of search list.
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "<stdin>"
COMPILER_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64'

and append them manually. But at that point we are working around compiler bugs. I'll mark this as won't fix until we have established a strategy for dealing with such issues; see #447.

milancurcic (Member) left a comment

I just tried it and it works. It needed a small fix to os_is_unix().

So if others support it, I think this is a step forward.

milancurcic (Member) commented

I added 4 reviewers to this in the hope of reaching consensus on whether we should merge it or not.

certik (Member) commented Apr 18, 2021

Can you explain the use case that this is trying to fix?

There is no doubt that fpm must give the compilers (including GFortran) the correct paths to find modules, in whatever way that has to be done.

I don't think we should depend on environment variables like CPATH, FFLAGS, FC, etc., or should we? This is part of the design decision about what we want fpm to be. If users have such environment variables set and they cause fpm to break (because it finds system modules that it should not find, or uses compiler flags that it should not), I don't think that's a good design. That is why I am asking what exact use case you are trying to address here.

certik (Member) left a comment

As I explained above, it seems to me that we currently do not want to do this. But let's discuss it; I am happy to change my mind if presented with good arguments.

milancurcic (Member) commented

The use case is to tell the compiler where to find modules, without having to do it on the CLI which is possible but unwieldy.

I don't know the background of CPATH. I haven't used it for Fortran before, and I think the motivation for that name is that other compilers use it.

I think the more relevant question is whether to use env vars in fpm at all. If yes, this could be fpm-specific, like FPM_MODULES_PATH or similar.

certik (Member) commented Apr 18, 2021

I still don't understand the use case: why do you need to tell fpm where to find modules? fpm knows where all modules are because it was fpm who compiled them in the first place.

(Yes, FPM_MODULES_PATH instead of CPATH would be a better way to do it.)

milancurcic (Member) commented

This is for finding external dependencies, like HDF5 and NetCDF.

certik (Member) commented Apr 18, 2021

I see. I can see a use case where you use Spack or the module system at an HPC machine to load quite a few such dependencies and you would like to use fpm and still pick them up.

So for that FPM_MODULES_PATH seems like the way to go, or perhaps even FPM_EXTERNAL_MODULES_PATH.

ivan-pi (Member) commented Apr 18, 2021

> I still don't understand the use case: why do you need to tell fpm where to find modules? fpm knows where all modules are because it was fpm who compiled them in the first place.

As the title says, it's not just about modules but also about include files.

Take the NLopt library as an example. If I install the developer files with sudo apt-get install libnlopt-dev, the file nlopt.f is installed in the directory /usr/include. Calling Fortran code is supposed to use this file as:

include 'nlopt.f'

However, since this location is not on gfortran's default include path, I still need to add the flag -I/usr/include manually.

certik (Member) commented Apr 18, 2021

@ivan-pi we could do FPM_EXTERNAL_MODULES_PATH for compiled module files and FPM_EXTERNAL_SOURCES_PATH for your use case to find nlopt.f.

milancurcic (Member) commented

> I can see a use case where you use Spack or the module system at an HPC machine to load quite a few such dependencies and you would like to use fpm and still pick them up.

Sure, and I typically set paths manually to my own builds from source, so that's another use case.

awvwgk (Member Author) commented Apr 19, 2021

There are some subtle issues with this patch, and I also think it is trying to solve the wrong problem. The main issue here is consistency, IMO.

There are two ways Fortran sources can depend on external files:

  • using a preprocessor: #include (cpp) or #:include (fypp)
  • using the Fortran processor: include (Fortran source) or use (compiler-specific files)

For the C preprocessor cpp there is a standardized way to add directories to the search path: the CPATH environment variable. For the Fortran processor this is probably not defined, and each compiler is free to choose whether or not to use CPATH: GFortran decided not to use it, while Intel Fortran uses it extensively (check your environment after loading Intel Fortran or MKL). The primary use case in Intel's setup is locating files for Fortran include statements; the CPATH set by MKL contains no module files, only mixed-format headers with interface blocks.

Leaving aside that GFortran lacks proper documentation of this behaviour, this discrepancy makes the CPATH environment variable unusable for us unless we make the behaviour consistent ourselves. Fixing it by just reading the CPATH environment variable is also incorrect, because we would not account for the system paths. Frankly, GFortran's behaviour encourages using the C preprocessor's #include rather than Fortran's include, because the Fortran variant is much less powerful than the C variant.

Ultimately, it boils down to the issue that made us start fpm: dependencies between Fortran projects. The actual issue is that Fortran projects install their compiler-dependent module files in the system include path together with their compiler-agnostic include files. I think we all agree that this is not the way this problem should be solved.

Let's focus on the question at hand: is there a minimally painful way to let an fpm user work around a compiler that does not properly document its behaviour and a Fortran package that installs files where they don't belong? The answer so far is --flag.
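What "just reading CPATH" amounts to can be sketched as follows, in Python for brevity (fpm itself is written in Fortran, and this is not the PR's code; the function name is hypothetical). It forwards only the directories the user listed, which is why the paragraph above calls it incomplete: the compiler's built-in system paths are not included.

```python
import os
from typing import List, Optional


def cpath_include_flags(cpath: Optional[str] = None, sep: str = os.pathsep) -> List[str]:
    """Turn a CPATH-style directory list into -I flags for the Fortran compiler.

    If cpath is None, the CPATH environment variable is read. Entries are
    split like PATH (os.pathsep: ':' on Unix, ';' on Windows). Empty entries
    are skipped here, although GCC treats them as the current directory.
    """
    if cpath is None:
        cpath = os.environ.get("CPATH", "")
    flags: List[str] = []
    seen = set()
    for entry in cpath.split(sep):
        if not entry or entry in seen:
            continue  # drop empty entries and duplicates, keep first-seen order
        seen.add(entry)
        flags.append("-I" + entry)
    return flags


if __name__ == "__main__":
    # Illustrative value, not taken from the PR.
    print(cpath_include_flags("/opt/netcdf/include::/usr/include:/opt/netcdf/include", sep=":"))
    # prints ['-I/opt/netcdf/include', '-I/usr/include']
```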

certik (Member) commented Apr 19, 2021

@awvwgk does this fix your use case: #444 (comment)?

awvwgk (Member Author) commented Apr 19, 2021

I would avoid introducing new environment variables whenever possible. But the general idea of splitting the module search from the include file search is a good one. There are indeed compilers that implement this approach (at least one, I don't remember which), but it is not available from all vendors, so we can't implement it in general.

milancurcic (Member) commented

@awvwgk What's the alternative then? --flag is overly verbose and error prone. I think an env variable is still better than symlinking external module files into the project tree.

awvwgk (Member Author) commented Apr 19, 2021

> What's the alternative then?

I wish I had an (easy) answer for you.

My concern here is mainly reproducible builds. Would you consider a change in this environment variable a change in the build environment? Should it go into the build hash or not? If it does not, interesting things might happen if you change the HDF5 external module path from version 1.10 to 1.12 and do an incremental rebuild.

Also, I'm currently not seeing how this would solve the issue if I'm building with multiple compilers (different versions, different vendors).

certik (Member) commented Apr 19, 2021

Well, that's the issue with depending on external libraries. I feel using CPATH would make it even worse; at least we can control FPM_EXTERNAL_MODULES_PATH, and yes, it seems it should be part of the hash. Alternatively, you can specify FPM_EXTERNAL_MODULES_PATH on the command line.

milancurcic (Member) commented Apr 19, 2021

Reproducible builds are important.

> Would you consider a change in this environment variable a change in the build environment? Should it go into the build hash or not?

Yes, it should. And that's not enough for reproducible builds. A module file under any path (regardless of how it's passed: --flag, include-dir, or env. var) could change and break the build. That's more complex, but still feasible to solve. One solution could be to take a SHA sum of each external module file that's used and have it go into the build hash.
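The per-file SHA sum idea could be sketched like this, in Python for brevity. It assumes the caller already knows which external .mod files the build uses; `search_path` stands in for the proposed (not existing) FPM_EXTERNAL_MODULES_PATH value, the function name is hypothetical, and this is not fpm's own hashing code.

```python
import hashlib
import os
from typing import Iterable, Optional


def external_build_hash(module_files: Iterable[str], search_path: Optional[str]) -> str:
    """Hash the external-module search path together with the contents of
    every external .mod file the build uses."""
    h = hashlib.sha256()
    # The search path value is part of the build environment.
    h.update((search_path or "").encode("utf-8"))
    h.update(b"\0")
    # Sort so the digest does not depend on the order files were discovered in.
    for path in sorted(module_files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        with open(path, "rb") as f:
            # Per-file SHA sum of the module contents.
            h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        mod = os.path.join(d, "netcdf.mod")
        with open(mod, "wb") as f:
            f.write(b"version 1")
        before = external_build_hash([mod], d)
        with open(mod, "wb") as f:
            f.write(b"version 2")
        print(before != external_build_hash([mod], d))  # prints True
```

Hashing contents rather than timestamps means that swapping the path from one HDF5 version to another, or an in-place reinstall, both change the digest and trigger a rebuild.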

In that sense, current approaches to external modules are not reproducible either. But an env. variable would not make it worse in this regard, and would be better than the include-dir approach, which is (sadly) currently my best option for external modules.

> Also, I'm currently not seeing how this would solve the issue if I'm building with multiple compilers (different versions, different vendors).

This:

#export FPM_EXTERNAL_MODULES_PATH=$FPM_EXTERNAL_MODULES_PATH_GNU
export FPM_EXTERNAL_MODULES_PATH=$FPM_EXTERNAL_MODULES_PATH_INTEL

is easier to do than replacing module files in include/ every time I want to switch compilers.

ivan-pi (Member) commented Apr 20, 2021

I wonder if we are over-stating the issue of external .mod files, which might be relevant only for a small number of third-party packages (so far netcdf-fortran and HDF5 are the only examples mentioned). As @LKedward wrote in #439 (comment):

> My opinion on this is that in the short term we should provide an environment variable that allows specifying include directories to fpm for existing pre-built .mod files, and in the long term we encourage/help developers to provide proper module interfaces for their libraries that can be distributed as fpm packages.

A first step would be to add a section to the fpm documentation site, or a section in the packaging guide, about creating fpm-compatible interfaces to external libraries.

milancurcic (Member) commented

> I wonder if we are over-stating the issue of external .mod files, which might be relevant only for a small number of third-party packages (so far netcdf-fortran and HDF5 are the only examples mentioned).

It's relevant for any 3rd party package that provides a module. There may not be many of them, but those few have a huge surface area. So the number of packages is not necessarily the relevant variable. If you mean to say "I won't use it, so we shouldn't support it", that's not a helpful argument and it's dismissive of other users. The issue is overstated if you don't need it, and it's understated if you do need it. I need it. :)

Support for external modules allows me to use fpm for professional work and not just hobby toy projects. Before #438 I could only play with fpm and that's it. So I feel strongly about this feature. NetCDF is the main one. MPI is mostly taken care of thanks to its wrappers. Once I have those two working well, ESMF will be the next one. Maintaining these as fpm packages will be possible, but extremely challenging (HPC-oriented, cross-platform, multi-language, in active development i.e. a moving target, etc.). So it's not a reasonable solution in the present.

And we already support this thanks to #438. So the question now is not whether to support it, but how to improve the UI. Currently it's not good; see #452.

> A first step would be to add a section to the fpm documentation site, or a section in the packaging guide, about creating fpm-compatible interfaces to external libraries.

Yes, we should do this, and it's a long-term goal for fpm for all Fortran packages to be fpm packages. But it's a separate and orthogonal issue. Until all packages are fpm packages, there must be good support for 3rd party modules.

everythingfunctional (Member) commented

How do packages like NetCDF and HDF5 deal with the problem that *.mod files are not compatible between different compilers (and sometimes even between different versions of the same compiler)? Do they ship multiple collections of the module files, one for each supported compiler and version?

Personally, I prefer command line arguments to environment variables. I get a bit annoyed at things cluttering up my environment. Also, command-line arguments get documented in the help message, so if I forget how to supply that info, a simple fpm build --help will be enough to find it.

Also, given the (relatively) small number of external packages that may need to be supported this way, should they be supported as "features" that are turned on for a package that needs them, with users keeping a config file (maybe $HOME/.config/fpm.toml) that specifies where the installation resides for a given compiler? That seems like the easiest system for users to maintain.
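The config-file idea could look roughly like this, sketched in Python for brevity. The layout (package name mapped to compiler name mapped to an include directory), the paths, and the function name are all hypothetical; in practice the mapping would be read from a file such as the suggested $HOME/.config/fpm.toml.

```python
from typing import Dict, List, Mapping

# Hypothetical layout: package name -> compiler name -> include directory
# holding that package's .mod files built with that compiler.
ExternalConfig = Mapping[str, Mapping[str, str]]


def external_module_flags(config: ExternalConfig, packages: List[str], compiler: str) -> List[str]:
    """Return -I flags for the external packages a project enables,
    choosing the installation built with the active compiler."""
    flags: List[str] = []
    for pkg in packages:
        installs = config.get(pkg)
        if installs is None:
            raise KeyError(f"no installation configured for external package {pkg!r}")
        include_dir = installs.get(compiler)
        if include_dir is None:
            # Refuse to fall back to another compiler's .mod files.
            raise KeyError(f"{pkg!r} has no installation for compiler {compiler!r}")
        flags.append("-I" + include_dir)
    return flags


if __name__ == "__main__":
    config: Dict[str, Dict[str, str]] = {
        "netcdf": {"gfortran": "/usr/include", "ifort": "/opt/netcdf-intel/include"},
    }
    print(external_module_flags(config, ["netcdf"], "ifort"))
    # prints ['-I/opt/netcdf-intel/include']
```

Failing loudly when a package has no entry for the active compiler avoids silently picking up .mod files built by a different compiler, the incompatibility raised at the start of the comment.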

milancurcic (Member) commented

> How do packages like NetCDF and HDF5 deal with the problem that *.mod files are not compatible between different compilers (and sometimes even between different versions of the same compiler)? Do they ship multiple collections of the module files, one for each supported compiler and version?

I don't think they do at all. If you get them through a package manager, it's the package manager's responsibility. If you build them from source, it's your responsibility. I don't know what Conda or Spack do about it.

I use both the system-provided package with the system-provided gfortran, and builds from source for custom versions and other compilers.

> Personally, I prefer command line arguments to environment variables. I get a bit annoyed at things cluttering up my environment.

Try it with include paths. I did. It's pretty bad. You have to try it to really appreciate it. :)

ivan-pi (Member) commented Apr 20, 2021

> If you mean to say "I won't use it, so we shouldn't support it", that's not a helpful argument and it's dismissive of other users. The issue is overstated if you don't need it, and it's understated if you do need it. I need it. :)

I'm sorry if my comment sounded dismissive. I understand NetCDF is a widely used library relevant to potentially hundreds of Fortran projects. I am aware of circumstances around such packages (some outlined in your second paragraph) that make them a difficult target. To use the (third-party) nlopt library I currently also rely on hard-wiring the build folder location directly in my Makefiles. Before investigating pkg-config I did the same for Intel MKL. Not elegant, but it allows me to get on with my work.

I just thought that if there is only a handful of such libraries, it might be easier to focus our effort on implementing fpm support "upstream" (e.g. in a fortran-lang fork of the netcdf-fortran repository) than to design temporary "shortcuts" in fpm. After taking a closer look at the netcdf-fortran Makefile and CMake file I understand this is not a simple task (the build settings are highly configurable, with some settings depending on how the underlying C library was built, the m4 macro processor is needed, ...).

> Until all packages are fpm packages, there must be good support for 3rd party modules.

Now that I have a better grasp of your use case, I can appreciate why a (compiler-independent) way to specify module locations is needed. The next breakpoint for the transformation of third-party libraries into fpm packages will be when custom build scripts are supported (#219).

certik (Member) commented Apr 20, 2021

I don't think that @ivan-pi was dismissive. We all agree that we need to support this somehow. What we are discussing is a short term as well as a long term solution. The short term solution is to make fpm more usable for work today. But just like in #443, I don't want us to be "comfortable" with the short term solution and think we are done. Instead I want us to also design a good solution long term, which we will all reap benefits from for years to come.

milancurcic (Member) commented

> I don't think that @ivan-pi was dismissive.

Right, that's why I was careful to phrase it as "If you mean to say "I won't use it, so we shouldn't support it"". ;)

I 100% agree with what you wrote.

milancurcic (Member) commented

> ... it might be easier to focus our effort on implementing fpm support "upstream" (e.g. in a fortran-lang fork of the netcdf-fortran repository)

I strongly believe that fpm will eventually be the build system and package manager of almost all Fortran packages, and all new ones. In other words, there will be a time when it will be weird for a package to not have fpm.toml in it, like Cargo.toml in Rust.

For legacy (unmaintained) packages, we may have to maintain a fork under fortran-lang.org, which is more work.

For active packages like NetCDF or MPI, I think it would be easier to convince their developers, and collaborate with them, to adopt an fpm.toml and restructure if needed. Then we wouldn't have to maintain a fork ourselves and make sure it's synced with the original project. Convincing their developers to include and then maintain an fpm manifest may be even easier if we can point them to the large number of fpm packages that use their library as a dependency.

In that sense, it's possible that the short-term solution would help the long-term vision by developing a track record and a network effect within the fpm ecosystem.

milancurcic (Member) commented Apr 20, 2021

> I'm sorry if my comment sounded dismissive.

@ivan-pi I apologize for assuming too much about your comment and not taking it in the best light. I know that you didn't mean to be dismissive.

arjenmarkus (Member) commented

FYI, I wrote a small program (already some time ago in response to this ticket, but I seem to have forgotten to post it) that reads these pkgconfig files, including the substitution of macros. Maybe it is useful for fpm.
pkgconfig.f90.txt
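The attached Fortran program is not reproduced here. As a rough idea of what reading a pkg-config .pc file involves, here is a Python sketch (not that program; function names are hypothetical). It separates variable lines (`name=value`) from keyword lines (`Key: value`), expands `${name}` references to variables defined earlier, and then pulls `-I` directories out of `Cflags`. It is simplified: unknown variables expand to an empty string instead of raising an error, `#` always starts a comment, and only the attached `-I/dir` form is recognized.

```python
import re
import shlex
from typing import Dict, List, Tuple

# ${name} references inside variable and keyword values.
_VAR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def parse_pc(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a .pc file into variables (name=value) and keywords (Key: value)."""
    variables: Dict[str, str] = {}
    keywords: Dict[str, str] = {}

    def expand(value: str) -> str:
        # Substitute variables defined so far; unknown names become "".
        return _VAR_REF.sub(lambda m: variables.get(m.group(1), ""), value)

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        eq, colon = line.find("="), line.find(":")
        if eq != -1 and (colon == -1 or eq < colon):
            variables[line[:eq].strip()] = expand(line[eq + 1:].strip())
        elif colon != -1:
            keywords[line[:colon].strip()] = expand(line[colon + 1:].strip())
    return variables, keywords


def include_dirs(keywords: Dict[str, str]) -> List[str]:
    """Collect the directories named by attached -I options in Cflags."""
    tokens = shlex.split(keywords.get("Cflags", ""))
    return [tok[2:] for tok in tokens if tok.startswith("-I") and len(tok) > 2]
```

For an illustrative file (not copied from a real installation) containing `prefix=/opt/netcdf`, `includedir=${prefix}/include` and `Cflags: -I${includedir}`, `include_dirs(parse_pc(text)[1])` yields `["/opt/netcdf/include"]`, which could then be passed on as module search directories.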

awvwgk (Member Author) commented Jun 27, 2021

Since I don't intend to continue working on this pull request, I'm going to close it.

This branch can be resurrected from e0767b4 if needed.

Labels: wontfix (This will not be worked on)
Linked issue that this pull request may close: Consistent module / include search paths
6 participants