perl 5.36 cpan | PDL build fails #93

Closed
shawnlaffan opened this issue Apr 20, 2023 · 48 comments

Comments

@shawnlaffan
Contributor

shawnlaffan commented Apr 20, 2023

This might be an issue in PDL, or one of the modules it depends upon is not working correctly.

(Edit - tested with PDL 2.082).

The niceslice tests are failing but it is the all call that is segfaulting. GDB log is:

gdb perl
(gdb) set args -Mblib -MPDL -E"my $p = pdl [6,6]; say all $p == 6;"
(gdb) run
Starting program: C:\strawberry\perl\bin\perl.exe -Mblib -MPDL -E"my $p = pdl [6,6]; say all $p == 6;"
[New Thread 25772.0x8778]
[New Thread 25772.0x7d98]
[New Thread 25772.0x65f0]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ff8c880927b in boot_PDL__Core ()
   from C:\STRAWB~2\data\.cpanm\work\1681992025.28280\PDL-2.082\blib\arch\auto\PDL\Core\Core.xs.dll

Excerpt from the end of the build log:

gmake[2]: Entering directory 'C:/STRAWB~2/data/.cpanm/work/1681992025.28280/PDL-2.082/Basic/SourceFilter'
"C:\strawberry\perl\bin\perl.exe" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, '..\..\blib\lib', '..\..\blib\arch')" t/*.t
t/niceslice-utilcall.t .. Dubious, test returned 5 (wstat 1280, 0x500)
All 6 subtests passed
t/niceslice.t ........... Dubious, test returned 5 (wstat 1280, 0x500)
All 6 subtests passed

Test Summary Report
-------------------
t/niceslice-utilcall.t (Wstat: 1280 (exited 5) Tests: 6 Failed: 0)
  Non-zero exit status: 5
  Parse errors: No plan found in TAP output
t/niceslice.t         (Wstat: 1280 (exited 5) Tests: 6 Failed: 0)
  Non-zero exit status: 5
  Parse errors: No plan found in TAP output
Files=2, Tests=12, 34 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU)
Result: FAIL
Failed 2/2 test programs. 0/12 subtests failed.
gmake[2]: *** [makefile:638: test_dynamic] Error 5
gmake[2]: Leaving directory 'C:/STRAWB~2/data/.cpanm/work/1681992025.28280/PDL-2.082/Basic/SourceFilter'
gmake[1]: *** [makefile:783: subdirs-test_dynamic] Error 2
gmake[1]: Leaving directory 'C:/STRAWB~2/data/.cpanm/work/1681992025.28280/PDL-2.082/Basic'
gmake: *** [makefile:1076: subdirs-test_dynamic] Error 2
FAIL
! Installing PDL failed. See C:\STRAWB~2\data\.cpanm\work\1681992025.28280\build.log for details. Retry with --force to force install it.
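
For anyone decoding the harness output: the `wstat` figure is the raw wait status, and the exit code sits in its high byte, so 1280 (0x500) corresponds to exit code 5. A minimal Perl sketch of that arithmetic follows; `decode_wstat` is a hypothetical helper name used only for illustration, not something provided by Test::Harness.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw wait status (the "wstat" value printed by
# the test harness) into the exit code (high byte) and signal number (low 7 bits).
sub decode_wstat {
    my ($wstat) = @_;
    return ( exit => $wstat >> 8, signal => $wstat & 127 );
}

for my $wstat (1280, 256) {
    my %s = decode_wstat($wstat);
    printf "wstat %d (0x%x): exit=%d signal=%d\n",
        $wstat, $wstat, $s{exit}, $s{signal};
}
# wstat 1280 (0x500): exit=5 signal=0
# wstat 256 (0x100): exit=1 signal=0
```

The second value (256) is the status of an ordinary test failure, which is a different outcome from the exit 5 seen above.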
@shawnlaffan
Contributor Author

PDL 2.078 does not segfault when running C:\strawberry\perl\bin\perl.exe -Mblib -MPDL -E"my $p = pdl [6,6]; say all $p == 6;". However, the niceslice tests fail for other reasons.

"C:\strawberry\perl\bin\perl.exe" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, '..\..\blib\lib', '..\..\blib\arch')" t/*.t
t/niceslice-utilcall.t .. Prototype mismatch: sub Text::Balanced::_match_variable: none vs ($$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 603.
Prototype mismatch: sub Text::Balanced::_match_codeblock: none vs ($$$$$$$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 605.
Prototype mismatch: sub Text::Balanced::_match_quotelike: none vs ($$$$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 607.
t/niceslice-utilcall.t .. ok
t/niceslice.t ........... Prototype mismatch: sub Text::Balanced::_match_variable: none vs ($$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 603.
Prototype mismatch: sub Text::Balanced::_match_codeblock: none vs ($$$$$$$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 605.
Prototype mismatch: sub Text::Balanced::_match_quotelike: none vs ($$$$) at C:\STRAWB~2\data\.cpanm\work\1682027275.31324\PDL-2.078\blib\lib/PDL/NiceSlice.pm line 607.
t/niceslice.t ........... 1/?
#   Failed test 'NiceSlice leaves strings along'
#   at t/niceslice.t line 239.
#          got: '
#   CREATE TABLE $table ->slice(
#   CHECK ( yr = $yr )
#   ) INHERITS ($schema.master_table)
#   '
#     expected: '
#   CREATE TABLE $table (
#   CHECK ( yr = $yr )
#   ) INHERITS ($schema.master_table)
#   '
# Looks like you failed 1 test of 70.
t/niceslice.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/70 subtests

@shawnlaffan
Contributor Author

shawnlaffan commented Apr 20, 2023

PDL 2.080 also seg faults.

Edit: And 2.079.

@shawnlaffan
Contributor Author

@mohawk2 - fyi. See previous posts in this issue for errors and link to Strawberry perl 5.36 for testing.

@mohawk2

mohawk2 commented Apr 25, 2023

Thank you for letting me know. This works fine on Strawberry 5.32 here with PDL 2.081_03, so if this is caused by a problem in PDL, you're going to need to build PDL with OPTIMIZE=-g and help me out with a stack-trace.

Also, please ensure that all of PDL's deps are up to date, especially Text::Balanced, which will probably be what's breaking NiceSlice.

@shawnlaffan
Contributor Author

shawnlaffan commented Apr 25, 2023

@mohawk2

I built PDL with perl Makefile.PL; gmake OPTIMIZE=-g; gmake test and then ran C:\strawberry\perl\bin\perl.exe -Mblib -MPDL -E"my $p = pdl [6,6]; say all $p == 6;"

It does not segfault and produces the expected output.

As noted in an earlier comment, but easy to miss, a perl 5.36 to test with is at https://github.com/StrawberryPerl/Perl-Dist-Strawberry/releases/tag/dev_20230420

Also, Text::Balanced is at version 2.06. The build system uses all latest modules unless earlier versions are explicitly specified.

@mohawk2

mohawk2 commented Apr 26, 2023

@shawnlaffan Is GCC 10.3 the latest that's available in MinGW? I note that GCC proper is at version 13.

@HaraldJoerg has been working hard on helping me understand why this is happening; turning optimisation off makes it work. FYI it's just the pdl_type_coerce function that appears to be malfunctioning, in pdlapi.c. A workaround might be to compile only that with optimisation turned off, which is viable since there is no mega-high-performance stuff in that file.

For context, this isn't the first time compiler bugs have affected PDL; it won't compile on BSDs which have an older clang, which crashes; and there are a few uses of volatile in the source, since GCC produces code that is wrong under valgrind without them.

@genio
Member

genio commented Apr 26, 2023

No, there are later versions available through winlibs. However, we ran into some issues with later versions and found 10.3 to be the least problematic for the moment.

@mohawk2

mohawk2 commented Apr 26, 2023

That's a real pity.

@mohawk2

mohawk2 commented Apr 26, 2023

No, there are later versions available through winlibs. However, we ran into some issues with later versions and found 10.3 to be the least problematic for the moment.

Can you say how to get a later winlibs and drop it in as a replacement for the 10.3? Also, can you give an indication of what kind of problems were encountered?

@shawnlaffan
Contributor Author

@mohawk2 - See discussion starting from #56 (comment)

@shawnlaffan
Contributor Author

We have been using 10.3 as that seems to be the latest winlibs gcc10 that does not depend on UCRT or MSVCRT.

Of course, if MSVCRT is what we want and it's just a labelling change then we can start using those (and I can adjust my levels of ignorance about such matters).

WRT GCC 11 and later, I think there are patches that have been introduced to Perl 5.37 that would need to be backported. Or maybe it was only GCC 12: https://github.com/Perl/perl5/issues?q=is%3Aissue+is%3Aclosed+%22GCC+11%22+author%3Ahakonhagland

@sisyphus

sisyphus commented Apr 27, 2023

A workaround might be to compile only that with optimisation turned off, which is viable since there is no mega-high-performance stuff in that file.

Note that the 5.36.0 GNUmakefile sets perl's optimization to -O2.
This has since been altered to -Os.
-O2 had always been fine for me until I started using gcc-12.
As it stands (in recent devel releases), the GNUmakefile just unconditionally specifies -Os.
AFAICT, -O2 is still fine for 32-bit builds or for unthreaded builds .... but not many people are looking for either of those perl configurations (especially the latter) anyway.

Of course, if MSVCRT is what we want and it's just a labelling change then we can start using those (and I can adjust my levels of ignorance about such matters).

MSVCRT (which is what mingw-w64 compilers have traditionally used) is definitely what you want.
Perl does not currently build with UCRT - see:
Perl/perl5#18772

EDIT: disregard the following remark. It occurs to me that I don't really know how well the "patched" perl-5.36.0 source will go when built using gcc-12.2.0, as I haven't actually tested:
I don't really see a good reason to not update your compiler to 12.2.0 (so long as you're prepared to apply the requisite patches to the perl-5.36.0 (and 5.36.1) sources).

With PDL-2.082, I get those niceslice failures you've mentioned on my own (patched) perl-5.36.0 (gcc-11.3.0), but the same PDL source builds and tests just fine on perl-5.37.11 (gcc-12.2.0).
BTW, it's the same version (2.06) of Text::Balanced in both cases.

Not so good with perl-5.37.11 (gcc-13.0.1) where PDL-2.082 fails to compile:

gcc -c  "-IC:/Users/Owner/.cpan/build/PDL-2.082-2/Basic/Core"  -DWIN32 -DWIN64 -D_WIN32_WINNT=0x0a00 -fdiagnostics-color=never -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -DUSE_PERLIO -D__USE_MINGW_ANSI_STDIO -fwrapv -fno-strict-aliasing -mms-bitfields -Os   -DVERSION=\"2.082\" -DXS_VERSION=\"2.082\"  "-ID:\perl-5.37.11-1301\lib\MSWin32-x64-multi-thread\CORE"  -DMY_ERFI -DMY_INFINITY -DMY_NAN -DMY_NDTRI -DMY_POLYROOTS cpoly.c
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s: Assembler messages:
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:223: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:329: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:338: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:445: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:477: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:487: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:605: Error: invalid use of operator "shr"
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:664: Error: invalid use of operator "shr"
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:749: Error: invalid use of operator "shr"
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:1126: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:1147: Error: invalid use of register
C:\Users\Owner\AppData\Local\Temp\ccalI6Yg.s:1217: Error: invalid use of register
gmake[2]: *** [Makefile:352: cpoly.o] Error 1
gmake[2]: Leaving directory 'C:/Users/Owner/.cpan/build/PDL-2.082-2/Basic/Math'
gmake[1]: *** [Makefile:535: subdirs] Error 2
gmake[1]: Leaving directory 'C:/Users/Owner/.cpan/build/PDL-2.082-2/Basic'
make.EXE: *** [Makefile:528: subdirs] Error 2
  ETJ/PDL-2.082.tar.gz
  C:\make\bin\make.EXE -- NOT OK

@mohawk2 - let me know if you want me to file a separate PDL bug report about this. (I don't think it should be discussed here .... probably should not even have been mentioned here ;-)

Cheers,
Rob

@mohawk2

mohawk2 commented Apr 27, 2023

@sisyphus Please do file a specific bug report if it will make PDL more compatible! Even better, a pull request :-)

@shawnlaffan
Contributor Author

shawnlaffan commented Apr 27, 2023

Thanks @sisyphus

Comments for general input are below.

Note that the 5.36.0 GNUmakefile sets perl's optimization to -O2.

So Strawberry perl is best built with -Os? Hopefully this is just a case of updating the GNUmakefile before building. (Edit 2 - updated link to correct location).
Perl/perl5@51634b4

And if GCC 12.2 is best avoided for now, per Rob's edited comment, then we need to decide which version to use. I assume the latest GCC 11?

Also, the current list of patches and updates for 5.36 are in the build script. If more are needed then they can be added pretty easily (whether they apply cleanly is a different question).

patch => { #DST paths are relative to the perl src root
    '<dist_sharedir>/msi/files/perlexe.ico' => 'win32/perlexe.ico',
    '<dist_sharedir>/perl-5.36/perlexe.rc.tt' => 'win32/perlexe.rc',
    '<dist_sharedir>/perl-5.36/perl_pr19663.diff' => '*',
    '<dist_sharedir>/perl-5.36/rt142390.patch' => '*',
    '<dist_sharedir>/perl-5.36/perl_pr20008.diff' => '*',
    '<dist_sharedir>/perl-5.36/perl_pr20136.patch' => '*',
    'config_H.gc' => {
        I_DBM => 'define',
        I_GDBM => 'define',
        I_NDBM => 'define',
        #HAS_BUILTIN_EXPECT => 'define',
        HAS_BUILTIN_CHOOSE_EXPR => 'define',
    },
    'config.gc' => { # see Step.pm for list of default updates
        d_builtin_choose_expr => 'define',
        #d_builtin_expect => 'define',
        d_mkstemp => 'define',
        d_ndbm => 'define',
        #d_symlink => 'undef', # many cpan modules fail tests when defined
        i_db => 'define',
        i_dbm => 'define',
        i_gdbm => 'define',
        i_ndbm => 'define',
        myuname => 'Win32 strawberry-perl 5.36.0.1 #1 Sat 04 Mar 2023 x64 tempvaluesonly',
        osvers => '10',
    },
},

Edit: The patch files themselves live under https://github.com/StrawberryPerl/Perl-Dist-Strawberry/tree/wip_536/share/perl-5.36

@sisyphus

So Strawberry perl is best built with -Os?

Yep. I think that's safest. (And updating the GNUmakefile as you indicated is all that's needed.)
The full discussion at Perl/perl5#20136 suggests that if perl is built with -O2 optimization on some version of Windows and that perl is transferred to a different version of Windows, then troubles can arise.
I've not experienced this - but then I don't do that sort of thing so I wouldn't know.
However, it sounds like that issue could be very relevant to the service being provided by Strawberry Perl.
(Personally, my only difficulty with -O2 occurs when I do multi-threaded 64-bit builds using gcc-12 or later.)

And if GCC 12.2 is best avoided for now, per Rob's edited comment,

I shouldn't really be making any suggestions about that at all.
There are too many source/compiler/runtime-version combinations that I haven't tried.
Just go with whatever is providing good mileage for you.

When it comes to 5.38.0 (next month), I feel that I can legitimately recommend 12.2.0 - having tested perl against that compiler throughout the 5.37 dev cycle. (Even then, there's a lot of "vendor" modules that I haven't built or tested.)
But for all else I should probably just shut up.

Cheers,
Rob

@shawnlaffan
Contributor Author

Thanks Rob.

Unless there are views otherwise I will have a go building extlibs and perl 5.36.0 with GCC 11.3.0 tomorrow.

Link from https://winlibs.com/ is https://github.com/brechtsanders/winlibs_mingw/releases/download/11.3.0-14.0.3-10.0.0-msvcrt-r3/winlibs-x86_64-posix-seh-gcc-11.3.0-mingw-w64msvcrt-10.0.0-r3.zip

@mohawk2

mohawk2 commented Apr 27, 2023

I'd like to know how these post-10.3 versions of GCC do compiling PDL. If they're fine, then we don't need to investigate fixing PDL anymore.

@sisyphus

sisyphus commented Apr 27, 2023 via email


I've built by simply running cpan -i PDL
With gcc-13.0.1, runtime version 11, the "gmake" step fails with the assembler problem I detailed earlier in this thread.
I'll try to get a look at the temporary file that's the source of the errors, before I submit a bug report about it.

With gcc-12.2.0, runtime version 10, all goes well with PDL - which passes the test suite and is installed without issue.
The build then tries (and fails) to build and install OpenGL-GLUT-0.72 and OpenGL-0.70.

With gcc-11.3.0, runtime version 10, PDL builds ok but the test suite stops just after it starts:

"D:\perl-5.37.11-1130\bin\MSWin32-x64-multi-thread\perl.exe" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, '..\..\blib\lib', '..\..\blib\arch')" t/*.t
t/niceslice-utilcall.t .. Dubious, test returned 5 (wstat 1280, 0x500)
All 6 subtests passed
t/niceslice.t ........... Dubious, test returned 5 (wstat 1280, 0x500)
All 6 subtests passed

Test Summary Report
-------------------
t/niceslice-utilcall.t (Wstat: 1280 (exited 5) Tests: 6 Failed: 0)
  Non-zero exit status: 5
  Parse errors: No plan found in TAP output
t/niceslice.t         (Wstat: 1280 (exited 5) Tests: 6 Failed: 0)
  Non-zero exit status: 5
  Parse errors: No plan found in TAP output
Files=2, Tests=12, 1 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Result: FAIL

I also built perl-5.37.11 with gcc-10.3.0, runtime version 9 and had a crack at building PDL-2.082 on it, too.
I got the same behaviour as I got with the gcc-11.3.0 build.

Cheers,
Rob

@HaraldJoerg

As another data point, I bisected PDL between 2.078 and 2.079. It begins to fail after commit 51a5bf87. However, this is most probably not helping: That change is in the relevant source (pdlapi.c), but the change is in function pdl_scalar which is never called before SEGV occurs.

@mohawk2

mohawk2 commented Apr 27, 2023

As another data point, I bisected PDL between 2.078 and 2.079. It begins to fail after commit 51a5bf87. However, this is most probably not helping: That change is in the relevant source (pdlapi.c), but the change is in function pdl_scalar which is never called before SEGV occurs.

That says compiler bug to me.

@shawnlaffan
Contributor Author

@sisyphus and @HaraldJoerg - did you compile PDL with OPTIMIZE=-Os? Same for the 5.36 perls.

The errors @sisyphus is seeing in the niceslice tests are the same as I get for 10.3, although I have not yet built a perl with -Os.

@sisyphus

... did you compile PDL with OPTIMIZE=-Os? Same for the 5.36 perls.

Yes, the patch to GNUmakefile was applied to the 5.37 releases a few months ago and has apparently been backported to the 5.36.1 release.
My 5.36.0 builds have OPTIMIZE=-O2 unless built using gcc-12.2.0 or later. (But I don't think I've actually got any 5.36.0 installations built with 12.2.0 or later.)

IIUC the issue with gcc-13.0.1 and PDL is probably an as.exe bug that may get fixed.
See:
https://sourceforge.net/p/mingw-w64/mailman/mingw-w64-public/thread/CADZSBj2PDJj1E64zWeWUeZ%3DrvRj7_ETbHxb6p1pBKg-rf9BJ5Q%40mail.gmail.com/#msg37836780

A workaround is provided therein - though I'll probably create an issue with PDL suggesting that any potentially clashing symbols in cpoly.c (such as "shr") be renamed.
But I need to do some testing first - to see what works, and to determine the extent of the issue.

Cheers,
Rob

@shawnlaffan
Contributor Author

Yes, the patch to GNUmakefile was applied to the 5.37 releases a few months ago and has apparently been backported to the 5.36.1 release.

It's in 5.37.x, but was not backported to 5.36.1.
https://github.com/Perl/perl5/blob/v5.36.1/win32/GNUmakefile#L607-L614

If the errors don't occur when using OPTIMIZE=-Os for both perl and PDL then life becomes simpler wrt compiler choice. Of course, hopes are easily dashed upon the rocks of reality...

@sisyphus

It's in 5.37.x, but was not backported to 5.36.1.

Right you are.
My perl-5.36.1 builds do, however, specify OPTIMIZE=-Os .... though that's because my build script specifies -Os on the command line, not because the GNUmakefile was backported.
Sorry, I obviously took a look at the wrong GNUmakefile :-(

At this stage, all of my perl-5.36.1 configurations have been built using gcc-12.2.0, runtime version 10.
With the MSWin32-x64-multi-thread configuration (OPTIMIZE= -Os), PDL-2.082 again throws up that same niceslice failure.
And, again, the running of the test suite abruptly stops.

So, we see the same PDL-2.082 test errors (niceslice) for:
perl-5.36.1, with gcc-12.2.0, rt 10, -Os
perl-5.37.11, gcc-11.3.0, rt 10, -Os
perl-5.37.11, gcc-10.3.0, rt 9, -Os

Yet gcc-12.2.0, rt 10, -Os builds and tests PDL-2.082 fine with perl-5.37.11 (apart from the subsequent failure to build OpenGL).

There's no obvious (to me) conclusion to draw from that.
I suspect that no-one has been watching PDL on any Windows version other than 5.32.x (or earlier) for the last 3 years or so - so it doesn't really surprise me that things have got a bit chaotic.
PDL is a module that has always, IME, required constant monitoring on Windows.

I'm curious to see how PDL-2.082 goes on perl-5.37.11, gcc-13.0.1, rt 11, -Os when I get past the assembler errors.
I'll try to get to that tonight.
Other than that, I think we need to get that niceslice issue sorted because it seems to afflict a whole range of compilers/runtimes for various perl versions.

Then we can maybe get a better idea of which compiler to use.

Cheers,
Rob

@mohawk2

mohawk2 commented Apr 28, 2023

I'm very happy to rename any variables that are triggering bugs. @sisyphus Please either make a PR, or open an issue (or say here) with what turns it from failing to succeeding :-)

@shawnlaffan
Contributor Author

Thanks @sisyphus for the diagnoses.

Just to double check, you report PDL as building with gcc 12.2 in #93 (comment) but was that for perl 5.37?

You note that it fails for perl 5.36.1 across gcc versions in #93 (comment)

Also, which were for the winlibs variants? That is the source we are using for Strawberry perl.

@shawnlaffan
Contributor Author

I'm very happy to rename any variables that are triggering bugs. @sisyphus Please either make a PR, or open an issue (or say here) with what turns it from failing to succeeding :-)

@mohawk2: I think it's the list in https://sourceware.org/bugzilla/show_bug.cgi?id=12240#c1 (accessed via the sourceware.org links at the end of the bug report linked to above ).

@sisyphus please correct me if this is wrong.

@sisyphus

Just to double check, you report PDL as building with gcc 12.2 in #93 (comment) but was that for perl 5.37?

Yes, for perl-5.37.11. (I've just double-checked this.)
A caveat wrt winlibs-12.2.0 is that it is, by comparison with the earlier gcc versions, quite slow at building perl and perl extensions - though gmake -j8 improves things for me.
But, if you are thinking of using 12.2.0, just pay attention to its compilation speed - and judge for yourself whether its lack of speed is an issue for you.

You note that it fails for perl 5.36.1 across gcc versions in #93 (comment)

Yes - it's the niceslice failure.
Sorry, I have no idea what's going on with that failure. Backtraces and debuggers are beyond my skill set, and I don't know where the code relating to all() is to be found.
Even if I could find that stuff, there's a good chance it would do me no good, anyway ;-)

Also, which were for the winlibs variants?

All non-13.0.1 compilers are winlibs.
For 13.0.1 I've been messing about with both Brecht Sanders' (winlibs) build and LH_Mouse's build.
Thankfully, gcc-13.1.0 has now been released, so I can stop buggerising around with 13.0.1 snapshots.
Brecht is currently building his 13.1.0 compiler packages, and LH_Mouse has already released his.

The links (that you've just posted) to my discussion with mingw-w64 developers re the compilation of PDL/Basic/Math/cpoly.c are correct.

Cheers,
Rob

@shawnlaffan
Contributor Author

shawnlaffan commented Apr 29, 2023

Thanks again Rob.

As a general discussion point for all:

It would seem gcc 13.1 is the way to go since it builds PDL? (once the issue in PDLPorters/pdl#434 has been resolved).

There are caveats. (1) Rob has not reported building 5.36.x with gcc 13, at least not in this issue? And (2) there is the risk that gcc 13 tickles some other novel bugs in the extlibs...

Edit - that also still leaves the OpenGL failures but I've been leaving them be for now as PDL is a higher priority.

@sisyphus

sisyphus commented Apr 30, 2023

It would seem gcc 13.1 is the way to go since it builds PDL?

I was thinking of it as (maybe) a candidate for 5.38, rather than for 5.36.

Use of gcc-13 to build 5.38 is already going to require patches specified in Perl/perl5#21038 and Perl/perl5#21039 - though the latter is not mandatory for 64-bit builds.

Then there's another little issue with gcc-13.1 and PDL.
The patch (including the explanation) that I used to get around it is:

> diff -u Makefile.PL_orig Makefile.PL
--- Makefile.PL_orig    2022-10-22 02:55:54.000000000 +1100
+++ Makefile.PL 2023-04-30 11:12:04.408761700 +1000
@@ -30,10 +30,19 @@

 my @possible_headers = qw( curses.h ncurses/curses.h ncurses.h ncurses/ncurses.h ncursesw/ncurses.h );
 my $found_header;
-foreach my $incl (@possible_headers) {
-    if (eval { check_lib(header=>$incl) }) {
-       $found_header = $incl;
-       last;
+# With mingw-w64 ports of gcc-13, a locatable ncurses/curses.h
+# comes into existence. But it currently defines a symbol ("instr")
+# that clashes with an identical symbol defined in perl's util.h.
+# The clash is not fatal, but it's quickly followed by undefined references
+# to some symbols - which *is* fatal.
+# Better to not include the curses.h file on mingw-w64 builds,
+# so skip the search.
+unless($^O =~ /MSWin32/i && $Config::Config{cc} =~ /\bgcc/i) {
+    foreach my $incl (@possible_headers) {
+        if (eval { check_lib(header=>$incl) }) {
+           $found_header = $incl;
+           last;
+        }
     }
 }
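
The guard added by the patch can be tried in isolation. The following is a minimal sketch, assuming the same two conditions as the patch (a Windows `$^O` and a compiler name containing "gcc"); `skip_curses_search` is a hypothetical name that does not appear in PDL's Makefile.PL.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: returns 1 when the curses header search would be
# skipped, i.e. on MSWin32 with a gcc-family compiler, else 0.
sub skip_curses_search {
    my ($os, $cc) = @_;
    return ( $os =~ /MSWin32/i && $cc =~ /\bgcc/i ) ? 1 : 0;
}

# Report what the guard would decide for the perl running this script.
printf "%s with cc=%s: %s\n", $^O, $Config{cc},
    skip_curses_search($^O, $Config{cc}) ? 'skip header search' : 'search for headers';
```

Because the match uses `\bgcc`, a prefixed compiler name such as `x86_64-w64-mingw32-gcc` is also treated as gcc.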

I was about to raise an issue on the PDL list late last, including this patch but then I thought "hang on ... I should probably first make sure that the capability for gcc-13 to resolve those undefined references does not exist."
Maybe the PDL build simply needs to provide a link to the relevant library.

In any case that nameclash between perl's util.h and gcc-13's ncurses/curses.h should be resolved - which will probably be achieved via a patch to util.h (and perhaps other files in the perl-5.37.11/5.38.0 source).
I don't know why PDL-2.082/IO/Browser/browse.c includes ncurses/curses.h ahead of the perl headers.
But if the perl headers come first (as is usually the case) the nameclash becomes fatal.

I wonder what other yet-to-be-discovered issues with gcc-13 are lurking there.

gcc-12.2 is probably a good candidate for perl-5.38 as no patching of the perl source will be needed.
(Looks like I'll be using gcc-13.1.0.)

Neither 10.3.0 nor 11.3.0 nor 12.2.0 will currently build PDL on perl-5.36.1 - without any clear notion of when or how that PDL issue will be resolved.
My feeling is that the niceslice problem lies in the older perl source, not the compiler version. (Right ? .. or did we uncover behaviour that suggested that compiler version could play a hand ?)

It looks to me that you might as well just test the waters with an official release of perl-5.36.1 (MSWin32-x64-multi-thread only), and see what issues are thrown up by users.
It seems to me that using your current compiler (10.3.0) is as good as anything - unless you want to take a punt on gcc-13.1.

Get it out there, and see what the feedback is.
In the meantime keep working on making whatever other releases you're wanting to do.

I've just embarked upon building 64-bit perl-5.36.1 (MSWin32-x64-multi-thread) with gcc-13.1, using the same 5.36.1 source as I've used for the other builds, plus the one line insertion in GNUmakefile that copies libmcfgthread-1.dll to the t directory.
I'll report back on how that goes, including the build of PDL-2.082, when it's all done.
(Not sure that this is actually relevant ... but I'm curious to know, and irrelevance has never hindered me in the past.)

Cheers,
Rob

@shawnlaffan
Contributor Author

Thanks Rob.

Your point regarding perl 5.36 and gcc-13 is well made. That said, we are already patching in some of the 5.37 code changes. If only a small number are needed, and they apply cleanly, then we can seriously consider gcc-13.1 for 5.36.

Not being able to build PDL is something of a show-stopper for me upgrading my own perl to 5.36. A lot of my code depends on PDL.

The PDL errors might be a compiler bug but the fact they manifest with three major gcc versions (10, 11, 12) makes me wonder if the problem (or solution/workaround) might lie with PDL.

@sisyphus

Well ... my build of 64-bit perl-5.36.1 using gcc-13.1.0 went fine, with all tests passing.
That didn't surprise me.
But it did surprise me that PDL-2.082 then built and tested fine (having first patched Basic/Math/Makefile.PL and IO/Browser/Makefile.PL as already detailed.)

I guess the fact that there's a problem with perl C/XS files that want to #include <ncurses/curses.h> is not really a great problem.
Since there has never before been an ncurses/curses.h file on mingw-w64 distros (AFAIK), there can't be much pre-existing code that's going to want to include that file.
We're just a bit unlucky that PDL searches for that file, and then decides that it should be used if it exists.

Anyway, I want to concentrate now on finding the correct fix to that IO/Browser problem.
Then I'll create an issue about it on the PDL repo.

BTW, winlibs now have gcc-13.1.0 available for download from the website.

Cheers,
Rob

@sisyphus

The winlibs build of gcc-13.1.0 doesn't have curses.h anywhere.
Maybe that was just extra stuff that LH_Mouse provides.
Both distros contain bin/libncursesw6.dll.

Cheers,
Rob

@shawnlaffan
Contributor Author

Thanks Rob. I've started a docker container with gcc-13.1 so we'll see how it goes.

I also have containers with gcc-11.3 and gcc-12.3 on the go building the extlibs.

@sisyphus

I just built perl-5.37.11 using winlibs 13.1.0 compiler.
It went fine - and I didn't need to do any patching at all to PDL-2.082 source.
Seems that the need to specify "-masm=att" applies only to LH_Mouse's compiler. (I certainly wasn't expecting that to be the case.)

I'm also not going to worry about the issue with curses.h (which also afflicts only LH_Mouse's build) - since Curses is not really part of the gcc toolset and Strawberry Perl doesn't ship with libcurses.
Incidentally - the undefined references I was getting re "curses", when using LH_Mouse's compiler to build PDL, arose because the build linked to libcurses.a - which is a static library, not the requisite import library.
The import lib was named libcursesw6.dll.a (or something like that), and making the build link to that library fixed the problem.
The "instr" nameclash with util.h is apparently of no consequence.

Cheers,
Rob

@shawnlaffan
Contributor Author

shawnlaffan commented Apr 30, 2023

Thanks Rob.

For general info, the extlibs build across the winlibs gcc versions (11, 12, 13) with the exception of netcdf-4.9.0 and the need to upgrade to libiconv-1.17 from 1.15 (with patches from MSYS2). I'll file issues and PRs on the extlibs repo for those.

Next is building perl 5.36.1 and the cpan libs. I'll hopefully get to that later today.

Edit - actually, libunistring also has issues with gcc-12 and gcc-13. It looks like a threads linking issue and I'll post details to an extlibs issue.

@shawnlaffan
Contributor Author

Issues on the build-extlibs repo:
netcdf: StrawberryPerl/build-extlibs#42
libunistring: StrawberryPerl/build-extlibs#43
libiconv: StrawberryPerl/build-extlibs#44

@shawnlaffan
Contributor Author

Coming back to this issue now that PDL 2.084 has been released.

The issue can be reduced to an arithmetic combination of a PDL ndarray with a perl scalar.

The first and third cases below work; the second and fourth segfault.

perl.exe -Mblib -MPDL -E "my $p=pdl(6,6); say $p; my $y = $p * pdl(6); say $y"
[6 6]
[36 36]

perl.exe -Mblib -MPDL -E "my $p=pdl(6,6); say $p; my $y = $p * 6; say $y"
[6 6]

perl.exe -Mblib -MPDL -E "my $p=pdl(6,6); say $p; my $y = $p->mult(pdl(6)); say $y"
[6 6]
[36 36]

perl.exe -Mblib -MPDL -E "my $p=pdl(6,6); say $p; my $y = $p->mult(6); say $y"
[6 6]


To my untrained eye this looks more like a bug with how PDL is handling perl scalars than a compiler issue?

@sisyphus

> To my untrained eye this looks more like a bug with how PDL is handling perl scalars than a compiler issue?

Nice reduction, Shawn.
I see the same behaviour (still using 2.082).

AIUI, PDL is building fine in current blead.
Can we therefore say that this issue:

  1. has been fixed; and
  2. is the consequence of a bug in earlier perls?

Somewhere in the range 5.36.0 .. 5.37.4, this PDL issue goes away.
Later today, I'll build the in-between versions (5.37.0, 5.37.1, 5.37.2, 5.37.3) and determine just which release "fixed" the problem.

Cheers,
Rob

PS - I have no idea if this could prove to be relevant, but 5.37.1 updated ExtUtils::CBuilder with this fix:

  - Remove image-base generation on Win32/gcc and instead use GCC's built-in
    `--enable-auto-image-base` linker option.

@shawnlaffan
Contributor Author

Thanks Rob.

The issue is definitely fixed for Perl 5.38.

It would still be very useful if there were a fix so PDL builds under 5.36, either in PDL or in perl itself (for which 5.36 is still under maint). If there is a patch for perl 5.36 then we can look at applying it for the Strawberry builds.

@shawnlaffan
Contributor Author

And FWIW, the Strawberry Perl build process checks and updates CPAN modules in core so the failing builds have all included that ExtUtils::CBuilder change. If your 5.36 builds don't run such an update then it is probably not related.

@sisyphus

Heh ... yeah, I looked at the CHANGES file for 0.280236 :-(
The CHANGES entries for 0.280237 and 0.280238 (which I don't think have been ported over to CPAN yet) are the relevant ones to check, and I don't see anything suspicious in them.

Cheers,
Rob

@sisyphus

sisyphus commented May 22, 2023

The segfault goes away when PDL is built on perl-5.37.2.
I think the problem is a manifestation of the vmem.h memory alignment issue (Perl/perl5#19824) - a patch for which was incorporated into 5.37.2.
Applying that patch to the 5.36.x source will hopefully address this problem with PDL.

The win32/vmem.h file is the same for both 5.36.0 and 5.36.1.
It's also the same for 5.37.2 and 5.37.11.

Therefore, if that's the problem, then this patch to win32/vmem.h should suffice for both perl-5.36.0 and 5.36.1:

$ diff -wu perl-5.37.1/win32/vmem.h perl-5.37.2/win32/vmem.h
--- perl-5.37.1/win32/vmem.h    2022-06-21 03:57:59.000000000 +1000
+++ perl-5.37.2/win32/vmem.h    2022-07-21 02:30:13.000000000 +1000
@@ -69,14 +69,32 @@

 #ifdef _USE_LINKED_LIST
 class VMem;
+
+/*
+ * Address an alignment issue with x64 mingw-w64 ports of gcc-12 and
+ * (presumably) later. We do the same thing again 16 lines further down.
+ * See https://github.com/Perl/perl5/issues/19824
+ */
+
+#if defined(__MINGW64__) && __GNUC__ > 11
+typedef struct _MemoryBlockHeader* PMEMORY_BLOCK_HEADER __attribute__ ((aligned(16)));
+#else
 typedef struct _MemoryBlockHeader* PMEMORY_BLOCK_HEADER;
+#endif
+
 typedef struct _MemoryBlockHeader {
     PMEMORY_BLOCK_HEADER    pNext;
     PMEMORY_BLOCK_HEADER    pPrev;
     VMem *owner;
+
+#if defined(__MINGW64__) && __GNUC__ > 11
+} MEMORY_BLOCK_HEADER __attribute__ ((aligned(16))), *PMEMORY_BLOCK_HEADER;
+#else
 } MEMORY_BLOCK_HEADER, *PMEMORY_BLOCK_HEADER;
 #endif

+#endif
+
 class VMem
 {
 public:

I'll try that out myself, shortly.

I wish I had a better memory :-(

Cheers,
Rob

UPDATE: Using the new, improved win32/vmem.h with perl-5.36.1 enabled PDL-2.048 to build just fine, for me !!

@shawnlaffan
Contributor Author

Much appreciated Rob.

I've built a 5.36.1 with that patch in place. PDL passes its tests and the reproducer runs without seg faulting.

I'll close this issue once I've pushed those changes to the wip_536 branch. A new dev release will follow soonish.

shawnlaffan added a commit that referenced this issue May 23, 2023
This gets PDL to build.

Fixes #93

Also update the libgd version so it passes tests.