Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Travis CI #322

Merged
merged 2 commits into from
Mar 18, 2021
Merged

Fix Travis CI #322

merged 2 commits into from
Mar 18, 2021

Conversation

cyrilbouvier
Copy link
Member

The Goal of the PR is to fix Travis CI. Currently, two out of the four jobs fail: https://travis-ci.org/github/linbox-team/fflas-ffpack/builds/741688422

First, commit 1569db4 should fix the homebrew problem with the OS X job (a similar changes was made in Givaro a few months ago).

The other problem is with the job using gcc 4.9 (see issue #313). To fix this issue, I see two possibilities:

  1. drop support of gcc 4.9, which is old (last release in 2016) and not supported anymore (https://gcc.gnu.org/gcc-4.9/)
  2. do not use any intrinsics not recognize by gcc 4.9.
    For example, for the first missing intrinsic _mm512_castps512_ps256, I was able to remove it by rewriting the blendv method in simd512_float.inl:
     static INLINE CONST vect_t blendv(const vect_t a, const vect_t b, const vect_t mask) {
-
-        __m256 lowa  = _mm512_castps512_ps256(a);
-        __m256 higha = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(a),1));
-
-        __m256 lowb  = _mm512_castps512_ps256(b);
-        __m256 highb = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(b),1));
-
-        __m256 lowmask  = _mm512_castps512_ps256(mask);
-        __m256 highmask = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(mask),1));
-
-        __m256 reslow = _mm256_blendv_ps(lowa, lowb, lowmask);
-        __m256 reshigh = _mm256_blendv_ps(higha, highb, highmask);
-
-        __m512 res = _mm512_castps256_ps512(reslow);
-        res = _mm512_insertf32x8(res, reshigh, 1);
-
-        return res;
+        return _mm512_mask_blend_ps (
+                    _mm512_movepi32_mask (_mm512_castps_si512 (mask)), a, b);
     }

The advantage of this new code is that it is faster (only two instructions, as the cast is here for compilation only). The main drawback is that _mm512_movepi32_mask requires AVX512DQ and not just AVX512L. But , according to issue #312 , other parts of the simd code already silently require AVX512DQ.

More work is needed to remove the other mising intrinsics if we go with option 2.

@cyrilbouvier
Copy link
Member Author

cyrilbouvier commented Nov 6, 2020

Commit 1569db4 did fix the homebrew problem with the OS X job but revealed a failure for test-fger

The problem does not seem to be specific to OS X, as I was able to reproduce it. I had one failed run of test-fger on more than 100 runs, with the following error message:

Checking Modular_implem<int64_t, uint64_t, uint64_t> modulo 165033971... FAIL
a   :132720555
m   :44, n   : 1
incx :1, incy : 1, ldC : 1
Error C[0,0]=18463374 D[0,0]=24729322
Error C[1,0]=54479037 D[1,0]=144408618
Error C[2,0]=67881147 D[2,0]=99501009
Error C[3,0]=84658322 D[3,0]=141056375
Error C[4,0]=35823758 D[4,0]=44270387
Error C[5,0]=35369714 D[5,0]=15002300
Error C[6,0]=114908161 D[6,0]=42428348
Error C[7,0]=53084340 D[7,0]=15942759
Error C[8,0]=94996110 D[8,0]=104641692
Error C[9,0]=39583949 D[9,0]=143044872
Error C[10,0]=140921121 D[10,0]=136917392
Error C[11,0]=43433855 D[11,0]=74462683
Error C[12,0]=143759473 D[12,0]=89899069
Error C[13,0]=96683823 D[13,0]=75362525
Error C[14,0]=28275651 D[14,0]=78298271
Error C[15,0]=149033254 D[15,0]=138700481
Error C[16,0]=126252883 D[16,0]=66669534
Error C[17,0]=28226674 D[17,0]=51199424
Error C[18,0]=81191980 D[18,0]=144124893
Error C[19,0]=69444028 D[19,0]=61800979
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
FAILED 

Edit: I was able to find a seed that trigger the error:

./test-fger -s 355057717

Edit 2: It seems this issue is already under investigation: see issue #307 and PR #289

@cyrilbouvier
Copy link
Member Author

A more exhaustive list of intrinsics not recognized by gcc 4.9:

  • _mm512_castps512_ps256: see original post
  • _mm512_mullo_epi64: used in mullo for Simd512<int64>. Another problem is that it requires AVX512DQ but is only protected by the AVX512F macro (see fflas-ffpack cannot compile with only AVX512F #312)
  • _mm512_castpd256_pd512: used in blendv and hadd for Simd512<double>
  • _mm512_castpd512_pd256: used in blendv and hadd for Simd512<double>
  • _mm512_castps256_ps512: used in blendv and hadd for Simd512<float>
  • _mm512_castps_pd: used in hadd for Simd512<float>
  • _mm512_castsi512_pd: used in eq, lesser, lesser_eq, greater, greater_eq for Simd512<double>
  • _mm512_castsi512_ps: used in eq, lesser, lesser_eq, greater, greater_eq for Simd512<float>
  • _mm512_cmpgt_epu64_mask: used in greater for Simd512<int64>
  • _mm512_cvtepi64_pd: used in mod for Simd512<int64>. Another problem is that it requires AVX512DQ but is only protected by the AVX512F macro (see fflas-ffpack cannot compile with only AVX512F #312)
  • _mm512_cvtpd_epi64: used in mod for Simd512<int64>. Another problem is that it requires AVX512DQ but is only protected by the AVX512F macro (see fflas-ffpack cannot compile with only AVX512F #312)
  • _mm512_insertf32x8: used in hadd for for Simd512<float>. Another problem is that it requires AVX512DQ but is only protected by the AVX512F macro (see fflas-ffpack cannot compile with only AVX512F #312)

Error message: The job exceeded the maximum log length, and has been terminated.
@ClementPernet
Copy link
Member

OK to merge this PR as is in order to fix the hombrew problem.
Regarding the support for 4.9 as minimal gcc version, it seems that the numerous avx512 undefined intrinsics is a strong enough argument to move to gcc-5 as minimal supported gcc: we should not waste our time writing workarounds. Still the one you propose deserves its own PR, if it improves the current implementation.
The fger bug is already mentionned in #307 and need our attention!

@ClementPernet
Copy link
Member

LGTM . Merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants