Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove generated kernel duplication #589

Merged
merged 16 commits into from
Sep 12, 2023
Merged

Conversation

jorgensd
Copy link
Member

from ufl import FiniteElement, TrialFunction, TestFunction, dx, Mesh, VectorElement, triangle, FunctionSpace, inner, grad
import ufl.algorithms
import ufl.classes
mesh = Mesh(VectorElement("Lagrange", triangle, 1))
V = FunctionSpace(mesh, FiniteElement("Lagrange", triangle, 1))
u = TrialFunction(V)
v = TestFunction(V)
t = tuple([i for i in range(1000)])
a = inner(u, v)*dx(t) + inner(grad(u), grad(v))*dx(2) + inner(u, v)*dx

generates a c file with

  // Quadrature rules
  static const double weights_48e[3] = { 0.1666666666666667, 0.1666666666666667, 0.1666666666666667 };
  // Precomputed values of basis functions and precomputations
  // FE* dimensions: [permutation][entities][points][dofs]
  static const double FE3_C0_Q48e[1][1][3][3] =
    { { { { 0.6666666666666667, 0.1666666666666666, 0.1666666666666667 },
          { 0.1666666666666667, 0.1666666666666666, 0.6666666666666665 },
          { 0.1666666666666668, 0.6666666666666665, 0.1666666666666667 } } } };
  static const double FE4_C0_D10_Q48e[1][1][1][3] = { { { { -1.0, 1.0, 0.0 } } } };
  static const double FE4_C1_D01_Q48e[1][1][1][3] = { { { { -1.0, 0.0, 1.0 } } } };
  // Quadrature loop independent computations for quadrature rule 48e
  double J_c0 = 0.0;
  double J_c3 = 0.0;
  double J_c1 = 0.0;
  double J_c2 = 0.0;
  for (int ic = 0; ic < 3; ++ic)
  {
    J_c0 += coordinate_dofs[ic * 3] * FE4_C0_D10_Q48e[0][0][0][ic];
    J_c3 += coordinate_dofs[ic * 3 + 1] * FE4_C1_D01_Q48e[0][0][0][ic];
    J_c1 += coordinate_dofs[ic * 3] * FE4_C1_D01_Q48e[0][0][0][ic];
    J_c2 += coordinate_dofs[ic * 3 + 1] * FE4_C0_D10_Q48e[0][0][0][ic];
  }
  double sp_48e[4];
  sp_48e[0] = J_c0 * J_c3;
  sp_48e[1] = J_c1 * J_c2;
  sp_48e[2] = sp_48e[0] + -1 * sp_48e[1];
  sp_48e[3] = fabs(sp_48e[2]);
  for (int iq = 0; iq < 3; ++iq)
  {
    const double fw0 = sp_48e[3] * weights_48e[iq];
    double t0[3];
    for (int i = 0; i < 3; ++i)
      t0[i] = fw0 * FE3_C0_Q48e[0][0][iq][i];
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        A[3 * i + j] += FE3_C0_Q48e[0][0][iq][j] * t0[i];
  }

repeated 1000 times.
This makes the code generation slow and not scalable with an increasing number of subdomains.
Running ffcx with a 1000 subdomains takes about 3.8 seconds, while 10000 takes ~43 seconds.

This PR depends on:
FEniCS/ufl#92
and resolves
#447

The current PR with 1000 subdomains runs in 0.9 seconds and 10000 subdomains runs in 8 seconds.
The c-file in this branch is 570 lines with 10000 subdomains, while it is over 600 000 lines on the main branch

@coveralls
Copy link

coveralls commented Aug 11, 2023

Coverage Status

coverage: 79.883% (+0.03%) from 79.855% when pulling a199c5f on dokken/group_integrals_v2 into 019bdc7 on main.

@jorgensd
Copy link
Member Author

Note that there is now an implicit dependency in the generation of the offsets. The list in Line 102 of forms.py is dependent on the dolfinx::fem::IntegralType-ordering.
This means that if we change the IntegralType (either value for an entry or add more, the generation of offsets will be wrong).
This is exposed in:
2fda42c
and was introduced in DOLFINx in
https://github.com/FEniCS/dolfinx/pull/2744/files

@chrisrichardson
Copy link
Contributor

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed, though I am not quite sure how. One idea would be to provide a function (fixed code not generated). Is it worth it? I feel it is duplication - two ways of doing the same thing...

@jorgensd
Copy link
Member Author

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed,

As it is used in DOLFINx,

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed, though I am not quite sure how. One idea would be to provide a function (fixed code not generated). Is it worth it? I feel it is duplication - two ways of doing the same thing...

I see, it uses the ufcx.h file to get exterior_facet etc in one place in dolfinx/fem/utils.h (all other places in the file uses IntegralType::..., so that threw me off.
I think the way to fix this is instead of looping over strings when building the offset, it should base this on the ufch.x header enum, or smth.

@jorgensd
Copy link
Member Author

@chrisrichardson could we go along at get this merged (and the PR in ufl), as it is getting quite tedious to keep up to date with rewrites of ffcx, and I haven't seen anyone having any objections regarding this PR

Copy link
Contributor

@chrisrichardson chrisrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget CI changes before merging.

.github/workflows/dolfin-tests.yml Outdated Show resolved Hide resolved
.github/workflows/pythonapp.yml Outdated Show resolved Hide resolved
@mscroggs
Copy link
Member

Now that FEniCS/ufl#92 is merged, I think we need to merge this asap

@mscroggs mscroggs merged commit de84879 into main Sep 12, 2023
9 checks passed
@mscroggs mscroggs deleted the dokken/group_integrals_v2 branch September 12, 2023 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants