
Help with calling local functions inside parallel_for functions #664

Closed
dineshadepu opened this issue Aug 15, 2023 · 3 comments
Labels
question Further information is requested

Comments

@dineshadepu
Contributor

Hi,

I need some help regarding calling functions inside the parallel_for and
neighbor_parallel_for loops. I use these functions to compute the common
interaction terms in SPH, DEM, and other numerical methods.

To explain my problem in simple terms, I have created this simple example.

void compute_square_scalar(double x, double y, double * scalar_quantity){
  *scalar_quantity = x + y;
  return;
}


void compute_square_vector(double a, double *x, double *result){
  result[0] = a * x[0];
  result[1] = a * x[1];
  result[2] = a * x[2];
  return;
}



void fluid_stage_1(AoSoAType aosoa, double dt, int * limits){
  auto aosoa_pos = Cabana::slice<0>( aosoa, "aosoa_pos" );
  auto aosoa_vel = Cabana::slice<2>( aosoa, "aosoa_vel" );

  auto half_dt = dt * 0.5;

  auto fluid_stage_1_lambda_func = KOKKOS_LAMBDA( const int i )
    {
      double foo = 0.;
      compute_square_scalar(1., 2., &foo);


      double x_ij_vec[3] = {1., 2., 3.};
      double foo_vec[3] = {0., 0., 0.};
      double a = 5.;
      compute_square_vector(a, x_ij_vec, foo_vec);

      aosoa_pos(i, 0) += aosoa_vel( i, 0 ) * half_dt + foo + foo_vec[0];
      aosoa_pos(i, 1) += aosoa_vel( i, 1 ) * half_dt + foo + foo_vec[1];
      aosoa_pos(i, 2) += aosoa_vel( i, 2 ) * half_dt + foo + foo_vec[2];
    };
  Kokkos::RangePolicy<ExecutionSpace> policy( limits[0], limits[1] );
  Kokkos::parallel_for( "CabanaSPH:Integrator:FluidStage1", policy,
                        fluid_stage_1_lambda_func );
}

Here, I am calling compute_square_scalar and compute_square_vector to compute
the required quantities. These functions are dummies, but I use similar
functions in my code: I pass a vector quantity to the function and compute the
required quantity inside the function body.

The code works in serial and OpenMP mode; however, it doesn't work on the
GPU. I get the following warning:

warning #20011-D: calling a __host__ function("compute_square_vector(double, double *, double *)") from a __host__ __device__ function("fluid_stage_1( ::Cabana::AoSoA< ::Cabana::MemberTypes<double [3], int, double [3] > ,  ::Kokkos::Device< ::Kokkos::Cuda,  ::Kokkos::CudaSpace> , (int)8,  ::Kokkos::MemoryTraits<(unsigned int)0u> > , double, int *)::[lambda(int) (instance 1)]::operator () const") is not allowed

which I understand to mean that the function has to be marked as a __device__ function.

Is there any way to solve this? Looking through the documentation online, I
came across KOKKOS_INLINE_FUNCTION, but I couldn't understand it completely.
Are there any places in Cabana where local functions are called inside Kokkos
kernel bodies?

Many thanks for the help.

@dineshadepu
Contributor Author

Here is the full code for reference,

/****************************************************************************
 * Copyright (c) 2018-2022 by the Cabana authors                            *
 * All rights reserved.                                                     *
 *                                                                          *
 * This file is part of the Cabana library. Cabana is distributed under a   *
 * BSD 3-clause license. For the licensing terms see the LICENSE file in    *
 * the top-level directory.                                                 *
 *                                                                          *
 * SPDX-License-Identifier: BSD-3-Clause                                    *
 ****************************************************************************/

#include <Cabana_Core.hpp>

#include <iostream>


/*
  Start by declaring the types the particles will store. The first element
  will represent the coordinates, the second the particle's ID, and the
  third the velocity.
*/

using DataTypes = Cabana::MemberTypes<double[3], // position(0)
                                      int, // ids(1)
                                      double[3] // velocity(2)
                                      >;

/*
  Next declare the data layout of the AoSoA. We use the Cuda device space
  here for the purposes of this example, but all memory spaces, vector
  lengths, and member type configurations are compatible.
*/
const int VectorLength = 8;
// using MemorySpace = Kokkos::HostSpace;
// using ExecutionSpace = Kokkos::DefaultHostExecutionSpace;
using ExecutionSpace = Kokkos::Cuda;
using MemorySpace = ExecutionSpace::memory_space;
using DeviceType = Kokkos::Device<ExecutionSpace, MemorySpace>;
using AoSoAType = Cabana::AoSoA<DataTypes, DeviceType, VectorLength>;

// auto aosoa_position = Cabana::slice<0>     ( aosoa,    "aosoa_position");
// auto aosoa_ids = Cabana::slice<1>          ( aosoa,    "aosoa_ids");
// auto aosoa_velocity = Cabana::slice<2>     ( aosoa,    "aosoa_velocity");


void compute_square_scalar(double x, double y, double * scalar_quantity){
  *scalar_quantity = x + y;
  return;
}


void compute_square_vector(double a, double *x, double *result){
  result[0] = a * x[0];
  result[1] = a * x[1];
  result[2] = a * x[2];
  return;
}



void fluid_stage_1(AoSoAType aosoa, double dt, int * limits){
  auto aosoa_pos = Cabana::slice<0>( aosoa, "aosoa_pos" );
  auto aosoa_vel = Cabana::slice<2>( aosoa, "aosoa_vel" );

  auto half_dt = dt * 0.5;

  auto fluid_stage_1_lambda_func = KOKKOS_LAMBDA( const int i )
    {
      double foo = 0.;
      compute_square_scalar(1., 2., &foo);


      double x_ij_vec[3] = {1., 2., 3.};
      double foo_vec[3] = {0., 0., 0.};
      double a = 5.;
      compute_square_vector(a, x_ij_vec, foo_vec);

      aosoa_pos(i, 0) += aosoa_vel( i, 0 ) * half_dt + foo + foo_vec[0];
      aosoa_pos(i, 1) += aosoa_vel( i, 1 ) * half_dt + foo + foo_vec[1];
      aosoa_pos(i, 2) += aosoa_vel( i, 2 ) * half_dt + foo + foo_vec[2];
    };
  Kokkos::RangePolicy<ExecutionSpace> policy( limits[0], limits[1] );
  Kokkos::parallel_for( "CabanaSPH:Integrator:FluidStage1", policy,
                        fluid_stage_1_lambda_func );
}


void run()
{
  int comm_rank = -1;
  MPI_Comm_rank( MPI_COMM_WORLD, &comm_rank );

  if ( comm_rank == 0 )
    std::cout << "Cabana Rigid body solver example\n" << std::endl;

  auto num_particles = 16;
  AoSoAType aosoa( "particles", num_particles );
  auto aosoa_pos = Cabana::slice<0>( aosoa, "aosoa_pos" );
  auto aosoa_vel = Cabana::slice<2>( aosoa, "aosoa_vel" );
  Cabana::deep_copy( aosoa_pos, 0. );
  Cabana::deep_copy( aosoa_vel, 1. );

  auto dt = 3.;

  int fluid_limits[2] = {0, 12};
  for ( std::size_t i = 0; i < aosoa_pos.size(); ++i )
    {
      fluid_stage_1(aosoa, 2. * dt, fluid_limits);
      // std::cout << "\n";
      // std::cout << "position of " << i << " is " << aosoa_pos( i, 0) << ", " << aosoa_pos( i, 1 ) << ", " << aosoa_pos( i, 2 ) << "\n";
    }

}

int main( int argc, char* argv[] )
{

  MPI_Init( &argc, &argv );
  Kokkos::initialize( argc, argv );

  run();

  Kokkos::finalize();

  MPI_Finalize();
  return 0;
}

@streeve
Member

streeve commented Aug 15, 2023

You are correct: you need to add KOKKOS_INLINE_FUNCTION to the function signature in order to call it within a parallel kernel:

KOKKOS_INLINE_FUNCTION
void compute_square_scalar() 
{ ...

This is used in many places throughout the library.

@streeve streeve added the question Further information is requested label Aug 15, 2023
@dineshadepu
Contributor Author

I am able to make it work. This is a dream come true for me. Thank you so much :)
