CUDA access to a matrix stored in RAM, and whether this can be implemented

  c++, cuda, numerical-computing

Recently I started working with numerical computation, solving mathematical problems numerically, programming in C++ with OpenMP. But now my problem is too big and takes days to solve even when parallelized. So I'm thinking of learning CUDA to reduce the time, but I have some doubts.

The heart of my code is the following function. The inputs are two pointers to vectors. N_mesh_points_x,y,z are pre-defined integers, weights_x,y,z are column matrices, kern_1 is an exponential function, and table_kernel is a function that accesses a pre-calculated 50 GB matrix stored in RAM.

void Kernel::paralel_iterate(std::vector<double>* K1, std::vector<double>* K2)
{
    for (int l = 0; l < N_mesh_points_x; l++){
        for (int m = 0; m < N_mesh_points_y; m++){
            for (int p = 0; p < N_mesh_points_z; p++){
                double sum_1 = 0, sum_2 = 0;

                #pragma omp parallel for schedule(dynamic) reduction(+: sum_1, sum_2)
                for (int i = 0; i < N_mesh_points_x; i++){
                    for (int j = 0; j < N_mesh_points_y; j++){
                        for (int k = 0; k < N_mesh_points_z; k++){
                            if (i != l || j != m || k != p){
                                double phir = weights_x[i] * weights_y[j] * weights_z[k]
                                            * kern_1(i, j, k, l, m, p);
                                sum_1 += phir * (*K1)[position(i, j, k)];
                                sum_2 += phir;
                            }
                        }
                    }
                }

                (*K2)[position(l, m, p)] = sum_1
                    + (table_kernel[position(l, m, p)] - sum_2) * (*K1)[position(l, m, p)];
            }
        }
    }
}
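For context, the inner triple loop (the two reductions over i, j, k) is the part that maps most naturally onto a CUDA kernel. A minimal sketch is below, assuming the weights arrays and K1 have already been copied to device memory, that kern_1 can be rewritten as a __device__ function (named kern_1_dev here, which is hypothetical), and that position(i,j,k) is a row-major linear index. atomicAdd on double requires a GPU of compute capability 6.0 or newer.

```cuda
#include <cuda_runtime.h>

// Assumed: a device-side version of the exponential kernel function.
__device__ double kern_1_dev(int i, int j, int k, int l, int m, int p);

// One thread per (i,j,k) mesh point; each thread accumulates its contribution
// into the two sums with atomicAdd. sum_1 and sum_2 must be zeroed beforehand.
__global__ void inner_sums(const double* weights_x, const double* weights_y,
                           const double* weights_z, const double* K1,
                           int Nx, int Ny, int Nz,
                           int l, int m, int p,
                           double* sum_1, double* sum_2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= Nx || j >= Ny || k >= Nz) return;
    if (i == l && j == m && k == p) return;   // skip the excluded point

    double phir = weights_x[i] * weights_y[j] * weights_z[k]
                * kern_1_dev(i, j, k, l, m, p);
    int idx = (i * Ny + j) * Nz + k;          // assumed layout of position(i,j,k)
    atomicAdd(sum_1, phir * K1[idx]);         // requires compute capability >= 6.0
    atomicAdd(sum_2, phir);
}
```

The kernel would be launched once per outer (l, m, p) point with a 3-D grid covering the mesh; a proper implementation would replace the plain atomicAdd with a block-level reduction for speed, but the sketch shows the mapping of loops to threads.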

My questions are:

  • Can I program at least the central part of this function in CUDA? I only parallelized the internal loops with OpenMP, because parallelizing all the loops gave the wrong answer.
  • The function table_kernel accesses a big matrix. The matrix is too big to be stored in my video card's memory, so it will stay in RAM. Is this a problem? Can CUDA easily access data in RAM, or does everything need to be stored in video card memory?
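On the second point, CUDA does provide ways to let kernels read host RAM directly, at the cost of going over PCIe on every access. One option is pinned, mapped ("zero-copy") host memory; a minimal sketch, assuming table_size is the number of doubles in the precomputed table (a hypothetical name), is:

```cuda
#include <cuda_runtime.h>

// Allocate the large table in pinned host memory that the GPU can map.
double* table_host = nullptr;
size_t n = table_size;  // hypothetical: number of doubles in the 50 GB table
cudaHostAlloc((void**)&table_host, n * sizeof(double), cudaHostAllocMapped);

// ... fill table_host (e.g. read the precomputed file from disk) ...

// Obtain a device-visible pointer aliasing the same host memory; kernels can
// then dereference table_dev directly, with each read crossing the PCIe bus.
double* table_dev = nullptr;
cudaHostGetDevicePointer((void**)&table_dev, table_host, 0);
```

An alternative on Pascal-or-newer GPUs is cudaMallocManaged, which allows allocations larger than device memory and pages data in on demand. Either way, since the loop above reads table_kernel only once per (l, m, p) point, keeping the table in host RAM should be workable; it is repeated, scattered access that makes zero-copy memory slow.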

Source: Windows Questions C++
