Category : gpu

I’m writing a C++ CUDA program in which, as far as I know, cudaMallocManaged(&data, size); and cudaFree(&data); are analogous to malloc/new and delete. Say I have a class, struct vec{ int size; int* data; vec(int size){ cudaMallocManaged(&data, x * y * sizeof(int)); } }; and I overload the + operator: vec operator+(vec b){ vec c; ..

Question: When a DX11 application executes Map/Unmap with D3D11_MAP_WRITE_NO_OVERWRITE, is there an efficient way to know which region of the provided buffer was modified? Situation: I hooked all DX11 calls of the game “Heroes of the Storm” and made it playable on low-end machines. https://www.reddit.com/r/heroesofthestorm/comments/g3piro/i_reprogrammed_hots_so_you_can_play_it_on_a_poor/ The game creates a 2 MB vertex buffer and a 0.5 MB index buffer for all the ..

I am trying to rewrite an algorithm parallelized with OpenMP to try out OpenMP's target (accelerator) device capabilities. I ran into the following problem (see example) when using Eigen (3.4 rc1) inside the OpenMP construct: Minimal example: #include <iostream> #include <Eigen/Eigen> #include <cmath> using Eigen::MatrixXd; int main() { int n = ..

I was trying to detect shared memory bank conflicts in matrix transposition kernels. The first kernel transposes the matrix without padding and should therefore have bank conflicts, while the second kernel uses padding and should not. However, profiling with Nsight Compute shows 0 bank conflicts in the memory workload section for both ..
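
To make the expectation in this excerpt concrete, here is a minimal host-side sketch in plain C++ (not the poster's kernels, which are not shown). It assumes 32 shared-memory banks of 4 bytes each, a 32-thread warp, and a tile of 4-byte elements that each thread reads column-wise; the names `kBanks`, `kWarpSize`, and `max_bank_multiplicity` are hypothetical. It only models which bank each thread would touch; it does not explain what the profiler reports.

```cpp
// Assumptions (not from the original post): 32 banks, 4-byte bank width,
// 32 threads per warp, 4-byte tile elements.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Thread t of a warp reads the element at tile[t][col], where consecutive
// rows are `row_stride` elements apart. Returns the largest number of
// threads that land in the same bank: 1 means conflict-free, 32 means a
// fully serialized 32-way conflict.
constexpr int max_bank_multiplicity(int row_stride, int col) {
    int hits[kBanks] = {};
    for (int t = 0; t < kWarpSize; ++t) {
        int word_index = t * row_stride + col;  // offset in 4-byte words
        ++hits[word_index % kBanks];            // bank holding that word
    }
    int worst = 0;
    for (int b = 0; b < kBanks; ++b) {
        if (hits[b] > worst) worst = hits[b];
    }
    return worst;
}
```

Under these assumptions, `max_bank_multiplicity(32, 0)` gives 32 (an unpadded 32-wide tile puts every thread of the warp in the same bank), while `max_bank_multiplicity(33, 0)` gives 1 (one element of padding per row spreads the accesses over all banks).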
