CUDA kernel stops running when reaching class constructor

  c++, cuda, pycuda

My kernel below stops running when it reaches the call to the `ray` constructor, and it returns with no error. If I add a `printf` before and after that call, only the `printf` before the call appears on the console. Can anyone explain why this is happening? My `ray` class is defined with `__host__ __device__`, so I expected it to work fine.

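```cpp
// One thread per pixel. `width` and `height` are assumed to be globals
// visible in device code (their definitions are not shown).
__global__ void init_Frame_Buffer(Vec3* Frame_Buffer, Vec3 lower_left, Vec3 horizontal, Vec3 vertical, Vec3 origin) {

    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int p = y * width + x;
    float u = (float)(x) / width;
    float v = (float)(y) / height;

    // Threads that fall outside the image (grid rounded up) do nothing.
    if (x < width && y < height) {
        // Use the kernel parameter `origin` (lowercase, as declared above).
        ray r(origin, lower_left + horizontal * u + vertical * v, false);
        r.colour() = ray_color(r);
        Frame_Buffer[p] = r.colour();
    }
}
```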
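A kernel launch is asynchronous and does not itself return an error code, so a fault that happens inside the kernel (for example in the constructor or in `ray_color`) is silently swallowed unless the host asks for it. Device-side `printf` output is also only flushed when the host synchronizes. The sketch below, assuming the kernel is launched from CUDA C++ host code, checks both launch errors and execution errors after a launch; the helper name `check_kernel` is hypothetical, while `cudaGetLastError`, `cudaDeviceSynchronize` and `cudaGetErrorString` are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: call it right after a kernel launch.
// Returns the first error found and prints it to stderr.
inline cudaError_t check_kernel(const char* name) {
    // Errors detected at launch time (bad grid/block configuration,
    // too many resources requested, ...).
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Errors raised while the kernel runs (illegal memory access and
        // similar faults) only surface once the host waits for the kernel.
        // Synchronizing also flushes the device-side printf buffer.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
    }
    return err;
}
```

For example, calling `check_kernel("init_Frame_Buffer")` immediately after the `init_Frame_Buffer<<<...>>>(...)` launch should print the CUDA error string instead of the program continuing silently. If the kernel is launched through PyCUDA instead, calling `pycuda.driver.Context.synchronize()` after the launch plays a similar role, since PyCUDA raises a Python exception when the underlying CUDA call reports an error.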
Source: Windows Questions C++