Category : transpose

I am trying to implement matrix transposition using shared memory. In all the examples I have come across, the programmer declares __shared__ T shared_mem[WIDTH][WIDTH], where WIDTH is usually some #define'd constant. I saw in the CUDA C programming manual that variable-sized shared memory blocks are declared using extern __shared__ T shared_mem[]. So, this is ..
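
Because the excerpt is cut off, the exact question is unknown. Presumably the sticking point is that an extern __shared__ array is one-dimensional, so the [row][col] syntax of the fixed-size examples is not available. The sketch below is host-side C++, not a CUDA kernel: it walks a matrix tile by tile and flattens the 2-D tile position by hand as row * width + col, which is the indexing a 1-D shared buffer would need. The name transpose_tiled and its parameters are made up for illustration. In real CUDA code the buffer size in bytes would be passed as the third value of the kernel launch configuration.

```cpp
#include <cstddef>
#include <vector>

// Host-side illustration only. `tile` stands in for a one-dimensional
// `extern __shared__ T tile[]` buffer, so 2-D positions are flattened by hand.
// `in` is rows x cols (row-major); `out` must already hold cols * rows elements.
template <typename T>
void transpose_tiled(const std::vector<T>& in, std::vector<T>& out,
                     std::size_t rows, std::size_t cols, std::size_t width) {
    // Size chosen at run time, as it would be at kernel launch.
    std::vector<T> tile(width * width);

    for (std::size_t by = 0; by < rows; by += width) {
        for (std::size_t bx = 0; bx < cols; bx += width) {
            // Load one tile: tile(ty, tx) = in(by + ty, bx + tx).
            for (std::size_t ty = 0; ty < width && by + ty < rows; ++ty)
                for (std::size_t tx = 0; tx < width && bx + tx < cols; ++tx)
                    tile[ty * width + tx] = in[(by + ty) * cols + (bx + tx)];

            // Store it transposed: out(bx + ty, by + tx) = tile(tx, ty).
            for (std::size_t ty = 0; ty < width && bx + ty < cols; ++ty)
                for (std::size_t tx = 0; tx < width && by + tx < rows; ++tx)
                    out[(bx + ty) * rows + (by + tx)] = tile[tx * width + ty];
        }
    }
}
```

On the CPU the intermediate tile is not needed for correctness; it is kept here only to show the flat indexing pattern.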

These are some inputs and the corresponding desired outputs:

    std::vector<std::string> v1{"aeiou", "bcdfghjk"};
    std::vector<std::string> v2{"aeiou", "bcd"};
    auto w1 = v1 | wanne_be_transpose;
    auto w2 = v2 | wanne_be_transpose;
    // w1 = {"ab","ec","id","of","ug","h","j","k"}
    // w2 = {"ab","ec","id","o","u"}

Honestly, I have no idea how to emulate it with ranges. (Even less in the general case of v1.size() > 2, ..
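
As a baseline for the desired outputs, here is a minimal sketch that uses plain loops rather than ranges. It does not provide the pipeable wanne_be_transpose adaptor the question asks for. The function name ragged_transpose is hypothetical. It handles any number of input strings of differing lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): a "ragged" transpose.
// Output string i collects the i-th character of every input string
// that is long enough to have one, in input order.
std::vector<std::string> ragged_transpose(const std::vector<std::string>& rows) {
    std::size_t longest = 0;
    for (const auto& row : rows)
        longest = std::max(longest, row.size());

    std::vector<std::string> cols(longest);
    for (const auto& row : rows)
        for (std::size_t i = 0; i < row.size(); ++i)
            cols[i].push_back(row[i]);
    return cols;
}
```

Calling ragged_transpose(v1) and ragged_transpose(v2) with the vectors from the question should yield the w1 and w2 values listed there.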

How to do tensor transpose in C++

I am porting Python code to C++ and I encountered this problem: in Python, to transpose a tensor of shape (5, 3, 10) into shape (10, 5, 3), I can use the following function: np.transpose(tensor, (2, 0, 1)). How can I do the same operation in C++ either using standard ..
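
One possible standard-library-only approach is sketched below, assuming the tensor is stored as a flat row-major buffer. The Tensor3 struct and the transpose function are hypothetical names invented for this example, not part of any library. The axes argument follows the np.transpose convention: output axis d takes its extent from input axis axes[d].

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3-D tensor: a shape plus flat row-major storage.
struct Tensor3 {
    std::array<std::size_t, 3> shape;
    std::vector<double> data;  // shape[0] * shape[1] * shape[2] elements

    double& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * shape[1] + j) * shape[2] + k];
    }
    double at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * shape[1] + j) * shape[2] + k];
    }
};

// Same convention as np.transpose(tensor, axes):
// out.shape[d] == in.shape[axes[d]].
Tensor3 transpose(const Tensor3& in, const std::array<std::size_t, 3>& axes) {
    // axes must be a permutation of {0, 1, 2}.
    assert(axes[0] < 3 && axes[1] < 3 && axes[2] < 3);
    assert(axes[0] != axes[1] && axes[0] != axes[2] && axes[1] != axes[2]);

    Tensor3 out;
    out.shape = {in.shape[axes[0]], in.shape[axes[1]], in.shape[axes[2]]};
    out.data.resize(in.data.size());

    std::array<std::size_t, 3> o{};  // index into the output tensor
    for (o[0] = 0; o[0] < out.shape[0]; ++o[0])
        for (o[1] = 0; o[1] < out.shape[1]; ++o[1])
            for (o[2] = 0; o[2] < out.shape[2]; ++o[2]) {
                std::array<std::size_t, 3> src{};  // matching input index
                for (std::size_t d = 0; d < 3; ++d)
                    src[axes[d]] = o[d];
                out.at(o[0], o[1], o[2]) = in.at(src[0], src[1], src[2]);
            }
    return out;
}
```

With axes (2, 0, 1) and an input of shape (5, 3, 10), the result has shape (10, 5, 3), and element (k, i, j) of the result equals element (i, j, k) of the input. Unlike NumPy, which returns a view, this sketch copies the data.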
