Using x64 MSVC v19.29 on an AMD Ryzen 9 5900X (12 cores, with SMT, AMD’s Hyper-Threading equivalent, disabled), a program parallelized using only
omp_set_num_threads( 10 ); and
#pragma omp parallel for schedule( dynamic, 1 ) seems to bind the one master and nine worker threads to different cores. In the Windows Task Manager, 10 cores are 100 % utilised while two cores are more or less idle. Which cores are idle and which are utilised changes with every restart of the program, but the observed behaviour remains the same. Checking the affinity of each thread within a statically scheduled OpenMP loop using the return value of
SetThreadAffinityMask( . ) shows that all threads are allowed to run on all 12 cores.
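For reference, the affinity check described above could look roughly like the following. This is only a minimal sketch, not the exact code I ran: the helper names current_thread_affinity and collect_affinity_masks are made up for illustration, and it assumes the process affinity mask is a valid mask to set temporarily. SetThreadAffinityMask returns the previous mask of the thread, which is then restored. On non-Windows builds the helper simply returns 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the affinity mask of the calling thread,
// or 0 if it cannot be queried (always 0 on non-Windows builds).
std::uint64_t current_thread_affinity() {
#ifdef _WIN32
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return 0;
    HANDLE self = GetCurrentThread();
    // SetThreadAffinityMask returns the previous mask (0 on failure).
    DWORD_PTR previous = SetThreadAffinityMask(self, process_mask);
    if (previous != 0)
        SetThreadAffinityMask(self, previous);  // restore the original mask
    return static_cast<std::uint64_t>(previous);
#else
    return 0;
#endif
}

// Hypothetical helper: records one mask per iteration of a statically
// scheduled loop; with num_threads iterations and num_threads threads,
// each thread typically handles one iteration.
std::vector<std::uint64_t> collect_affinity_masks(int num_threads) {
    assert(num_threads > 0);
    std::vector<std::uint64_t> masks(static_cast<std::size_t>(num_threads), 0);
#ifdef _OPENMP
    omp_set_num_threads(num_threads);
#endif
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < num_threads; ++i) {
        masks[static_cast<std::size_t>(i)] = current_thread_affinity();
    }
    return masks;
}
```

Calling collect_affinity_masks( 10 ) and printing the masks in hexadecimal is enough for the check; a mask with the lowest 12 bits set (0xFFF) means the thread may run on all 12 cores.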
By contrast, with a C++17-based thread pool (in which the master thread does no work and the 10 worker threads do all of it), severe thread ping-pong between cores can be observed and there are no quasi-idle cores. Needless to say, the OpenMP variant performs significantly better.
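To make the comparison concrete, the thread-pool variant behaves roughly like the sketch below. It is a simplified stand-in, not my actual pool: parallel_for_dynamic is a hypothetical name, and the threads are spawned per call instead of being kept alive. The work distribution is the same idea as schedule( dynamic, 1 ): each worker claims the next index from a shared atomic counter, and the calling (master) thread only waits.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for i in [0, count) on num_workers
// threads, handing out one index at a time (chunk size 1).
void parallel_for_dynamic(std::size_t count, unsigned num_workers,
                          const std::function<void(std::size_t)>& body) {
    assert(num_workers > 0);
    std::atomic<std::size_t> next{0};  // next unclaimed loop index
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (unsigned w = 0; w < num_workers; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                // Claim one index; each index is handed out exactly once.
                const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= count)
                    break;
                body(i);
            }
        });
    }
    // The master thread does no work itself; it only waits for the workers.
    for (auto& t : workers)
        t.join();
}
```

A call such as parallel_for_dynamic( n, 10, body ) corresponds to the 10-worker configuration described above. Nothing in this sketch sets a thread affinity, so placement is left entirely to the Windows scheduler.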
My question is, which mechanism is used by OpenMP to cause the observed behaviour if it is not binding the threads to different cores? How can the behaviour be replicated via the Windows API?
As a side note, I am hesitant to use OpenMP since it is not guaranteed to work with C++11 or later constructs (to the best of my knowledge, and ignoring experimental settings, Visual Studio only supports OpenMP 2.0).
Source: Windows Questions C++