Assertions fail during MPI_Finalize()

  assertion, c++, hpc, openmpi

Hi, I am currently debugging an Open MPI (v4.0) application on an HPC cluster. During MPI_Finalize, I get the following error message on every rank:

[fh2n0328:883488:0:883488] rc_mlx5_common.c:1097 Assertion iface->cq[UCT_IB_DIR_TX].cq_ci == uct_ib_mlx5_get_cq_ci(ib_iface->cq[UCT_IB_DIR_TX]) failed

What does this assertion mean? It only fails inside MPI_Finalize. The following minimal program reproduces the failure:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();  // the assertion fails here
    return 0;
}

Note that the failure only occurs if I use more than 2 nodes. I run 20 threads per node. Furthermore, the failure does not occur if I use calls like MPI_Comm_size and MPI_Comm_rank instead of MPI_Barrier. If I replace MPI_Barrier with another MPI call that requires communication (MPI_Alltoall, MPI_Scatter, ...), the assertion fails as well.
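Because the message is printed once per rank, one way to check whether every host is affected is to parse the lines and tally them by host. Below is a minimal C++ sketch of such a parser. The struct `AssertLine` and the function `parse_assert_line` are hypothetical names introduced only for illustration, and reading the bracketed prefix as host, pid, and two thread-related numbers is an assumption based solely on the single line shown above.

```cpp
#include <regex>
#include <string>

// One parsed assertion line. Field names are illustrative (hypothetical).
// ASSUMPTION: the prefix has the form [host:pid:n:n]; the meaning of the
// last two numbers is not confirmed, so they are matched but not stored.
struct AssertLine {
    std::string host;  // e.g. "fh2n0328"
    long pid;          // second field of the bracketed prefix
    std::string file;  // source file that raised the assertion
    int line;          // line number in that source file
    std::string expr;  // the asserted expression
};

// Returns true and fills `out` if `s` matches the assumed layout:
//   [host:pid:n:n] file:line Assertion <expr> failed
bool parse_assert_line(const std::string &s, AssertLine &out) {
    static const std::regex re(
        R"re(^\[([^:\]]+):(\d+):\d+:\d+\]\s+(\S+):(\d+)\s+Assertion\s+(.*)\s+failed\s*$)re");
    std::smatch m;
    if (!std::regex_match(s, m, re)) {
        return false;  // not an assertion line in the assumed format
    }
    out.host = m[1].str();
    out.pid = std::stol(m[2].str());
    out.file = m[3].str();
    out.line = std::stoi(m[4].str());
    out.expr = m[5].str();
    return true;
}
```

Feeding each line of the job's stderr through `parse_assert_line` and counting the distinct `host` values would show whether all nodes report the assertion or only some of them.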

Source: Windows Questions C++
