Reduce "frequency" (amount of time) of tiny memory allocation "ClassC::ClassB[]::int[tiny]"

  allocation, c++, c++17, memory-management

I want to store int[] inside B and store B[] inside C.
There are also other fields that act the same way.

enter image description here

In a certain run-time situation, I know that all vector (.bf1,.bf2,.cf1,.cf2 and cs) that created in a code scope must have certain size (e.g. num_bf1,num_bf2,num_cf1,num_cf2 and num_c).

#include <iostream>
#include <vector>
class B{public:
    std::vector<int> bf1; //b's field 1   <---------
    std::vector<float> bf2;//b's field 2
    //... many of it
};
class C{public:
    std::vector<B> cf1;  // <---------
    std::vector<int> cf2;
    //... many of it
};
int main(){
   //"num_xxx" are known only after some complex algo only at run time.
   //    Their value are also different for each game time-step.
    int num_bf1=3;   // <---------
    int num_bf2=4;
    int num_cf1=2;   // <---------
    int num_cf2=8;
    int num_c=6;
    std::vector<C> cs;
    for(int m=0;m<num_c;m++){
        C c;
        for(int n=0;n<num_cf1;n++){
            B b;
            b.bf1.resize(num_bf1);   // <---------
            b.bf2.resize(num_bf2);
            c.cf1.push_back(b);      // <---------
        }
        c.cf2.resize(num_cf2);
        cs.push_back(c);  // "cs" is now finished as i wish
    }
}

However, this code lead to a bottleneck.
In real case, the memory allocation alone use ~35% of CPU time.

My poor approach

I would like to avoid it by allocate a large memory and split it.

With known values of num_xxx, I can calculate amount of int that C::B::bf1 want.

enter image description here

Therefore, I can std::vector<int> bf1; bf1.resize(num_c*num_cf1*num_bf1); .

Here is the full code :-

#include <iostream>
#include <string>
#include <vector>
#include <cmath>

class B{public:
    int* bf1; //b's field 1
    float* bf2;//b's field 2
    //... many of it
};
class C{public:
    B* cf1;
    int* cf2;
    //... many of it
};

Here is the new main :-

int main(){
    int num_bf1=3;
    int num_bf2=4;
    int num_cf1=2;
    int num_cf2=8;
    int num_c=6;
    std::vector<C> cs;
    std::vector<int> bf1;  bf1.resize(num_c*num_cf1*num_bf1);
    std::vector<float> bf2;  bf2.resize(num_c*num_cf1*num_bf2);
    std::vector<B> cf1;  cf1.resize(num_c*num_cf1);
    std::vector<int> cf2;  bf2.resize(num_c*num_cf2);
    
    for(int m=0;m<num_c;m++){
        C c;
        for(int n=0;n<num_cf1;n++){
            B* b=&(cf1[m*num_cf1+n]);
            b->bf1=&bf1[(m*num_cf1+n)*num_bf1];
            // .... something about bf2 that is hard ...
           
        }
        c.cf1=&(cf1[m*num_cf1+0]);
        // .... something about cf2 that is hard ...
        cs.push_back(c);
    }
}

This make my program much faster, but it is so hard to code, leads to a nightmare for maintainability and readability.

My poor approach V.2

I tried to use a custom stack allocator (and plan to use one-frame allocator).
The issue is that, in multithread, the mutex lock for allocator is called & blocked so often.
One solution is to create stack allocator for each thread, but as I profiled, it use too much memory.

This approach is too low level too. I think the correct solution is to fix tiny-array creation, not create a library for such bad practice(?).

Question

How to solve this tiny-array problem elegantly?

Source: Windows Questions C++

LEAVE A COMMENT