Faster alternatives to sorting std::vector

  alignment, c++, caching, cpu, hashmap

I have a hash_map with an unsinged int (user id) as the key and a vector of dates as the value.
I.e.

// key - user id, value - vector of dates from 2020
hash_map<unsigned, vector<unsigned short>>;

[1] = { 14 }            // 2020/1/14
[2] = { 18, 25, 32 }    // 2020/01/18, 2020/01/25, 2020/02/01

These values in a vector represent when a user will be notified, but there’s one rule: you cannot notify the same user more than twice (can be changed in a config) in a week. To check if I can add a new notify date to a user I search for the first date that has less than 7 days difference with the inserting one, then check if the next date in the vector has more than 7 days difference and insert a new element if true. After inserting I sort the vector. Every 24 hours I clear all elements up to today().

The question is:
I need to do something with the value type, as it’s what makes everything work slow (in average, there’re 300.000.000 changes per day, i.e. the same amount of linear searching and around 25M sorting). Do you have any ideas what structure, datatype or an algorithm to use to get rid of sort()?

Using unordered_set is a bad idea.

  1. linear search in a vector is faster if works with less than ~100 elements compared to unordered_set, and I don’t have more than 5 dates per a person usually.
  2. I can’t allow to waste 3 times more RAM to store such a structure.
  3. I can’t align unordered_set to the 64Bytes of L1 processor cache.

p.s. My hash map takes 470mb memory for the map and keys plus 304mb for the values.

Source: Windows Questions C++

LEAVE A COMMENT