thread_local static variables in a dynamically loaded library – when are they created?

  c++, dynamic-loading, thread-local-storage

cppreference states the following about thread_local variables:

The storage for the object is allocated when the thread begins and deallocated when the thread ends. Each thread has its own instance of the object. Only objects declared thread_local have this storage duration.

I am thinking of using a thread_local static member variable declared inside a shared library that is loaded at runtime via dlopen / LoadLibrary. It is entirely possible that quite a few threads are already running at the time the library is loaded, and that some of them will access the variable later. How can this work if the storage is allocated when the thread begins? If the variable does not yet exist in the program when a thread is created, this obviously cannot work as described there. Furthermore, it seems like a waste of resources if a process running, e.g., 100 threads created an instance of that thread-local variable for each of those threads when only a few of them would actually access it.
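To make the scenario concrete, here is a minimal single-program sketch (no dlopen / LoadLibrary involved) that counts how many per-thread instances actually get constructed when only some threads touch the variable. All names (`Tracked`, `per_thread_instance`, `run_scenario`) are hypothetical. Note the assumption: it uses a block-scope thread_local behind an accessor function instead of a static member, because for block-scope variables the C++ standard specifies that dynamic initialization happens the first time control passes through the declaration in a given thread. For a static member with a dynamic initializer, whether initialization is done at thread start or deferred is implementation-defined, so the count could differ there. The sketch also only observes construction, not when the underlying storage is reserved.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the type stored in the thread_local variable.
struct Tracked {
    static std::atomic<int> constructed;  // constructions across all threads
    int value = 0;
    Tracked() { constructed.fetch_add(1); }
};
std::atomic<int> Tracked::constructed{0};

// Accessor around a block-scope thread_local: each thread constructs its own
// instance the first time it calls this function, and not before.
Tracked& per_thread_instance() {
    thread_local Tracked instance;
    return instance;
}

// Starts `total_threads` threads, of which only the first `accessing_threads`
// touch the thread_local variable. Returns how many instances were constructed.
int run_scenario(int total_threads, int accessing_threads) {
    Tracked::constructed.store(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < total_threads; ++i) {
        const bool touches = i < accessing_threads;
        threads.emplace_back([touches] {
            if (touches) {
                per_thread_instance().value = 42;
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return Tracked::constructed.load();
}
```

Calling `run_scenario(8, 2)` from a thread that never calls `per_thread_instance()` itself returns 2 with this block-scope variant: the six threads that never use the variable do not construct an instance. Whether the same holds for a thread_local static member in a library loaded after those threads were started is exactly what I am unsure about.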

So, is the documentation incorrect here, or is there a chance that what I’m trying to do might lead to undefined behavior? If the documentation is simply incorrect, where can I find a reliable description of what to expect in practice? In case it’s implementation-defined, I’m particularly interested in how clang handles it on macOS and Windows.

Source: Windows Questions C++