Is there a workaround fix for disastrously slow VBA7/x64 DLL calling?

  api, excel, profiling, vba, windows

I need to use the DLL calling feature from VBA, and I have recently realized there is a big issue with VBA7/x64 (VBA7/x32 may also be affected, but I do not have it installed, so I cannot verify). I use Excel 2002/VBA6/x32 as my primary dev environment, but I also run tests under Excel 2016/VBA7/x64. I came across a poor performance of the Windows API calls under the VBA7/x64 environment and started looking into the issue further.

C fixtures

To quantify the loss of performance, I have prepared a C-dll with several test stubs:

/*
** memtools.c
*/
#include "memtools.h"

// Volatile loop counter should be used here to prevent optimization.
MEMTOOLSAPI int MEMTOOLSCALL PerfGauge(unsigned int ForCount) {
  struct timeb start, end;
  ftime(&start);
  for (volatile unsigned int i=0; i < ForCount; i++) {
    ;
  }
  ftime(&end);
  const int MSEC_IN_SEC = 1000;
  int diff;
  diff = MSEC_IN_SEC * (end.time - start.time)
                     + (end.millitm - start.millitm);
  return diff;
}

MEMTOOLSAPI void MEMTOOLSCALL DummySub0Args() {
  return;
}

MEMTOOLSAPI void MEMTOOLSCALL DummySub3Args(void* Destination, const void* Source, size_t Length) {
  return;
}

MEMTOOLSAPI int MEMTOOLSCALL DummyFnc0Args() {
  volatile int Result = 10241024;
  return Result;
}

MEMTOOLSAPI int MEMTOOLSCALL DummyFnc3Args(void* Destination, const void* Source, size_t Length) {
  volatile int Result = 10241024;
  return Result;
}

The PerfGauge routine acts as a performance reference, timing an empty For loop within a C-routine. The remaining four stubs are called either from the VBA code or from a C client for comparison purposes. I have a basic C-client:

/*
** memtoolsclient.c
*/
#include "memtools.h"

void DummySub0ArgsGauge();
void DummySub3ArgsGauge();
void DummyFnc0ArgsGauge();
void DummyFnc3ArgsGauge();

int main(int argc, char** argv) { 
  DummySub0ArgsGauge();
  DummySub3ArgsGauge();
  DummyFnc0ArgsGauge();
  DummyFnc3ArgsGauge();
  return 0;
}

void DummySub0ArgsGauge() {
  void (*volatile MEMTOOLSCALL pDummySub0Args)();
  pDummySub0Args = DummySub0Args;

  struct timeb start, end;

  ftime(&start);
  for (volatile int i=0; i < 1e9; i++) {
    pDummySub0Args(); 
  }
  ftime(&end);

  const int MSEC_IN_SEC = 1000;
  int diff;
  diff = MSEC_IN_SEC * (end.time - start.time)
                     + (end.millitm - start.millitm);

  printf("nDummySub0Args - 1e9 times - %u millisecondsn", diff);
}


void DummySub3ArgsGauge() {
  char Src[] = "ABCDEFGHIJKLMNOPGRSTUVWXYZABCDEFGHIJKLMNOPGRSTUVWXYZ";
  char Dst[255];
  size_t SrcLen = sizeof(Src);

  struct timeb start, end;

  ftime(&start);
  for (volatile int i=0; i < 1e9; i++) {
    DummySub3Args(Dst, Src, SrcLen);
  }
  ftime(&end);

  const int MSEC_IN_SEC = 1000;
  int diff;
  diff = MSEC_IN_SEC * (end.time - start.time)
                     + (end.millitm - start.millitm);

  printf("nDummySub3Args - 1e9 times - %u millisecondsn", diff);
}


void DummyFnc0ArgsGauge() {
  int Result __attribute__((unused));
  struct timeb start, end;

  ftime(&start);
  for (volatile int i=0; i < 1e9; i++) {
    Result = DummyFnc0Args(); 
  }
  ftime(&end);

  const int MSEC_IN_SEC = 1000;
  int diff;
  diff = MSEC_IN_SEC * (end.time - start.time)
                     + (end.millitm - start.millitm);

  printf("nDummyFnc0Args - 1e9 times - %u millisecondsn", diff);
}


void DummyFnc3ArgsGauge() {
  char Src[] = "ABCDEFGHIJKLMNOPGRSTUVWXYZABCDEFGHIJKLMNOPGRSTUVWXYZ";
  char Dst[255];
  size_t SrcLen = sizeof(Src);

  int Result __attribute__((unused));
  struct timeb start, end;

  ftime(&start);
  for (volatile int i=0; i < 1e9; i++) {
    Result = DummyFnc3Args(Dst, Src, SrcLen);
  }
  ftime(&end);

  const int MSEC_IN_SEC = 1000;
  int diff;
  diff = MSEC_IN_SEC * (end.time - start.time)
                     + (end.millitm - start.millitm);

  printf("nDummyFnc3Args - 1e9 times - %u millisecondsn", diff);
}

and the header:

/*
** memtools.h
*/
#include <stdlib.h>
#include <stdint.h> 
#include <math.h> 
#include <stdio.h>
#include <time.h> 
#include <string.h> 

#ifndef MEMTOOLS_H
#define MEMTOOLS_H

#ifdef _WIN32

  /* You should define MEMTOOLS_EXPORTS *only* when building the DLL. */
  #ifdef MEMTOOLS_EXPORTS
    #define MEMTOOLSAPI __declspec(dllexport)
  #else
    #define MEMTOOLSAPI __declspec(dllimport)
  #endif

  /* Define calling convention in one place, for convenience. */
  #define MEMTOOLSCALL __stdcall

#else /* _WIN32 not defined. */

  /* Define with no value on non-Windows OSes. */
  #define MEMTOOLSAPI
  #define MEMTOOLSCALL

#endif /* _WIN32 */

/* Make sure functions are exported with C linkage under C++ compilers. */
#ifdef __cplusplus
extern "C" {
#endif

MEMTOOLSAPI  int MEMTOOLSCALL PerfGauge(unsigned int ForCount);
MEMTOOLSAPI void MEMTOOLSCALL DummySub0Args();
MEMTOOLSAPI void MEMTOOLSCALL DummySub3Args(void*, const void*, size_t);
MEMTOOLSAPI  int MEMTOOLSCALL DummyFnc0Args();
MEMTOOLSAPI  int MEMTOOLSCALL DummyFnc3Args(void*, const void*, size_t);

#ifdef __cplusplus
} // extern "C"
#endif

#endif /* MEMTOOLS_H */

The C code has been compiled with MSYS/MinGW toolchains on Windows with no optimization (-O0).

VBA fixtures

In VBA, I have created a class with test fixtures:

'''' ===== DllPerfLib class ===== ''''
Option Explicit

Private Const LIB_NAME As String = "DllManager"
Private Const PATH_SEP As String = ""
Private Const LIB_RPREFIX As String = _
    "Library" & PATH_SEP & LIB_NAME & PATH_SEP & _
    "Demo - DLL - STDCALL and Adapter" & PATH_SEP

Public Enum TargetTypeEnum
    TARGET_DLL = 0&
    TARGET_VBA = 1&
End Enum

#If WIN64 Then
Private Declare PtrSafe Sub DummySub0Args Lib "MemToolsLib" ()
Private Declare PtrSafe Sub DummySub3Args Lib "MemToolsLib" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)
Private Declare PtrSafe Function DummyFnc0Args Lib "MemToolsLib" () As Long
Private Declare PtrSafe Function DummyFnc3Args Lib "MemToolsLib" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long) As Long
Private Declare PtrSafe Function PerfGauge Lib "MemToolsLib" (ByVal ForCount As Long) As Long
#Else
Private Declare Sub DummySub0Args Lib "MemToolsLib" ()
Private Declare Sub DummySub3Args Lib "MemToolsLib" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)
Private Declare Function DummyFnc0Args Lib "MemToolsLib" () As Long
Private Declare Function DummyFnc3Args Lib "MemToolsLib" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long) As Long
Private Declare Function PerfGauge Lib "MemToolsLib" (ByVal ForCount As Long) As Long
#End If

Private Type TDllPerfLib
    DllMan As DllManager
    DummyForCount As Long
    GaugeForCount As Long
    PrintToImmediate As Boolean
    Src() As Byte
    Dst() As Byte
    SrcLen As Long
End Type
Private this As TDllPerfLib


Public Function Create( _
            Optional ByVal DummyForCount As Long = 10000000, _
            Optional ByVal GaugeForCount As Long = 10000000) As DllPerfLib
    Dim Instance As DllPerfLib
    Set Instance = New DllPerfLib
    Instance.Init DummyForCount, GaugeForCount
    Set Create = Instance
End Function

Friend Sub Init(Optional ByVal DummyForCount As Long = 10000000, _
                Optional ByVal GaugeForCount As Long = 10000000)
    With this
        .DummyForCount = DummyForCount
        .GaugeForCount = GaugeForCount
        .PrintToImmediate = True
    End With
    Set this.DllMan = DllManager.Singleton
    If DllManager.Singleton Is Nothing Then LoadDlls

    this.Src = "ABCDEFGHIJKLMNOPGRSTUVWXYZ"
    this.Dst = String(255, "_")
    this.SrcLen = (UBound(this.Src) - LBound(this.Src) + 1 + Len(vbNullChar)) * 2
End Sub

Private Sub Class_Terminate()
    UnLoadDlls
End Sub

Public Property Get DummyForCount() As Long
    DummyForCount = this.DummyForCount
End Property

Public Property Let DummyForCount(ByVal Value As Long)
    this.DummyForCount = Value
End Property

Public Property Get GaugeForCount() As Long
    GaugeForCount = this.GaugeForCount
End Property

Public Property Let GaugeForCount(ByVal Value As Long)
    this.GaugeForCount = Value
End Property

Public Sub TogglePrint()
    this.PrintToImmediate = Not this.PrintToImmediate
End Sub

Private Sub LoadDlls()
    Dim DllPath As String
    DllPath = ThisWorkbook.Path & PATH_SEP & LIB_RPREFIX & "memtools" & ARCH
    Dim DllName As String
    DllName = "MemToolsLib.dll"
    Set this.DllMan = DllManager.Create(DllPath, DllName, True)
End Sub

Private Sub UnLoadDlls()
    this.DllMan.ForgetSingleton
    this.DllMan.FreeMultiple
    Set this.DllMan = Nothing
End Sub

Public Function PerfGaugeGet(Optional ByVal GaugeForCount As Long = -1) As Long
    Dim GaugeMax As Long
    GaugeMax = IIf(GaugeForCount > 0, GaugeForCount, this.GaugeForCount)
    Dim TimeDiffMs As Long
    TimeDiffMs = PerfGauge(GaugeMax)
    If this.PrintToImmediate Then
        Debug.Print "PerfGauge" & ":" & " - " & Format$(GaugeMax, "#,##0") & _
            " times in " & TimeDiffMs & " ms"
    End If
    PerfGaugeGet = TimeDiffMs
End Function

Public Function Sub0ArgsDLLVBA(Optional ByVal DummyForCount As Long = -1, _
            Optional ByVal TargetType As TargetTypeEnum = TARGET_DLL) As Long
    Dim DummyMax As Long
    DummyMax = IIf(DummyForCount > 0, DummyForCount, this.DummyForCount)
    Dim CycleIndex As Long
    Dim Start As Single
    Start = Timer
    If TargetType = TARGET_DLL Then
        For CycleIndex = 0 To DummyMax
            DummySub0Args
        Next CycleIndex
    Else
        For CycleIndex = 0 To DummyMax
            DummySub0ArgsVBA
        Next CycleIndex
    End If
    Dim TimeDiffMs As Long
    TimeDiffMs = Round((Timer - Start) * 1000, 0)
    Dim Source As String
    Source = "Sub0ArgsDLLVBA/" & Array("DLL", "VBA")(TargetType)
    If this.PrintToImmediate Then
        Debug.Print Source & ":" & " - " & Format$(DummyMax, "#,##0") _
            & " times in " & TimeDiffMs & " ms"
    End If
    Sub0ArgsDLLVBA = TimeDiffMs
End Function

Public Function Sub3ArgsDLLVBA(Optional ByVal DummyForCount As Long = -1, _
            Optional ByVal TargetType As TargetTypeEnum = TARGET_DLL) As Long
    Dim Src() As Byte
    Src = this.Src
    Dim Dst() As Byte
    Dst = this.Dst
    Dim SrcLen As Long
    SrcLen = this.SrcLen
    Dim DummyMax As Long
    DummyMax = IIf(DummyForCount > 0, DummyForCount, this.DummyForCount)
    Dim CycleIndex As Long
    Dim Start As Single
    Start = Timer
    If TargetType = TARGET_DLL Then
        For CycleIndex = 0 To DummyMax
            DummySub3Args Dst(0), Src(0), SrcLen
        Next CycleIndex
    Else
        For CycleIndex = 0 To DummyMax
            DummySub3ArgsVBA Dst(0), Src(0), SrcLen
        Next CycleIndex
    End If
    Dim TimeDiffMs As Long
    TimeDiffMs = Round((Timer - Start) * 1000, 0)
    If this.PrintToImmediate Then
        Debug.Print "Sub3ArgsDLLVBA" & ":" & " - " & Format$(DummyMax, "#,##0") _
            & " times in " & TimeDiffMs & " ms"
    End If
    Sub3ArgsDLLVBA = TimeDiffMs
End Function

Public Function Fnc0ArgsDLLVBA(Optional ByVal DummyForCount As Long = -1, _
            Optional ByVal TargetType As TargetTypeEnum = TARGET_DLL) As Long
    Dim Result As Long
    Dim DummyMax As Long
    DummyMax = IIf(DummyForCount > 0, DummyForCount, this.DummyForCount)
    Dim CycleIndex As Long
    Dim Start As Single
    Start = Timer
    If TargetType = TARGET_DLL Then
        For CycleIndex = 0 To DummyMax
            Result = DummyFnc0Args
        Next CycleIndex
    Else
        For CycleIndex = 0 To DummyMax
            Result = DummyFnc0ArgsVBA
        Next CycleIndex
    End If
    Dim TimeDiffMs As Long
    TimeDiffMs = Round((Timer - Start) * 1000, 0)
    If this.PrintToImmediate Then
        Debug.Print "Fnc0ArgsDLLVBA" & ":" & " - " & Format$(DummyMax, "#,##0") _
            & " times in " & TimeDiffMs & " ms"
    End If
    Fnc0ArgsDLLVBA = TimeDiffMs
End Function

Public Function Fnc3ArgsDLLVBA(Optional ByVal DummyForCount As Long = -1, _
            Optional ByVal TargetType As TargetTypeEnum = TARGET_DLL) As Long
    Dim Src() As Byte
    Src = this.Src
    Dim Dst() As Byte
    Dst = this.Dst
    Dim SrcLen As Long
    SrcLen = this.SrcLen
    Dim Result As Long
    Dim DummyMax As Long
    DummyMax = IIf(DummyForCount > 0, DummyForCount, this.DummyForCount)
    Dim CycleIndex As Long
    Dim Start As Single
    Start = Timer
    If TargetType = TARGET_DLL Then
        For CycleIndex = 0 To DummyMax
            Result = DummyFnc3Args(Dst(0), Src(0), SrcLen)
        Next CycleIndex
    Else
        For CycleIndex = 0 To DummyMax
            Result = DummyFnc3ArgsVBA(Dst(0), Src(0), SrcLen)
        Next CycleIndex
    End If
    Dim TimeDiffMs As Long
    TimeDiffMs = Round((Timer - Start) * 1000, 0)
    If this.PrintToImmediate Then
        Debug.Print "Fnc3ArgsDLLVBA" & ":" & " - " & Format$(DummyMax, "#,##0") _
            & " times in " & TimeDiffMs & " ms"
    End If
    Fnc3ArgsDLLVBA = TimeDiffMs
End Function

Private Sub DummySub0ArgsVBA()
End Sub

Private Sub DummySub3ArgsVBA(ByRef Destination As Byte, _
            ByRef Source As Byte, ByVal Length As Long)
End Sub

Private Function DummyFnc0ArgsVBA() As Long
    Dim Result As Long
    Result = 10241024
    DummyFnc0ArgsVBA = Result
End Function

Private Function DummyFnc3ArgsVBA(ByRef Destination As Byte, _
            ByRef Source As Byte, ByVal Length As Long) As Long
    Dim Result As Long
    Result = 10241024
    DummyFnc3ArgsVBA = Result
End Function

and a standard module running the tests:

'''' ===== DllPerfRun module ===== ''''
Option Explicit

Private Sub Runner()
    Dim GaugeMax As Long
    GaugeMax = 10 ^ 9
    Dim DummyMax As Long
    DummyMax = 10 ^ 7
    
    Dim PerfTool As DllPerfLib
    Set PerfTool = DllPerfLib.Create(DummyMax, GaugeMax)
    
    Dim TimeDiffMs As Long
    Dim LoopIndex As Long
    With PerfTool
        .TogglePrint
        
        Dim AverageCountGAU As Long
        Dim AverageCountDLL As Long
        Dim AverageCountVBA As Long
        AverageCountGAU = 20
        AverageCountDLL = 2
        AverageCountVBA = 10
        '''' ========== PerfGauge ========== ''''
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountGAU
            TimeDiffMs = TimeDiffMs + .PerfGaugeGet
        Next LoopIndex
        If AverageCountGAU > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountGAU
            Debug.Print "PerfGauge" & ":" & " - " & Format$(GaugeMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        '''' ---------- PerfGauge ---------- ''''
    
        '''' ========== Sub0ArgsDLLVBA ========== ''''
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountDLL
            TimeDiffMs = TimeDiffMs + .Sub0ArgsDLLVBA(, TARGET_DLL)
        Next LoopIndex
        If AverageCountDLL > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountDLL
            Debug.Print "Sub0ArgsDLLVBA/DLL" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountVBA
            TimeDiffMs = TimeDiffMs + .Sub0ArgsDLLVBA(, TARGET_VBA)
        Next LoopIndex
        If AverageCountVBA > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountVBA
            Debug.Print "Sub0ArgsDLLVBA/VBA" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        '''' ---------- Sub0ArgsDLLVBA ---------- ''''
    
        '''' ========== Sub3ArgsDLLVBA ========== ''''
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountDLL
            TimeDiffMs = TimeDiffMs + .Sub3ArgsDLLVBA(, TARGET_DLL)
        Next LoopIndex
        If AverageCountDLL > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountDLL
            Debug.Print "Sub3ArgsDLLVBA/DLL" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountVBA
            TimeDiffMs = TimeDiffMs + .Sub3ArgsDLLVBA(, TARGET_VBA)
        Next LoopIndex
        If AverageCountVBA > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountVBA
            Debug.Print "Sub3ArgsDLLVBA/VBA" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        '''' ---------- Sub3ArgsDLLVBA ---------- ''''
    
        '''' ========== Fnc0ArgsDLLVBA ========== ''''
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountDLL
            TimeDiffMs = TimeDiffMs + .Fnc0ArgsDLLVBA(, TARGET_DLL)
        Next LoopIndex
        If AverageCountDLL > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountDLL
            Debug.Print "Fnc0ArgsDLLVBA/DLL" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountVBA
            TimeDiffMs = TimeDiffMs + .Fnc0ArgsDLLVBA(, TARGET_VBA)
        Next LoopIndex
        If AverageCountVBA > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountVBA
            Debug.Print "Fnc0ArgsDLLVBA/VBA" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        '''' ---------- Fnc0ArgsDLLVBA ---------- ''''
    
        '''' ========== Fnc3ArgsDLLVBA ========== ''''
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountDLL
            TimeDiffMs = TimeDiffMs + .Fnc3ArgsDLLVBA(, TARGET_DLL)
        Next LoopIndex
        If AverageCountDLL > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountDLL
            Debug.Print "Fnc3ArgsDLLVBA/DLL" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        
        TimeDiffMs = 0
        For LoopIndex = 1 To AverageCountVBA
            TimeDiffMs = TimeDiffMs + .Fnc3ArgsDLLVBA(, TARGET_VBA)
        Next LoopIndex
        If AverageCountVBA > 0 Then
            TimeDiffMs = TimeDiffMs / AverageCountVBA
            Debug.Print "Fnc3ArgsDLLVBA/VBA" & ":" & " - " & Format$(DummyMax, "#,##0") & _
                " times in " & TimeDiffMs & " ms"
        End If
        DoEvents
        '''' ---------- Fnc3ArgsDLLVBA ---------- ''''
    End With
End Sub

The DllPerfLib class above also includes VBA stubs with signatures matching those in the DLL. The VBA code calls the VBA stubs to assess the native routine calling performance and the twin stubs in the DLL to assess the performance of these calls. Separately, the C client is also executed, timing the calls to the same DLL stubs.


Test results

Table 1. Time in seconds required for completion of 109 repetitions

Well, I can’t include a visual, thanks SO!

C-client timings

The leftmost column of Table 1 contains the PerfGauge timing. While I am not examining the disassembled code (which is a prudent thing to do), an empty unoptimized C-language For loop should require at least three machine instructions:

  • increment the loop variable,
  • compare the loop variable with the target,
  • perform a conditional jump.

On a 2.2 GHz multi-core processor with dynamic frequency adjustment (Intel Core i7-8750H @2.2GHz), the number of 2.1 s for 109 repetitions, therefore, appears to be qualitatively reasonable. The second column show timing for calling DummySub0Args from the C-client (see DummySub0ArgsGauge routine). I expect a higher value here, and I do not have sufficient experience to explain it. Nevertheless, the primary focus of this question is due to the results in the right half of the table.

VBA timings

The green rectangle highlights the efficiency of calling a DLL routine from VBA6/x32/Excel 2002. This number indicates that a DLL call taking no arguments and returning no value is only 5x times slower than the same call from a compiled C-client. Further, this call is 7x times faster than calling a native VBA routine with the same signature. When the called routine either takes arguments or returns a value, the difference is less pronounced. Still, with the other three implemented mock calls, the tendency is qualitatively similar.

Native VBA calls under VBA7/x64/Excel 2016 appear to be slightly faster, but DLL calls under this environment are more than two orders of magnitude slower! Is there a fix for this problem?

Source: Windows Questions

LEAVE A COMMENT