I’m currently trying to read an excel file that contains unicode surrogate pairs. The Excel spreadsheet contains the following table.
I’m then trying to read the table via OLEDB using the following query
SELECT DISTINCT IIF([ProductName] IS NULL, 'BLANK', [ProductName]) AS [ProductName] FROM [Sheet1$]
This returns the following
In which is has treated all the surrogate pair based entries as non-distinct, which isn’t what I would expect the result to be. I would expect all the entries in the above table to be distinct.
I previously hit a snag when accessing SQL via OLEDB, but that was related to the collation of the SQL column being incorrect for distinguishing surrogate pairs.
I imagine I am doing something wrong with regards to the Excel columnfile format, but haven’t been able to solve my issue. Is there an equivalent of collation for Excel files that might fix this issue? As I’m guessing when it’s deciding if something is distinct, the comparison is treating all surrogate pair based entries as the same.
Source: Windows Questions C++