|
| 1 | +# Performance Optimizations Summary |
| 2 | + |
| 3 | +This PR implements 5 targeted optimizations to the data fetching hot path in `ddbc_bindings.cpp`, focusing on eliminating redundant work and reducing overhead in the row construction loop. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## ✅ OPTIMIZATION #1: Direct PyUnicode_DecodeUTF16 for NVARCHAR Conversion (Linux/macOS) |
| 8 | + |
| 9 | +**Commit:** 081f3e2 |
| 10 | + |
| 11 | +### Problem |
| 12 | +On Linux/macOS, fetching `NVARCHAR` columns performed a double conversion: |
| 13 | +1. `SQLWCHAR` (UTF-16) → `std::wstring` via `SQLWCHARToWString()` (character-by-character with endian swapping) |
| 14 | +2. `std::wstring` → Python unicode via pybind11 |
| 15 | + |
| 16 | +This created an unnecessary intermediate `std::wstring` allocation and doubled the conversion work. |
| 17 | + |
| 18 | +### Solution |
| 19 | +Replace the two-step conversion with a single call to Python's C API `PyUnicode_DecodeUTF16()`: |
| 20 | +- **Before**: `SQLWCHAR` → `std::wstring` → Python unicode (2 conversions + intermediate allocation) |
| 21 | +- **After**: `SQLWCHAR` → Python unicode via `PyUnicode_DecodeUTF16()` (1 conversion, no intermediate) |
| 22 | + |
| 23 | +### Code Changes |
| 24 | +```cpp |
| 25 | +// BEFORE (Linux/macOS) |
| 26 | +std::wstring wstr = SQLWCHARToWString(wcharData, numCharsInData); |
| 27 | +row[col - 1] = wstr; |
| 28 | + |
| 29 | +// AFTER (Linux/macOS) |
| 30 | +PyObject* pyStr = PyUnicode_DecodeUTF16( |
| 31 | + reinterpret_cast<const char*>(wcharData), |
| 32 | + numCharsInData * sizeof(SQLWCHAR), |
| 33 | + NULL, NULL |
| 34 | +); |
| 35 | +if (pyStr) { |
| 36 | + row[col - 1] = py::reinterpret_steal<py::object>(pyStr); |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +### Impact |
| 41 | +- ✅ Eliminates one full conversion step per `NVARCHAR` cell |
| 42 | +- ✅ Removes intermediate `std::wstring` memory allocation |
| 43 | +- ✅ Platform-specific: Only benefits Linux/macOS (Windows already uses native `wchar_t`) |
| 44 | +- ⚠️ **Does NOT affect regular `VARCHAR`/`CHAR` columns** (already optimal with direct `py::str()`) |
| 45 | + |
| 46 | +### Affected Data Types |
| 47 | +- `SQL_WCHAR`, `SQL_WVARCHAR`, `SQL_WLONGVARCHAR` (wide-character strings) |
| 48 | +- **NOT** `SQL_CHAR`, `SQL_VARCHAR`, `SQL_LONGVARCHAR` (regular strings - unchanged) |
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +## 🔜 OPTIMIZATION #2: Direct Python C API for Numeric Types |
| 53 | +*Coming next...* |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## 🔜 OPTIMIZATION #3: Metadata Prefetch Caching |
| 58 | +*Coming next...* |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## 🔜 OPTIMIZATION #4: Batch Row Allocation |
| 63 | +*Coming next...* |
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +## 🔜 OPTIMIZATION #5: Function Pointer Dispatch |
| 68 | +*Coming next...* |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## Testing |
| 73 | +All optimizations: |
| 74 | +- ✅ Build successfully on macOS (Universal2) |
| 75 | +- ✅ Maintain backward compatibility |
| 76 | +- ✅ Preserve existing functionality |
| 77 | +- 🔄 CI validation pending (Windows, Linux, macOS) |
| 78 | + |
| 79 | +## Files Modified |
| 80 | +- `mssql_python/pybind/ddbc_bindings.cpp` - Core optimization implementations |
| 81 | +- `OPTIMIZATION_PR_SUMMARY.md` - This document |
0 commit comments