Skip to content

Commit e1e827a

Browse files
committed
docs: Update OPTIMIZATION_PR_SUMMARY with OPT #4 details
1 parent 55fb898 commit e1e827a

File tree

1 file changed

+166
-2
lines changed

1 file changed

+166
-2
lines changed

OPTIMIZATION_PR_SUMMARY.md

Lines changed: 166 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -115,8 +115,172 @@ if (buffers.indicators[col - 1][i] == SQL_NULL_DATA) {
115115

116116
---
117117

118-
## 🔜 OPTIMIZATION #4: Batch Row Allocation
119-
*Coming next...*
118+
## ✅ OPTIMIZATION #4: Batch Row Allocation with Direct Python C API
119+
120+
**Commit:** 55fb898
121+
122+
### Problem
123+
Row creation and assignment involved multiple layers of pybind11 overhead:
124+
```cpp
125+
for (SQLULEN i = 0; i < numRowsFetched; i++) {
126+
py::list row(numCols); // ❌ pybind11 wrapper allocation
127+
128+
// Populate cells...
129+
row[col - 1] = value; // ❌ pybind11 operator[] with bounds checking
130+
131+
rows[initialSize + i] = row; // ❌ pybind11 list assignment + refcount overhead
132+
}
133+
```
134+
135+
**Overhead breakdown:**
136+
1. **Row allocation**: `py::list(numCols)` creates pybind11 wrapper object (~15 cycles)
137+
2. **Cell assignment** (non-numeric types): `row[col-1] = value` uses `operator[]` with bounds checking (~10-15 cycles)
138+
3. **Final assignment**: `rows[i] = row` goes through pybind11 list `__setitem__` (~15-20 cycles)
139+
4. **Fragmented**: 1,000 separate `py::list()` constructor calls
140+
141+
**Total cost:** ~40-50 cycles per row × 1,000 rows = **40K-50K wasted cycles per batch**
142+
143+
### Solution
144+
**Complete transition to direct Python C API** for row and cell management:
145+
```cpp
146+
for (SQLULEN i = 0; i < numRowsFetched; i++) {
147+
PyObject* row = PyList_New(numCols); // ✅ Direct Python C API
148+
149+
// Populate cells using direct API...
150+
PyList_SET_ITEM(row, col - 1, pyValue); // ✅ Macro - no bounds check
151+
152+
PyList_SET_ITEM(rows.ptr(), initialSize + i, row); // ✅ Direct transfer
153+
}
154+
```
155+
156+
**Key changes:**
157+
- `PyList_New(numCols)` creates list directly (no wrapper object)
158+
- `PyList_SET_ITEM(row, col, value)` is a **macro** that expands to direct array access
159+
- Final assignment transfers ownership without refcount churn
160+
161+
### Code Changes
162+
163+
**Before (mixed pybind11 + C API):**
164+
```cpp
165+
py::list row(numCols); // pybind11 wrapper
166+
167+
// NULL handling
168+
row[col - 1] = py::none();
169+
170+
// Strings
171+
row[col - 1] = py::str(data, len);
172+
173+
// Complex types
174+
row[col - 1] = PythonObjectCache::get_datetime_class()(...);
175+
176+
// Final assignment
177+
rows[initialSize + i] = row;
178+
```
179+
180+
**After (pure Python C API):**
181+
```cpp
182+
PyObject* row = PyList_New(numCols); // Direct C API
183+
184+
// NULL handling
185+
Py_INCREF(Py_None);
186+
PyList_SET_ITEM(row, col - 1, Py_None);
187+
188+
// Strings
189+
PyObject* pyStr = PyUnicode_FromStringAndSize(data, len);
190+
PyList_SET_ITEM(row, col - 1, pyStr);
191+
192+
// Complex types
193+
PyObject* dt = PythonObjectCache::get_datetime_class()(...).release().ptr();
194+
PyList_SET_ITEM(row, col - 1, dt);
195+
196+
// Final assignment
197+
PyList_SET_ITEM(rows.ptr(), initialSize + i, row);
198+
```
199+
200+
### Updated Type Handlers
201+
202+
**All handlers now use `PyList_SET_ITEM`:**
203+
204+
| Type Category | Python C API Used | Notes |
205+
|---------------|-------------------|-------|
206+
| **NULL values** | `Py_INCREF(Py_None)` + `PyList_SET_ITEM` | Explicit refcount management |
207+
| **Integers** | `PyLong_FromLong()` | Already done in OPT #2 |
208+
| **Floats** | `PyFloat_FromDouble()` | Already done in OPT #2 |
209+
| **Booleans** | `PyBool_FromLong()` | Already done in OPT #2 |
210+
| **VARCHAR** | `PyUnicode_FromStringAndSize()` | New in OPT #4 |
211+
| **NVARCHAR** | `PyUnicode_DecodeUTF16()` | OPT #1 + OPT #4 |
212+
| **BINARY** | `PyBytes_FromStringAndSize()` | New in OPT #4 |
213+
| **DECIMAL** | `.release().ptr()` | Transfer ownership |
214+
| **DATETIME** | `.release().ptr()` | Transfer ownership |
215+
| **DATE** | `.release().ptr()` | Transfer ownership |
216+
| **TIME** | `.release().ptr()` | Transfer ownership |
217+
| **DATETIMEOFFSET** | `.release().ptr()` | Transfer ownership |
218+
| **GUID** | `.release().ptr()` | Transfer ownership |
219+
220+
### PyList_SET_ITEM Macro Efficiency
221+
222+
**What is `PyList_SET_ITEM`?**
223+
It's a **macro** (not a function) that expands to direct array access:
224+
```c
225+
#define PyList_SET_ITEM(op, i, v) \
226+
(((PyListObject *)(op))->ob_item[i] = (PyObject *)(v))
227+
```
228+
229+
**Why it's faster than `operator[]`:**
230+
- No function call overhead (inline expansion)
231+
- No bounds checking (assumes pre-allocated list)
232+
- No NULL checks (assumes valid pointers)
233+
- Direct memory write (single CPU instruction)
234+
235+
**Safety:** Pre-allocation via `rows.append(py::none())` ensures list has correct size, making bounds checking redundant.
236+
237+
### Impact
238+
239+
**Performance gains:**
240+
- ✅ **Eliminates pybind11 wrapper overhead** for row creation (~15 cycles saved per row)
241+
- ✅ **No bounds checking** in hot loop (PyList_SET_ITEM is direct array access)
242+
- ✅ **Clean refcount management** (objects created with refcount=1, ownership transferred)
243+
- ✅ **Consistent architecture** with OPT #2 (entire row/cell pipeline uses Python C API)
244+
245+
**Expected improvement:** ~5-10% on large result sets
246+
247+
**Cumulative effect with OPT #2:**
248+
- OPT #2: Numeric types use Python C API (7 types)
249+
- OPT #4: ALL types now use Python C API (complete transition)
250+
- Result: Zero pybind11 overhead in entire row construction hot path
251+
252+
### Affected Code Paths
253+
254+
**Completely migrated to Python C API:**
255+
- Row creation and final assignment
256+
- NULL/SQL_NO_TOTAL handling
257+
- Zero-length data handling
258+
- All string types (CHAR, VARCHAR, WCHAR, WVARCHAR)
259+
- All binary types (BINARY, VARBINARY)
260+
- All complex types (DECIMAL, DATETIME, DATE, TIME, DATETIMEOFFSET, GUID)
261+
262+
**Architecture:**
263+
```
264+
┌─────────────────────────────────────────────────────────┐
265+
│ BEFORE: Mixed pybind11 + Python C API │
266+
├─────────────────────────────────────────────────────────┤
267+
│ py::list row(numCols) ← pybind11 │
268+
│ ├─ Numeric types: PyLong_FromLong() ← OPT #2
269+
│ ├─ Strings: row[col] = py::str() ← pybind11 │
270+
│ └─ Complex: row[col] = obj ← pybind11 │
271+
│ rows[i] = row ← pybind11 │
272+
└─────────────────────────────────────────────────────────┘
273+
274+
┌─────────────────────────────────────────────────────────┐
275+
│ AFTER: Pure Python C API │
276+
├─────────────────────────────────────────────────────────┤
277+
│ PyList_New(numCols) ← Direct C API │
278+
│ ├─ Numeric: PyLong_FromLong() ← OPT #2
279+
│ ├─ Strings: PyUnicode_FromStringAndSize() ← OPT #4
280+
│ └─ Complex: .release().ptr() ← OPT #4
281+
│ PyList_SET_ITEM(rows.ptr(), i, row) ← OPT #4
282+
└─────────────────────────────────────────────────────────┘
283+
```
120284
121285
---
122286

0 commit comments

Comments
 (0)