Commit 37cc993
perf: add fast path to Ollama chunking to skip processing for small nodes
Critical performance optimization: 99% of code nodes are small (≤512 tokens)
and don't need expensive semantic chunking. Added fast path that skips
chunking overhead for these nodes.
Performance issue:
- Before: build_chunk_plan() called for ALL 19,000 nodes
- Impact: Expensive sanitization, tokenization, chunk planning for tiny nodes
- Result: 10 minutes for 19K nodes (vs 5 minutes without chunking)
Optimization:
- Fast path: tokenizer.encode() to check size ONLY (cheap operation)
- If under limit: Return formatted text directly (skip build_chunk_plan)
- If over limit: Use full semantic chunking as before
Benefits:
- Small nodes (99%): Single tokenization → skip chunking → fast!
- Large nodes (1%): Full semantic chunking → correct behavior
- Accurate: Uses actual token count, not character approximation
- Expected: 10min → ~5.5min (close to original speed + chunking benefits)
Code path:
1. Format node text (always)
2. Tokenize to count tokens (fast)
3. If ≤512 tokens: Return immediately (FAST PATH - 99% of nodes)
4. If >512 tokens: build_chunk_plan() → chunk → aggregate (1% of nodes)
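
The code path above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `TokenCounter` trait, `WhitespaceCounter`, `chunk_slow_path`, and `prepare_node_texts` are hypothetical names, the whitespace word count stands in for the real `tokenizer.encode()` call, and the fixed-window split stands in for `build_chunk_plan()` and the semantic chunking that follows it.

```rust
/// Hypothetical stand-in for the real tokenizer. The commit counts tokens via
/// tokenizer.encode(); this sketch counts whitespace-separated words instead.
trait TokenCounter {
    fn count_tokens(&self, text: &str) -> usize;
}

struct WhitespaceCounter;

impl TokenCounter for WhitespaceCounter {
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Placeholder for the expensive path (build_chunk_plan -> chunk -> aggregate).
/// Here it simply splits the text into windows of at most `max_tokens` words.
fn chunk_slow_path(text: &str, max_tokens: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_tokens.max(1)) // chunks(0) would panic, so clamp to 1
        .map(|window| window.join(" "))
        .collect()
}

/// Steps 2-4 of the code path: count tokens once, return early for small
/// nodes, and fall back to chunking only for oversized ones.
fn prepare_node_texts<T: TokenCounter>(
    tokenizer: &T,
    formatted: String, // step 1 (formatting) is assumed to have happened already
    max_tokens: usize,
) -> Vec<String> {
    if tokenizer.count_tokens(&formatted) <= max_tokens {
        // Fast path: the node fits, so no chunk plan is built.
        return vec![formatted];
    }
    // Slow path: only oversized nodes pay for chunking.
    chunk_slow_path(&formatted, max_tokens)
}

fn main() {
    let tok = WhitespaceCounter;
    let small = "fn add(a: i32, b: i32) -> i32 { a + b }".to_string();
    let large = vec!["word"; 1200].join(" ");

    let small_out = prepare_node_texts(&tok, small, 512);
    let large_out = prepare_node_texts(&tok, large, 512);

    println!("small node: {} chunk(s)", small_out.len());
    println!("large node: {} chunk(s)", large_out.len());
}
```

The point of the design is that the size check is the only work a small node pays for beyond formatting; everything chunk-related sits behind the early return.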
This maintains chunking benefits for long functions while avoiding
performance penalty for the vast majority of normal-sized functions.
1 parent 9ad8fe1 commit 37cc993
2 files changed (+27, -9) in crates/codegraph-vector/src