Skip to content

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Nov 7, 2025

Summary

Reduce the memory footprint of NotebookIndex from O(lines) to O(cells).

The NotebookIndex stores a mapping from rows in the concatenated notebook document
to the relative row numbers within each cell and from absolute line number to the corresponding cell.

The way this is implemented today is by having a vector with one entry for every absolute row number
where the value is the index of the cell. While this allows O(1) lookup, it does require an entry for every single line.

This PR rewrites our representation (and uses one Vec instead of two) to only store the
start line number per cell and use a binary search to find to which cell a line number belongs.
This makes lookups slightly slower (from O(1) to O(log(n)) but reduces memory usage
from O(rows) to O(cells).

I noticed this improvement when working on ty's notebook support but decided to extract it into its own PR to make reviewing easier.

Test Plan

Existing tests

@MichaReiser MichaReiser added the internal An internal refactor or improvement label Nov 7, 2025
@MichaReiser MichaReiser force-pushed the micha/refactor-notebook-index branch from c7340bd to fcd67f2 Compare November 7, 2025 16:31
@MichaReiser MichaReiser changed the title Reduce memory footprint of notebooks Reduce notebook memory footprint Nov 7, 2025
@MichaReiser MichaReiser marked this pull request as ready for review November 7, 2025 16:31
@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants