Update documentation for native precision support

claude · claude · commit 32dc4c8c7975 · 2025-11-09T08:35:05.000Z
This commit updates all project documentation to reflect the v0.7.0
architectural changes:

1. **README.md**:
   - Updated precision section to describe native float32/float64 support
   - Added documentation for new precision control methods
   - Documented auto-detection behavior for both construction and loading
   - Updated version history with v0.7.0 changes

2. **CHANGES.md**:
   - Added comprehensive v0.7.0 release notes
   - Documented native precision architecture changes
   - Listed all new precision control methods
   - Described bug fixes related to precision handling
   - Updated test count (991/991 tests passing)

3. **docs/ARCHITECTURE.md**:
   - Updated PRTree template signature to include Real parameter
   - Documented 6 exposed C++ classes (float32/float64 variants)
   - Updated data flow diagrams for precision selection
   - Added design decision section for native precision support
   - Explained trade-offs and benefits of new architecture
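
For reviewers, the dtype-based selection described in the docs can be sketched roughly as follows. This is a hypothetical illustration, not the wrapper's actual code: only the six class names come from this commit, while `select_backend_name`, its arguments, and the choice to raise on other dtypes are assumptions.

```python
# Hypothetical sketch: maps (dimension, dtype name) to one of the six
# pybind11 class names documented in docs/ARCHITECTURE.md.
# The function name and error handling are illustrative assumptions.
SUPPORTED_DIMS = (2, 3, 4)
SUPPORTED_DTYPES = ("float32", "float64")


def select_backend_name(dim: int, dtype_name: str) -> str:
    """Return the name of the C++ class matching a dimension and dtype name."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"unsupported dimension: {dim}")
    if dtype_name not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype_name}")
    return f"_PRTree{dim}D_{dtype_name}"


if __name__ == "__main__":
    for dim in SUPPORTED_DIMS:
        for dtype_name in SUPPORTED_DTYPES:
            print(select_backend_name(dim, dtype_name))
```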

The documentation now accurately reflects:
- Template signature: PRTree<T, B, D, Real>
- Automatic precision selection based on numpy dtype
- Auto-detection when loading from files
- Precision settings preservation across operations
- Elimination of idx2exact refinement approach
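
As background, here is a minimal standalone illustration (not code from this repository) of why float64 input presumably needed a separate refinement step when stored in a float32 tree: a gap of 1e-6 between two box edges near coordinate 100 disappears once both edges are rounded to float32. The coordinates are made up for the example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Illustrative coordinates: box B starts 1e-6 to the right of box A's edge.
a_max = 100.0
b_min = 100.000001

gap_f64 = b_min - a_max                          # still positive in double precision
gap_f32 = to_float32(b_min) - to_float32(a_max)  # both edges round to 100.0

if __name__ == "__main__":
    print("float64 gap is positive:", gap_f64 > 0.0)
    print("float32 gap is zero:", gap_f32 == 0.0)
```

With a native float64 tree the comparison runs on the unrounded values, so no second pass over exact coordinates is needed.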
diff --git a/CHANGES.md b/CHANGES.md
@@ -1,27 +1,105 @@
 # PRTree Improvements
 
-## Critical Fixes
+## v0.7.0 - Native Precision Support (2025-01-XX)
 
-### 1. Windows Crash Fixed
+### Major Architectural Changes
+
+#### 1. Native Float32/Float64 Precision
+- **Previous**: Float32 tree + idx2exact map + double precision refinement
+- **New**: Native float32 and float64 tree implementations
+- **Benefit**: Simpler code, better performance, true precision throughout
+- **Impact**: ~72 lines of code removed, no conversion overhead
+
+**Implementation Details:**
+- Templated `PRTree<T, B, D, Real>` with `Real` type parameter (float or double)
+- Propagated `Real` parameter through entire class hierarchy:
+  - `BB<D, Real>`: Bounding boxes
+  - `DataType<T, D, Real>`: Data storage
+  - `PRTreeNode<T, B, D, Real>`: Tree nodes
+  - `PRTreeLeaf<T, B, D, Real>`: Leaf nodes
+  - `PseudoPRTree<T, B, D, Real>`: Builder helper
+- Exposed 6 C++ classes via pybind11: `_PRTree{2D,3D,4D}_{float32,float64}`
+- Python wrapper auto-selects precision based on numpy dtype
+
+**Breaking Change:**
+- Previous files saved with float64 input must be loaded with the correct precision
+- Solution: Auto-detection when loading from files (tries float32, then float64)
+
+#### 2. Advanced Precision Control
+- **Adaptive epsilon**: Automatically scales epsilon based on bounding box sizes
+- **Configurable epsilon**: Set relative and absolute epsilon for edge cases
+- **Subnormal detection**: Correctly handles denormalized floating-point numbers
+- **Methods added**:
+  ```python
+  tree.set_adaptive_epsilon(bool)
+  tree.set_relative_epsilon(float)
+  tree.set_absolute_epsilon(float)
+  tree.set_subnormal_detection(bool)
+  tree.get_adaptive_epsilon() -> bool
+  tree.get_relative_epsilon() -> float
+  tree.get_absolute_epsilon() -> float
+  tree.get_subnormal_detection() -> bool
+  ```
+
+#### 3. Query Precision Fixes
+- **Issue**: Query methods (`find_one`, `find_all`) used hardcoded `float` type
+- **Fix**: Templated with `Real` to match tree precision
+- **Impact**: Float64 trees now maintain full precision in queries
+
+#### 4. Python Wrapper Enhancements
+- **Auto-detection on load**: Automatically tries both precisions when loading from file
+- **Preserve settings on insert**: First insert on empty tree now preserves precision settings
+- **Subnormal workaround**: Handles edge case of inserting with subnormal detection disabled
+
+### Testing
+
+✅ **991/991 tests pass** (including 14 new adaptive epsilon tests)
+
+New test coverage:
+- `test_adaptive_epsilon.py`: 14 tests covering edge cases
+- `test_save_load_float32_no_regression`: Precision preservation across save/load
+- Float32 vs float64 precision validation tests
+
+### Performance
+
+- **No regression**: Construction and query performance unchanged
+- **Memory reduction**: Eliminated idx2exact map overhead
+- **Code simplification**: ~72 lines removed, improved maintainability
+
+### Bug Fixes
+
+1. **Float64 precision loss in queries** (critical)
+   - Query methods forced float32, losing precision
+   - Fixed: Template query methods with Real parameter
+
+2. **Precision settings lost on first insert**
+   - Python wrapper recreated tree without preserving settings
+   - Fixed: Preserve all precision settings when recreating
+
+3. **File load precision mismatch**
+   - Loading a float32 file with the float64 class caused `std::bad_alloc`
+   - Fixed: Auto-detect precision by trying both classes
+
+## Previous Releases
+
+### Critical Fixes
+
+#### 1. Windows Crash Fixed
 - **Issue**: Fatal crash with `std::mutex` (not copyable, caused deadlocks)
 - **Fix**: Use `std::unique_ptr<std::recursive_mutex>`
 - **Result**: Thread-safe, no crashes, pybind11 compatible
 
-### 2. Error Messages
+#### 2. Error Messages
 - Improved with context while maintaining backward compatibility
 - Example: `"Given index is not found. (Index: 999, tree size: 2)"`
 
-## Improvements Applied
+### Improvements Applied
 
 - **C++20**: Migrated standard, added concepts for type safety
 - **Exception Safety**: noexcept + RAII (no memory leaks)
 - **Thread Safety**: Recursive mutex protects all mutable operations
 
-## Test Results
-
-✅ **674/674 unit tests pass**
-
-## Performance
+### Performance Baseline
 
 - Construction: 9-11M ops/sec (single-threaded)
 - Memory: 23 bytes/element
@@ -30,3 +108,5 @@
 ## Future Work
 
 - Parallel partitioning algorithm for better thread scaling (2-3x expected)
+- Split large prtree.h into modular components
+- Additional precision validation modes
diff --git a/README.md b/README.md
@@ -171,9 +171,25 @@ results = tree.batch_query(queries)  # Returns [[], [], ...]
 
 ### Precision
 
-- **Float32 input**: Pure float32 for maximum speed
-- **Float64 input**: Float32 tree + double-precision refinement for accuracy
-- Handles boxes with very small gaps correctly (< 1e-5)
+The library supports native float32 and float64 precision with automatic selection:
+
+- **Float32 input**: Creates native float32 tree for maximum speed
+- **Float64 input**: Creates native float64 tree for full double precision
+- **Auto-detection**: Precision automatically selected based on numpy array dtype
+- **Save/Load**: Precision automatically detected when loading from file
+
+Advanced precision control is available:
+```python
+# Configure precision parameters for challenging cases
+tree = PRTree2D(indices, boxes)
+tree.set_adaptive_epsilon(True)     # Adaptive epsilon based on box sizes
+tree.set_relative_epsilon(1e-6)     # Relative epsilon for intersection tests
+tree.set_absolute_epsilon(1e-12)    # Absolute epsilon for near-zero cases
+tree.set_subnormal_detection(True)  # Handle subnormal numbers correctly
+```
+
+The new architecture eliminates the previous float32 tree + refinement approach,
+providing true native precision at each level for better performance and accuracy.
 
 ### Thread Safety
 
@@ -220,10 +236,13 @@ PRTree2D(filename)  # Load from file
 ## Version History
 
 ### v0.7.0 (Latest)
+- **Native precision support**: True float32/float64 precision throughout the entire stack
+- **Architectural refactoring**: Eliminated idx2exact complexity for simpler, faster code
+- **Auto-detection**: Precision automatically selected based on input dtype and when loading files
+- **Advanced precision control**: Adaptive epsilon, configurable relative/absolute epsilon, subnormal detection
 - **Fixed critical bug**: Boxes with small gaps (<1e-5) incorrectly reported as intersecting
 - **Breaking**: Minimum Python 3.8, serialization format changed
 - Added input validation (NaN/Inf rejection)
-- Improved precision handling
 
 ### v0.5.x
 - Added 4D support
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -83,15 +83,17 @@ python_prtree/
 **Purpose**: Implements the Priority R-Tree algorithm
 
 **Key Components**:
-- `prtree.h`: Main template class `PRTree<T, B, D>`
+- `prtree.h`: Main template class `PRTree<T, B, D, Real>`
   - `T`: Index type (typically `int64_t`)
   - `B`: Branching factor (default: 8)
   - `D`: Dimensions (2, 3, or 4)
+  - `Real`: Floating-point type (float or double) - **new in v0.7.0**
 
 **Design Principles**:
 - Header-only template library for performance
 - No Python dependencies at this layer
 - Pure C++ with C++20 features
+- Native precision support through Real template parameter
 
 ### 2. Utilities Layer (`include/prtree/utils/`)
 
@@ -116,11 +118,18 @@ python_prtree/
 - Handle numpy array conversions
 - Expose methods with Python-friendly signatures
 - Provide module-level documentation
+- Expose both float32 and float64 variants
+
+**Exposed Classes** (v0.7.0):
+- `_PRTree2D_float32`, `_PRTree2D_float64`
+- `_PRTree3D_float32`, `_PRTree3D_float64`
+- `_PRTree4D_float32`, `_PRTree4D_float64`
 
 **Design Principles**:
 - Thin binding layer (minimal logic)
 - Direct mapping to C++ API
 - Efficient numpy integration
+- Separate classes for each precision level
 
 ### 4. Python Wrapper Layer (`src/python_prtree/`)
 
@@ -135,37 +144,42 @@ python_prtree/
 - Python object storage (pickle serialization)
 - Convenient APIs (auto-indexing, return_obj parameter)
 - Type hints and documentation
+- **Automatic precision selection** (v0.7.0): Detects numpy dtype and selects float32/float64
+- **Precision auto-detection on load** (v0.7.0): Tries both precisions when loading files
+- **Precision settings preservation** (v0.7.0): Maintains epsilon settings across operations
 
 **Design Principles**:
 - Safety over raw performance
 - Pythonic API design
 - Backwards compatibility considerations
+- Zero-overhead precision selection
 
 ## Data Flow
 
-### Construction
+### Construction (v0.7.0)
 ```
 User Code
-  ↓ (numpy arrays)
+  ↓ (numpy arrays with dtype)
 PRTree2D/3D/4D (Python)
-  ↓ (arrays + validation)
-_PRTree2D/3D/4D (pybind11)
+  ↓ (dtype detection: float32 or float64?)
+  ↓ (select _PRTree{2D,3D,4D}_{float32,float64})
+_PRTree2D_float32 OR _PRTree2D_float64 (pybind11)
   ↓ (type conversion)
-PRTree<int64_t, 8, D> (C++)
-  ↓ (algorithm)
-Optimized R-Tree Structure
+PRTree<int64_t, 8, D, float> OR PRTree<int64_t, 8, D, double> (C++)
+  ↓ (algorithm with native precision)
+Optimized R-Tree Structure (float32 or float64)
 ```
 
-### Query
+### Query (v0.7.0)
 ```
 User Code
   ↓ (query box)
 PRTree2D.query() (Python)
   ↓ (empty tree check)
-_PRTree2D.query() (pybind11)
-  ↓ (type conversion)
-PRTree::find_one() (C++)
-  ↓ (tree traversal)
+_PRTree2D_float32.query() OR _PRTree2D_float64.query() (pybind11)
+  ↓ (type conversion with matching precision)
+PRTree<T,B,D,Real>::find_one(vec<Real>) (C++)
+  ↓ (tree traversal with native Real precision)
 Result Indices
   ↓ (optional: object retrieval)
 User Code
@@ -249,6 +263,26 @@ Extension installed in src/python_prtree/
 
 ## Design Decisions
 
+### Native Precision Support (v0.7.0)
+
+**Decision**: Template PRTree with Real type parameter instead of using idx2exact refinement
+
+**Rationale**:
+- Simpler architecture: Eliminated ~72 lines of refinement code
+- Better performance: No conversion overhead, no idx2exact map
+- True precision: Float64 maintains double precision throughout
+- Type safety: Compiler ensures precision consistency
+
+**Implementation**:
+- Added `Real` template parameter to PRTree and all detail classes
+- Exposed 6 separate C++ classes via pybind11
+- Python wrapper auto-selects based on numpy dtype
+
+**Trade-offs**:
+- Larger binary size (6 classes instead of 3)
+- Longer compilation time (more template instantiations)
+- Benefit: Cleaner code, better maintainability, true native precision
+
 ### Header-Only Core
 
 **Decision**: Keep core PRTree as header-only template library
@@ -257,6 +291,7 @@ Extension installed in src/python_prtree/
 - Enables full compiler optimization
 - Simplifies distribution
 - No need for .cc files at core layer
+- Required for Real template parameter
 
 **Trade-offs**:
 - Longer compilation times