Skip to content

Conversation

@atksh
Copy link
Owner

@atksh atksh commented Nov 8, 2025

This commit addresses multiple precision and validation issues identified
in the codebase analysis:

1. Input Validation

  • Add NaN/Inf validation to insert() methods for both float32 and float64
  • Ensures consistency with constructor validation
  • Prevents invalid data from entering the tree structure

2. Float64 Insert Support

  • Add float64 overload for insert() method
  • Maintains idx2exact map for dynamically inserted items
  • Preserves double-precision refinement capability for inserted boxes
  • Uses explicit py::overload_cast in Python bindings to handle overloads

3. Precision Testing

  • Add comprehensive tests for NaN/Inf validation in insert operations
  • Add tests for float64 insert() maintaining precision
  • Add tests verifying rebuild() preserves idx2exact
  • Add systematic precision boundary tests (adjusted for float32 limits)
  • Document float32 precision limitations in test comments

Technical Notes

  • Float64 input is converted to float32 for tree structure
  • Double-precision refinement helps reduce false positives
  • Precision limits: gaps below ~1e-7 may not be reliably detected
  • At large magnitudes (e.g., 1e6), absolute precision degrades

Fixes validation gaps in insert operations and maintains precision
capabilities for dynamically updated trees.

This commit addresses multiple precision and validation issues identified
in the codebase analysis:

## 1. Input Validation
- Add NaN/Inf validation to insert() methods for both float32 and float64
- Ensures consistency with constructor validation
- Prevents invalid data from entering the tree structure

## 2. Float64 Insert Support
- Add float64 overload for insert() method
- Maintains idx2exact map for dynamically inserted items
- Preserves double-precision refinement capability for inserted boxes
- Uses explicit py::overload_cast in Python bindings to handle overloads

## 3. Precision Testing
- Add comprehensive tests for NaN/Inf validation in insert operations
- Add tests for float64 insert() maintaining precision
- Add tests verifying rebuild() preserves idx2exact
- Add systematic precision boundary tests (adjusted for float32 limits)
- Document float32 precision limitations in test comments

## Technical Notes
- Float64 input is converted to float32 for tree structure
- Double-precision refinement helps reduce false positives
- Precision limits: gaps below ~1e-7 may not be reliably detected
- At large magnitudes (e.g., 1e6), absolute precision degrades

Fixes validation gaps in insert operations and maintains precision
capabilities for dynamically updated trees.
Addresses Float32 Precision Issues from precision validation gaps:
- Add adaptive epsilon calculation that scales with coordinate magnitude
- Add configurable precision parameters (relative_epsilon, absolute_epsilon)
- Implement subnormal number detection in validate_box()
- Update both insert() overloads to use adaptive epsilon
- Add precision control methods (setters/getters) to PRTree class
- Expose precision control to Python via pybind11 bindings
- Add comprehensive test suite for adaptive epsilon behavior

This improves precision handling across different coordinate scales,
from small (< 1.0) to large (> 1e6) magnitudes.

Note: Current architecture still forces float32 tree structure on all
users. Float64 data only used for idx2exact refinement. Future work
should consider separating float32/float64 builds or templating Real type.
This commit eliminates the complex idx2exact post-processing architecture
and replaces it with native float32/float64 template specialization,
significantly simplifying the codebase and optimizing for each precision level.

Key Changes:
- Templated PRTree with Real type parameter (float or double)
- Removed idx2exact map and refine_candidates() complexity entirely
- Exposed 6 separate C++ classes (_PRTree{2D,3D,4D}_{float32,float64})
- Added automatic dtype-based precision selection in Python wrapper
- Propagated Real template parameter through all detail classes:
  - BB (bounding_box.h)
  - DataType (data_type.h)
  - PRTreeNode, PRTreeLeaf, PRTreeElement (nodes.h)
  - PseudoPRTree, PseudoPRTreeNode (pseudo_tree.h)

Benefits:
- Eliminates "strange post-processing" that forced float32 on all users
- Each precision level now uses native types throughout
- Simpler, more maintainable codebase
- Better performance through compile-time type optimization
- Users get the precision they request without conversion overhead

All tests passing with new architecture.
- Fix query methods to use Real type instead of hardcoded float
  - find_one() now accepts vec<Real> for proper float64 precision
  - find_all() now accepts py::array_t<Real> matching tree precision
- Fix Python wrapper to preserve precision settings on first insert
  - Handle subnormal detection disabled case with workaround
  - Preserve relative_epsilon, absolute_epsilon, adaptive_epsilon settings
- Remove obsolete query_exact and refine_candidates code
- All precision tests now passing
This commit addresses remaining issues with precision handling:

1. **Auto-detect precision when loading from file**: When loading a tree
   from a saved file via `PRTree3D(filepath)`, the wrapper now
   automatically tries both float32 and float64 to determine which
   precision was used when the tree was saved. This fixes the
   `test_save_load_float32_no_regression` test failure where loading
   a float32 tree defaulted to float64 and caused std::bad_alloc.

2. **Improved error handling**: If both precision attempts fail when
   loading from file, provides an informative error message about
   potential file corruption.

The fix ensures that precision is correctly preserved across save/load
cycles without requiring users to manually specify the precision when
loading.

Fixes:
- test_save_load_float32_no_regression now passes
- test_save_load_float64_matteo_case continues to pass
@atksh
Copy link
Owner Author

atksh commented Nov 9, 2025

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 102 to +142
cereal::make_nvp("idx2bb", idx2bb),
cereal::make_nvp("idx2data", idx2data),
cereal::make_nvp("global_idx", global_idx),
cereal::make_nvp("n_at_build", n_at_build),
cereal::make_nvp("idx2exact", idx2exact));
cereal::make_nvp("n_at_build", n_at_build));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop idx2exact during serialization breaks float64 saved trees

The new save/load implementation no longer archives the former idx2exact map, and the associated refinement code has been removed. Trees saved by previous versions with float64 input relied on idx2exact to keep double‑precision coordinates alongside the float32 index. When those existing files are loaded after this change, the exact coordinates are silently discarded and all queries run at float32 precision only, so pairs separated by sub‑float32 gaps (≈1e‑7) that previously stayed disjoint will now be reported as intersecting. This is a regression for anyone restoring persisted trees. Please keep backward‑compatible serialization or migrate the stored map into whatever structure now preserves precision.

Useful? React with 👍 / 👎.

This commit updates all project documentation to reflect the v0.7.0
architectural changes:

1. **README.md**:
   - Updated precision section to describe native float32/float64 support
   - Added documentation for new precision control methods
   - Documented auto-detection behavior for both construction and loading
   - Updated version history with v0.7.0 changes

2. **CHANGES.md**:
   - Added comprehensive v0.7.0 release notes
   - Documented native precision architecture changes
   - Listed all new precision control methods
   - Described bug fixes related to precision handling
   - Updated test count (991/991 tests passing)

3. **docs/ARCHITECTURE.md**:
   - Updated PRTree template signature to include Real parameter
   - Documented 6 exposed C++ classes (float32/float64 variants)
   - Updated data flow diagrams for precision selection
   - Added design decision section for native precision support
   - Explained trade-offs and benefits of new architecture

The documentation now accurately reflects:
- Template signature: PRTree<T, B, D, Real>
- Automatic precision selection based on numpy dtype
- Auto-detection when loading from files
- Precision settings preservation across operations
- Elimination of idx2exact refinement approach
The project uses C++20 features (concepts, likely/unlikely attributes, etc.)
as noted in CHANGES.md. Updated prerequisite documentation to reflect this:

- CONTRIBUTING.md: Updated compiler requirements to C++20 (GCC 10+, Clang 10+, MSVC 2019+)
- docs/DEVELOPMENT.md: Updated compiler requirements to C++20

This ensures developers are aware of the correct C++ standard required
for building the project.
Release v0.7.1 with native precision support and architectural improvements:
- Native float32/float64 precision throughout the entire stack
- Eliminated idx2exact complexity for simpler code
- Auto-detection for precision based on dtype and file loading
- Advanced precision control (adaptive epsilon, configurable epsilon)
- Fixed precision validation gaps and query precision issues

Changes:
- pyproject.toml: 0.7.0 → 0.7.1
- __init__.py: 0.7.0 → 0.7.1
- CHANGES.md: Updated release date to 2025-11-09
- README.md: Updated version history to v0.7.1
@atksh atksh merged commit 2c6288d into main Nov 9, 2025
10 checks passed
@atksh atksh deleted the claude/fix-precision-validation-gaps-011CUurVca3RSrjrzhZh4M1D branch November 9, 2025 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants