@mkitti
Contributor

@mkitti mkitti commented Nov 25, 2025

This pull request proposes a single parameterized Zarr extension data type for binary fixed-point numbers, binary_fixed_point.

This proposal replaces the previous approach of enumerating all possible fixed-point types.

Key Features

  • Single Parameterized Type: binary_fixed_point
  • Configuration:
    • base_data_type: The underlying integer type (e.g., int16, uint8).
    • fractional_bits: Number of bits for the fractional part.
    • integer_bits: Number of bits for the integer part.
  • Documentation: Explains the interpretation of the values, the relationship to Q notation (Qm.n), and provides examples.

Examples

  • Signed Q0.15 (Q15): {"base_data_type": "int16", "fractional_bits": 15, "integer_bits": 0}
  • Unsigned UQ8.8: {"base_data_type": "uint16", "fractional_bits": 8, "integer_bits": 8}
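As a rough illustration of how the two example configurations might be read, the sketch below decodes a stored integer by scaling it by 2^-fractional_bits, the usual Qm.n convention. The helper name `decode_fixed` is hypothetical, and the scaling rule is an assumption made for illustration; the proposal's README remains the authority on interpretation.

```python
def decode_fixed(raw: int, fractional_bits: int) -> float:
    """Interpret a stored integer as a fixed-point value (assumed scale: 2**-fractional_bits)."""
    return raw / (1 << fractional_bits)

# Signed Q0.15 stored in int16: raw values span -2**15 .. 2**15 - 1.
q15_max = decode_fixed(32767, 15)    # just below 1.0
q15_min = decode_fixed(-32768, 15)   # exactly -1.0

# Unsigned UQ8.8 stored in uint16: raw 0x0180 has integer part 1 and fraction 0.5.
uq8_8 = decode_fixed(0x0180, 8)
```

Under this assumption both types occupy 16 bits but trade range for resolution: Q0.15 covers [-1, 1) in steps of 2^-15, while UQ8.8 covers [0, 256) in steps of 2^-8.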

This aligns with feedback to simplify the extension proposal and use parameterization.

@d-v-b
Contributor

d-v-b commented Nov 25, 2025

looks like the pad codec is in there -- is that intentional?

@mkitti
Contributor Author

mkitti commented Nov 25, 2025

Err... a rebase was meant to happen. Let me fix that. Maybe my main is corrupted.

@mkitti mkitti force-pushed the mkitti-fixed-point-numbers branch from eacd940 to dc58c68 Compare November 25, 2025 13:38

## Fill value representation

The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata.
Contributor

The v3 spec allows strings as well. See https://github.com/zarr-developers/zarr-specs/blob/dc3e95ed36060d9533361364ab7f54fe3e53f82b/docs/v3/data-types/index.rst?plain=1#L68-L79. If the goal is to copy the behavior of the core floating point data types here, maybe say "the fill_value field uses the JSON encoding defined for floating point numbers in the v3 spec" and link to the relevant section.

Contributor Author

There are no special infinity or NaN values with these types, so the only relevant string might be the hexadecimal string.

Contributor

the hex string is only needed for special NaNs, so it seems safe to just use a JSON number for these types

Contributor Author

Well, it may be useful. Let's say I'm using N0f8 but what I really want is the value whose underlying bits are the integer 100.

In Julia, I could express that as follows.

julia> UInt8(100)
0x64

julia> reinterpret(N0f8, UInt8(100))
0.392N0f8

It might be convenient to write that as 0x64 here as the fill value.
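The reinterpretation above can be reproduced without Julia. The sketch below assumes the FixedPointNumbers.jl convention for `Normed` types, where the raw integer is divided by 2^f - 1 (255 for N0f8); that is why raw 100 prints as 0.392. The helper name `normed_value` is hypothetical.

```python
def normed_value(raw: int, fractional_bits: int) -> float:
    """Value of a FixedPointNumbers.jl-style Normed number: raw / (2**f - 1)."""
    return raw / ((1 << fractional_bits) - 1)

raw = int("0x64", 16)           # hex fill value -> underlying integer 100
value = normed_value(raw, 8)    # 100 / 255
print(raw, round(value, 3))     # prints: 100 0.392
```

A hex string pins the stored bits directly, whereas a decimal such as 0.392 relies on the reader applying the same scaling and rounding.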

Contributor Author

About to post this change:

-The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata.
-
+The `fill_value` for this data type SHOULD be represented as a JSON number with the value to be represented.
+To represent the underlying integer bits exactly, the `fill_value` MAY be provided as a hexadecimal string representing the underlying integer (e.g., "0x00000000" for a fill value of 0).
+There are no `NaN` or `Infinity` values for fixed-point types.
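One way a reader might handle both permitted forms is sketched below. The function name `fill_value_to_raw`, the 2^fractional_bits scaling, and round-to-nearest for JSON numbers are all assumptions made for illustration; the proposed text itself only says a JSON number SHOULD be used and a hexadecimal string MAY be used.

```python
def fill_value_to_raw(fill_value, fractional_bits: int, bits: int, signed: bool) -> int:
    """Return the underlying integer for a fill_value given as a JSON number or hex string."""
    if isinstance(fill_value, str):
        # Hex string: taken as the raw bit pattern of the underlying integer.
        raw = int(fill_value, 16)
        if not 0 <= raw < (1 << bits):
            raise ValueError("hex fill_value does not fit in the base data type")
        # Reinterpret as two's complement when the base type is signed.
        if signed and raw >= 1 << (bits - 1):
            raw -= 1 << bits
        return raw
    # JSON number: scale by 2**fractional_bits and round (assumed convention).
    raw = round(fill_value * (1 << fractional_bits))
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not lo <= raw <= hi:
        raise ValueError("fill_value is out of range for the base data type")
    return raw
```

For example, `fill_value_to_raw(0.5, 15, 16, True)` and `fill_value_to_raw("0x4000", 15, 16, True)` would both resolve to the raw integer 16384 under these assumptions.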

Contributor Author

Applied in 124220d

@mkitti
Contributor Author

mkitti commented Nov 25, 2025

@LDeakin I added some notes about the fixed and q-num crates in the overview README.

fixed uses a slightly different notation than the one I am using here.

  • Qmfn corresponds to the fixed crate alias I{m+1}F{n}.
  • Nmfn corresponds to the fixed crate alias U{m}F{n}.
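The two mapping rules above can be written as a small string transformation. This is only a sketch of the naming correspondence described in this comment; the function name `fixed_crate_alias` is hypothetical and nothing here calls into either crate.

```python
import re

def fixed_crate_alias(alias: str) -> str:
    """Map a Qmfn / Nmfn alias to the corresponding `fixed` crate alias."""
    match = re.fullmatch(r"([QN])(\d+)f(\d+)", alias)
    if match is None:
        raise ValueError(f"not a Qmfn or Nmfn alias: {alias!r}")
    kind, m, n = match.group(1), int(match.group(2)), int(match.group(3))
    if kind == "Q":
        # Signed: the fixed crate counts the sign bit among the integer bits.
        return f"I{m + 1}F{n}"
    # Unsigned: integer and fractional bit counts carry over unchanged.
    return f"U{m}F{n}"
```

For instance, `Q0f15` maps to `I1F15` and `N8f8` maps to `U8F8`.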

The notation I used is closer to the one used in q-num.

I'm not sure if you have any experience with either of those crates, but please do tell me if this note is useful.

@mkitti
Contributor Author

mkitti commented Nov 25, 2025

I created a demonstration of the Julia package FixedPointNumbers.jl in Google Colab:
https://colab.research.google.com/drive/1asynlDHLQmiWlpYZT6Dxv6xf7E29KSWs?usp=sharing

This commit introduces Zarr extension proposals for binary fixed-point
numbers, based on the FixedPointNumbers.jl library.

-   Adds dedicated directories and READMEs for Qmfn and Nmfn fixed-point
    types across 8-bit, 16-bit, 32-bit, and 64-bit integer underlying
    types.
-   READMEs include detailed configuration, range, fill value
    representation, codec compatibility, Rust crate mapping (fixed and
    q-num), and maintainer information.
-   Renames 'fixed_point' directory to 'fixed-point' for consistency.
-   Adjusts the notation explanation to consistently use 'n' for
    fractional bits and clarifies Rust crate mappings.
-   Adds a note about potential 128-bit extensions.

This commit updates the 'Fill value representation' section in all
fixed-point data type READMEs to clarify how fill values should be
handled.

The changes specify that:
- The `fill_value` SHOULD be a JSON number.
- For exact bit-level precision, it MAY be represented as a
  hexadecimal string (e.g., 0x0000).
- `NaN` and `Infinity` values are not applicable to these types.

@mkitti mkitti force-pushed the mkitti-fixed-point-numbers branch from 124220d to 1d33a40 Compare November 25, 2025 16:42
@jbms
Contributor

jbms commented Nov 25, 2025

I think these should be condensed down to a single spec. I'm not sure if these were generated by a script or by an LLM but either way the "template" will be easier for implementers to manage than all of these separate files.

I made the float types separate specs per the suggestion from @normanrz but (1) those have more unique content (2) I think it would be easier for everyone if those were condensed down to fewer files.

A few other comments:

The schema.json files are examples, not valid schemas.

In each README the codec support is described using a comment like "This data type is stored as an int32. It is expected to be compatible with any codec that can handle the int32 data type.". Instead of "it is expected", you could be more clear by saying something like:

This data type is supported by any codec that supports the base integer data type, by encoding it as its base integer data type.

I think that is your intent. However, I'm not sure if that is the best thing to say, because for a codec like numcodecs.fixedscaleoffset or numcodecs.astype that will not do what would be expected. Instead you might want to give an explicit list of codecs which are supported by first encoding as its base integer data type, and then can later specify exceptions for codecs that require different behavior.
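To make the distinction concrete, the sketch below shows the bit-preserving case: the fixed-point array is handed to the codec pipeline as plain int32 values, and decoding returns the same integers. Here `zlib` merely stands in for a generic bytes-to-bytes compression codec and `struct` for a little-endian bytes codec; both are assumptions for illustration, not codecs named in the proposal.

```python
import struct
import zlib

# Raw int32 storage for three fixed-point values; the codec never sees fractional_bits.
raw_values = [0, 1 << 16, -(1 << 15)]

# Encode exactly as an int32 array would be: little-endian bytes, then compression.
payload = struct.pack("<3i", *raw_values)
compressed = zlib.compress(payload)

# Decoding recovers the same integers, so the fixed-point interpretation is untouched.
decoded = list(struct.unpack("<3i", zlib.decompress(compressed)))
```

Codecs that transform values rather than bytes, such as the `numcodecs.fixedscaleoffset` and `numcodecs.astype` cases mentioned above, do not fit this pattern because they would operate on the raw integers rather than on the represented values.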

The data type names are also not sufficiently clear in my opinion --- they should have something in the name to indicate fixedpoint, e.g. fixedpoint_xxx instead of just xxx.

@d-v-b
Contributor

d-v-b commented Nov 25, 2025

> The data type names are also not sufficiently clear in my opinion --- they should have something in the name to indicate fixedpoint, e.g. fixedpoint_xxx instead of just xxx.

Seconding this, and it's worth remembering that there is relatively little value in having a compact codec identifier in zarr.json, if a more expressive option is available. For example, many of these data types seem to be generated by a base representation which is generic over a few parameters. In that case it would be more compact to define the base data type, its parameters, and the permitted ranges for those parameters, rather than one separate spec for each parametrized instance.

@d-v-b
Contributor

d-v-b commented Nov 25, 2025

for example, something like

{
  "name": "fixed_point",
  "configuration": {
    "base_data_type": "uint8" | "uint16" | ...
    "num_fractional_bits": ...
  }
}

would be much more literate

@mkitti
Contributor Author

mkitti commented Nov 25, 2025

Calling it just fixed_point could blur the distinction between binary and decimal fixed point, and here we are only implementing binary fixed point for the moment. Would we want to make binary and decimal fixed point two separate types? I'm leaning towards separate types because I think the parameters would be distinct enough between the two.

@mkitti mkitti changed the title from "feat: Add fixed-point number data type proposals" to "feat: Add binary fixed-point number data type proposal" Nov 30, 2025
Co-authored-by: Gemini CLI <gemini-cli@google.com>
@mkitti mkitti force-pushed the mkitti-fixed-point-numbers branch from 2f1e6b7 to 6e96901 Compare November 30, 2025 07:30
@LDeakin
Member

LDeakin commented Nov 30, 2025

Looks pretty good after the refactor. Although, what about just parameterising on the fractional bits? Otherwise extra behaviour/validation has to be defined. I only note this because the fixed crate and Julia implementation only parameterise on the fractional bits.

…int schema

The JSON schema for 'binary_fixed_point' now allows either 'integer_bits' or 'fractional_bits' to be specified, with the other derivable from the base_data_type's bit width. The README has been updated to reflect this flexibility.
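A reader could derive the omitted parameter along the following lines. This is a sketch under stated assumptions: the function name `resolve_bits` and the `BIT_WIDTHS` table are hypothetical, and the rule that signed base types reserve one bit for the sign is inferred from the Q0.15 example (int16, 15 fractional bits, 0 integer bits), not quoted from the schema.

```python
BIT_WIDTHS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64,
              "uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}

def resolve_bits(config: dict) -> tuple[int, int]:
    """Return (integer_bits, fractional_bits), deriving whichever one is missing."""
    base = config["base_data_type"]
    # Assumption: signed types reserve one bit for the sign, as in the Q0.15 example.
    available = BIT_WIDTHS[base] - (0 if base.startswith("u") else 1)
    integer_bits = config.get("integer_bits")
    fractional_bits = config.get("fractional_bits")
    if integer_bits is None and fractional_bits is None:
        raise ValueError("one of integer_bits or fractional_bits is required")
    if integer_bits is None:
        integer_bits = available - fractional_bits
    elif fractional_bits is None:
        fractional_bits = available - integer_bits
    # Reject negative counts and pairs that do not add up to the available bits.
    if integer_bits < 0 or fractional_bits < 0 or integer_bits + fractional_bits != available:
        raise ValueError("integer_bits and fractional_bits do not match the base data type")
    return integer_bits, fractional_bits
```

The final check is the extra validation that becomes necessary once both parameters may appear together.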

Co-authored-by: Gemini CLI <gemini-cli@google.com>
@mkitti
Contributor Author

mkitti commented Nov 30, 2025

> Looks pretty good after the refactor. Although, what about just parameterising on the fractional bits? Otherwise extra behaviour/validation has to be defined. I only note this because the fixed crate and Julia implementation only parameterise on the fractional bits.

One could equivalently make the case that integer_bits could also fully describe the type along with the base type. In edb94e4 I allow for only one of the parameters to be defined.

With FixedPointNumbers.jl, I usually refer to the numbers by their abbreviated aliases (e.g. N1f7 rather than Normed{UInt8,7}). The examples from the fixed crate also suggest that names such as I4F12 are commonly used as well.

Yes, I see that having only one of them in the specification would be simpler. I will sleep on it.

@jbms
Contributor

jbms commented Dec 1, 2025

There is now no mention of codecs at all.

@mkitti
Contributor Author

mkitti commented Dec 1, 2025

> There is now no mention of codecs at all.

I'm still thinking about what to do there. My sense is that we would need an explicit codec to perform a recast to the base type based on the underlying bits. Otherwise, we should treat the values similarly to how we treat floating point.

I'm not sure if anything needs to be explicitly said here. Rather perhaps that needs to be addressed on a per-codec basis.

Perhaps it might make sense to describe modular arithmetic here:

julia> N0f8(1) + N0f8(1)
0.996N0f8

That would address scaleoffset.
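The wraparound in the Julia output can be traced on the raw integers. The sketch assumes the FixedPointNumbers.jl convention that N0f8 stores value × 255 in a UInt8, so N0f8(1) is raw 0xff; the helper name `n0f8_add` is hypothetical.

```python
def n0f8_add(a_raw: int, b_raw: int) -> int:
    """Add two N0f8 raw values with wraparound modulo 2**8, as UInt8 arithmetic does."""
    return (a_raw + b_raw) % 256

one = 255                              # N0f8(1) is stored as 0xff
total = n0f8_add(one, one)             # 510 mod 256 == 254
print(total, round(total / 255, 3))    # prints: 254 0.996
```

The sum overflows the 8-bit storage and wraps to raw 254, which reads back as 254/255 ≈ 0.996 rather than 2.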
