feat: Add binary fixed-point number data type proposal #40

mkitti · 2025-11-25T13:31:14Z

This pull request proposes a single parameterized Zarr extension data type for binary fixed-point numbers, binary_fixed_point.

This proposal replaces the previous approach of enumerating all possible fixed-point types.

Key Features

Single Parameterized Type: binary_fixed_point
Configuration:
- base_data_type: The underlying integer type (e.g., int16, uint8).
- fractional_bits: Number of bits for the fractional part.
- integer_bits: Number of bits for the integer part.
Documentation: Explains the interpretation of the values, the relationship to Q notation (Qm.n), and provides examples.

Examples

Signed Q0.15 (Q15): {"base_data_type": "int16", "fractional_bits": 15, "integer_bits": 0}
Unsigned UQ8.8: {"base_data_type": "uint16", "fractional_bits": 8, "integer_bits": 8}
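
As a rough illustration of how these two configurations might be read, the sketch below decodes a stored integer by dividing by 2^fractional_bits, which is the usual interpretation of Qm.n notation. The `decode` helper is hypothetical and not part of the proposal; the scaling rule is an assumption based on the Q notation mentioned above.

```python
def decode(raw: int, config: dict) -> float:
    """Interpret a stored integer as a binary fixed-point value.

    Assumption: the value is raw / 2**fractional_bits (plain Q-notation scaling).
    """
    return raw / (1 << config["fractional_bits"])


# The two example configurations from the description.
q15 = {"base_data_type": "int16", "fractional_bits": 15, "integer_bits": 0}
uq8_8 = {"base_data_type": "uint16", "fractional_bits": 8, "integer_bits": 8}

print(decode(16384, q15))    # 0.5
print(decode(-32768, q15))   # -1.0
print(decode(384, uq8_8))    # 1.5
```

Under this reading, Q0.15 on int16 spans [-1, 1) and UQ8.8 on uint16 spans [0, 256).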

This aligns with feedback to simplify the extension proposal and use parameterization.

d-v-b · 2025-11-25T13:33:36Z

looks like the pad codec is in there -- is that intentional?

mkitti · 2025-11-25T13:35:01Z

Err... a rebase was meant to happen. Let me fix that. Maybe my main is corrupted.

d-v-b · 2025-11-25T13:39:20Z

data-types/fixed-point/N0f16/README.md

+
+## Fill value representation
+
+The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata.


The v3 spec allows strings as well; see https://github.com/zarr-developers/zarr-specs/blob/dc3e95ed36060d9533361364ab7f54fe3e53f82b/docs/v3/data-types/index.rst?plain=1#L68-L79. If the goal is to copy the behavior of the core floating point data types here, maybe say "the fill_value field uses the JSON encoding for floating point numbers defined in the v3 spec" and link to the relevant section.

There are no special infinity or NaN values with these types, so the only relevant string might be the hexadecimal string.

the hex string is only needed for special NaNs, so it seems safe to just use a JSON number for these types

Well it may be useful. Let's say I'm using N0f8 but I really want the equivalent value for the same bits representing 100.

In Julia, I could express that as follows.

julia> UInt8(100)
0x64

julia> reinterpret(N0f8, UInt8(100))
0.392N0f8

It might be convenient to write that as 0x64 here as the fill value.

About to post this change:

-The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata.
-
+The `fill_value` for this data type SHOULD be represented as a JSON number with the value to be represented.
+To represent the underlying integer bits exactly, the `fill_value` MAY be provided as a hexadecimal string representing the underlying integer (e.g., "0x00000000" for a fill value of 0).
+There are no `NaN` or `Infinity` values for fixed-point types.

Applied in 124220d
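
The two accepted forms of `fill_value` could be handled roughly as follows. This is a minimal sketch, not the proposal's reference behavior: `fill_value_to_raw` and its `scale` parameter are hypothetical names, where `scale` stands for the number of raw integer steps per unit value (for example 2**n under Q-notation scaling, or 2**f - 1 for the N types in FixedPointNumbers.jl). How a hex string maps onto negative values of a signed base type is left open here.

```python
from typing import Union


def fill_value_to_raw(fill_value: Union[int, float, str], scale: int) -> int:
    """Return the underlying integer for a fill value.

    `scale` is a hypothetical parameter for this sketch: the number of raw
    integer steps that represent the value 1.
    """
    if isinstance(fill_value, str):
        # Hexadecimal string: taken as the underlying integer bits, unscaled.
        if not fill_value.lower().startswith("0x"):
            raise ValueError(f"expected a hex string like '0x64', got {fill_value!r}")
        return int(fill_value, 16)
    # JSON number: the represented value, rounded to the nearest raw integer.
    return round(fill_value * scale)


print(fill_value_to_raw("0x64", 255))      # 100
print(fill_value_to_raw(0.392, 255))       # 100
print(fill_value_to_raw(0.5, 2**15))       # 16384
```

The first two calls mirror the N0f8 example from the discussion: the hex string "0x64" and the JSON number 0.392 both land on the raw byte 100.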

mkitti · 2025-11-25T14:16:24Z

@LDeakin I added some notes about the fixed and q-num crates in the overview README.

fixed uses a slightly different notation than the one I am using here.

Qmfn corresponds to the fixed crate alias I{m+1}F{n}.
Nmfn corresponds to the fixed crate alias U{m}F{n}.
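
The two correspondences above can be written down as a small helper. This is only a sketch of the naming rule as stated in this comment; `fixed_crate_alias` is a made-up function name and is not part of either crate.

```python
import re


def fixed_crate_alias(name: str) -> str:
    """Map a Qmfn / Nmfn name to the matching `fixed` crate alias.

    Follows the rule stated in the comment: Qmfn -> I{m+1}F{n}, Nmfn -> U{m}F{n}.
    """
    match = re.fullmatch(r"([QN])(\d+)f(\d+)", name)
    if match is None:
        raise ValueError(f"not a Qmfn/Nmfn name: {name!r}")
    kind, m, n = match.group(1), int(match.group(2)), int(match.group(3))
    if kind == "Q":
        # Signed: the fixed crate counts the sign bit among the integer bits.
        return f"I{m + 1}F{n}"
    return f"U{m}F{n}"


print(fixed_crate_alias("Q0f15"))  # I1F15
print(fixed_crate_alias("N8f8"))   # U8F8
```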

The notation I used is closer to the one used in q-num.

I'm not sure if you have any experience with either of those crates, but please do tell me if this note is useful.

mkitti · 2025-11-25T14:34:32Z

I created a demonstration of the Julia package FixedPointNumbers.jl in Google Colab:
https://colab.research.google.com/drive/1asynlDHLQmiWlpYZT6Dxv6xf7E29KSWs?usp=sharing

This commit introduces Zarr extension proposals for binary fixed-point numbers, based on the FixedPointNumbers.jl library.
- Adds dedicated directories and READMEs for Qmfn and Nmfn fixed-point types across 8-bit, 16-bit, 32-bit, and 64-bit integer underlying types.
- READMEs include detailed configuration, range, fill value representation, codec compatibility, Rust crate mapping (fixed and q-num), and maintainer information.
- Renames 'fixed_point' directory to 'fixed-point' for consistency.
- Adjusts the notation explanation to consistently use 'n' for fractional bits and clarifies Rust crate mappings.
- Adds a note about potential 128-bit extensions.

This commit updates the 'Fill value representation' section in all fixed-point data type READMEs to clarify how fill values should be handled. The changes specify that:
- The `fill_value` SHOULD be a JSON number.
- For exact bit-level precision, it MAY be represented as a hexadecimal string (e.g., 0x0000).
- `NaN` and `Infinity` values are not applicable to these types.

jbms · 2025-11-25T17:07:30Z

I think these should be condensed down to a single spec. I'm not sure if these were generated by a script or by an LLM but either way the "template" will be easier for implementers to manage than all of these separate files.

I made the float types separate specs per the suggestion from @normanrz but (1) those have more unique content (2) I think it would be easier for everyone if those were condensed down to fewer files.

A few other comments:

The schema.json files are examples, not valid schemas.

In each README the codec support is described using a comment like "This data type is stored as an int32. It is expected to be compatible with any codec that can handle the int32 data type.". Instead of "it is expected", you could be more clear by saying something like:

This data type is supported by any codec that supports the base integer data type, by encoding it as its base integer data type.

I think that is your intent. However, I'm not sure if that is the best thing to say, because for a codec like numcodecs.fixedscaleoffset or numcodecs.astype that will not do what would be expected. Instead you might want to give an explicit list of codecs which are supported by first encoding as its base integer data type, and then can later specify exceptions for codecs that require different behavior.

The data type names are also not sufficiently clear in my opinion --- they should have something in the name to indicate fixedpoint, e.g. fixedpoint_xxx instead of just xxx.

d-v-b · 2025-11-25T17:21:25Z

> The data type names are also not sufficiently clear in my opinion --- they should have something in the name to indicate fixedpoint, e.g. fixedpoint_xxx instead of just xxx.

Seconding this, and it's worth remembering that there is relatively little value in having a compact codec identifier in zarr.json, if a more expressive option is available. For example, many of these data types seem to be generated by a base representation which is generic over a few parameters. In that case it would be more compact to define the base data type, its parameters, and the permitted ranges for those parameters, rather than one separate spec for each parametrized instance.

d-v-b · 2025-11-25T17:25:05Z

for example, something like

{
  "name": "fixed_point",
  "configuration": {
    "base_data_type": "uint8" | "uint16" | ...,
    "num_fractional_bits": ...
  }
}

would be much more literate

mkitti · 2025-11-25T20:10:31Z

Just calling it fixed_point might be confusing between binary and decimal fixed point, while here we are just implementing binary fixed point for the moment. Would we want to make binary and decimal fixed point two separate types? I'm leaning towards separate types because I think the parameters would be distinct enough between the two.

Co-authored-by: Gemini CLI <gemini-cli@google.com>

LDeakin · 2025-11-30T07:37:56Z

Looks pretty good after the refactor. Although, what about just parameterising on the fractional bits? Otherwise extra behaviour/validation has to be defined. I only note this because the fixed crate and Julia implementation only parameterise on the fractional bits.

…int schema

The JSON schema for 'binary_fixed_point' now allows either 'integer_bits' or 'fractional_bits' to be specified, with the other derivable from the base_data_type's bit width. The README has been updated to reflect this flexibility.

Co-authored-by: Gemini CLI <gemini-cli@google.com>

mkitti · 2025-11-30T08:25:11Z

> Looks pretty good after the refactor. Although, what about just parameterising on the fractional bits? Otherwise extra behaviour/validation has to be defined. I only note this because the fixed crate and Julia implementation only parameterise on the fractional bits.

One could equivalently make the case that integer_bits could also fully describe the type along with the base type. In edb94e4 I allow for only one of the parameters to be defined.
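
To make the "only one parameter" idea concrete, here is a sketch of how a reader might fill in the missing field and check consistency. It assumes that a signed base type reserves one bit for the sign (as in the Q0.15 on int16 example) and that an unsigned type uses all of its bits; `resolve_bits` and `BIT_WIDTHS` are hypothetical names, and the actual rule in the schema may differ.

```python
from typing import Tuple

BIT_WIDTHS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64,
}


def resolve_bits(config: dict) -> Tuple[int, int]:
    """Return (integer_bits, fractional_bits), deriving whichever is missing."""
    base = config["base_data_type"]
    # Assumption: signed types spend one bit on the sign.
    available = BIT_WIDTHS[base] - (1 if base.startswith("int") else 0)
    integer_bits = config.get("integer_bits")
    fractional_bits = config.get("fractional_bits")
    if integer_bits is None and fractional_bits is None:
        raise ValueError("need integer_bits or fractional_bits")
    if integer_bits is None:
        integer_bits = available - fractional_bits
    elif fractional_bits is None:
        fractional_bits = available - integer_bits
    # If both were given they must agree with the base type's width.
    if min(integer_bits, fractional_bits) < 0 or integer_bits + fractional_bits != available:
        raise ValueError(f"bit counts do not fit {base}")
    return integer_bits, fractional_bits


print(resolve_bits({"base_data_type": "int16", "fractional_bits": 15}))  # (0, 15)
print(resolve_bits({"base_data_type": "uint16", "integer_bits": 8}))     # (8, 8)
```

The final consistency check is the extra validation that specifying both parameters would require, which is the cost raised in the quoted comment.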

With FixedPointNumbers.jl, I usually refer to the numbers by their abbreviated aliases (e.g. N1f7 rather than Normed{UInt8,7}). The examples from the fixed crate suggest that names such as I4F12 are commonly used as well.

Yes, I see that having only one of them in the specification would be simpler. I will sleep on it.

jbms · 2025-12-01T14:21:47Z

There is now no mention of codecs at all.

mkitti · 2025-12-01T15:19:14Z

> There is now no mention of codecs at all.

I'm still thinking about what to do there. My sense is that we would need an explicit codec to perform a recast to the base type based on the underlying bits. Otherwise, we should treat the values similarly to how we treat floating point.

I'm not sure if anything needs to be explicitly said here. Rather perhaps that needs to be addressed on a per-codec basis.

Perhaps it might make sense to describe modular arithmetic here:

julia> N0f8(1) + N0f8(1)
0.996N0f8

That would address scaleoffset.
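
The wraparound in the Julia example can be reproduced on the raw bits. A minimal sketch, assuming N0f8 stores a UInt8 in which raw 255 represents 1.0 (the FixedPointNumbers.jl convention for its N types) and that addition wraps modulo 2^8; the helper name is made up for illustration.

```python
BITS = 8
SCALE = 2**BITS - 1  # N0f8: raw 255 represents 1.0


def add_raw_wrapping(a: int, b: int) -> int:
    """Add two raw UInt8 payloads with wraparound modulo 2**8."""
    return (a + b) % (1 << BITS)


raw_one = round(1.0 * SCALE)                 # 255
raw_sum = add_raw_wrapping(raw_one, raw_one)  # (255 + 255) mod 256
print(raw_sum, round(raw_sum / SCALE, 3))     # 254 0.996
```

A codec that adds an offset in the integer domain would see this same wraparound, which is why the behavior may need to be spelled out per codec.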

mkitti force-pushed the mkitti-fixed-point-numbers branch from eacd940 to dc58c68 on November 25, 2025 13:38

d-v-b reviewed Nov 25, 2025


mkitti added 2 commits November 25, 2025 11:41

mkitti force-pushed the mkitti-fixed-point-numbers branch from 124220d to 1d33a40 on November 25, 2025 16:42

mkitti changed the title ~~feat: Add fixed-point number data type proposals~~ feat: Add binary fixed-point number data type proposal Nov 30, 2025

Refactor fixed-point data types to a single parameterized type

6e96901

Co-authored-by: Gemini CLI <gemini-cli@google.com>

mkitti force-pushed the mkitti-fixed-point-numbers branch from 2f1e6b7 to 6e96901 on November 30, 2025 07:30

