-
Notifications
You must be signed in to change notification settings - Fork 11
feat: Add binary fixed-point number data type proposal #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
looks like the pad codec is in there -- is that intentional? |
|
Err... a rebase was meant to happen. Let me fix that. Maybe my main is corrupted. |
eacd940 to
dc58c68
Compare
|
|
||
| ## Fill value representation | ||
|
|
||
| The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The v3 spec allows strings as well. see https://github.com/zarr-developers/zarr-specs/blob/dc3e95ed36060d9533361364ab7f54fe3e53f82b/docs/v3/data-types/index.rst?plain=1#L68-L79. If the goal is to copy the behavior of the core floating point data types here, maybe say "the fill_value field uses the JSON encoding defined for floating point numbers defined in the v3 spec" and link to the relevant section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no special infinity or NaN values with these types, so the only relevant string might be the hexadecimal string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the hex string is only needed for special NaNs, so it seems safe to just use a JSON number for these types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it may be useful. Let's say I'm using N0f8 but I really want the equivalent value for the same bits representing 100.
In Julia, I could express that as follows.
julia> UInt8(100)
0x64
julia> reinterpret(N0f8, UInt8(100))
0.392N0f8
It might be convenient to write that as 0x64 here as the fill value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About to post this change:
-The `fill_value` for this data type should be represented as a floating-point number in the JSON metadata.
-
+The `fill_value` for this data type SHOULD be represented as a JSON number with the value to be represented.
+To represent the underlying integer bits exactly, the `fill_value` MAY be provided as a hexadecimal string representing the underlying integer (e.g., "0x00000000" for a fill value of 0).
+There are no `NaN` or `Infinity` values for fixed-point types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied in 124220d
|
@LDeakin I added some notes about the fixed and q-num crates in the overview README.
The notation I used is closer to the one used in I'm not sure if you have any experience with either of those crates, but please do tell me if this note is useful. |
|
I created a demonstration of the Julia package FixedPointNumbers.jl in Google Colab: |
This commit introduces Zarr extension proposals for binary fixed-point
numbers, based on the FixedPointNumbers.jl library.
- Adds dedicated directories and READMEs for Qmfn and Nmfn fixed-point
types across 8-bit, 16-bit, 32-bit, and 64-bit integer underlying
types.
- READMEs include detailed configuration, range, fill value
representation, codec compatibility, Rust crate mapping (fixed and
q-num), and maintainer information.
- Renames 'fixed_point' directory to 'fixed-point' for consistency.
- Adjusts the notation explanation to consistently use 'n' for
fractional bits and clarifies Rust crate mappings.
- Adds a note about potential 128-bit extensions.
This commit updates the 'Fill value representation' section in all fixed-point data type READMEs to clarify how fill values should be handled. The changes specify that: - The `fill_value` SHOULD be a JSON number. - For exact bit-level precision, it MAY be represented as a hexadecimal string (e.g., 0x0000). - `NaN` and `Infinity` values are not applicable to these types.
124220d to
1d33a40
Compare
|
I think these should be condensed down to a single spec. I'm not sure if these were generated by a script or by an LLM but either way the "template" will be easier for implementers to manage than all of these separate files. I made the float types separate specs per the suggestion from @normanrz but (1) those have more unique content (2) I think it would be easier for everyone if those were condensed down to fewer files. A few other comments: The schema.json files are examples, not valid schemas. In each README the codec support is described using a comment like "This data type is stored as an This data type is supported by any codec that supports the base integer data type, by encoding it as its base integer data type. I think that is your intent. However, I'm not sure if that is the best thing to say, because for a codec like The data type names are also not sufficiently clear in my opinion --- they should have something in the name to indicate fixedpoint, e.g. |
Seconding this, and it's worth remembering that there is relatively little value in having a compact codec identifier in zarr.json, if a more expressive option is available. For example, many of these data types seem to be generated by a base representation which is generic over a few parameters. In that case it would be more compact to define the base data type, its parameters, and the permitted ranges for those parameters, rather than one separate spec for each parametrized instance. |
|
for example, something like {
"name": "fixed_point",
"configuration": {
"base_data_type": "uint8" | "uint16" | ...
"num_fractional_bits": ...
}
}would be much more literate |
|
Just calling it |
Co-authored-by: Gemini CLI <gemini-cli@google.com>
2f1e6b7 to
6e96901
Compare
|
Looks pretty good after the refactor. Although, what about just parameterising on the fractional bits? Otherwise extra behaviour/valiudation has to be defined. I only note this because the |
…int schema The JSON schema for 'binary_fixed_point' now allows either 'integer_bits' or 'fractional_bits' to be specified, with the other derivable from the base_data_type's bit width. The README has been updated to reflect this flexibility. Co-authored-by: Gemini CLI <gemini-cli@google.com>
One could equivalently make the case that With FixedPointNumbers.jl, I usually refer to the numbers by their abbreviated aliases (e.g. Yes, I see that having only one of them in the specification would be simpler. I will sleep on it. |
|
There is now no mention of codecs at all. |
I'm still thinking about what to do there. My sense is that we would need an explicit codec to perform a recast to the the base type based on the underlying bits. Otherwise, we should treat the values similar to how we treat floating point. I'm not sure if anything needs to be explicitly said here. Rather perhaps that needs to be addressed on a per-codec basis. Perhaps it might make sense to describe modular arithmetic here: That would address |
This pull request proposes a single parameterized Zarr extension data type for binary fixed-point numbers,
binary_fixed_point.This proposal replaces the previous approach of enumerating all possible fixed-point types.
Key Features
binary_fixed_pointbase_data_type: The underlying integer type (e.g.,int16,uint8).fractional_bits: Number of bits for the fractional part.integer_bits: Number of bits for the integer part.Qm.n), and provides examples.Examples
{"base_data_type": "int16", "fractional_bits": 15, "integer_bits": 0}{"base_data_type": "uint16", "fractional_bits": 8, "integer_bits": 8}This aligns with feedback to simplify the extension proposal and use parameterization.