Commit a0b7932

adds extensions that zarr-python defines (#1)
* adds extensions that zarr-python defines
* fill_values
* update schema
* update schema
* oneOf
* better spec
1 parent 8870ab3 commit a0b7932

8 files changed

+235
-0
lines changed

codecs/vlen-bytes/README.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
# Vlen-bytes codec
2+
3+
Defines an `array -> bytes` codec that serializes variable-length byte string arrays.
4+
5+
## Codec name
6+
7+
The value of the `name` member in the codec object MUST be `vlen-bytes`.
8+
9+
## Configuration parameters
10+
11+
None.
12+
13+
## Example
14+
15+
For example, the array metadata below specifies that the array contains variable-length byte strings:
16+
17+
```json
18+
{
19+
"data_type": "bytes",
20+
"codecs": [{
21+
"name": "vlen-bytes"
22+
}]
23+
}
24+
```
25+
26+
## Format and algorithm
27+
28+
This is an `array -> bytes` codec.
29+
30+
This codec is only compatible with the [`"bytes"`](../../data-types/bytes/README.md) data type.
31+
32+
In the encoded format, each chunk is prefixed with a 32-bit little-endian unsigned integer (u32le) that specifies the number of elements in the chunk.
33+
This prefix is followed by the encoded elements, ordered lexicographically by array index (row-major order).
34+
Each element is encoded as a u32le giving its length in bytes, followed by the bytes themselves.
35+
36+
See https://numcodecs.readthedocs.io/en/stable/other/vlen.html#vlenbytes for details about the encoding.
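
The layout described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the numcodecs or zarr-python implementation; the function names `encode_vlen_bytes` and `decode_vlen_bytes` are hypothetical, and the input is assumed to be an already flattened sequence of `bytes` objects.

```python
import struct


def encode_vlen_bytes(items):
    """Encode a flat sequence of bytes objects (hypothetical helper)."""
    # Header: number of elements as a little-endian unsigned 32-bit integer.
    parts = [struct.pack("<I", len(items))]
    for item in items:
        # Each element: u32le byte length, then the raw bytes.
        parts.append(struct.pack("<I", len(item)))
        parts.append(item)
    return b"".join(parts)


def decode_vlen_bytes(buf):
    """Decode a buffer produced by encode_vlen_bytes (hypothetical helper)."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    items = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        end = offset + length
        if end > len(buf):
            raise ValueError("buffer ends before the declared element length")
        items.append(bytes(buf[offset:end]))
        offset = end
    return items


chunk = [b"ab", b"", b"xyz"]
encoded = encode_vlen_bytes(chunk)
assert decode_vlen_bytes(encoded) == chunk
```

Note that an empty element still occupies four bytes in the encoded chunk, because its length prefix is always written.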
37+
38+
## Change log
39+
40+
No changes yet.
41+
42+
## Current maintainers
43+
44+
* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)

codecs/vlen-bytes/schema.json

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"oneOf": [
4+
{
5+
"type": "object",
6+
"properties": {
7+
"name": {
8+
"const": "vlen-bytes"
9+
},
10+
"configuration": {
11+
"type": "object",
12+
"additionalProperties": false
13+
}
14+
},
15+
"required": ["name"],
16+
"additionalProperties": false
17+
},
18+
{ "const": "vlen-bytes" }
19+
]
20+
}
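
Read as plain logic, the schema above accepts two forms: the bare string `"vlen-bytes"`, or an object with a required `name`, an optional empty `configuration`, and no other members. The check below is a hand-written sketch of that logic, not a JSON Schema validator; the function name `matches_codec_schema` is hypothetical.

```python
def matches_codec_schema(value, name="vlen-bytes"):
    """Return True if value satisfies either branch of the oneOf (sketch)."""
    # Second branch: the bare string form.
    if isinstance(value, str):
        return value == name
    # First branch: must be a JSON object.
    if not isinstance(value, dict):
        return False
    # "additionalProperties": false allows only these two members.
    if set(value) - {"name", "configuration"}:
        return False
    # "required": ["name"], constrained to the codec name.
    if value.get("name") != name:
        return False
    # "configuration", when present, must be an object with no members.
    if "configuration" in value:
        config = value["configuration"]
        if not isinstance(config, dict) or config:
            return False
    return True


assert matches_codec_schema("vlen-bytes")
assert matches_codec_schema({"name": "vlen-bytes", "configuration": {}})
assert not matches_codec_schema({"name": "vlen-bytes", "level": 1})
```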

codecs/vlen-utf8/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# Vlen-utf8 codec
2+
3+
Defines an `array -> bytes` codec that serializes variable-length UTF-8 string arrays.
4+
5+
## Codec name
6+
7+
The value of the `name` member in the codec object MUST be `vlen-utf8`.
8+
9+
## Configuration parameters
10+
11+
None.
12+
13+
## Example
14+
15+
For example, the array metadata below specifies that the array contains variable-length UTF-8 strings:
16+
17+
```json
18+
{
19+
"data_type": "string",
20+
"codecs": [{
21+
"name": "vlen-utf8"
22+
}]
23+
}
24+
```
25+
26+
## Format and algorithm
27+
28+
This is an `array -> bytes` codec.
29+
30+
This codec is only compatible with the [`"string"`](../../data-types/string/README.md) data type.
31+
32+
In the encoded format, each chunk is prefixed with a 32-bit little-endian unsigned integer (u32le) that specifies the number of elements in the chunk.
33+
This prefix is followed by the encoded elements, ordered lexicographically by array index (row-major order).
34+
Each element is encoded as a u32le giving its length in bytes, followed by the bytes themselves.
35+
The bytes for each element are obtained by encoding the element as UTF-8.
36+
37+
See https://numcodecs.readthedocs.io/en/stable/other/vlen.html#vlenutf8 for details about the encoding.
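
A minimal Python sketch of the layout described above follows. It is illustrative only and not the numcodecs or zarr-python implementation; the names `encode_vlen_utf8` and `decode_vlen_utf8` are hypothetical, and the input is assumed to be an already flattened sequence of `str` objects. The point to notice is that the length prefix counts UTF-8 bytes, not characters.

```python
import struct


def encode_vlen_utf8(strings):
    """Encode a flat sequence of str objects (hypothetical helper)."""
    # Header: number of elements as a little-endian unsigned 32-bit integer.
    parts = [struct.pack("<I", len(strings))]
    for text in strings:
        data = text.encode("utf-8")
        # Length prefix is the number of UTF-8 bytes, not code points.
        parts.append(struct.pack("<I", len(data)))
        parts.append(data)
    return b"".join(parts)


def decode_vlen_utf8(buf):
    """Decode a buffer produced by encode_vlen_utf8 (hypothetical helper)."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    strings = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        end = offset + length
        if end > len(buf):
            raise ValueError("buffer ends before the declared element length")
        strings.append(bytes(buf[offset:end]).decode("utf-8"))
        offset = end
    return strings


# "é" is one character but two bytes in UTF-8, so its prefix is 2.
encoded = encode_vlen_utf8(["é"])
assert encoded == b"\x01\x00\x00\x00\x02\x00\x00\x00\xc3\xa9"
assert decode_vlen_utf8(encoded) == ["é"]
```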
38+
39+
## Change log
40+
41+
No changes yet.
42+
43+
## Current maintainers
44+
45+
* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)

codecs/vlen-utf8/schema.json

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"oneOf": [
4+
{
5+
"type": "object",
6+
"properties": {
7+
"name": {
8+
"const": "vlen-utf8"
9+
},
10+
"configuration": {
11+
"type": "object",
12+
"additionalProperties": false
13+
}
14+
},
15+
"required": ["name"],
16+
"additionalProperties": false
17+
},
18+
{ "const": "vlen-utf8" }
19+
]
20+
}

data-types/bytes/README.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Bytes data type
2+
3+
Defines a data type for variable-length byte strings.
4+
5+
## Permitted fill values
6+
7+
The value of the `fill_value` metadata key must be an array of byte values (integers in the range 0 to 255).
8+
9+
## Example
10+
11+
For example, the array metadata below specifies that the array contains variable-length byte strings:
12+
13+
```json
14+
{
15+
"data_type": "bytes",
16+
"fill_value": [1, 2, 3],
17+
"codecs": [{
18+
"name": "vlen-bytes"
19+
}]
20+
}
21+
```
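
As a sketch of how a reader might turn such a `fill_value` into an in-memory byte string, the Python below parses a metadata fragment and converts the array. The helper name `parse_bytes_fill_value` is hypothetical, and the range check assumes that each byte value is a JSON integer between 0 and 255.

```python
import json


def parse_bytes_fill_value(fill_value):
    """Convert a JSON array of byte values to bytes (hypothetical helper)."""
    if not isinstance(fill_value, list):
        raise TypeError("fill_value must be an array of byte values")
    for value in fill_value:
        # Assumption: byte values are integers in 0..255; JSON booleans
        # are rejected even though Python treats bool as an int subclass.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"not an integer byte value: {value!r}")
        if not 0 <= value <= 255:
            raise ValueError(f"byte value out of range: {value!r}")
    return bytes(fill_value)


metadata = json.loads('{"data_type": "bytes", "fill_value": [1, 2, 3]}')
fill = parse_bytes_fill_value(metadata["fill_value"])
assert fill == b"\x01\x02\x03"
```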
22+
23+
## Notes
24+
25+
Currently, this data type is only compatible with the [`"vlen-bytes"`](../../codecs/vlen-bytes/README.md) codec.
26+
27+
## Change log
28+
29+
No changes yet.
30+
31+
## Current maintainers
32+
33+
* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)

data-types/bytes/schema.json

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"oneOf": [
4+
{
5+
"type": "object",
6+
"properties": {
7+
"name": {
8+
"const": "bytes"
9+
},
10+
"configuration": {
11+
"type": "object",
12+
"additionalProperties": false
13+
}
14+
},
15+
"required": ["name"],
16+
"additionalProperties": false
17+
},
18+
{ "const": "bytes" }
19+
]
20+
}

data-types/string/README.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# String data type
2+
3+
Defines a data type for variable-length UTF-8 strings.
4+
5+
## Permitted fill values
6+
7+
The value of the `fill_value` metadata key must be a Unicode string.
8+
9+
## Example
10+
11+
For example, the array metadata below specifies that the array contains variable-length UTF-8 strings:
12+
13+
```json
14+
{
15+
"data_type": "string",
16+
"fill_value": "foo",
17+
"codecs": [{
18+
"name": "vlen-utf8"
19+
}]
20+
}
21+
```
22+
23+
## Notes
24+
25+
Currently, this data type is only compatible with the [`"vlen-utf8"`](../../codecs/vlen-utf8/README.md) codec.
26+
27+
## Change log
28+
29+
No changes yet.
30+
31+
## Current maintainers
32+
33+
* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)

data-types/string/schema.json

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"oneOf": [
4+
{
5+
"type": "object",
6+
"properties": {
7+
"name": {
8+
"const": "string"
9+
},
10+
"configuration": {
11+
"type": "object",
12+
"additionalProperties": false
13+
}
14+
},
15+
"required": ["name"],
16+
"additionalProperties": false
17+
},
18+
{ "const": "string" }
19+
]
20+
}
