Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,9 @@ lib/PyLD.egg-info
profiler
tests/test_caching.py
tests/data/test_caching.json

# JetBrains IDEs
.idea

# pyenv
.python-version
40 changes: 34 additions & 6 deletions lib/pyld/jsonld.py
Original file line number Diff line number Diff line change
Expand Up @@ -3661,12 +3661,33 @@ def _object_to_rdf(self, item, issuer, triples, rdfDirection):
elif _is_bool(value):
object['value'] = 'true' if value else 'false'
object['datatype'] = datatype or XSD_BOOLEAN
elif _is_double(value) or datatype == XSD_DOUBLE:
# canonical double representation
object['value'] = re.sub(
r'(\d)0*E\+?0*(\d)', r'\1E\2',
('%1.15E' % value))
object['datatype'] = datatype or XSD_DOUBLE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match style and remove appropriate newlines.

elif _is_double(value):
return {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use this construct? Would matching the other code and setting value and datatype work? Or if it is to maybe avoid the IRI conditional below, could style use that style and a return, rather than create a new object. But in that case, everything else here should match. Maybe better to use existing style and refactor it all later if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of taste. In the matters of taste, there is no right and wrong.

To my taste, the creation of this object makes it very easy for a person reading the code to understand what will be returned, making it very visual, if you will. Very declarative.

If the community agrees on a formatter rule which bans this kind of expression, I will be fine with removing it.

Match the other code

Looking at the other code, it is not clear which of its mannerisms are

  • a conscious engineering choice,
  • or, part of style dictated for the project,
  • or, an expression of someone's subjective taste.

My opnion is as follows.

  • Style choices which can be deterministically maintained by linters & formatters can be implemented as project policies.
  • Style choices which cannot be maintained in such a way should not be enforced at all.

**object,
'value': _canonicalize_double(value),
'datatype': datatype or XSD_DOUBLE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match style and remove trailing commas.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to add trailing commas is to reduce the size of a diff on a subsequent change.

If, below

'datatype': datatype or XSD_DOUBLE

someone would decide another item in the dictionary, — the diff will cover two lines:

  • the existing line, with an extra comma being added,
  • and the next line, which has been inserted.

Is there a style guide in this project which would forbid certain style elements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of pros/cons of this and other style choices. The code is the style guide. Please match it for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, I will refrain from making any additional changes to this PR until an automatic formatter, which would take care of the code style, is introduced. Otherwise, we are stuck in a discussion about "how I would have done it", which has no resolution. A formatter alleviates such questions altogether. I might look into its introduction later.

}

elif datatype == XSD_DOUBLE:
# Since the previous branch did not activate, we know that `value` is not a float number.
try:
float_value = float(value)
except (ValueError, TypeError):
# If `value` is not convertible to float, we will return it as-is.
return {
**object,
'value': value,
'datatype': XSD_DOUBLE,
}
else:
# We have a float, and canonicalization may proceed.
return {
**object,
'value': _canonicalize_double(float_value),
'datatype': XSD_DOUBLE,
}

elif _is_integer(value):
object['value'] = str(value)
object['datatype'] = datatype or XSD_INTEGER
Expand Down Expand Up @@ -6390,6 +6411,13 @@ def _is_double(v):
return not isinstance(v, Integral) and isinstance(v, Real)


def _canonicalize_double(value: float) -> str:
"""Convert a float value to canonical lexical form of `xsd:double`."""
return re.sub(
r'(\d)0*E\+?0*(\d)', r'\1E\2',
('%1.15E' % value))


def _is_numeric(v):
"""
Returns True if the given value is numeric.
Expand Down
69 changes: 69 additions & 0 deletions tests/test_double_to_rdf.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
"""
Tests for to_rdf functionality, specifically focusing on double/float handling bugs.
"""

import json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

import sys
import os
import unittest

# Add the lib directory to the path so we can import pyld
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'lib'))

import pyld.jsonld


class TestDoubleToRdf(unittest.TestCase):
"""Test cases for to_rdf functionality with double/float values."""

def test_offline_pyld_bug_reproduction(self):
"""Test reproducing the PyLD bug with captured Wikidata data structure."""
# This is the exact problematic data structure captured from Wikidata Q399
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here and below should be adjust to only say what is happening, if needed at all. Text about a contemporary bug will not make much sense after the issue is fixed. That seems like text for the PR comments, not something that needs to be in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was primarily written as an illustration to prove that the bug exists to those who read this PR. Thanks for pointing this out, I will clean it up if/when this PR is being prepared to merge after the more urgent changes. At that point, this test might also require refactoring because it will be running under an updated test system.

# The bug occurs when PyLD tries to convert this to RDF
data = {
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#",
"geoLongitude": "http://www.w3.org/2003/01/geo/wgs84_pos#longitude"
},
"@graph": [
{
"@id": "http://www.wikidata.org/entity/Q399",
"geoLongitude": {
"@type": "xsd:double",
"@value": "45" # This string number causes the PyLD bug
}
}
]
}
Comment on lines +23 to +37
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tests should be as minimal as possible. This could inline xsd prefix, but if this were to be slightly expanded to have various values, xsd: is easier to read. How about something like:

Suggested change
data = {
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#",
"geoLongitude": "http://www.w3.org/2003/01/geo/wgs84_pos#longitude"
},
"@graph": [
{
"@id": "http://www.wikidata.org/entity/Q399",
"geoLongitude": {
"@type": "xsd:double",
"@value": "45" # This string number causes the PyLD bug
}
}
]
}
data = {
"@context": {
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@graph": [
{
"@id": "ex:1",
"ex:p": {
"@type": "xsd:double",
"@value": "45"
}
}
]
}


# This should work now that the bug is fixed
# The bug was in PyLD's _object_to_rdf method where string values
# with @type: "xsd:double" were not being converted to float
result = pyld.jsonld.to_rdf(data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove trailing whitespace here and elsewhere.

# Expected result after bug fix
expected = {
"@default": [
{
"subject": {
"type": "IRI",
"value": "http://www.wikidata.org/entity/Q399"
},
"predicate": {
"type": "IRI",
"value": "http://www.w3.org/2003/01/geo/wgs84_pos#longitude"
},
"object": {
"type": "literal",
"value": "4.5E1",
"datatype": "http://www.w3.org/2001/XMLSchema#double"
}
}
]
}

self.assertEqual(result, expected)


if __name__ == '__main__':
unittest.main()