Open
Labels: bug
Description
Overview
I've run into an issue trying to normalize N-Quads with URDNA2015 using the `jsonld` module from the PyLD library:
```
pyld.jsonld.JsonLdError: ('Could not convert input to RDF dataset before normalization.',)
Type: jsonld.NormalizeError
Cause: ('Error while parsing N-Quads invalid quad.',)
Type: jsonld.ParseError
Details:
```
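Because the parse error only says "invalid quad", a first step could be to find which input line is rejected. The sketch below checks each line against a deliberately conservative N-Quads pattern. The pattern (`QUAD`) and the `find_suspect_lines` name are assumptions of this sketch, not PyLD's actual grammar or API, so a flagged line is only a candidate: the pattern may also reject some valid N-Quads (for example blank node labels containing `-` or `_`).

```python
import re

# Deliberately conservative N-Quads line pattern. This is an assumption for
# debugging, not PyLD's grammar: absolute IRIs, simple alphanumeric blank node
# labels, and plain, typed, or language-tagged literals.
IRI = r"<[^<>\s:]+:[^<>\s]*>"
BNODE = r"_:[A-Za-z][A-Za-z0-9]*"
LITERAL = (
    r'"(?:[^"\\]|\\.)*"'  # quoted lexical form with backslash escapes
    r"(?:\^\^" + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"  # datatype or language tag
)
TERM = rf"(?:{IRI}|{BNODE})"
QUAD = re.compile(
    rf"^[ \t]*{TERM}[ \t]+{IRI}[ \t]+(?:{TERM}|{LITERAL})[ \t]*"  # subject, predicate, object
    rf"(?:{TERM}[ \t]*)?\.[ \t]*$"  # optional graph name, then the terminating dot
)


def find_suspect_lines(nquads: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for each non-blank line that fails QUAD."""
    suspects = []
    # Split on the line terminators N-Quads allows: \r\n, \n, \r.
    for number, line in enumerate(re.split(r"\r\n|\n|\r", nquads), start=1):
        if line.strip() and QUAD.match(line) is None:
            suspects.append((number, line))
    return suspects


sample = (
    '<http://example.org/s> <http://schema.org/name> "Alice" .\n'
    '<http://example.org/s> <http://schema.org/name> "Bob"\n'  # missing final dot
)
print(find_suspect_lines(sample))
```

Only the second line of `sample` is reported here, since it lacks the terminating `.`; running the real dataset through the same check can narrow down which quad to inspect.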
In dkg.js, we normalize N-Quads with the following function:

```js
async toNQuads(content, inputFormat) {
    const options = {
        algorithm: 'URDNA2015',
        format: 'application/n-quads',
    };

    if (inputFormat) {
        options.inputFormat = inputFormat;
    }

    const canonized = await jsonld.canonize(content, options);

    return canonized.split('\n').filter((x) => x !== '');
}
```

I've tried to reproduce the same logic in dkg.py, but normalizing N-Quads fails (JSON-LD input works fine). This may be either incorrect usage of the library on my side or a bug in PyLD, which no longer seems to be maintained.
Python normalization function:

```python
from typing import Literal

from pyld import jsonld

# JSONLD, NQuads, DatasetInputFormatNotSupported and InvalidDataset
# are defined elsewhere in dkg.py.


def normalize_dataset(
    dataset: JSONLD | NQuads,
    input_format: Literal["JSON-LD", "N-Quads"] = "JSON-LD",
) -> NQuads:
    normalization_options = {
        "algorithm": "URDNA2015",
        "format": "application/n-quads",
    }

    # JSON-LD is the default input; N-Quads input has to be declared explicitly.
    match input_format.lower():
        case "json-ld" | "jsonld":
            pass
        case "n-quads" | "nquads":
            normalization_options["inputFormat"] = "application/n-quads"
        case _:
            raise DatasetInputFormatNotSupported(
                f"Dataset input format isn't supported: {input_format}. "
                "Supported formats: JSON-LD / N-Quads."
            )

    # Canonicalize, then split into individual quads, dropping empty lines.
    n_quads = jsonld.normalize(dataset, normalization_options)
    assertion = [quad for quad in n_quads.split("\n") if quad]

    if not assertion:
        raise InvalidDataset("Invalid dataset, no quads were extracted.")

    return assertion
```