-
Notifications
You must be signed in to change notification settings - Fork 674
GH-2740: Faster parsing of RDF/XML #2744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
de49918 to
0feaf9e
Compare
0feaf9e to
3b9a9cf
Compare
|
3b9a9cf to
a8b14d9
Compare
|
@afs |
rvesse
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it would be worth splitting this PR
The HttpClient delayed init seems like a quick and independent win whereas the RDF/XML parsing and caching changes are more involved and warrants separate review IMO
jena-arq/src/main/java/org/apache/jena/riot/RDFParserBuilder.java
Outdated
Show resolved
Hide resolved
jena-arq/src/main/java/org/apache/jena/riot/lang/rdfxml/SysRRX.java
Outdated
Show resolved
Hide resolved
Yes please. |
|
Small request - could you put a blank line after the first line of the commit message? It separates the "subject" (the |
|
The http client delayed setup looks OK but, as Rob says, it is orthogonal. I thought it might require an |
56dc224 to
f12ffcc
Compare
afs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and it is faster!
Just a couple of incidental javadoc suggestions.
| * Configures the parser to be safe and sets necessary properties. | ||
| * This method should be called when another factory than javax.xml.stream.XMLInputFactory#newDefaultFactory() | ||
| * shall be used. | ||
| * @param xmlInputFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Configures the parser to be safe and sets necessary properties. | |
| * This method should be called when another factory than javax.xml.stream.XMLInputFactory#newDefaultFactory() | |
| * shall be used. | |
| * @param xmlInputFactory | |
| * Configures the parser to be safe and sets necessary properties. | |
| * This method should be called when a factory other than | |
| * javax.xml.stream.XMLInputFactory#newDefaultFactory() is used. | |
| * @param xmlInputFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recognized, that I kept using javax.xml.stream.XMLInputFactory#newInstance() instead of javax.xml.stream.XMLInputFactory#newDefaultFactory(), so I fixed the javadoc accordingly.
jena-arq/src/main/java/org/apache/jena/riot/system/ParserProfile.java
Outdated
Show resolved
Hide resolved
jena-arq/src/main/java/org/apache/jena/riot/lang/rdfxml/SysRRX.java
Outdated
Show resolved
Hide resolved
Parsers: RRX.RDFXML_SAX, RRX.RDFXML_StAX_ev, RRX.RDFXML_StAX_sr - added "public Node createURI(IRIx iriX, ...);" to the ParserProfile, which simply uses the given IRI instead of resolving it again. - adding general IRIx caching (org.apache.jena.atlas.lib.cache.CacheSimple) in the parsers where the already cached org.apache.jena.riot.system.ParserProfileStd#resolver is not applicable - removed unused code and variables from ParserRRX_StAX_SR and ParserRRX_StAX_EV - added `org.apache.jena.riot.lang.rdfxml.TestXMLParser` in jena-benchmarks-jmh
f12ffcc to
7c43951
Compare
Faster parsing of RDF/XML by avoiding duplicated resolving of IRIs and adding cache for IRIx in parsers
(Parsers: RRX.RDFXML_SAX, RRX.RDFXML_StAX_ev, RRX.RDFXML_StAX_sr )
GitHub issue resolved #2740
Pull request Description:
org.apache.jena.riot.lang.rdfxml.TestXMLParserin jena-benchmarks-jmhBenchmark results:
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.