How do I scrape a local PDF?
I'm running:
- norma/releases/download/v0.2.26/norma_0.1.SNAPSHOT_all.deb
- ami/releases/download/v0.2.24/ami2_0.1.SNAPSHOT_all.deb
and, using one of your test files, I tried:
norma -i /contentmineself/trialsjournal_15_1_511.pdf -o /contentmineself/test_ct/
but all it seems to do is copy the PDF and rename it to fulltext.pdf.
If I add the switch --transform pdf2html, as per #38, I get:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.xmlcml.cmine.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1049)
at org.xmlcml.cmine.args.DefaultArgProcessor.runMethodsOfType(DefaultArgProcessor.java:946)
at org.xmlcml.cmine.args.DefaultArgProcessor.runRunMethodsOnChosenArgOptions(DefaultArgProcessor.java:927)
at org.xmlcml.cmine.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1111)
at org.xmlcml.norma.Norma.run(Norma.java:23)
at org.xmlcml.norma.Norma.main(Norma.java:18)
Caused by: java.lang.RuntimeException: Input must be reserved file; found: /contentmineself/trialsjournal_15_1_511.pdf
at org.xmlcml.norma.NormaArgProcessor.checkAndGetInputFile(NormaArgProcessor.java:282)
at org.xmlcml.norma.NormaTransformer.transform(NormaTransformer.java:114)
at org.xmlcml.norma.NormaArgProcessor.runTransform(NormaArgProcessor.java:202)
... 10 more
0 [main] DEBUG org.xmlcml.cmine.args.DefaultArgProcessor - option in exception or --transform; (1,2147483647); parseTransform; STRING: null / []; pdf2html; [pdf2html]
java.lang.RuntimeException: invoke runTransform fails
at org.xmlcml.cmine.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1052)
at org.xmlcml.cmine.args.DefaultArgProcessor.runMethodsOfType(DefaultArgProcessor.java:946)
at org.xmlcml.cmine.args.DefaultArgProcessor.runRunMethodsOnChosenArgOptions(DefaultArgProcessor.java:927)
at org.xmlcml.cmine.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1111)
at org.xmlcml.norma.Norma.run(Norma.java:23)
at org.xmlcml.norma.Norma.main(Norma.java:18)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.xmlcml.cmine.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1049)
... 5 more
Caused by: java.lang.RuntimeException: Input must be reserved file; found: /contentmineself/trialsjournal_15_1_511.pdf
at org.xmlcml.norma.NormaArgProcessor.checkAndGetInputFile(NormaArgProcessor.java:282)
at org.xmlcml.norma.NormaTransformer.transform(NormaTransformer.java:114)
at org.xmlcml.norma.NormaArgProcessor.runTransform(NormaArgProcessor.java:202)
... 10 more
My complete install is:
RUN apt-get clean -y && apt-get -y update && apt-get -y upgrade && \
    apt-get -y update && apt-get install -y wget ant unzip openjdk-7-jdk && \
    apt-get clean -y
RUN wget --no-check-certificate https://github.com/ContentMine/norma/releases/download/v0.2.26/norma_0.1.SNAPSHOT_all.deb
RUN wget --no-check-certificate https://github.com/ContentMine/ami/releases/download/v0.2.24/ami2_0.1.SNAPSHOT_all.deb
RUN dpkg -i norma_0.1.SNAPSHOT_all.deb
RUN dpkg -i ami2_0.1.SNAPSHOT_all.deb
RUN npm install --global getpapers
in a basic Linux environment with Node installed (Docker Hub image node:4.3.2).
Hmm, could this be the issue? #21 (comment)
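To make my guess about the exception concrete: "Input must be reserved file" reads to me as though the transform step only accepts input whose file name is one of a fixed set, and my original PDF name is not in that set, whereas the fulltext.pdf copy from the first command might be. The sketch below is only my reading of the message, not norma's actual check. The helper is_reserved_name is hypothetical, and apart from fulltext.pdf (which is what norma renamed my file to) the names in the list are assumptions on my part.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of norma: succeeds when the file name of the
# given path is in the set of names I am ASSUMING norma treats as "reserved".
# Only fulltext.pdf is taken from what I observed; the rest are guesses.
is_reserved_name() {
    case "$(basename "$1")" in
        fulltext.pdf|fulltext.xml|fulltext.html|scholarly.html|results.xml)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Compare the path I passed with -i against the copy norma created.
for candidate in \
    /contentmineself/trialsjournal_15_1_511.pdf \
    /contentmineself/test_ct/fulltext.pdf
do
    if is_reserved_name "$candidate"; then
        echo "reserved: $candidate"
    else
        echo "not reserved: $candidate"
    fi
done
```

If that reading is right, the transform would need to be pointed at the fulltext.pdf inside the output directory rather than at the original file, but I have not worked out how to tell norma that.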