poppdf

A Python (3.6+) module that wraps poppler's pdftoimage, pdftohtml and pdftotext to extract information from PDFs.

What information is extracted

- images
- text
- information about the position of text lines
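
How poppdf stores line positions internally is not described here. One plausible source is pdftohtml's -xml mode, which emits a <page number="..."> element per page and a <text top="..." left="..." width="..." height="..."> element per text line. The sketch below parses that format with the standard library. The helper name text_line_positions and the sample XML are hypothetical illustrations, not poppdf's API.

```python
import xml.etree.ElementTree as ET

def text_line_positions(xml_string):
    """Return (page, top, left, width, height, text) tuples from pdftohtml -xml output.

    Hypothetical helper for illustration; not part of poppdf.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for node in page.iter("text"):
            # itertext() flattens nested <b>/<i> markup inside one text line
            content = "".join(node.itertext())
            lines.append((
                page_number,
                float(node.get("top")),
                float(node.get("left")),
                float(node.get("width")),
                float(node.get("height")),
                content,
            ))
    return lines

# Hand-written sample in the shape pdftohtml -xml produces (not real output).
sample = """<pdf2xml producer="poppler">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="72" width="200" height="14" font="0">Hello <b>world</b></text>
</page>
</pdf2xml>"""

for line in text_line_positions(sample):
    print(line)
```

Coordinates are parsed as floats so the sketch keeps working if a poppler version emits fractional values.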

How to install

pip install poppdf

Windows

Windows users will have to build or download poppler for Windows. I recommend @oschwartz10612's version, which is the most up-to-date. You will then have to add the bin/ folder to PATH or pass poppler_path=r"C:\path\to\poppler-xx\bin" as an argument to convert_from_path.
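
The two options above (bin/ on PATH, or an explicit poppler_path) amount to choosing where to look for the poppler executables. A minimal sketch of that lookup, assuming a hypothetical helper resolve_poppler_tool that is not poppdf's code, using only shutil.which from the standard library:

```python
import shutil

def resolve_poppler_tool(tool, poppler_path=None):
    """Return the full path of a poppler executable, or None if it is not found.

    Hypothetical helper: when poppler_path is given (e.g. the bin/ folder of a
    Windows poppler build) that folder is searched instead of PATH.
    """
    if poppler_path is not None:
        # On Windows, shutil.which also tries the PATHEXT suffixes (.exe, ...)
        return shutil.which(tool, path=poppler_path)
    return shutil.which(tool)

# Look in an explicit bin/ folder first, then fall back to PATH.
exe = resolve_poppler_tool("pdftotext", r"C:\path\to\poppler-xx\bin") or resolve_poppler_tool("pdftotext")
print(exe)
```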

Mac

Mac users will have to install poppler for Mac.

Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, install poppler-utils with your package manager.
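
To find out whether anything is missing before installing poppler-utils, you can check which executables are on PATH. This is a small sketch, not part of poppdf; the tool list below is an assumption combining the executables named in this README (pdftohtml and pdftotext, plus pdftoppm and pdftocairo).

```python
import shutil

# Assumed list of poppler executables; adjust to the tools you actually need.
POPPLER_TOOLS = ("pdftoppm", "pdftocairo", "pdftohtml", "pdftotext")

def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_poppler_tools()
if missing:
    print("Missing poppler tools:", ", ".join(missing))
    print("Install poppler-utils with your package manager.")
else:
    print("All poppler tools found.")
```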

Platform-independent (using `conda`)

Install poppler: conda install -c conda-forge poppler
Install poppdf: pip install poppdf

How does it work?

from poppdf import image_from_path, xml_from_path, text_from_path

from poppdf.pdfDocument import PdfDocument

Then simply do:

pdf = PdfDocument('example.pdf')

To print the text of a page:

print(pdf.pages[1].text)
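
Per-page text such as pdf.pages[1].text can be produced from a single pdftotext run, because pdftotext ends each page with a form-feed character (\f) unless -nopgbrk is passed. The sketch below shows that idea; split_pages and pdf_page_texts are hypothetical helpers, not poppdf's implementation, and the returned list is 0-indexed (this README does not say whether poppdf's pages start at 0 or 1).

```python
import subprocess

def split_pages(pdftotext_output):
    """Split pdftotext output into one string per page.

    pdftotext ends every page with a form feed ("\\f"), so the chunk after
    the last separator is empty and is dropped.
    """
    pages = pdftotext_output.split("\f")
    if pages and pages[-1] == "":
        pages.pop()
    return pages

def pdf_page_texts(pdf_path):
    """Run pdftotext on pdf_path and return a 0-indexed list of page texts."""
    # An output file name of "-" makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6-compatible spelling of text=True
        check=True,
    )
    return split_pages(result.stdout)

# Usage with literal text, so no PDF is needed:
print(split_pages("first page\fsecond page\f"))  # → ['first page', 'second page']
```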

Repository: mbenhaddou/poppdf