While working with PDF files in Python, I realized how useful the PyPDF2 library is, especially its PdfFileReader class. Whether you want to extract text, read metadata, or work with pages, PdfFileReader makes it simple. Over my 10+ years as a Python developer, this tool has been invaluable whenever I needed to automate PDF processing.
In this tutorial, I’ll walk you through how to use PdfFileReader with practical examples. If you’re dealing with PDFs in your projects, such as reading reports, invoices, or any document, this guide will help you get started quickly and efficiently.
Let’s get started!
What is PdfFileReader?
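PdfFileReader is the class in the PyPDF2 library that lets you read and extract information from PDF files. Starting with PyPDF2 3.0, it has been replaced by PdfReader (calling PdfFileReader now raises an error), so the code in this tutorial uses PdfReader and its snake_case methods. The class supports operations like: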
- Accessing the number of pages
- Extracting text from pages
- Reading document metadata
- Accessing page dimensions and more
It’s a pure Python library, so it doesn’t need compiled extensions or external tools for basic reading, and it runs on any platform that supports Python.
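Page dimensions are reported in PDF units, which by default are points of 1/72 inch. As a small illustration (a sketch, not part of PyPDF2), the hypothetical helpers below convert points to inches and millimeters. With a real file, you would pass values such as reader.pages[0].mediabox.width and reader.pages[0].mediabox.height from PyPDF2 3.x.

```python
# PDF page sizes (for example, PyPDF2's page.mediabox.width / .height) are
# measured in user-space units, which default to points: 1 point = 1/72 inch.
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def points_to_inches(points: float) -> float:
    """Convert a length in PDF points to inches."""
    return float(points) / POINTS_PER_INCH


def points_to_mm(points: float) -> float:
    """Convert a length in PDF points to millimeters."""
    return float(points) / POINTS_PER_INCH * MM_PER_INCH


# A US Letter page is 612 x 792 points
print(points_to_inches(612), points_to_inches(792))  # 8.5 11.0
```

float() is applied so the helpers also accept PyPDF2's number objects, not just plain ints.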
How to Install PyPDF2
Before we dive into the code, you need to install the PyPDF2 package if you haven’t already.
```shell
pip install PyPDF2
```

This command installs the latest version from PyPI.
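Because the PdfFileReader names stopped working in PyPDF2 3.0, it can help to check which version you actually have installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+). The helper names are hypothetical, and the "major version 3" cut-off simply reflects the removal of the legacy API in PyPDF2 3.0.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_pypdf2_version() -> Optional[str]:
    """Return the installed PyPDF2 version string, or None if it isn't installed."""
    try:
        return version("PyPDF2")
    except PackageNotFoundError:
        return None


def legacy_names_removed(version_string: str) -> bool:
    """True for PyPDF2 3.x and later, where PdfFileReader, getPage(), numPages, etc. raise errors."""
    major = int(version_string.split(".")[0])
    return major >= 3


current = installed_pypdf2_version()
if current is None:
    print("PyPDF2 is not installed")
elif legacy_names_removed(current):
    print(f"PyPDF2 {current}: use PdfReader, reader.pages, and reader.metadata")
else:
    print(f"PyPDF2 {current}: the legacy PdfFileReader names are still available")
```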
Basic PdfFileReader Example: Reading Text from a PDF
Let me show you a simple example where we open a PDF file and extract text from its first page. For this example, imagine you have a PDF named USA_Economic_Report.pdf containing economic data.
```python
from PyPDF2 import PdfReader

# Path to your PDF file
file_path = "USA_Economic_Report.pdf"

# Open and read the PDF
with open(file_path, 'rb') as file:
    reader = PdfReader(file)
    num_pages = len(reader.pages)
    print(f"Number of pages: {num_pages}")

    # Print text of the first page
    first_page = reader.pages[0]
    text = first_page.extract_text()
    print("Text from first page:\n", text)
```

What’s happening here?
- We open the PDF file in binary read mode ('rb').
- We create a PdfReader object to interact with the file.
- We check how many pages the PDF contains with len(reader.pages).
- We extract text from the first page using extract_text().
This method works well for most text-based PDFs.
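Binary mode matters because a PDF is a byte stream that normally begins with a header such as %PDF-1.7. The hypothetical helper below peeks at that header to confirm that a stream looks like a PDF before you hand it to PdfReader. It is a strict sketch that assumes the header sits at the very start of the file, which is the normal case but not guaranteed for every damaged file.

```python
import io
import re
from typing import BinaryIO, Optional

# A PDF normally starts with a header such as b"%PDF-1.7"
_PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def read_pdf_version(stream: BinaryIO) -> Optional[str]:
    """Return the version from a PDF header, or None if the bytes don't look like a PDF."""
    start = stream.tell()
    header = stream.read(16)
    stream.seek(start)  # rewind so the same stream can still be passed to PdfReader
    match = _PDF_HEADER.match(header)
    return match.group(1).decode("ascii") if match else None


# Quick check with in-memory bytes (no file needed)
sample = io.BytesIO(b"%PDF-1.7\n...")
print(read_pdf_version(sample))  # 1.7
```

With a real file, you would call read_pdf_version(file) inside the same with open(..., 'rb') block, before creating the PdfReader. Opening the file in text mode would give you str instead of bytes, and the pattern would not match.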
Extract Text from All Pages
Often, you want to process the entire document. Here’s how I loop through all pages to extract text:
```python
from PyPDF2 import PdfReader

with open('USA_Economic_Report.pdf', 'rb') as file:
    reader = PdfReader(file)
    total_pages = len(reader.pages)  # replaces the removed reader.numPages
    for page_num in range(total_pages):
        page = reader.pages[page_num]  # replaces the removed reader.getPage()
        text = page.extract_text()
        print(f'--- Page {page_num + 1} ---')
        print(text)
```

This loop goes through each page and prints its text content. It’s handy when you want to analyze or store the entire PDF content.
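If you’d rather store the whole document than print it, the same per-page loop can be folded into one string. This is a minimal sketch with a hypothetical helper name, combine_page_texts. It accepts any iterable of page strings, so with a real file you would call it inside the with block as combine_page_texts(page.extract_text() for page in reader.pages) and then write the result to a text file opened with encoding='utf-8'.

```python
from typing import Iterable, Optional


def combine_page_texts(texts: Iterable[Optional[str]]) -> str:
    """Join per-page text into one string, using '--- Page N ---' headers like the loop above."""
    parts = []
    for page_num, text in enumerate(texts, start=1):
        parts.append(f"--- Page {page_num} ---")
        parts.append(text or "")  # treat missing text as an empty page
    return "\n".join(parts)


print(combine_page_texts(["First page text", "Second page text"]))
```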
Read PDF Metadata
PDF files often contain metadata like author, creation date, and title. You can access this information easily:
```python
from PyPDF2 import PdfReader

with open('USA_Economic_Report.pdf', 'rb') as file:
    reader = PdfReader(file)
    info = reader.metadata  # replaces getDocumentInfo(); None if the PDF has no info dictionary
    print('PDF Metadata:')
    if info is None:
        print('No metadata found.')
    else:
        for key, value in info.items():
            print(f'{key}: {value}')
```

Metadata can be useful for cataloging documents or verifying their origin.
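Date entries such as /CreationDate and /ModDate are usually stored as strings in the PDF date format D:YYYYMMDDHHmmSSOHH'mm', where O is +, - or Z. The stdlib-only parser below turns such strings into datetime objects. It is a sketch that covers the common forms, not a complete implementation of the PDF specification, and the sample value is made up.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDHHmmSS (later parts optional) plus an optional offset like +05'30', -08'00' or Z
_PDF_DATE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?$"
)


def parse_pdf_date(raw: str) -> Optional[datetime]:
    """Convert a PDF date string into a datetime, or return None if it can't be parsed."""
    match = _PDF_DATE.match(str(raw).strip())
    if not match:
        return None
    year, month, day, hour, minute, second, sign, tz_hours, tz_minutes = match.groups()
    try:
        tzinfo = None
        if sign == "Z":
            tzinfo = timezone.utc
        elif sign in ("+", "-"):
            offset = timedelta(hours=int(tz_hours or 0), minutes=int(tz_minutes or 0))
            tzinfo = timezone(-offset if sign == "-" else offset)
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            tzinfo=tzinfo,
        )
    except ValueError:  # e.g. month 13 or an out-of-range offset
        return None


print(parse_pdf_date("D:20230615093000+05'30'"))  # 2023-06-15 09:30:00+05:30
```

Missing month, day, or time fields default to the start of the period, and a string without an offset yields a naive datetime.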
Check if a PDF is Encrypted
Sometimes PDFs are encrypted and require a password to open. Here’s how you can check and handle that:
```python
from PyPDF2 import PdfReader

with open('Confidential_USA_Report.pdf', 'rb') as file:
    reader = PdfReader(file)
    if reader.is_encrypted:  # replaces the removed isEncrypted
        print('PDF is encrypted. Trying to decrypt...')
        # If you know the password, provide it here
        if reader.decrypt('your_password_here'):
            print('Decryption successful!')
            # Now you can read pages as usual
            page = reader.pages[0]
            print(page.extract_text())
        else:
            print('Failed to decrypt PDF.')
    else:
        print('PDF is not encrypted.')
```

This is essential when working with protected documents.
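Hard-coding 'your_password_here' in a script is risky. One option, sketched below under the assumption that an environment variable (PDF_PASSWORD, a hypothetical name) holds the password, is to read it from the environment and fall back to a hidden getpass() prompt.

```python
import os
from getpass import getpass
from typing import Optional


def get_pdf_password(env_var: str = "PDF_PASSWORD", prompt: bool = True) -> Optional[str]:
    """Read a PDF password from an environment variable, optionally falling back to a hidden prompt."""
    password = os.environ.get(env_var)
    if password:
        return password
    if prompt:
        # getpass() does not echo the typed characters to the terminal
        return getpass(f"Password ({env_var} not set): ")
    return None
```

Inside the encrypted branch above, you could then write password = get_pdf_password() and call reader.decrypt(password) only when password is not empty.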
Combine PdfFileReader with PdfFileWriter
In many cases, you might want to read a PDF, modify it, or extract certain pages. PdfFileReader works together with PdfFileWriter (named PdfReader and PdfWriter in PyPDF2 3.x) for such tasks. Here’s a quick example that extracts pages 2 to 4 into a new PDF:
```python
from PyPDF2 import PdfReader, PdfWriter

with open('USA_Economic_Report.pdf', 'rb') as infile:
    reader = PdfReader(infile)
    writer = PdfWriter()
    # Extract pages 2 to 4 (page indices 1 to 3); assumes the PDF has at least 4 pages
    for page_num in range(1, 4):
        page = reader.pages[page_num]
        writer.add_page(page)
    # Write the extracted pages to a new PDF while the source file is still open
    with open('Extracted_Pages.pdf', 'wb') as outfile:
        writer.write(outfile)

print('Pages 2 to 4 extracted successfully.')
```

This method is useful for creating summaries or splitting large reports.
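The example hard-codes range(1, 4) because PyPDF2 indexes pages from 0, while people count from 1. A small helper can do that translation for user-supplied ranges such as "2-4,7". This is a hypothetical sketch, not a PyPDF2 function; it returns sorted, de-duplicated indices and rejects pages outside the document.

```python
from typing import List


def parse_page_range(spec: str, total_pages: int) -> List[int]:
    """Convert a 1-based spec like "2-4,7" into sorted, de-duplicated 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"Invalid page range {part!r} for a {total_pages}-page PDF")
        # Shift from human 1-based numbering to 0-based indexing
        indices.update(range(start - 1, end))
    return sorted(indices)


print(parse_page_range("2-4", 10))  # [1, 2, 3]
```

In the extraction example, the loop would then become for page_num in parse_page_range('2-4', len(reader.pages)): writer.add_page(reader.pages[page_num]).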
Tips from My Experience
- Text extraction quality depends on the PDF: Some PDFs are scanned images and require OCR tools instead.
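- Always open files in binary mode ('rb'): PDFs are binary files, and PyPDF2 expects a byte stream.
- Use context managers (with statements): They ensure files close properly.
- Check for encryption: Many official documents are password-protected.
- Watch your PyPDF2 version: PdfFileReader, PdfFileWriter, and camelCase methods such as getPage() and numPages were removed in PyPDF2 3.0 in favor of PdfReader, PdfWriter, and snake_case names. PyPDF2 3.0.x is also the final PyPDF2 release; development continues in the pypdf package.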
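For the first tip, a rough way to spot scanned pages is to flag pages whose extract_text() result is empty or almost empty. The sketch below is a heuristic, not a PyPDF2 feature, and its min_chars=20 threshold is an arbitrary assumption that you should tune for your documents.

```python
from typing import Iterable, List, Optional


def pages_needing_ocr(texts: Iterable[Optional[str]], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose text has fewer than min_chars non-whitespace characters."""
    flagged = []
    for page_num, text in enumerate(texts, start=1):
        visible = "".join((text or "").split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(page_num)
    return flagged


print(pages_needing_ocr(["A normal text page with plenty of words.", "", "  \n  "]))  # [2, 3]
```

With a real file, you would pass pages_needing_ocr(page.extract_text() for page in reader.pages) and send the flagged pages to an OCR tool.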
By using PdfFileReader, you can automate many PDF-related tasks, saving time and effort. Whether you’re processing financial reports, government documents, or any PDF files, these techniques will get you started.
If you want to dive deeper, explore the official PyPDF2 documentation for advanced features like merging PDFs, rotating pages, and adding annotations.
I hope you found this tutorial helpful. Feel free to try out the examples with your own PDFs and see how easy it is to work with PDFs in Python!

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.