Codeguruu03/Web_scrapper_policy_document

HDFC Policy Scraper

This project is a Python script that:

  1. Starts from the HDFC Life homepage: https://www.hdfclife.com
  2. Finds links to individual policy pages (first 5 only).
  3. For each policy page:
    • Extracts:
      • Policy Name
      • Unique Identification Number (UIN)
      • Policy Type (Protection / Savings / Retirement / Unknown)
    • Detects and downloads one associated PDF (e.g., brochure or policy document).
  4. Saves:
    • All PDFs into a folder: HDFC_Policy_Documents/
    • Extracted details into: HDFC_Policy_Documents/policy_data.csv
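The extraction and PDF-detection steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the repository's actual code: the function names, the keyword lists, and the UIN pattern (three digits, a letter, three digits, `V`, two digits) are all assumptions and may need adjusting against the real pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed UIN shape, e.g. "101N000V01"; adjust if the real pages differ.
UIN_PATTERN = re.compile(r"\b\d{3}[A-Z]\d{3}V\d{2}\b")

# Hypothetical keyword lists for the three policy types named in the README.
TYPE_KEYWORDS = {
    "Protection": ("term", "protect"),
    "Savings": ("savings", "endowment", "money back"),
    "Retirement": ("retirement", "pension", "annuity"),
}


def extract_uin(text):
    """Return the first UIN-like token in the page text, or None."""
    match = UIN_PATTERN.search(text)
    return match.group(0) if match else None


def classify_policy_type(text):
    """Return the first type whose keyword appears in the text, else 'Unknown'."""
    lowered = text.lower()
    for policy_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return policy_type
    return "Unknown"


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_first_pdf_link(html, base_url):
    """Return the absolute URL of the first link ending in .pdf, or None."""
    collector = _LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        # Ignore any query string when checking the file extension.
        if href.lower().split("?")[0].endswith(".pdf"):
            return urljoin(base_url, href)
    return None
```

Keyword matching on substrings is deliberately crude (for example, "term" also matches "determine"), so a real classifier would likely need tighter rules or page-specific selectors.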

This project was written to satisfy the Assignment 1 requirements: navigation, PDF download, data extraction, organization, error handling, and code quality.
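As an illustration of the organization and error-handling requirements, the CSV output might be written along these lines. This is only a sketch: the function name `save_policy_rows`, the column headers, and the choice to fill missing fields with "Unknown" are assumptions, not details confirmed by the repository.

```python
import csv
from pathlib import Path

# Assumed column headers; the actual script may name them differently.
FIELDNAMES = ["Policy Name", "UIN", "Policy Type"]


def save_policy_rows(rows, out_dir="HDFC_Policy_Documents"):
    """Write one CSV row per policy into <out_dir>/policy_data.csv."""
    out_path = Path(out_dir)
    # Create the output folder if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)
    csv_path = out_path / "policy_data.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            # Missing or empty fields fall back to "Unknown" (an assumption)
            # so one incomplete page does not abort the whole run.
            writer.writerow({name: row.get(name) or "Unknown" for name in FIELDNAMES})
    return csv_path
```

Calling `save_policy_rows([{"Policy Name": "Sample Plan", "UIN": "101N000V01", "Policy Type": "Savings"}])` would create the folder if needed and overwrite any existing `policy_data.csv` inside it.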


Project Structure

hdfc_policy_scraper/
├── hdfc_policy_scraper.py      # Main script
├── requirements.txt            # Python dependencies
└── README.md                   # Documentation