This project is a Python script that:
- Starts from the HDFC Life homepage: https://www.hdfclife.com
- Finds links to individual policy pages (first 5 only).
- For each policy page:
  - Extracts:
    - Policy Name
    - Unique Identification Number (UIN)
    - Policy Type (Protection / Savings / Retirement / Unknown)
  - Detects and downloads one associated PDF (e.g., brochure or policy document).
- Saves:
  - All PDFs into a folder: HDFC_Policy_Documents/
  - Extracted details into: HDFC_Policy_Documents/policy_data.csv
The script is written to satisfy the Assignment 1 requirements: navigation, PDF download, data extraction, organization, error handling, and code quality.
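The extraction and saving steps described above could look roughly like the sketch below. It is not taken from `hdfc_policy_scraper.py`. The UIN pattern (three digits, a letter, three digits, `V`, two digits), the keyword lists, the CSV column names, and every function name here are assumptions made for illustration, and the sample UIN in the usage note is made up.

```python
import csv
import io
import re

# Assumed UIN shape, e.g. "123N456V01"; adjust if real pages differ.
UIN_PATTERN = re.compile(r"\b\d{3}[A-Z]\d{3}V\d{2}\b")

# Hypothetical keyword lists; checked in this order, first match wins.
POLICY_TYPE_KEYWORDS = {
    "Protection": ("term", "protect"),
    "Savings": ("savings", "endowment"),
    "Retirement": ("pension", "retirement", "annuity"),
}

# Assumed column names for policy_data.csv.
CSV_FIELDS = ["Policy Name", "UIN", "Policy Type"]


def extract_uin(text):
    """Return the first UIN-like token in the text, or None if absent."""
    match = UIN_PATTERN.search(text)
    return match.group(0) if match else None


def classify_policy_type(text):
    """Naive substring match on keywords; falls back to 'Unknown'."""
    lowered = text.lower()
    for policy_type, keywords in POLICY_TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return policy_type
    return "Unknown"


def build_row(policy_name, page_text):
    """Assemble one CSV row from a policy name and its page text."""
    return {
        "Policy Name": policy_name,
        "UIN": extract_uin(page_text) or "",
        "Policy Type": classify_policy_type(f"{policy_name} {page_text}"),
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `build_row("Sample Term Plan", "UIN: 123N456V01")` produces a row whose policy type is `Protection`, and a page with no recognizable keyword is labeled `Unknown`. Substring matching is deliberately simple and can misfire (for instance, "term" also appears in "long-term"), so a real classifier may need stricter rules.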
hdfc_policy_scraper/
├── hdfc_policy_scraper.py # Main script
├── requirements.txt # Python dependencies
└── README.md # Documentation