This project demonstrates how to detect fraudulent transactions using a dataset of credit card transactions. It involves using MongoDB for CRUD operations, PyMongo to connect and interact with MongoDB, and Apache Spark for advanced data processing and analysis.
The main purpose of this project is to upload and manage a credit card transaction dataset in MongoDB, perform CRUD operations using PyMongo, and leverage Apache Spark for data processing. The dataset contains anonymized transaction features (V1 to V28) along with a class label that marks each transaction as fraudulent or legitimate.
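For orientation, each CSV row becomes one record in MongoDB. Assuming the commonly used anonymized credit card fraud layout (the exact columns in this repository's `creditcard.csv` may differ), a single transaction record might look roughly like this:

```python
# Illustrative only: hypothetical shape of one transaction record, assuming the
# standard anonymized credit card fraud columns (Time, V1..V28, Amount, Class).
sample_transaction = {
    "Time": 406.0,     # seconds elapsed since the first transaction in the file
    "V1": -2.31,       # anonymized feature V1 ...
    # ... V2 through V27 omitted for brevity ...
    "V28": -0.14,      # ... anonymized feature V28
    "Amount": 239.93,  # transaction amount
    "Class": 1,        # 1 = fraud, 0 = legitimate
}
```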
- MongoDB Integration: The `pymongo_connect.py` script enables CRUD operations on MongoDB.
- Data Processing with Spark: The `sparkconnect.py` script connects to Apache Spark for scalable data analysis.
- Credit Card Fraud Detection Dataset: Sample data to experiment with fraud detection methods.
- Python 3.8+: Make sure Python is installed.
- MongoDB: Install and run MongoDB locally.
- Apache Spark: Install Apache Spark on your system.
- PyMongo & PySpark libraries: Install these libraries using pip.
- Clone the Repository:
  `git clone https://github.com/CoderSoham/BDTProject.git`
  `cd BDTProject`
- Install Dependencies:
  `pip install pymongo pyspark pandas`
- Set Up MongoDB Database:
  Ensure MongoDB is running locally on the default port 27017, then import the CSV dataset into MongoDB by running the `pymongo_connect.py` script.
- Data Import with PyMongo:
  Run the `pymongo_connect.py` script to insert the data into MongoDB:
  `python pymongo_connect.py`
  The script will insert the data into a collection named `transactions` in a database named `fraud_detection`.
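The repository's `pymongo_connect.py` is the reference implementation; as a rough guide to what this step does, a minimal sketch might look like the following. The database name `fraud_detection`, collection name `transactions`, and file name `creditcard.csv` come from this README; the connection timeout and exact insert logic are assumptions.

```python
# Minimal sketch of the PyMongo import step (assumptions noted in comments);
# see pymongo_connect.py in the repository for the actual implementation.
import pandas as pd
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

try:
    # Connect to the local MongoDB instance on the default port.
    client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # fail fast if MongoDB is not running
except ServerSelectionTimeoutError:
    raise SystemExit("MongoDB is not reachable on localhost:27017 -- start mongod first.")

db = client["fraud_detection"]      # database name from this README
collection = db["transactions"]     # collection name from this README

# Load the CSV (path is an assumption) and insert one document per row.
df = pd.read_csv("creditcard.csv")
collection.insert_many(df.to_dict("records"))

# Print a sample document to confirm the insertion worked.
print(collection.find_one())
```

Inserting with `insert_many` keeps one document per CSV row, so later CRUD operations and Spark reads can treat each transaction independently.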
- Spark Setup:
  Run the `sparkconnect.py` script to connect to Spark and perform the data processing:
  `python sparkconnect.py`
  This script connects to Spark, processes the transactions, and outputs key insights.
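As a rough illustration of the kind of processing `sparkconnect.py` performs, here is a minimal sketch. Reading the CSV directly, the label column name `Class`, and the specific aggregations are assumptions; refer to the repository script for the actual logic.

```python
# Minimal PySpark sketch of the processing step (column names are assumptions);
# see sparkconnect.py in the repository for the actual implementation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("FraudDetection")
    .getOrCreate()
)

# Read the dataset directly from CSV (path is an assumption).
df = spark.read.csv("creditcard.csv", header=True, inferSchema=True)

# Fraud vs. non-fraud counts, assuming the label column is named "Class"
# (1 = fraud, 0 = legitimate in the standard credit card dataset layout).
df.groupBy("Class").count().show()

# Basic analytics on a transaction feature (assuming an "Amount" column).
df.select(F.mean("Amount").alias("avg_amount"),
          F.max("Amount").alias("max_amount")).show()

spark.stop()
```

If the data should instead be read from MongoDB rather than the CSV, the MongoDB Spark connector can be configured on the `SparkSession`; that setup is not shown here.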
- MongoDB CRUD Operations:
  After running `pymongo_connect.py`, MongoDB should have a `transactions` collection populated with credit card transaction data. A sample document will be printed to the console after successful insertion.
- Spark Data Processing:
  Running `sparkconnect.py` will connect to Spark, read the dataset, and output the following:
  - Data summaries (e.g., fraud vs. non-fraud transactions)
  - Basic analytics on transaction features

Common Debugging Tips

- MongoDB Connection Errors
  - Error: `pymongo.errors.ServerSelectionTimeoutError`
  - Solution: Ensure MongoDB is running on your machine. Start it with `mongod` if needed, and verify that it is listening on `localhost:27017`.
- Large File Not Uploading to GitHub
  - Error: `File creditcard.csv is 143.84 MB; this exceeds GitHub's file size limit of 100.00 MB`
  - Solution: This project uses Git LFS for managing large files. Make sure Git LFS is installed (`git lfs install`) and configured to track `creditcard.csv`.
- Spark Connection Issues
  - Error: `pyspark.sql.utils.AnalysisException`
  - Solution: Make sure Spark is properly installed and configured. Check your environment variables (`SPARK_HOME` and `PATH`) to ensure Spark is accessible.

License

This project is open-source and available for modification. Feel free to contribute or adjust the code to suit your specific use case.