News-Article-Classifier

The project has three Docker containers:

producer
consumer
processr

Data Ingestion:

The producer and consumer containers handle data ingestion. They rely on the following prebuilt images from Docker Hub:

zookeeper
kafka
spark-master
spark-worker

Data ingestion uses two data sources:

1. Freenews API
2. BBC RSS Feed
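As an illustration of the RSS side of ingestion, here is a minimal sketch that turns feed items into records using only the standard library. The sample XML is hypothetical and merely follows the usual RSS 2.0 shape; the `parse_rss` helper and the mapping of `pubDate` and `description` onto date and summary are assumptions, not the project's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual RSS 2.0 shape; not real BBC output.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>Markets rally</title>
      <description>Stocks rose on Monday.</description>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <link>https://example.com/markets</link>
    </item>
    <item>
      <title>Cup final preview</title>
      <description>Two sides meet on Saturday.</description>
      <pubDate>Tue, 02 Jan 2024 08:30:00 GMT</pubDate>
      <link>https://example.com/cup</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text, source="rss"):
    """Return one dict per <item> element found in the feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
            "source": source,  # caller-supplied label, assumed here
        })
    return records


if __name__ == "__main__":
    for record in parse_rss(SAMPLE_RSS, source="bbc"):
        print(record)
```

In a real run the XML text would come from fetching the feed URL rather than from an inline string.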

Keywords are passed to the Freenews API to fetch articles about them. The API response is then parsed to extract the following fields:

title
date/time
summary
topic/category
source

The producer streams the data to the consumer through the Kafka broker. The consumer reads the data from the Kafka broker and writes it to MongoDB.
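The flow from producer to broker to consumer to database can be sketched as below. This is a simplified outline, not the project's code: the key names in `FIELDS`, the topic name `news`, and the helpers `to_message`, `publish`, and `consume` are all assumptions. The broker and database are passed in as plain callables so the sketch runs without Kafka or MongoDB.

```python
import json

# Assumed key names for the five fields listed above.
FIELDS = ("title", "date", "summary", "topic", "source")


def to_message(article):
    """Keep only the listed fields and encode them as UTF-8 JSON bytes."""
    record = {name: article.get(name, "") for name in FIELDS}
    return json.dumps(record).encode("utf-8")


def publish(articles, send, topic="news"):
    """Producer side: hand each encoded article to send(topic, payload)."""
    count = 0
    for article in articles:
        send(topic, to_message(article))
        count += 1
    return count


def consume(messages, insert_one):
    """Consumer side: decode each payload and store it via insert_one(doc)."""
    stored = 0
    for payload in messages:
        insert_one(json.loads(payload.decode("utf-8")))
        stored += 1
    return stored


if __name__ == "__main__":
    # In-memory stand-ins for the Kafka broker and the MongoDB collection.
    queue, collection = [], []
    articles = [{
        "title": "Markets rally",
        "date": "2024-01-01",
        "summary": "Stocks rose.",
        "topic": "business",
        "source": "example",
        "unused": "not forwarded",
    }]
    publish(articles, lambda topic, payload: queue.append(payload))
    consume(queue, collection.append)
    print(collection)
```

If the containers use the kafka-python and pymongo clients (the README does not say which clients are used), `send` could be a `KafkaProducer.send` bound method and `insert_one` a `Collection.insert_one` bound method.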

The two Docker projects, producer and consumer, run as part of data ingestion.

Data Pre-Processing, Model Training and Prediction:

The model was built as follows:

1. Took an existing news classifier dataset from Kaggle.
2. Processed the dataset and extracted the required features.
3. Split the data into train and test sets. Around 50k records were used to train the model.
4. Trained the model on this data.
5. Ran predictions on the test data and measured the accuracy.
6. Saved the model.
7. Fetched the data from MongoDB that was stored by the data ingestion service.
8. Loaded the saved model.
9. Retrained the model with the collected data.
10. Saved it locally.
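The train, evaluate, save, load, and retrain cycle can be sketched with a tiny stand-in model. The README does not name the algorithm or library used, so the word-count Naive Bayes classifier below is an assumption chosen only because it can be retrained incrementally with the standard library. The class `TinyNaiveBayes`, the helper names, and the toy data are all hypothetical.

```python
import math
import os
import pickle
import tempfile
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts (stand-in model)."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def fit(self, texts, labels):
        """Add examples to the running counts; calling again retrains."""
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)  # Laplace smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def save(self, path):
        with open(path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    @classmethod
    def load(cls, path):
        model = cls()
        with open(path, "rb") as fh:
            model.__dict__.update(pickle.load(fh))
        return model


def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


def demo():
    # Toy data standing in for the Kaggle dataset and the MongoDB records.
    train_x = ["stocks fall as markets slide", "team wins cup final",
               "bank raises interest rates", "striker scores late goal"]
    train_y = ["business", "sport", "business", "sport"]
    test_x, test_y = ["markets and rates", "cup goal"], ["business", "sport"]

    model = TinyNaiveBayes().fit(train_x, train_y)    # steps 3 and 4
    acc = accuracy(model, test_x, test_y)             # step 5
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        model.save(path)                              # step 6
        reloaded = TinyNaiveBayes.load(path)          # step 8
        reloaded.fit(["election vote count begins"], ["politics"])  # step 9
        reloaded.save(path)                           # step 10
    return acc, reloaded.predict("election vote")


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that retraining updates the loaded model's state rather than starting from scratch, which is what steps 7 to 10 describe.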

We then tested the predictions with FastAPI.

After some predictions, we retrained the model by repeating steps 7 to 10 above.

A Flask web UI lets the user enter a news article URL for prediction.

Installation How-to?

1. Every service is packaged as a Docker image. Each project has its own Dockerfile and requirements.txt.
2. A docker-compose.yml file builds the sources and brings the services up.
3. Docker and docker-compose must be installed on the machine or VM.
4. The current implementation has only been tested on Ubuntu 20.04.

Below is the high-level architecture diagram of the application:

Below are the sample sequence diagrams:

Training Sequence Diagram:

Prediction Sequence Diagram:

Once docker-compose up is run, it prints a URL for the Flask web UI on the console. The URL is generated dynamically, so copy it from the console and paste it into the browser.

Below are some of the sample output screens:

Prediction Result:

Retrain Result:
