STEAM API project

We are asked to take the role of an MLOps (Machine Learning Operations) Engineer and develop an API with the given datasets:

Datasets

Working process:

ETL

We were given three datasets in JSON format, each containing a lot of nested data. We had to explode the columns holding nested data, look for null values and replace them, merge columns that held similar data, and find and handle duplicated records. Process in detail:

ETL_process
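As a rough illustration of the steps described above (exploding nested data, replacing nulls, dropping duplicates), here is a minimal sketch using only the Python standard library. The field names (`user_id`, `items`, `item_id`, `playtime`), the sample records, and the choice of 0 as the null replacement are assumptions made for the example; they are not taken from the notebook, which may well do this differently.

```python
import json

# Hypothetical raw records: each user carries a nested list of items.
raw = """[
  {"user_id": "u1", "items": [{"item_id": "10", "playtime": 5},
                              {"item_id": "20", "playtime": null}]},
  {"user_id": "u2", "items": []},
  {"user_id": "u1", "items": [{"item_id": "10", "playtime": 5}]}
]"""


def explode_items(users):
    """Turn each nested item into its own flat row."""
    rows = []
    for user in users:
        for item in user.get("items") or []:
            rows.append({
                "user_id": user["user_id"],
                "item_id": item["item_id"],
                # Replace a missing playtime with 0 (an assumed default).
                "playtime": item.get("playtime") or 0,
            })
    return rows


def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


flat = drop_duplicates(explode_items(json.loads(raw)))
for row in flat:
    print(row)
```

The user with an empty `items` list produces no rows, and the repeated `u1`/`10` record is kept only once.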

Feature Engineering

First we wrote the sentiment analysis function as requested, and then we prepared the data for the API queries. We made a CSV for every query in the API (two in the case of one query) and cleaned them so the API can work properly. Further details here:

Feature_Engineering
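The idea of preparing one small CSV per API query can be sketched as follows. This is only an illustration under stated assumptions: the columns (`genre`, `year`, `playtime`), the sample rows, and the "playtime by genre and year" aggregation are hypothetical and are not the project's actual queries.

```python
import csv
import io
from collections import defaultdict

# Hypothetical cleaned rows; the real columns may differ.
rows = [
    {"genre": "Action", "year": 2015, "playtime": 120},
    {"genre": "Action", "year": 2016, "playtime": 300},
    {"genre": "Action", "year": 2015, "playtime": 80},
    {"genre": "Indie", "year": 2014, "playtime": 50},
]


def playtime_by_genre_year(rows):
    """Aggregate total playtime per (genre, year) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["genre"], row["year"])] += row["playtime"]
    return [
        {"genre": genre, "year": year, "playtime": total}
        for (genre, year), total in sorted(totals.items())
    ]


def to_csv(table):
    """Serialize the aggregated table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["genre", "year", "playtime"])
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


table = playtime_by_genre_year(rows)
csv_text = to_csv(table)
print(csv_text)
```

Precomputing the aggregate means the API endpoint only has to read a small file and filter it, instead of processing the full dataset on every request.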

API function testing

Here we wrote the API's functions in a Jupyter notebook so we could test them easily and debug any particular problem that came up while running the API. The complete work is here:

API_function_testing

API development and deployment

First, we created a virtual environment and installed the libraries. We chose FastAPI, since it is an easy-to-code, high-performance framework for building APIs. We reused the functions from the previous step, wrote a presentation function with HTML and CSS for the main page, and then ran the API locally (we also made a dockerfile for this). Then we decided to deploy on Render, as was suggested. Since it is a service with limitations for free users, we put the API in a separate repository, which you can find at the following hyperlink:

API_Repository

For the live API deployment, go to this page: https://steamapi-h3u0.onrender.com/ (the page may take a while to load)

EDA

Here we explore the dataframes that we cleaned earlier, looking for outliers, relationships within the data, and general information about every dataframe. We saved a dataframe combining the previous three, keeping the most important columns. Process in detail:

EDA

Machine learning

Lastly, we built the machine learning model. We decided to make a model that recommends games based on their similarity to other games. For this purpose we use a cosine similarity model, since it works well for analysing text. Because of how it works, we had to build a final CSV of the data grouped by game id, with a single column combining all the other text columns. Finally, we wrote a function for an API query that uses the machine learning model. More info about the process here:

Machine_Learning
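To make the cosine similarity idea concrete, here is a minimal pure-Python sketch over a combined text column. It is a stand-in for illustration, not the notebook's implementation: the `games` dictionary, the simple word-count vectorization, and the `recommend` function name are all assumptions.

```python
import math
from collections import Counter

# Hypothetical data: one combined text field per game id.
games = {
    1: "action shooter multiplayer",
    2: "action shooter singleplayer",
    3: "puzzle casual relaxing",
}


def vectorize(text):
    """Represent a text as a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(game_id, top_n=2):
    """Return the ids of the games most similar to game_id."""
    target = vectorize(games[game_id])
    scores = [
        (other_id, cosine_similarity(target, vectorize(text)))
        for other_id, text in games.items()
        if other_id != game_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scores[:top_n]]


print(recommend(1, top_n=1))
```

Game 2 shares two of its three words with game 1, so it ranks above game 3, which shares none. Computing similarities against every game on each request is memory- and CPU-hungry for a large catalogue, which is one plausible reason such a query could struggle on a constrained host.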

Uploading the repo

Since GitHub does not allow more than 1 GB of Git LFS files, the repository does not include the generated CSVs or the original JSON files. You can find the JSON files at the hyperlink at the beginning of the README, download them, and run the code to recreate the same CSVs that I have on my PC.

Possible upgrades

Unfortunately, the machine learning query works well locally but not on the live page, because of the limitations of Render's free account. This might be improved with further optimization. The rest works fine.
The ETL and EDA processes were done very quickly, so they are not very detailed and they contain some inconsistencies and repeated code. A more in-depth ETL and EDA process could improve this work.
The code in general is a bit messy (especially in the EDA/ETL process); neater code would make it easier to understand.
Write full documentation of everything done in the project.

YouTube video:

A video briefly explaining the work done (in Spanish): https://youtu.be/9L2wA51Qj2Y

pablorobba/STEAM_API_Individual_Project
