
pablorobba/STEAM_API_Individual_Project


STEAM API project

We were asked to take the role of an MLOps (Machine Learning Operations) Engineer and develop an API from the datasets provided:

Working process:

ETL

We were given three datasets in JSON format, with a lot of nested data. We had to expand (unnest) the columns holding nested data, look for null values and replace them, merge columns that held similar data, and find and handle duplicated rows. Process in detail:
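The ETL steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual notebook code: the records, the field names (`user_id`, `items`, `item_id`, `playtime`) and the helper functions are all hypothetical, and the real datasets have a different schema.

```python
import json

# Hypothetical raw records; the field names are illustrative only.
RAW = [
    {"user_id": "u1", "items": [{"item_id": "10", "playtime": 5},
                                {"item_id": "20", "playtime": None}]},
    {"user_id": "u1", "items": [{"item_id": "10", "playtime": 5}]},  # repeats a row
    {"user_id": "u2", "items": []},                                   # nothing nested
]


def explode(records, nested_key):
    """Yield one flat dict per element of the list stored under `nested_key`."""
    for record in records:
        parent = {k: v for k, v in record.items() if k != nested_key}
        for child in record.get(nested_key) or []:
            yield {**parent, **child}


def fill_nulls(rows, defaults):
    """Replace None with the column's default, if a default was given."""
    return [
        {k: (defaults.get(k, v) if v is None else v) for k, v in row.items()}
        for row in rows
    ]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen, unique = set(), []
    for row in rows:
        key = json.dumps(row, sort_keys=True)  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = drop_duplicates(fill_nulls(explode(RAW, "items"), {"playtime": 0}))
```

In this sketch a record with an empty nested list produces no rows at all; whether such users should be kept is a decision that depends on the query being served.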

Feature Engineering

First we wrote the sentiment analysis function as requested, and then we prepared the data for the API queries. We made a CSV file for every query in the API (two for one of the queries) and cleaned them so the API can work properly. Further details here:

API function testing

Here we wrote the API's functions in a Jupyter notebook so we could test them easily and debug them if a particular problem came up while running the API. The complete work here:

API development and deployment

First, we created a virtual environment and installed the libraries. We chose FastAPI because it is an easy-to-use, high-performance framework for building APIs. We reused the functions from the previous step, added a presentation function with HTML and CSS for the main page, and ran the API locally (we also wrote a Dockerfile for this). We then deployed on Render, as suggested. Since the service has limitations for free users, we put the API in a separate repository, which you can find at the following link:

EDA

Here we explore the dataframes that we cleaned earlier, looking for outliers, relationships within the data, and general information about each dataframe. We saved a combined dataframe built from the previous three, keeping the most important columns. Process in detail:

Machine learning

Lastly, we built the machine learning model. We decided to make a model that recommends games based on their similarity to other games. For this purpose we use a cosine similarity model, since it works well for comparing text. Because of how it works, we had to build a final CSV file with the data grouped by game id and a single column that combines all the other text columns. Finally, we wrote a function for an API query that uses the machine learning model. More info about the process here:
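To make the idea concrete, here is a minimal sketch of a cosine-similarity recommender over a combined text column. It uses plain word counts and only the Python standard library, which is an assumption made for illustration; it is not the project's code, and the sample games, column names (`title`, `genres`, `tags`) and the `recommend` function are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical data: one row per game id, with text columns to combine.
GAMES = [
    {"id": 1, "title": "Space Miner", "genres": "action indie", "tags": "space mining"},
    {"id": 2, "title": "Galaxy Miner", "genres": "action indie", "tags": "space mining crafting"},
    {"id": 3, "title": "Farm Life", "genres": "simulation casual", "tags": "farming relaxing"},
]


def combine_text(game):
    """Join the text columns into the single column the model compares."""
    return " ".join(str(game[col]) for col in ("title", "genres", "tags"))


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(game_id, games, top_n=5):
    """Return the ids of the `top_n` games most similar to `game_id`."""
    vectors = {g["id"]: vectorize(combine_text(g)) for g in games}
    target = vectors[game_id]
    scores = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in vectors.items()
        if other_id != game_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scores[:top_n]]
```

With the sample data, `recommend(1, GAMES, top_n=1)` picks game 2, because the two games share most of their words. This version recomputes every vector on each call; caching the vectors is one possible way to reduce the work done per request.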

Uploading the repo

Since GitHub does not allow more than 1 GB of Git LFS files, the repository does not include the generated CSV files or the original JSON files. You can find the JSON files through the link at the beginning of this README; download them and run the code to recreate the same CSV files that I have on my PC.

Possible upgrades

  • Unfortunately, the machine learning query works well locally but not on the live page. This is because of the limitations of Render's free account. Further optimization may improve this. The rest of the queries work fine.

  • The ETL and EDA processes were done very quickly, so they are not very detailed and contain some inconsistencies and repeated code. A more in-depth ETL and EDA process could improve this work.

  • The code in general is a bit messy (especially in the EDA/ETL process); neater code would make it easier to understand.

  • Write full documentation of everything done in the project.

YouTube video:

A video briefly explaining the work done (in Spanish): https://youtu.be/9L2wA51Qj2Y
