Artificial Data Analyst is a web application that helps users analyze and visualize data using Large Language Models (LLMs). It uses LIDA (Library for Intelligent Data Analysis) to automatically generate visualizations and infographics from datasets. As part of its data cleaning functionality, it also lets users apply feature engineering and feature selection techniques, and it provides a summary of the uploaded dataset. The application includes user authentication, data upload, data summarization, data cleaning, and visualization features.
- User Authentication: Secure JWT-based authentication system with role-based access control
- Data Upload & Management: Support for CSV, Excel, and JSON file formats
- Data Cleaning: Feature engineering and feature selection techniques for the uploaded dataset
- Automated Data Analysis:
  - Data Summarization
  - Goal Generation
  - Visualization Generation
  - Visualization Editing
  - Visualization Explanation
  - Visualization Evaluation and Repair
- Multiple LLM Provider Support: Compatible with OpenAI, Azure OpenAI, PaLM, Cohere, local HuggingFace models, and ChatGroq inference models
- Interactive Visualizations: Using libraries like matplotlib, seaborn, altair, and plotly
- Core Components:
  - auth.py: JWT authentication and user management
  - datacontrol.py: Data upload and management
  - datasummarizer.py: Data analysis and summarization
  - dashboard_visualize.py: Visualization generation
  - datacleaner.py: Data preprocessing and cleaning
- Modern UI built with React and TypeScript
- Tailwind CSS for styling
- Responsive design for various screen sizes
- Python 3.9+
- Node.js
- MongoDB
- Docker (optional)
- Modified LLMX and LIDA (only if you plan to use ChatGroq inference models): refer to my GitHub repositories.
Create a .env file with:
MONGODB_URL=<your-mongodb-url>
SECRET_KEY=<your-secret-key>
GITHUB_TOKEN=<your-github-token>
GITHUB_USERNAME=<your-github-username>
GITHUB_REPO=<your-github-repo>
ALGORITHM=HS256
LANGCHAIN_API_KEY=<your-langchain-key>
GROQ_API_KEY=<your-groq-key>
MODEL_NAME=<your-model-name>
PROVIDER=<your-provider>
LANGCHAIN_TRACING_V2=true

Note: The secret key can be generated with secret_key_generator.py.
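The contents of secret_key_generator.py are not shown here, so the following is only a minimal sketch of how a suitable `SECRET_KEY` value could be produced with Python's standard `secrets` module. The function name `generate_secret_key` and the 32-byte default are assumptions made for illustration, not the project's actual script.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for an HS256 signing key.

    Hypothetical helper: 32 random bytes give a 64-character hex string.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a line that can be pasted into the .env file.
    print(f"SECRET_KEY={generate_secret_key()}")
```

Generate the key once and keep it out of version control; changing it invalidates any tokens signed with the previous value.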
- Clone the repository:
git clone <repository-url>
cd artificial-data-analyst
- Install backend dependencies:
cd backend
pip install -r requirements.txt
- Copy the contents of the modified LIDA and LLMX repositories into your site-packages directory, or keep them in a separate folder and adjust the imports in the code accordingly (needed only for the ChatGroq inference pipeline).
- Install frontend dependencies:
cd frontend
npm install
- Run the development servers:
Backend:
cd backend
uvicorn main:app --reload
Frontend:
cd frontend
npm run dev

Docker (optional):
docker-compose up --build

- POST /users/token: Get access token
- POST /users/register: Register new user
- POST /users/login: Log in existing user

- GET /datacontrol/get: Retrieve data
- POST /datacontrol/create: Upload data
- POST /datacontrol/update: Update data

- GET /datacleaner/dtaaframe-info: Get existing dataframe
- POST /datacleaner/engineering: Feature engineering
- POST /datacleaner/selection: Feature selection

- POST /datasummarizer: Generate data summary
- POST /visualize/goalgenerator: Generate visualization goals
- POST /visualize/goaladdition: Add new visualization goals
- POST /visualize/visualization-titles: Generate visualization titles
- POST /visualize/visualizations: Generate visualizations
- POST /visualize/edit-visualizations: Edit visualizations using natural language
- POST /visualize/explain-visualizations: Generate visualization explanations
- POST /visualize/evaluate-visualizations: Evaluate generated visualizations
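
As a rough illustration of how a client might call these endpoints, the sketch below builds a token request and an authenticated data request using only the Python standard library. Several details are assumptions rather than documented behavior: the base URL `http://localhost:8000` (uvicorn's default), the form fields `username` and `password` (the usual FastAPI OAuth2 password-form convention), the `access_token` response field, and the `Bearer` authorization scheme. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build a POST /users/token request with form-encoded credentials (assumed format)."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        f"{BASE_URL}/users/token",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_data_request(access_token: str) -> urllib.request.Request:
    """Build a GET /datacontrol/get request carrying the token as a Bearer header (assumed scheme)."""
    return urllib.request.Request(
        f"{BASE_URL}/datacontrol/get",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def send_json(request: urllib.request.Request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example flow against a running backend (not executed here):
# token = send_json(build_token_request("alice", "secret"))["access_token"]
# data = send_json(build_data_request(token))
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before pointing the client at a real server.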
- FastAPI
- MongoDB
- LIDA (Library for Intelligent Data Analysis)
- LLMX
- LangChain
- Pandas
- Matplotlib
- Seaborn
- Altair
- Feature Engine
- React
- TypeScript
- Tailwind CSS
- React Icons
- Docker
- MongoDB
- GitHub (for data storage)
- JWT-based authentication
- Role-based access control
- Secure password hashing
- Environment variable protection
- CORS configuration
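
To make the JWT-based authentication item more concrete, here is a minimal standard-library sketch of how an HS256 token is signed and verified. It is not the code in auth.py, which most likely relies on a dedicated JWT library; the function names, the `sub` and `role` claims, and the 30-minute lifetime are assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """Compute the HMAC-SHA256 signature over 'header.payload'."""
    return hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256).digest()


def create_token(claims: dict, secret: str, ttl_seconds: int = 1800) -> str:
    """Create an HS256-signed token whose payload is the claims plus an 'exp' timestamp."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(time.time()) + ttl_seconds)
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if the signature, algorithm, and expiry check out; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload
```

In this sketch, role-based access control would amount to checking a claim such as `role` on the verified payload before serving a request; how the application actually stores and checks roles is not described here.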
Please read our contributing guidelines before submitting pull requests.
This project is licensed under the MIT License.
- LIDA library by Microsoft
- Feature Engine framework
- FastAPI framework
- React and its community
