diff --git a/README-EN.md b/README-EN.md
new file mode 100644
index 0000000..9c8cfa8
--- /dev/null
+++ b/README-EN.md
@@ -0,0 +1,835 @@
+
+
+
+# 🚀 DAT (Data Ask Tool)
+
+**Enterprise-grade AI Tool for Conversing with Data Using Natural Language**
+
+*Dating with your data*
+
+[](https://github.com/hexinfo/dat/releases/latest)
+[](https://github.com/hexinfo/dat)
+[](https://github.com/hexinfo/dat/releases/latest)
+[](https://github.com/hexinfo/dat/blob/main/LICENSE)
+[](https://openjdk.java.net/projects/jdk/17/)
+[](https://maven.apache.org/)
+[](https://deepwiki.com/hexinfo/dat)
+[](https://zread.ai/hexinfo/dat)
+
+
+
+
+
+---
+
+> **[🇨🇳 中文版本](./README.md)**
+
+---
+
+## 🎯 Project Vision
+
+> We are entering a new era of generative artificial intelligence, where **language is the interface, and data is the fuel**.
+
+DAT is dedicated to solving the last-mile problem of enterprise data querying: enabling business users to converse directly with databases in natural language, without writing complex SQL. Through a pre-modeled semantic layer, DAT ensures that AI can express itself not only confidently but also correctly.
+
+The core driving force behind DAT is not another intelligence explosion in large language models themselves, but the Askdata Agent workflow we designed around them.
+Everything we do essentially trades `"more precise and complete knowledge"` (**currently the main development focus**), `"more computational steps"`, and `"longer thinking time"` for what matters most in the real business world: the `"high quality"` and `"certainty"` of results.
+
+
+## ✨ Core Features
+
+### 🏗️ Enterprise-grade Architecture Design
+- **🔌 Pluggable SPI Architecture** - Flexible extension support for multiple databases, LLMs, and embedding models
+- **🏭 Factory Pattern Implementation** - Standardized component creation and management mechanism
+- **📦 Modular Design** - Clear separation of responsibilities for easy maintenance and extension
+
+### 🗄️ Multi-database Support
+- **MySQL** - Full support including connection pooling and dialect conversion
+- **PostgreSQL** - Enterprise-grade database support
+- **Oracle** - Legacy enterprise database compatibility
+- **More Databases** - Easily extensible through SPI mechanism
+
+### 🤖 Intelligent Semantic SQL Generation
+- **Natural Language Understanding** - LLM-based semantic parsing
+- **SQL Dialect Conversion** - Automatic adaptation to different database syntaxes
+- **Semantic Model Binding** - Query accuracy ensured through predefined models
+
+### 📊 Rich Semantic Modeling
+- **Entities** - Primary key and foreign key relationship definitions
+- **Dimensions** - Time, categorical, and enumeration dimension support
+- **Measures** - Aggregation functions and calculated field definitions
+- **YAML Configuration** - Intuitive model definition approach
+
+### 🔍 Vector-enhanced Retrieval
+- **Content Storage** - Vectorization of SQL Q&A pairs, synonyms, and business knowledge
+- **Semantic Retrieval** - Intelligent matching based on embedding models
+- **Multiple Storage Backends** - Storage options including DuckDB, Weaviate, PGVector, etc.
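As a rough illustration of the retrieval step (a toy sketch, not DAT's actual API or storage format): stored content and the user's question are embedded as vectors, and entries are ranked by cosine similarity to the query embedding. The store contents and dimensions below are made up.

```java
import java.util.Map;

public class CosineRetrievalSketch {

    // Cosine similarity between two equal-length embedding vectors
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" standing in for a real model's output
        Map<String, double[]> store = Map.of(
                "sales by region", new double[]{0.9, 0.1, 0.0},
                "order count by day", new double[]{0.1, 0.9, 0.2});
        double[] query = {0.8, 0.2, 0.1};

        // Pick the stored entry most similar to the query embedding
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, double[]> e : store.entrySet()) {
            double s = cosine(query, e.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        System.out.println(best); // prints: sales by region
    }
}
```

Real backends (DuckDB, Weaviate, PGVector, etc.) index these vectors so the nearest-neighbor lookup stays fast at scale.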
+
+
+---
+
+## 🏗️ System Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                          DAT Framework                          │
+├─────────────────────────────────────────────────────────────────┤
+│  🎯 DAT Language (Authoring Layer)                              │
+│   ├── 📝 Semantic Model Definition (YAML)                       │
+│   ├── 🗄️ Data Model Configuration                               │
+│   └── 🤖 Intelligent Agent Configuration                        │
+├─────────────────────────────────────────────────────────────────┤
+│  ⚙️ DAT Engine (Execution Layer)                                │
+│   ├── 🤖 NLU → 📝 Semantic SQL Gen → 🗄️ Query Exec              │
+│   ├── 🧠 LLM Management → 🔍 Vector Retrieval → 📊 Result Format│
+│   └── 🔌 SPI Management → 🏭 Factory Creation → ⚡ Cache Optim  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+1. DAT CLI is used for local development, unit testing, and debugging. It lets you develop DAT intelligent Q&A projects locally in an IDE (VSCode, IDEA, or Eclipse), `transforming prompt (context) engineering into data engineering`. For this reason, the DAT project development model aligns naturally with AI coding tools (such as Cursor, Claude Code, etc.), enabling smarter, automated intelligent Q&A development workflows.
+
+2. DAT is not a platform but a `framework`: secondary developers can build their own Web UI on top of `dat-sdk` (a web IDE, a drag-and-drop workflow, list-based interaction, etc.), or expose external `OpenAPI` or `MCP` services.
+
+3. This model `allows data engineers and data analysts to develop intelligent Q&A applications just like software engineers develop applications`.
+
+
+---
+
+## 🚀 Quick Start
+
+### 📋 Requirements
+
+- **Java 17+** - OpenJDK recommended
+- **Database** - MySQL / PostgreSQL / Oracle / DuckDB (choose one)
+- **LLM API** - OpenAI / Anthropic / Ollama / Gemini, etc.
+
+### ⚡ 5-Minute Quick Experience
+
+#### 1️⃣ Install DAT CLI
+
+##### 🐧 Linux/macOS Systems
+
+```bash
+# Download the latest release (0.7.2 shown here; substitute the current version)
+wget https://github.com/hexinfo/dat/releases/latest/download/dat-cli-0.7.2-full.tar.gz
+
+# Extract and add to PATH (replace x.x.x with the downloaded version)
+tar -xzf dat-cli-x.x.x.tar.gz
+mv dat-cli-x.x.x dat-cli
+export PATH=$PATH:$(pwd)/dat-cli/bin
+```
+
+##### 🪟 Windows Systems
+
+1. Visit the [Releases page](https://github.com/hexinfo/dat/releases/latest)
+2. Download the `dat-cli-x.x.x.tar.gz` file
+3. Extract using WinRAR, 7-Zip, or the Windows built-in extraction tool
+4. Add the extracted `dat-cli\bin` directory to the system PATH environment variable:
+ - Right-click "This PC" โ "Properties" โ "Advanced system settings"
+ - Click "Environment Variables" โ Edit "Path" variable
+ - Add the DAT CLI bin directory path
+
+#### 2️⃣ Initialize Project
+
+```bash
+# Create a new DAT project
+dat init
+
+# Follow the prompts to enter project information
+# Project name: my-dat-project
+# Description: My first intelligent Q&A project
+# Database type: mysql
+```
+
+
+
+> 💡 **Tip:** If you don't have an existing database to access, or just want to query local CSV data, select `duckdb` as the database when initializing the project. By default, this creates a local embedded database (files prefixed with `duckdb`) in the project's `.dat` directory.
+
+
+#### 3️⃣ Configure Data Source
+
+Edit the generated `dat_project.yaml`:
+
+```yaml
+version: 1
+name: my-dat-project
+description: My first intelligent Q&A project
+
+# Database configuration
+db:
+ provider: mysql
+ configuration:
+ url: jdbc:mysql://localhost:3306/mydb
+ username: your_username
+ password: your_password
+ timeout: 1 min
+
+# LLM configuration
+llm:
+ provider: openai
+ configuration:
+ api-key: your-openai-api-key
+ model-name: gpt-4
+ base-url: https://api.openai.com/v1
+
+# Embedding model configuration
+embedding:
+ provider: bge-small-zh-v15-q
+```
+
+> 💡 **Tip:** For more project configuration options, please refer to the `dat_project.yaml.template` in your project.
+
+> 💡 **Tip:**
+>
+> If you don't have existing data to use, you can execute the `seed` command to load the sample seed data from the initialized project into the database.
+>
+> ```
+> # Load seed data
+> dat seed -p ./my-dat-project
+> ```
+>
+> Then skip step 4️⃣ and use the sample semantic model from the initialized project to proceed with step 5️⃣ "Start Intelligent Q&A".
+
+
+#### 4️⃣ Create Semantic Model
+
+Create `sales.yaml` in the `models/` directory:
+
+```yaml
+version: 1
+
+semantic_models:
+ - name: sales_data
+ description: Sales data analysis model
+ model: ref('sales_table')
+ entities:
+ - name: product_id
+ description: Product ID
+ type: primary
+ dimensions:
+ - name: sale_date
+ description: Sale date
+ type: time
+ type_params:
+ time_granularity: day
+ - name: region
+ description: Sales region
+ type: categorical
+ enum_values:
+ - value: "North"
+ label: "North Region"
+ - value: "South"
+ label: "South Region"
+ measures:
+ - name: sales_amount
+ description: Sales amount
+ agg: sum
+ - name: order_count
+ description: Order count
+ agg: count
+```
+
+> 💡 **Tip:** This is just an example; configure it according to your actual data.
+> For more semantic model configuration instructions, please check the `MODEL_GUIDE.md` manual in your project.
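To see what semantic-model binding buys, here is a toy sketch (a hypothetical helper, not DAT's internal code): the `agg` declared for each measure in the YAML above, not the user's phrasing, determines the SQL aggregate that gets generated.

```java
import java.util.Map;

public class MeasureToSqlSketch {

    // Hypothetical measure definitions mirroring the YAML above: name -> agg function
    static final Map<String, String> MEASURES = Map.of(
            "sales_amount", "SUM",
            "order_count", "COUNT");

    // Render a trivial aggregate query for one measure grouped by one dimension
    static String render(String measure, String dimension, String model) {
        String agg = MEASURES.get(measure);
        if (agg == null) {
            throw new IllegalArgumentException("Unknown measure: " + measure);
        }
        return "SELECT " + dimension + ", " + agg + "(" + measure + ")"
                + " FROM " + model + " GROUP BY " + dimension;
    }

    public static void main(String[] args) {
        System.out.println(render("sales_amount", "region", "sales_data"));
        // prints: SELECT region, SUM(sales_amount) FROM sales_data GROUP BY region
    }
}
```

Because the model fixes the aggregation, the LLM cannot "invent" an `AVG` where the business definition says `SUM`; that is the certainty the semantic layer provides.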
+
+
+#### 5️⃣ Start Intelligent Q&A
+
+```bash
+# Start interactive Q&A
+dat run -p ./my-dat-project -a default
+
+# Or start API service
+dat server openapi -p ./my-dat-project
+```
+
+Now you can query data using natural language!
+
+```
+💬 What was the sales amount in the North region last month?
+🔍 Analyzing your question...
+📝 Generated Semantic SQL: SELECT SUM(sales_amount) FROM sales_data WHERE region='North' AND sale_date >= '2024-11-01'
+✅ Query result: North region sales amount last month was $1,234,567
+```
+
+### 🌐 Multiple Usage Methods
+
+DAT provides multiple usage methods (CLI is mainly for development and debugging) to meet different scenario requirements:
+
+#### 1️⃣ Use via Dify Plugin (Web-based Q&A)
+
+If you need to conduct intelligent Q&A through a **Web interface** without developing your own frontend, you can directly use the DAT plugin on the **Dify platform**.
+
+🔗 **Plugin URL**: [https://marketplace.dify.ai/plugins/hexinfo/dat](https://marketplace.dify.ai/plugins/hexinfo/dat)
+
+First [start the DAT OpenAPI service](#-dat-server---service-deployment), then install the DAT plugin in Dify and configure the `DAT OpenAPI Base URL` to connect with it. You can then create intelligent Q&A applications in Dify's visual interface, providing a friendly web interaction experience.
+
+#### 2️⃣ Integrate into Your Own Project (Streaming Q&A API)
+
+If you need to integrate streaming Q&A functionality into your **own Web project**, you can [start the DAT OpenAPI service](#-dat-server---service-deployment) for integration.
+
+#### 3️⃣ Integrate into an Agent (MCP Tool Support)
+
+If you use Agents that support **MCP (Model Context Protocol)** (such as Claude Desktop, Cline, etc.), you can [start the DAT MCP service](#-mcp-service) to integrate intelligent Q&A capabilities into these Agents.
+
+
+---
+
+## 🛠️ CLI Command Reference
+
+### 📋 Command Overview
+
+
+
+### 🎯 Core Commands
+
+#### 🚀 `dat init` - Project Initialization
+
+```bash
+dat init --help
+```
+
+
+**Usage Examples**:
+```bash
+# Interactive initialization of DAT project in current working directory
+dat init
+
+# Interactive initialization of DAT project in specified workspace directory
+dat init -w ./my-workspace
+```
+
+
+
+#### 🤖 `dat run` - Intelligent Q&A
+
+```bash
+dat run --help
+```
+
+
+**Usage Examples**:
+```bash
+# Current working directory is DAT project directory, start default agent
+dat run
+
+# Current working directory is DAT project directory, start specific agent
+dat run -a sales-agent
+
+# Specify DAT project directory and start specific agent
+dat run -p ./my-project -a sales-agent
+```
+
+
+
+#### 🌐 `dat server` - Service Deployment
+
+```bash
+dat server --help
+```
+
+
+##### 📡 OpenAPI Service
+
+```bash
+dat server openapi --help
+```
+
+
+**Start Service**:
+```bash
+# Current working directory is DAT project directory
+dat server openapi
+
+# Specify DAT project directory
+dat server openapi -p ./my-project
+
+# Custom port
+dat server openapi --port=9090
+```
+
+
+
+**Swagger UI Interface**:
+
+
+**API Call Example**:
+```bash
+# Streaming Q&A API
+curl -X POST http://localhost:8080/api/v1/ask/stream \
+ -H "Content-Type: application/json" \
+ -d '{"question": "Total cases by country"}' \
+ --no-buffer
+```
+
+##### 🔌 MCP Service
+
+```bash
+dat server mcp --help
+```
+
+
+**Start Service**:
+```bash
+# Current working directory is DAT project directory
+dat server mcp
+
+# Specify DAT project directory
+dat server mcp -p ./my-project
+
+# Custom port
+dat server mcp --port=9091
+```
+
+
+
+
+#### 🌱 `dat seed` - Load Seed Data
+
+```bash
+dat seed --help
+```
+
+
+**Usage Examples**:
+```bash
+# Current working directory is DAT project directory, load seed CSV files
+dat seed
+
+# Specify DAT project directory and load seed CSV files
+dat seed -p ./my-project
+```
+
+
+
+
+---
+
+## 🏗️ Development Guide
+
+### 📦 Module Architecture
+
+DAT adopts a modular design with clear responsibilities for each module:
+
+```
+dat-parent/
+├── ❤️ dat-core/                      # Core interfaces and factory management
+├── 🔌 dat-adapters/                  # Database adapters
+│   ├── dat-adapter-duckdb/           # [Built-in local database]
+│   ├── dat-adapter-mysql/
+│   ├── dat-adapter-oracle/
+│   └── dat-adapter-postgresql/
+├── 🧠 dat-llms/                      # LLM integration modules
+│   ├── dat-llm-anthropic/
+│   ├── dat-llm-gemini/
+│   ├── dat-llm-ollama/
+│   ├── dat-llm-openai/
+│   ├── dat-llm-xinference/
+│   └── dat-llm-azure-openai/
+├── 🔍 dat-embedders/                 # Embedding model integration
+│   ├── dat-embedder-bge-small-zh/       # [Built-in local embedding model]
+│   ├── dat-embedder-bge-small-zh-q/     # [Built-in local embedding model]
+│   ├── dat-embedder-bge-small-zh-v15/   # [Built-in local embedding model]
+│   ├── dat-embedder-bge-small-zh-v15-q/ # [Built-in local embedding model]
+│   ├── dat-embedder-jina/
+│   ├── dat-embedder-ollama/
+│   ├── dat-embedder-openai/
+│   ├── dat-embedder-xinference/
+│   └── dat-embedder-azure-openai/
+├── ⚖️ dat-rerankers/                 # Reranking model integration
+│   ├── dat-reranker-onnx-builtin/
+│   ├── dat-reranker-ms-marco-minilm-l6-v2/     # [Built-in local reranking model]
+│   ├── dat-reranker-ms-marco-minilm-l6-v2-q/   # [Built-in local reranking model]
+│   ├── dat-reranker-ms-marco-tinybert-l2-v2/   # [Built-in local reranking model]
+│   ├── dat-reranker-ms-marco-tinybert-l2-v2-q/ # [Built-in local reranking model]
+│   ├── dat-reranker-onnx-local/      # [Local reranking model invocation]
+│   ├── dat-reranker-jina/
+│   └── dat-reranker-xinference/
+├── 💾 dat-storers/                   # Vector storage backends
+│   ├── dat-storer-duckdb/            # [Built-in local vector storage]
+│   ├── dat-storer-pgvector/
+│   ├── dat-storer-weaviate/
+│   ├── dat-storer-qdrant/
+│   └── dat-storer-milvus/
+├── 🤖 dat-agents/                    # Intelligent agent implementations
+│   └── dat-agent-agentic/
+├── 🌐 dat-servers/                   # Server components
+│   ├── dat-server-mcp/
+│   └── dat-server-openapi/
+├── 📦 dat-sdk/                       # Development toolkit
+└── 🖥️ dat-cli/                       # Command-line tool
+```
+
+### 🔧 Local Development Environment
+
+#### Environment Setup
+```bash
+# Clone the project
+git clone https://github.com/hexinfo/dat.git
+cd dat
+
+# Install dependencies and compile
+mvn clean install -DskipTests
+```
+
+### 🚀 Secondary Development Guide
+
+DAT provides the `dat-sdk` development toolkit, making it convenient for developers to integrate DAT's intelligent Q&A capabilities into their own Java applications. You can develop custom Web UIs, API services, or integrate into existing systems based on the SDK.
+
+#### Maven Dependency Configuration
+
+Add the following dependency to your project's `pom.xml`:
+
+```xml
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-sdk</artifactId>
+    <version>0.7.2</version>
+</dependency>
+```
+
+#### Quick Start Example
+
+```java
+import ai.dat.boot.ProjectRunner;
+import ai.dat.core.agent.data.StreamAction;
+import ai.dat.core.agent.data.StreamEvent;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Collections;
+import java.util.Map;
+
+public class DatProjectRunnerExample {
+
+    private static final ObjectMapper JSON_MAPPER = new ObjectMapper();
+
+    public static void main(String[] args) {
+        // Initialize the project runner
+        Path projectPath = Paths.get("/path/to/your/dat-project").toAbsolutePath();
+        String agentName = "default";
+        Map<String, Object> variables = Collections.emptyMap();
+        ProjectRunner runner = new ProjectRunner(projectPath, agentName, variables);
+
+        // Ask a question
+        StreamAction action = runner.ask("Total cases by country");
+
+        // Handle the various stream events
+        for (StreamEvent event : action) {
+            System.out.println("-------------------" + event.name() + "-------------------");
+            event.getIncrementalContent().ifPresent(System.out::println);
+            event.getSemanticSql().ifPresent(System.out::println);
+            event.getQuerySql().ifPresent(System.out::println);
+            event.getQueryData().ifPresent(data -> {
+                try {
+                    System.out.println(JSON_MAPPER.writeValueAsString(data));
+                } catch (JsonProcessingException e) {
+                    throw new RuntimeException(e);
+                }
+            });
+            event.getToolExecutionRequest().ifPresent(request -> System.out.println("id: " + request.id()
+                    + "\nname: " + request.name() + "\narguments: " + request.arguments()));
+            event.getToolExecutionResult().ifPresent(result -> System.out.println("result: " + result));
+            event.getHitlAiRequest().ifPresent(System.out::println);
+            event.getHitlToolApproval().ifPresent(System.out::println);
+            event.getMessages().forEach((k, v) -> {
+                try {
+                    System.out.println(k + ": " + JSON_MAPPER.writeValueAsString(v));
+                } catch (JsonProcessingException e) {
+                    throw new RuntimeException(e);
+                }
+            });
+        }
+    }
+}
+```
+
+It is recommended to use high-level classes such as `ai.dat.boot.ProjectRunner`, `ai.dat.boot.ProjectBuilder`, and `ai.dat.boot.ProjectSeeder`.
+
+For more SDK usage examples and best practices, please refer to:
+- [Example 1: OpenAPI Server](./dat-servers/dat-server-openapi)
+- [Example 2: MCP Server](./dat-servers/dat-server-mcp)
+
+Add other implemented modules as needed, such as:
+```xml
+<!-- Vector storers -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-storer-duckdb</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-storer-weaviate</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-storer-pgvector</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-storer-qdrant</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-storer-milvus</artifactId>
+</dependency>
+
+<!-- Embedders -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-bge-small-zh</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-bge-small-zh-q</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-bge-small-zh-v15</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-bge-small-zh-v15-q</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-onnx-local</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-openai</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-ollama</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-jina</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-xinference</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-embedder-azure-openai</artifactId>
+</dependency>
+
+<!-- Rerankers -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-ms-marco-minilm-l6-v2</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-ms-marco-minilm-l6-v2-q</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-ms-marco-tinybert-l2-v2</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-ms-marco-tinybert-l2-v2-q</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-onnx-local</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-jina</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-reranker-xinference</artifactId>
+</dependency>
+
+<!-- LLMs -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-openai</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-anthropic</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-ollama</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-gemini</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-xinference</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-llm-azure-openai</artifactId>
+</dependency>
+
+<!-- Database adapters -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-adapter-duckdb</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-adapter-mysql</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-adapter-oracle</artifactId>
+</dependency>
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-adapter-postgresql</artifactId>
+</dependency>
+
+<!-- Agents -->
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-agent-agentic</artifactId>
+</dependency>
+```
+
+You can also develop your own interface implementations on top of `dat-core`.
+
+```xml
+<dependency>
+    <groupId>cn.hexinfo</groupId>
+    <artifactId>dat-core</artifactId>
+</dependency>
+```
+
+---
+
+## 🤝 Contribution Guide
+
+We welcome all forms of contributions! Whether it's bug reports, feature suggestions, documentation improvements, or code submissions.
+
+### 🐛 Reporting Issues
+
+Before submitting an issue, please ensure:
+
+1. **Search existing issues** - Avoid duplicate submissions
+2. **Provide detailed information** - Include error logs, configuration files, and reproduction steps
+3. **Use issue templates** - Help us understand the problem quickly
+
+### 💡 Submitting Feature Suggestions
+
+We encourage innovative ideas! When submitting feature suggestions, please include:
+
+- **Use case description** - What real-world problem does it solve
+- **Design concept** - Initial implementation ideas
+- **Impact scope** - Assessment of impact on existing features
+
+### 🔧 Code Contributions
+
+#### Development Process
+
+1. **Fork the project** and create a feature branch
+```bash
+git checkout -b feature/awesome-new-feature
+```
+
+2. **Follow coding standards**:
+ - Use Chinese comments to explain business logic
+ - Follow Alibaba Java Coding Guidelines
+ - Maintain test coverage > 80%
+
+3. **Commit code**:
+```bash
+git commit -m "feat: Add ClickHouse database adapter
+
+- Implement ClickHouse connection and query functionality
+- Add SQL dialect conversion support
+- Complete unit test coverage
+- Update related documentation
+
+Closes #123"
+```
+
+4. **Create Pull Request**:
+ - Describe changes in detail
+ - Link related issues
+ - Ensure CI checks pass
+
+#### Code Review Standards
+
+- ✔️ **Feature completeness** - Implementation meets specifications
+- ✔️ **Code quality** - Follows design patterns and best practices
+- ✔️ **Test coverage** - Includes unit and integration tests
+- ✔️ **Documentation updates** - Related documentation updated in sync
+- ✔️ **Backward compatibility** - Does not break existing APIs
+
+### 🎯 Development Roadmap
+
+- ✅ Data model (table or view) configuration
+- ✅ Semantic model configuration (bound to a data model), including entities, dimensions, measures, etc.
+- ✅ LLM-based semantic SQL generation, conversion of semantic SQL to real SQL, and execution to return data
+- ✅ HITL (Human-in-the-Loop) interaction in intelligent Q&A
+- ✅ External OpenAPI services for intelligent Q&A projects
+- ✅ External MCP services for intelligent Q&A projects
+- ✅ `seed` command to initialize and load CSV files into the database
+- ✅ Vectorization and retrieval of SQL Q&A pairs, synonyms, and business knowledge
+- ✅ Jinja template support in data models, enabling data permission control through command-line variable passing
+- ⬜ IDE plugins (VSCode, IDEA, Eclipse) to assist DAT project development
+- ⬜ LLM-based data exploration to assist in generating semantic models
+- ⬜ Unit testing for data models, semantic models, and intelligent Q&A
+- ⬜ Metric configuration (adding metrics on top of built semantic models)
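As an illustration of the Jinja-template item above, a data model's SQL can be filtered per caller through command-line variables. The file layout and the `user_region` variable below are hypothetical, for illustration only:

```yaml
# Hypothetical data model using a Jinja variable for row-level filtering
version: 1

models:
  - name: sales_table
    sql: |
      SELECT * FROM sales
      {% if user_region %}
      WHERE region = '{{ user_region }}'
      {% endif %}
```

Passing a different `user_region` at invocation time then scopes every query built on this model to that caller's data.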
+
+
+---
+
+## 🌐 Community & Support
+
+### 💬 Communication Channels
+
+- **GitHub Discussions** - Technical discussions and Q&A
+- **WeChat Group** - Add WeChat `slime_liu` with the note `DAT` to join the community group
+
+### 🙏 Contributor Acknowledgments
+
+Thanks to all developers who have contributed to the DAT project!
+
+
+
+
+
+---
+
+## 📊 Project Statistics
+
+### ⭐ Star History
+
+[](https://star-history.com/#hexinfo/dat&Date)
+
+---
+
+## 📄 License
+
+This project is licensed under the Apache 2.0 License. For details, please see the [LICENSE](https://github.com/hexinfo/dat/blob/main/LICENSE) file.
+
+---
+
+
+
+**🎯 Making Data Queries Simple and Natural**
+
+**⭐ If this project helps you, please give us a Star!**
+
+[🚀 Quick Start](#-quick-start) • [📖 Documentation](https://github.com/hexinfo/dat) • [💬 Join Community](#-community--support) • [🤝 Contribute](#-contribution-guide)
+
+---
+
+*Built with ❤️ by the DAT Community*
+
+
diff --git a/README.md b/README.md
index 38a7e7b..be5b7ed 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,10 @@
---
+> **[🇬🇧 English Version](./README-EN.md)**
+
+---
+
## 🎯 项目愿景
> 我们正在进入生成式人工智能的新时代，**语言是界面，数据是燃料**。
diff --git a/dat-cli/src/main/java/ai/dat/cli/commands/InitCommand.java b/dat-cli/src/main/java/ai/dat/cli/commands/InitCommand.java
index d9a7473..17c393f 100644
--- a/dat-cli/src/main/java/ai/dat/cli/commands/InitCommand.java
+++ b/dat-cli/src/main/java/ai/dat/cli/commands/InitCommand.java
@@ -9,6 +9,7 @@
import ai.dat.core.data.project.EmbeddingConfig;
import ai.dat.core.data.project.EmbeddingStoreConfig;
import ai.dat.core.factories.*;
+import ai.dat.core.utils.DatProjectUtil;
import ai.dat.core.utils.FactoryUtil;
import ai.dat.core.utils.JinjaTemplateUtil;
import lombok.Getter;
@@ -439,11 +440,10 @@ private void createProjectReadmeFile(Path projectPath) throws IOException {
* @throws IOException
*/
private void createProjectYamlFile(Path projectPath) throws IOException {
- DatProjectFactory factory = new DatProjectFactory();
Map<String, Object> variables = new HashMap<>();
variables.put("project", projectConfig);
- variables.put("project_configuration", factory.getProjectConfiguration());
- variables.put("agent_configuration", factory.getDefaultAgentConfiguration());
+ variables.put("project_configuration", DatProjectUtil.getProjectConfiguration());
+ variables.put("agent_configuration", DatProjectUtil.getDefaultAgentConfiguration());
String yamlContent = JinjaTemplateUtil.render(PROJECT_YAML_TEMPLATE_CONTENT, variables);
Path projectYamlPath = projectPath.resolve(PROJECT_CONFIG_FILE_NAME);
Files.write(projectYamlPath, yamlContent.getBytes());
@@ -457,7 +457,7 @@ private void createProjectYamlFile(Path projectPath) throws IOException {
* @throws IOException
*/
private void createProjectYamlTemplateFile(Path projectPath) throws IOException {
- String yamlContent = new DatProjectFactory().yamlTemplate();
+ String yamlContent = DatProjectUtil.yamlTemplate();
Path projectYamlTemplatePath = projectPath.resolve(PROJECT_CONFIG_TEMPLATE_FILE_NAME);
Files.write(projectYamlTemplatePath, yamlContent.getBytes());
log.info("Create " + PROJECT_CONFIG_TEMPLATE_FILE_NAME + " file: {}", projectYamlTemplatePath);
diff --git a/dat-core/src/main/java/ai/dat/core/utils/YamlTemplateUtil.java b/dat-core/src/main/java/ai/dat/core/utils/YamlTemplateUtil.java
index 61e2b19..e3dee78 100644
--- a/dat-core/src/main/java/ai/dat/core/utils/YamlTemplateUtil.java
+++ b/dat-core/src/main/java/ai/dat/core/utils/YamlTemplateUtil.java
@@ -13,6 +13,7 @@
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
+import java.util.Map;
import java.util.Set;
/**
@@ -68,7 +69,7 @@ private static String configurationTemplate(List configs) {
if (!config.isRequired()) {
sb.append("#");
}
- sb.append(" ").append(str).append("\n");
+ sb.append(" ").append(str).append("\n");
}
} else {
sb.append(" ").append(value);
@@ -113,9 +114,15 @@ private static String toValue(Object value) {
}
if (value instanceof Duration val) {
return TimeUtils.formatWithHighestUnit(val);
+ } else if (value instanceof List || value instanceof Map) {
+ try {
+ return YAML_MAPPER.writeValueAsString(value);
+ } catch (JsonProcessingException e) {
+ throw new RuntimeException(e);
+ }
} else {
try {
- return YAML_MAPPER.writeValueAsString(value).trim();
+ return YAML_MAPPER.writeValueAsString(value).stripTrailing();
} catch (JsonProcessingException e) {
throw new RuntimeException(e);
}
@@ -134,10 +141,10 @@ private static String toDescription(boolean required, ConfigOption> configOpti
}
String classSimpleName = configOption.getClazz().getSimpleName();
String prefix = "("
- + (configOption.isList() ? "List<" + classSimpleName + ">" : classSimpleName) + ", "
- + (required ? "[Required]" : "[Optional]")
- + defaultValueDescription
- + ")";
+ + (configOption.isList() ? "List<" + classSimpleName + ">" : classSimpleName) + ", "
+ + (required ? "[Required]" : "[Optional]")
+ + defaultValueDescription
+ + ")";
if (description.contains("\n")) {
return prefix + "\n\n" + description;
}
diff --git a/dat-sdk/src/main/java/ai/dat/boot/PreBuildValidator.java b/dat-sdk/src/main/java/ai/dat/boot/PreBuildValidator.java
index b0f4507..617f930 100644
--- a/dat-sdk/src/main/java/ai/dat/boot/PreBuildValidator.java
+++ b/dat-sdk/src/main/java/ai/dat/boot/PreBuildValidator.java
@@ -9,10 +9,10 @@
import ai.dat.core.configuration.ReadableConfig;
import ai.dat.core.data.project.DatProject;
import ai.dat.core.exception.ValidationException;
-import ai.dat.core.factories.DatProjectFactory;
import ai.dat.core.semantic.data.Dimension;
import ai.dat.core.semantic.data.Element;
import ai.dat.core.semantic.data.SemanticModel;
+import ai.dat.core.utils.DatProjectUtil;
import ai.dat.core.utils.FactoryUtil;
import ai.dat.core.utils.JinjaTemplateUtil;
import ai.dat.core.utils.SemanticModelUtil;
@@ -29,7 +29,7 @@
import java.util.stream.Collectors;
import java.util.stream.Stream;
-import static ai.dat.core.factories.DatProjectFactory.*;
+import static ai.dat.core.utils.DatProjectUtil.*;
/**
* @Author JunjieM
@@ -53,9 +53,8 @@ public PreBuildValidator(@NonNull DatProject project, @NonNull Path projectPath,
public void validate() {
ReadableConfig config = project.getConfiguration();
- DatProjectFactory factory = new DatProjectFactory();
- Set<ConfigOption<?>> requiredOptions = factory.projectRequiredOptions();
- Set<ConfigOption<?>> optionalOptions = factory.projectOptionalOptions();
+ Set<ConfigOption<?>> requiredOptions = DatProjectUtil.projectRequiredOptions();
+ Set<ConfigOption<?>> optionalOptions = DatProjectUtil.projectOptionalOptions();
FactoryUtil.validateFactoryOptions(requiredOptions, optionalOptions, config);
Map> semanticModels = ChangeSemanticModelsCacheUtil.get(project.getName())
@@ -106,7 +105,7 @@ private void validateModelSqls(@NonNull Map> semanti
StringBuffer sb = new StringBuffer();
validations.forEach((relativePath, validationMessages) -> {
sb.append("There has exceptions in the model SQL syntax validation of the semantic model, " +
- "in the YAML file relative path: ").append(relativePath).append("\n");
+ "in the YAML file relative path: ").append(relativePath).append("\n");
validationMessages.forEach(m -> sb.append(" - ").append(m.semanticModelName)
.append(": ").append(m.exception.getMessage()).append("\n"));
sb.append("\n");
@@ -143,7 +142,7 @@ private void validateSemanticModelSqls(@NonNull Map>
StringBuffer sb = new StringBuffer();
validations.forEach((relativePath, validationMessages) -> {
sb.append("There has exceptions in the semantic model SQL syntax validation of the semantic model, " +
- "in the YAML file relative path: ").append(relativePath).append("\n");
+ "in the YAML file relative path: ").append(relativePath).append("\n");
validationMessages.forEach(m -> sb.append(" - ").append(m.semanticModelName)
.append(": ").append(m.exception.getMessage()).append("\n"));
sb.append("\n");
@@ -189,7 +188,7 @@ private void validateDimensionsEnumValues(@NonNull Map {
sb.append("There has exceptions in the dimension enum values validation of the semantic model, " +
- "in the YAML file relative path: ").append(relativePath).append("\n");
+ "in the YAML file relative path: ").append(relativePath).append("\n");
validationMessages.forEach(m -> sb.append(" - ").append(m.semanticModelName)
.append(": ").append(m.exception.getMessage()).append("\n"));
sb.append("\n");
@@ -228,11 +227,11 @@ private ValidationMessage validateDimensionEnumValues(@NonNull DatabaseAdapter d
try {
if (dimensionDistinctCount(d, databaseAdapter, semanticModelSql) > 1000) {
return "Dimension '" + d.getName()
- + "' -> The number of COUNT DISTINCT in this dimension field " +
- "in the database exceeds 1000, and not recommended to set enum values";
+ + "' -> The number of COUNT DISTINCT in this dimension field " +
+ "in the database exceeds 1000, and not recommended to set enum values";
}
String sql = "SELECT DISTINCT " + d.getName()
- + " FROM (" + semanticModelSql + ") AS __dat_semantic_model";
+ + " FROM (" + semanticModelSql + ") AS __dat_semantic_model";
Set<String> values = databaseAdapter.executeQuery(sql).stream()
.map(map -> map.entrySet().iterator().next().getValue())
.filter(Objects::nonNull).map(Object::toString).collect(Collectors.toSet());
@@ -240,9 +239,9 @@ private ValidationMessage validateDimensionEnumValues(@NonNull DatabaseAdapter d
return null;
}
return "Dimension '" + d.getName()
- + "' -> Enum values contain values that do not exist in the database. " +
- "\n \t\tvalues: [" + String.join(", ", values) + "], " +
- "\n \t\tenum_values: [" + String.join(", ", enumValues) + "]";
+ + "' -> Enum values contain values that do not exist in the database. " +
+ "\n \t\tvalues: [" + String.join(", ", values) + "], " +
+ "\n \t\tenum_values: [" + String.join(", ", enumValues) + "]";
} catch (SQLException e) {
return "Dimension '" + d.getName() + "' -> " + e.getMessage();
}
@@ -259,14 +258,14 @@ private long dimensionDistinctCount(Dimension dimension,
DatabaseAdapter databaseAdapter,
String semanticModelSql) throws SQLException {
String sql = "SELECT COUNT(DISTINCT " + dimension.getName() + ") AS distinct_count"
- + " FROM (" + semanticModelSql + ") AS __dat_semantic_model";
+ + " FROM (" + semanticModelSql + ") AS __dat_semantic_model";
Object value = databaseAdapter.executeQuery(sql).get(0)
.entrySet().iterator().next().getValue();
if (value instanceof Number number) {
return number.longValue();
} else {
throw new ValidationException("The type " + value.getClass().getSimpleName()
- + " cannot be converted to a numeric type");
+ + " cannot be converted to a numeric type");
}
}
@@ -286,7 +285,7 @@ private void validateDataTypes(@NonNull Map> semanti
StringBuffer sb = new StringBuffer();
validations.forEach((relativePath, validationMessages) -> {
sb.append("There has exceptions in the data types validation of the semantic model, " +
- "in the YAML file relative path: ").append(relativePath).append("\n");
+ "in the YAML file relative path: ").append(relativePath).append("\n");
validationMessages.forEach(m -> sb.append(" - ").append(m.semanticModelName)
.append(": ").append(m.exception.getMessage()).append("\n"));
sb.append("\n");
@@ -354,7 +353,7 @@ private void autoCompleteDataTypes(@NonNull Map> sem
StringBuffer sb = new StringBuffer();
validations.forEach((relativePath, validationMessages) -> {
sb.append("There has exceptions in the data types validation of the semantic model, " +
- "in the YAML file relative path: ").append(relativePath).append("\n");
+ "in the YAML file relative path: ").append(relativePath).append("\n");
validationMessages.forEach(m -> sb.append(" - ").append(m.semanticModelName)
.append(": ").append(m.exception.getMessage()).append("\n"));
sb.append("\n");
diff --git a/dat-sdk/src/main/java/ai/dat/boot/utils/ProjectUtil.java b/dat-sdk/src/main/java/ai/dat/boot/utils/ProjectUtil.java
index 679be3d..dcd03c6 100644
--- a/dat-sdk/src/main/java/ai/dat/boot/utils/ProjectUtil.java
+++ b/dat-sdk/src/main/java/ai/dat/boot/utils/ProjectUtil.java
@@ -64,8 +64,7 @@ public static String contentStoreFingerprint(@NonNull Path projectPath) {
}
public static String contentStoreFingerprint(@NonNull DatProject project) {
- DatProjectFactory projectFactory = new DatProjectFactory();
- Map<String, String> projectFingerprintConfigs = projectFactory
+ Map<String, String> projectFingerprintConfigs = DatProjectUtil
.projectFingerprintConfigs(project.getConfiguration());
EmbeddingConfig embedding = project.getEmbedding();
EmbeddingStoreConfig embeddingStore = project.getEmbeddingStore();
@@ -84,11 +83,11 @@ public static String contentStoreFingerprint(@NonNull DatProject project) {
.fingerprintConfigs(contentStore.getConfiguration());
try {
String configStr = String.format("project:name=%s;" +
- "project:configuration=%s;" +
- "embedding:provider=%s;" +
- "embedding:configuration=%s;" +
- "embeddingStore:provider=%s;" +
- "embeddingStore:configuration=%s;",
+ "project:configuration=%s;" +
+ "embedding:provider=%s;" +
+ "embedding:configuration=%s;" +
+ "embeddingStore:provider=%s;" +
+ "embeddingStore:configuration=%s;",
project.getName(),
JSON_MAPPER.writeValueAsString(projectFingerprintConfigs),
embedding.getProvider(),
@@ -99,7 +98,7 @@ public static String contentStoreFingerprint(@NonNull DatProject project) {
// For backward compatibility
if (!contentStoreFingerprintConfigs.isEmpty()) {
configStr += String.format("contentStore:provider=%s;" +
- "contentStore:configuration=%s;",
+ "contentStore:configuration=%s;",
contentStore.getProvider(),
JSON_MAPPER.writeValueAsString(contentStoreFingerprintConfigs)
);
@@ -140,14 +139,14 @@ public static ContentStore createContentStore(@NonNull DatProject project, @NonN
private static void adjustEmbeddingStoreConfig(@NonNull DatProject project, @NonNull Path projectPath) {
EmbeddingStoreConfig embeddingStore = project.getEmbeddingStore();
if (EmbeddingStoreConfig.DUCKDB_PROVIDER.equals(embeddingStore.getProvider())
- && embeddingStore.getConfiguration().getOptional(EmbeddingStoreConfig.DUCKDB_FILE_PATH).isEmpty()) {
+ && embeddingStore.getConfiguration().getOptional(EmbeddingStoreConfig.DUCKDB_FILE_PATH).isEmpty()) {
Path datDirPath = projectPath.resolve(DAT_DIR_NAME);
if (!Files.exists(datDirPath)) {
try {
Files.createDirectories(datDirPath);
} catch (IOException e) {
- throw new RuntimeException(
- "The creation of the .dat directory under the project root directory failed", e);
+ throw new RuntimeException("The creation of the " + DAT_DIR_NAME
+ + " directory under the project root directory failed", e);
}
}
String storeFileName = DUCKDB_EMBEDDING_STORE_FILE_PREFIX + contentStoreFingerprint(project);
@@ -195,7 +194,7 @@ public static AskdataAgent createAskdataAgent(@NonNull DatProject project,
validateAgent(agentConfig, allSemanticModels);
semanticModels = allSemanticModels.stream()
.filter(model -> semanticModelNames.contains(model.getName())
- || model.getTags().stream().anyMatch(semanticModelTags::contains))
+ || model.getTags().stream().anyMatch(semanticModelTags::contains))
.collect(Collectors.toList());
}
@@ -282,7 +281,7 @@ public static DatabaseAdapter createDatabaseAdapter(@NonNull DatProject project,
private static void adjustDatabaseConfig(@NonNull DatProject project, @NonNull Path projectPath) {
DatabaseConfig databaseConfig = project.getDb();
if (DatabaseConfig.DUCKDB_PROVIDER.equals(databaseConfig.getProvider())
- && databaseConfig.getConfiguration().getOptional(DatabaseConfig.DUCKDB_FILE_PATH).isEmpty()) {
+ && databaseConfig.getConfiguration().getOptional(DatabaseConfig.DUCKDB_FILE_PATH).isEmpty()) {
Path datDirPath = projectPath.resolve(DAT_DIR_NAME);
if (!Files.exists(datDirPath)) {
try {
@@ -350,13 +349,13 @@ private static void validateAgent(@NonNull AgentConfig agentConfig,
String message = Stream.of(
!missingNames.isEmpty() ?
String.format("There are non-existent semantic model names %s in the agent '%s'. " +
- "Please check the semantic models YAML in your project!",
+ "Please check the semantic models YAML in your project!",
missingNames.stream().map(n -> String.format("'%s'", n)).collect(joining(", ")),
agentConfig.getName())
: null,
!missingTags.isEmpty() ?
String.format("There are non-existent semantic model tags %s in the agent '%s'. " +
- "Please check the semantic models YAML in your project!",
+ "Please check the semantic models YAML in your project!",
missingTags.stream().map(n -> String.format("'%s'", n)).collect(joining(", ")),
agentConfig.getName())
: null
@@ -369,15 +368,15 @@ public static DatProject loadProject(@NonNull Path projectPath) {
Path filePath = findProjectConfigFile(projectPath);
if (filePath == null) {
throw new RuntimeException("The project configuration file not found "
- + PROJECT_CONFIG_FILE_NAME_YAML + " or " + PROJECT_CONFIG_FILE_NAME_YML
- + ", please ensure that the project configuration file exists in the project root directory.");
+ + PROJECT_CONFIG_FILE_NAME_YAML + " or " + PROJECT_CONFIG_FILE_NAME_YML
+ + ", please ensure that the project configuration file exists in the project root directory.");
}
try {
String yamlContent = Files.readString(filePath);
return DatProjectUtil.datProject(yamlContent);
} catch (Exception e) {
throw new RuntimeException("The " + projectPath.relativize(filePath)
- + " YAML file content does not meet the requirements: \n" + e.getMessage(), e);
+ + " YAML file content does not meet the requirements: \n" + e.getMessage(), e);
}
}
@@ -405,7 +404,7 @@ public static DatSchema loadSchema(@NonNull Path filePath, @NonNull Path dirPath
return DatSchemaUtil.datSchema(content);
} catch (Exception e) {
throw new RuntimeException("The " + dirPath.relativize(filePath)
- + " YAML file content does not meet the requirements: \n" + e.getMessage(), e);
+ + " YAML file content does not meet the requirements: \n" + e.getMessage(), e);
}
}
@@ -474,7 +473,7 @@ public static DatModel loadModel(@NonNull Path filePath, @NonNull Path modelsPat
return DatModel.from(name, content);
} catch (Exception e) {
throw new RuntimeException("The " + modelsPath.relativize(filePath)
- + " SQL file content does not meet the requirements: \n" + e.getMessage(), e);
+ + " SQL file content does not meet the requirements: \n" + e.getMessage(), e);
}
}
@@ -515,7 +514,7 @@ private static DatSeed loadSeed(@NonNull Path filePath, @NonNull Path seedsPath)
return DatSeed.from(name, content);
} catch (Exception e) {
throw new RuntimeException("The " + seedsPath.relativize(filePath)
- + " CSV file content does not meet the requirements: \n" + e.getMessage(), e);
+ + " CSV file content does not meet the requirements: \n" + e.getMessage(), e);
}
}
@@ -558,7 +557,8 @@ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
}
});
} catch (IOException e) {
- throw new RuntimeException("The scan for the YAML file in the 'models' directory failed", e);
+ throw new RuntimeException("The scan for the YAML file in the '"
+ + MODELS_DIR_NAME + "' directory failed", e);
}
return files;
}
@@ -570,7 +570,7 @@ private static boolean isYamlFile(@NonNull String fileName) {
public static List<Path> scanSqlFiles(@NonNull Path modelsPath) {
List<Path> files = new ArrayList<>();
Preconditions.checkArgument(Files.exists(modelsPath),
- "There is no 'models' directory in the project root directory");
+ "There is no '" + MODELS_DIR_NAME + "' directory in the project root directory");
try {
Files.walkFileTree(modelsPath, new SimpleFileVisitor<>() {
@Override
@@ -583,7 +583,8 @@ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
}
});
} catch (IOException e) {
- throw new RuntimeException("The scan for the SQL file in the 'models' directory failed", e);
+ throw new RuntimeException("The scan for the SQL file in the '"
+ + MODELS_DIR_NAME + "' directory failed", e);
}
return files;
}
@@ -595,7 +596,7 @@ private static boolean isSqlFile(@NonNull String fileName) {
private static List<Path> scanCsvFiles(@NonNull Path seedsPath) {
List<Path> files = new ArrayList<>();
Preconditions.checkArgument(Files.exists(seedsPath),
- "There is no 'seeds' directory in the project root directory");
+ "There is no '" + SEEDS_DIR_NAME + "' directory in the project root directory");
try {
Files.walkFileTree(seedsPath, new SimpleFileVisitor<>() {
@Override
@@ -608,7 +609,7 @@ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
}
});
} catch (IOException e) {
- throw new RuntimeException("The scan for the CSV file in the 'seeds' directory failed", e);
+ throw new RuntimeException("The scan for the CSV file in the '" + SEEDS_DIR_NAME + "' directory failed", e);
}
return files;
}
diff --git a/dat-sdk/src/main/java/ai/dat/core/factories/DatProjectFactory.java b/dat-sdk/src/main/java/ai/dat/core/factories/DatProjectFactory.java
deleted file mode 100644
index 22d96f8..0000000
--- a/dat-sdk/src/main/java/ai/dat/core/factories/DatProjectFactory.java
+++ /dev/null
@@ -1,210 +0,0 @@
-package ai.dat.core.factories;
-
-import ai.dat.core.configuration.ConfigOption;
-import ai.dat.core.configuration.ConfigOptions;
-import ai.dat.core.configuration.ReadableConfig;
-import ai.dat.core.data.project.*;
-import ai.dat.core.exception.ValidationException;
-import ai.dat.core.utils.DatProjectUtil;
-import ai.dat.core.utils.JinjaTemplateUtil;
-import ai.dat.core.utils.YamlTemplateUtil;
-import com.fasterxml.jackson.dataformat.yaml.YAMLMapper;
-import com.networknt.schema.Error;
-import lombok.Getter;
-import lombok.NonNull;
-
-import java.io.IOException;
-import java.io.InputStream;
-import java.nio.charset.StandardCharsets;
-import java.util.*;
-import java.util.concurrent.atomic.AtomicInteger;
-import java.util.stream.Collectors;
-
-/**
- * @Author JunjieM
- * @Date 2025/8/7
- */
-public class DatProjectFactory {
-
- private static final YAMLMapper YAML_MAPPER = new YAMLMapper();
-
- private static final String LLM_NAME_PREFIX = "llm_";
- private static final String AGENT_NAME_PREFIX = "agent_";
-
- private static final String PROJECT_YAML_TEMPLATE;
-
- static {
- PROJECT_YAML_TEMPLATE = loadText("templates/project_yaml_template.jinja");
- }
-
- private static String loadText(String fromResource) {
- try (InputStream inputStream = DatProjectFactory.class.getClassLoader()
- .getResourceAsStream(fromResource)) {
- return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
- } catch (IOException e) {
- throw new RuntimeException("Failed to load text from resources: " + fromResource, e);
- }
- }
-
- public static final ConfigOption<Boolean> BUILDING_VERIFY_MDL_DIMENSIONS_ENUM_VALUES =
- ConfigOptions.key("building.verify-mdl-dimensions-enum-values")
- .booleanType()
- .defaultValue(true)
- .withDescription("Whether to verify the enumeration values of dimensions " +
- "in the semantic model during building");
-
- public static final ConfigOption<Boolean> BUILDING_VERIFY_MDL_DATA_TYPES =
- ConfigOptions.key("building.verify-mdl-data-types")
- .booleanType()
- .defaultValue(true)
- .withDescription("Whether to verify the data types of " +
- "entities, dimensions, measures in the semantic model during building");
-
- public static final ConfigOption<Boolean> BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES =
- ConfigOptions.key("building.auto-complete-mdl-data-types")
- .booleanType()
- .defaultValue(true)
- .withDescription("Whether to automatically complete the data types of " +
- "entities, dimensions, measures in the semantic model during building");
-
- public Set<ConfigOption<?>> projectRequiredOptions() {
- return Collections.emptySet();
- }
-
- public Set<ConfigOption<?>> projectOptionalOptions() {
- return new LinkedHashSet<>(List.of(
- BUILDING_VERIFY_MDL_DIMENSIONS_ENUM_VALUES,
- BUILDING_VERIFY_MDL_DATA_TYPES,
- BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES
- ));
- }
-
- public static Set<ConfigOption<?>> fingerprintOptions() {
- return Set.of(BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES);
- }
-
- public Map<String, String> projectFingerprintConfigs(@NonNull ReadableConfig config) {
- List<String> keys = fingerprintOptions().stream()
- .map(ConfigOption::key)
- .toList();
- return config.toMap().entrySet().stream()
- .filter(e -> keys.contains(e.getKey()))
- .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
- }
-
- public DatProject create(@NonNull String yamlContent) throws IOException {
- List<Error> errors = DatProjectUtil.validate(yamlContent);
- if (!errors.isEmpty()) {
- throw new ValidationException("The YAML verification not pass: \n" + errors);
- }
- return YAML_MAPPER.readValue(yamlContent, DatProject.class);
- }
-
- public String yamlTemplate() {
- List<SingleItemTemplate> dbs = DatabaseAdapterFactoryManager.getSupports().stream()
- .map(identifier -> {
- DatabaseAdapterFactory factory = DatabaseAdapterFactoryManager.getFactory(identifier);
- boolean display = DatabaseConfig.DEFAULT_PROVIDER.equals(identifier);
- return new SingleItemTemplate(identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- AtomicInteger llmNameAtomic = new AtomicInteger(1);
- List<MultipleItemTemplate> llms = ChatModelFactoryManager.getSupports().stream()
- .map(identifier -> {
- ChatModelFactory factory = ChatModelFactoryManager.getFactory(identifier);
- boolean display = LlmConfig.DEFAULT_PROVIDER.equals(identifier);
- String name = display ? LlmConfig.DEFAULT_NAME : LLM_NAME_PREFIX + (llmNameAtomic.getAndIncrement());
- return new MultipleItemTemplate(name, identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- List<SingleItemTemplate> embeddings = EmbeddingModelFactoryManager.getSupports().stream()
- .map(identifier -> {
- EmbeddingModelFactory factory = EmbeddingModelFactoryManager.getFactory(identifier);
- boolean display = EmbeddingConfig.DEFAULT_PROVIDER.equals(identifier);
- return new SingleItemTemplate(identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- List<SingleItemTemplate> embeddingStores = EmbeddingStoreFactoryManager.getSupports().stream()
- .map(identifier -> {
- EmbeddingStoreFactory factory = EmbeddingStoreFactoryManager.getFactory(identifier);
- boolean display = EmbeddingStoreConfig.DEFAULT_PROVIDER.equals(identifier);
- return new SingleItemTemplate(identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- List<SingleItemTemplate> rerankings = ScoringModelFactoryManager.getSupports().stream()
- .map(identifier -> {
- ScoringModelFactory factory = ScoringModelFactoryManager.getFactory(identifier);
- boolean display = RerankingConfig.DEFAULT_PROVIDER.equals(identifier);
- return new SingleItemTemplate(identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- List<SingleItemTemplate> contentStores = ContentStoreFactoryManager.getSupports().stream()
- .map(identifier -> {
- ContentStoreFactory factory = ContentStoreFactoryManager.getFactory(identifier);
- boolean display = ContentStoreConfig.DEFAULT_PROVIDER.equals(identifier);
- return new SingleItemTemplate(identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- AtomicInteger agentNameAtomic = new AtomicInteger(1);
- List<MultipleItemContainCommentTemplate> agents = AskdataAgentFactoryManager.getSupports().stream()
- .map(identifier -> {
- AskdataAgentFactory factory = AskdataAgentFactoryManager.getFactory(identifier);
- boolean display = AgentConfig.DEFAULT_PROVIDER.equals(identifier);
- String name = display ? AgentConfig.DEFAULT_NAME : AGENT_NAME_PREFIX + (agentNameAtomic.getAndIncrement());
- return new MultipleItemContainCommentTemplate(factory.factoryDescription(), name,
- identifier, display, getConfiguration(factory));
- })
- .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
- .collect(Collectors.toList());
-
- Map<String, Object> variables = new HashMap<>();
- variables.put("project_configuration", getProjectConfiguration());
- variables.put("dbs", dbs);
- variables.put("llms", llms);
- variables.put("embeddings", embeddings);
- variables.put("rerankings", rerankings);
- variables.put("embedding_stores", embeddingStores);
- variables.put("content_stores", contentStores);
- variables.put("agents", agents);
-
- return JinjaTemplateUtil.render(PROJECT_YAML_TEMPLATE, variables);
- }
-
- public String getProjectConfiguration() {
- return YamlTemplateUtil.getConfiguration(projectRequiredOptions(), projectOptionalOptions());
- }
-
- public String getDefaultAgentConfiguration() {
- return getConfiguration(new DefaultAskdataAgentFactory());
- }
-
- private String getConfiguration(Factory factory) {
- return YamlTemplateUtil.getConfiguration(factory);
- }
-
-
- private record SingleItemTemplate(@Getter String provider, @Getter boolean display,
- @Getter String configuration) {
- }
-
- private record MultipleItemTemplate(@Getter String name, @Getter String provider, @Getter boolean display,
- @Getter String configuration) {
- }
-
- private record MultipleItemContainCommentTemplate(@Getter String comment, @Getter String name,
- @Getter String provider, @Getter boolean display,
- @Getter String configuration) {
- }
-}
diff --git a/dat-sdk/src/main/java/ai/dat/core/utils/DatProjectUtil.java b/dat-sdk/src/main/java/ai/dat/core/utils/DatProjectUtil.java
index 5d4566d..875f805 100644
--- a/dat-sdk/src/main/java/ai/dat/core/utils/DatProjectUtil.java
+++ b/dat-sdk/src/main/java/ai/dat/core/utils/DatProjectUtil.java
@@ -1,20 +1,26 @@
package ai.dat.core.utils;
-import ai.dat.core.data.project.DatProject;
-import ai.dat.core.factories.DatProjectFactory;
+import ai.dat.core.configuration.ConfigOption;
+import ai.dat.core.configuration.ConfigOptions;
+import ai.dat.core.configuration.ReadableConfig;
+import ai.dat.core.data.project.*;
+import ai.dat.core.exception.ValidationException;
+import ai.dat.core.factories.*;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLMapper;
import com.google.common.base.Preconditions;
-import com.networknt.schema.*;
import com.networknt.schema.Error;
-import com.networknt.schema.dialect.Dialects;
+import com.networknt.schema.*;
+import lombok.Getter;
import lombok.NonNull;
import java.io.IOException;
import java.io.InputStream;
-import java.util.List;
-import java.util.Locale;
+import java.nio.charset.StandardCharsets;
+import java.util.*;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.stream.Collectors;
/**
 * DAT project configuration utility class
@@ -33,8 +39,13 @@ public class DatProjectUtil {
.withDefaultDialect(SpecificationVersion.DRAFT_2020_12,
builder -> builder.schemaRegistryConfig(SCHEMA_CONFIG));
private static final String SCHEMA_PATH = "schemas/project_schema.json";
+ private static final String TEXT_PATH = "templates/project_yaml_template.jinja";
+
+ private static final String LLM_NAME_PREFIX = "llm_";
+ private static final String AGENT_NAME_PREFIX = "agent_";
private static final Schema SCHEMA;
+ private static final String TEMPLATE;
static {
try {
@@ -42,6 +53,7 @@ public class DatProjectUtil {
} catch (IOException e) {
throw new ExceptionInInitializerError("Failed to load project schema file: " + e.getMessage());
}
+ TEMPLATE = loadText();
}
private static Schema loadProjectSchema() throws IOException {
@@ -54,11 +66,40 @@ private static Schema loadProjectSchema() throws IOException {
return SCHEMA_REGISTRY.getSchema(schemaNode);
} catch (IOException e) {
throw new IOException("Failed to parse project schema file: " + SCHEMA_PATH
- + " - " + e.getMessage(), e);
+ + " - " + e.getMessage(), e);
}
}
}
+ private static String loadText() {
+ try (InputStream inputStream = DatProjectUtil.class.getClassLoader().getResourceAsStream(TEXT_PATH)) {
+ return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
+ } catch (IOException e) {
+ throw new RuntimeException("Failed to load text from resources: " + TEXT_PATH, e);
+ }
+ }
+
+ public static final ConfigOption<Boolean> BUILDING_VERIFY_MDL_DIMENSIONS_ENUM_VALUES =
+ ConfigOptions.key("building.verify-mdl-dimensions-enum-values")
+ .booleanType()
+ .defaultValue(true)
+ .withDescription("Whether to verify the enumeration values of dimensions " +
+ "in the semantic model during building");
+
+ public static final ConfigOption<Boolean> BUILDING_VERIFY_MDL_DATA_TYPES =
+ ConfigOptions.key("building.verify-mdl-data-types")
+ .booleanType()
+ .defaultValue(true)
+ .withDescription("Whether to verify the data types of " +
+ "entities, dimensions, measures in the semantic model during building");
+
+ public static final ConfigOption<Boolean> BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES =
+ ConfigOptions.key("building.auto-complete-mdl-data-types")
+ .booleanType()
+ .defaultValue(true)
+ .withDescription("Whether to automatically complete the data types of " +
+ "entities, dimensions, measures in the semantic model during building");
+
private DatProjectUtil() {
}
@@ -73,6 +114,142 @@ public static List validate(@NonNull String yamlContent) throws IOExcepti
}
public static DatProject datProject(@NonNull String yamlContent) throws IOException {
- return new DatProjectFactory().create(yamlContent);
+ List<Error> errors = DatProjectUtil.validate(yamlContent);
+ if (!errors.isEmpty()) {
+ throw new ValidationException("The YAML validation did not pass: \n" + errors);
+ }
+ return YAML_MAPPER.readValue(yamlContent, DatProject.class);
+ }
+
+ public static Set<ConfigOption<?>> projectRequiredOptions() {
+ return Collections.emptySet();
+ }
+
+ public static Set<ConfigOption<?>> projectOptionalOptions() {
+ return new LinkedHashSet<>(List.of(
+ BUILDING_VERIFY_MDL_DIMENSIONS_ENUM_VALUES,
+ BUILDING_VERIFY_MDL_DATA_TYPES,
+ BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES
+ ));
+ }
+
+ public static Set<ConfigOption<?>> fingerprintOptions() {
+ return Set.of(BUILDING_AUTO_COMPLETE_MDL_DATA_TYPES);
+ }
+
+ public static Map<String, String> projectFingerprintConfigs(@NonNull ReadableConfig config) {
+ List<String> keys = fingerprintOptions().stream()
+ .map(ConfigOption::key)
+ .toList();
+ return config.toMap().entrySet().stream()
+ .filter(e -> keys.contains(e.getKey()))
+ .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+ }
+
+ public static String yamlTemplate() {
+ List<SingleItemTemplate> dbs = DatabaseAdapterFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ DatabaseAdapterFactory factory = DatabaseAdapterFactoryManager.getFactory(identifier);
+ boolean display = DatabaseConfig.DEFAULT_PROVIDER.equals(identifier);
+ return new SingleItemTemplate(identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ AtomicInteger llmNameAtomic = new AtomicInteger(1);
+ List<MultipleItemTemplate> llms = ChatModelFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ ChatModelFactory factory = ChatModelFactoryManager.getFactory(identifier);
+ boolean display = LlmConfig.DEFAULT_PROVIDER.equals(identifier);
+ String name = display ? LlmConfig.DEFAULT_NAME : LLM_NAME_PREFIX + (llmNameAtomic.getAndIncrement());
+ return new MultipleItemTemplate(name, identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ List<SingleItemTemplate> embeddings = EmbeddingModelFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ EmbeddingModelFactory factory = EmbeddingModelFactoryManager.getFactory(identifier);
+ boolean display = EmbeddingConfig.DEFAULT_PROVIDER.equals(identifier);
+ return new SingleItemTemplate(identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ List<SingleItemTemplate> embeddingStores = EmbeddingStoreFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ EmbeddingStoreFactory factory = EmbeddingStoreFactoryManager.getFactory(identifier);
+ boolean display = EmbeddingStoreConfig.DEFAULT_PROVIDER.equals(identifier);
+ return new SingleItemTemplate(identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ List<SingleItemTemplate> rerankings = ScoringModelFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ ScoringModelFactory factory = ScoringModelFactoryManager.getFactory(identifier);
+ boolean display = RerankingConfig.DEFAULT_PROVIDER.equals(identifier);
+ return new SingleItemTemplate(identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ List<SingleItemTemplate> contentStores = ContentStoreFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ ContentStoreFactory factory = ContentStoreFactoryManager.getFactory(identifier);
+ boolean display = ContentStoreConfig.DEFAULT_PROVIDER.equals(identifier);
+ return new SingleItemTemplate(identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ AtomicInteger agentNameAtomic = new AtomicInteger(1);
+ List<MultipleItemContainCommentTemplate> agents = AskdataAgentFactoryManager.getSupports().stream()
+ .map(identifier -> {
+ AskdataAgentFactory factory = AskdataAgentFactoryManager.getFactory(identifier);
+ boolean display = AgentConfig.DEFAULT_PROVIDER.equals(identifier);
+ String name = display ? AgentConfig.DEFAULT_NAME : AGENT_NAME_PREFIX + (agentNameAtomic.getAndIncrement());
+ return new MultipleItemContainCommentTemplate(factory.factoryDescription(), name,
+ identifier, display, getConfiguration(factory));
+ })
+ .sorted((o1, o2) -> Boolean.compare(o2.display, o1.display))
+ .collect(Collectors.toList());
+
+ Map<String, Object> variables = new HashMap<>();
+ variables.put("project_configuration", getProjectConfiguration());
+ variables.put("dbs", dbs);
+ variables.put("llms", llms);
+ variables.put("embeddings", embeddings);
+ variables.put("rerankings", rerankings);
+ variables.put("embedding_stores", embeddingStores);
+ variables.put("content_stores", contentStores);
+ variables.put("agents", agents);
+
+ return JinjaTemplateUtil.render(TEMPLATE, variables);
+ }
+
+ public static String getProjectConfiguration() {
+ return YamlTemplateUtil.getConfiguration(projectRequiredOptions(), projectOptionalOptions());
+ }
+
+ public static String getDefaultAgentConfiguration() {
+ return getConfiguration(new DefaultAskdataAgentFactory());
+ }
+
+ private static String getConfiguration(Factory factory) {
+ return YamlTemplateUtil.getConfiguration(factory);
+ }
+
+ private record SingleItemTemplate(@Getter String provider, @Getter boolean display,
+ @Getter String configuration) {
+ }
+
+ private record MultipleItemTemplate(@Getter String name, @Getter String provider, @Getter boolean display,
+ @Getter String configuration) {
+ }
+
+ private record MultipleItemContainCommentTemplate(@Getter String comment, @Getter String name,
+ @Getter String provider, @Getter boolean display,
+ @Getter String configuration) {
}
}
\ No newline at end of file