
feat: add support reranking using LLM #147

Merged
junjiem merged 4 commits into main from feat/reranking_using_llm
Nov 20, 2025
Conversation

@junjiem
Collaborator

@junjiem junjiem commented Nov 20, 2025

Summary by CodeRabbit

  • New Features

    • LLM-based reranking enabled with configurable model selection for result scoring.
    • Results can be reranked by the chosen LLM when reranking mode is active.
  • Bug Fixes / Behavior

    • Validation updated so reranking can use an LLM without requiring a pre-existing scoring model; otherwise a scoring model is required.
  • Documentation

    • Added default system and user prompts for LLM relevance scoring.


@coderabbitai
Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

Adds end-to-end LLM-based reranking: two new config options (USE_LLM_RERANKING, RERANKING_LLM), a new LlmScoringModel that uses a ChatModel for scoring, prompt templates for system/user scoring prompts, and factory logic to instantiate and validate LLM-based scoring when reranking is enabled.

Changes

Cohort / File(s) Summary
Factory Configuration
dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java
Added USE_LLM_RERANKING and RERANKING_LLM public ConfigOptions; imported LlmScoringModel and ChatModel; included new options in optional options and fingerprinting; updated create(...) to resolve ChatModel and instantiate LlmScoringModel when reranking is enabled; adjusted validateConfigOptions(...) to allow reranking with either an existing scoringModel or LLM-based reranking and to enforce constraints when LLM reranking is disabled.
Scoring Model
dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java
New public class implementing ScoringModel; a builder-based constructor accepts a ChatModel; uses AiServices to create an Assistant configured with the system/user prompt templates; scoreAll(...) queries the assistant for each segment and returns normalized Double scores wrapped in a Response.
Prompt Resources
dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt, dat-core/src/main/resources/prompts/default/scoring_user_prompt_template.txt
Added system prompt defining a strict 0–10 integer scoring scheme and relevance ranges; added user prompt template with Query and Document placeholders for per-segment evaluation.
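The scoring flow described above — ask the assistant for a 0–10 integer score per segment, then normalize to a Double — can be sketched in plain Java without the langchain4j dependency. The BiFunction below is only a stand-in for the AiServices-generated assistant, and all names are illustrative, not the PR's actual API:

```java
import java.util.List;
import java.util.function.BiFunction;

public class ScoreNormalizationSketch {

    // Mirrors scoreAll's per-segment logic: ask the "assistant" for a 0-10
    // integer score, fall back to 0 when it returns null, then widen to Double.
    static List<Double> scoreAll(List<String> segments, String query,
                                 BiFunction<String, String, Integer> assistant) {
        return segments.stream()
                .map(segment -> {
                    Integer score = assistant.apply(query, segment);
                    return score != null ? score.doubleValue() : 0.0;
                })
                .toList();
    }

    public static void main(String[] args) {
        // Stand-in "LLM": empty documents get no score, everything else a 7.
        BiFunction<String, String, Integer> assistant =
                (query, doc) -> doc.isEmpty() ? null : 7;
        System.out.println(scoreAll(List.of("doc a", ""), "some query", assistant));
        // Prints: [7.0, 0.0]
    }
}
```

The null fallback here is the same defensive step the review below asks for, since an LLM is not guaranteed to return a parseable integer.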

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Factory as DefaultContentStoreFactory
    participant LlmScorer as LlmScoringModel
    participant LLM as ChatModel/LLM

    Client->>Factory: create(config)
    Factory->>Factory: validateConfigOptions()
    alt RERANK_MODE && USE_LLM_RERANKING
        Factory->>Factory: resolve ChatModel (RERANKING_LLM or default)
        Factory->>LlmScorer: new LlmScoringModel(chatModel)
        Factory->>Factory: set scoringModel = LlmScorer
    else RERANK_MODE && !USE_LLM_RERANKING
        Factory->>Factory: require existing scoringModel
    end
    Factory->>Client: return ContentStore

    Note over Client,LlmScorer: Runtime scoring flow
    Client->>LlmScorer: scoreAll(segments, query)
    LlmScorer->>LLM: Assistant.score(query, document)
    LLM->>LlmScorer: integer score (0-10)
    LlmScorer->>Client: Response<List<Double>>

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to: factory conditional logic for resolving/instantiating ChatModel and LlmScoringModel; validation branches enforcing presence or absence of a pre-set scoringModel; prompt loading paths and Assistant annotations in LlmScoringModel.java.

Poem

🐇 Hop lightly through prompts and code,

I nudge each doc down a scoring road.
Ten to zero, the numbers sing,
A rabbit’s nudge to rank each thing.
Hooray — the rerank bells ring!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled).
  • Title check — ✅ Passed: the title 'feat: add support reranking using LLM' directly matches the main change, introducing LLM-based reranking support with new config options (USE_LLM_RERANKING, RERANKING_LLM) and a new LlmScoringModel class.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df651ad and 5bd057f.

📒 Files selected for processing (4)
  • dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java (5 hunks)
  • dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1 hunks)
  • dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt (1 hunks)
  • dat-core/src/main/resources/prompts/default/scoring_user_prompt_template.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java (2)
dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)
  • LlmScoringModel (19-44)
dat-core/src/main/java/ai/dat/core/utils/FactoryUtil.java (1)
  • FactoryUtil (29-376)
🪛 LanguageTool
dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt

[grammar] ~4-~4: Ensure spelling is correct
Context: ...the given query Guidelines: Score 0-2: Documentis completely irrelevant Score 3-5: Docume...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...mentis completely irrelevant Score 3-5: Documenthas some relevant information but doesn't d...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~7-~7: This phrase is redundant. Consider writing “relevant”.
Context: ...swers the query Score 9-10: Document is highly relevant and directly answers the query You MUST...

(HIGHLY_RELEVANT)

🔇 Additional comments (6)
dat-core/src/main/resources/prompts/default/scoring_user_prompt_template.txt (1)

1-5: LGTM!

The template structure is clear and appropriate for the scoring use case.

dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)

23-28: LGTM!

The builder-based constructor and AiServices integration is well-structured.

dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java (4)

12-12: LGTM!

The new imports are appropriate for the LLM reranking integration.

Also applies to: 16-16


73-84: LGTM!

The new configuration options are well-documented and appropriate for controlling LLM-based reranking behavior.


296-296: LGTM!

The new options are correctly added to the optional options set.


497-500: LGTM!

The validation logic correctly ensures that when reranking is enabled, either a scoring model is provided or LLM reranking is enabled.

Comment on lines +31 to +37
public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
    List<Double> scores = segments.stream()
            .map(segment -> assistant.scoring(query, segment.text()))
            .map(Double::valueOf)
            .toList();
    return Response.from(scores);
}


⚠️ Potential issue | 🟠 Major

Add error handling for LLM response parsing.

The current implementation assumes the LLM will always return a valid integer, but LLMs can fail to follow instructions or return unexpected output. If assistant.scoring returns null or the LLM returns non-numeric text, this will throw NullPointerException or cause parsing failures at runtime.

Consider adding defensive error handling:

 @Override
 public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
     List<Double> scores = segments.stream()
             .map(segment -> assistant.scoring(query, segment.text()))
+            .map(score -> score != null ? score : 0)
             .map(Double::valueOf)
             .toList();
     return Response.from(scores);
 }

Alternatively, add try-catch and logging to handle parse failures gracefully.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
    List<Double> scores = segments.stream()
            .map(segment -> assistant.scoring(query, segment.text()))
            .map(Double::valueOf)
            .toList();
    return Response.from(scores);
}

public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
    List<Double> scores = segments.stream()
            .map(segment -> {
                try {
                    Integer score = assistant.scoring(query, segment.text());
                    // Fall back to 0 when the LLM returns no usable score
                    return score != null ? score.doubleValue() : 0.0;
                } catch (RuntimeException e) {
                    // Log error and return default value
                    return 0.0;
                }
            })
            .toList();
    return Response.from(scores);
}
🤖 Prompt for AI Agents
In dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java around lines
31 to 37, the code assumes assistant.scoring always returns a valid numeric
value; add defensive parsing and error handling so null or non-numeric responses
don't crash. For each segment call assistant.scoring, check for null, and parse
the returned string inside a try-catch (catch NumberFormatException and any
runtime exceptions); on failure log a warning including segment identifier or
snippet and the raw LLM response, then use a safe fallback score (e.g., 0.0) or
skip the segment per project convention; collect the safe Double values and
return Response.from(scores) so the method never throws due to LLM parse errors.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt (1)

1-8: LGTM! Prompt is clear and ready for use.

The typos from the previous review have been fixed. The static analysis hint about "highly relevant" being redundant can be safely ignored—the phrase effectively distinguishes the 9-10 score range from the 6-8 "relevant" range.

dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)

31-38: Add comprehensive error handling and logging for LLM scoring failures.

While the null check on Line 34 addresses null responses, the implementation lacks protection against other failure modes:

  1. Parse exceptions: If the LLM returns non-numeric text despite instructions, LangChain4j's Integer parsing may throw exceptions rather than returning null.
  2. Runtime exceptions: Network issues, timeouts, or model errors could throw uncaught exceptions.
  3. Silent failures: The fallback to 0 on Line 34 masks failures without logging, making debugging difficult.

Wrap the scoring call in a try-catch block with logging:

 @Override
 public Response<List<Double>> scoreAll(List<TextSegment> segments, String query) {
     List<Double> scores = segments.stream()
-            .map(segment -> assistant.scoring(query, segment.text()))
-            .map(score -> score != null ? score : 0)
+            .map(segment -> {
+                try {
+                    Integer score = assistant.scoring(query, segment.text());
+                    return score != null ? score : 0;
+                } catch (Exception e) {
+                    log.warn("LLM scoring failed for segment: {}. Using fallback score 0. Error: {}", 
+                             segment.text().substring(0, Math.min(50, segment.text().length())), 
+                             e.getMessage());
+                    return 0;
+                }
+            })
             .map(Double::valueOf)
             .toList();
     return Response.from(scores);
 }

Don't forget to add the @Slf4j annotation to the class (Lombok is already imported).

🧹 Nitpick comments (1)
dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)

35-35: Consider using a clearer method for Integer-to-Double conversion.

Double::valueOf works here through unboxing and widening (Integer → int → double), but Number::doubleValue or Integer::doubleValue would be more explicit and easier to understand.

-            .map(Double::valueOf)
+            .map(Number::doubleValue)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd057f and 45b89a3.

📒 Files selected for processing (3)
  • dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java (6 hunks)
  • dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1 hunks)
  • dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java (2)
dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)
  • LlmScoringModel (19-45)
dat-core/src/main/java/ai/dat/core/utils/FactoryUtil.java (1)
  • FactoryUtil (29-376)
🪛 LanguageTool
dat-core/src/main/resources/prompts/default/scoring_system_prompt.txt

[style] ~7-~7: This phrase is redundant. Consider writing “relevant”.
Context: ...swers the query Score 9-10: Document is highly relevant and directly answers the query You MUST...

(HIGHLY_RELEVANT)

🔇 Additional comments (1)
dat-core/src/main/java/ai/dat/core/scoring/LlmScoringModel.java (1)

41-42: Both prompt resource files verified to exist in the classpath.

Verification confirms that both prompt files required by the LlmScoringModel interface are present and accessible:

  • prompts/default/scoring_system_prompt.txt
  • prompts/default/scoring_user_prompt_template.txt

The resource files are located in dat-core/src/main/resources/ and will be found correctly at runtime.

Comment on lines +396 to +402
if (config.get(RERANK_MODE) && config.get(USE_LLM_RERANKING)) {
    ChatModel rerankingChatModel = config.getOptional(RERANKING_LLM)
            .map(instances::get).orElse(defaultInstance).getChatModel();
    builder.scoringModel(LlmScoringModel.builder()
            .chatModel(rerankingChatModel)
            .build());
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

LLM reranking instantiation looks correct.

The logic properly creates an LlmScoringModel only when both RERANK_MODE and USE_LLM_RERANKING are enabled, with a safe fallback to the default LLM instance if RERANKING_LLM is not specified.

Verify that the integration works correctly by checking how the scoringModel is used in DefaultContentStore:


🏁 Script executed:

#!/bin/bash
# Description: Verify scoringModel usage in DefaultContentStore

echo "Finding DefaultContentStore usage of scoringModel..."
ast-grep --pattern 'class DefaultContentStore {
  $$$
  scoringModel
  $$$
}'

echo -e "\n\nSearching for rerank-related methods..."
rg -n -A 5 -B 2 "\.scoringModel\(" dat-core/src/main/java/ai/dat/core/contentstore/DefaultContentStore.java

Length of output: 1639



---

🏁 Script executed:

#!/bin/bash
# Description: Verify validation logic for USE_LLM_RERANKING and RERANK_MODE configuration

echo "Checking validation code around lines 497-500..."
rg -n -A 10 -B 5 "rerankMode\|RERANK_MODE" dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java | head -100

echo -e "\n\nSearching for USE_LLM_RERANKING validation..."
rg -n -A 5 -B 5 "USE_LLM_RERANKING" dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java

Length of output: 330



---

🏁 Script executed:

#!/bin/bash
# Description: Find correct path and verify validation logic

echo "Finding DefaultContentStoreFactory.java..."
fd "DefaultContentStoreFactory.java"

echo -e "\n\nSearching for validation logic..."
rg -n -A 10 -B 5 "rerankMode.*useLlmReranking\|USE_LLM_RERANKING" --type java

Length of output: 155



---

🏁 Script executed:

#!/bin/bash
# Description: View actual validation logic around lines 497-513

echo "Viewing validation code..."
sed -n '490,520p' dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java

echo -e "\n\nViewing RERANK_MODE and USE_LLM_RERANKING config declarations..."
sed -n '70,85p' dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java

Length of output: 3009


LLM reranking instantiation looks correct, but missing validation for nonsensical configuration.

The logic properly creates an LlmScoringModel only when both RERANK_MODE and USE_LLM_RERANKING are enabled, with a safe fallback to the default LLM instance if RERANKING_LLM is not specified. Integration verified—scoringModel is correctly passed to ReRankingContentAggregator in all reranking flows.

However, add validation to prevent USE_LLM_RERANKING=true without RERANK_MODE=true. Currently, if a user enables USE_LLM_RERANKING but leaves RERANK_MODE false, the LLM scoring model is silently not created (line 396-402 requires both conditions), and the configuration is ignored. Add a check at lines 497-500 (after retrieving both flags) to enforce: if useLlmReranking is true, then rerankMode must also be true.

Comment on lines +497 to +500
Boolean useLlmReranking = config.get(USE_LLM_RERANKING);
Preconditions.checkArgument(!rerankMode || useLlmReranking || scoringModel != null,
        "'" + RERANK_MODE.key() + "' is true and '" + USE_LLM_RERANKING.key() + "' is false, " +
                "reranking has not been set yet");


⚠️ Potential issue | 🟠 Major

Add validation that USE_LLM_RERANKING requires RERANK_MODE to be enabled.

The current validation ensures that when RERANK_MODE is true, either USE_LLM_RERANKING is true or a scoringModel exists. However, the reverse constraint is missing: USE_LLM_RERANKING should only be meaningful when RERANK_MODE is also true. Allowing USE_LLM_RERANKING=true with RERANK_MODE=false is a misconfiguration that silently does nothing (Lines 396-402 won't execute).

Add a validation check after Line 500:

         Preconditions.checkArgument(!rerankMode || useLlmReranking || scoringModel != null,
                 "'" + RERANK_MODE.key() + "' is true and '" + USE_LLM_RERANKING.key() + "' is false, " +
                         "reranking has not been set yet");
+        Preconditions.checkArgument(!useLlmReranking || rerankMode,
+                "'" + USE_LLM_RERANKING.key() + "' is true but '" + RERANK_MODE.key() + "' is false. " +
+                        "LLM reranking requires reranking to be enabled.");
🤖 Prompt for AI Agents
In dat-core/src/main/java/ai/dat/core/factories/DefaultContentStoreFactory.java
around lines 497 to 500, add a validation that if USE_LLM_RERANKING is true then
RERANK_MODE must also be enabled: check the boolean from config
(useLlmReranking) and assert that if useLlmReranking is true then rerankMode is
true, using Preconditions.checkArgument with a clear error message that
"'USE_LLM_RERANKING' requires 'RERANK_MODE' to be true" to fail fast on this
misconfiguration.
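Taken together, the existing precondition and the suggested one reduce to plain boolean logic, which can be sanity-checked in isolation. The class and method names below are illustrative, not part of the PR:

```java
public class RerankConfigCheck {

    // Valid iff: when reranking is enabled, some scorer is available
    // (LLM-based or a preset scoring model), and USE_LLM_RERANKING is
    // only set when RERANK_MODE is also on.
    static boolean isValid(boolean rerankMode, boolean useLlmReranking,
                           boolean hasScoringModel) {
        boolean scorerAvailable = !rerankMode || useLlmReranking || hasScoringModel;
        boolean llmFlagMeaningful = !useLlmReranking || rerankMode;
        return scorerAvailable && llmFlagMeaningful;
    }

    public static void main(String[] args) {
        System.out.println(isValid(true, true, false));   // LLM reranking: true
        System.out.println(isValid(true, false, true));   // preset scorer: true
        System.out.println(isValid(true, false, false));  // no scorer: false
        System.out.println(isValid(false, true, false));  // stray LLM flag: false
    }
}
```

The last case is exactly the silent misconfiguration the review flags: the flag combination passes the original check but configures nothing.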

@junjiem junjiem merged commit b332e22 into main Nov 20, 2025
1 check passed
@junjiem junjiem deleted the feat/reranking_using_llm branch November 20, 2025 06:10