
Conversation

@dingyi222666
Member

@dingyi222666 dingyi222666 commented Oct 18, 2025

This PR significantly enhances the agent system with automatic context management and improved guidance prompts.

New Features

Infinite Context System

  • Implemented automatic context compression when conversation history approaches model token limits
  • Added InfiniteContextManager that monitors token usage and triggers compression at 85% threshold
  • Created ChatLunaInfiniteContextChain for intelligent conversation summarization
  • Compresses older messages into structured, topic-based summaries while preserving recent context
  • Maintains compression metadata for transparency and debugging
  • Enables virtually unlimited conversation length within model constraints
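
A minimal sketch of the trigger logic described above, assuming a per-message token counter and a summarization callback are available; the names (Round, compressOldRounds) are illustrative and not the exact API of InfiniteContextManager:

interface Round {
    messages: { role: 'user' | 'assistant' | 'system'; content: string }[]
}

async function compressIfNeeded(
    rounds: Round[],
    maxTokenLimit: number,
    countTokens: (text: string) => Promise<number>,
    compressOldRounds: (older: Round[]) => Promise<Round>
): Promise<Round[]> {
    // Sum tokens across every message in every round
    let total = 0
    for (const round of rounds) {
        for (const message of round.messages) {
            total += await countTokens(message.content)
        }
    }

    // Only act once usage crosses 85% of the model's token limit
    const threshold = Math.floor(maxTokenLimit * 0.85)
    if (total <= threshold) return rounds

    // Keep the most recent rounds verbatim (the PR preserves 8 by default)
    // and fold everything older into a single structured summary round
    const preserveCount = 8
    if (rounds.length <= preserveCount) return rounds
    const older = rounds.slice(0, -preserveCount)
    const preserved = rounds.slice(-preserveCount)

    const summaryRound = await compressOldRounds(older)
    return [summaryRound, ...preserved]
}

The real manager additionally records compression metadata and rewrites the stored history, which is omitted here.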

Round-Based Message Truncation

  • Refactored message truncation to preserve complete conversation rounds instead of individual messages
  • Prevents context fragmentation by keeping user-assistant exchanges together
  • Applied to both ChatLunaChatPrompt and ChatLunaChatModel for consistency
  • Added _buildConversationRounds() helper to group messages into logical turns
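
For illustration, a round-grouping helper in the spirit of _buildConversationRounds() might look like the sketch below; this is not the repository's exact implementation:

import { BaseMessage } from '@langchain/core/messages'

// Group a flat message history into user -> assistant rounds so truncation
// can drop whole exchanges instead of cutting a round in half.
function buildConversationRounds(messages: BaseMessage[]): BaseMessage[][] {
    const rounds: BaseMessage[][] = []
    let current: BaseMessage[] = []

    for (const message of messages) {
        // A new human message starts a new round; the assistant replies and
        // tool messages that follow stay in the same round.
        if (message.getType() === 'human' && current.length > 0) {
            rounds.push(current)
            current = []
        }
        current.push(message)
    }

    if (current.length > 0) rounds.push(current)
    return rounds
}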

Enhanced Agent Prompts

  • Added after_user_message system prompt to guide agents on task completion evaluation
  • Helps agents determine when to continue using tools vs. when to provide final responses
  • Improves response quality with requirements for uniqueness and language matching
  • Prevents repetitive patterns and maintains conversational engagement

Sydney Preset Improvements

  • Removed artificial "5 searches" limit - agents can now use tools as needed to complete tasks
  • Added workflow documentation for built_user_toast and built_user_confirm tools
  • Strengthened markdown formatting restrictions with explicit forbidden characters list
  • Updated examples with current dates and improved tool usage patterns
  • Enhanced plain text formatting requirements for better compatibility

Configuration Changes

  • Replaced messageCount slider (2-500 range) with infiniteContext boolean toggle
  • Set infiniteContext to true by default for automatic compression
  • Increased chat history buffer from configurable limit to 10,000 messages
  • Updated locale files (en-US, zh-CN) with infinite context feature descriptions
  • Made countMessageTokens() public in ChatLunaChatModel for external token calculations
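
A rough sketch of the schema change in a Koishi-style config; the surrounding fields and descriptions in packages/core/src/config.ts are omitted and its exact shape may differ:

import { Schema } from 'koishi'

export interface Config {
    infiniteContext: boolean
}

export const Config: Schema<Config> = Schema.object({
    // Replaces the old messageCount slider (2-500); when enabled, older
    // history is compressed automatically instead of being dropped.
    infiniteContext: Schema.boolean().default(true)
})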

Other Changes

Token Management

  • Enhanced token counting with _countMessagesTokens() for batch calculations
  • Improved token limit warnings with clearer messages about round-based truncation
  • Better handling of edge cases when no messages can fit within limits

Code Quality

  • Improved user_id resolution with more robust fallback chain
  • Better error handling in infinite context compression
  • Added comprehensive logging for compression operations
  • Enhanced long memory prompt formatting with clearer context usage guidelines

Enhanced the agent system with better guidance prompts and context handling:
- Updated Sydney preset to remove artificial tool usage limits and improve markdown formatting restrictions
- Added after_user_message system prompt to guide agents on task completion evaluation
- Enhanced long memory prompt with clearer context usage guidelines
- Improved user_id resolution in system prompt variables with fallback chain

The after_user_message prompt helps agents:
- Better evaluate task completion status
- Provide more natural, language-appropriate responses
- Avoid repetitive patterns and maintain engagement
- Use context and tools more effectively

These changes improve the agent's ability to use tools efficiently while
maintaining high-quality, contextually appropriate responses.
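
As a sketch of the mechanism (the real AGENT_AFTER_USER_PROMPT text lives in plugin_chat_chain.ts; the wording below is a stand-in), the guidance is attached to the chain request as an extra message that the prompt template later inserts after the user input:

import { HumanMessage } from '@langchain/core/messages'

// Stand-in wording; the actual prompt in the repository is more detailed.
const AGENT_AFTER_USER_PROMPT = [
    'Before responding, decide whether the task is complete.',
    'If more information is needed, call a tool; otherwise answer directly,',
    'in the same language as the user, without repeating earlier replies.'
].join(' ')

function withAfterUserMessage(
    requests: Record<string, unknown>
): Record<string, unknown> {
    // The prompt template inserts this message after the user input
    // (and the optional agent scratchpad).
    requests['after_user_message'] = new HumanMessage(AGENT_AFTER_USER_PROMPT)
    return requests
}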
…ation

Implemented automatic context compression and improved token management:

Core Features:
- Added InfiniteContextManager for automatic history compression when approaching token limits
- Created ChatLunaInfiniteContextChain to compress conversation chunks into compact summaries
- Replaced messageCount config with infiniteContext boolean toggle

Message Handling Improvements:
- Refactored message truncation to preserve complete conversation rounds instead of individual messages
- Updated both ChatLunaChatPrompt and ChatLunaChatModel to use round-based truncation
- Added _buildConversationRounds() to group messages into logical conversation turns
- Messages are now truncated by complete rounds, preventing context fragmentation

Token Management:
- Enhanced token counting with _countMessagesTokens() for batch calculations
- Improved token limit checks with better error messaging
- Made countMessageTokens() public for external token calculations
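
A batch helper in the spirit of _countMessagesTokens() can be as small as the following sketch, where countMessageTokens stands for any per-message counter such as the now-public ChatLunaChatModel method:

import { BaseMessage } from '@langchain/core/messages'

async function countMessagesTokens(
    messages: BaseMessage[],
    countMessageTokens: (message: BaseMessage) => Promise<number>
): Promise<number> {
    // Accumulate per-message counts; any session-level overhead is added
    // once by the caller, not here.
    let total = 0
    for (const message of messages) {
        total += await countMessageTokens(message)
    }
    return total
}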

Infinite Context Features:
- Triggers compression at 85% of model token limit
- Preserves recent 8 conversation rounds by default
- Splits older messages into chunks for compression
- Generates structured summaries with topic-based organization
- Maintains metadata about compression segments
- Reduces token usage while preserving critical information
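
Chunk selection can be pictured as greedy packing toward a per-chunk token target (per the review, roughly 15% of the context window with a 300-token floor). The shapes below are simplified and not the actual _splitChunksForCompression signature:

interface CountedMessage {
    content: string
    tokens: number
}

function splitChunksForCompression(
    messages: CountedMessage[],
    targetTokens: number
): CountedMessage[][] {
    const chunks: CountedMessage[][] = []
    let current: CountedMessage[] = []
    let currentTokens = 0

    for (const message of messages) {
        // Start a new chunk once the running total would exceed the target
        if (current.length > 0 && currentTokens + message.tokens > targetTokens) {
            chunks.push(current)
            current = []
            currentTokens = 0
        }
        current.push(message)
        currentTokens += message.tokens
    }

    if (current.length > 0) chunks.push(current)
    return chunks
}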

Configuration Changes:
- Removed messageCount slider (2-500 range)
- Added infiniteContext boolean flag (default: true)
- Increased chat history buffer to 10000 messages
- Updated locale files with infinite context descriptions

This enables virtually unlimited conversation length while staying within
model context limits by automatically compressing older messages into
structured summaries that preserve key information.
@coderabbitai
Contributor

coderabbitai bot commented Oct 18, 2025

Warning

Rate limit exceeded

@dingyi222666 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 2 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c7d41a5 and 156ccf8.

⛔ Files ignored due to path filters (26)
  • packages/adapter-azure-openai/package.json is excluded by !**/*.json
  • packages/adapter-claude/package.json is excluded by !**/*.json
  • packages/adapter-deepseek/package.json is excluded by !**/*.json
  • packages/adapter-dify/package.json is excluded by !**/*.json
  • packages/adapter-doubao/package.json is excluded by !**/*.json
  • packages/adapter-gemini/package.json is excluded by !**/*.json
  • packages/adapter-hunyuan/package.json is excluded by !**/*.json
  • packages/adapter-ollama/package.json is excluded by !**/*.json
  • packages/adapter-openai-like/package.json is excluded by !**/*.json
  • packages/adapter-openai/package.json is excluded by !**/*.json
  • packages/adapter-qwen/package.json is excluded by !**/*.json
  • packages/adapter-rwkv/package.json is excluded by !**/*.json
  • packages/adapter-spark/package.json is excluded by !**/*.json
  • packages/adapter-wenxin/package.json is excluded by !**/*.json
  • packages/adapter-zhipu/package.json is excluded by !**/*.json
  • packages/core/package.json is excluded by !**/*.json
  • packages/extension-long-memory/package.json is excluded by !**/*.json
  • packages/extension-mcp/package.json is excluded by !**/*.json
  • packages/extension-tools/package.json is excluded by !**/*.json
  • packages/extension-variable/package.json is excluded by !**/*.json
  • packages/renderer-image/package.json is excluded by !**/*.json
  • packages/service-embeddings/package.json is excluded by !**/*.json
  • packages/service-image/package.json is excluded by !**/*.json
  • packages/service-search/package.json is excluded by !**/*.json
  • packages/service-vector-store/package.json is excluded by !**/*.json
  • packages/shared-adapter/package.json is excluded by !**/*.json
📒 Files selected for processing (1)
  • packages/core/src/services/chat.ts (1 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces an infinite-context compression and management pipeline: adds InfiniteContextManager and ChatLunaInfiniteContextChain, switches to round-level token counting, integrates after_user_message support in several places, and removes the message-count-based configuration in favor of a boolean infiniteContext toggle.

Changes

Cohort / File(s) Change Summary
Configuration changes
packages/core/src/config.ts
Removes the public field messageCount: number, adds infiniteContext: boolean (default true), and updates the exported Schema
Infinite context chain
packages/core/src/llm-core/chain/infinite_context_chain.ts
Adds the ChatLunaInfiniteContextChain class and related interfaces; implements fromLLM, compressChunk, call, _isAlreadyCompressed, the model getter, and the compression invocation logic
Plugin chat chain
packages/core/src/llm-core/chain/plugin_chat_chain.ts
Imports HumanMessage, adds the constant AGENT_AFTER_USER_PROMPT, and injects after_user_message (a HumanMessage) into requests
Prompt formatting
packages/core/src/llm-core/chain/prompt.ts
Adds the optional field after_user_message?: BaseMessage to ChatLunaChatPromptFormat; extends formatMessages to accept and insert that message; adds round-building and token-counting helpers and performs truncation at round granularity
Chat app integration
packages/core/src/llm-core/chat/app.ts
Integrates InfiniteContextManager; lazily creates and caches historyMemory; removes the maxMessagesCount input and initializes with a fixed value of 10000; adds cleanup logic and error capture on dispose
Infinite context manager
packages/core/src/llm-core/chat/infinite_context.ts
Adds InfiniteContextManager and InfiniteContextManagerOptions; implements compressIfNeeded, chunk selection, compression scheduling, history rewriting, the 85% threshold check, and supporting helper methods
Model token counting refactor
packages/core/src/llm-core/platform/model.ts
Exposes the private counter as the public countMessageTokens; adds buildConversationRounds and countMessagesTokens; changes truncation/cropping to accumulate counts per round and to return a truncation flag with a warning
Service layer adjustment
packages/core/src/services/chat.ts
Removes maxMessagesCount from ChatInterfaceWrapper initialization; the configured message cap is no longer passed through
Utility tweak
packages/core/src/utils/string.ts
Extends getSystemPromptVariables' resolution of user_id to include session.userId as an additional fallback source

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Chat as ChatInterface
    participant Manager as InfiniteContextManager
    participant History as ChatHistory
    participant Chain as ChatLunaInfiniteContextChain
    participant Model as ChatLunaChatModel

    Chat->>Manager: compressIfNeeded(wrapper)
    activate Manager
    Manager->>Model: get model and context size
    Manager->>History: read messages and count tokens (_calculateMessageTokenStats)
    alt total tokens > 85% * max
        Manager->>Manager: _splitChunksForCompression -> build chunk list
        loop for each chunk
            Manager->>Chain: compressChunk({chunk, conversationId})
            activate Chain
            Chain->>Model: invoke the underlying LLM chain for compression
            Model-->>Chain: return compressed text/messages
            Chain-->>Manager: return compression result
            deactivate Chain
        end
        Manager->>History: replace/insert infinite_context/system blocks into history and reloadConversation
    else below threshold
        Note right of Manager: no compression needed, return directly
    end
    deactivate Manager

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 A busy little rabbit, hopping through the sea of code,
Folding fragments of history, compressing round by round without pause;
Infinite context whispers softly, token thresholds tucked away beside,
Chain links into chain like a song, a flick of rabbit ears full of hope.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title, "[Feature] Improve agent system with infinite context and better prompts", clearly summarizes the two main aspects of the changeset: the infinite context feature and the enhanced agent prompts. Per the raw summary, these are indeed the core features of this PR: the InfiniteContextManager for automatic context compression, the ChatLunaInfiniteContextChain for conversation summarization, round-based message truncation, and the after_user_message system prompt. The title is specific, avoids vague or generic wording, and lets reviewers quickly grasp the main changes.
Description Check ✅ Passed The PR description is closely related to the changeset and details several concrete improvements and implementation specifics, including the infinite context system, round-based message truncation, enhanced agent prompts, Sydney preset improvements, configuration changes, and token management work. The description is meaningful, maps to the files and features covered in the raw summary, and uses specific technical terms and feature names rather than vague or generic wording.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.


@gemini-code-assist
Contributor

Summary of Changes

Hello @dingyi222666, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the agent system by implementing an "Infinite Context" mechanism that intelligently manages conversation history to overcome token limits through automatic compression. It also refines how messages are truncated, ensuring complete conversational turns are preserved, and introduces more sophisticated agent prompts to guide task completion and improve response quality. These changes aim to enable more extended and coherent interactions with the agent, making it more robust and user-friendly.

Highlights

  • Infinite Context System: Introduces automatic context compression to manage conversation history when nearing model token limits, using InfiniteContextManager and ChatLunaInfiniteContextChain for intelligent summarization of older messages into structured, topic-based notes.
  • Round-Based Message Truncation: Refactors message truncation logic to preserve complete conversation rounds instead of individual messages, preventing fragmented context and improving conversational coherence.
  • Enhanced Agent Prompts: Adds an after_user_message system prompt to guide agents on task completion evaluation, improving response quality with requirements for uniqueness, language matching, and avoiding repetitive patterns.
  • Sydney Preset Improvements: Removes the artificial '5 searches' limit, adds workflow documentation for built_user_toast and built_user_confirm tools, strengthens markdown formatting restrictions, and updates examples.
  • Configuration Updates: Replaces the messageCount slider with an infiniteContext boolean toggle (defaulting to true) and increases the internal chat history buffer to 10,000 messages.
  • Improved Token Management: Enhances token counting for batch calculations (_countMessagesTokens) and makes countMessageTokens public in ChatLunaChatModel for external use.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an infinite context system, round-based message truncation, and enhanced agent prompts to improve the agent system. The changes include new files for infinite context management, modifications to the Sydney preset, updates to configuration, and adjustments to message handling. The review focuses on correctness and maintainability, particularly regarding the new prompt and the infinite context logic.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
packages/core/src/utils/string.ts (1)

190-194: getTimeInUTC uses wrong substring indices and returns an incorrect value (critical)

substring(11, 8) is treated as substring(8, 11) and returns a fragment like "DDT" instead of the expected "HH:mm:ss".

-  return date.toISOString().substring(11, 8)
+  // "YYYY-MM-DDTHH:mm:ss.sssZ" -> "HH:mm:ss"
+  return date.toISOString().substring(11, 19)
packages/core/src/llm-core/platform/model.ts (1)

554-614: Token counts are systematically overestimated: every message adds an extra +3 (critical)

In OpenAI-style chat token counting, the "+3 priming per reply" should be added only once per conversation; placing it in the per-message count means each call during cropping inflates the total, causing truncation to happen too early and too aggressively and even triggering unnecessary compression.

Remove the per-message +3 and add it once at the conversation level (before cropMessages returns):

-  public async countMessageTokens(message: BaseMessage) {
-    let totalCount = 0
+  public async countMessageTokens(message: BaseMessage) {
     let tokensPerMessage = 0
     let tokensPerName = 0
@@
-    let count = textCount + tokensPerMessage + roleCount + nameCount
+    let count = textCount + tokensPerMessage + roleCount + nameCount
@@
-    totalCount += count
-
-    totalCount += 3 // every reply is primed with <|start|>assistant<|message|>
-
-    return totalCount
+    return count
   }

And add the priming overhead once before the cropping logic returns (see the next comment).

packages/core/src/llm-core/chat/app.ts (1)

88-96: wrapper may be used before being assigned (an error under TypeScript strict mode)

The wrapper passed in the catch branch is never initialized. Consider allowing undefined:

-    let wrapper: ChatLunaLLMChainWrapper
+    let wrapper: ChatLunaLLMChainWrapper | undefined
packages/core/src/llm-core/chain/prompt.ts (4)

102-111: Critical: _countMessageTokens must not mutate its message argument; content is altered and lost in place

Replacing markdown images during counting and writing the result back to message.content pollutes the original message and can strip images/rich content from the final prompt and history. Sanitize only a local variable and count against that.

Apply the following fix to avoid side effects on message:

   private async _countMessageTokens(message: BaseMessage) {
-        let content = getMessageContent(message.content)
+        let content = getMessageContent(message.content)

         if (
             content.includes('![image]') &&
             content.includes('base64') &&
             message.additional_kwargs?.['images']
         ) {
-            // replace markdown image to '
-            content = content.replaceAll(/!\[.*?\]\(.*?\)/g, '')
-            message.content = content
+            // local sanitization for counting only; do not modify the original message
+            content = content.replaceAll(/!\[.*?\]\(.*?\)/g, '')
         }

-        let result =
-            (await this.tokenCounter(getMessageContent(message.content))) +
+        let result =
+            (await this.tokenCounter(content)) +
             (await this.tokenCounter(
                 messageTypeToOpenAIRole(message.getType())
             ))

Also applies to: 112-117


282-292: Functional bug: after_user_message is only added when agent_scratchpad is present

This silently drops after_user_message when there are no tool calls. It should be appended unconditionally after the user input (and the optional scratchpad).

Suggested change:

         if (agentScratchpad) {
             if (Array.isArray(agentScratchpad)) {
                 result.push(...agentScratchpad)
             } else {
                 result.push(agentScratchpad)
             }
-            
-            if (afterUserMessage) {
-                result.push(afterUserMessage)
-            }
         }
+        // append regardless of whether a scratchpad exists
+        if (afterUserMessage) {
+            result.push(afterUserMessage)
+        }

Also applies to: 273-280


567-568: Index bug: personalityIndex mistakenly uses 'description'

It should be 'personality'; otherwise the insertion position is computed incorrectly.

-        const descriptionIndex = findIndexByType('description')
-        const personalityIndex = findIndexByType('description')
+        const descriptionIndex = findIndexByType('description')
+        const personalityIndex = findIndexByType('personality')

345-348: Robustness: result[result.length - 1] is accessed directly even when result may be empty

Under a minimal preset (no system/instruction messages) this can go out of bounds. A simple length check is enough.

-        const hasLongMemory =
-            result[result.length - 1].content === 'Ok. I will remember.'
+        const hasLongMemory =
+            result.length > 0 &&
+            result[result.length - 1].content === 'Ok. I will remember.'
🧹 Nitpick comments (10)
packages/core/src/utils/string.ts (2)

399-403: Unify the ID fallback chain and coerce to string

A session.userId fallback has been added for user_id; consider also:

  • Coercing the value to a string to avoid numbers or undefined leaking in.
  • Keeping the fallback strategy consistent with sender_id (which currently lacks session.userId).

Suggested change for this block, with sender_id updated to match:

-        user_id:
-            session.author?.user?.id ??
-            session.event?.user?.id ??
-            session.userId ??
-            '0',
+        user_id: String(
+            session.author?.user?.id ??
+            session.event?.user?.id ??
+            session.userId ??
+            '0'
+        ),

And change sender_id in formatUserPromptString to the same fallback chain (see the snippet below).

Additional change (elsewhere in the file, outside this line range):

// lines 451-453
sender_id: String(
  session.author?.user?.id ?? session.event?.user?.id ?? session.userId ?? '0'
),

351-360: gzipDecode's input-encoding and type handling are loose

  • Asserting inputEncoding as 'base64' hides type checking for other valid values such as 'hex'.
  • When data is an ArrayBuffer it should be explicitly converted to a Buffer.

Suggested implementation:

-export async function gzipDecode(
-    data: ArrayBuffer | Buffer | string,
-    inputEncoding: Encoding = 'base64'
-): Promise<string> {
-    const buffer =
-        typeof data === 'string'
-            ? Buffer.from(data, inputEncoding as 'base64')
-            : data
-    return (await gunzipAsync(buffer)).toString('utf8')
-}
+export async function gzipDecode(
+  data: ArrayBuffer | Buffer | string,
+  inputEncoding: BufferEncoding = 'base64'
+): Promise<string> {
+  const buffer: Buffer =
+    typeof data === 'string'
+      ? Buffer.from(data, inputEncoding)
+      : Buffer.isBuffer(data)
+        ? data
+        : Buffer.from(new Uint8Array(data))
+  return (await gunzipAsync(buffer)).toString('utf8')
+}
packages/core/src/llm-core/platform/model.ts (1)

471-549: The round-based cropping strategy can overflow when the most recent round itself exceeds the limit (suggested improvement)

When the latest round alone is over the limit, the current implementation still includes the whole round and only sets the truncated flag, so the request body may still exceed the limit and be rejected upstream. Suggestions:

  • When selectedRounds is empty and exceedsLimit is true, fall back to reclaiming at message granularity from the tail, guaranteeing at least the user's last message and any necessary assistant reply.
  • Or introduce a second, within-round trim: accumulate messages from the end of the round in reverse until the budget is reached, ensuring the limit is not exceeded.

Optional pseudocode outline:

  • Add messages one by one starting from the end of the round until totalTokens + itemTokens > limit; if necessary, keep only the last HumanMessage.
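
One way to realize that within-round fallback, written as a sketch with illustrative names rather than the repository's actual helpers:

import { BaseMessage } from '@langchain/core/messages'

async function trimRoundToBudget(
    round: BaseMessage[],
    budget: number,
    countMessageTokens: (message: BaseMessage) => Promise<number>
): Promise<BaseMessage[]> {
    const kept: BaseMessage[] = []
    let used = 0

    // Walk backwards so the most recent messages in the round win
    for (let i = round.length - 1; i >= 0; i--) {
        const tokens = await countMessageTokens(round[i])
        // Always keep at least one message, even if it alone is over budget
        if (kept.length > 0 && used + tokens > budget) break
        kept.unshift(round[i])
        used += tokens
    }

    // Guarantee the user's last message survives the trim
    if (!kept.some((message) => message.getType() === 'human')) {
        const lastHuman = [...round]
            .reverse()
            .find((message) => message.getType() === 'human')
        if (lastHuman) kept.unshift(lastHuman)
    }

    return kept
}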
packages/core/src/llm-core/chain/plugin_chat_chain.ts (2)

176-178: Using a SystemMessage for after_user_message better matches the intent

This block is a system-level instruction; using a HumanMessage may affect semantic grouping and memory overhead. Consider switching to SystemMessage and adding the import:

-import { AIMessage, BaseMessage, HumanMessage } from '@langchain/core/messages'
+import { AIMessage, BaseMessage, HumanMessage, SystemMessage } from '@langchain/core/messages'
@@
-requests['after_user_message'] = new HumanMessage(
-  AGENT_AFTER_USER_PROMPT
-)
+requests['after_user_message'] = new SystemMessage(AGENT_AFTER_USER_PROMPT)

287-317: Consider externalizing and localizing the long instruction constant

Extracting AGENT_AFTER_USER_PROMPT into a preset/config or i18n would help with:

  • reuse and versioning;
  • localization into other languages;
  • unit testing and comparison.
packages/core/src/llm-core/chat/app.ts (1)

482-485: The history cap is hard-coded to 10000; consider making it configurable or a centrally managed constant

Suggestions:

  • Introduce an optional cap from Config (e.g. historyMaxMessages, default 10000); or
  • At least promote the magic number to a module-level constant with a comment, so it is easier to adjust and keep in sync with the docs.
packages/core/src/llm-core/chain/prompt.ts (2)

75-83: Missing input variable: after_user_message is not declared in inputVariables

This affects template variable validation and the usability of partial/compose.

Add after_user_message to the constructor's inputVariables:

         super({
             inputVariables: [
                 'chat_history',
                 'variables',
                 'input',
                 'agent_scratchpad',
                 'instructions',
-                'configurable'
+                'configurable',
+                'after_user_message'
             ]
         })

412-459: Optional: the round truncation logic easily exceeds the budget, and the fallback branch is mostly unreachable

The current code always includes the last round even when over the limit, so usedTokens can noticeably exceed availableLimit, while the "no rounds selected" fallback at 441-447 almost never fires. Consider a budget-decrementing approach that strictly stays within the budget while still guaranteeing at least one full round.

If needed, I can provide an equivalent simplified implementation (budget-based accumulation plus a single-round guarantee).

packages/core/src/llm-core/chat/infinite_context.ts (2)

46-55: Inconsistent threshold baseline: the fallback uses the full context window, while the model default may be half of it

When invocation.maxTokenLimit is empty, the current code falls back to getModelMaxContextSize(), but the model layer usually defaults to roughly half the window as maxTokenLimit when it is unspecified. Keep this consistent with the model to avoid skew at the 85% trigger point.

-        const maxTokenLimit =
-            invocation.maxTokenLimit && invocation.maxTokenLimit > 0
-                ? invocation.maxTokenLimit
-                : model.getModelMaxContextSize()
+        const maxTokenLimit =
+            invocation.maxTokenLimit && invocation.maxTokenLimit > 0
+                ? invocation.maxTokenLimit
+                : Math.floor(model.getModelMaxContextSize() / 2)

292-320: Optional: consider a hard cap on the chunk size target

The current target is max(15% of the window, 300); with a 128k window a single chunk can approach 19k tokens. A hard cap (e.g. 4k-8k) would reduce the risk of a single compression call failing or timing out.

If needed, I can submit an adaptive cap calculation based on model information.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d927c4 and 188a447.

⛔ Files ignored due to path filters (3)
  • packages/core/resources/presets/sydney.yml is excluded by !**/*.yml
  • packages/core/src/locales/en-US.schema.yml is excluded by !**/*.yml
  • packages/core/src/locales/zh-CN.schema.yml is excluded by !**/*.yml
📒 Files selected for processing (9)
  • packages/core/src/config.ts (2 hunks)
  • packages/core/src/llm-core/chain/infinite_context_chain.ts (1 hunks)
  • packages/core/src/llm-core/chain/plugin_chat_chain.ts (3 hunks)
  • packages/core/src/llm-core/chain/prompt.ts (6 hunks)
  • packages/core/src/llm-core/chat/app.ts (5 hunks)
  • packages/core/src/llm-core/chat/infinite_context.ts (1 hunks)
  • packages/core/src/llm-core/platform/model.ts (2 hunks)
  • packages/core/src/services/chat.ts (0 hunks)
  • packages/core/src/utils/string.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • packages/core/src/services/chat.ts
🧰 Additional context used
🧬 Code graph analysis (5)
packages/core/src/llm-core/chain/infinite_context_chain.ts (5)
packages/core/src/llm-core/memory/langchain/buffer_memory.ts (1)
  • BufferMemory (50-97)
packages/core/src/llm-core/chain/base.ts (2)
  • ChatLunaLLMChain (273-361)
  • ChatLunaLLMCallArg (37-48)
packages/core/src/llm-core/platform/model.ts (1)
  • ChatLunaChatModel (102-675)
packages/core/src/utils/string.ts (1)
  • getMessageContent (150-166)
packages/core/src/utils/error.ts (1)
  • ChatLunaError (13-41)
packages/core/src/llm-core/chain/prompt.ts (1)
packages/core/src/llm-core/chat/app.ts (1)
  • chatHistory (339-341)
packages/core/src/llm-core/platform/model.ts (1)
packages/core/src/index.ts (1)
  • logger (38-38)
packages/core/src/llm-core/chat/app.ts (3)
packages/core/src/llm-core/memory/message/database_history.ts (1)
  • KoishiChatMessageHistory (20-405)
packages/core/src/llm-core/memory/langchain/buffer_memory.ts (1)
  • BufferMemory (50-97)
packages/core/src/llm-core/chat/infinite_context.ts (1)
  • InfiniteContextManager (28-406)
packages/core/src/llm-core/chat/infinite_context.ts (5)
packages/core/src/llm-core/memory/message/database_history.ts (2)
  • KoishiChatMessageHistory (20-405)
  • additionalArgs (48-50)
packages/core/src/llm-core/prompt/type.ts (1)
  • PresetTemplate (80-110)
packages/core/src/llm-core/chain/infinite_context_chain.ts (3)
  • ChatLunaInfiniteContextChain (27-155)
  • model (152-154)
  • call (129-150)
packages/core/src/utils/string.ts (1)
  • getMessageContent (150-166)
packages/core/src/llm-core/platform/model.ts (1)
  • ChatLunaChatModel (102-675)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: build
🔇 Additional comments (7)
packages/core/src/llm-core/platform/model.ts (2)

301-308: The fallback calculation when tokenUsage is missing is reasonable

Falling back to local counting as compensation is acceptable. Verify overall accuracy together with the counting fix below.


467-469: Using the unified countMessageTokens to count system messages: OK

Consistent with the new cropping logic and helps estimate context overhead.

packages/core/src/llm-core/chat/app.ts (3)

136-144: The auto-compression integration is reasonable

Exceptions are caught and logged without blocking the main flow, and it aligns with the config.infiniteContext toggle.


493-507: Caching BufferMemory: OK

Avoids repeated instantiation and stays consistent with the chatHistory binding.


509-525: Lazy initialization of InfiniteContextManager: OK

Bound to the current conversation's resources and cleaned up on ctx.dispose; lifecycle management is sound.

packages/core/src/config.ts (1)

31-31: All concerns have been verified; no code changes needed

Verification results:

  1. Stale config reference check ✓ cleaned up

    • messageCount only appears as a local variable in group.ts
    • maxMessagesCount only appears as a private class property in database_history.ts
    • No stale references at the configuration level
  2. i18n translations ✓ complete

    • zh-CN.schema.yml (line 42): contains the full Chinese description
    • en-US.schema.yml (line 42): contains the full English description
packages/core/src/llm-core/chain/infinite_context_chain.ts (1)

78-111: The compression chain implementation is solid overall, with both text and message return channels supported

Parameter passing and whitespace handling are reasonable; _isAlreadyCompressed also covers both the tag check and the source-marker check. LGTM.

…orting

Fixed inconsistent token scope in compression metrics and threshold checks:
- Added presetTokens to post-compression count for accurate comparison
- Introduced newTotalTokens variable that includes both compressed history and preset tokens
- Updated reducedTokens calculation to use consistent token base (totalTokens - newTotalTokens)
- Fixed reducedPercent to accurately reflect actual token reduction
- Corrected threshold warning to compare newTotalTokens (not tokenCount alone) against threshold

This ensures compression reporting shows true reduction and threshold checks
account for preset tokens consistently with pre-compression totals.
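
The accounting can be summarized with the sketch below; the variable names follow the commit message, but the function itself is illustrative:

// totalTokens: pre-compression count (history + preset)
// tokenCount: recounted history after compression
// presetTokens: static preset overhead
function reportCompression(
    totalTokens: number,
    tokenCount: number,
    presetTokens: number,
    threshold: number
) {
    const newTotalTokens = tokenCount + presetTokens
    const reducedTokens = totalTokens - newTotalTokens
    const reducedPercent =
        totalTokens > 0 ? Math.round((reducedTokens / totalTokens) * 100) : 0

    // Compare the same scope (history + preset) against the threshold
    const stillOverThreshold = newTotalTokens > threshold
    return { newTotalTokens, reducedTokens, reducedPercent, stillOverThreshold }
}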
Added the missing +3 session-level priming token to the totalTokens count
before returning from cropMessages function:
- Incremented totalTokens by 3 just before the return statement
- This accounts for the <|start|>assistant<|message|> priming tokens
- Applied after all message token counting and truncation logic
- Ensures consistent token counting between cropMessages and individual message counts

The +3 is added once per session regardless of the truncated branch,
preventing accumulation across multiple calls.
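
In sketch form, the intended placement of the priming tokens looks like this (the cropping itself is stubbed out; only the accounting is shown):

import { BaseMessage } from '@langchain/core/messages'

async function cropMessagesSketch(
    messages: BaseMessage[],
    countMessageTokens: (message: BaseMessage) => Promise<number>
) {
    let totalTokens = 0
    for (const message of messages) {
        totalTokens += await countMessageTokens(message) // no per-message +3
    }

    // every reply is primed with <|start|>assistant<|message|>
    totalTokens += 3 // added exactly once per conversation

    return { messages, totalTokens }
}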
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
packages/core/src/llm-core/chat/infinite_context.ts (2)

69-71: Make the compression threshold configurable and reserve output tokens

The 0.85 ratio is currently hard-coded; consider exposing it as an option and subtracting the expected output cap (e.g. preset.config.maxOutputToken or maxTokens from the invocation parameters) when computing the threshold, so the generation budget is not fully consumed.

Example change (with defaults, compatible with current behavior):

 export interface InfiniteContextManagerOptions {
   chatHistory: KoishiChatMessageHistory
   conversationId: string
   preset?: ComputedRef<PresetTemplate>
+  compressionThresholdRatio?: number // default 0.85
+  reserveOutputTokens?: number       // falls back to preset.config.maxOutputToken if not provided
 }
@@
-        const invocation = model.invocationParams()
-        const maxTokenLimit =
+        const invocation = model.invocationParams()
+        const maxTokenLimit =
             invocation.maxTokenLimit && invocation.maxTokenLimit > 0
                 ? invocation.maxTokenLimit
                 : model.getModelMaxContextSize()
@@
-        const threshold = Math.floor(maxTokenLimit * 0.85)
+        const reserved =
+            this.options.reserveOutputTokens ??
+            (this.options.preset?.value?.config?.maxOutputToken ?? 0)
+        const usable = Math.max(0, maxTokenLimit - reserved)
+        const ratio = this.options.compressionThresholdRatio ?? 0.85
+        const threshold = Math.floor(usable * ratio)

298-299: Make the chunk target constants configurable (ratio and floor)

0.15 and 300 are hard-coded, and context windows differ widely across models. Consider exposing them as options:

 export interface InfiniteContextManagerOptions {
   ...
+  chunkTargetRatio?: number   // default 0.15
+  minChunkTokens?: number     // default 300
 }
@@
-        const chunkTokenTarget = Math.max(Math.floor(maxTokenLimit * 0.15), 300)
+        const ratio = this.options.chunkTargetRatio ?? 0.15
+        const minChunk = this.options.minChunkTokens ?? 300
+        const chunkTokenTarget = Math.max(Math.floor(maxTokenLimit * ratio), minChunk)
🧹 Nitpick comments (6)
packages/core/src/llm-core/chat/infinite_context.ts (6)

128-134: Support multiple (bounded) compression passes when still above the threshold

Currently a single compression pass is written straight back to history; if newTotalTokens is still above the threshold, only a warning is logged. Consider iterating one or two more passes in memory (without writing back to history), gradually shrinking the preserve window or increasing compression strength until the total drops below the threshold or a pass limit is reached, to reduce churn from compression being re-triggered on subsequent requests. This could be controlled via options such as maxCompressionRounds (default 1) and a minimum preserve-window floor.


204-213: AbortSignal is not passed through, so long compression calls cannot be cancelled

compressChunk supports a signal, but none is passed here, so compression cannot be aborted when a request is cancelled or times out. Consider passing an AbortSignal down as an optional parameter of compressIfNeeded.

Example change:

-    async compressIfNeeded(wrapper: ChatLunaLLMChainWrapper): Promise<void> {
+    async compressIfNeeded(
+      wrapper: ChatLunaLLMChainWrapper,
+      signal?: AbortSignal
+    ): Promise<void> {
@@
-            const compressedText = await compressor.compressChunk({
+            const compressedText = await compressor.compressChunk({
                 chunk: chunkText,
                 conversationId: this.options.conversationId
-            })
+            , signal
+            })

If the upper layer already has a unified call context (e.g. a signal on the wrapper or chain), it can also be obtained from there. Please confirm an available source.


324-336: Token statistics can avoid duplicate computation and use lightweight batching

  • compressIfNeeded already has stats; _compressMessages then recomputes mergedMessages in full, wasting CPU.
  • presetTokens is recomputed for the preset messages on every call and could be cached.

Suggestions:

  • Reuse the already computed system/preserved tokens and call countMessageTokens only for the newly generated compressedMessages.
  • Add a simple cache for presetTokens (keyed by reference or version number).

Example change (reusing the statistics and removing one full pass):

-        const tokenCount = await this._countMessagesTokens(
-            model,
-            mergedMessages
-        )
+        const tokenMap = new Map(stats.map(s => [s.message, s.tokens]))
+        const preservedTokens =
+          [...systemStats, ...preserved].reduce((sum, s) => sum + (tokenMap.get(s.message) ?? 0), 0)
+        const newCompressedTokens = await this._countMessagesTokens(model, compressedMessages)
+        const tokenCount = preservedTokens + newCompressedTokens

Optional: process _calculateMessageTokenStats in batches (e.g. 50 messages per batch) with Promise.all, balancing throughput and memory.

Also applies to: 338-349


155-177: A fixed preserve count of the 8 most recent entries may be unstable; consider token-based or configurable preservation

preserveCount=8 is an empirical value; with long messages or large tool replies, 8 entries can far exceed the safe budget. Suggestions:

  • Make preserveCount an option, options.preserveRecentCount (default 8).
  • Or preserve by token count instead, until the recent preserve region reaches N% of the usable window (e.g. 30%).

246-258: Role and markers of the compressed output messages

Duplicate compression is already avoided via the name/source markers, which is a sound design. Consider one extra safeguard: have _isCompressedMessage also recognize name === 'infinite_context_ack' (even though source already covers it), in case externally constructed messages omit additional_kwargs.


56-68: Preset message tokens are recomputed on every call; consider reference-level caching

The preset is mostly static, and recomputing it each call is a noticeable cost. Maintain a cache on the class (the messages reference plus its token count) and reuse the previous result when the reference of this.options.preset?.value?.messages has not changed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 188a447 and 9222b45.

📒 Files selected for processing (1)
  • packages/core/src/llm-core/chat/infinite_context.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/core/src/llm-core/chat/infinite_context.ts (5)
packages/core/src/llm-core/memory/message/database_history.ts (1)
  • KoishiChatMessageHistory (20-405)
packages/core/src/llm-core/prompt/type.ts (1)
  • PresetTemplate (80-110)
packages/core/src/llm-core/chain/infinite_context_chain.ts (3)
  • ChatLunaInfiniteContextChain (27-155)
  • model (152-154)
  • call (129-150)
packages/core/src/utils/string.ts (1)
  • getMessageContent (150-166)
packages/core/src/llm-core/platform/model.ts (1)
  • ChatLunaChatModel (102-675)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: lint
🔇 Additional comments (1)
packages/core/src/llm-core/chat/infinite_context.ts (1)

96-113: Concurrency risk when clearing and rewriting history; please confirm the calling context

The multi-step write of clear followed by addMessage calls can lose or reorder messages if concurrent requests compress or append at the same time. Please confirm:

  • Whether compressIfNeeded runs only on a serially executed call path (i.e. no concurrent modification outside the request flow).
  • Whether KoishiChatMessageHistory provides transaction/lock or compare-and-swap semantics.

If not, consider wrapping this as a single atomic rewrite operation or adding a mutex, as sketched below.
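
A minimal promise-chain mutex of the kind suggested above; this is a generic pattern, not an API that KoishiChatMessageHistory currently exposes:

class HistoryLock {
    private tail: Promise<void> = Promise.resolve()

    runExclusive<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(task)
        // Keep the chain alive even if the task rejects
        this.tail = result.then(
            () => undefined,
            () => undefined
        )
        return result
    }
}

// Usage sketch: run the clear-and-rewrite sequence atomically with respect
// to any other history writes that also go through the same lock.
// await lock.runExclusive(async () => {
//     await chatHistory.clear()
//     for (const message of rewritten) await chatHistory.addMessage(message)
// })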

Updated version across all packages:
- Bumped core package from 1.3.0-alpha.68 to 1.3.0-alpha.69
- Updated peerDependencies in all adapter packages to require ^1.3.0-alpha.69
- Updated peerDependencies in all extension packages to require ^1.3.0-alpha.69
- Updated peerDependencies in all service packages to require ^1.3.0-alpha.69
- Fixed trailing comma in chat.ts for consistent formatting

This version includes infinite context, round-based message truncation,
enhanced agent prompts, and token counting fixes.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/core/src/llm-core/platform/model.ts (1)

471-495: Consider extracting a shared utility function to avoid duplication

The buildConversationRounds function duplicates the implementation in ChatLunaChatPrompt. Refactoring it into a shared utility would improve maintainability and avoid code duplication.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9222b45 and c7d41a5.

📒 Files selected for processing (1)
  • packages/core/src/llm-core/platform/model.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/core/src/llm-core/platform/model.ts (1)
packages/core/src/index.ts (1)
  • logger (38-38)
🔇 Additional comments (5)
packages/core/src/llm-core/platform/model.ts (5)

301-303: countMessageTokens is used correctly

When the API does not return token usage, countMessageTokens is correctly used to compute the completion token count. The method is now public, which matches the needs of the infinite context system.


467-469: System message tokens are counted correctly

System message tokens are counted with countMessageTokens, consistent with the change that made the API public.


497-503: The batch token-counting helper is implemented correctly

countMessagesTokens implements batch counting by iterating countMessageTokens; the logic is clear and correct.


551-552: Session-level priming tokens are added correctly

This change addresses the earlier review comment by adding the priming tokens (+3) once at the session level, avoiding accumulated distortion. The inline comment clearly documents the purpose.

However, it needs to be paired with a fix for the double counting at line 614 of countMessageTokens for token statistics to be fully accurate.

Per the earlier review comment.


557-617: Use of the newly public method has been verified as correct

Verification results:

  • countMessageTokens is called correctly in infinite_context.ts (lines 331 and 345)
  • infinite_context.ts correctly imports ChatLunaChatModel (line 8)
  • The method is also reused correctly inside model.ts (lines 301, 467, and 500)
  • prompt.ts maintains its own private _countMessageTokens method and is unaffected
  • No improper external dependencies or misuse

This change matches the design needs of the infinite context system, and exposing the API this way is appropriate.

@dingyi222666 dingyi222666 merged commit 171e841 into v1-dev Oct 18, 2025
5 checks passed
@dingyi222666 dingyi222666 deleted the fix/better-agent branch October 18, 2025 00:26