[Feature] Improve agent system with infinite context and better prompts #602
Conversation
Enhanced the agent system with better guidance prompts and context handling:
- Updated Sydney preset to remove artificial tool usage limits and improve markdown formatting restrictions
- Added after_user_message system prompt to guide agents on task completion evaluation
- Enhanced long memory prompt with clearer context usage guidelines
- Improved user_id resolution in system prompt variables with fallback chain

The after_user_message prompt helps agents:
- Better evaluate task completion status
- Provide more natural, language-appropriate responses
- Avoid repetitive patterns and maintain engagement
- Use context and tools more effectively

These changes improve the agent's ability to use tools efficiently while maintaining high-quality, contextually appropriate responses.
…ation Implemented automatic context compression and improved token management:

Core Features:
- Added InfiniteContextManager for automatic history compression when approaching token limits
- Created ChatLunaInfiniteContextChain to compress conversation chunks into compact summaries
- Replaced messageCount config with infiniteContext boolean toggle

Message Handling Improvements:
- Refactored message truncation to preserve complete conversation rounds instead of individual messages
- Updated both ChatLunaChatPrompt and ChatLunaChatModel to use round-based truncation
- Added _buildConversationRounds() to group messages into logical conversation turns (see the sketch below)
- Messages are now truncated by complete rounds, preventing context fragmentation

Token Management:
- Enhanced token counting with _countMessagesTokens() for batch calculations
- Improved token limit checks with better error messaging
- Made countMessageTokens() public for external token calculations

Infinite Context Features:
- Triggers compression at 85% of model token limit
- Preserves recent 8 conversation rounds by default
- Splits older messages into chunks for compression
- Generates structured summaries with topic-based organization
- Maintains metadata about compression segments
- Reduces token usage while preserving critical information

Configuration Changes:
- Removed messageCount slider (2-500 range)
- Added infiniteContext boolean flag (default: true)
- Increased chat history buffer to 10000 messages
- Updated locale files with infinite context descriptions

This enables virtually unlimited conversation length while staying within model context limits by automatically compressing older messages into structured summaries that preserve key information.
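The round grouping referenced in the message-handling bullets above can be pictured as a small helper that starts a new round at each human message. A minimal TypeScript sketch, assuming @langchain/core message types; the helper name mirrors _buildConversationRounds() but is illustrative, not the project's exact code:

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Illustrative sketch: group a flat message list into conversation rounds.
// A round begins at a human message and collects everything up to the next one.
function buildConversationRounds(messages: BaseMessage[]): BaseMessage[][] {
    const rounds: BaseMessage[][] = []
    let current: BaseMessage[] = []

    for (const message of messages) {
        if (message.getType() === 'human' && current.length > 0) {
            rounds.push(current)
            current = []
        }
        current.push(message)
    }

    if (current.length > 0) {
        rounds.push(current)
    }

    return rounds
}
```

Truncating by whole rounds then means dropping the oldest entries of rounds rather than individual messages, which is what prevents context fragmentation.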
Walkthrough
Introduces an infinite-context compression and management pipeline: adds InfiniteContextManager and ChatLunaInfiniteContextChain, switches to round-level token counting, and integrates after_user_message support in several places; removes the message-count based configuration in favor of a boolean infiniteContext toggle.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Chat as ChatInterface
participant Manager as InfiniteContextManager
participant History as ChatHistory
participant Chain as ChatLunaInfiniteContextChain
participant Model as ChatLunaChatModel
Chat->>Manager: compressIfNeeded(wrapper)
activate Manager
Manager->>Model: fetch model and context size
Manager->>History: read messages and count tokens (_calculateMessageTokenStats)
alt total tokens > 85% * max
Manager->>Manager: _splitChunksForCompression -> build chunk list
loop for each chunk
Manager->>Chain: compressChunk({chunk, conversationId})
activate Chain
Chain->>Model: invoke underlying LLM chain to compress
Model-->>Chain: return compressed text/messages
Chain-->>Manager: return compression result
deactivate Chain
end
Manager->>History: replace/insert history with infinite_context/system blocks and reloadConversation
else below threshold
Note right of Manager: no compression needed, return immediately
end
deactivate Manager
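The branch condition in the diagram ("total tokens > 85% * max") reduces to a simple comparison. A rough sketch of that decision with illustrative parameter names; the actual manager also handles preserved rounds and chunking:

```typescript
// Illustrative sketch of the compression trigger, not the actual implementation.
function shouldCompress(
    historyTokens: number,
    presetTokens: number,
    maxTokenLimit: number,
    thresholdRatio = 0.85
): boolean {
    const threshold = Math.floor(maxTokenLimit * thresholdRatio)
    return historyTokens + presetTokens > threshold
}
```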
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Summary of Changes
Hello @dingyi222666, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly upgrades the agent system by implementing an "Infinite Context" mechanism that intelligently manages conversation history to overcome token limits through automatic compression. It also refines how messages are truncated, ensuring complete conversational turns are preserved, and introduces more sophisticated agent prompts to guide task completion and improve response quality. These changes aim to enable more extended and coherent interactions with the agent, making it more robust and user-friendly.
Code Review
This pull request introduces an infinite context system, round-based message truncation, and enhanced agent prompts to improve the agent system. The changes include new files for infinite context management, modifications to the Sydney preset, updates to configuration, and adjustments to message handling. The review focuses on correctness and maintainability, particularly regarding the new prompt and the infinite context logic.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
packages/core/src/utils/string.ts (1)
190-194: getTimeInUTC uses the wrong substring indices and returns an incorrect value (critical). substring(11, 8) is treated as substring(8, 11), returning a fragment like "DDT" instead of the expected "HH:mm:ss".

-    return date.toISOString().substring(11, 8)
+    // "YYYY-MM-DDTHH:mm:ss.sssZ" -> "HH:mm:ss"
+    return date.toISOString().substring(11, 19)

packages/core/src/llm-core/platform/model.ts (1)
554-614: Token counting is systematically overestimated: every message adds an extra +3 (critical). In the OpenAI chat counting scheme, the "+3 reply priming" should be added only once for the whole conversation; placing it inside the per-message count means cropping accumulates it on every message, inflating totals and causing premature, excessive truncation and even unnecessary compression.
Please remove the per-message +3 and add it only once at the conversation level (before cropMessages returns):

-    public async countMessageTokens(message: BaseMessage) {
-        let totalCount = 0
+    public async countMessageTokens(message: BaseMessage) {
         let tokensPerMessage = 0
         let tokensPerName = 0
@@
-        let count = textCount + tokensPerMessage + roleCount + nameCount
+        let count = textCount + tokensPerMessage + roleCount + nameCount
@@
-        totalCount += count
-
-        totalCount += 3 // every reply is primed with <|start|>assistant<|message|>
-
-        return totalCount
+        return count
     }

Add the priming overhead once, right before the return in the cropping logic (see the next comment).
packages/core/src/llm-core/chat/app.ts (1)
88-96: wrapper may be "used before being assigned" (an error under TypeScript strict mode). The wrapper passed in the catch branch is never initialized. Suggest allowing undefined:

- let wrapper: ChatLunaLLMChainWrapper
+ let wrapper: ChatLunaLLMChainWrapper | undefined

packages/core/src/llm-core/chain/prompt.ts (4)
102-111: Critical: _countMessageTokens must not mutate the message argument; content is altered and lost in place. Replacing the markdown image and writing it back to message.content during counting pollutes the original message and may erase images/rich text from the final prompt and history. Sanitize only a local variable and count against that. Apply the following fix so message has no side effects:

 private async _countMessageTokens(message: BaseMessage) {
-    let content = getMessageContent(message.content)
+    let content = getMessageContent(message.content)
     if (
         content.includes('![image]') &&
         content.includes('base64') &&
         message.additional_kwargs?.['images']
     ) {
-        // replace markdown image to ''
-        content = content.replaceAll(/!\[.*?\]\(.*?\)/g, '')
-        message.content = content
+        // local sanitization used only for counting; do not modify the original message
+        content = content.replaceAll(/!\[.*?\]\(.*?\)/g, '')
     }
-    let result =
-        (await this.tokenCounter(getMessageContent(message.content))) +
+    let result =
+        (await this.tokenCounter(content)) +
         (await this.tokenCounter(
             messageTypeToOpenAIRole(message.getType())
         ))
282-292: Functional bug: after_user_message is only added when agent_scratchpad exists, so it is silently dropped whenever there is no tool call. It should be appended unconditionally after the user input (and the optional scratchpad).
Suggested change:

 if (agentScratchpad) {
     if (Array.isArray(agentScratchpad)) {
         result.push(...agentScratchpad)
     } else {
         result.push(agentScratchpad)
     }
-
-    if (afterUserMessage) {
-        result.push(afterUserMessage)
-    }
 }
+// append regardless of whether a scratchpad exists
+if (afterUserMessage) {
+    result.push(afterUserMessage)
+}

Also applies to: 273-280
567-568: Index bug: personalityIndex mistakenly uses 'description'. It should be 'personality', otherwise the insertion-position calculation goes wrong.

- const descriptionIndex = findIndexByType('description')
- const personalityIndex = findIndexByType('description')
+ const descriptionIndex = findIndexByType('description')
+ const personalityIndex = findIndexByType('personality')
345-348: Robustness: result[result.length - 1] is accessed directly even when result may be empty. With a minimal preset (no system/instructions) this can go out of bounds. A simple length check is enough.

- const hasLongMemory =
-     result[result.length - 1].content === 'Ok. I will remember.'
+ const hasLongMemory =
+     result.length > 0 &&
+     result[result.length - 1].content === 'Ok. I will remember.'
🧹 Nitpick comments (10)
packages/core/src/utils/string.ts (2)
399-403: Unify the ID fallback chain and coerce to string. A session.userId fallback has been added for user_id; additionally consider:
- Coercing to string to avoid numbers/undefined leaking in.
- Keeping the fallback strategy consistent with sender_id (which currently lacks session.userId).
Suggested change for this block, with sender_id synchronized:

- user_id:
-     session.author?.user?.id ??
-     session.event?.user?.id ??
-     session.userId ??
-     '0',
+ user_id: String(
+     session.author?.user?.id ??
+     session.event?.user?.id ??
+     session.userId ??
+     '0'
+ ),

In formatUserPromptString, change sender_id to the same fallback chain (see the snippet below).
Additional change (elsewhere in the file, outside this line range):

// lines 451-453
sender_id: String(
    session.author?.user?.id ??
    session.event?.user?.id ??
    session.userId ??
    '0'
),
351-360: gzipDecode handles the input encoding and types loosely
- Force-asserting inputEncoding as 'base64' hides type checking for other legal values such as 'hex'.
- When data is an ArrayBuffer it should be explicitly converted to a Buffer.
Suggested implementation:

-export async function gzipDecode(
-    data: ArrayBuffer | Buffer | string,
-    inputEncoding: Encoding = 'base64'
-): Promise<string> {
-    const buffer =
-        typeof data === 'string'
-            ? Buffer.from(data, inputEncoding as 'base64')
-            : data
-    return (await gunzipAsync(buffer)).toString('utf8')
-}
+export async function gzipDecode(
+    data: ArrayBuffer | Buffer | string,
+    inputEncoding: BufferEncoding = 'base64'
+): Promise<string> {
+    const buffer: Buffer =
+        typeof data === 'string'
+            ? Buffer.from(data, inputEncoding)
+            : Buffer.isBuffer(data)
+              ? data
+              : Buffer.from(new Uint8Array(data))
+    return (await gunzipAsync(buffer)).toString('utf8')
+}

packages/core/src/llm-core/platform/model.ts (1)
471-549: The round-based cropping strategy can overflow when the newest round alone exceeds the limit (suggested optimization). When the most recent round itself is over the limit, the current implementation still includes the whole round and merely marks it truncated, so the request body may still exceed the limit and be rejected upstream. Suggestions:
- When selectedRounds is empty and exceedsLimit is true, fall back to per-message reclamation from the tail, at minimum guaranteeing that the user's last message and any necessary assistant reply are included.
- Or add secondary truncation within a single round: accumulate messages from the tail of that round until the limit is reached, ensuring the request never exceeds it.
Optional pseudocode outline (a sketch follows below):
- Starting from the end of the round, add messages one by one until totalTokens + itemTokens > limit; if necessary, keep only the last HumanMessage.
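A minimal sketch of that fallback, assuming a per-message token counter is available (names are hypothetical, not the project's API):

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Hypothetical fallback: when even the newest round exceeds the budget,
// walk that round from the tail and keep messages until the budget runs out,
// then make sure the latest human message survives.
async function truncateWithinRound(
    round: BaseMessage[],
    budget: number,
    countTokens: (message: BaseMessage) => Promise<number>
): Promise<BaseMessage[]> {
    const kept: BaseMessage[] = []
    let used = 0

    for (let i = round.length - 1; i >= 0; i--) {
        const tokens = await countTokens(round[i])
        if (used + tokens > budget && kept.length > 0) break
        kept.unshift(round[i])
        used += tokens
    }

    // Guarantee the last human message is present even when over budget.
    if (!kept.some((m) => m.getType() === 'human')) {
        const lastHuman = [...round].reverse().find((m) => m.getType() === 'human')
        if (lastHuman) kept.unshift(lastHuman)
    }

    return kept
}
```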
packages/core/src/llm-core/chain/plugin_chat_chain.ts (2)
176-178: Changing after_user_message to a SystemMessage better matches the intent. This block is a system-level instruction; using a HumanMessage may affect semantic grouping and memory overhead. Suggest switching to SystemMessage and adding the import:

-import { AIMessage, BaseMessage, HumanMessage } from '@langchain/core/messages'
+import { AIMessage, BaseMessage, HumanMessage, SystemMessage } from '@langchain/core/messages'
@@
-requests['after_user_message'] = new HumanMessage(
-    AGENT_AFTER_USER_PROMPT
-)
+requests['after_user_message'] = new SystemMessage(AGENT_AFTER_USER_PROMPT)
287-317: Externalize and localize the long instruction constant. Extracting AGENT_AFTER_USER_PROMPT into a preset/config or i18n would help with:
- Reuse and versioning;
- Language localization;
- Unit testing and comparison.
packages/core/src/llm-core/chat/app.ts (1)
482-485: The history cap is hardcoded to 10000; make it configurable or manage it as a shared constant. Suggestions:
- Read an optional cap from Config (e.g. historyMaxMessages, defaulting to 10000); or
- At minimum, promote the magic number to a module-level constant with a comment, easing later adjustment and documentation updates.
packages/core/src/llm-core/chain/prompt.ts (2)
75-83: Missing input variable: after_user_message is not declared in inputVariables. This affects template variable validation and the usability of partial/compose.
Add after_user_message to the constructor's inputVariables:

 super({
     inputVariables: [
         'chat_history',
         'variables',
         'input',
         'agent_scratchpad',
         'instructions',
-        'configurable'
+        'configurable',
+        'after_user_message'
     ]
 })
412-459: Optional: the round truncation logic easily exceeds the budget, and its fallback branch is nearly unreachable. The current code always includes the last round even when over the limit, so usedTokens can noticeably exceed availableLimit, while the "no selected rounds" fallback at lines 441-447 almost never fires. Consider a budget-decrement formulation that never exceeds the budget while still guaranteeing at least one whole round; an equivalent simplified implementation (budget-based accumulation plus a one-round floor) is sketched below.
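A sketch of such budget-based selection, again with hypothetical names rather than the project's code:

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Illustrative budget-based selection: walk rounds from newest to oldest and
// stop once the next round would exceed the budget, but always keep at least
// the newest round.
async function selectRoundsWithinBudget(
    rounds: BaseMessage[][],
    budget: number,
    countRoundTokens: (round: BaseMessage[]) => Promise<number>
): Promise<BaseMessage[][]> {
    const selected: BaseMessage[][] = []
    let used = 0

    for (let i = rounds.length - 1; i >= 0; i--) {
        const tokens = await countRoundTokens(rounds[i])
        if (used + tokens > budget && selected.length > 0) break
        selected.unshift(rounds[i])
        used += tokens
    }

    return selected
}
```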
packages/core/src/llm-core/chat/infinite_context.ts (2)
46-55: Inconsistent threshold baseline: the fallback uses the full model context, while the model's default may use half the window.
When invocation.maxTokenLimit is empty, the code falls back to getModelMaxContextSize(), but the model layer typically adopts roughly half the window as maxTokenLimit when unspecified. Keep the two consistent to avoid skew at the 85% trigger point.

- const maxTokenLimit =
-     invocation.maxTokenLimit && invocation.maxTokenLimit > 0
-         ? invocation.maxTokenLimit
-         : model.getModelMaxContextSize()
+ const maxTokenLimit =
+     invocation.maxTokenLimit && invocation.maxTokenLimit > 0
+         ? invocation.maxTokenLimit
+         : Math.floor(model.getModelMaxContextSize() / 2)
292-320: Optional: consider a hard ceiling for the chunk target. The current target is max(15% of the window, 300); with a 128k window a single chunk can approach 19k tokens. A hard cap (e.g. 4k-8k) would lower the risk of a single compression call failing or timing out. If useful, an adaptive cap derived from model information can be proposed (see the sketch below).
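Such a cap is a one-line clamp on the existing formula; a sketch with assumed option values:

```typescript
// Hypothetical clamp: keep the chunk target between a floor and a hard ceiling.
function chunkTokenTarget(
    maxTokenLimit: number,
    ratio = 0.15,
    minTokens = 300,
    hardCap = 8000
): number {
    return Math.min(Math.max(Math.floor(maxTokenLimit * ratio), minTokens), hardCap)
}
```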
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- packages/core/resources/presets/sydney.yml is excluded by !**/*.yml
- packages/core/src/locales/en-US.schema.yml is excluded by !**/*.yml
- packages/core/src/locales/zh-CN.schema.yml is excluded by !**/*.yml
📒 Files selected for processing (9)
- packages/core/src/config.ts (2 hunks)
- packages/core/src/llm-core/chain/infinite_context_chain.ts (1 hunks)
- packages/core/src/llm-core/chain/plugin_chat_chain.ts (3 hunks)
- packages/core/src/llm-core/chain/prompt.ts (6 hunks)
- packages/core/src/llm-core/chat/app.ts (5 hunks)
- packages/core/src/llm-core/chat/infinite_context.ts (1 hunks)
- packages/core/src/llm-core/platform/model.ts (2 hunks)
- packages/core/src/services/chat.ts (0 hunks)
- packages/core/src/utils/string.ts (1 hunks)
💤 Files with no reviewable changes (1)
- packages/core/src/services/chat.ts
🧰 Additional context used
🧬 Code graph analysis (5)
packages/core/src/llm-core/chain/infinite_context_chain.ts (5)
packages/core/src/llm-core/memory/langchain/buffer_memory.ts (1)
- BufferMemory (50-97)
packages/core/src/llm-core/chain/base.ts (2)
- ChatLunaLLMChain (273-361)
- ChatLunaLLMCallArg (37-48)
packages/core/src/llm-core/platform/model.ts (1)
- ChatLunaChatModel (102-675)
packages/core/src/utils/string.ts (1)
- getMessageContent (150-166)
packages/core/src/utils/error.ts (1)
- ChatLunaError (13-41)
packages/core/src/llm-core/chain/prompt.ts (1)
packages/core/src/llm-core/chat/app.ts (1)
- chatHistory (339-341)
packages/core/src/llm-core/platform/model.ts (1)
packages/core/src/index.ts (1)
- logger (38-38)
packages/core/src/llm-core/chat/app.ts (3)
packages/core/src/llm-core/memory/message/database_history.ts (1)
- KoishiChatMessageHistory (20-405)
packages/core/src/llm-core/memory/langchain/buffer_memory.ts (1)
- BufferMemory (50-97)
packages/core/src/llm-core/chat/infinite_context.ts (1)
- InfiniteContextManager (28-406)
packages/core/src/llm-core/chat/infinite_context.ts (5)
packages/core/src/llm-core/memory/message/database_history.ts (2)
- KoishiChatMessageHistory (20-405)
- additionalArgs (48-50)
packages/core/src/llm-core/prompt/type.ts (1)
- PresetTemplate (80-110)
packages/core/src/llm-core/chain/infinite_context_chain.ts (3)
- ChatLunaInfiniteContextChain (27-155)
- model (152-154)
- call (129-150)
packages/core/src/utils/string.ts (1)
- getMessageContent (150-166)
packages/core/src/llm-core/platform/model.ts (1)
- ChatLunaChatModel (102-675)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: build
🔇 Additional comments (7)
packages/core/src/llm-core/platform/model.ts (2)
301-308: The fallback calculation when tokenUsage is missing is reasonable. Defaulting to local counting as compensation is acceptable; verify overall accuracy together with the counting fix below.
467-469: Counting system messages with the unified countMessageTokens: OK. Consistent with the new cropping logic and helpful for estimating context overhead.
packages/core/src/llm-core/chat/app.ts (3)
136-144: Automatic compression is integrated sensibly. Exceptions are caught and logged without blocking the main flow, and the behavior aligns with the config.infiniteContext toggle.
493-507: Caching BufferMemory: OK. Avoids repeated instantiation and stays bound to chatHistory.
509-525: Lazy initialization of InfiniteContextManager: OK. Bound to the current session's resources and cleaned up on ctx.dispose; lifecycle management is sound.
packages/core/src/config.ts (1)
31-31: All concerns have been verified; no code changes are needed. Verification results:
Legacy config references ✓ cleaned up
- messageCount only appears as a local variable in group.ts
- maxMessagesCount only appears as a private class property in database_history.ts
- No stale config-level references
i18n translations ✓ complete
- zh-CN.schema.yml (line 42): contains a full Chinese description
- en-US.schema.yml (line 42): contains a full English description

packages/core/src/llm-core/chain/infinite_context_chain.ts (1)
78-111: The compression chain implementation is solid overall, and the return value is compatible with both the text and message channels. Parameter passing and whitespace handling are reasonable; _isAlreadyCompressed also covers both the tag-based and source-marker checks. LGTM.
…orting Fixed inconsistent token scope in compression metrics and threshold checks:
- Added presetTokens to post-compression count for accurate comparison
- Introduced newTotalTokens variable that includes both compressed history and preset tokens
- Updated reducedTokens calculation to use consistent token base (totalTokens - newTotalTokens)
- Fixed reducedPercent to accurately reflect actual token reduction
- Corrected threshold warning to compare newTotalTokens (not tokenCount alone) against threshold

This ensures compression reporting shows true reduction and threshold checks account for preset tokens consistently with pre-compression totals.
Added the missing +3 session-level priming token to the totalTokens count before returning from cropMessages:
- Incremented totalTokens by 3 just before the return statement
- This accounts for the <|start|>assistant<|message|> priming tokens
- Applied after all message token counting and truncation logic
- Ensures consistent token counting between cropMessages and individual message counts

The +3 is added once per session regardless of the truncated branch, preventing accumulation across multiple calls.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/core/src/llm-core/chat/infinite_context.ts (2)
69-71: Make the compression threshold configurable and reserve output tokens. The 0.85 factor is currently hardcoded; expose it via options and subtract the expected output budget (e.g. preset.config.maxOutputToken or the call's maxTokens) when computing the threshold, so the generation budget is not consumed.
Example change (with defaults, preserving current behavior):

 export interface InfiniteContextManagerOptions {
     chatHistory: KoishiChatMessageHistory
     conversationId: string
     preset?: ComputedRef<PresetTemplate>
+    compressionThresholdRatio?: number // default 0.85
+    reserveOutputTokens?: number // if unset, falls back to preset.config.maxOutputToken
 }
@@
-    const invocation = model.invocationParams()
-    const maxTokenLimit =
+    const invocation = model.invocationParams()
+    const maxTokenLimit =
         invocation.maxTokenLimit && invocation.maxTokenLimit > 0
             ? invocation.maxTokenLimit
             : model.getModelMaxContextSize()
@@
-    const threshold = Math.floor(maxTokenLimit * 0.85)
+    const reserved =
+        this.options.reserveOutputTokens ??
+        (this.options.preset?.value?.config?.maxOutputToken ?? 0)
+    const usable = Math.max(0, maxTokenLimit - reserved)
+    const ratio = this.options.compressionThresholdRatio ?? 0.85
+    const threshold = Math.floor(usable * ratio)
298-299: Make the chunk target constants configurable (ratio and floor). 0.15 and 300 are hardcoded, and context windows vary widely across models. Suggest exposing them via options:

 export interface InfiniteContextManagerOptions {
     ...
+    chunkTargetRatio?: number // default 0.15
+    minChunkTokens?: number // default 300
 }
@@
-    const chunkTokenTarget = Math.max(Math.floor(maxTokenLimit * 0.15), 300)
+    const ratio = this.options.chunkTargetRatio ?? 0.15
+    const minChunk = this.options.minChunkTokens ?? 300
+    const chunkTokenTarget = Math.max(Math.floor(maxTokenLimit * ratio), minChunk)
🧹 Nitpick comments (6)
packages/core/src/llm-core/chat/infinite_context.ts (6)
128-134: When the total is still above the threshold, support multiple compression rounds (bounded). Currently a single compression pass is written back; if newTotalTokens remains above the threshold, only a warning is logged. Consider iterating one or two additional rounds in memory (without writing back to history), progressively shrinking the preserve window or increasing the compression strength, until the count drops below the threshold or a round cap is reached, to reduce churn from compression being re-triggered on the next request. A maxCompressionRounds option (default 1) plus a minimum preserve-window floor could control this.
204-213: No AbortSignal passthrough, so long compression calls cannot be cancelled. compressChunk accepts a signal, but none is supplied here, so compression cannot be aborted on request cancellation or timeout. Consider passing an optional AbortSignal down through compressIfNeeded.
Example change:

- async compressIfNeeded(wrapper: ChatLunaLLMChainWrapper): Promise<void> {
+ async compressIfNeeded(
+     wrapper: ChatLunaLLMChainWrapper,
+     signal?: AbortSignal
+ ): Promise<void> {
@@
-     const compressedText = await compressor.compressChunk({
+     const compressedText = await compressor.compressChunk({
          chunk: chunkText,
-         conversationId: this.options.conversationId
-     })
+         conversationId: this.options.conversationId,
+         signal
+     })

If the upper layer already has a unified calling context (e.g. a signal on the wrapper or chain), it can be taken from there instead. Please confirm which source is available.
324-336: Token statistics can avoid repeated work and use light batching
- compressIfNeeded already has stats, yet _compressMessages recomputes tokens for all of mergedMessages, wasting CPU.
- presetTokens is recomputed from the preset messages on every call and could be cached.
Suggestions:
- Reuse the already computed system/preserved token counts and call countMessageTokens only for the newly generated compressedMessages.
- Add a simple cache for presetTokens (keyed by reference or a version number).
Example change (reuse the statistics to skip one full pass):

- const tokenCount = await this._countMessagesTokens(
-     model,
-     mergedMessages
- )
+ const tokenMap = new Map(stats.map(s => [s.message, s.tokens]))
+ const preservedTokens =
+     [...systemStats, ...preserved].reduce((sum, s) => sum + (tokenMap.get(s.message) ?? 0), 0)
+ const newCompressedTokens = await this._countMessagesTokens(model, compressedMessages)
+ const tokenCount = preservedTokens + newCompressedTokens

Optional: process _calculateMessageTokenStats in batches (e.g. 50 messages per batch) with Promise.all, balancing throughput and memory (see the sketch below).
Also applies to: 338-349
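The batching idea could look roughly like this; the helper and callback names are illustrative, not the project's API:

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Hypothetical batched counter: process messages in fixed-size batches so
// token counting runs concurrently without holding every promise at once.
async function countTokensInBatches(
    messages: BaseMessage[],
    countTokens: (message: BaseMessage) => Promise<number>,
    batchSize = 50
): Promise<number[]> {
    const results: number[] = []
    for (let i = 0; i < messages.length; i += batchSize) {
        const batch = messages.slice(i, i + batchSize)
        results.push(...(await Promise.all(batch.map((m) => countTokens(m)))))
    }
    return results
}
```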
155-177: Always preserving the 8 most recent rounds may be unstable; consider token-based or configurable preservation. preserveCount=8 is an empirical value; with long messages or large tool replies, 8 rounds can far exceed the safe budget. Suggestions:
- Make preserveCount an options.preserveRecentCount (default 8).
- Or preserve by tokens instead, keeping recent content until the preserved region reaches N% of the usable window (e.g. 30%).
246-258: Role and markers of the compressed messages. Using name/source markers to avoid re-compression is a sound design. One additional safeguard: have _isCompressedMessage also recognize name === 'infinite_context_ack' (even though source already covers it), in case an externally constructed message omits the additional_kwargs.
56-68: Preset message tokens are recomputed on every call; consider a reference-level cache. Presets are mostly static, and recomputing each time has noticeable cost. Keep a cache on the class (the reference plus its token count) and reuse the previous result whenever the reference of this.options.preset?.value?.messages is unchanged (see the sketch below).
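The reference-level cache can be as simple as remembering the last messages array and its count; an illustrative sketch, not the project's code:

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Hypothetical reference-level cache: recompute preset tokens only when the
// identity of the preset messages array changes.
class PresetTokenCache {
    private lastRef?: BaseMessage[]
    private lastTokens = 0

    async get(
        presetMessages: BaseMessage[],
        countTokens: (messages: BaseMessage[]) => Promise<number>
    ): Promise<number> {
        if (this.lastRef === presetMessages) {
            return this.lastTokens
        }
        this.lastTokens = await countTokens(presetMessages)
        this.lastRef = presetMessages
        return this.lastTokens
    }
}
```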
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/core/src/llm-core/chat/infinite_context.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/core/src/llm-core/chat/infinite_context.ts (5)
packages/core/src/llm-core/memory/message/database_history.ts (1)
- KoishiChatMessageHistory (20-405)
packages/core/src/llm-core/prompt/type.ts (1)
- PresetTemplate (80-110)
packages/core/src/llm-core/chain/infinite_context_chain.ts (3)
- ChatLunaInfiniteContextChain (27-155)
- model (152-154)
- call (129-150)
packages/core/src/utils/string.ts (1)
- getMessageContent (150-166)
packages/core/src/llm-core/platform/model.ts (1)
- ChatLunaChatModel (102-675)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: lint
🔇 Additional comments (1)
packages/core/src/llm-core/chat/infinite_context.ts (1)
96-113: Concurrency risk when clearing and rewriting history; please confirm the calling context. The clear followed by multiple addMessage writes can lose or reorder messages if concurrent requests compress or append at the same time. Please confirm:
- Whether compressIfNeeded only runs on a serialized call path (i.e. no concurrent modification outside the request flow).
- Whether KoishiChatMessageHistory provides transaction/lock or compare-and-swap semantics.
If neither holds, consider wrapping the rewrite in a single atomic interface or adding a mutex (see the sketch below).
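If no transactional write is available, one lightweight option is a per-conversation promise chain that serializes the clear-and-rewrite; a sketch with assumed names, not part of the project:

```typescript
// Hypothetical per-conversation lock built from a promise chain: each rewrite
// waits for the previous one on the same conversation to settle.
const rewriteLocks = new Map<string, Promise<void>>()

function withConversationLock(
    conversationId: string,
    task: () => Promise<void>
): Promise<void> {
    const previous = rewriteLocks.get(conversationId) ?? Promise.resolve()
    const next = previous.then(task, task)
    rewriteLocks.set(conversationId, next)
    return next
}
```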
Updated version across all packages:
- Bumped core package from 1.3.0-alpha.68 to 1.3.0-alpha.69
- Updated peerDependencies in all adapter packages to require ^1.3.0-alpha.69
- Updated peerDependencies in all extension packages to require ^1.3.0-alpha.69
- Updated peerDependencies in all service packages to require ^1.3.0-alpha.69
- Fixed trailing comma in chat.ts for consistent formatting

This version includes infinite context, round-based message truncation, enhanced agent prompts, and token counting fixes.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/core/src/llm-core/platform/model.ts (1)
471-495: Consider extracting a shared utility to avoid duplication. This buildConversationRounds function duplicates the implementation in ChatLunaChatPrompt; refactoring it into a shared utility would improve maintainability and avoid code duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/core/src/llm-core/platform/model.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/core/src/llm-core/platform/model.ts (1)
packages/core/src/index.ts (1)
- logger (38-38)
🔇 Additional comments (5)
packages/core/src/llm-core/platform/model.ts (5)
301-303: countMessageTokens is used correctly. When the API does not return token usage, countMessageTokens is used to compute the completion token count. The method is now public, matching the needs of the infinite-context system.
467-469: System message tokens are counted correctly, using the countMessageTokens method in line with the change that made the API public.
497-503: The batch token counting helper is implemented correctly. countMessagesTokens counts in bulk by iterating over countMessageTokens; the logic is clear and correct.
551-552: Session-level priming tokens are added correctly. This change addresses the earlier review comment by adding the priming tokens (+3) once at the conversation level, avoiding cumulative distortion. The comment clearly explains its purpose.
However, it must be paired with the fix for the duplicate counting at line 614 of countMessageTokens for the token statistics to be fully accurate, per the earlier review comment.
557-617: Use of the now-public method has been verified as correct. Verification results:
- countMessageTokens is called correctly in infinite_context.ts (lines 331 and 345)
- infinite_context.ts correctly imports ChatLunaChatModel (line 8)
- The method is also reused correctly inside model.ts (lines 301, 467, 500)
- prompt.ts maintains its own private _countMessageTokens and is unaffected
- No improper external dependencies or misuse
This change fits the design needs of the infinite-context system, and exposing the API this way is appropriate.
This PR significantly enhances the agent system with automatic context management and improved guidance prompts.
New Features
Infinite Context System
- InfiniteContextManager that monitors token usage and triggers compression at 85% threshold
- ChatLunaInfiniteContextChain for intelligent conversation summarization

Round-Based Message Truncation
- ChatLunaChatPrompt and ChatLunaChatModel for consistency
- _buildConversationRounds() helper to group messages into logical turns

Enhanced Agent Prompts
- after_user_message system prompt to guide agents on task completion evaluation

Sydney Preset Improvements
- built_user_toast and built_user_confirm tools

Configuration Changes
- Replaced messageCount slider (2-500 range) with infiniteContext boolean toggle
- infiniteContext set to true by default for automatic compression
- countMessageTokens() made public in ChatLunaChatModel for external token calculations (see the usage sketch at the end of this description)

Other Changes

Token Management
- _countMessagesTokens() for batch calculations

Code Quality
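Because countMessageTokens() is now public, external callers such as the infinite-context manager can estimate history size directly. A hedged usage sketch; the interface below is reduced to the single method the example needs and is not the real class:

```typescript
import { BaseMessage } from '@langchain/core/messages'

// Minimal shape for the sketch; the real ChatLunaChatModel exposes more.
interface TokenCountingModel {
    countMessageTokens(message: BaseMessage): Promise<number>
}

// Illustrative usage of the now-public token counter.
async function estimateHistoryTokens(
    model: TokenCountingModel,
    messages: BaseMessage[]
): Promise<number> {
    let total = 0
    for (const message of messages) {
        total += await model.countMessageTokens(message)
    }
    return total
}
```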