The Ollama think Field Pollution Bug in Spring AI 2.0.0-M2: Investigation and an Interceptor Workaround

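Background#

While building an AI tutoring app for Singapore primary-school maths, I run the qwen3.5:2b model locally with Ollama. qwen3.5 is a thinking-capable model, and Ollama 0.12+ enables thinking mode by default: the model first writes its reasoning to the thinking field and then writes the final answer to the content field.

The problem: with thinking enabled, inference time balloons from ~16s to ~52s (roughly 3x), and PSLE primary-school maths does not need deep reasoning at all. So I need to turn thinking off.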

Symptom#

Turning thinking off through the API that Spring AI 2.0.0-M2 provides:

chatClient.prompt()
    .system(systemPrompt)
    .user(userMessage)
    .options(OllamaChatOptions.builder().disableThinking().build())
    .call()
    .content();

Ollama returns HTTP 400:

think must be a boolean or string ("high", "medium", "low", true, or false)

Root cause analysis#

Capturing the request body that Spring AI sends to Ollama shows the think field in two places:

{
  "model": "qwen3.5:2b",
  "messages": [...],
  "think": false,
  "options": {
    "temperature": 0.7,
    "think": { "type": "DISABLED" }
  }
}

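Top-level "think": false is correct; Ollama recognizes it.
options.think, a serialized ThinkOption object, is wrong. Ollama's options field only accepts model parameters (temperature, num_predict, and so on) and does not know think, so the leaked key pollutes the request parameters.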
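
To make the difference between the two positions concrete, here is a minimal sketch in plain Java (no Spring AI types) that models the request body as nested maps and checks where think appears. The class and method names (ThinkPlacementCheck, hasTopLevelThink, hasLeakedThink) are made up for illustration and are not part of Spring AI or Ollama.

```java
import java.util.Map;

final class ThinkPlacementCheck {

    /** True when the request carries a top-level "think" flag (the place Ollama expects it). */
    static boolean hasTopLevelThink(Map<String, Object> request) {
        return request.containsKey("think");
    }

    /** True when "think" has leaked into the nested "options" map (the bug described above). */
    static boolean hasLeakedThink(Map<String, Object> request) {
        return request.get("options") instanceof Map<?, ?> options
            && options.containsKey("think");
    }

    public static void main(String[] args) {
        // Same shape as the captured request body, minus the messages array.
        Map<String, Object> polluted = Map.of(
            "model", "qwen3.5:2b",
            "think", false,
            "options", Map.of("temperature", 0.7, "think", Map.of("type", "DISABLED")));

        System.out.println(hasTopLevelThink(polluted)); // true
        System.out.println(hasLeakedThink(polluted));   // true
    }
}
```

A check like this could sit in a unit test that inspects the serialized request, so the workaround can be dropped with confidence once the upstream fix is in place.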

The problem is in the OllamaChatModel.ollamaChatRequest() method:

.options(requestOptions)           // calls requestOptions.toMap(); think leaks into the options map
.think(requestOptions.getThinkOption()) // correctly sets the top-level field

The NON_SUPPORTED_FIELDS list used by OllamaChatOptions.filterNonSupportedFields() is missing "think", so the key is never filtered out of the options map:

// Illustrative excerpt of the Spring AI 2.0.0-M2 source (OllamaChatOptions.java)
public Map<String, Object> toMap() {
    Map<String, Object> options = ModelOptionsUtils.toMap(this);
    return filterNonSupportedFields(options); // "think" is still in options after filtering
}
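
The filtering step can be sketched in isolation. The following is a simplified stand-in, not the Spring AI source: the entries of the deny-list are assumptions chosen for illustration, and the only point is that a key missing from the list survives the filter, while adding "think" to it removes the key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OptionsFilterSketch {

    // Hypothetical deny-list; the real NON_SUPPORTED_FIELDS may contain different entries.
    // "think" is included here to show the effect of the fix.
    static final List<String> NON_SUPPORTED_FIELDS = List.of("model", "format", "keep_alive", "think");

    /** Returns a copy of the options map without the keys Ollama does not accept inside "options". */
    static Map<String, Object> filterNonSupportedFields(Map<String, Object> options) {
        Map<String, Object> filtered = new LinkedHashMap<>(options);
        filtered.keySet().removeAll(NON_SUPPORTED_FIELDS);
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = Map.of(
            "temperature", 0.7,
            "think", Map.of("type", "DISABLED"));
        System.out.println(filterNonSupportedFields(raw).keySet()); // [temperature]
    }
}
```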

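This bug has been fixed upstream in spring-ai#5435, but the fix has not yet shipped in a GA release.

Tip: if your project is already on a Spring AI version that includes the upstream fix (expected from 2.0.0-RC1 onwards), you can skip this post.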

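Approaches that did not work#

Before finding a working workaround, I tried these:

Subclassing OllamaChatOptions and overriding toMap(): failed. When merging options, ModelOptionsUtils.merge() creates a new OllamaChatOptions instance, so the subclass override is lost.
Calling ChatModel.call(new Prompt(..., options)) directly: triggers the same bug, because every path ends up in OllamaChatModel.ollamaChatRequest().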
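
Why the subclass approach fails can be shown with stand-in types. The sketch below does not use Spring AI; Options, PatchedOptions, and merge are hypothetical names that only mimic the behaviour described above, where the merge step builds a fresh base-class instance and the override never runs.

```java
import java.util.HashMap;
import java.util.Map;

final class MergeDropsSubclassDemo {

    /** Stand-in for an options class whose toMap() leaks "think". */
    static class Options {
        Double temperature = 0.7;
        Object think = Map.of("type", "DISABLED");

        Map<String, Object> toMap() {
            Map<String, Object> map = new HashMap<>();
            map.put("temperature", temperature);
            map.put("think", think);
            return map;
        }
    }

    /** Subclass that tries to patch the leak by overriding toMap(). */
    static class PatchedOptions extends Options {
        @Override
        Map<String, Object> toMap() {
            Map<String, Object> map = super.toMap();
            map.remove("think");
            return map;
        }
    }

    /** Stand-in for a merge step that copies fields into a new base-class instance. */
    static Options merge(Options runtime, Options defaults) {
        Options merged = new Options();
        merged.temperature = runtime.temperature != null ? runtime.temperature : defaults.temperature;
        merged.think = runtime.think != null ? runtime.think : defaults.think;
        return merged;
    }

    public static void main(String[] args) {
        Options patched = new PatchedOptions();
        System.out.println(patched.toMap().containsKey("think"));                        // false
        System.out.println(merge(patched, new Options()).toMap().containsKey("think")); // true
    }
}
```

The override works on the instance you create, but the object that reaches the request builder is the merged copy, which is a plain Options again.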

Two workarounds that work#

Option A: call the Ollama API directly with a bare RestClient#

Bypass Spring AI's OllamaChatModel entirely and build the HTTP request yourself:

private final RestClient ollamaClient;

private String callOllama(String systemPrompt, String userMessage) {
    var requestBody = Map.of(
        "model", chatModel,
        "messages", List.of(
            Map.of("role", "system", "content", systemPrompt),
            Map.of("role", "user", "content", userMessage)),
        "stream", false,
        "think", false);  // appears only at the top level, never inside options

    var response = ollamaClient.post()
        .uri("/api/chat")
        .body(requestBody)
        .retrieve()
        .body(Map.class);

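    // Guard against an empty or unexpected response before reading message.content
    if (response == null || !(response.get("message") instanceof Map<?, ?> message)) {
        throw new IllegalStateException("Unexpected Ollama response: " + response);
    }
    return (String) message.get("content");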
}

Option B: fix the request at the HTTP layer with a ClientHttpRequestInterceptor#

Keep ChatClient and register a RestClientCustomizer that removes the leaked think field from the options map before the request goes out:

@Configuration
public class OllamaConfig {

    @Bean
    RestClientCustomizer ollamaThinkFieldFixCustomizer(ObjectMapper objectMapper) {
        return restClientBuilder -> restClientBuilder
            .requestInterceptor((request, body, execution) -> {
                if (body != null && body.length > 0) {
                    try {
                        var tree = objectMapper.readTree(body);
                        if (tree.has("options") && tree.get("options").has("think")) {
                            ((ObjectNode) tree.get("options")).remove("think");
                            body = objectMapper.writeValueAsBytes(tree);
                        }
                    } catch (Exception e) {
                        // Not a JSON request, or parsing failed: send the body unchanged
                    }
                }
                return execution.execute(request, body);
            });
    }

    @Bean
    ChatClient chatClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel).build();
    }
}

The business code goes back to the standard ChatClient call:

private String callLlm(String systemPrompt, String userMessage) {
    return chatClient.prompt()
        .system(systemPrompt)
        .user(userMessage)
        .options(OllamaChatOptions.builder().disableThinking().build())
        .call()
        .content();
}
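
The transformation the interceptor applies can also be isolated as a pure function, which makes it easy to unit-test without an HTTP stack. This is a sketch on nested maps rather than on the Jackson tree used above; ThinkFieldStripper and stripLeakedThink are hypothetical names, not Spring AI APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ThinkFieldStripper {

    /** Returns the request with "think" removed from "options"; the top-level flag is left alone. */
    static Map<String, Object> stripLeakedThink(Map<String, Object> request) {
        if (!(request.get("options") instanceof Map<?, ?> options) || !options.containsKey("think")) {
            return request; // nothing leaked, pass the request through unchanged
        }
        // Copy instead of mutating, so immutable inputs such as Map.of(...) are supported.
        Map<Object, Object> cleanedOptions = new LinkedHashMap<>(options);
        cleanedOptions.remove("think");
        Map<String, Object> cleaned = new LinkedHashMap<>(request);
        cleaned.put("options", cleanedOptions);
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, Object> polluted = Map.of(
            "think", false,
            "options", Map.of("think", Map.of("type", "DISABLED")));
        Map<String, Object> cleaned = stripLeakedThink(polluted);
        System.out.println(cleaned.get("think"));   // false
        System.out.println(cleaned.get("options")); // {}
    }
}
```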

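Trade-off comparison#

	Option A: bare RestClient	Option B: Interceptor
Approach	Bypass `OllamaChatModel` and build the HTTP request directly	Keep `ChatClient` and patch the request body at the HTTP layer
Spring AI features	All lost (prompt template, output parser, advisor chain, observability)	All kept
Amount of code	You manage the RestClient, timeouts, and response parsing yourself	One `RestClientCustomizer` bean (~20 lines)
Switching LLM provider	The call layer must be rewritten	Change `application.yml` only
Later migration cost	High: the code must be rewritten back to ChatClient	Very low: delete the interceptor bean
Potential side effects	None: full control over the request body	The interceptor parses every request made through that RestClient (exceptions must be caught)
Debugging difficulty	Low: the request body is plain to see	Medium: you need to know the interceptor exists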

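Final choice: Option B#

The main reasons for choosing the interceptor approach:

It keeps the Spring AI ecosystem: prompt templates, the advisor chain (logging advisor, retry advisor, and so on), and observability (Micrometer metrics) do not have to be rebuilt. For a project that is still iterating, these features will be needed sooner or later.
The provider stays switchable: the project plans to add cloud models such as DeepSeek-R1 later. Option A ties the code to Ollama's HTTP API format and needs a rewrite to switch; Option B only needs a configuration change.
Removal is cheap: once spring-ai#5435 lands in a release (expected in 2.0.0-RC1), only one @Bean method needs to be deleted. Option A, by contrast, means converting the whole call layer from bare RestClient back to ChatClient.
Risk assessment: the interceptor parses every request body that passes through the RestClient, but in practice Spring AI's Ollama RestClient is only used for Ollama API calls, so nothing else is hit. Keep in mind that a RestClientCustomizer bean is applied to every RestClient built from Spring Boot's auto-configured builder, so this holds only while no other code in the application uses that builder. When parsing fails, the exception is caught and the request is sent unchanged, so normal requests are not affected.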

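Verification#

After the change the behaviour is as expected: the think field no longer leaks into options, and thinking is correctly turned off:

Scenario	Thinking time (Think)	Content generation (Content)	Total time
Thinking on (default)	~26s	~26s	~52s
Thinking off (`disableThinking()` + Interceptor)	~6s	~10s	~16s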

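Summary#

OllamaChatOptions.disableThinking() in Spring AI 2.0.0-M2 has a bug: the think field leaks into the options map, and Ollama responds with HTTP 400.
The bug is fixed upstream (spring-ai#5435), but the fix has not been released yet.
The recommended temporary workaround is the ClientHttpRequestInterceptor approach, which keeps every Spring AI feature and can be removed by deleting a single bean after upgrading.
When you hit a framework bug, prefer the least invasive workaround over abandoning the framework altogether: dropping the framework saves effort in the short term but becomes long-term debt.

Note: in Spring Boot 4.0 the package of RestClientCustomizer moved from org.springframework.boot.web.client to org.springframework.boot.restclient; update the import when migrating.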
