Building a RAG Knowledge Base with Spring AI + pgvector: Singapore Math AI Tutoring, Phase 2 in Practice

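Recap#

Phase 1 built a multi-agent problem-solving pipeline on Java 25 + Spring AI. At that point the Planner Agent ran "bare", with no reference question bank at all. The core problem for Phase 2: let the agents answer with "knowledge" in hand.

Specific goals:

Import 40 real and mock PSLE questions into a vector database
Run RAG retrieval before solving, and use the similar questions found as prompt context
Filter by grade, so a P5 request never sees P6-level reference questions
Cache AI responses for identical questions in Redis, to avoid repeated LLM calls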

Overall architecture changes#

graph TD
    A[SolveRequest] --> B[SolveService - Redis @Cacheable]
    B -->|Cache Miss| C[MathSolverOrchestrator]
    C --> D[RagRetrievalService]
    D --> E[(pgvector - vector_store)]
    D -->|Top-5 similar questions| F[Planner Agent + RAG Context]
    F --> G{StructuredTaskScope}
    G --> H[CPA Designer Agent]
    G --> I[Persona Agent]
    H --> J[SolveResult]
    I --> J
    J --> K[(Redis Cache - 24h TTL)]
    B -->|Cache Hit| L[Return cached result directly]
    classDef cache fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#e65100
    classDef rag fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e7d32
    classDef agent fill:#ffffff,stroke:#1976d2,stroke-width:3px,color:#0d47a1
    classDef db fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    class B,K,L cache
    class D,E rag
    class C,F,H,I agent
    class J db

The Phase 1 data flow was request → Planner → CPA + Persona → response.

Phase 2 adds two layers:

In front: a RAG retrieval layer (RagRetrievalService)
Around everything: a Redis cache layer (SolveService + @Cacheable)
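
To make the layering concrete, here is a minimal plain-Java sketch of the flow, not the project's code. It rests on several assumptions: an in-memory map stands in for Redis, the agents are stubbed with string concatenation, and CompletableFuture replaces StructuredTaskScope (a preview API) so the sketch compiles without extra flags. Every name in it (PipelineSketch, runPipeline, retrieve, pipelineRuns) is hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the Phase 2 flow: cache -> RAG -> Planner -> (CPA || Persona).
final class PipelineSketch {

    record SolveResult(String plan, String cpa, String persona) {}

    // Stand-in for Redis; the real project relies on @Cacheable instead.
    private final Map<String, SolveResult> cache = new ConcurrentHashMap<>();

    // Counts how often the expensive pipeline actually ran (for demonstration only).
    final AtomicInteger pipelineRuns = new AtomicInteger();

    SolveResult solve(String question, int grade) {
        // Outer layer: normalized question + grade is the cache key.
        String key = question.trim().toLowerCase(Locale.ROOT) + ":" + grade;
        return cache.computeIfAbsent(key, k -> runPipeline(question.trim(), grade));
    }

    private SolveResult runPipeline(String question, int grade) {
        pipelineRuns.incrementAndGet();

        // Front layer: RAG retrieval happens before the Planner is called.
        List<String> similar = retrieve(question, grade);
        String plan = "plan for '" + question + "' using " + similar.size() + " references";

        // CPA and Persona only depend on the plan, so they can run in parallel.
        CompletableFuture<String> cpa =
                CompletableFuture.supplyAsync(() -> "bar model for: " + plan);
        CompletableFuture<String> persona =
                CompletableFuture.supplyAsync(() -> "P" + grade + " explanation for: " + plan);

        return new SolveResult(plan, cpa.join(), persona.join());
    }

    // Stubbed retrieval; the real service queries pgvector.
    private List<String> retrieve(String question, int grade) {
        return List.of("similar question 1", "similar question 2");
    }
}
```

Calling solve twice with the same question and grade runs the stubbed pipeline once; changing the grade produces a different key and a second run.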

Building the PSLE question bank#

Question format design#

Each question consists of content (the question text) and metadata (structured metadata):

{
  "content": "The ratio of boys to girls in a class is 3:5. If there are 24 boys, how many students are there altogether?",
  "metadata": {
    "grade": 5,
    "topic": "ratio.basic",
    "difficulty": "medium",
    "source": "PSLE Practice"
  }
}

The grade field in metadata drives RAG filtering: when a student asks a P5 question, only similar questions from P1-P5 are retrieved, and no P6 content is returned.
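
Because the filter compares grade numerically, it helps to catch malformed entries before they reach the vector store. The record below is a minimal sketch of such a check, assuming the five fields shown in the JSON above. The Question type, its validation rules, and the topic pattern are illustrative and not taken from the project.

```java
import java.util.Map;

// Hypothetical model of one question-bank entry; field names mirror the JSON above.
record Question(String content, int grade, String topic, String difficulty, String source) {

    Question {
        if (content == null || content.isBlank()) {
            throw new IllegalArgumentException("content is required");
        }
        // Primary 1 to Primary 6; grade stays numeric so "grade <= N" compares numbers.
        if (grade < 1 || grade > 6) {
            throw new IllegalArgumentException("grade must be 1..6, got " + grade);
        }
        // Assumed code shape: lowercase "domain.subtopic", e.g. "ratio.basic".
        if (topic == null || !topic.matches("[a-z_]+\\.[a-z_]+")) {
            throw new IllegalArgumentException("topic must look like 'domain.subtopic'");
        }
        if (difficulty == null || source == null) {
            throw new IllegalArgumentException("difficulty and source are required");
        }
    }

    // Flat metadata map in the shape that sits next to the embedding.
    Map<String, Object> metadata() {
        return Map.of(
                "grade", grade,
                "topic", topic,
                "difficulty", difficulty,
                "source", source);
    }
}
```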

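Topic code system#

Topic codes were compiled from the 2026 PSLE syllabus:

Grade	Code	Example
P4	`fractions.of_remainder`	"Gave away 1/4, then 1/3 of the remainder…"
P4	`measurement.area_perimeter`	"Area and perimeter of a rectangle"
P5	`ratio.basic` / `ratio.difference` / `ratio.before_after`	Ratio problems
P5	`average.combined_groups`	Combined averages
P6	`algebra.forming_equation`	"Form an equation in x and solve"
P6	`algebra.simultaneous_concept`	Chickens-and-rabbits (heads and legs) problems

The final bank holds 40 questions covering the core P4-P6 topics: algebra (13), ratio (9), average (4), fractions (6), measurement (3), and others.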
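
A per-domain tally like the one above can be derived from the codes themselves, since each code is a domain prefix followed by a subtopic. The helper below is a small sketch of that idea; the TopicCodes class and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for checking topic coverage by domain.
final class TopicCodes {

    // "ratio.before_after" -> "ratio"
    static String domainOf(String topicCode) {
        int dot = topicCode.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("not a domain.subtopic code: " + topicCode);
        }
        return topicCode.substring(0, dot);
    }

    // Counts questions per domain; TreeMap keeps the report alphabetically sorted.
    static Map<String, Long> countByDomain(List<String> topicCodes) {
        return topicCodes.stream()
                .collect(Collectors.groupingBy(
                        TopicCodes::domainOf, TreeMap::new, Collectors.counting()));
    }
}
```

Running countByDomain over the topic field of every entry gives a quick view of which domains are thin before importing.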

Spring AI VectorStore integration#

PgVectorStore auto-configuration#

Spring AI's PgVectorStore auto-configuration is very smooth. It only needs the following in application-dev.yml:

spring:
  ai:
    ollama:
      embedding:
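        model: nomic-embed-text    # 768-dimensional vectors
    vectorstore:
      pgvector:
        initialize-schema: true    # create the vector_store table automatically
        dimensions: 768            # must match the embedding model
        index-type: hnsw           # HNSW index

With initialize-schema: true, Spring AI creates the vector_store table and its HNSW index automatically. No hand-written DDL is needed.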

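Importing the question bank at startup#

QuestionImportService listens for ApplicationReadyEvent. At application startup it checks whether vector_store is empty and, if so, imports the question bank automatically: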

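// Jackson 2 package names are assumed here; Jackson 3 uses tools.jackson.* instead.
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class QuestionImportService {

    private static final Logger log =
            LoggerFactory.getLogger(QuestionImportService.class);

    private final VectorStore vectorStore;
    private final JdbcTemplate jdbcTemplate;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public QuestionImportService(VectorStore vectorStore, JdbcTemplate jdbcTemplate) {
        this.vectorStore = vectorStore;
        this.jdbcTemplate = jdbcTemplate;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void importQuestionsOnStartup() {
        Long count = jdbcTemplate.queryForObject(
                "SELECT count(*) FROM vector_store", Long.class);
        if (count != null && count > 0) {
            log.info("Vector store already has {} docs, skip", count);
            return;
        }

        log.info("Vector store is empty, importing...");
        importQuestions();
    }

    @SuppressWarnings("unchecked")
    public void importQuestions() {
        var resource = new ClassPathResource("data/sg-math-questions.json");
        List<Map<String, Object>> questions;
        try (var in = resource.getInputStream()) {
            questions = objectMapper.readValue(in, new TypeReference<>() {});
        } catch (IOException e) {
            // Reading the classpath resource can fail; surface it as unchecked
            throw new UncheckedIOException("Failed to read question bank", e);
        }

        List<Document> documents = questions.stream()
                .map(q -> new Document(
                    (String) q.get("content"),
                    (Map<String, Object>) q.get("metadata")))
                .toList();

        vectorStore.add(documents);  // calls the embedding API, then writes to PG
    }
}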

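The single line vectorStore.add(documents) does three things:

Calls Ollama's nomic-embed-text model to generate 768-dimensional vectors
Writes content + embedding + metadata into the vector_store table
Triggers an automatic update of the HNSW index

Verify after startup: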

SELECT count(*) FROM vector_store;
-- 40

SELECT content, metadata FROM vector_store LIMIT 1;
-- content: "Ali had 120 stickers..."
-- metadata: {"grade": 4, "topic": "fractions.of_remainder", ...}

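Spring Boot 4.0 pitfall: ObjectMapper#

In Spring Boot 4.0, the ObjectMapper bean is no longer auto-registered by spring-boot-starter-web (at least under some configuration combinations). Injecting it directly with @Autowired ObjectMapper fails with NoSuchBeanDefinitionException. A likely cause is that Spring Boot 4.0 defaults to Jackson 3, whose mapper classes live under tools.jackson.databind, so a Jackson 2 com.fasterxml.jackson.databind.ObjectMapper bean is not provided by default.

The fix is simple: create one inside the service:

private final ObjectMapper objectMapper = new ObjectMapper();

That is entirely sufficient for our scenario, because we only need the most basic JSON deserialization.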

RAG retrieval + grade filtering#

RagRetrievalService#

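import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.stereotype.Service;

@Service
public class RagRetrievalService {

    private static final int TOP_K = 5;
    private static final double SIMILARITY_THRESHOLD = 0.5;

    private final VectorStore vectorStore;

    // Constructor injection, so the final field is initialized
    public RagRetrievalService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }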

    public List<Document> retrieveSimilarQuestions(
            String question, int grade) {
        var filterBuilder = new FilterExpressionBuilder();
        var filter = filterBuilder.lte("grade", grade).build();

        SearchRequest searchRequest = SearchRequest.builder()
                .query(question)
                .topK(TOP_K)
                .similarityThreshold(SIMILARITY_THRESHOLD)
                .filterExpression(filter)
                .build();

        return vectorStore.similaritySearch(searchRequest);
    }

    public String formatAsContext(List<Document> documents) {
        var sb = new StringBuilder(
            "=== Similar Questions from PSLE Question Bank ===\n\n");
        for (int i = 0; i < documents.size(); i++) {
            Document doc = documents.get(i);
            sb.append("Question %d: %s\n".formatted(
                i + 1, doc.getText()));
            sb.append("  Topic: %s\n".formatted(
                doc.getMetadata().get("topic")));
            sb.append("  Difficulty: %s\n".formatted(
                doc.getMetadata().get("difficulty")));
            sb.append("\n");
        }
        return sb.toString();
    }
}

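Key design points:

FilterExpressionBuilder.lte("grade", grade): uses Spring AI's filter DSL for grade filtering. A P5 query applies grade <= 5, so P6 algebra questions are never returned
similarityThreshold(0.5): a cosine-similarity threshold of 0.5 that filters out unrelated results
topK(5): returns the 5 most similar questions as context
formatAsContext(): formats the retrieved results as plain text and injects them into the Planner Agent's user prompt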
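
The way these three knobs interact can be shown with an in-memory sketch. It is not how pgvector executes the query (that happens in SQL against the HNSW index); it only mirrors the semantics under the assumption that the filter runs first, then the similarity threshold, then the top-K cut. The RetrievalSketch class and the toy two-dimensional vectors are made up for illustration.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of "grade filter + similarity threshold + top-K".
final class RetrievalSketch {

    record Entry(String content, int grade, double[] embedding) {}

    record Hit(String content, double similarity) {}

    // Cosine similarity: dot product divided by the product of the vector norms.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Hit> search(List<Entry> store, double[] query,
                            int grade, int topK, double threshold) {
        return store.stream()
                .filter(e -> e.grade() <= grade)                       // metadata filter
                .map(e -> new Hit(e.content(), cosine(query, e.embedding())))
                .filter(h -> h.similarity() >= threshold)              // drop weak matches
                .sorted(Comparator.comparingDouble(Hit::similarity).reversed())
                .limit(topK)                                           // keep the best K
                .toList();
    }
}
```

With a query vector of (1, 0) and grade 5, an identical P6 entry is excluded by the filter, an orthogonal P4 entry falls below the 0.5 threshold, and the remaining hits come back ordered by similarity.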

Injecting into the agent chain#

The Planner Agent's user prompt now has three parts:

private String runPlannerAgent(
        SolveRequest request, String ragContext) {
    String userMessage = """
        Grade: P%d
        Question: %s

        %s
        """.formatted(
            request.grade(),
            request.question(),
            ragContext);

    return chatClient.prompt()
            .system(PLANNER_SYSTEM_PROMPT)
            .user(userMessage)
            .call()
            .content();
}

The user prompt actually sent to the LLM looks like this:

Grade: P5
Question: The ratio of boys to girls is 4:5. There are 36 students. How many girls are there?

=== Similar Questions from PSLE Question Bank ===

Question 1: The ratio of boys to girls in a class is 3:5. If there are 24 boys, how many students are there altogether?
  Topic: ratio.basic
  Difficulty: easy

Question 2: The ratio of the number of red beads to blue beads is 2:5. If there are 30 more blue beads than red beads, how many beads are there in total?
  Topic: ratio.difference
  Difficulty: medium
...

This lets the LLM draw on solution patterns from similar question types when answering, which noticeably improves output quality.
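
One case the snippets above do not show is a retrieval that returns nothing, for example when no question clears the similarity threshold. The sketch below is one possible way to assemble the same three-part prompt while leaving out the reference header when there are no hits. PromptSketch and Reference are hypothetical names, and whether the project handles this case at all is not stated in the source.

```java
import java.util.List;

// Hypothetical prompt assembly that skips the RAG section when retrieval is empty.
final class PromptSketch {

    record Reference(String text, String topic, String difficulty) {}

    static String buildUserMessage(int grade, String question, List<Reference> refs) {
        var sb = new StringBuilder();
        sb.append("Grade: P%d\n".formatted(grade));
        sb.append("Question: %s\n".formatted(question));

        if (refs.isEmpty()) {
            // No similar questions: send the question alone rather than an empty header.
            return sb.toString();
        }

        sb.append("\n=== Similar Questions from PSLE Question Bank ===\n\n");
        for (int i = 0; i < refs.size(); i++) {
            Reference r = refs.get(i);
            sb.append("Question %d: %s\n".formatted(i + 1, r.text()));
            sb.append("  Topic: %s\n".formatted(r.topic()));
            sb.append("  Difficulty: %s\n\n".formatted(r.difficulty()));
        }
        return sb.toString();
    }
}
```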

Log verification#

INFO  c.m.service.RagRetrievalService : RAG retrieval returned 5 similar questions for grade <= 5
INFO  c.m.agent.MathSolverOrchestrator : RAG retrieval completed, found 5 similar questions
INFO  c.m.agent.MathSolverOrchestrator : Starting Planner Agent for grade 5 question

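System Prompt optimization#

Phase 2 substantially strengthened the system prompts of all three agents:

Planner Agent#

Phase 1 (baseline): only required structured JSON output and the CPA (Concrete-Pictorial-Abstract) method.

Phase 2 (enhanced): adds three pieces:

PSLE scoring criteria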

## PSLE 2026 Scoring Criteria
- Full marks require: correct answer + complete working + proper units
- Method marks: awarded for correct approach even if final answer is wrong
- Presentation: numbered steps, one operation per step, clear labelling

Topic code system: ensures knowledgeTags use the standard codes rather than ad hoc names
RAG reference instruction: "Reference similar questions from the knowledge base when the approach is applicable"

CPA Designer Agent#

New in Phase 2: Bar Model design rules

## Bar Model Design Rules
- Use proportional bar lengths to represent quantities
- For ratio problems: draw bars side by side with equal unit lengths
- For fraction problems: divide a single bar into equal parts
- For algebra: use a bar with unknown length labeled with the variable

Persona Agent#

New in Phase 2: grade-tiered output

- For P1-P3: use concrete objects (sweets, toys, stickers)
- For P4-P5: use relatable scenarios (sharing pizza, collecting cards)
- For P6: use slightly more mature contexts while keeping it fun

Instructions to provide "Common mistakes" and a "Follow-up practice question" were also added, making the guidance for parents more targeted.
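
The source does not say whether all three tiers are sent to the model or whether one is selected in code. If the tier were chosen in code, a sketch could look like the following. The PersonaTiers class is hypothetical, and the hint strings are copied from the prompt excerpt above.

```java
// Hypothetical selection of the persona style hint by grade.
final class PersonaTiers {

    static String styleHintFor(int grade) {
        return switch (grade) {
            case 1, 2, 3 -> "use concrete objects (sweets, toys, stickers)";
            case 4, 5 -> "use relatable scenarios (sharing pizza, collecting cards)";
            case 6 -> "use slightly more mature contexts while keeping it fun";
            default -> throw new IllegalArgumentException("grade must be 1..6, got " + grade);
        };
    }
}
```

Selecting the tier in code keeps the system prompt shorter, at the cost of moving part of the prompt logic out of the prompt text.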

Redis caching#

Caching strategy#

The AI response for the same question + same grade is cached for 24 hours. Cache key structure:

solveResults::the ratio of boys to girls is 4:5...:5

@Service
public class SolveService {

    private final MathSolverOrchestrator orchestrator;

    @Cacheable(
        value = "solveResults",
        key = "#request.question().trim().toLowerCase()"
            + " + ':' + #request.grade()")
    public SolveResult solve(SolveRequest request) {
        log.info("Cache miss - running full agent pipeline");
        return orchestrator.solve(request);
    }
}
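
The SpEL key expression is easier to reason about as plain Java. The sketch below reproduces the same normalization (trim, lowercase, append the grade) in a hypothetical CacheKeys helper. One deliberate difference, stated as an assumption: it passes Locale.ROOT so lowercasing does not depend on the server's default locale, whereas the SpEL expression calls toLowerCase() without a locale.

```java
import java.util.Locale;

// Hypothetical plain-Java equivalent of the @Cacheable key expression.
final class CacheKeys {

    static String solveKey(String question, int grade) {
        // Same steps as the SpEL: trim, lowercase, then ':' + grade.
        return question.trim().toLowerCase(Locale.ROOT) + ":" + grade;
    }
}
```

Only leading and trailing whitespace and letter case are normalized. Two questions that differ in inner spacing or punctuation still get different keys and therefore separate LLM calls.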

Redis configuration#

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager(
            RedisConnectionFactory connectionFactory) {
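        // Bind the serializer to the cached type: with Object.class, cache hits
        // deserialize to a LinkedHashMap and fail the cast to SolveResult.
        var jsonSerializer =
            new Jackson2JsonRedisSerializer<>(SolveResult.class);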

        RedisCacheConfiguration config =
            RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofHours(24))
                .serializeValuesWith(
                    RedisSerializationContext.SerializationPair
                        .fromSerializer(jsonSerializer))
                .disableCachingNullValues();

        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(config)
                .build();
    }
}

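Note: under Spring Boot 4.0, GenericJackson2JsonRedisSerializer is marked @Deprecated(forRemoval = true), so Jackson2JsonRedisSerializer is used instead.

Results#

First request (cache miss): the full agent pipeline takes ~2 min (local Ollama)
Subsequent identical requests (cache hit): returned straight from Redis in <50 ms

Summary of files added in Phase 2#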

backend/src/main/
├── java/com/mathlearning/
│   ├── config/
│   │   └── CacheConfig.java          # Redis cache configuration
│   └── service/
│       ├── QuestionImportService.java # automatic question-bank import
│       ├── RagRetrievalService.java   # RAG retrieval + grade filtering
│       └── SolveService.java          # caching wrapper
└── resources/
    └── data/
        └── sg-math-questions.json     # 40-question PSLE bank

Modified files:

MathSolverOrchestrator.java: adds RAG injection + enhanced system prompts
SolveController.java: now calls SolveService (cached) instead of invoking the Orchestrator directly

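Phase 2 summary#

Task	Status
Write the 40-question PSLE bank as JSON	✅
PgVectorStore integration + automatic import	✅
RAG retrieval injected into the Planner Agent	✅
Grade filtering (grade ≤ N)	✅
System prompt optimization (PSLE scoring criteria + topic codes)	✅
Redis 24h cache	✅

Key takeaways:

Spring AI's VectorStore abstraction is well done. One line, vectorStore.add(documents), handles embedding + storage + indexing, and similaritySearch() supports metadata filtering through the filter DSL, so almost no SQL is needed.
RAG is not hard to implement; the hard part is data quality. Even a small bank of 40 questions noticeably improves agent output. The difficulty lies in getting broad enough coverage, accurate metadata labels, and a well-suited embedding model.
Redis caching is the most cost-effective optimization. A full agent chain on local Ollama takes ~2 minutes per run; with the cache, identical questions return almost instantly, which makes a huge difference to the user experience.
Spring Boot 4.0 still has a few "surprises", such as the ObjectMapper and GenericJackson2JsonRedisSerializer changes, so keep an eye on the migration guide.

Phase 3 plan: build a front-end prototype with Kotlin Multiplatform + Compose for Web (Wasm).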
