* feat: search quality boost - compiled truth ranking, detail parameter, cosine re-scoring

  Compiled truth chunks now rank 2x higher in hybrid search via RRF normalization + source boost. New --detail flag (low/medium/high) controls timeline inclusion. Cosine re-scoring blends query-chunk similarity before dedup for query-specific ranking.

  Also: remove DISTINCT ON from keyword search (dedup handles per-page capping), add chunk_id + chunk_index to SearchResult, add getEmbeddingsByChunkIds to BrainEngine interface.

  Inspired by Ramp Labs' "Latent Briefing" paper (April 2026).

* feat: RRF normalization, source-aware dedup, detail param in operations

  RRF scores normalized to 0-1 before 2.0x compiled truth boost. Source-aware dedup guarantees compiled truth chunk per page. Detail parameter added to query operation, dedupResults added to bare search operation. Debug logging via GBRAIN_SEARCH_DEBUG=1.

* chore: bump version and changelog (v0.8.1)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CJK word count in query expansion

  CJK text is not space-delimited. A query like "向量搜索优化" was counted as 1 word and silently skipped expansion. Now counts characters for CJK queries instead of space-separated tokens.

  Co-Authored-By: YIING99 <yiing99@users.noreply.github.com>

* feat: retrieval evaluation harness - P@k, R@k, MRR, nDCG@k + gbrain eval

  Full IR evaluation framework: precisionAtK, recallAtK, mrr, ndcgAtK metrics with runEval() orchestrator. gbrain eval CLI with single-run table and A/B comparison mode (--config-a / --config-b) for parameter tuning. HybridSearchOpts now accepts rrfK and dedupOpts overrides.

  Co-Authored-By: 4shut0sh <4shut0sh@users.noreply.github.com>

* test: search quality tests - RRF boost, dedup guarantee, cosine similarity, E2E benchmark

  42 new tests across 3 files:
  - test/search.test.ts: RRF normalization, compiled truth 2x boost, dedup key collision prevention, cosine similarity edge cases, CJK word count detection
  - test/dedup.test.ts: source-aware compiled truth guarantee, layer interactions, custom maxPerPage, empty/single result edge cases
  - test/e2e/search-quality.test.ts: full pipeline against PGLite with basis vector embeddings - chunk_id/chunk_index fields, detail parameter filtering, getEmbeddingsByChunkIds, keyword multi-chunk, vector ordering

  Also: export rrfFusion + cosineSimilarity for unit testing, fix PGLite getEmbeddingsByChunkIds to parse string vectors from pgvector.

* test: search quality benchmark with A/B comparison (baseline vs PR#64)

  Benchmark measures P@1, MRR, nDCG@5, and source accuracy across 8 queries against 5 seeded pages. Key finding: boost helps entity lookups but over-corrects temporal queries. Validates the --detail parameter as the right control mechanism. Output at docs/benchmarks/2026-04-13.md.

* feat: query intent classifier - auto-selects detail level, 100% source accuracy

  Zero-latency heuristic classifier detects query intent from text patterns:
  - "Who is Pedro?" → entity → detail=low (compiled truth only)
  - "When did we last meet?" → temporal → detail=high (no boost, natural ranking)
  - "Variant fund announcement" → event → detail=high
  - General queries → detail=medium (default with boost)

  The key insight: skip the 2.0x compiled truth boost for detail=high queries. Temporal/event queries want natural ranking where timeline entries can win.

  Benchmark results (source accuracy = does the top chunk match expected type):
  - Baseline: 100% (already good, no boost needed)
  - Boost only: 71.4% (boost over-corrects temporal queries)
  - Boost + intent classifier: 100% (best of both worlds)

  35 unit tests for the classifier. 590 total tests pass.

* feat: query intent classifier - auto-selects detail level, 100% source accuracy

  Heuristic classifier detects query intent from text patterns (zero latency, no LLM call). Maps temporal queries ("when did we last meet") to detail=high, entity queries ("who is X") to detail=low, events to detail=high.

  Benchmark results (29 pages, 20 queries, graded relevance):
  - Baseline: P@1=0.947, MRR=0.974, source accuracy=89.5%
  - Boost only: P@1=0.895, MRR=0.939, source accuracy=63.2% (over-correction)
  - Boost + intent: P@1=0.947, MRR=0.974, source accuracy=89.5% (fully recovered)

  The intent classifier eliminates the boost's over-correction on temporal queries while preserving its benefits for entity lookups. 35 unit tests for the classifier.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: search quality benchmark with A/B comparison (baseline vs PR#64)

  Rich benchmark: 29 pages, 58 chunks, 20 queries with graded relevance. Now measures CHUNK-LEVEL quality, not just page-level retrieval.

  Key findings (C. Boost+Intent vs A. Baseline):
  - Unique pages in top-10: 7.2 → 8.7 (+21% broader coverage)
  - Compiled truth ratio: 51.6% → 66.8% (+15pp more signal)
  - CT-first rate: 100% (compiled truth leads for entity queries)
  - Timeline accessible: 100% (temporal queries still find dates)
  - Source accuracy: 89.5% maintained (intent classifier prevents regression)

  The boost alone (B) causes -26pp source accuracy regression. Intent classifier (C) recovers it fully.

* docs: clean benchmark report - ELI10 search quality analysis for PR#64

  Replaces two drafts with one clean report. Explains what changed, why it matters, and what the numbers mean. All fictional data, no private info.

  Key findings: 21% more page coverage per query, 29% more compiled truth in results. Intent classifier prevents boost from burying timeline for temporal queries. Full per-query breakdown with before/after comparison.

* chore: remove auto-generated benchmark file (clean version is 2026-04-14-search-quality.md)

* docs: update project documentation for search quality boost

  CLAUDE.md: added search/intent.ts, search/eval.ts, commands/eval.ts to key files. Added 5 new test files (search, dedup, intent, eval, e2e/search-quality). Updated test count from 23+4 to 28+5. Added docs/benchmarks/ to key files.

  README.md: updated search pipeline diagram with intent classifier, RRF normalization, compiled truth boost, cosine re-scoring, and 5-layer dedup. Added --detail flag explanation and benchmark instructions.

  CHANGELOG.md: added search quality entries to v0.9.3 (intent classifier, --detail flag, gbrain eval, CJK fix). Credited @4shut0sh and @YIING99.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: headline benchmark gains in changelog

* docs: add community attribution rule to CHANGELOG voice section

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: YIING99 <yiing99@users.noreply.github.com>
Co-authored-by: 4shut0sh <4shut0sh@users.noreply.github.com>
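The commit message describes normalizing RRF scores to 0-1, applying a 2.0x compiled truth boost, and skipping that boost for detail=high queries. The following is a minimal sketch of that idea only, not the project's `rrfFusion` implementation: the `Ranked` shape, the `fuseAndBoost` name, and the `rrfK = 60` default are all assumptions made for illustration, and the timeline filtering that detail=low implies is left out.

```typescript
// Hypothetical types for illustration; not the project's SearchResult.
type Source = 'compiled_truth' | 'timeline';
type Detail = 'low' | 'medium' | 'high';
interface Ranked { chunkId: number; source: Source; }
interface Scored extends Ranked { score: number; }

// Fuse several ranked lists with reciprocal rank fusion, normalize the
// fused scores to 0-1, then boost compiled truth unless detail is 'high'.
function fuseAndBoost(
  lists: Ranked[][],
  detail: Detail = 'medium',
  rrfK = 60,   // assumed constant, not taken from the source
  boost = 2.0, // boost factor named in the commit message
): Scored[] {
  const fused = new Map<number, Scored>();
  for (const list of lists) {
    list.forEach((item, index) => {
      // RRF contribution: 1 / (k + rank), with rank starting at 1.
      const contribution = 1 / (rrfK + index + 1);
      const prev = fused.get(item.chunkId);
      fused.set(item.chunkId, {
        chunkId: item.chunkId,
        source: item.source,
        score: (prev ? prev.score : 0) + contribution,
      });
    });
  }

  const all = Array.from(fused.values());
  const max = Math.max(0, ...all.map((s) => s.score));

  return all
    .map((s) => {
      const normalized = max > 0 ? s.score / max : 0;
      const applyBoost = detail !== 'high' && s.source === 'compiled_truth';
      return { ...s, score: applyBoost ? normalized * boost : normalized };
    })
    .sort((a, b) => b.score - a.score);
}

// Usage: a timeline chunk that wins both lists stays on top only when the
// boost is skipped (detail = 'high').
const keywordHits: Ranked[] = [
  { chunkId: 1, source: 'timeline' },
  { chunkId: 2, source: 'compiled_truth' },
];
const vectorHits: Ranked[] = [
  { chunkId: 1, source: 'timeline' },
  { chunkId: 2, source: 'compiled_truth' },
];
console.log(fuseAndBoost([keywordHits, vectorHits], 'medium').map((r) => r.chunkId));
console.log(fuseAndBoost([keywordHits, vectorHits], 'high').map((r) => r.chunkId));
```

Normalizing before boosting keeps the 2.0x factor meaningful regardless of how many lists are fused, since raw RRF scores shrink or grow with the number of inputs.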
164 lines
5.6 KiB
TypeScript
/**
 * Query Intent Classifier tests
 */

import { describe, test, expect } from 'bun:test';
import { classifyQueryIntent, autoDetectDetail } from '../src/core/search/intent.ts';

describe('classifyQueryIntent', () => {
  describe('entity queries', () => {
    test('"Who is Pedro?" → entity', () => {
      expect(classifyQueryIntent('Who is Pedro?')).toBe('entity');
    });

    test('"What does Variant do?" → entity', () => {
      expect(classifyQueryIntent('What does Variant do?')).toBe('entity');
    });

    test('"Tell me about Brex" → entity', () => {
      expect(classifyQueryIntent('Tell me about Brex')).toBe('entity');
    });

    test('"What is the ownership economy?" → entity', () => {
      expect(classifyQueryIntent('What is the ownership economy?')).toBe('entity');
    });

    test('"Summarize Pedro" → entity', () => {
      expect(classifyQueryIntent('Summarize Pedro')).toBe('entity');
    });

    test('"Background on Variant Fund" → entity', () => {
      expect(classifyQueryIntent('Background on Variant Fund')).toBe('entity');
    });

    test('"What do we know about Brex?" → entity', () => {
      expect(classifyQueryIntent('What do we know about Brex?')).toBe('entity');
    });
  });

  describe('temporal queries', () => {
    test('"When did we last meet Pedro?" → temporal', () => {
      expect(classifyQueryIntent('When did we last meet Pedro?')).toBe('temporal');
    });

    test('"Recent updates on Variant" → temporal', () => {
      expect(classifyQueryIntent('Recent updates on Variant')).toBe('temporal');
    });

    test('"Meeting notes about Pedro" → temporal', () => {
      expect(classifyQueryIntent('Meeting notes about Pedro')).toBe('temporal');
    });

    test('"What\'s new with Brex?" → temporal', () => {
      expect(classifyQueryIntent("What's new with Brex?")).toBe('temporal');
    });

    test('"Last conversation with Jesse" → temporal', () => {
      expect(classifyQueryIntent('Last conversation with Jesse')).toBe('temporal');
    });

    test('"Timeline of Variant" → temporal', () => {
      expect(classifyQueryIntent('Timeline of Variant')).toBe('temporal');
    });

    test('"History with Pedro" → temporal', () => {
      expect(classifyQueryIntent('History with Pedro')).toBe('temporal');
    });

    test('"Updates from last month" → temporal', () => {
      expect(classifyQueryIntent('Updates from last month')).toBe('temporal');
    });

    test('"Latest on Brex" → temporal', () => {
      expect(classifyQueryIntent('Latest on Brex')).toBe('temporal');
    });

    test('"How long ago did we meet Jesse?" → temporal', () => {
      expect(classifyQueryIntent('How long ago did we meet Jesse?')).toBe('temporal');
    });

    test('"2024-03 Pedro" → temporal (date pattern)', () => {
      expect(classifyQueryIntent('2024-03 Pedro')).toBe('temporal');
    });
  });

  describe('event queries', () => {
    test('"Variant fund announcement" → event', () => {
      expect(classifyQueryIntent('Variant fund announcement')).toBe('event');
    });

    test('"Brex launched new product" → event', () => {
      expect(classifyQueryIntent('Brex launched new product')).toBe('event');
    });

    test('"Series B raised $50M" → event', () => {
      expect(classifyQueryIntent('Series B raised $50M')).toBe('event');
    });

    test('"Brex IPO" → event', () => {
      expect(classifyQueryIntent('Brex IPO')).toBe('event');
    });

    test('"What happened with the acquisition" → event', () => {
      expect(classifyQueryIntent('What happened with the acquisition')).toBe('event');
    });
  });

  describe('full context queries → temporal', () => {
    test('"Give me everything on Pedro" → temporal', () => {
      expect(classifyQueryIntent('Give me everything on Pedro')).toBe('temporal');
    });

    test('"Full history with Variant" → temporal', () => {
      expect(classifyQueryIntent('Full history with Variant')).toBe('temporal');
    });

    test('"All information about Brex" → temporal', () => {
      expect(classifyQueryIntent('All information about Brex')).toBe('temporal');
    });

    test('"Deep dive on AI philosophy" → temporal', () => {
      expect(classifyQueryIntent('Deep dive on AI philosophy')).toBe('temporal');
    });
  });

  describe('general queries', () => {
    test('"AI changes who gets to build" → general', () => {
      expect(classifyQueryIntent('AI changes who gets to build')).toBe('general');
    });

    test('"fintech payments infrastructure" → general', () => {
      expect(classifyQueryIntent('fintech payments infrastructure')).toBe('general');
    });

    test('"Pedro Brex" → general (bare entity name)', () => {
      expect(classifyQueryIntent('Pedro Brex')).toBe('general');
    });

    test('"crypto web3 ownership" → general', () => {
      expect(classifyQueryIntent('crypto web3 ownership')).toBe('general');
    });
  });
});

describe('autoDetectDetail', () => {
  test('entity queries → low', () => {
    expect(autoDetectDetail('Who is Pedro?')).toBe('low');
    expect(autoDetectDetail('What does Variant do?')).toBe('low');
  });

  test('temporal queries → high', () => {
    expect(autoDetectDetail('When did we last meet Pedro?')).toBe('high');
    expect(autoDetectDetail('Recent updates on Variant')).toBe('high');
  });

  test('event queries → high', () => {
    expect(autoDetectDetail('Variant fund announcement')).toBe('high');
  });

  test('general queries → undefined (default)', () => {
    expect(autoDetectDetail('AI changes who gets to build')).toBeUndefined();
    expect(autoDetectDetail('fintech payments')).toBeUndefined();
  });
});
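The `autoDetectDetail` tests pin down a small mapping from classified intent to detail level: entity gives `low`, temporal and event give `high`, and general gives `undefined` so the caller's default applies. The sketch below illustrates just that mapping step. `detailForIntent`, `QueryIntent`, and `DetailLevel` are hypothetical names, not exports of `src/core/search/intent.ts`, and the real function takes the query string rather than an intent.

```typescript
// Hypothetical names for illustration; the real module may structure this differently.
type QueryIntent = 'entity' | 'temporal' | 'event' | 'general';
type DetailLevel = 'low' | 'medium' | 'high';

// Map an already-classified intent to a detail level.
// Returning undefined for 'general' lets the caller fall back to its default.
function detailForIntent(intent: QueryIntent): DetailLevel | undefined {
  switch (intent) {
    case 'entity':
      return 'low'; // compiled truth only
    case 'temporal':
    case 'event':
      return 'high'; // timeline entries included
    case 'general':
      return undefined;
  }
}

// Usage: an explicit fallback to 'medium', which the commit message gives
// as the default for general queries.
const effectiveDetail: DetailLevel = detailForIntent('general') ?? 'medium';
console.log(effectiveDetail);
```

Returning `undefined` instead of `'medium'` keeps "no opinion" distinct from an explicit choice, so a value the caller passes itself is not overridden by the heuristic.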