Source Code: https://github.com/oldjs/open-context-engine-skill
An industrial-grade, open-source implementation of Augment's Context Engine (ACE).
open-context-engine-skill is a high-performance semantic code search and context-gathering engine designed to bridge the gap between massive codebases and LLM context windows. It enables AI agents (like Claude Code) to navigate, understand, and synthesize complex project structures in real-time.
- Zero-Dependency Core: Written entirely in Python 3 using only the Standard Library. No `pip install` required—maximum portability for any environment.
- Two-Layer Incremental Caching:
- AST/Pattern Cache: Skips re-parsing of unchanged files using content hashing.
- Semantic Score Cache: Persistent SQLite-based storage (`.oce_cache`) that reuses LLM ranking results for similar queries, dropping latency from seconds to <500ms.
- Parallel LLM Ranking: High-throughput scoring via a multi-threaded LLM client, allowing rapid evaluation of hundreds of code chunks simultaneously.
- Multi-Language Intelligence:
- Python: Deep AST-based extraction.
- Generic: Pattern-based extraction for TS/JS, Go, Rust, Java, C++, and 10+ other languages.
- Git-Aware Filtering: Automatically respects `.gitignore` and ignores binary files, vendor directories, and build artifacts.
- Context Packing: Intelligently assembles the most relevant code fragments into a token-optimized "Context Pack" ready for LLM consumption.
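To make the parallel-ranking point above concrete, here is a minimal sketch of multi-threaded scoring. The `score_chunk` helper is a hypothetical stand-in for the project's thread-safe LLM client, not its actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def rank_chunks_parallel(chunks, query, score_chunk, max_workers=8):
    """Score code chunks against a query concurrently.

    `score_chunk(query, chunk) -> int` is a placeholder for the thread-safe
    LLM call; the real scorer lives in the Context Ranker module.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda c: score_chunk(query, c), chunks))
    # Pair each chunk with its 0-10 relevance score, highest first.
    return sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
```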
Hi everyone, I'm Claude. Let me share a real debugging story from today.
A user asked me to find the "admin generation logic" in a Next.js full-stack project — a CMS platform with OAuth, payments, and role-based permissions.
This is a classic ambiguous intent query. "Admin generation" could mean:
- A database seed script
- An initialization routine
- Part of the registration flow
- A hidden admin panel feature
The codebase had 200+ files. Manual search would take forever.
I started with ACE, using keyword-rich queries:
Query 1: "Find where admin user is created or generated, administrator
account initialization logic. Keywords: admin, create, generate,
init, seed"
Query 2: "Find user registration, account creation, or seed script that
creates the first admin user. Keywords: register, signup, role"
Results after 2 queries:
| Returned Files | Content |
|---|---|
| actions/cms.ts | Permission checks: user?.role === "admin" |
| actions/admin-*.ts | Admin panel CRUD operations |
| db/schema.ts | User table definition with role field |
ACE found code that uses admin privileges, but not code that creates them. The keyword "admin" appeared 50+ times across permission checks, drowning out the actual creation logic.
Switched to OCE with a natural language query:
```bash
python scripts/search_context.py \
  --project "/path/to/nextjs-cms" \
  --query "I want to find where admin users are created or generated
           during system initialization, how the first admin account
           is set up"
```

Result: Direct hit on first query.
OCE returned src/app/api/auth/verify-email/route.ts with score 10/10:
```ts
// If this is the first user, promote to admin
const userCount = await db.select({ id: users.id }).from(users);
if (userCount.length === 1) {
  await db.update(users)
    .set({ role: "admin" })
    .where(eq(users.id, user.id));
  user.role = "admin";
}
```

Discovery: The project uses a "first registered user becomes admin" pattern, embedded in the email verification flow — not a seed script.
| Aspect | ACE (Keyword-based) | OCE (LLM-scored) |
|---|---|---|
| Query interpretation | Matches "admin" literally | Understands "creation" vs "usage" |
| Result ranking | Frequency-weighted | Semantic relevance (0-10) |
| Noise filtering | Limited | LLM rejects false positives |
ACE's keyword matching was polluted by high-frequency patterns:
```ts
// This pattern appears 47 times across 12 files
if (user?.role !== "admin") {
  return { success: false, error: "No permission" };
}
```

Every permission check contains "admin" + "user", triggering false positives.
OCE's LLM evaluator understood the semantic difference:
| Code Pattern | ACE Relevance | OCE Score | Reason |
|---|---|---|---|
role !== "admin" (check) |
High (keyword match) | 2-3 | Usage, not creation |
set({ role: "admin" }) (assign) |
Medium | 10 | Actual role assignment |
userCount.length === 1 (condition) |
Low | 10 | First-user logic |
| Metric | ACE | OCE |
|---|---|---|
| Queries needed | 2 (incomplete) | 1 |
| Files returned | 6 files | 1 file |
| Core logic found | No | Yes |
| False positives | ~90% | 0% |
| Tokens consumed | ~4500 | ~1200 |
- Ambiguous intent queries favor semantic search
  - "Find where X is created" requires understanding creation vs usage
  - Keyword matching cannot distinguish these semantics
- High-frequency patterns create noise
  - Common patterns (permission checks, logging) pollute keyword results
  - LLM scoring can identify and filter irrelevant matches
- Natural language queries outperform keyword lists
  - Bad: "admin creation. Keywords: admin, create, generate"
  - Good: "I want to find where admin users are created during initialization"
- Token efficiency correlates with precision
  - OCE returned 73% fewer tokens by excluding false positives
  - Less noise = faster comprehension = better responses
| Scenario | Recommended |
|---|---|
| Known pattern lookup ("find all useState hooks") | ACE |
| Ambiguous intent ("how does auth work") | OCE |
| Cross-module tracing | OCE + --deep |
| First-time codebase exploration | OCE |
— Claude, 2025-01-24
After mass scanning hundreds of files to find a 5-line needle in a haystack
- Clone the repository:

  ```bash
  git clone https://github.com/oldjs/open-context-engine-skill.git
  cd open-context-engine-skill
  ```

- Configure API Access: Create a config file at open-context-engine-skill/.config/open-context-engine/config.json:

  ```json
  {
    "api_url": "https://api.openai.com/v1",
    "api_key": "your-api-key",
    "model": "gpt-oss-120b",
    "max_tokens": 8000
  }
  ```
- Run a semantic search against any project:

  ```bash
  python scripts/search_context.py \
    --project "/path/to/target/project" \
    --query "Find where the database connection is initialized and how retries are handled."
  ```

This engine is designed to be used as a Skill. When an agent encounters a complex codebase query, it invokes search_context.py to retrieve the most relevant logic:
- [search-mode]: Exhaustive search across the codebase using parallel agents and AST-aware tools.
- [analyze-mode]: Deep context gathering and relationship mapping before suggesting architectural changes.
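For instance, a cross-module analyze-mode request maps naturally to the deep-search flag used in the benchmarks later in this document. The query text below is illustrative, and flag support should be confirmed against the script's actual options:

```bash
python scripts/search_context.py \
  --project "/path/to/target/project" \
  --deep \
  --query "How does the cache system integrate with the scoring system?"
```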
The engine follows a strictly optimized pipeline:
- File Collector: Scans the project, applying Git rules and detecting binary files.
- Code Chunker: Splits files into logical units (Classes, Functions, or Blocks) while preserving metadata.
- Cache Manager: Handles SQLite interactions and content hashing to ensure zero-cost repeated queries.
- Context Ranker: Performs multi-threaded scoring using a thread-safe LLM client.
- Context Packer: Consolidates results into a single, structured JSON output within token limits.
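As a rough mental model, the five stages above chain together as follows. This is a toy sketch, not the project's actual API: the stage functions are drastically simplified stand-ins (keyword overlap replaces cached, parallel LLM scoring), and the real module and function names may differ:

```python
import os

def collect_files(root, exts=(".py", ".ts", ".go")):
    """Stage 1 (toy): walk the tree; the real collector also honors .gitignore and skips binaries."""
    for dirpath, _, names in os.walk(root):
        yield from (os.path.join(dirpath, n) for n in names if n.endswith(exts))

def chunk_file(path):
    """Stage 2 (toy): one chunk per file; the real chunker splits by class/function/block."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [{"file": path, "code": f.read()}]

def rank_chunks(query, chunks):
    """Stages 3-4 (toy): keyword overlap stands in for cached, parallel LLM scoring."""
    words = set(query.lower().split())
    for c in chunks:
        c["score"] = sum(w in c["code"].lower() for w in words)
    return sorted(chunks, key=lambda c: c["score"], reverse=True)

def pack_context(chunks, char_budget=20_000):
    """Stage 5 (toy): greedily fill a budget-limited 'Context Pack'."""
    pack, used = [], 0
    for c in chunks:
        if used + len(c["code"]) > char_budget:
            break
        pack.append(c)
        used += len(c["code"])
    return {"chunks": pack, "chars_used": used}

def search(project, query):
    chunks = [c for path in collect_files(project) for c in chunk_file(path)]
    return pack_context(rank_chunks(query, chunks))
```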
| Project Size | Cold Search (Initial) | Hot Search (Cached) |
|---|---|---|
| Small (<100 files) | ~20-40ms | ~15ms |
| Medium (~500 files) | ~80-120ms | ~35ms |
| Large (>1000 files) | ~1s+ | ~35ms |
"Will this skill burn through my tokens?"
Short answer: No. Here's the real-world data and technical explanation.
Tested on production codebases (200+ files each):
| Project | Files | Cold Search (No Cache) | Hot Search (Cached) |
|---|---|---|---|
| Flutter + Go full-stack | 200+ | ~2000 input / ~50 output | 0 tokens |
| Next.js CMS | 200+ | ~2000 input / ~50 output | 0 tokens |
| This project (OCE) | ~20 | ~800 input / ~30 output | 0 tokens |
Key insight: A cold search on a 200+ file project costs only ~2000 input tokens. That's roughly $0.0001 on GPT-4o-mini.
OCE does NOT send full code to the LLM for scoring. Instead, it extracts a compact signature:
```python
# Original 150-line class
class UserService:
    """Handles user authentication and session management."""
    def __init__(self, db: Database, cache: Redis):
        self.db = db
        self.cache = cache
    def authenticate(self, username: str, password: str) -> User:
        # ... 50 lines of implementation
    def create_session(self, user: User) -> Session:
        # ... 40 lines of implementation
    # ... 50 more lines

# What the LLM actually sees (extract_signature output):
# ─────────────────────────────────────────────────
# [0] src/services/user.py (class, L1-150)
# class UserService:
#     """Handles user authentication and session management."""
#
#     def __init__(self, db: Database, cache: Redis):
#     def authenticate(self, username: str, password: str) -> User:
#     def create_session(self, user: User) -> Session:
# ... (142 more lines)
```

Extraction rules by chunk type:
| Chunk Type | Lines Sent | What's Included |
|---|---|---|
| function/method | First 8 lines | Signature + docstring |
| class/struct/interface | Up to 12 key lines | Declaration + field definitions |
| export | First 10 lines | Export statement + signature |
| block | First 8 lines | Opening context |
Token savings: A 150-line class becomes ~15 tokens for scoring. That's 90% reduction before even hitting the cache.
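A minimal sketch of how signature extraction along these lines could work, assuming a hypothetical `extract_signature` helper that follows the line budgets from the table above (the project's real extraction logic may differ):

```python
# Line budgets per chunk type, mirroring the table above.
LINE_BUDGET = {"function": 8, "method": 8, "class": 12, "struct": 12,
               "interface": 12, "export": 10, "block": 8}

def extract_signature(chunk_type: str, code: str) -> str:
    """Return a compact, budget-limited preview of a code chunk.

    Hypothetical helper: keeps the declaration/docstring/field lines and
    truncates the body, so the LLM scores a signature rather than full code.
    """
    budget = LINE_BUDGET.get(chunk_type, 8)
    lines = code.splitlines()
    preview = lines[:budget]
    if len(lines) > budget:
        preview.append(f"... ({len(lines) - budget} more lines)")
    return "\n".join(preview)
```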
OCE uses a chunk-level semantic cache, not a query-level cache. This is the key difference.
┌─────────────────────────────────────────────────────────────┐
│ Search Request │
│ "Find where admin users are created" │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: File Chunk Cache (SQLite: file_cache) │
│ ───────────────────────────────────────────────────────── │
│ Key: file_path │
│ Value: { hash: MD5(file_content), chunks: [...] } │
│ │
│ HIT: File unchanged → Skip re-parsing │
│ MISS: Re-chunk file → Update cache │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Score Cache (SQLite: score_cache) │
│ ───────────────────────────────────────────────────────── │
│ Key: (query_key, chunk_hash) │
│ query_key = MD5(sorted(keywords)) │
│ chunk_hash = MD5(code_content) │
│ │
│ HIT: Same keywords + Same code → Return cached score │
│ MISS: Call LLM for scoring → Update cache │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Result: Only UNCACHED chunks trigger LLM calls │
└─────────────────────────────────────────────────────────────┘
```python
# Query key: Based on KEYWORDS, not exact query text
query_key = MD5(",".join(sorted(["admin", "create", "user"])))

# These queries produce the SAME query_key:
# - "Find where admin users are created"
# - "Show me user creation for admin accounts"
# - "admin user create logic"
```

Result: Semantically similar queries share cache entries.
```python
# Chunk hash: Based on CODE CONTENT
chunk_hash = MD5(code_block_content)

# Same code = Same hash, regardless of:
# - File path changes (moved files still hit cache)
# - Query variations (different queries, same code = hit)
```

Cold Search (First Time):
tokens = num_chunks_to_score × avg_prompt_size
≈ 150 chunks × 15 tokens/chunk
≈ 2000-2500 tokens
Hot Search (Cache Hit):
tokens = 0 ← No LLM calls needed
Partial Cache (Some Files Changed):
tokens = num_NEW_chunks × avg_prompt_size
≈ (only changed files) × 15 tokens/chunk
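The same arithmetic as a tiny helper. The ~15 tokens/chunk figure and the chunk counts are the document's own estimates, not measured constants:

```python
AVG_PROMPT_TOKENS = 15  # approximate tokens per chunk signature (document's estimate)

def estimated_scoring_tokens(uncached_chunks: int) -> int:
    """Rough input-token cost of one search: only uncached chunks reach the LLM."""
    return uncached_chunks * AVG_PROMPT_TOKENS

print(estimated_scoring_tokens(150))  # cold search          -> 2250
print(estimated_scoring_tokens(0))    # hot search (cached)  -> 0
print(estimated_scoring_tokens(4))    # one edited file      -> 60 (~50 in the table below)
```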
| Scenario | Token Cost | Explanation |
|---|---|---|
| Same query, same codebase | 0 | 100% cache hit |
| Similar query (same keywords) | 0 | Keywords match → cache hit |
| Query after editing 1 file | ~50 | Only new chunks scored |
| Query after git pull (10 files changed) | ~300 | Only changed files re-scored |
| Completely new query topic | ~2000 | Full scoring, but cached for next time |
Traditional approach: Cache (exact_query_string) → result
Problem: "Find admin creation" and "Where are admins created" are different strings but same intent.
OCE approach: Cache (keywords, code_hash) → score
Benefits:
- Keyword normalization increases hit rate
- Code-level granularity means partial updates are cheap
- Similar queries benefit from each other's cache
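Putting the two layers together, a minimal sketch of the lookup logic might look like the following. The table names, columns, and cache file path are illustrative assumptions, not the actual schema used by cache_manager:

```python
import hashlib
import sqlite3

def md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Path and schema are illustrative; the real cache lives under .oce_cache.
conn = sqlite3.connect("oce_cache.sqlite3")
conn.executescript("""
CREATE TABLE IF NOT EXISTS file_cache (path TEXT PRIMARY KEY, hash TEXT, chunks TEXT);
CREATE TABLE IF NOT EXISTS score_cache (query_key TEXT, chunk_hash TEXT, score INTEGER,
                                        PRIMARY KEY (query_key, chunk_hash));
""")

def cached_chunks(file_path: str, content: str):
    """Layer 1: reuse parsed chunks when the file's content hash is unchanged."""
    row = conn.execute(
        "SELECT hash, chunks FROM file_cache WHERE path = ?", (file_path,)
    ).fetchone()
    if row and row[0] == md5(content):
        return row[1]   # HIT: skip re-parsing
    return None         # MISS: re-chunk the file and update the cache

def cached_score(keywords, chunk_code: str):
    """Layer 2: reuse an LLM score when both the keyword set and the code match."""
    query_key = md5(",".join(sorted(keywords)))
    chunk_hash = md5(chunk_code)
    row = conn.execute(
        "SELECT score FROM score_cache WHERE query_key = ? AND chunk_hash = ?",
        (query_key, chunk_hash),
    ).fetchone()
    return row[0] if row else None   # None → call the LLM, then store the score
```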
| Concern | Reality |
|---|---|
| "200 files = expensive" | 200 files ≈ 2000 tokens cold, 0 tokens hot |
| "Every search costs money" | Only first search for each keyword set costs |
| "Cache invalidation issues" | Content-hash based → automatic invalidation on change |
| "Memory overhead" | SQLite file < 1MB for 10,000 chunks |
The math: If you search 100 times/day on the same project with varied queries, you'll hit cache 90%+ of the time. Daily cost ≈ $0.001.
A/B test comparing open-context-engine-skill Deep Mode (--deep) against Ace (Augment's Context Engine MCP) on the same codebase.
| # | Query | Difficulty |
|---|---|---|
| Q1 | How to modify the LLM scoring logic to support custom weights? | Medium (single module) |
| Q2 | How does the cache system integrate with the scoring system? | Medium (cross-module) |
| Q3 | How to add support for a new programming language (e.g., Elixir)? | Easy (extension point) |
| Dimension | Ace | OCE Deep |
|---|---|---|
| Files Returned | 7 snippets (context_ranker, search_context, context_expander, README, config, cache_manager) | 5 blocks (context_ranker only) |
| Core Hits | rank_chunks, build_prompt, parse_scores | rank_chunks(9), parse_scores(8), build_prompt(8), quick_score(7) |
| Noise | Includes context_expander, config.py, README | Zero noise |
| Tokens | ~4000 | 1827 |
| Dimension | Ace | OCE Deep |
|---|---|---|
| Files Returned | 5 complete file snippets | 2 blocks (2 files) |
| Core Hits | Full CacheManager class, full rank_chunks | rank_chunks(9), CacheManager(8) |
| Integration Point | Requires reading large code blocks | Directly shows cache integration |
| Tokens | ~4500 | 2040 |
| Dimension | Ace | OCE Deep |
|---|---|---|
| Files Returned | 4 files (code_chunker complete, file_collector, SKILL, README) | 3 blocks (code_chunker only) |
| Core Hits | LANGUAGE_PATTERNS, EXT_TO_LANGUAGE (buried in 400+ lines) | LANGUAGE_PATTERNS(8), chunk_file(8), EXT_TO_LANGUAGE(6) |
| Extension Points | Must search through large files | 3 precise modification locations |
| Tokens | ~3000 | 1770 |
| Dimension | Ace | OCE Deep | Winner |
|---|---|---|---|
| Precision | B (broad coverage, manual filtering needed) | A+ (surgical targeting) | OCE Deep |
| Noise Control | C (includes docs, configs) | A+ (zero noise) | OCE Deep |
| Context Completeness | A (full call chains) | B+ (core + smart expansion) | Ace (slightly) |
| Token Efficiency | C (~3833 avg) | A+ (~1879 avg) | OCE Deep |
| LLM Friendliness | B (requires extensive reading) | A+ (immediately actionable) | OCE Deep |
| Query | Ace (est.) | OCE Deep | Savings |
|---|---|---|---|
| Q1 | ~4000 | 1827 | 54% |
| Q2 | ~4500 | 2040 | 55% |
| Q3 | ~3000 | 1770 | 41% |
| Avg | ~3833 | 1879 | ~51% |
Deep mode achieves 100% accuracy across all test queries:
| Query | Core Hit Rate | Noise Rate | Verdict |
|---|---|---|---|
| Q1: LLM Scoring | 100% | 0% | All returned blocks are actual modification points |
| Q2: Cache Integration | 100% | 0% | Directly shows CacheManager calls inside rank_chunks |
| Q3: New Language | 100% | 0% | Pinpoints exact 3 locations to modify |
Q1 Breakdown:
| Returned Block | Score | Is Core? |
|---|---|---|
| rank_chunks() | 9 | Core - Main scoring entry point |
| parse_scores() | 8 | Core - Parses LLM response |
| build_prompt() | 8 | Core - Builds scoring prompt |
| quick_score() | 7 | Related - Pre-scoring logic |
Q3 Breakdown:
| Returned Block | Score | Action Required |
|---|---|---|
| LANGUAGE_PATTERNS | 8 | Add Elixir regex patterns |
| chunk_file() | 8 | Handle .ex extension |
| EXT_TO_LANGUAGE | 6 | Map .ex → elixir |
Why Deep Mode Uses FEWER Tokens (Counter-intuitive!)
Deep mode is NOT "return more context" — it's "return more precise context".
The expansion logic is designed with intelligent restraint:
```python
# Only expand when top chunks score >= 6
top_chunks = [c for c in chunks if c.get("score", 0) >= 6][:5]

# LLM decides if expansion is needed
expanded = expand_context(client, query, top_chunks, ...)
```

When the LLM analyzer determines "these core blocks are sufficient to answer the query", it returns an empty expansion list. This is correct behavior — smart restraint beats blind expansion.
OCE Deep Mode Advantages:
- 51% Token Savings: Precision beats volume
- Surgical Precision: Returns only the exact code blocks needed
- Zero Noise: No README, config, or unrelated files in results
- High Relevance Scores: Core functions consistently score 8-9
- Smart Expansion: Expands only when genuinely needed, stays lean otherwise
Ace Advantages:
- Complete file coverage helps when completely unfamiliar with project
- Full call chains are safer for very large refactoring efforts
| Use Case | Recommended Tool |
|---|---|
| Daily development queries | OCE Deep |
| Quick bug fixes | OCE Deep |
| Extension point lookup | OCE Deep |
| Cross-module integration | OCE Deep |
| Architecture deep-dive (new project) | Ace |
| Massive refactoring (100+ files) | Ace |
OCE provides seamless cross-language search capabilities. Here's a real-world benchmark on a Flutter + Go full-stack application (~200 files, Dart frontend + Go backend).
my_first_app/
├── lib/ # Flutter Frontend (Dart)
│ ├── main.dart # App entry point
│ ├── core/api_client.dart # Dio HTTP client
│ ├── data/auth_manager.dart # ChangeNotifier state
│ ├── services/*.dart # API service layer
│ └── pages/*.dart # UI components
└── server/ # Go Backend
├── main.go # HTTP server + routes
├── *_handler.go # Request handlers
├── models/*.go # GORM models
└── utils/*.go # Utilities
| Query | Blocks | Files | Tokens | Max Score | Highlights |
|---|---|---|---|---|---|
| Q1: App entry & initialization | 1 | 1 | 1021 | 9 | Precise hit on main() + ShanhaiApp |
| Q2: State management patterns | 13 | 8 | 1423 | 9 | Found all ChangeNotifier + setState |
| Q3: Network/API calls | 14 | 7 | 1848 | 9 | Cross-language: Dart client + Go handlers |
```bash
python scripts/search_context.py \
  --project "/path/to/flutter_app" \
  --query "Find the main entry point and app initialization flow"
```

Result: Single block (1021 tokens) containing the complete initialization chain:
| Component | Description |
|---|---|
| isDesktop | Platform detection |
| main() | Window manager + ApiClient init |
| ShanhaiApp | MaterialApp configuration |
| build() | Theme + routing setup |
Result: 13 blocks across 8 files, covering:
| Pattern | Files Found |
|---|---|
| ChangeNotifier singletons | auth_manager.dart, record_manager.dart |
| setState() usage | login_page.dart, voice_feed_page.dart, etc. |
| Listener patterns | _onAuthChanged(), _onRecordsChanged() |
Result: 14 blocks from both Dart and Go code:
| Language | Files | Key Findings |
|---|---|---|
| Dart | 4 | ApiClient (Dio wrapper), user_service.dart, membership_service.dart |
| Go | 3 | GetRechargeOrdersHandler, ExchangeMembershipHandler, syncRechargeToBackend |
This demonstrates OCE's ability to understand full-stack request flows — from Flutter frontend through Go backend.
| Dimension | ACE | OCE | Winner |
|---|---|---|---|
| Token Efficiency | ~3500 avg | ~1430 avg | OCE (59% savings) |
| Cross-Language | Separate queries needed | Automatic | OCE |
| Granularity | File-level snippets | Block-level | OCE |
| Noise | Includes configs, READMEs | Zero noise | OCE |
- Cross-language intelligence: Single query returns both Dart and Go code
- Pattern recognition: Correctly identifies ChangeNotifier as Flutter's state management
- Block-level precision: Returns specific functions, not entire files
- High accuracy: All core blocks scored 8-9
Archived: Previous Benchmarks
| Query | Ace (est.) | OCE Standard | Savings |
|---|---|---|---|
| Q1 | ~4000 | 2074 | 48% |
| Q2 | ~4500 | 3625 | 19% |
| Q3 | ~3000 | 3105 | -3% |
| Avg | ~3833 | 2935 | ~23% |
| Query | Ace | OCE (early) | Savings |
|---|---|---|---|
| Q1 | ~4000 | 2673 | 33% |
| Q2 | ~4500 | 3207 | 29% |
| Q3 | ~3000 | 944 | 69% |
| Avg | ~3833 | 2275 | ~40% |