sidebar_position: 9 title: “Context Engine Plugins” description: “How to build a context engine plugin that replaces the built-in ContextCompressor”
Building a Context Engine Plugin
Context engine plugins replace the built-in ContextCompressor with an alternative strategy for managing conversation context. For example, a Lossless Context Management (LCM) engine that builds a knowledge DAG instead of lossy summarization.
How it works
The agent’s context management is built on the ContextEngine ABC (agent/context_engine.py). The built-in ContextCompressor is the default implementation. Plugin engines must implement the same interface.
Only one context engine can be active at a time. Selection is config-driven:
# config.yaml
context:
engine: "compressor" # default built-in
engine: "lcm" # activates a plugin engine named "lcm"
Plugin engines are never auto-activated — the user must explicitly set context.engine to the plugin’s name.
Directory structure
Each context engine lives in plugins/context_engine/<name>/:
plugins/context_engine/lcm/
├── __init__.py # exports the ContextEngine subclass
├── plugin.yaml # metadata (name, description, version)
└── ... # any other modules your engine needs
The ContextEngine ABC
Your engine must implement these required methods:
from agent.context_engine import ContextEngine
class LCMEngine(ContextEngine):
@property
def name(self) -> str:
"""Short identifier, e.g. 'lcm'. Must match config.yaml value."""
return "lcm"
def update_from_response(self, usage: dict) -> None:
"""Called after every LLM call with the usage dict.
Update self.last_prompt_tokens, self.last_completion_tokens,
self.last_total_tokens from the response.
"""
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Return True if compaction should fire this turn."""
def compress(self, messages: list, current_tokens: int = None) -> list:
"""Compact the message list and return a new (possibly shorter) list.
The returned list must be a valid OpenAI-format message sequence.
"""
Class attributes your engine must maintain
The agent reads these directly for display and logging:
last_prompt_tokens: int = 0
last_completion_tokens: int = 0
last_total_tokens: int = 0
threshold_tokens: int = 0 # when compression triggers
context_length: int = 0 # model's full context window
compression_count: int = 0 # how many times compress() has run
Optional methods
These have sensible defaults in the ABC. Override as needed:
| Method | Default | Override when |
|---|---|---|
on_session_start(session_id, **kwargs) | No-op | You need to load persisted state (DAG, DB) |
on_session_end(session_id, messages) | No-op | You need to flush state, close connections |
on_session_reset() | Resets token counters | You have per-session state to clear |
update_model(model, context_length, ...) | Updates context_length + threshold | You need to recalculate budgets on model switch |
get_tool_schemas() | Returns [] | Your engine provides agent-callable tools (e.g., lcm_grep) |
handle_tool_call(name, args, **kwargs) | Returns error JSON | You implement tool handlers |
should_compress_preflight(messages) | Returns False | You can do a cheap pre-API-call estimate |
get_status() | Standard token/threshold dict | You have custom metrics to expose |
Engine tools
Context engines can expose tools the agent calls directly. Return schemas from get_tool_schemas() and handle calls in handle_tool_call():
def get_tool_schemas(self):
return [{
"name": "lcm_grep",
"description": "Search the context knowledge graph",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"],
},
}]
def handle_tool_call(self, name, args, **kwargs):
if name == "lcm_grep":
results = self._search_dag(args["query"])
return json.dumps({"results": results})
return json.dumps({"error": f"Unknown tool: {name}"})
Engine tools are injected into the agent’s tool list at startup and dispatched automatically — no registry registration needed.
Registration
Via directory (recommended)
Place your engine in plugins/context_engine/<name>/. The __init__.py must export a ContextEngine subclass. The discovery system finds and instantiates it automatically.
Via general plugin system
A general plugin can also register a context engine:
def register(ctx):
engine = LCMEngine(context_length=200000)
ctx.register_context_engine(engine)
Only one engine can be registered. A second plugin attempting to register is rejected with a warning.
Lifecycle
1. Engine instantiated (plugin load or directory discovery)
2. on_session_start() — conversation begins
3. update_from_response() — after each API call
4. should_compress() — checked each turn
5. compress() — called when should_compress() returns True
6. on_session_end() — session boundary (CLI exit, /reset, gateway expiry)
on_session_reset() is called on /new or /reset to clear per-session state without a full shutdown.
Configuration
Users select your engine via hermes plugins → Provider Plugins → Context Engine, or by editing config.yaml:
context:
engine: "lcm" # must match your engine's name property
The compression config block (compression.threshold, compression.protect_last_n, etc.) is specific to the built-in ContextCompressor. Your engine should define its own config format if needed, reading from config.yaml during initialization.
Testing
from agent.context_engine import ContextEngine
def test_engine_satisfies_abc():
engine = YourEngine(context_length=200000)
assert isinstance(engine, ContextEngine)
assert engine.name == "your-name"
def test_compress_returns_valid_messages():
engine = YourEngine(context_length=200000)
msgs = [{"role": "user", "content": "hello"}]
result = engine.compress(msgs)
assert isinstance(result, list)
assert all("role" in m for m in result)
See tests/agent/test_context_engine.py for the full ABC contract test suite.
See also
- Context Compression and Caching — how the built-in compressor works
- Memory Provider Plugins — analogous single-select plugin system for memory
- Plugins — general plugin system overview