# Technology Stack
The choices behind OpenAgentMesh: why NATS as the single infrastructure dependency, why Pydantic for contracts, and how the architecture maps to patterns you already know from service meshes.
## Why NATS
Most agent frameworks cobble together separate systems for messaging, service discovery, and state management. OAM uses NATS, a single deployment that provides everything the mesh needs:
```python
from openagentmesh import AgentMesh

# One connection string. Everything included.
mesh = AgentMesh("nats://mesh.company.com:4222")
```
What NATS provides out of the box:
| Capability | What OAM uses it for |
|---|---|
| Pub/sub | Agent event fan-out, real-time notifications |
| Request/reply | Synchronous agent invocation (`mesh.call()`) |
| Queue groups | Automatic load balancing across agent instances |
| KV store | Contract registry, agent catalog |
| Object store | Shared workspace for artifacts between agents |
No Consul for discovery. No Redis for state. No RabbitMQ for messaging. No Nginx for load balancing. One binary, one connection, all primitives.
> **Sub-millisecond latency**
>
> NATS routes messages in microseconds. Agent-to-agent invocation overhead is negligible compared to LLM inference time.
### Queue groups: free load balancing
When multiple instances of the same agent connect to the mesh, NATS automatically distributes requests across them. No configuration, no load balancer, no code changes:
```python
# Deploy 3 instances of the same agent -- NATS handles the rest
@mesh.agent(AgentSpec(name="nlp.summarizer", description="..."))
async def summarize(req: SummarizeInput) -> SummarizeOutput:
    ...

# Consumers don't know or care how many instances exist
result = await mesh.call("nlp.summarizer", payload)
```
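The semantics are worth spelling out: a queue group delivers each request to exactly one member, so no request is duplicated or dropped no matter how many instances join. A stdlib-only toy model of that guarantee (the instance names are illustrative, and the random choice stands in for the NATS server's internal member selection):

```python
import random

def deliver(queue_group: list[str], messages: list[str]) -> dict[str, list[str]]:
    """Toy model of NATS queue-group semantics: each message is handed
    to exactly one member of the group, chosen by the server."""
    inbox: dict[str, list[str]] = {member: [] for member in queue_group}
    for msg in messages:
        # The server picks one subscriber from the group per message
        inbox[random.choice(queue_group)].append(msg)
    return inbox

instances = ["summarizer-1", "summarizer-2", "summarizer-3"]
inboxes = deliver(instances, [f"req-{i}" for i in range(9)])

# Every request lands in exactly one inbox -- none duplicated, none lost
assert sum(len(msgs) for msgs in inboxes.values()) == 9
```

Scaling out is therefore just starting another process with the same agent name: it joins the queue group and begins receiving its share of traffic.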
## Why Pydantic v2
Every agent on the mesh publishes a typed contract with input/output JSON Schemas. Pydantic v2 generates these schemas directly from Python type hints:
```python
from pydantic import BaseModel

class SummarizeInput(BaseModel):
    text: str
    max_length: int = 200

class SummarizeOutput(BaseModel):
    summary: str
    token_count: int

@mesh.agent(AgentSpec(name="nlp.summarizer", description="..."))
async def summarize(req: SummarizeInput) -> SummarizeOutput:
    ...

# The decorator introspects type hints, generates JSON Schemas,
# and publishes them to the registry. No manual schema authoring.
```
What you get from Pydantic:
- JSON Schema generation from type hints: schemas that LLMs can consume for tool selection
- Runtime validation at the mesh boundary: malformed requests are rejected before reaching your handler
- Serialization/deserialization: the SDK handles JSON encoding automatically
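The first two points can be seen directly with plain Pydantic, no mesh required. A minimal sketch reusing the `SummarizeInput` model from above (standard Pydantic v2 API, independent of the OAM SDK):

```python
from pydantic import BaseModel, ValidationError

class SummarizeInput(BaseModel):
    text: str
    max_length: int = 200

# Schema generation: the shape of what gets published to the registry
schema = SummarizeInput.model_json_schema()
assert schema["required"] == ["text"]  # max_length has a default, so it's optional
assert schema["properties"]["max_length"]["default"] == 200

# Boundary validation: a malformed request never reaches the handler
try:
    SummarizeInput.model_validate({"max_length": 50})  # missing "text"
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")
```

Because the schema is standard JSON Schema, it can be handed to an LLM as a tool definition without any translation step.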
## Why protocol-first
The NATS protocol is the product. The Python SDK is a convenience layer.
Any NATS client in any language can participate in the mesh by following the subject naming conventions and message envelope format. A Go service, a Rust CLI tool, or a Node.js application can register agents, discover the catalog, and invoke other agents. No SDK required.
```text
# NATS subjects (any NATS client can use these directly)
mesh.agent.{channel}.{name}          # invocation
mesh.agent.{channel}.{name}.events   # publisher events
mesh.stream.{request_id}             # streaming chunks
mesh.errors.{channel}.{name}         # dead-letter errors
mesh.results.{request_id}            # async callback replies

# KV buckets (contract storage, not subjects)
mesh-registry: {channel}.{name}      # full contract per agent
mesh-catalog: catalog                # lightweight agent index
```
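Because subjects are plain strings and payloads are JSON, any language can construct them without the SDK. A stdlib-only sketch of building an invocation subject and request payload (the channel name `default` and the payload fields are illustrative assumptions, not the documented wire format):

```python
import json

def invocation_subject(channel: str, name: str) -> str:
    # mesh.agent.{channel}.{name} -- the synchronous invocation subject
    return f"mesh.agent.{channel}.{name}"

subject = invocation_subject("default", "nlp.summarizer")
assert subject == "mesh.agent.default.nlp.summarizer"

# A request body is just JSON -- any NATS client can publish it
payload = json.dumps({"text": "long document ...", "max_length": 200})
```

The equivalent two lines in Go, Rust, or Node.js look much the same, which is the point: the SDK adds validation and ergonomics, not access.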
This means OAM is not locked to Python. The protocol is language-agnostic by design.
## The service mesh analogy
If you've built microservices with Istio or Linkerd, the architecture will feel familiar:
| Service Mesh Concept | OAM Equivalent |
|---|---|
| Service registry (Consul, etcd) | NATS KV contract registry |
| Service endpoint | NATS subject (`mesh.agent.{channel}.{name}`) |
| Load balancer | NATS queue groups (built-in) |
| DNS / service discovery | `mesh.discover()` / `mesh.catalog()` |
| Sidecar proxy (Envoy) | SDK (validation, serialization, tracing, health) |
| Sidecar middleware | SDK middleware hooks |
| Shared filesystem | NATS Object Store via `mesh.workspace` |
The key difference: service meshes route based on network rules (URLs, headers, IP ranges). OAM routes based on semantic understanding: what the agent does, what it accepts, and whether it's the right fit for a given task.
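The contrast can be made concrete with a toy discovery step: network routing matches a literal key, while semantic routing scores candidates against a task description. The scoring below is a naive keyword overlap, purely illustrative; the catalog entries are invented and OAM's actual matching is not specified here:

```python
def match_score(task: str, description: str) -> int:
    # Naive keyword overlap -- a stand-in for real semantic matching
    return len(set(task.lower().split()) & set(description.lower().split()))

# A hypothetical two-agent catalog
catalog = {
    "nlp.summarizer": "summarize long text into short abstracts",
    "vision.ocr": "extract text from scanned images",
}

task = "summarize this long report"
best = max(catalog, key=lambda name: match_score(task, catalog[name]))
assert best == "nlp.summarizer"
```

A service mesh would need an exact URL or header rule to make this choice; here the routing decision falls out of comparing what the task needs with what each agent says it does.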
## Same code, any scale
The agent code is identical whether you're running locally or across a multi-region cluster:
```python
# Development: connect to `oam mesh up`
mesh = AgentMesh()

# Production: shared NATS infrastructure
mesh = AgentMesh("nats://mesh.company.com:4222")
```
Run `oam mesh up` to start a local development server with JetStream and KV pre-configured. Your agent code doesn't change. Your interaction patterns don't change. The only thing that changes is the connection string.
> **Not two modes, one continuum**
>
> Local and production are endpoints on the same architecture. Moving from one developer experimenting locally to a team sharing a NATS cluster requires changing one line of code.