0010. External Agent Routing API

Status: Proposed

Date: August 03, 2025

Categories: Architecture, Technology

Date: 2025-01-03

Status: Accepted

Deciders: Platform Systems Architect, UX Research Expert, Async Rust Expert, Observability Expert

Context and Problem Statement

Caxton needs to support “externally routable agents”: agents that external API clients can invoke asynchronously. This requires designing:

API patterns for agent invocation
Concurrency model for multiple requests to the same agent
Async behavior handling for long-running tasks
Security standards for external access
Observability for production debugging

The solution must align with Caxton’s application server architecture and “3 AM debugging” philosophy.

Decision Drivers

Production Debugging: API responses must contain enough context for independent troubleshooting
Developer Experience: API should match developer mental models for service invocation
Performance: Target < 1ms overhead for local API calls
Security: Support industry standards from development to production
Observability: Full request lifecycle visibility with correlation IDs
Concurrency: Handle multiple clients calling the same agent safely
WebAssembly Integration: Work within WASM execution constraints

Considered Options

API Design Options

Option A: Single REST Endpoint

/api/v1/route/{agent_id} with POST payload
Simple but implies infrastructure routing
Misaligns with developer mental models

Option B: Resource-Oriented REST

/api/v1/agents/{agent_id}/invoke for invocation
/api/v1/jobs/{job_id} for async job tracking
Matches developer expectations for service calls

Option C: GraphQL

Single endpoint with query flexibility
Query flexibility adds complexity that the core use case, simple agent invocation, does not need

Option D: Dual Protocol

gRPC primary for performance (ExternalAgentRouter service)
REST gateway for accessibility and ecosystem compatibility
Combines gRPC performance with REST accessibility

Concurrency Models

Option A: Shared Agent Instances

Single agent instance handles multiple requests
Complex state management and potential conflicts
Poor fault isolation

Option B: Request-Per-Instance

New agent instance for each request
High resource overhead and slow startup
Simple but wasteful

Option C: Actor-Per-Agent

Dedicated tokio task per agent with message queues
Natural back-pressure through bounded channels
Good fault isolation and resource management

Security Approaches

Option A: API Keys Only

Simple for development
Insufficient for production requirements
No fine-grained access control

Option B: OAuth2/JWT

Industry standard for external APIs
Complex setup for simple use cases
Good ecosystem compatibility

Option C: Progressive Security

API keys for development (cax_dev_{random}_{clientname})
mTLS + RBAC for production
Matches user journey from dev to production
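
The development key format above can be checked with a small parser. The following is a minimal sketch, not Caxton's implementation: `DevApiKey` and `parse_dev_key` are hypothetical names, and the allowed character sets (alphanumeric random part with no underscore, client names made of letters, digits, `-` and `_`) are assumptions, since the ADR only fixes the overall shape `cax_dev_{random}_{clientname}`.

```rust
/// Parts of a development API key shaped like `cax_dev_{random}_{clientname}`.
/// Hypothetical type, introduced only for this illustration.
#[derive(Debug, PartialEq)]
pub struct DevApiKey<'a> {
    pub random: &'a str,
    pub client_name: &'a str,
}

/// Split a development key into its parts, or return None if it is malformed.
pub fn parse_dev_key(key: &str) -> Option<DevApiKey<'_>> {
    let rest = key.strip_prefix("cax_dev_")?;
    // Assumption: the random part contains no underscore, so the first
    // underscore separates it from the client name.
    let (random, client_name) = rest.split_once('_')?;
    let random_ok = !random.is_empty() && random.chars().all(|c| c.is_ascii_alphanumeric());
    // Assumption: client names use letters, digits, '-' and '_'.
    let name_ok = !client_name.is_empty()
        && client_name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if random_ok && name_ok {
        Some(DevApiKey { random, client_name })
    } else {
        None
    }
}
```

Embedding the client name in the key lets a log line identify the caller without a lookup, which fits the debugging goals stated in the decision drivers.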

Decision Outcome

Chosen: Dual Protocol + Actor-Per-Agent + Progressive Security

API Design

Primary: gRPC ExternalAgentRouter service with methods:
- InvokeSync() - synchronous agent calls
- InvokeAsync() - asynchronous with job tracking
- StreamInvoke() - streaming responses
Secondary: REST gateway at /api/v1/agents/{agent_id}/invoke
Job Tracking: /api/v1/jobs/{job_id} for async status/results
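
As an illustration of the two REST paths above, a gateway could match them roughly as follows. This is a sketch under stated assumptions: `Route` and `parse_route` are hypothetical names, and the HTTP methods (POST for invocation, GET for job status) are assumed, since the ADR does not specify them for these paths.

```rust
/// Routes exposed by the REST gateway, as listed in this ADR.
/// Hypothetical type, introduced only for this illustration.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// /api/v1/agents/{agent_id}/invoke (POST is an assumption)
    InvokeAgent { agent_id: &'a str },
    /// /api/v1/jobs/{job_id} (GET is an assumption)
    JobStatus { job_id: &'a str },
}

/// Match an HTTP method and path against the two gateway routes.
/// Returns None for anything else; the caller decides how to respond.
pub fn parse_route<'a>(method: &str, path: &'a str) -> Option<Route<'a>> {
    let rest = path.strip_prefix("/api/v1/")?;
    let segments: Vec<&str> = rest.split('/').collect();
    match (method, segments.as_slice()) {
        ("POST", ["agents", agent_id, "invoke"]) if !agent_id.is_empty() => {
            Some(Route::InvokeAgent { agent_id: *agent_id })
        }
        ("GET", ["jobs", job_id]) if !job_id.is_empty() => {
            Some(Route::JobStatus { job_id: *job_id })
        }
        _ => None,
    }
}
```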

Concurrency Model

Actor-per-agent with dedicated tokio tasks
Bounded MPSC channels for natural back-pressure
Hierarchical cancellation using tokio_util::sync::CancellationToken
Multi-layer protection: global semaphore, rate limiting, circuit breakers
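
The actor-per-agent model with a bounded mailbox can be sketched without any dependencies. To keep the example self-contained, a std thread and `std::sync::mpsc::sync_channel` stand in for the tokio task and bounded MPSC channel named above; `Invocation`, `AgentHandle`, and `try_invoke` are hypothetical names, not Caxton APIs.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// A request sent to an agent actor, with a channel for the reply.
pub struct Invocation {
    pub payload: String,
    pub reply: SyncSender<String>,
}

/// Handle to one agent's bounded mailbox.
pub struct AgentHandle {
    pub mailbox: SyncSender<Invocation>,
}

impl AgentHandle {
    /// Spawn one actor for `agent_id` with a mailbox of `capacity` slots.
    pub fn spawn(agent_id: &str, capacity: usize) -> (AgentHandle, JoinHandle<()>) {
        let (tx, rx) = sync_channel::<Invocation>(capacity);
        let id = agent_id.to_string();
        let worker = thread::spawn(move || {
            // The actor handles one message at a time, so concurrent
            // callers never touch agent state directly. The loop ends
            // when every sender has been dropped.
            for msg in rx {
                let _ = msg.reply.send(format!("{id} handled: {}", msg.payload));
            }
        });
        (AgentHandle { mailbox: tx }, worker)
    }

    /// Non-blocking submit: a full mailbox is reported to the caller
    /// instead of growing an unbounded queue (back-pressure).
    pub fn try_invoke(&self, payload: &str) -> Result<Receiver<String>, &'static str> {
        let (reply_tx, reply_rx) = sync_channel(1);
        let msg = Invocation { payload: payload.to_string(), reply: reply_tx };
        match self.mailbox.try_send(msg) {
            Ok(()) => Ok(reply_rx),
            Err(TrySendError::Full(_)) => Err("agent queue full"),
            Err(TrySendError::Disconnected(_)) => Err("agent stopped"),
        }
    }
}
```

A caller that receives "agent queue full" can surface it as a retryable error rather than waiting, which is how the bounded channel turns overload into an explicit signal.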

Async Behavior

Job lifecycle: SUBMITTED → QUEUED → ASSIGNED → RUNNING → COMPLETED
Progress tracking with estimated completion times
Configurable TTL for job results storage
WebAssembly cooperative scheduling using fuel-based yield points
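
The job lifecycle above can be modelled as a small state machine that rejects out-of-order transitions. This sketch covers only the five states the ADR lists; failure and cancellation states are not described in the ADR and are deliberately left out. `JobState` and `Job` are hypothetical names.

```rust
/// The five lifecycle states named in this ADR, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Submitted,
    Queued,
    Assigned,
    Running,
    Completed,
}

impl JobState {
    /// The only state that may follow `self`, or None once completed.
    pub fn next(self) -> Option<JobState> {
        match self {
            JobState::Submitted => Some(JobState::Queued),
            JobState::Queued => Some(JobState::Assigned),
            JobState::Assigned => Some(JobState::Running),
            JobState::Running => Some(JobState::Completed),
            JobState::Completed => None,
        }
    }
}

/// A tracked job; the id would appear in /api/v1/jobs/{job_id}.
pub struct Job {
    pub id: String,
    pub state: JobState,
}

impl Job {
    pub fn new(id: &str) -> Self {
        Job { id: id.to_string(), state: JobState::Submitted }
    }

    /// Move to `target` only if it is the direct successor of the current state.
    pub fn advance(&mut self, target: JobState) -> Result<(), String> {
        if self.state.next() == Some(target) {
            self.state = target;
            Ok(())
        } else {
            Err(format!(
                "job {}: illegal transition {:?} -> {:?}",
                self.id, self.state, target
            ))
        }
    }
}
```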

Security Model

Development: API keys with structure cax_dev_{random}_{clientname}
Production: mTLS client certificates with RBAC authorization
Rate limiting with conventional HTTP response headers (X-RateLimit-*)
CORS support for browser-based clients
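
To show how the rate-limit headers might be populated, here is a fixed-window counter sketch. Several things are assumptions: the ADR names only the `X-RateLimit-*` prefix, so the `Limit`, `Remaining`, and `Reset` suffixes follow common practice, the reset value is an epoch-seconds timestamp, and the fixed-window algorithm is just one simple choice. `FixedWindowLimiter` and `Decision` are hypothetical names.

```rust
/// Outcome of one rate-limit check plus the headers to attach to the response.
pub struct Decision {
    pub allowed: bool,
    pub headers: [(&'static str, String); 3],
}

/// Allows up to `limit` requests per window of `window_secs` seconds.
pub struct FixedWindowLimiter {
    limit: u32,
    window_secs: u64,
    window_start: u64,
    used: u32,
}

impl FixedWindowLimiter {
    pub fn new(limit: u32, window_secs: u64) -> Self {
        // A zero-length window would divide by zero below, so clamp to 1.
        Self { limit, window_secs: window_secs.max(1), window_start: 0, used: 0 }
    }

    /// Record one request at `now_secs` (seconds since the epoch).
    /// The clock is passed in so the behaviour is easy to test.
    pub fn check(&mut self, now_secs: u64) -> Decision {
        if now_secs >= self.window_start + self.window_secs {
            // Start a new window aligned to a multiple of the window length.
            self.window_start = now_secs - now_secs % self.window_secs;
            self.used = 0;
        }
        let allowed = self.used < self.limit;
        if allowed {
            self.used += 1;
        }
        Decision {
            allowed,
            headers: [
                ("X-RateLimit-Limit", self.limit.to_string()),
                ("X-RateLimit-Remaining", (self.limit - self.used).to_string()),
                ("X-RateLimit-Reset", (self.window_start + self.window_secs).to_string()),
            ],
        }
    }
}
```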

Observability Strategy

OpenTelemetry spans covering full request lifecycle
Structured error responses following What/Why/How/Debug pattern
Correlation IDs in all logs and traces
Canonical log line per request with complete debugging context
Four golden signals metrics with high-cardinality dimensions:
- Latency (p50, p90, p99, p99.9)
- Traffic (requests/sec by agent, route, user)
- Errors (rate by type, agent, cause)
- Saturation (queue depth, CPU, memory)
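
One possible shape for a structured error that follows the What/Why/How/Debug pattern and carries a correlation ID is sketched below. The field meanings are an interpretation (what failed, why it failed, how to fix it, plus debugging context), and `ApiError`, `DebugContext`, and the `key=value` rendering are hypothetical, not a format the ADR defines.

```rust
use std::fmt;

/// Context a responder needs to find the request in logs and traces.
pub struct DebugContext {
    pub correlation_id: String,
    pub agent_id: String,
    pub job_id: Option<String>,
}

/// Error body following the What/Why/How/Debug pattern.
pub struct ApiError {
    pub what: String, // what failed
    pub why: String,  // why it failed
    pub how: String,  // how the caller can fix or retry
    pub debug: DebugContext,
}

impl fmt::Display for ApiError {
    /// Render the error as a single key=value log line.
    /// Free-text fields use {:?} so embedded quotes and newlines are escaped.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "what={:?} why={:?} how={:?} correlation_id={} agent_id={} job_id={}",
            self.what,
            self.why,
            self.how,
            self.debug.correlation_id,
            self.debug.agent_id,
            self.debug.job_id.as_deref().unwrap_or("-"),
        )
    }
}
```

Keeping the whole error on one line means a search for the correlation ID returns the complete context in a single log entry.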

Consequences

Positive

Self-debugging API: Responses contain enough context for independent troubleshooting
Production ready: Comprehensive observability and error handling
Performance optimized: gRPC primary path with <1ms overhead target
Developer friendly: REST gateway for ecosystem compatibility
Scalable: Actor model handles concurrency naturally
Secure: Progressive security matches deployment patterns

Negative

Complexity: Dual protocol increases implementation complexity
Resource usage: Actor-per-agent model uses more memory
Learning curve: gRPC may be unfamiliar to some developers
Configuration: Security model requires proper operational setup

Risks and Mitigations

Risk: WebAssembly execution blocking event loop
- Mitigation: Fuel-based cooperative scheduling with yield points
Risk: Resource exhaustion under load
- Mitigation: Multi-layer protection with bounded queues and circuit breakers
Risk: Security misconfiguration
- Mitigation: Secure defaults and clear configuration documentation
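
The fuel-based mitigation for the first risk can be illustrated with a pure simulation. No WebAssembly runtime is involved here: each task has an abstract amount of remaining work, each scheduling slice grants a fixed fuel budget, and a task that exhausts its fuel yields and rejoins the queue. `Task`, `Step`, and `run_round_robin` are hypothetical names chosen for this sketch.

```rust
use std::collections::VecDeque;

/// A unit of agent work measured in abstract fuel units.
pub struct Task {
    pub name: &'static str,
    pub remaining: u32,
}

/// Result of running a task for one slice.
pub enum Step {
    Yielded,
    Finished,
}

impl Task {
    /// Run until the work is done or `fuel` units are spent.
    pub fn run_slice(&mut self, fuel: u32) -> Step {
        let spent = self.remaining.min(fuel);
        self.remaining -= spent;
        if self.remaining == 0 {
            Step::Finished
        } else {
            Step::Yielded
        }
    }
}

/// Round-robin over tasks, giving each `fuel_per_slice` units per turn.
/// Returns task names in completion order.
pub fn run_round_robin(tasks: Vec<Task>, fuel_per_slice: u32) -> Vec<&'static str> {
    // A zero budget would never make progress, so clamp to at least 1.
    let fuel = fuel_per_slice.max(1);
    let mut queue: VecDeque<Task> = tasks.into();
    let mut finished = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        match task.run_slice(fuel) {
            Step::Finished => finished.push(task.name),
            // Out of fuel: yield and go to the back of the queue so
            // other tasks get a turn.
            Step::Yielded => queue.push_back(task),
        }
    }
    finished
}
```

In this simulation a short task queued behind a long one still finishes first, which is the property the mitigation relies on: no single invocation can monopolize the executor.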

Implementation Notes

Phase 1: Core External Routing

gRPC service definition and server implementation
Actor-per-agent concurrency model
Basic job tracking and lifecycle management
API key authentication for development

Phase 2: Production Features

REST gateway via grpc-gateway
mTLS and RBAC authorization
Advanced observability and debugging APIs
Performance optimizations and benchmarking

Phase 3: Advanced Patterns

Streaming invocation patterns
Batch job processing
Advanced rate limiting and quotas
Integration with cloud provider auth systems
