# Performance Benchmarking Guide
## Overview
This guide establishes Caxton’s performance benchmarking framework, including baseline metrics, continuous performance testing, and regression detection strategies.
## Performance Targets

### Core Metrics

| Metric | Target | Acceptable | Critical |
|--------|--------|------------|----------|
| Message Throughput | 100K msg/sec | 50K msg/sec | 10K msg/sec |
| Message Latency (p50) | < 1ms | < 5ms | < 10ms |
| Message Latency (p99) | < 10ms | < 50ms | < 100ms |
| Agent Spawn Time | < 100ms | < 500ms | < 1s |
| Memory per Agent | < 10MB | < 50MB | < 100MB |
| CPU per Agent | < 5% | < 10% | < 25% |
| Recovery Time | < 30s | < 2min | < 5min |
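These targets can be encoded as constants so benchmark assertions and alerts reference a single source of truth. A minimal sketch (the constant names are illustrative, not part of Caxton's API):

```rust
use std::time::Duration;

// Illustrative constants mirroring the "Target" column above.
pub const TARGET_THROUGHPUT_MSG_PER_SEC: u64 = 100_000;
pub const TARGET_LATENCY_P50: Duration = Duration::from_millis(1);
pub const TARGET_LATENCY_P99: Duration = Duration::from_millis(10);
pub const TARGET_AGENT_SPAWN: Duration = Duration::from_millis(100);
pub const TARGET_MEMORY_PER_AGENT_BYTES: u64 = 10 * 1024 * 1024;
pub const TARGET_RECOVERY_TIME: Duration = Duration::from_secs(30);
```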
## Benchmark Categories

### 1. Throughput Benchmarks

#### Message Processing Throughput
```rust
// `#[bench]` requires a nightly toolchain and `use test::Bencher;`.
#[bench]
fn bench_message_throughput(b: &mut Bencher) {
    let runtime = tokio::runtime::Runtime::new().unwrap();
    let orchestrator = create_test_orchestrator();
    let messages = generate_messages(10_000);

    b.iter(|| {
        runtime.block_on(async {
            for msg in &messages {
                orchestrator.route_message(msg.clone()).await.unwrap();
            }
        })
    });

    // Throughput (msg/sec) = 10_000 / (ns per iteration * 1e-9), computed
    // from the ns/iter figure that `cargo bench` reports for this function;
    // libtest's Bencher does not expose that number programmatically.
}
```
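Because the CI workflow below saves and compares Criterion baselines, a Criterion-flavored variant of the same benchmark is also useful. A sketch, reusing the harness helpers assumed above (`create_test_orchestrator`, `generate_messages`):

```rust
use criterion::{criterion_group, criterion_main, Criterion, Throughput};

fn message_throughput(c: &mut Criterion) {
    let runtime = tokio::runtime::Runtime::new().unwrap();
    let orchestrator = create_test_orchestrator();
    let messages = generate_messages(10_000);

    let mut group = c.benchmark_group("throughput");
    // Tell Criterion how many messages each iteration routes, so it
    // reports elements/sec directly instead of raw iteration time.
    group.throughput(Throughput::Elements(messages.len() as u64));
    group.bench_function("route_message", |b| {
        b.iter(|| {
            runtime.block_on(async {
                for msg in &messages {
                    orchestrator.route_message(msg.clone()).await.unwrap();
                }
            })
        })
    });
    group.finish();
}

criterion_group!(benches, message_throughput);
criterion_main!(benches);
```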
#### Agent Task Processing

```rust
#[bench]
fn bench_task_processing(b: &mut Bencher) {
    let runtime = tokio::runtime::Runtime::new().unwrap();
    let agent_pool = create_agent_pool(10);
    let tasks = generate_tasks(1000);

    b.iter(|| {
        runtime.block_on(async {
            let futures: Vec<_> = tasks
                .iter()
                .map(|task| agent_pool.process_task(task.clone()))
                .collect();
            futures::future::join_all(futures).await
        })
    });
}
```
### 2. Latency Benchmarks

#### End-to-End Message Latency

```rust
use std::time::{Duration, Instant};

pub struct LatencyBenchmark {
    samples: Vec<Duration>,
}

impl LatencyBenchmark {
    pub async fn measure_e2e_latency(&mut self, iterations: usize) {
        for _ in 0..iterations {
            let start = Instant::now();
            // Send a request and wait for the matching response.
            let _response = self.send_request_response().await;
            self.samples.push(start.elapsed());
        }
    }

    pub fn calculate_percentiles(&self) -> PercentileReport {
        let mut sorted = self.samples.clone();
        sorted.sort();
        PercentileReport {
            p50: sorted[sorted.len() / 2],
            p90: sorted[sorted.len() * 9 / 10],
            p95: sorted[sorted.len() * 95 / 100],
            p99: sorted[sorted.len() * 99 / 100],
            p999: sorted[sorted.len() * 999 / 1000],
            max: sorted[sorted.len() - 1],
        }
    }
}
```
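A typical run, inside an async context and assuming the request plumbing behind `send_request_response` is wired up:

```rust
// Collect 10,000 end-to-end samples, then report tail latency.
let mut bench = LatencyBenchmark { samples: Vec::new() };
bench.measure_e2e_latency(10_000).await;

let report = bench.calculate_percentiles();
println!("p50 = {:?}, p99 = {:?}, max = {:?}", report.p50, report.p99, report.max);
```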
#### Agent Response Time

```rust
#[bench]
fn bench_agent_response_time(b: &mut Bencher) {
    let agent = create_test_agent();
    let request = create_test_request();

    // The bencher already times the closure; no manual Instant needed.
    b.iter(|| agent.handle_request(&request));
}
```
### 3. Scalability Benchmarks

#### Horizontal Scaling Test

```rust
pub async fn benchmark_horizontal_scaling() -> ScalingReport {
    let mut report = ScalingReport::new();

    for agent_count in [1, 10, 100, 1000, 10000] {
        let orchestrator = create_orchestrator();

        // Spawn agents and record how long it takes
        let spawn_time = measure_time(|| {
            spawn_agents(&orchestrator, agent_count)
        }).await;

        // Measure throughput at this scale
        let throughput = measure_throughput(&orchestrator).await;

        // Measure resource usage
        let resources = measure_resources(&orchestrator).await;

        report.add_data_point(agent_count, spawn_time, throughput, resources);
    }

    report
}
```
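`measure_time`, `measure_throughput`, and `measure_resources` are assumed harness helpers. `measure_time` might look like this (a sketch that discards the operation's output):

```rust
use std::future::Future;
use std::time::{Duration, Instant};

// Hypothetical helper: run an async operation and report its duration.
async fn measure_time<F, Fut>(f: F) -> Duration
where
    F: FnOnce() -> Fut,
    Fut: Future,
{
    let start = Instant::now();
    f().await;
    start.elapsed()
}
```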
#### Load Testing

```rust
pub struct LoadTest {
    target_rps: u32,
    duration: Duration,
    ramp_up: Duration,
}

impl LoadTest {
    pub async fn run(&self) -> LoadTestReport {
        let mut report = LoadTestReport::new();
        let start = Instant::now();

        while start.elapsed() < self.duration {
            let current_rps = self.calculate_current_rps(start.elapsed());

            // Generate load
            let responses = self.generate_load(current_rps).await;

            // Record metrics
            report.record_interval(IntervalMetrics {
                timestamp: Instant::now(),
                requests_sent: current_rps,
                responses_received: responses.len() as u32,
                errors: count_errors(&responses),
                latencies: extract_latencies(&responses),
            });

            tokio::time::sleep(Duration::from_secs(1)).await;
        }

        report
    }
}
```
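`calculate_current_rps` is left undefined above. A linear ramp that honors the `ramp_up` field is one reasonable policy (sketch):

```rust
impl LoadTest {
    // Hypothetical ramp-up policy: scale linearly from zero to
    // `target_rps` over the `ramp_up` window, then hold steady.
    fn calculate_current_rps(&self, elapsed: Duration) -> u32 {
        if elapsed >= self.ramp_up {
            self.target_rps
        } else {
            let fraction = elapsed.as_secs_f64() / self.ramp_up.as_secs_f64();
            (self.target_rps as f64 * fraction) as u32
        }
    }
}
```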
### 4. Resource Usage Benchmarks

#### Memory Profiling

```rust
pub struct MemoryBenchmark {
    allocator: StatsAllocator,
}

impl MemoryBenchmark {
    pub async fn profile_agent_memory(&self) -> MemoryProfile {
        let initial = self.allocator.stats();

        let agent = Agent::new();
        let after_creation = self.allocator.stats();

        // Process messages
        for _ in 0..1000 {
            agent.process_message(generate_message()).await;
        }
        let after_processing = self.allocator.stats();

        // Create checkpoint
        agent.create_checkpoint().await;
        let after_checkpoint = self.allocator.stats();

        MemoryProfile {
            creation_overhead: after_creation - initial,
            processing_overhead: after_processing - after_creation,
            checkpoint_overhead: after_checkpoint - after_processing,
            total: after_checkpoint - initial,
        }
    }
}
```
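`StatsAllocator` is assumed above. One way to build it is a counting wrapper around the system allocator (illustrative only; note it measures the whole process, not a single agent):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative stats allocator: tracks live heap bytes process-wide.
pub struct CountingAllocator {
    live_bytes: AtomicUsize,
}

impl CountingAllocator {
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOC: CountingAllocator = CountingAllocator {
    live_bytes: AtomicUsize::new(0),
};
```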
#### CPU Profiling

```rust
pub async fn profile_cpu_usage() -> CpuProfile {
    let sampler = CpuSampler::new();

    // Profile different operations
    let idle = sampler.sample_duration(Duration::from_secs(10)).await;

    let processing = sampler.sample_while(|| async {
        process_messages(10000).await
    }).await;

    let spawning = sampler.sample_while(|| async {
        spawn_agents(100).await
    }).await;

    CpuProfile {
        idle_usage: idle,
        processing_usage: processing,
        spawning_usage: spawning,
    }
}
```
### 5. WebAssembly Performance

#### WASM Execution Overhead

```rust
fn bench_wasm_overhead() {
    let wasm_agent = create_wasm_agent();
    let native_agent = create_native_agent();
    let task = create_compute_task();

    // Time the same compute task on both agents and report the ratio;
    // `bench_agent` is assumed to return wall-clock time per run.
    let wasm_time = bench_agent(&wasm_agent, &task);
    let native_time = bench_agent(&native_agent, &task);

    println!(
        "WASM overhead: {:.2}x slower",
        wasm_time.as_secs_f64() / native_time.as_secs_f64()
    );
}
```
#### WASM Memory Limits

```rust
pub async fn test_wasm_memory_limits() -> MemoryLimitReport {
    let mut report = MemoryLimitReport::new();

    // Limits are in megabytes; MB is a bytes-per-megabyte constant.
    for memory_limit in [1, 10, 100, 1000] {
        let agent = create_wasm_agent_with_limit(memory_limit * MB);

        let max_allocation = find_max_allocation(&agent).await;
        let performance = measure_performance_at_limit(&agent).await;

        report.add_result(memory_limit, max_allocation, performance);
    }

    report
}
```
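How `create_wasm_agent_with_limit` enforces the cap depends on the runtime. With wasmtime (assumed here; the struct and function names are illustrative), a store-level resource limiter does it:

```rust
use wasmtime::{Engine, Store, StoreLimits, StoreLimitsBuilder};

// State kept per agent store; wasmtime consults `limits` on memory growth.
struct AgentState {
    limits: StoreLimits,
}

fn store_with_memory_limit(engine: &Engine, limit_bytes: usize) -> Store<AgentState> {
    let state = AgentState {
        limits: StoreLimitsBuilder::new().memory_size(limit_bytes).build(),
    };
    let mut store = Store::new(engine, state);
    // Any guest memory growth past `limit_bytes` now fails.
    store.limiter(|state| &mut state.limits);
    store
}
```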
## Continuous Benchmarking

### CI/CD Integration

#### GitHub Actions Workflow
```yaml
name: Performance Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Run Benchmarks
        run: |
          cargo bench --bench throughput -- --save-baseline current
          cargo bench --bench latency -- --save-baseline current
          cargo bench --bench scalability -- --save-baseline current

      - name: Compare with Baseline
        run: |
          cargo bench --bench throughput -- --baseline main
          cargo bench --bench latency -- --baseline main
          cargo bench --bench scalability -- --baseline main

      - name: Upload Results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: target/criterion/

      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('target/criterion/report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });
```
### Regression Detection

```rust
pub struct RegressionDetector {
    baseline: BenchmarkResults,
    threshold: f64, // e.g., 0.1 for a 10% regression
}

impl RegressionDetector {
    pub fn detect_regressions(
        &self,
        current: &BenchmarkResults,
    ) -> Vec<Regression> {
        let mut regressions = Vec::new();

        // Check throughput (lower is worse)
        if current.throughput < self.baseline.throughput * (1.0 - self.threshold) {
            regressions.push(Regression {
                metric: "throughput",
                baseline: self.baseline.throughput,
                current: current.throughput,
                change_percent: calculate_change_percent(
                    self.baseline.throughput,
                    current.throughput,
                ),
            });
        }

        // Check latency (higher is worse)
        if current.p99_latency > self.baseline.p99_latency * (1.0 + self.threshold) {
            regressions.push(Regression {
                metric: "p99_latency",
                baseline: self.baseline.p99_latency,
                current: current.p99_latency,
                change_percent: calculate_change_percent(
                    self.baseline.p99_latency,
                    current.p99_latency,
                ),
            });
        }

        regressions
    }
}
```
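`calculate_change_percent` is the obvious signed ratio (sketch):

```rust
// Signed percentage change from baseline to current (positive = increased).
fn calculate_change_percent(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}
```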
## Benchmark Scenarios

### Scenario 1: Peak Load

```rust
pub async fn benchmark_peak_load() {
    let config = LoadTestConfig {
        agents: 1000,
        messages_per_second: 100_000,
        duration: Duration::from_secs(300),
        message_size: 1024,
    };

    let results = run_load_test(config).await;

    assert!(results.success_rate > 0.99);
    assert!(results.p99_latency < Duration::from_millis(100));
}
```
### Scenario 2: Sustained Load

```rust
pub async fn benchmark_sustained_load() {
    let config = LoadTestConfig {
        agents: 100,
        messages_per_second: 10_000,
        // std Duration has no from_hours; spell out 24 hours in seconds.
        duration: Duration::from_secs(24 * 60 * 60),
        message_size: 512,
    };

    let results = run_load_test(config).await;

    assert!(!results.memory_leak_detected);
    assert!(results.performance_degradation < 0.05);
}
```
### Scenario 3: Burst Traffic

```rust
pub async fn benchmark_burst_traffic() {
    let config = BurstTestConfig {
        baseline_rps: 1000,
        burst_rps: 50_000,
        burst_duration: Duration::from_secs(10),
        recovery_time_target: Duration::from_secs(30),
    };
    // Copy the target out before `config` is moved into the test run.
    let recovery_target = config.recovery_time_target;

    let results = run_burst_test(config).await;

    assert!(results.handled_burst_successfully);
    assert!(results.recovery_time < recovery_target);
}
```
## Performance Optimization Guide

### Optimization Strategies

#### 1. Message Batching
```rust
pub struct MessageBatcher {
    batch_size: usize,
    batch_timeout: Duration,
    buffer: Vec<Message>,
    last_flush: Instant,
}

impl MessageBatcher {
    pub async fn add_message(&mut self, msg: Message) {
        self.buffer.push(msg);
        if self.should_flush() {
            self.flush().await;
        }
    }

    fn should_flush(&self) -> bool {
        self.buffer.len() >= self.batch_size
            || self.last_flush.elapsed() > self.batch_timeout
    }

    async fn flush(&mut self) {
        // Drain the buffer and hand it to the transport as one batch;
        // `send_batch` stands in for the real sink.
        let batch = std::mem::take(&mut self.buffer);
        send_batch(batch).await;
        self.last_flush = Instant::now();
    }
}
```
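Note that `should_flush` is only checked when a message arrives, so a lull in traffic can strand a partial batch. A background tick is one way to cover that gap (a sketch, assuming the batcher is shared behind a `tokio::sync::Mutex`):

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Periodically force a flush so quiet periods cannot strand messages
// in a partially filled batch.
async fn run_flush_timer(batcher: Arc<Mutex<MessageBatcher>>, tick: Duration) {
    let mut interval = tokio::time::interval(tick);
    loop {
        interval.tick().await;
        let mut b = batcher.lock().await;
        if !b.buffer.is_empty() && b.should_flush() {
            b.flush().await;
        }
    }
}
```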
#### 2. Connection Pooling

```rust
pub struct ConnectionPool {
    connections: Vec<Arc<Connection>>,
    next_connection: AtomicUsize,
}

impl ConnectionPool {
    // Hand out connections round-robin; Relaxed ordering suffices because
    // the counter only distributes load, it doesn't synchronize anything.
    pub fn get_connection(&self) -> Arc<Connection> {
        let index = self.next_connection.fetch_add(1, Ordering::Relaxed)
            % self.connections.len();
        self.connections[index].clone()
    }
}
```
#### 3. Zero-Copy Optimizations

```rust
pub fn process_message_zero_copy(data: &[u8]) -> Result<()> {
    // Parse without allocation
    let message = Message::parse_borrowed(data)?;

    // Process in-place
    process_in_place(&message)?;

    // Forward without copying
    forward_borrowed(&message)?;

    Ok(())
}
```
## Benchmark Reports

### Report Format

```markdown
# Performance Benchmark Report

Date: 2025-01-15
Commit: abc123def
Environment: Production-like

## Summary

- ✅ All performance targets met
- ⚠️ 2 minor regressions detected
- 📈 15% improvement in throughput

## Detailed Results

### Throughput

| Metric | Baseline | Current | Change |
|--------|----------|---------|--------|
| Messages/sec | 85,000 | 97,750 | +15% |
| Tasks/sec | 12,000 | 11,800 | -1.7% |

### Latency

| Percentile | Baseline | Current | Change |
|------------|----------|---------|--------|
| p50 | 0.8ms | 0.7ms | -12.5% |
| p99 | 9.2ms | 10.1ms | +9.8% |

### Resource Usage

| Resource | Baseline | Current | Change |
|----------|----------|---------|--------|
| Memory/agent | 8.5MB | 8.2MB | -3.5% |
| CPU/agent | 4.2% | 4.0% | -4.8% |
```
## Monitoring Production Performance

### Real-time Metrics

```rust
pub struct ProductionMonitor {
    metrics: Arc<Metrics>,
    alerting: Arc<AlertingService>,
}

impl ProductionMonitor {
    pub async fn monitor(&self) {
        loop {
            let snapshot = self.metrics.snapshot();

            // Check against SLOs
            if snapshot.p99_latency > SLO_LATENCY {
                self.alerting.trigger(Alert::HighLatency(snapshot.p99_latency));
            }

            if snapshot.error_rate > SLO_ERROR_RATE {
                self.alerting.trigger(Alert::HighErrorRate(snapshot.error_rate));
            }

            tokio::time::sleep(Duration::from_secs(10)).await;
        }
    }
}
```
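The `SLO_LATENCY` and `SLO_ERROR_RATE` thresholds are assumed above. Tying them to the "Acceptable" column of the Core Metrics table might look like this (the error-rate value is illustrative):

```rust
use std::time::Duration;

// Illustrative SLO thresholds, aligned with the "Acceptable" column
// of the Core Metrics table above.
const SLO_LATENCY: Duration = Duration::from_millis(50); // p99
const SLO_ERROR_RATE: f64 = 0.01; // 1% of requests
```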
## Best Practices

- **Benchmark Early and Often**
  - Run benchmarks on every commit
  - Track performance over time
  - Set up alerts for regressions
- **Use Representative Workloads**
  - Model real production patterns
  - Include edge cases
  - Test failure scenarios
- **Isolate Variables**
  - Control environment
  - Minimize noise
  - Run multiple iterations
- **Profile Before Optimizing**
  - Identify actual bottlenecks
  - Measure impact of changes
  - Avoid premature optimization
- **Document Performance Characteristics**
  - Known limitations
  - Scaling boundaries
  - Optimization opportunities