About this project
PitchA drop-in, open-source proxy that brings military-grade observability, security, and smart caching to your autonomous AI agents. Synapse Proxy sits gracefully between your application and any OpenAI-compatible LLM provider. Its primary mission is to empower developers with Deep Agentic Observability and a Smart Firewall, keeping rogue agent loops in check, protecting sensitive data, and making multi-turn LLM interactions entirely visible and measurable. While it actively protects your infrastructure and monitors your agents' intents, Synapse Proxy quietly optimizes token usage in the background with a four-tier cache pipeline (L0 to L3), ensuring you never pay twice for the same agentic thought process. 🛡️ Agentic Firewall & Security When building autonomous agents (AutoGPT, LangChain, custom loops), the biggest risk is infinite loops, runaway costs, and prompt injections. Synapse Proxy introduces a robust Firewall specifically designed for AI agents: Loop Kill Switch & Self-Correction: Detects when an agent is drifting into an infinite loop (repeating identical tool payloads). It intercepts the execution and returns a mock OpenAI-compatible chat completion response (HTTP 200) containing a descriptive self-correction warning to guide the agent to change strategy without crashing the process. Tool Allowlisting & Fingerprinting: Lock down your agent's capabilities. If an agent hallucinates a tool or tries to invoke an unauthorized function, the Proxy actively blocks the request. Granular Tool Cache TTLs: Configure custom cache durations per tool name (including setting TTL to 0s to disable caching for specific stateful tools). PII Redaction: Native regex-based masking of sensitive data (Emails, API Keys, Phone Numbers) before the prompt ever reaches the upstream provider. Session Circuit Breaker: Define strict prompt-token limits per session to cap expenditures on a per-task basis. 📊 Deep Telemetry & Intent Observability Every request is persisted to a PostgreSQL database, turning black-box agent behavior into a transparent, analyzable flow via our stunning Next.js Control Plane. Local AI Intent Classification: We use @xenova/transformers (running locally, 100% offline) to asynchronously classify every prompt intent (coding, rag, chat, extraction) without adding a single millisecond of latency to the critical proxy path. Session Replay Timeline: Inspect agent interactions step-by-step. Reconstruct the agent's flow, tool calls, and payload latency across a unified visual timeline. System Prompt Diffing: Agents sometimes rewrite their own instructions mid-session. The proxy extracts and diffs the system prompt, highlighting exactly what changed in the dashboard. Context Window Tracker: A dynamic graph comparing the Original Prompt Tokens against the L3 Compressed Tokens over time, demonstrating exactly how context grows and how Synapse Proxy mitigates it. A/B Benchmark: Toggle benchmark mode to fire control and optimized requests in parallel, using an LLM judge to score response similarity. ⚡ Cost Optimization as a Bonus Though security and observability take center stage, Synapse Proxy features a state-of-the-art caching engine designed to minimize latency and token waste. Drop-in OpenAI replacement: No SDK changes required. Just point your client at http://<host>:8080/v1 with an Authorization: Bearer sk-opti-... virtual key. Four caches in one binary: L0 In-flight Dedup: Blocks and deduplicates identical concurrent requests (useful for agent fan-outs). L1 Exact Match: Ultra-fast SHA-256 match for scripts retrying the exact same query. L2 Semantic Match: ONNX-based vector search (MiniLM) for conceptually identical queries. Auto-disabled on multi-turn conversations to prevent state corruption. L3 Prefix-Preserving Compression: Intelligently prunes stale <thought> blocks, truncates oversized tool outputs, and condenses older history. It maintains a byte-exact prefix so the upstream provider's native prompt cache remains 99% effective. Semantic Tool Deduplication: Intercepts LLM tool calls and retrieves cached outputs from similar prior invocations, bypassing client-side execution loops.








Discussion
0 commentsSign in to join the discussion.
No takes yet. Be the first to weigh in.