The autonomous agent
experimentation platform.
SquareDiff generates informed hypotheses for improving agent performance based on evaluation scores, traces, and frontier research, then autonomously codes, deploys, and evaluates 100+ variants in parallel to continuously discover the best-performing harness.
ReAct vs CoT reasoning strategies
Winner: Skills toolkit vs MCP tool serving
Parallel subagent delegation patterns
Multi-agent: orchestrator + 3 workers
Memory compression + structured evals
Built & advised by talent from

Generate new agent variants with multiple modes:
Import your existing evaluation criteria or build a new set with our eval generation suite.
We work closely with our customers to audit and establish base evals that define what great performance means for their unique agent.
Track the impact of every experiment. See accuracy, cost, and latency deltas with statistical significance, surface regressions instantly, and measure total improvement across your full experimentation history.
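As an illustration of the kind of comparison described above (not SquareDiff's actual implementation), a variant's accuracy delta against a baseline can be tested for statistical significance with a two-proportion z-test; all names here are hypothetical.

```python
import math

def accuracy_delta(base_correct: int, base_n: int,
                   var_correct: int, var_n: int) -> tuple[float, float]:
    """Return (accuracy delta, two-sided p-value) for variant vs baseline."""
    p1, p2 = base_correct / base_n, var_correct / var_n
    # Pooled proportion and standard error under the null hypothesis.
    pooled = (base_correct + var_correct) / (base_n + var_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / var_n))
    z = (p2 - p1) / se
    # Two-sided p-value via the normal CDF (built from math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p2 - p1, p_value

# Example: baseline 70/100 correct, variant 85/100 correct.
delta, p = accuracy_delta(70, 100, 85, 100)  # delta = 0.15, p ≈ 0.011
```

Here a +15-point accuracy gain over 100 evaluation runs clears the conventional p < 0.05 bar, so it would surface as a significant improvement rather than noise.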
Ship winning variants as GitHub PRs in a single click. Stage, deploy, and roll back versions at any time.




Connect from any
agent framework
Get started in minutes. SquareDiff connects with the agent frameworks your team already uses; no migration or rewrites needed.
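A framework-agnostic integration like this typically reduces to wrapping an agent's entry point in a uniform callable. This is a minimal sketch of that adapter pattern; `AgentResult`, `make_adapter`, and `my_agent` are illustrative names, not SquareDiff's real API.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentResult:
    """Uniform result envelope an evaluation harness can consume."""
    output: str
    latency_ms: float

def make_adapter(agent_fn: Callable[[str], str]) -> Callable[[str], AgentResult]:
    """Wrap any prompt-in/string-out agent so it reports output + latency."""
    def adapter(prompt: str) -> AgentResult:
        start = time.perf_counter()
        out = agent_fn(prompt)
        return AgentResult(out, (time.perf_counter() - start) * 1000)
    return adapter

# An existing agent from any framework, reduced to a plain callable:
def my_agent(prompt: str) -> str:
    return prompt.upper()  # placeholder for a real agent invocation

run = make_adapter(my_agent)
result = run("hello")
```

Because the harness only sees the adapter's interface, the underlying agent framework never needs to change, which is what makes a no-migration integration plausible.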