Vibe2Prod (FORGE engine).
A local CLI and web platform that audits AI-written code for security, reliability, and architectural issues before it ships. Three AI agents, forty-seven deterministic checks, forty-nine custom SAST rules, roughly twenty-one cents per scan.
the problem
AI-assisted development made it trivial to ship code that looks right, runs locally, and fails in production. The usual tools do not catch the new failure modes. Standard linters miss architectural drift. Traditional SAST scanners miss the patterns LLMs specifically generate. Human review does not scale to the velocity AI coding enables. I kept seeing clean-looking repos ship with broken auth boundaries, missing retry logic, and silent security gaps that nobody would find until a real user triggered them. I wanted a tool that treated AI-written code as its own category and audited it accordingly.
the approach
Built FORGE as the audit engine and Vibe2Prod as the product wrapping it. The engine runs a three-agent pipeline: a Codebase Analyst maps the architecture, entry points, and data flows; a Security Auditor runs three parallel passes against the OWASP ASVS rubric; a Fix Strategist builds the remediation plan. Opengrep handles the deterministic SAST layer with forty-nine custom rules. A separate evaluation step runs forty-seven deterministic checks across seven dimensions: security, reliability, maintainability, test quality, performance, documentation, operations. The output is a Production Readiness Score with band ratings from A to F, plus a fingerprinted finding list so issues stay stable across scans, and a delta report showing what is new, recurring, or regressed.
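The key property of a fingerprinted finding list is that a finding keeps its identity even when unrelated edits move code around. A minimal sketch of one way to do that, hashing the rule, file, and a whitespace-normalized snippet while deliberately excluding line numbers (the hashing scheme and function name here are assumptions for illustration, not FORGE's actual implementation):

```python
import hashlib

def fingerprint(rule_id: str, file_path: str, snippet: str) -> str:
    """Stable identity for a finding across scans (illustrative sketch).

    Line numbers are excluded on purpose: a finding should keep the same
    fingerprint when unrelated edits shift the code up or down the file.
    """
    normalized = " ".join(snippet.split())  # collapse whitespace differences
    raw = f"{rule_id}|{file_path}|{normalized}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]
```

With an identity like this, the delta report reduces to set operations over fingerprints from consecutive scans.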
On the infrastructure side, model routing goes through OpenRouter so the engine stays model-agnostic. MiniMax handles the architectural mapping, Haiku handles the security and remediation work. A full scan costs about twenty-one cents. The CLI ships as pip install vibe2prod, with a /forge skill for Claude Code that turns the scan report into autonomous fixes delivered as micro-commits. The web platform adds team management, GitHub OAuth with encrypted token storage, wallet-based billing via Stripe, and BYOK support for teams that want to use their own OpenRouter keys. Code stays local during CLI scans. Only the LLM API calls leave the machine.
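Model-agnostic routing through OpenRouter boils down to a table mapping each agent role to a model slug, so swapping models touches one line of config rather than agent code. A minimal sketch under that assumption (the routing-table shape, function name, and the specific model slugs are illustrative, not FORGE's actual configuration):

```python
# Hypothetical routing table: agent role -> OpenRouter model slug.
# Slugs shown are examples; the real mapping is configuration.
MODEL_ROUTES = {
    "codebase_analyst": "minimax/minimax-01",          # architectural mapping
    "security_auditor": "anthropic/claude-3.5-haiku",  # security passes
    "fix_strategist": "anthropic/claude-3.5-haiku",    # remediation plan
}

def build_request(role: str, prompt: str) -> dict:
    """Build an OpenRouter chat-completions payload for one agent role."""
    return {
        "model": MODEL_ROUTES[role],
        "messages": [{"role": "user", "content": prompt}],
    }
```

Because every agent talks to the same chat-completions interface, cost engineering becomes a matter of editing the table: route the expensive mapping work to one model and the high-volume passes to a cheaper one.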
Built the MCP server for Claude Code integration, the /forge skill for autonomous remediation, and the evaluation engine from scratch. Every piece of infrastructure, from the fingerprinting system to the baseline tracking to the .forgeignore suppression format, was designed to hold up under repeated use on real codebases over time, not just impress on a first scan.
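A suppression format like .forgeignore only holds up over time if it is explicit about what is being silenced. A sketch of how such a format could be parsed, assuming a hypothetical one-entry-per-line syntax of rule_id:path_glob with # comments (the syntax, function names, and matching semantics here are assumptions, not the actual .forgeignore specification):

```python
import fnmatch

def load_suppressions(text: str) -> list[tuple[str, str]]:
    """Parse a hypothetical .forgeignore: one 'rule_id:path_glob' per line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        rule_id, _, glob = line.partition(":")
        rules.append((rule_id, glob or "*"))  # bare rule id suppresses everywhere
    return rules

def is_suppressed(rule_id: str, path: str, rules: list[tuple[str, str]]) -> bool:
    """A finding is suppressed if any entry matches both its rule and path."""
    return any(r == rule_id and fnmatch.fnmatch(path, g) for r, g in rules)
```

Scoping each suppression to a rule-and-path pair, rather than a whole file or whole rule, keeps the suppression list honest on repeated scans.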
the outcome
In active use as my own audit layer on every project I build. The portfolio site you are reading was scanned with it. Vibe2Prod scanned itself. Building it taught me more about production AI systems than any project before it: prompt design at scale, cost engineering against real budgets, deterministic-plus-probabilistic pipelines, MCP server design, Claude Code skill authoring, multi-tenant SaaS architecture with strict data isolation, and the specific discipline of building a tool that has to work reliably on other people’s codebases, not just mine. CLI and web platform both targeting public launch before end of 2026.
the lessons
The core insight behind FORGE is that LLMs alone cannot audit code reliably, and deterministic tools alone cannot catch architectural or contextual problems. The value is in the seam between them. Building the engine taught me that most production AI systems live or die on that seam: what you hand to the LLM, what you hand to deterministic logic, and how the two communicate. Also learned that fingerprinting and baseline tracking matter more than any single scan's output. The real product value is watching a codebase improve across scans, not the first report.
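The claim that the product value lives in the across-scan view can be made concrete: once findings carry stable fingerprints, classifying a scan against its baseline is pure set arithmetic. A minimal sketch (the category names match the delta report described above; the function shape and the "resolved" baseline input are assumptions for illustration):

```python
def compute_delta(previous: set[str], resolved: set[str], current: set[str]) -> dict:
    """Classify this scan's findings against the baseline (sketch).

    previous: fingerprints still open after the last scan
    resolved: fingerprints fixed in earlier scans
    current:  fingerprints produced by this scan
    """
    return {
        "new": current - previous - resolved,  # never seen before
        "recurring": current & previous,       # still open
        "regressed": current & resolved,       # was fixed, came back
        "fixed": previous - current,           # open last scan, gone now
    }
```

The "regressed" bucket is the one a single-scan tool cannot produce at all: it requires remembering what was once fixed, which is exactly why baseline tracking matters more than any individual report.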
stack
- Python 3.12
- FastAPI
- Pydantic v2
- httpx
- Typer (CLI)
- Next.js 16
- React 19
- TypeScript
- Tailwind 4
- shadcn/ui
- TanStack Query
- Clerk
- Supabase (PostgreSQL + RLS)
- Stripe
- Docker Compose
- OpenRouter (model routing)
- Anthropic Claude
- MiniMax
- Daytona (sandboxes)
- MCP (Claude Code integration)
- Opengrep (SAST)