Stop trusting ‘looks about right’: verifying AI UI against Figma

How I built figma-connect and verify_node: a local tool that renders an AI agent's generated code and pixel-diffs it against the live Figma design.

Published June 17, 2026
Reading time: 6 min
Tags: Figma, AI Agents, Tooling
01 The 10% That Hides

I do a lot of UI work, and like a lot of people lately I've been letting an AI agent take the first pass. Point it at a Figma file, let it write the components, come back to something that's 90% there. On a good day that's a huge time saver.

The problem is the other 10%, and where it hides.

It's never an obvious break. It's the padding that's 12px instead of 16px. A font weight that's 500 where the design says 600. A border radius that's a couple of pixels off. A gradient that starts at the wrong stop. Each one is tiny, but together they're the difference between "looks like the design" and "looks like someone who sort of saw the design once." And the only way I was catching any of it was opening the Figma frame and the browser side by side and squinting back and forth like it was a spot-the-difference puzzle.

That got old fast. What bugged me most was that the agent had no idea it was wrong. It would read the design, write the code, and confidently tell me it matched. It couldn't check its own work. Every Figma tool I tried could feed it data about a node, but none of them could answer the actual question: does the thing you just built look like the thing the designer drew?

So I stopped squinting and built the missing piece. It's a local tool called figma-connect, and the part I care about is one function: verify_node.

02 The Actual Idea

verify_node takes the code the agent wrote, renders it in a real browser, and compares it pixel for pixel against the live Figma node. Pass or fail, with the diff image attached. That's it. The agent finally has a mirror.

There's a read side too (it can pull geometry, auto-layout, fills, type, tokens, components, the usual), but honestly the read part is table stakes. Plenty of tools do that. The verify part is the bit I hadn't seen anywhere, and it's the bit that changed how the agent behaves.

The whole thing runs on my laptop. It works against Figma in the browser, so there's no desktop app to install, no cloud API in the loop, and no design files leaving my machine.

03 How the Check Actually Works

Give it a node id and the candidate code. It mounts the code in headless Chromium with Playwright, exports the matching node from Figma, and diffs them. I run three different comparisons at once, because I tried each one alone first and each one lied to me in its own way.
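Roughly, the capture half of that flow could look like the sketch below. It is not figma-connect's actual code: `Raster`, `VerifyDeps`, `renderCandidate`, `exportFigmaNode`, and `captureBoth` are all hypothetical names, and the renderer and exporter are injected so the shape is visible without a browser. Refusing to diff when the two images differ in size is an assumption of this sketch, not something the tool is known to do.

```typescript
// Hypothetical shapes for illustration; none of these names come from figma-connect.
interface Raster {
  width: number;
  height: number;
  pixels: Uint8Array; // RGBA bytes, row-major
}

interface VerifyDeps {
  // e.g. mount the code in headless Chromium via Playwright and screenshot it
  renderCandidate(code: string): Promise<Raster>;
  // e.g. ask the Figma side to export the node as an image
  exportFigmaNode(nodeId: string): Promise<Raster>;
}

async function captureBoth(
  nodeId: string,
  code: string,
  deps: VerifyDeps
): Promise<{ expected: Raster; actual: Raster }> {
  // The render and the export are independent, so run them concurrently.
  const [actual, expected] = await Promise.all([
    deps.renderCandidate(code),
    deps.exportFigmaNode(nodeId),
  ]);

  // Assumption: a size mismatch is reported as its own failure rather than
  // being resized away, since resizing would blur the differences we want to see.
  if (actual.width !== expected.width || actual.height !== expected.height) {
    throw new Error(
      `size mismatch: expected ${expected.width}x${expected.height}, ` +
        `got ${actual.width}x${actual.height}`
    );
  }
  return { expected, actual };
}
```

The two rasters then feed whatever comparisons run next.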

Raw pixel diffing catches the most: the 4px shift, the wrong radius, the moved gradient. But it's hysterical about a one-pixel global offset, screaming that everything's broken when the whole thing just nudged sideways. So I layered SSIM on top, which scores structural similarity and tracks closer to what a human would actually call "close." And then a text and accessibility pass with axe-core, because more than once I had a render that was pixel-perfect but had quietly dropped a label or lowercased a heading. Looking right and being right are not the same thing, and I learned that the annoying way.
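To make the offset problem concrete, here is a toy comparison on a one-row grayscale gradient shifted right by one pixel. It is a simplified sketch: `mismatchRatio` and `globalSsim` are hypothetical helpers, and the SSIM here is computed once over the whole image with the standard constants, whereas real implementations slide a small window across the image and average the scores.

```typescript
type Gray = number[]; // one 0-255 luminance value per pixel

// Fraction of pixels whose values differ by more than `tolerance`.
function mismatchRatio(a: Gray, b: Gray, tolerance = 0): number {
  if (a.length !== b.length) throw new Error("images must have the same size");
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tolerance) differing++;
  }
  return differing / a.length;
}

// Single-window SSIM over the whole image (a simplification of windowed SSIM).
function globalSsim(a: Gray, b: Gray): number {
  if (a.length !== b.length) throw new Error("images must have the same size");
  const n = a.length;
  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i] / n;
    meanB += b[i] / n;
  }
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += (da * da) / n;
    varB += (db * db) / n;
    cov += (da * db) / n;
  }
  // Standard stabilizing constants for 8-bit images.
  const c1 = (0.01 * 255) * (0.01 * 255);
  const c2 = (0.03 * 255) * (0.03 * 255);
  return (
    ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2))
  );
}

// A 16-pixel horizontal gradient, and the same gradient nudged right by one pixel.
const ramp: Gray = [];
for (let x = 0; x < 16; x++) ramp.push(x * 16);
const shifted: Gray = ramp.map((_, x) => ramp[Math.max(x - 1, 0)]);

console.log(mismatchRatio(ramp, shifted)); // nearly every pixel differs
console.log(globalSsim(ramp, shifted)); // stays close to 1
```

An exact-match pixel diff flags 15 of the 16 pixels here, while the structural score barely moves, which is the disagreement the paragraph above describes.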

The output is a labeled EXPECTED / ACTUAL / DIFF image. I did that on purpose. A bare similarity score is something an agent will happily rationalize ("0.94, close enough!"). A picture of exactly what's wrong is not.

The real win is that "looks about right" stopped being good enough. The agent now has a gate it has to pass before it can call something done.
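One way to picture such a gate is a function that turns the three checks into a pass/fail plus human-readable reasons, and always hands back the diff image. Everything below is illustrative: the `CheckResults`, `Thresholds`, and `Verdict` shapes, the threshold values, and the "every check must pass" rule are assumptions, not how verify_node is known to combine its signals.

```typescript
// Hypothetical result shapes; field names are illustrative only.
interface CheckResults {
  mismatchRatio: number; // fraction of differing pixels, 0..1
  ssim: number; // structural similarity, 1 means identical
  a11yViolations: string[]; // e.g. rule ids reported by an accessibility pass
  diffImagePath: string; // the labeled EXPECTED / ACTUAL / DIFF image
}

interface Thresholds {
  maxMismatchRatio: number;
  minSsim: number;
}

interface Verdict {
  pass: boolean;
  reasons: string[];
  diffImagePath: string;
}

function gate(results: CheckResults, limits: Thresholds): Verdict {
  const reasons: string[] = [];
  if (results.mismatchRatio > limits.maxMismatchRatio) {
    reasons.push(`pixel mismatch ${results.mismatchRatio} exceeds ${limits.maxMismatchRatio}`);
  }
  if (results.ssim < limits.minSsim) {
    reasons.push(`SSIM ${results.ssim} is below ${limits.minSsim}`);
  }
  for (const rule of results.a11yViolations) {
    reasons.push(`text/accessibility violation: ${rule}`);
  }
  // Assumption: any single failing check fails the whole verification.
  return { pass: reasons.length === 0, reasons, diffImagePath: results.diffImagePath };
}

// A render that is visually perfect but lost a label still fails.
const verdict = gate(
  { mismatchRatio: 0, ssim: 1, a11yViolations: ["label"], diffImagePath: "diff.png" },
  { maxMismatchRatio: 0.01, minSsim: 0.98 } // placeholder thresholds
);
console.log(verdict.pass, verdict.reasons);
```

Returning the reasons and the image path alongside the boolean keeps the agent from leaning on a single number.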

04 The Stuff That Actually Ate My Week

The render-and-diff idea took an afternoon. Making it trustworthy took way longer, because a verifier that fails for dumb reasons is worse than no verifier at all. The first time it cried wolf, I stopped trusting it, and that defeats the whole point.

The first thing that got me was fonts. I kept getting failures on text that looked identical, and I burned an embarrassing amount of time before I realized I was screenshotting before the web fonts had loaded. The render was comparing a fallback font against the design's real font and flagging the difference. Gating the capture on document.fonts.ready killed that entire class of false failures.
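A minimal sketch of that gate, assuming a Playwright-style page object. `PageLike` and `captureAfterFonts` are hypothetical names; the interface only lists the two methods used here, which Playwright's `Page` is assumed to satisfy (its `evaluate` accepts a string expression and awaits a returned promise).

```typescript
// Hypothetical minimal interface so the sketch does not depend on a browser install.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
  screenshot(): Promise<Uint8Array>;
}

async function captureAfterFonts(page: PageLike): Promise<Uint8Array> {
  // document.fonts.ready is a promise that resolves once the document's fonts
  // have finished loading and layout has settled. Mapping it to `true` keeps
  // the value returned from the page trivially serializable.
  await page.evaluate("document.fonts.ready.then(() => true)");

  // Only now is the text drawn with the real web font rather than a fallback.
  return page.screenshot();
}
```

The ordering is the whole point: the screenshot call must not start until the fonts promise has resolved.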

Then there was the wait strategy. I was waiting on networkidle before capturing, which is fine until you hit a page with a long-poll or a streaming connection, and then it just never idles. The verification would hang forever. I ripped out the blanket wait and replaced it with explicit readiness signals.
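The post doesn't list which readiness signals replaced the blanket wait, so the sketch below only shows the general shape: a set of named signals, each bounded by a timeout so a page that never settles produces a clear error instead of a hang. `ReadinessSignal`, `withTimeout`, and `waitForReady` are hypothetical names.

```typescript
interface ReadinessSignal {
  name: string; // e.g. "fonts" or "root element mounted" (illustrative)
  wait: () => Promise<void>;
}

// Reject with a descriptive error if `work` has not settled within `ms`.
function withTimeout(name: string, work: Promise<void>, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`readiness signal "${name}" timed out after ${ms}ms`));
    }, ms);
    work.then(
      () => {
        clearTimeout(timer);
        resolve();
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Wait for every signal concurrently; the first failure or timeout rejects.
async function waitForReady(signals: ReadinessSignal[], timeoutMs: number): Promise<void> {
  await Promise.all(signals.map((s) => withTimeout(s.name, s.wait(), timeoutMs)));
}
```

Unlike a network-idle wait, each signal names exactly what the capture depends on, and a long-poll or streaming connection in the background no longer matters.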

The one I'm still not fully done with is fidelity versus budget. The summary of a node that I hand to the agent can't carry every property at full precision, or it blows the context window. So I had to make calls about what to keep exact and what to approximate, and then be honest about it. The digest now carries explicit flags for gradients, shadows, strokes, opacity, and masks, so the agent knows when a value is the real thing and when it's a best guess.
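As an illustration of what such flags might look like on the wire, here is one possible shape. Only the five flagged properties come from the description above; the `NodeDigest` type, the `"exact"`/`"approximate"` values, and the helper are assumptions.

```typescript
type Fidelity = "exact" | "approximate";

// The five flagged properties named in the post; the encoding is hypothetical.
interface FidelityFlags {
  gradients: Fidelity;
  shadows: Fidelity;
  strokes: Fidelity;
  opacity: Fidelity;
  masks: Fidelity;
}

interface NodeDigest {
  nodeId: string;
  name: string;
  fidelity: FidelityFlags;
}

// List the properties the agent should treat as best guesses.
function approximatedProperties(digest: NodeDigest): string[] {
  const keys = Object.keys(digest.fidelity) as (keyof FidelityFlags)[];
  return keys.filter((key) => digest.fidelity[key] === "approximate");
}

const example: NodeDigest = {
  nodeId: "12:34", // made-up id
  name: "Hero Card",
  fidelity: {
    gradients: "approximate",
    shadows: "exact",
    strokes: "exact",
    opacity: "exact",
    masks: "approximate",
  },
};
console.log(approximatedProperties(example)); // ["gradients", "masks"]
```

With something like this, the agent can decide to fetch the full-precision value for a flagged property before writing code that depends on it.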

05 Under the Hood, Briefly

Small pnpm monorepo. A Figma plugin lives in the file (it's the only thing that can actually read the document). A local bridge daemon indexes the file into SQLite with full-text search, updates itself as the design changes, and exposes everything over MCP. A separate harness does the rendering and diffing. The agent talks to the daemon through a little stdio shim so the file stays indexed between sessions. Around 15k lines of TypeScript, 35 tools, all read-only except the verify step, bound to localhost only.

06 What It Still Can't Do

Being honest about the edges, because I hate posts that pretend their thing is finished.

The search is lexical, not semantic, so it only matches words that literally appear in a layer's name or text. A vibes-based query won't find a generically named group. The digest is budgeted, hence the fidelity flags. And it only reads; it never writes back to Figma, on purpose.
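The lexical limitation is easy to show with a toy matcher. This is plain in-memory token matching, not the SQLite full-text index the daemon uses, and `Layer`, `tokenize`, and `lexicalSearch` are hypothetical names.

```typescript
interface Layer {
  id: string;
  name: string;
  text: string;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(input: string): string[] {
  return input
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// A layer matches only if every query word literally appears in its name or text.
function lexicalSearch(layers: Layer[], query: string): Layer[] {
  const terms = tokenize(query);
  return layers.filter((layer) => {
    const words = tokenize(layer.name + " " + layer.text);
    return terms.every((term) => words.indexOf(term) !== -1);
  });
}

const layers: Layer[] = [
  { id: "a", name: "Pricing Card", text: "Pro plan" },
  { id: "b", name: "Group 12", text: "$9 per month" }, // also a pricing card, but never says so
];

console.log(lexicalSearch(layers, "pricing card").map((l) => l.id)); // ["a"]
```

Layer "b" is just as much a pricing card to a human, but nothing in its name or text contains the query words, so a lexical search cannot surface it.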

If you've ever watched an AI spit out UI that's subtly, confidently wrong and had no way to catch it except your own eyes, this was my attempt at giving it the feedback loop it was missing.

Full writeup with screenshots and the architecture: https://www.arjunp.pro/projects/figma-connect.html