Designed a debugging feature that tests enterprise AI assistants and stops the moment something breaks. It shipped in a month and cut manual debugging effort by roughly 70%.

Shipped

Oct 2022

Industry

B2B SaaS

Role

Product Designer

Team

CS Team, CTO, CEO, Developers

Tars is a B2B platform where enterprise AI agents are built as visual flowcharts. Each node is a single conversational turn called a gambit. An enterprise banking chatbot could have 500 of them, and when something breaks in a workflow that size, finding it manually means scrolling through hundreds of nodes and starting over.

I designed and shipped Debug Mode, an automated end-to-end test runner that highlights the active gambit, auto-scrolls the canvas, and stops the moment something breaks. I designed for two user types with fundamentally different technical backgrounds.
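To make the "stops the moment something breaks" behavior concrete, here is a minimal sketch of a runner that walks a flow and halts at the first failing gambit. It is an illustration only, not the Tars implementation: the `Gambit` fields, the `check` callback, `run_debug` and the sample flow are all hypothetical, and each gambit has a single outgoing edge to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Gambit:
    """One conversational turn in the flowchart (hypothetical shape)."""
    id: str
    check: Callable[[], bool]        # returns False when the gambit is broken
    next_id: Optional[str] = None    # single outgoing edge, for simplicity


@dataclass
class DebugResult:
    tested: List[str] = field(default_factory=list)
    broken_id: Optional[str] = None


def run_debug(flow: Dict[str, Gambit], start_id: str,
              on_active: Callable[[str], None]) -> DebugResult:
    """Walk the flow from start_id and stop at the first failing gambit."""
    result = DebugResult()
    visited = set()
    current: Optional[str] = start_id
    while current is not None and current not in visited:
        visited.add(current)         # guard against loops in the flow
        gambit = flow[current]
        on_active(gambit.id)         # e.g. highlight the node and scroll to it
        result.tested.append(gambit.id)
        if not gambit.check():
            result.broken_id = gambit.id
            break                    # stop the moment something breaks
        current = gambit.next_id
    return result


# Hypothetical four-gambit flow with a failure at the third step.
flow = {
    "greet": Gambit("greet", lambda: True, "pick_card"),
    "pick_card": Gambit("pick_card", lambda: True, "check_eligibility"),
    "check_eligibility": Gambit("check_eligibility", lambda: False, "confirm"),
    "confirm": Gambit("confirm", lambda: True),
}
highlighted: List[str] = []
result = run_debug(flow, "greet", highlighted.append)
print(result.tested, result.broken_id)
```

The `on_active` callback is where the canvas would react to each step, and the returned `tested` list and `broken_id` are the kind of data a status summary could be built from.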

Shipped (full-release)

1 month

Troubleshooting time cut by

~70%

The shipped ‘Debug Mode’ feature, introduced to reduce testing time for the internal team and clients.

Background

500 gambits, one broken path, no way to find it.

During one of the product team standups, the CTO flagged a pain point. The CS team was spending hours debugging chatbots, often getting stuck trying to identify a broken gambit in a 500+ gambit canvas.

When I observed their workflow directly, I saw the real problem was mechanical. They'd manually step through the chatbot flow, lose track of where they were, scroll back to the beginning and then start over. The process was draining and error-prone.

Two distinct user groups: CS team with technical background and clients with no technical background

The constraint that emerged was non-negotiable. I needed to design one interface that could work for the CS team (who could read API errors and JSON) while also eventually scaling to clients with no technical background. Two completely different mental models supported by one feature.

Design Decision 01

The canvas was overwhelming with three different visual signals. One was enough.

Iteration 1, Color-coded states

The first version used color: yellow for active, green for passed and red for error. This was logical on paper, but the problem showed up on a real canvas: 500 gambits, some overlapping, now covered in green, yellow and red on top of the existing blue. The color system was asking the canvas to communicate three things at once when the user only needed one: where is the debugger right now?

Color-coded states: yellow for active, green for passed, red for error

My initial design of color-coding the gambits based on their debugging state.

Iteration 2, Single color, dashed lines and a glow

One signal was all users needed, so I simplified to one. I kept the default blue and used opacity levels to denote state instead. The active gambit got a subtle blue glow on top of the opacity change. Dashed connector lines showed the path the debugger was taking, so users could see which branch the conversation would follow before it got there.

Active state

100% opacity + subtle blue glow

Inactive state

40% opacity, no color change

Path signal

Dashed and moving connector lines

Navigation

Auto-zoom follows active gambit
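The state rules above are small enough to write down as a lookup. The sketch below is one way to express them, assuming a hypothetical `GambitStyle` record and helper names; only the opacity values, the glow and the dashed path signal come from the design itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GambitStyle:
    opacity: float   # 1.0 means fully opaque
    glow: bool       # subtle blue glow on the active gambit only


ACTIVE = GambitStyle(opacity=1.0, glow=True)     # 100% opacity + glow
INACTIVE = GambitStyle(opacity=0.4, glow=False)  # 40% opacity, no color change


def style_for(gambit_id: str, active_id: str) -> GambitStyle:
    """Only one signal: a gambit is either the active one or it is not."""
    return ACTIVE if gambit_id == active_id else INACTIVE


def connector_is_dashed(source_id: str, target_id: str, path: List[str]) -> bool:
    """True when the edge source -> target lies on the debugger's path."""
    return any(a == source_id and b == target_id
               for a, b in zip(path, path[1:]))


# Hypothetical path through three gambits.
path = ["greet", "pick_card", "check_eligibility"]
print(style_for("pick_card", "pick_card"))
print(connector_is_dashed("greet", "pick_card", path))
```

Keeping the rule to a single comparison mirrors the design decision: the canvas answers one question, where the debugger is right now, and nothing else.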

The original design required users to pan through the flow to understand test health. I refined the design to surface this immediately. Pills at the top showed total gambits tested and the error count at a glance, shifting the mental model from ‘navigate the flow to find problems’ to ‘confirm status, then drill in.’

Design Decision 02

The CS team understood the terminal. The step controls stopped them.

With the canvas resolved, I turned to the control panel. I modeled the first version on IDE debugger interfaces, the environment the CS team was most likely familiar with. Six controls: play/pause, stop, step into, step out, step over and see logs.

The CS team understood the terminal immediately. Step into, step out and step over stopped them. If they were pausing on those controls, a client with no development background would be lost.

A roadmap conversation happening at the same time made the answer clear. An SMB client managing their own chatbot and an enterprise client with a dedicated engineering team needed different things from this tool. The advanced controls weren't wrong; they were intended for a different release.

Six debugger controls: play/pause, stop, step into, step out, step over, and see logs

Iteration 2, Shipping a simpler version was the right call

I stripped the design to three controls: play/pause, stop and restart. Status messages (‘Debugging in progress’, ‘Paused’, ‘Stopped’) removed any ambiguity about what was happening without requiring the user to understand the canvas.
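Three controls and three status messages amount to a very small state machine. The sketch below shows one plausible transition table. The table itself is an assumption for illustration (for example, that play/pause does nothing once the run is stopped, and that restart always returns to a running state); only the control names and status messages come from the shipped design.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Debugging in progress"
    PAUSED = "Paused"
    STOPPED = "Stopped"


class Control(Enum):
    PLAY_PAUSE = "play/pause"
    STOP = "stop"
    RESTART = "restart"


# Assumed transitions; any pair not listed leaves the state unchanged.
TRANSITIONS = {
    (State.RUNNING, Control.PLAY_PAUSE): State.PAUSED,
    (State.PAUSED, Control.PLAY_PAUSE): State.RUNNING,
    (State.RUNNING, Control.STOP): State.STOPPED,
    (State.PAUSED, Control.STOP): State.STOPPED,
    (State.RUNNING, Control.RESTART): State.RUNNING,
    (State.PAUSED, Control.RESTART): State.RUNNING,
    (State.STOPPED, Control.RESTART): State.RUNNING,
}


def press(state: State, control: Control) -> State:
    """Return the next state after a control is pressed."""
    return TRANSITIONS.get((state, control), state)


state = State.RUNNING
state = press(state, Control.PLAY_PAUSE)
print(state.value)   # status message shown to the user
```

Because each state carries its own status message, the user can always read what the debugger is doing without interpreting the canvas.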

Interactive demo: Debug Mode running alongside a chatbot preview of a credit card assistant, where the user asks to increase their credit limit, selects the Rewards Card and the chatbot checks eligibility.

Reflection

The right answer and the complete answer aren't always the same thing.

Within a couple of months of releasing the simplified design, we found that 90% of errors were either easily resolved by making a minor change in the gambit controls, or were caused by API connections and custom code issues. The advanced controls and debug log were appealing in terms of feature sophistication, but wouldn't have been needed in the long term.

As a designer, you have to account for what is actually needed. It is easy to get carried away and add more to a design, but sometimes Occam's razor is the right answer.