At Rugas Technologies, we’ve been building Zymez iDIM — our cloud-based imaging platform that enables healthcare providers to receive, view, and report medical images securely from anywhere.
Designed for modern teleradiology workflows, it manages everything from DICOM image storage to structured reporting, helping radiologists deliver fast and consistent interpretations.
At the heart of iDIM lies our radiology viewer, a customized version of OHIF, equipped with an extensive set of diagnostic tools — from MPR (multi-planar reconstruction) and slice thickness adjustments to ROI measurement, contrast windowing, and annotation features.
It’s powerful, flexible, and clinically proven — but with great capability comes complexity.
For seasoned radiologists, the viewer becomes second nature. But for new users, especially those joining teleradiology networks, the learning curve can be steep. Understanding where each tool is, how to activate it, and which sequence of operations to follow for a particular study type takes time and experience.
And much of that experience is repetitive.
For instance, in chest CT studies, radiologists might always begin with the same steps — adjusting window/level presets, enabling MPR, zooming into the lung fields, and marking ROIs.
These actions don’t require medical judgment — they’re simply operational steps before the real analysis begins.
That’s when the question surfaced:
“If these actions are mostly predictable and repetitive, why should radiologists spend time clicking through menus to perform them?”
What if there were a way to interact with the radiology viewer naturally, not through toolbars and icons, but through intent?
That question became the spark for our first experiment with MCP (Model Context Protocol) — and the beginning of building an Agent interface for the radiology viewer.

The Spark — And the First Big Challenge

Once we decided to build an Agent interface for our radiology viewer using MCP, we were excited, but it didn’t take long to realize the complexity ahead.
Our viewer is a zero-footprint application — it runs entirely in the browser, without any local installation or setup. As long as users are authenticated into Zymez iDIM, they can access the viewer from anywhere, instantly. That design makes deployment seamless — but also creates a challenge.
Running a traditional MCP server that directly connects to a local app instance simply wasn’t possible here.
The viewer doesn’t run natively on the client machine, and we couldn’t assume local access to any ports or backend services from the browser.
For a moment, it felt like we’d hit a wall.
But then we flipped the problem around.
Instead of having the MCP server connect to the viewer, we let the viewer connect outward — to a central MCP Bridge.
Here’s how it worked:
  • The MCP server lives on the backend — it manages the model, available tools, and the logic for executing viewer operations.
  • We introduced an MCP Viewer Bridge, a middle layer designed to communicate over WebSockets.
  • The radiology viewer connects to this bridge when it loads, establishing a live, two-way WebSocket connection.
This meant the viewer could now receive commands from the MCP server via the bridge, securely and in real time, without needing any local setup or installation.
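To make the direction of that connection concrete, here is a minimal viewer-side sketch of the handshake. The endpoint, query parameter, and message shapes are illustrative assumptions, not our production bridge protocol.

  // Illustrative sketch only: the viewer dials out to the MCP Viewer Bridge.
  interface ExecuteMessage {
    type: 'execute';
    command: string;                    // e.g. 'applyMPR'
    payload: Record<string, unknown>;   // command parameters
    requestId: string;                  // correlates the acknowledgement
  }

  function connectToBridge(
    bridgeUrl: string,                  // assumed wss:// endpoint of the MCP Viewer Bridge
    sessionToken: string,               // issued by Zymez iDIM after authentication
    runCommand: (name: string, payload: unknown) => void, // delegates to the Command Module
  ): WebSocket {
    const socket = new WebSocket(`${bridgeUrl}?token=${encodeURIComponent(sessionToken)}`);

    socket.addEventListener('open', () => {
      // Announce the session so the bridge can tell the MCP server this viewer is live.
      socket.send(JSON.stringify({ type: 'viewer-ready' }));
    });

    socket.addEventListener('message', (event) => {
      const msg = JSON.parse(event.data as string) as ExecuteMessage;
      if (msg.type === 'execute') {
        runCommand(msg.command, msg.payload);
        socket.send(JSON.stringify({ type: 'ack', requestId: msg.requestId }));
      }
    });

    return socket;
  }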
Once the architecture clicked into place, everything became beautifully simple:
The viewer, bridge, and MCP server were now all speaking the same language — connected through a live WebSocket channel, ready to perform operations.
We had just built the communication backbone for our AI-powered Radiology Agent.

Building the Command Module — Making the Viewer Speak MCP

With the viewer, bridge, and MCP server now connected through WebSockets, the next challenge was clear: how do we make the viewer’s actions — things like opening MPR mode, adjusting slice thickness, or drawing measurements — executable as commands from code?
To achieve this, we built a Command Module inside our radiology viewer, following the Command Design Pattern.
Every operation that a user could perform through the interface — zoom, pan, adjust contrast, switch layouts, measure ROIs, enable crosshairs, change presets — was refactored into a self-contained command that could be invoked programmatically.
Each command followed a simple structure:
  • A command name (e.g., applyMPR, adjustSliceThickness, enableCrosshair)
  • A payload describing the parameters
  • An execute() method that performed the actual viewer action
This architecture turned the viewer into a scriptable, controllable system, not just a UI-driven app.
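A stripped-down TypeScript sketch of that structure is shown below; the command names, payload fields, and registry are simplified for illustration and are not the production module.

  // Command Pattern sketch: every viewer action becomes an invokable command.
  interface ViewerCommand<P = Record<string, unknown>> {
    name: string;                  // e.g. 'adjustSliceThickness'
    execute(payload: P): void;     // performs the actual viewer action
  }

  const adjustSliceThickness: ViewerCommand<{ thicknessMm: number }> = {
    name: 'adjustSliceThickness',
    execute({ thicknessMm }) {
      // The real command would call into the viewer's rendering APIs here.
      console.log(`Setting slice thickness to ${thicknessMm} mm`);
    },
  };

  class CommandModule {
    private commands = new Map<string, ViewerCommand<any>>();

    register(command: ViewerCommand<any>): void {
      this.commands.set(command.name, command);
    }

    execute(name: string, payload: unknown): void {
      const command = this.commands.get(name);
      if (!command) throw new Error(`Unknown command: ${name}`);
      command.execute(payload as Record<string, unknown>);
    }

    list(): string[] {
      return [...this.commands.keys()];  // used when registering with the bridge
    }
  }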
In our first phase, we implemented 52 commands — covering nearly all essential viewer functionalities used in daily radiology workflows.
This gave us a wide operational base for the MCP agent to work with, without worrying about unsupported actions.
Next, we integrated the Command Module with the MCP Viewer Bridge.
When the viewer connects, it automatically registers all available commands with the bridge. The bridge, in turn, informs the MCP server that the viewer is ready — and only then do we expose those commands to the language model.
This design ensured synchronization and safety — the LLM could only access tools that were live and connected in a verified viewer session.
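Building on the Command Module sketch above, the registration step itself can be pictured roughly like this; the message name and shape are assumptions.

  // On connect, the viewer advertises its live command set to the bridge.
  // The bridge relays readiness to the MCP server, which only then exposes
  // these commands as tools to the LLM.
  function registerCommands(socket: WebSocket, module: CommandModule): void {
    socket.send(JSON.stringify({
      type: 'register-commands',          // illustrative message name
      commands: module.list(),            // e.g. ['applyMPR', 'enableCrosshair', ...]
    }));
  }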
In short, we built a pipeline that looks like this:
Viewer Commands → MCP Viewer Bridge → MCP Server → LLM Interface
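To make the last hop of that pipeline concrete, here is roughly what exposing one viewer command as an MCP tool could look like, assuming the official TypeScript MCP SDK; the bridge.dispatch helper and the tool definition are illustrative, and the SDK surface may differ between versions.

  import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
  import { z } from 'zod';

  // Stand-in for however the MCP server forwards a command to the connected
  // viewer session over the Viewer Bridge WebSocket.
  declare const bridge: {
    dispatch(command: string, payload: Record<string, unknown>): Promise<void>;
  };

  const server = new McpServer({ name: 'idim-viewer-agent', version: '0.1.0' });

  server.tool(
    'adjustSliceThickness',
    { thicknessMm: z.number().min(0.5).max(20) },   // parameter schema the LLM must satisfy
    async ({ thicknessMm }) => {
      await bridge.dispatch('adjustSliceThickness', { thicknessMm });
      return { content: [{ type: 'text', text: `Slice thickness set to ${thicknessMm} mm` }] };
    },
  );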
Once everything was wired together, we had a system where an LLM could say:
“Enable MPR and adjust slice thickness to 5mm,”
and the viewer would instantly perform the operation — no clicks, no menus, no confusion.
The radiology viewer had finally become MCP-native — capable of understanding intent and executing action.

Bringing It All Together — The First Command Comes Alive

With all the building blocks in place — the MCP server, the Viewer Bridge, and the Command Module inside our OHIF-based radiology viewer — we were ready for the most exciting part: connecting the human to the system.
We quickly added a simple chat window right inside the viewer interface: a minimal panel where the radiologist (or tester, in our case) could type natural-language commands.
Behind the scenes, every chat message was sent to our backend via an API.
The backend stored the conversation history, verified that the viewer was actively connected through the MCP Bridge, and then invoked the LLM.
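In rough terms, the backend side of that flow can be sketched as follows; the endpoint path, the history and bridge services, and the llm helper are all hypothetical stand-ins for our actual implementation.

  import express from 'express';

  // Hypothetical service interfaces, declared only so the sketch type-checks.
  declare const history: {
    append(sessionId: string, role: 'user' | 'assistant', text: string): Promise<void>;
    get(sessionId: string): Promise<{ role: string; text: string }[]>;
  };
  declare const bridge: { isViewerConnected(sessionId: string): boolean };
  declare const llm: { run(messages: { role: string; text: string }[]): Promise<string> };

  const app = express();
  app.use(express.json());

  app.post('/api/chat', async (req, res) => {
    const { sessionId, message } = req.body as { sessionId: string; message: string };

    // 1. Persist the user's message in the conversation history.
    await history.append(sessionId, 'user', message);

    // 2. Only proceed if this session has a live viewer on the MCP Bridge.
    if (!bridge.isViewerConnected(sessionId)) {
      res.status(409).json({ error: 'Viewer is not connected' });
      return;
    }

    // 3. Invoke the LLM with the full history; any tool calls it makes are
    //    routed through the MCP server and on to the viewer.
    const reply = await llm.run(await history.get(sessionId));
    await history.append(sessionId, 'assistant', reply);
    res.json({ reply });
  });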
The LLM interpreted the user’s intent — something like:
“Set the zoom to 150%,”
and translated that into the corresponding MCP command.
The command flowed back through the pipeline:
Chat Input → Backend API → MCP Server → MCP Viewer Bridge → Viewer Command Module
And then, it happened —
the viewer smoothly zoomed in to 150%.
No button clicks.
No hunting for icons.
Just a simple instruction, understood and executed flawlessly.
That first moment was electric — seeing the viewer respond intelligently to natural language felt like crossing a line from interface to interaction.
We weren’t just controlling software anymore; we were conversing with it.
What started as a vision to simplify radiology workflows had now become a working AI-powered assistant inside the viewer — ready to interpret commands, perform operations, and make imaging analysis more intuitive than ever before. You can watch a quick demo below.

What We Learned — and What Comes Next

Looking back, this project started as a simple question:
Can we make the radiology viewer easier to use — so radiologists can focus on diagnosis, not on the interface?
That question took us through an incredible journey — from rethinking our architecture around MCP, to building the Viewer Bridge, designing the Command Module, and finally watching the viewer respond to a natural-language instruction for the very first time.
In the process, we learned a few important lessons:
  • MCP changes how we think about UI: it decouples how something is done from what the user wants done.
  • Bridging intent and execution requires careful design: the Command Pattern and Bridge architecture made that possible.
  • Natural language can be a usable interface: not a gimmick, but a practical productivity tool for complex domains like radiology.
And this is just the beginning.
The next step is to take the assistant beyond tool execution — to true intelligent interpretation.
We’re now exploring more advanced operations such as:
  • “Segment the liver and measure its volume.”
  • “Compare this CT with the previous study and highlight any new lesions.”
  • “Generate a preliminary impression based on current findings.”
By combining intent understanding with vision inputs — allowing the AI to “see” what the radiologist sees — we’re taking the first steps toward creating a truly intelligent radiologist assistant inside Zymez iDIM.
What began as an experiment to make the viewer smarter might soon become a platform that amplifies human expertise, helping radiologists diagnose faster, with greater precision, and less friction.
The future of medical imaging isn’t just AI that interprets scans —
it’s AI that understands radiologists.