When the Avatar Speaks First — STNET × AICLUDE "Catch Sales" Kiosk PoC
When the digital signage on the street starts talking to you
Imagine you are walking past a shop and the character on the screen looks at you and says, "Hi — got a moment to chat?" Most people will assume it is just another advertisement and keep walking. But the Catch Sales kiosk PoC that STNET and AICLUDE ran across Kagawa retail sites from December 2025 through March 2026 tells a more nuanced story.
Only two CLU Agent Station (CAS) kiosks were placed in the field. Each unit combined a camera-based pedestrian sensor with a lip-synced avatar and STTS voice. Over roughly four months, those two kiosks racked up more than 6.4 million pedestrian detections, and when the avatar opened the conversation with a greeting, almost two out of every three people answered. This article revisits that PoC from a product engineering perspective.
What happened when the AI spoke first
Out of more than 6.4 million pedestrian detections, the system analyzed the appearance of 7,341 pedestrians (age range, gender signals, and so on) and used that read to initiate 3,730 voice-outreach sessions. Of those outreach sessions, 66.4% got a reply, and each engaged session averaged about 7.3 messages exchanged, enough for the conversation to go beyond the opening greeting.
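To make the funnel easier to read, the reported figures can be turned into stage-to-stage ratios. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above. The constant and function names are illustrative, the 6.4 million figure is treated as a lower bound, and the replied-session count is an estimate derived from the 66.4% rate rather than a reported number.

```python
# Figures quoted in the article. DETECTIONS is a lower bound ("more than 6.4 million").
DETECTIONS = 6_400_000
ANALYZED = 7_341
OUTREACH = 3_730
REPLY_RATE = 0.664


def funnel():
    """Return stage-to-stage ratios derived from the reported figures."""
    return {
        # Share of detections that went on to appearance analysis (upper bound,
        # because the detection count is itself a lower bound).
        "analyzed_per_detection": ANALYZED / DETECTIONS,
        # Share of analyzed pedestrians who were actually greeted by voice.
        "outreach_per_analyzed": OUTREACH / ANALYZED,
        # Estimated number of outreach sessions that got a reply.
        "replied_sessions": round(OUTREACH * REPLY_RATE),
    }


if __name__ == "__main__":
    for name, value in funnel().items():
        print(f"{name}: {value}")
```

Read this way, the funnel narrows sharply before the greeting: roughly one detection in a thousand reached appearance analysis, about half of the analyzed pedestrians were greeted, and about two thirds of those greeted replied.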
A second number is worth pausing on: 97.3% of all AI conversation sessions were initiated by the AI, not by the customer. That suggests the role AI can play in a physical retail setting is not "the information desk that waits for someone to walk up," but something closer to a salesperson who takes the first step. Pedestrians who would politely wave off a human staff member frequently responded when the screen character opened with a warm greeting.
But between "saying hello" and "having a conversation" there is still a wall
The numbers look good, but the field also revealed clear limits.
A high response rate does not mean every reply turned into a substantive consultation. Pedestrians are usually on their way somewhere. Answering a greeting is easy; stopping on a public sidewalk for an in-depth conversation with a machine is not. Just as important as that psychological hurdle was the simple physical problem of outdoor ambient noise. With street voices, neighboring music, and wind mixing in, even strong speech recognition loses accuracy.
In other words, "capturing attention on the street" and "having a detailed conversation inside the store" are not really the same channel of communication. A single mode of operation cannot serve both well. That discovery pointed directly at the next product decision.
A product pivot — one CAS, two modes
If the data taught us anything, it was that outside-the-store and inside-the-store demand fundamentally different interactions. One hardware shape cannot solve both with a single behavior. So the next release of CLU Agent Station splits its operation into two modes — Smart Signage Mode and AI Concierge Mode — and switches between them based on where the unit is installed and what it is there to do.
Smart Signage Mode lives outside the store. Its job is to capture attention and reinforce brand presence. It does not try to have a detailed conversation. Instead, using the camera's read on the people walking by (age signals, gender mix, dwell time), it chooses and surfaces the visual content most likely to stop that particular passerby. Think of it as a lure whose job is to draw footsteps toward the storefront.
AI Concierge Mode lives inside the store, in a much more acoustically controlled environment. This is where the actual consultation happens. Plans, promotions, and frequently asked questions are pre-ingested into the Knowledge Hub, and the agent returns evidence-grounded answers. When the question gets complex, a staff Takeover hands the conversation off to a human without breaking the thread. The same CAS hardware simply plays a different role depending on where it stands.
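The split between the two modes can be pictured as a small lookup from installation location to behavior. The following is a minimal sketch under stated assumptions: `Mode`, `Location`, `Behavior`, and `behavior_for` are hypothetical names invented for illustration and are not part of any CAS API, and the three behavior flags simply restate the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SMART_SIGNAGE = "smart_signage"
    AI_CONCIERGE = "ai_concierge"


class Location(Enum):
    OUTSIDE_STORE = "outside_store"
    INSIDE_STORE = "inside_store"


@dataclass(frozen=True)
class Behavior:
    """Hypothetical summary of what a unit does in a given mode."""
    mode: Mode
    adapts_visuals_to_audience: bool   # pick content from the camera's read
    holds_detailed_conversation: bool  # answer from pre-ingested knowledge
    staff_takeover_enabled: bool       # hand off to a human mid-thread


def behavior_for(location: Location) -> Behavior:
    """Map where the unit stands to the role it plays (illustrative only)."""
    if location is Location.OUTSIDE_STORE:
        return Behavior(Mode.SMART_SIGNAGE, True, False, False)
    return Behavior(Mode.AI_CONCIERGE, False, True, True)
```

For example, `behavior_for(Location.INSIDE_STORE)` returns a behavior with detailed conversation and staff takeover switched on. A real deployment would presumably also weigh what the unit is there to do, not only where it stands.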
Agentic — the end of outsourced creative
What turns Smart Signage Mode from "smart" into truly Agentic is that the content itself improves inside a closed loop without human authorship in the middle.
The camera aggregates gaze paths and dwell time in real time. The agent uses that to make tactical judgments — "the current hour skews toward older male footfall, so surface the loyalty-plan benefit poster." From there, the platform's Media GPU Worker generates the poster or short-form video directly inside the platform, without going out to an agency. The deployed content is then measured and fed back into the next cycle. The old loop of brief → agency → review → upload collapses into a single platform loop. The cost of creative production trends structurally toward zero, and signage stops being a cost center. It becomes a revenue-generating asset.
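The measure, choose, deploy, and feed-back cycle described above can be sketched as a tiny selection loop. This is a toy stand-in, not the platform's algorithm: it uses a plain greedy rule over running dwell-time averages, and `CreativeLoop`, the segment label, and the creative names are all hypothetical. Content generation itself (the Media GPU Worker step) is outside the sketch.

```python
from collections import defaultdict


class CreativeLoop:
    """Toy choose -> measure -> update loop over a fixed set of creatives."""

    def __init__(self, creatives):
        self.creatives = list(creatives)
        # (segment, creative) -> [total dwell seconds, number of showings]
        self.stats = defaultdict(lambda: [0.0, 0])

    def mean_dwell(self, segment, creative):
        total, shown = self.stats[(segment, creative)]
        # Untried creatives score infinity so each one is shown at least once.
        return total / shown if shown else float("inf")

    def choose(self, segment):
        """Pick the creative with the best average dwell time for a segment."""
        return max(self.creatives, key=lambda c: self.mean_dwell(segment, c))

    def record(self, segment, creative, dwell_seconds):
        """Feed a measured dwell time back into the next decision."""
        entry = self.stats[(segment, creative)]
        entry[0] += dwell_seconds
        entry[1] += 1


if __name__ == "__main__":
    loop = CreativeLoop(["loyalty_poster", "new_plan_video"])
    shown = loop.choose("older_male")
    loop.record("older_male", shown, dwell_seconds=4.0)
    print(loop.choose("older_male"))
```

A purely greedy rule like this stops exploring once every creative has been tried, so a production loop would likely need some continued exploration and a way to retire stale content.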
Knowledge learned in-store, available on the web the same day
The intelligence a visitor meets at a sidewalk screen does not end at the store exit. The conversations, the FAQs, the product knowledge accumulated in-store are bound together as one shared Knowledge, and that same knowledge flows into the company website and SNS chatbots.
From the customer's perspective, the character they spoke with in the store yesterday shows up again tonight on their phone. That means round-the-clock coverage and a consistent brand voice across online and offline channels, without standing up a separate chatbot. CAS is already one endpoint among the 9+ channels of CLU Agent Platform's omnichannel fabric, so this connection happens by default rather than as an extra project.
What this PoC left us with
Four months of data gave a fairly clear hint about where AI can usefully sit in physical retail. "Making the first move" works — the response rate proves it. At the same time, "carrying the conversation through" turns out to be as much a question of space design as of AI quality. Trying to solve the outdoor and indoor situations with a single mode ends up compromising both.
So the practical question for someone running a retail floor is not AI that talks vs. AI that listens. It is which mode, where. The Catch Sales PoC left one concrete, usable answer to that question.