CLU Agent Station
An AI concierge and signage solution for your customers.
Detect passersby with a camera, respond in real-time voice, render lip-synced avatars, and run signage, all on one device.
One web build. Mobile, tablet, and desktop.
The concierge auto-fits every form factor from a single web build. The same URL on mobile, tablet, or desktop returns the same session and the same answer.
An AI concierge that works the moment you plug it in.
Set it up on site and it handles detection, conversation, signage and reporting automatically.
Pedestrian auto-detection
Camera detects approaching visitors in real time and greets them with attribute-aware messages.
Real-time STTS voice
Mic input flows straight through to spoken response without breaks, keeping conversational turns natural.
Browser-native avatar
Runs in any web browser without a dedicated GPU server. Mouth shapes and expressions blend with the voice so the avatar responds like a person.
Signage video playback
Seamless video loops and a news ticker run alongside the concierge on the same screen.
Attention & frontal gaze
Per-content dwell time and frontal-gaze ratios are measured automatically and turned into KPIs.
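As a rough illustration of how per-content dwell time and a frontal-gaze ratio could be derived, the sketch below aggregates gaze observations by content item. The `GazeSample` shape, its field names, and the `summarizeGaze` function are assumptions made for this example, not the product's actual data model or API.

```typescript
// Hypothetical shape of one gaze observation; field names are assumptions.
interface GazeSample {
  contentId: string;  // which signage item was on screen
  durationMs: number; // how long this observation lasted
  frontal: boolean;   // true if the viewer was facing the screen
}

interface ContentKpi {
  dwellMs: number;      // total observed time for this content
  frontalRatio: number; // share of that time spent facing the screen (0 to 1)
}

// Sum dwell time per content item and compute the frontal-gaze share of it.
function summarizeGaze(samples: GazeSample[]): Map<string, ContentKpi> {
  const totals = new Map<string, { dwellMs: number; frontalMs: number }>();
  for (const s of samples) {
    const t = totals.get(s.contentId) || { dwellMs: 0, frontalMs: 0 };
    t.dwellMs += s.durationMs;
    if (s.frontal) {
      t.frontalMs += s.durationMs;
    }
    totals.set(s.contentId, t);
  }

  const kpis = new Map<string, ContentKpi>();
  totals.forEach((t, id) => {
    kpis.set(id, {
      dwellMs: t.dwellMs,
      // Guard against division by zero for zero-length observations.
      frontalRatio: t.dwellMs > 0 ? t.frontalMs / t.dwellMs : 0,
    });
  });
  return kpis;
}
```

Weighting the ratio by duration rather than by sample count keeps a few long glances from being outvoted by many short ones; a real pipeline may choose differently.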
Weekly PDF reports
Footfall, conversion and top content are published as a PDF every week, with no manual work.
Two tracks, one device, running in parallel.
The concierge loop and the signage + analytics track operate independently on the same screen.
When the network drops, the IndexedDB cache keeps the signage loop and greeting clips alive.
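One common way to get this behavior is a network-first load with a cache fallback. The sketch below shows that pattern only; `ClipStore`, `MemoryClipStore`, `Fetcher` and `loadClip` are hypothetical names, clip payloads are simplified to strings, and the in-memory store stands in for an IndexedDB-backed one so the example runs anywhere.

```typescript
// Minimal async key-value interface. In a browser this would be backed by
// IndexedDB; the in-memory version below only keeps the sketch runnable.
interface ClipStore {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
}

class MemoryClipStore implements ClipStore {
  private items = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.items.get(key);
  }

  async put(key: string, value: string): Promise<void> {
    this.items.set(key, value);
  }
}

// Hypothetical fetcher: resolves with the clip payload, rejects when offline.
type Fetcher = (url: string) => Promise<string>;

// Try the network first and refresh the cache; fall back to the cached copy
// when the request fails. Rethrows only if nothing was ever cached.
async function loadClip(
  url: string,
  fetchClip: Fetcher,
  store: ClipStore
): Promise<string> {
  try {
    const fresh = await fetchClip(url);
    await store.put(url, fresh);
    return fresh;
  } catch (err) {
    const cached = await store.get(url);
    if (cached === undefined) {
      throw err;
    }
    return cached;
  }
}
```

With this shape, a clip that has been played at least once while online stays available after the connection drops.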
Self-hosted Docker stack
One station = one container set. Drops onto an in-store PC or edge box without changes.
- embedding: 3.2 GB
- detect: 250 MB
- stt (avatar): 1.1 GB
- tts (avatar): 1 GB
- liveportrait (avatar): 800 MB
- Commercial LLM API
- Self-hosted LLM
- Local Inference
| Tier | vCPU | Memory | Disk | Notes |
|---|---|---|---|---|
| Minimal (cas-chat) | 2 vCPU | 4 GB | 30 GB SSD | Text only. Excludes STT, TTS and avatar daemons. |
| Standard (cas-avatar), recommended | 4 vCPU | 16 GB | 50 GB SSD | Voice + avatar full set. 5 to 10 devices. |
| Large (multi-device) | 8 vCPU | 32 GB | 100 GB SSD | 20 to 50 devices, ~100 concurrent conversations. |
- One station can drive any number of displays (signage, tablet, web, mobile) with no extra fee.
- The offline cache keeps the core experience alive through a license grace period of up to 72 hours.
- Real-time STTS runs without a GPU on the standard tier.
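The 72-hour grace window mentioned above can be pictured as a simple time comparison. This is a minimal sketch that assumes the window is counted from the last successful license check; that starting point and the name `isWithinLicenseGrace` are assumptions for illustration, since the page does not say how the grace period is measured.

```typescript
// 72 hours expressed in milliseconds (the figure stated in the list above).
const LICENSE_GRACE_MS = 72 * 60 * 60 * 1000;

// Returns true while the station may keep running offline.
// Assumption: the grace window starts at the last successful license check.
function isWithinLicenseGrace(lastValidatedAtMs: number, nowMs: number): boolean {
  const offlineMs = nowMs - lastValidatedAtMs;
  return offlineMs <= LICENSE_GRACE_MS;
}
```

A station that last validated at noon on Monday would, under this assumption, keep serving cached content until noon on Thursday.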
Wherever your front line is.
Already proven at a Shikoku retail store: 6.4M passersby, 3,709 conversations, 66.4% engagement.
Retail concierge
Recognize visitors and present events, stock and discounts in their language.
Airports & hotels
Departure gates, check-in counters, shuttle times: voiced and shown on signage at once.
Museum docent
Per-exhibit docent modes, multilingual narration, automatic kid / tourist mode switching.
Hospitals & public reception
Reception, floor guidance and queue updates over voice and signage cut staff load.
One station = concierge + signage + analytics.
Start with one AI concierge.
Pricing is per station: every additional display you bring up runs on the same station at no extra cost.