We Connected Discord to a Mobile AI Agent. Here's What Actually Happened.


OpenClaw is an open-source AI agent you self-host and control through your messaging app. We connected it to Drizz.dev, a Vision AI engine that executes real tasks on real mobile apps. No scripts. No selectors. No manual configuration.
Author: Jai Aggarwal and Yash Varyani
Posted on: March 17, 2026
Read time: 5 minutes

What the Demo Actually Shows

Someone types in Discord: "@Openclaw search for a place to stay in San Francisco on Airbnb and find sunscreens on Amazon"

That's the entire input. One sentence in a chat window.

What follows is a real sequence of actions on a real Android device. Airbnb opens. The search field fills. Location suggestions appear and get resolved. The date gets set. A filtered results screen loads.

Nobody wrote a script for this. Nobody mapped the Airbnb UI. Nobody pre-configured a single selector. The system read the screen, understood what it was looking at, and navigated it start to finish, including autocomplete dropdowns and permission dialogs that appeared mid-flow.

That's the demo. Now here's how it works.

What Is OpenClaw?

OpenClaw is a free, open-source AI agent that you self-host on your own machine or a VPS. Once running, it connects to whatever messaging platform you already use (Discord, Telegram, WhatsApp, Slack) and gives you an AI agent that can take real actions, not just answer questions.

You install it via npm ➡️ pair it with a Discord bot token from the Discord Developer Portal ➡️ connect an LLM API key (Claude, GPT, or a local model via Ollama), and your agent is live. From that point, anything you type in your Discord channel goes to OpenClaw, which understands it and decides what to do.

Out of the box, OpenClaw can browse the web, manage files, run terminal commands, and keep persistent memory of everything you tell it. What we did was extend it further, connecting it to Drizz.dev so it could also execute tasks directly inside mobile apps.

Get started: https://github.com/openclaw/openclaw | openclaw.ai

What Is Drizz.dev?

Drizz.dev is a Vision AI execution platform built specifically for mobile. You give it an instruction, and it operates a real mobile device to complete it, reading the screen with computer vision, identifying UI elements, and navigating the app in real time.

It doesn't use selectors, XPath, or any pre-mapped UI structure. It sees the screen the way a human does and makes decisions accordingly. That means it works on any app, any flow, without needing to be pre-configured for it.

Developers can sign up directly at drizz.dev and connect their own pipelines to it. What OpenClaw provided in this demo was the front end: the natural language layer that took a Discord message and turned it into a structured instruction Drizz.dev could act on.

Get started: drizz.dev

How the Two Connect: The Full Pipeline

Step 1: The user asks OpenClaw via Discord to look up stays in San Francisco and travel essentials on Amazon.

Step 2: OpenClaw receives the message, parses the intent, and converts it into a structured instruction Drizz.dev can understand.

Step 3: Drizz.dev receives the instruction and executes it on a real Android device, opening Airbnb, filling fields, resolving suggestions, setting dates, all in real time.

Step 4: The task is done. No human touched the phone. No script was written. The loop closed.

That's the full pipeline. Discord → OpenClaw → Drizz.dev → mobile device.
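
To make the hand-offs concrete, here is a minimal sketch of that flow in Python. It is not OpenClaw's or Drizz.dev's actual code or API: `Instruction`, `Result`, `handle_message`, and both stub stages are hypothetical names invented for illustration. The only thing it shows is the shape of the pipeline, where a chat message addressed to the agent is parsed into instructions and each instruction is passed to an executor.

```python
from dataclasses import dataclass
from typing import Callable, List

BOT_MENTION = "@Openclaw"  # mention text as typed in the demo message


@dataclass
class Instruction:
    """Hypothetical structured instruction handed to the executor."""
    goal: str


@dataclass
class Result:
    """Hypothetical outcome reported back for one instruction."""
    goal: str
    ok: bool


def handle_message(
    text: str,
    parse: Callable[[str], List[Instruction]],
    execute: Callable[[Instruction], Result],
) -> List[Result]:
    """Run one chat message through the parse -> execute stages in order."""
    if not text.startswith(BOT_MENTION):
        return []  # message was not addressed to the agent
    request = text[len(BOT_MENTION):].strip()
    return [execute(instruction) for instruction in parse(request)]


# Stand-ins for the real stages (an LLM-backed parser, a device executor).
def stub_parse(request: str) -> List[Instruction]:
    return [Instruction(goal=request)]


def stub_execute(instruction: Instruction) -> Result:
    return Result(goal=instruction.goal, ok=True)


results = handle_message(
    "@Openclaw search for a place to stay in San Francisco on Airbnb",
    stub_parse,
    stub_execute,
)
print(results)
```

Passing the parser and executor in as functions keeps the stages swappable, which mirrors how the post describes OpenClaw as the front end and Drizz.dev as the execution layer.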

Why Mobile Is the Hard Part

Most automation targets web or API surfaces, environments designed to be controlled programmatically with stable, documented interfaces.

Mobile apps were built for human hands. The screen changes constantly based on state, OS interrupts, and app updates. A button that was here in the last build might be behind a dialog in this one. Traditional mobile automation works around this by having developers manually map every element and write selectors, which break the moment the app updates.

Drizz.dev reasons about the screen from scratch every time. That's what made the Airbnb demo possible without any Airbnb-specific setup, and it's what makes this approach generalize to any app without starting over.
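
One way to picture "reasoning about the screen from scratch every time" is an observe, decide, act loop. The sketch below is a toy under heavy assumptions: the "screen" is a plain list of visible labels standing in for a screenshot read by a vision model, and `FakeDevice`, `decide`, and `run` are hypothetical names, not part of Drizz.dev. It does show why a dialog that appears mid-flow need not break anything: each step looks only at what is currently visible.

```python
from typing import Dict, List

# A "screen" is just the labels currently visible. This stands in for a
# screenshot interpreted by a vision model.
Screen = List[str]


class FakeDevice:
    """Tiny simulated phone: tapping a visible label moves to another screen."""

    def __init__(self, screens: Dict[str, Screen],
                 transitions: Dict[str, str], start: str) -> None:
        self.screens = screens
        self.transitions = transitions  # tapped label -> next screen name
        self.current = start

    def observe(self) -> Screen:
        return self.screens[self.current]

    def tap(self, label: str) -> None:
        if label not in self.observe():
            raise ValueError(f"{label!r} is not on screen")
        self.current = self.transitions[label]


def decide(screen: Screen, goal_label: str) -> str:
    """Pick the next tap using only what is visible right now."""
    if "Allow" in screen:      # dismiss an interrupting permission dialog
        return "Allow"
    if goal_label in screen:   # the thing we want is on screen
        return goal_label
    return screen[0]           # otherwise take the first visible option


def run(device: FakeDevice, goal_label: str, done_screen: str,
        max_steps: int = 10) -> List[str]:
    """Observe, decide, act; stop at the target screen or the step limit."""
    taps: List[str] = []
    for _ in range(max_steps):
        if device.current == done_screen:
            break
        label = decide(device.observe(), goal_label)
        device.tap(label)
        taps.append(label)
    return taps


screens = {
    "home": ["Search"],
    "dialog": ["Allow", "Deny"],  # permission prompt that interrupts the flow
    "search": ["San Francisco", "San Jose"],
    "results": ["Listings"],
}
transitions = {
    "Search": "dialog",
    "Allow": "search",
    "Deny": "home",
    "San Francisco": "results",
    "San Jose": "results",
}
device = FakeDevice(screens, transitions, start="home")
taps = run(device, goal_label="San Francisco", done_screen="results")
print(taps)
```

Nothing in `decide` refers to a specific app or a fixed element ID, so the same loop would run unchanged against a different set of screens. A real system would replace the label list with a vision model's reading of the screenshot.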

Real World Use Case: One Message, Two Actions

Here is what this pipeline looks like in a real scenario a developer or user would actually care about.

You are traveling to San Francisco. You open Discord and type: "Find me a place to stay in San Francisco on Airbnb and look up travel essentials on Amazon."

That is one message. What the agent does with it is two separate tasks across two separate apps, back to back, without any input from you in between.

OpenClaw parses the full instruction and breaks it into two actions. Drizz.dev opens Airbnb, enters San Francisco as the location, and lands on filtered results. Then, without stopping, it opens Amazon, searches for travel essentials, and returns the results page: your travel essentials sorted before you've even packed.
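
As a rough illustration of the "one message, two actions" split, here is a keyword-based sketch in Python. This is an assumption-laden stand-in: in the pipeline described here the decomposition is done by the agent's LLM, not by string matching, and `MobileTask`, `KNOWN_APPS`, and `split_into_tasks` are hypothetical names introduced only for this example.

```python
import re
from dataclasses import dataclass
from typing import List

KNOWN_APPS = ("Airbnb", "Amazon")  # apps named in the example message


@dataclass
class MobileTask:
    """Hypothetical per-app task produced from one chat message."""
    app: str
    goal: str


def split_into_tasks(message: str) -> List[MobileTask]:
    """Split a message on 'and', keeping clauses that name a known app.

    Clauses that mention no known app are dropped, and order is preserved.
    """
    tasks: List[MobileTask] = []
    for clause in re.split(r"\s+and\s+", message.strip().rstrip(".")):
        for app in KNOWN_APPS:
            if re.search(rf"\b{app}\b", clause, flags=re.IGNORECASE):
                tasks.append(MobileTask(app=app, goal=clause))
                break
    return tasks


tasks = split_into_tasks(
    "Find me a place to stay in San Francisco on Airbnb "
    "and look up travel essentials on Amazon."
)
for task in tasks:
    print(task.app, "->", task.goal)
```

The heuristic is deliberately naive: a request like "find a bed and breakfast on Airbnb" would be split in the wrong place, which is the kind of case where a language model does better than a regular expression.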

One More Thing: This Was All Run From a Phone

Here's a detail that didn't make it into the demo video but matters to anyone thinking about what this stack actually requires to run.

The entire setup (terminal, pipeline, remote machine connection) was operated from a phone. Not a laptop. Not a desktop. A phone, connected to a remote machine, running everything.

That's not a flex. It's a signal. It means this pipeline is lightweight enough to run without a traditional dev environment. The people who built this demo lived the same constraint they were solving for: mobile first, not mobile as an afterthought.

If the pipeline can be built and run from a phone, it can be deployed anywhere.

