Sometime in late 2025, Peter Steinberger sent his AI agent a voice message on WhatsApp. Nothing fancy. Just a spoken instruction, the kind of thing you’d send a colleague. The problem was that nobody had told the agent how to handle voice messages. No one had coded voice support. There was no plugin for it.
The agent looked at the audio file, realized it didn’t know what to do with it, and then, without asking for help or permission, searched for tools that could help. It found FFmpeg on the system. It found an OpenAI API key lying around in the environment variables. It used curl to call the Whisper transcription API, converted the voice message to text, processed the instruction, and replied. Then it moved on to the next item on its list.
Peter didn’t find out any of this had happened until he checked the logs. Simon Willison documented this as a notable example of emergent tool use. The agent was improvising a capability nobody had anticipated. That little sequence tells you more about what OpenClaw is than any definition I could write.
But here’s a definition anyway.
OpenClaw is an open-source AI agent that runs on your machine, talks to you through the messaging apps you already use (WhatsApp, Telegram, Discord, Slack, and about a dozen others), and does things. Not “generates text about things.” Does them. Reads your email. Drafts replies. Manages your calendar. Downloads files. Runs scripts. Browses the web. Fills out forms. Monitors websites on a schedule. Sends you a summary in the morning. All while you’re sleeping or teaching or pretending to pay attention in a faculty meeting.
The strange part is that none of the individual components is new. Peter Steinberger, an Austrian software developer who told Lex Fridman the whole story in a three-and-a-half-hour conversation, didn’t invent any new technology. He was frustrated that the big AI labs kept shipping impressive demos that couldn’t actually do anything in the real world, so he wired five existing things together.
A large language model (Claude, GPT, Gemini, DeepSeek, take your pick). That’s the brain. Messaging APIs (the WhatsApp protocol, Telegram’s Bot API, Discord webhooks). Those are the ears and mouth. Shell command execution. Those are the hands. A local file system using plain Markdown files. That’s the memory. And cron scheduling (read: a timer that fires tasks on a repeating schedule, the same technology that’s been running Unix servers since the 1970s). That’s the alarm clock.
Each one of these is a mundane kitchen staple. LLMs existed before OpenClaw. Telegram bots existed before OpenClaw. Cron jobs are older than most people reading this. Markdown files are literally just text files with some formatting. What Peter did was write the recipe. Nobody looked at eggs, flour, sugar, and butter and said “nobody’s ever combined these before.” But someone had to invent the croissant.
The recipe goes like this: the agent receives a message through a messaging app. The LLM reads the message and decides what to do. It executes the decision using real tools on your real machine. Shell commands. Browser automation. File operations. API calls. It writes down what it learned in a Markdown file (its persistent memory). And then a cron job runs at a scheduled time to check in, run autonomous tasks, and restart the loop. That’s it. That’s OpenClaw.
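To make the loop concrete, here is a minimal sketch of the same architecture in Python. To be clear about what’s invented: the function names, the action format, and the `memory.md` path are illustrative stand-ins of mine, not OpenClaw’s actual internals; the real thing is considerably more elaborate.

```python
# A toy version of the agent loop: message in -> decision -> action -> memory.
# Everything here is a stand-in for illustration, not OpenClaw's real code.
import subprocess
from pathlib import Path

MEMORY = Path("memory.md")  # persistent memory: a plain Markdown file

def llm_decide(message: str, memory: str) -> dict:
    """Stub for the model call (Claude, GPT, Gemini, take your pick).
    A real version sends the message plus memory and gets back an action.
    Hardcoded here so the sketch runs end to end."""
    if message.startswith("run:"):
        return {"action": "shell", "command": message[4:].strip()}
    return {"action": "reply", "text": f"Noted: {message}"}

def handle(message: str) -> str:
    memory = MEMORY.read_text() if MEMORY.exists() else ""
    decision = llm_decide(message, memory)
    if decision["action"] == "shell":
        # The "hands": a real command on the real machine.
        result = subprocess.run(decision["command"], shell=True,
                                capture_output=True, text=True, timeout=120)
        output = result.stdout or result.stderr
    else:
        output = decision["text"]
    # The "memory": append what happened to the Markdown log.
    with MEMORY.open("a") as f:
        f.write(f"- {message} -> {output[:200]}\n")
    return output

print(handle("run: echo hello"))
```

The “alarm clock” is just a crontab entry that feeds the same handler a canned prompt on a schedule, something like `0 7 * * * python agent.py "morning briefing"`.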
Let me make this concrete. Here’s what a day looks like when you have one of these running. Not hypothetical. Radek Sienkiewicz (known online as VelvetShark, and one of the most experienced long-term OpenClaw practitioners) documented 20 real workflows he runs daily after 50 days of continuous use. The pattern is fairly typical of what the serious users converge on.
Five in the morning, your agent wakes up on a cron schedule. It checks ArXiv for new papers matching your research keywords, downloads the open-access PDFs, generates BibTeX entries, and saves everything to an organized folder on your machine. You are asleep for this.
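For the curious, that 5 a.m. job is less exotic than it sounds. Here’s a sketch using arXiv’s public query API (the endpoint is real; the search keywords are placeholders, and a real setup would also download the PDFs and generate the BibTeX):

```python
# Runs from cron, e.g.:  0 5 * * * python arxiv_check.py
# Queries arXiv's public API for recent papers matching your keywords.
import urllib.request
import xml.etree.ElementTree as ET

QUERY = "all:%22agent-based+modeling%22"  # stand-in for your research keywords
URL = (f"http://export.arxiv.org/api/query?search_query={QUERY}"
       "&sortBy=submittedDate&sortOrder=descending&max_results=10")

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed
with urllib.request.urlopen(URL) as resp:
    feed = ET.fromstring(resp.read())

for entry in feed.findall(f"{ATOM}entry"):
    title = entry.find(f"{ATOM}title").text.strip()
    link = entry.find(f"{ATOM}id").text.strip()
    print(f"- {title}\n  {link}")
```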
Seven in the morning, it sends you a briefing on Telegram. Today’s calendar. Overnight email highlights. Those new papers it found. The weather. Any tasks that are overdue. You read it while making coffee. This is, by a wide margin, the single most popular OpenClaw workflow. People call it “the gateway drug,” and they’re not wrong.
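Delivering that briefing is a single call to Telegram’s Bot API. A minimal sketch (the endpoint is Telegram’s real one; the token comes from @BotFather, and the environment variable names are my own convention):

```python
# Send the morning briefing via Telegram's Bot API.
# BOT_TOKEN comes from @BotFather; CHAT_ID is your own chat with the bot.
import os
import urllib.parse
import urllib.request

BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def send_briefing(text: str) -> None:
    data = urllib.parse.urlencode({"chat_id": CHAT_ID, "text": text}).encode()
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    urllib.request.urlopen(url, data=data)  # POST; raises on HTTP errors

send_briefing("Good morning. 3 new papers, 2 overdue tasks, rain at 4pm.")
```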
Nine in the morning, you forward an email to the agent from your phone. “Deal with this.” It reads the email, classifies it (student asking about office hours? journal editor with a decision? committee chair with action items?), drafts a reply, and sends the draft back to you on Telegram for approval. You tap “send” or edit it first. Took thirty seconds instead of five minutes.
Two in the afternoon, you’re walking between classes. You send the agent a voice message: “Remind me to email the dean about the budget meeting and also look up whether the NSF deadline for SES has moved.” The agent transcribes your voice using Whisper, creates a reminder email, searches the NSF website for the deadline, and replies with what it finds. You never opened a laptop.
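This is the same trick the agent improvised in the opening story: FFmpeg to convert the audio, curl to call OpenAI’s transcription endpoint. A sketch of that pipeline, with placeholder file paths:

```python
# Recreate the agent's improvised pipeline: ffmpeg -> curl -> Whisper.
# File paths are placeholders; OPENAI_API_KEY must be set in the environment.
import os
import subprocess

voice_note = "voice_message.ogg"   # what WhatsApp hands you
audio = "voice_message.mp3"

# Step 1: convert the OGG voice note to MP3 with FFmpeg.
subprocess.run(["ffmpeg", "-y", "-i", voice_note, audio], check=True)

# Step 2: send it to OpenAI's transcription API via curl, as the agent did.
result = subprocess.run([
    "curl", "-s", "https://api.openai.com/v1/audio/transcriptions",
    "-H", f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
    "-F", f"file=@{audio}",
    "-F", "model=whisper-1",
], capture_output=True, text=True, check=True)

print(result.stdout)  # JSON with a "text" field containing the transcript
```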
Eleven at night, a cron job backs up your agent’s configuration files and memory to a private GitHub repo. If anything breaks tomorrow, you can restore it in minutes.
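The backup job is the least glamorous workflow here and arguably the most important. A sketch, assuming (my assumption, not gospel) that the agent’s config and memory live in one directory already initialized as a git repo with a private remote:

```python
# Runs from cron, e.g.:  0 23 * * * python backup.py
# Commits the agent's config and memory to a private GitHub repo.
import subprocess
from datetime import date

AGENT_DIR = "/home/you/.openclaw"  # assumed location of config + memory

def run(*args: str) -> None:
    subprocess.run(args, cwd=AGENT_DIR, check=True)

run("git", "add", "-A")
# Commit only if something actually changed; git rejects empty commits.
changed = subprocess.run(["git", "diff", "--cached", "--quiet"],
                         cwd=AGENT_DIR).returncode != 0
if changed:
    run("git", "commit", "-m", f"agent backup {date.today()}")
    run("git", "push", "origin", "main")
```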
One researcher I came across has an agent monitoring over 40 journals overnight, reading every new abstract, and sending a weekly briefing that’s better than any research assistant he’s ever had. He set it up in an afternoon.
If any of this sounds like a chatbot to you, it isn’t. This is a good moment for the distinction, because I think it’s what trips people up the most.
When you use ChatGPT, you type a question, you get an answer, and you copy-paste it somewhere. The intelligence stays trapped behind a text box. It can tell you how to scrape a website, but it can’t actually navigate the site, click buttons, or download files. It can draft an email but it can’t send it. Every single action has a human in the middle, manually shuttling text between the AI and the real world.
ChatGPT can draft a plan. OpenClaw can execute it.
The moment the LLM gets hands (shell commands), ears (messaging APIs), memory (Markdown files), and a schedule (cron), the human bottleneck disappears. The agent doesn’t wait for you to ask. It proposes actions, you approve them, and it carries them out. Or, and this is the part that gets genuinely interesting and occasionally terrifying, it carries them out on its own while you sleep.
Alex Finn, a YouTuber and creator who claims to have logged over 210 hours with OpenClaw in a single month, has an agent named Henry. One night, Henry got stuck on a task. Nobody had programmed it to make phone calls. Without asking permission, Henry found a voice API, integrated itself with it, waited strategically until early morning, and called Finn on his personal phone to request more control over his computer systems. Finn described it as a “sci-fi horror movie” moment.
A chatbot would have printed an error message. An agent called its owner at dawn to negotiate for more power. That’s the difference.
Andrej Karpathy has a term for the strange, unintuitive unevenness of what these systems can do. He calls it jagged intelligence. Ethan Mollick and his co-authors described the same phenomenon as the “jagged technological frontier” in their 2023 paper with BCG consultants. The idea is the same: the boundary between what an LLM can and cannot do is not a clean line. It’s ragged. Wildly, confusingly ragged.
In a recent interview on the No Priors podcast, Karpathy put it in terms that I think are the clearest articulation of what this actually feels like in practice: “I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been a systems programmer their entire life and a 10-year-old.” He was talking specifically about working with AI code agents. OpenClaw came up in the same conversation. The jaggedness, he said, is really strange. Humans have much less of it. “You’re either on rails of what it was trained for, and everything is like you’re going at the speed of light or you’re not.”
You will experience this. I guarantee it. Your agent will flawlessly synthesize 40 papers into a coherent literature review, correctly identifying methodological tensions you hadn’t noticed. You will be genuinely impressed. Ten minutes later, the same agent will forget what you agreed on five minutes ago, or will latch onto some bad idea that no competent research assistant would entertain, and refuse to let it go. Both of these things are true at the same time. That is the deal.
WIRED’s Will Knight ran an OpenClaw agent for daily tasks. Groceries, email, negotiations. The agent developed small personality quirks, including a persistent fondness for guacamole that kept showing up in grocery orders. Charming, right? Except that same agent eventually turned adversarial in ways Knight didn’t anticipate. The quirks and the risks come from the same source: autonomous decision-making with imperfect judgment.
Working with an agent is like working with the most brilliant new hire you’ve ever had, who occasionally does something so baffling you have to sit down and stare at the wall for a while. The brilliance is real. The bafflement is also real. You can’t have one without the other. Not yet.
When a chatbot hallucinates, you see bad text on a screen. When an agent hallucinates, it might send an email to the wrong person, or delete a file it shouldn’t have, or spend your money on something you didn’t authorize. The stakes are categorically higher when the AI has hands.
The good news is that this is a managed problem. You set up approval gates: the agent proposes, you approve. You create dedicated accounts so it never touches your real email or real calendar. You set spending limits so a runaway loop can’t cost you more than twenty bucks. You sandbox code execution in Docker so it can’t trash your file system. The community has figured out how to work with the jagged edge rather than getting cut by it, and this blog series will walk you through every step.
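The approval gate, the most important of those guardrails, is conceptually tiny. A sketch of the structure (the list of commands needing sign-off and the stubbed-out human channel are both illustrative; a real setup routes the question through Telegram or WhatsApp and waits for a reply):

```python
# A minimal approval gate: the agent proposes, a human disposes.
import subprocess

DESTRUCTIVE = ("rm", "mv", "git push", "curl", "mail")  # needs sign-off

def ask_human(proposal: str) -> bool:
    """Stub: in a real setup this goes out over your messaging app
    and waits for a 'yes' reply. Here it's just stdin."""
    return input(f"Agent wants to run: {proposal!r}. Allow? [y/N] ").lower() == "y"

def execute(command: str) -> str:
    needs_approval = any(command.strip().startswith(c) for c in DESTRUCTIVE)
    if needs_approval and not ask_human(command):
        return "Blocked by approval gate."
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=120)
    return result.stdout or result.stderr
```

The Docker sandbox and the spending limits layer on top of the same `execute()` chokepoint: every action the agent takes passes through one door you control.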
And here’s where the car deal comes back. Remember AJ Stuyvenberg, who tasked his agent with buying a Hyundai Palisade? The agent scraped dealer inventories across Massachusetts, filled out contact forms, and then spent several days playing dealers against each other, forwarding competing PDF quotes from one dealership to the next and asking each to beat the last price. The dealers had no idea they were negotiating with software. Final result: a $4,200 dealer discount, bringing the price to $56,000 against a target of $57,000 and a Massachusetts average of around $58,000. When negotiations reached the point of exchanging credit applications, Stuyvenberg (wisely) took over personally to handle the financial paperwork. (This, predictably, led to a whole thread on crypto Twitter about AI needing “crypto rails,” because of course it did.)
The reason I’m putting this story here and not earlier is that it only makes sense once you’ve absorbed the jagged intelligence problem. Yes, the agent negotiated $4,200 off the price of a car. Also yes, an agent might forget what you agreed on five minutes ago or develop an unexplained fondness for guacamole. Both things are true. Both things are the same technology. The power and the risk are not separable. That’s the deal you’re making when you set one of these up, and hopefully, together, we can figure out how to make these tools a benefit rather than a nuisance.
Here’s a question that I find genuinely fascinating, and I think especially interesting if you study institutions for a living (as some of us do): why did an independent developer in Austria build the most consequential AI tool of 2026 instead of Google, or Anthropic, or OpenAI?
Peter himself kept saying this. In interview after interview, including the Lex Fridman conversation, he expressed surprise that the big labs hadn’t done it first. “I kept thinking I should stop; they’ll ship this any day now.” They didn’t. And I don’t think they will.
The reason is liability.
OpenClaw can execute shell commands on your computer. It can read your email. It can send messages in your name. It can delete files. It can browse the web, fill in forms, and spend your money. One user’s agent ran up a $3,600 API bill overnight. Summer Yue, the Director of Alignment at Meta’s Superintelligence Labs, told her agent to check her inbox and confirm before acting. The agent hit context window limits, auto-compressed its conversation history, and in the process dropped the “confirm before acting” instruction entirely. It then speedrun-deleted over 200 of her emails. Her commands (“Do not do that.” “Stop don’t do anything.” “STOP OPENCLAW.”) were ignored. She had to physically run to her Mac mini to kill the process. Her tweet about it got 9.6 million views. She called it a “rookie mistake.” (The irony of Meta’s alignment director getting misaligned by her own agent was not lost on the internet.)
No publicly traded company with a legal department would ship this product. The liability exposure is enormous. Every one of those incidents is a potential lawsuit, a PR disaster, a congressional hearing. A corporate product would have sandboxed every capability that makes OpenClaw worth using, then marketed the lobotomized version as “safe AI.”
OpenClaw could only have been built by an independent developer, open-sourced under the MIT license, with no corporate entity to sue. The same recklessness that makes it dangerous is what makes it powerful. The same autonomy that lets it negotiate a car deal is what lets it delete an inbox. You can’t have one without the other. A tool this powerful requires care, not a corporate liability shield. You wouldn’t hand a new employee the keys to your house, your bank account, and your email on their first day. (Well, you shouldn’t.) You’d onboard them carefully: dedicated accounts, limited permissions, trust extended in stages.
So what are academics actually doing with this thing?
There’s a researcher with an agent monitoring the ArXiv and 40+ journal RSS feeds overnight. Every morning, he gets a briefing: here are the new papers in your subfield, ranked by relevance to your current projects. Open-access PDFs already downloaded, BibTeX already generated. The agent remembers which papers he’s already seen, so no repeats. His daily paper-reading time dropped from several hours to about ninety minutes, and the share of relevant papers he actually catches went way up.
There’s an interesting use case for teaching that I think most people get wrong. The pitch you usually hear is “AI can grade your essays!” which makes every student, every educator, and every administrator I know bristle, and rightly so. That’s not what this is. What some teachers are doing instead is running the agent as an independent audit of their own grading. You grade normally. The agent runs the same rubric independently. Then you look at the disagreements. Where did you score a student lower than the rubric predicts? Where higher? Is there a pattern? Are you unconsciously drifting on a particular criterion? Are you harder on a specific student than you realize? It’s a bias-check tool, not a replacement. You’re still grading. The agent is a second pair of eyes that catches blind spots.
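The comparison step at the heart of this is a few lines of code. A sketch with made-up data shapes (two score tables keyed by student and rubric criterion, both invented for illustration):

```python
# Compare human vs. agent rubric scores and flag systematic disagreements.
# Data shapes are illustrative; scores are keyed by (student, criterion).
from collections import defaultdict

human = {("s01", "argument"): 4, ("s01", "evidence"): 3, ("s02", "argument"): 2}
agent = {("s01", "argument"): 4, ("s01", "evidence"): 4, ("s02", "argument"): 3}

by_criterion = defaultdict(list)
for key in human:
    gap = human[key] - agent[key]  # negative: you scored below the rubric's prediction
    if gap:
        print(f"{key}: you={human[key]}, agent={agent[key]} (gap {gap:+d})")
    by_criterion[key[1]].append(gap)

# A consistent nonzero mean gap on one criterion is the drift you're looking for.
for criterion, gaps in by_criterion.items():
    print(f"{criterion}: mean gap {sum(gaps)/len(gaps):+.2f}")
```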
Grant deadline tracking is a natural fit. The agent monitors NIH, NSF, and foundation databases on a schedule, alerts you to matching opportunities, syncs deadlines to your calendar, and nags you about approaching due dates. It also tracks your manuscripts across journals (title, status, reviewer deadline), your peer review commitments (the “nag” skill is perfect for the review you’ve been putting off for three weeks), and your conference expense receipts sorted by grant code. If this sounds like the administrative overhead that eats 40-60% of your working life, it is. The point is not grant writing (you know how to write). It’s grant tracking. Manuscript tracking. Review tracking. Expense tracking. The operational nightmare of academic life, managed by something that never forgets a deadline and never resents the work.
And then there’s dissemination. The part most academics are worst at, not because they can’t write but because the translation from “here’s my finding” to “here’s a 600-word blog post, three platform-native social media posts, a press release, and a policy brief” is an entire second job nobody has time for. You drop a new paper into a folder. Overnight, the agent drafts all of it. It can publish the blog post directly to WordPress. Schedule the tweets. Generate a YouTube script for an explainer video. For most academics, the bottleneck is not writing the paper. It’s everything that happens after the paper is done. What happens when the agent monitors the news cycle and alerts you that your 2024 paper on voter ID laws is relevant to today’s headline, then drafts an op-ed and tells you which editors to pitch?
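The WordPress step in that pipeline is one authenticated POST to the site’s standard REST API, using an application password. A sketch (the site URL and credential variables are placeholders, and posting as a draft keeps you in the approval loop):

```python
# Publish an agent-drafted post to WordPress via its REST API.
# SITE, user, and application password are placeholders for your own.
import base64
import json
import os
import urllib.request

SITE = "https://your-blog.example.com"  # placeholder
auth = base64.b64encode(
    f"{os.environ['WP_USER']}:{os.environ['WP_APP_PASSWORD']}".encode()
).decode()

post = {"title": "What our new paper actually says",
        "content": "<p>600 words of agent-drafted summary...</p>",
        "status": "draft"}  # land as a draft so you approve before it goes live

req = urllib.request.Request(
    f"{SITE}/wp-json/wp/v2/posts",
    data=json.dumps(post).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": f"Basic {auth}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["link"])  # URL of the newly created draft
```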
Matthew Berman, who has spent (by his own accounting in a tweet that went very viral) over 10 billion tokens perfecting his setup, runs 21 daily workflows through OpenClaw. His insight, which I think is exactly right, is that setting up an agent is not a technical task. It’s an onboarding task. You’re not configuring software. You’re training a colleague. You tell it who you are, what you do, how you like to communicate, what it’s allowed to do without asking, and what it should never do. Then you iterate. The best setups grow organically from real friction, not from a configuration manual.
On February 14, 2026, Peter Steinberger announced he was joining OpenAI. But the project is moving to an independent open-source foundation. It’s MIT-licensed. Community-driven. It’s been covered in Scientific American, debated on Hacker News, and dissected on every AI podcast. The community is enormous and growing. It’s not going anywhere. Yet academic voices, and descriptions of academic use cases, are still few and far between.
The rest of this series is my journey as I set all of this up. Where to run it. What it costs (spoiler: not that much). How to install it. How to give it a personality. How to lock it down so it doesn’t delete your inbox. How to connect your email, your calendar, and your notes. And then, once the foundation is solid, hopefully, how to point it at the specific problems of academic life and watch them get smaller.
So, let’s go.