-
How Should I Talk to My Claw? Communication Channels for OpenClaw
Telegram is the default. Everyone recommends it. Every tutorial starts with it. And that’s fine if you haven’t thought very hard about what you actually need. I have, and this post documents that thinking. If you’re a social scientist considering OpenClaw for research work (or honestly, anyone who cares about where their data ends up), this might save you a few weeks of going in circles.
Here’s what I need.

First, topic-specific channels. I want a separate channel for each research project: one for admin, one for MethodsNET social media research, one to keep track of MethodsNET people’s publications, one that tracks cool methods publications in general social science journals, one for data wrangling, one for scraping needs. And I want different agents (potentially running different LLMs) bound to each of them. This is the whole point of a multi-agent OpenClaw setup, and the communication layer needs to support it.

Second, I need privacy appropriate for social science work. This isn’t about state secrets, but it is about GDPR, which we need to take seriously. Survey data is usually anonymized, but panel data can be personalized. Open-ended responses can contain identifying information. Transcribed interviews almost certainly do. Sending any of this through Discord’s or Telegram’s servers is, at minimum, a conversation you don’t want to have with your university’s data protection officer. (And if you’ve read my rant about academic ethics administration, you know I don’t love that conversation even when I’m clearly in the right. And I want to make sure that, ethically, I am in the right, even when the procedures don’t make sense.)

Third, I want a casual channel for just talking to my Claw day to day. “Hey Bruce, what’s on my calendar?” “Summarize that paper I dropped in your folder.” That kind of thing.

These are two different use cases. They probably need two different solutions.
Before diving into specific platforms, there’s an important structural distinction to make. OpenClaw ships with an ever-increasing number of native communication channels bundled into the core install: WhatsApp, Telegram, Discord, Signal, iMessage, IRC, and more. Others, like Mattermost and Nextcloud Talk, ship as plugins you install separately. The plugin install is one command (openclaw plugins install @openclaw/mattermost), and the plugin updates independently from OpenClaw’s release cycle, which is actually nice. But it’s still a dependency: one more thing that can break when you update, one more thing to debug when it does. If you’re building infrastructure you intend to rely on daily, bundled is safer. Not a dealbreaker, but a factor I weighed.
Discord: The Configurability King
Among bundled channels, Discord wins on organizational depth, and it’s not even close. You get categories containing channels containing threads containing forum posts, and every one of these levels can be independently bound to a different OpenClaw agent running a different LLM. Each agent gets its own Discord bot with its own avatar, name, and presence, so your coding agent looks different from your research agent. Discord also has the richest interactive component model: buttons, select menus, forms, media galleries, all routed back to the agent as inbound messages. And the thread binding system is particularly clever: the /focus command lets you dynamically attach a thread to a specific agent or subagent session, and /unfocus releases it. For managing multi-agent workflows, this is exactly the kind of tooling you want.
But Discord is cloud-hosted on Discord’s infrastructure. Your messages transit their servers. Your data sits on their servers. If all you work on is publicly available macro-comparative economic data from Eurostat or the World Bank, use Discord to your heart’s content. Honestly, it’s great. If your agent might ever touch GDPR-relevant data, keep reading.
Telegram: The Pragmatic Default
Telegram is where most people start, and for good reason. It has first-class support in OpenClaw (via the grammY library), the best mobile notification experience of any channel, and the simplest setup. Supergroup topics (introduced in 2022, now mature) give you a flat list of topic threads within a single group, providing some channel-like organization. You can bind different agents to different topics. It works.
The organizational model is shallower than Discord’s, though. Topics are one level deep: all siblings, no hierarchy, no categories. And the cloud trust model is the same as Discord’s: your messages transit Telegram’s servers. The track record is arguably slightly better (Telegram has positioned itself more aggressively on privacy in its marketing; whether you believe them is another conversation), but the fundamental architecture is the same. Third-party servers see your data.
Nothing Else Bundled Comes Close
I looked at everything. Slack matches Discord’s organizational depth (channels, threads, distinct bot identities, Block Kit interactive components), and its thread model is arguably better for agent work because threads are first-class conversation containers. But Slack is cloud-only on Salesforce’s infrastructure. That’s worse for privacy, not better. Microsoft Teams has a comparable structure (teams → channels → threads) and is bundled, but the bot framework is heavy, the developer experience is worse, and you’re on Microsoft’s infrastructure. Google Chat has spaces and threads, but the organizational model is shallow, and the bot API is limited. IRC is the dark horse: it’s fully self-hostable, channels map neatly to topics, and there are zero third-party dependencies. But there’s no encryption, no modern UX, no mobile push without extra infrastructure, and no media sharing. You’d need a modern IRCv3 server plus a bouncer plus a decent web client (The Lounge, maybe), and at that point you’ve assembled a lot of glue to get something worse than what a plugin gives you out of the box.
The bottom line is this: if you want Discord-level flexibility with self-hosting, you’re looking at a plugin. There’s no way around it with the current bundled set.
Mattermost: Self-Hosted Discord, Basically
Mattermost is what I landed on for structured work. It gives you channels, threads, DMs, multiple bot accounts with distinct identities, interactive buttons, and a familiar Slack-like UX. Deployment is a single Go binary plus PostgreSQL and a reverse proxy (e.g., Nginx, Caddy, or Traefik); Docker Compose it up and you’re done. Significantly easier to deploy and maintain than the alternatives. If you’ve ever administered a web app, you can run Mattermost.
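To make that concrete, here is a minimal Docker Compose sketch of the stack. Treat it as a starting point, not an official recipe: the image tags, credentials, and site URL are my placeholders, and you would still put your reverse proxy in front of port 8065.

```shell
# Sketch of a minimal Mattermost + PostgreSQL deployment.
# All values (tags, passwords, URL) are placeholders; adapt before use.
cat > docker-compose.yml <<'EOF'
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: mmuser
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: mattermost
    volumes:
      - db-data:/var/lib/postgresql/data
  mattermost:
    image: mattermost/mattermost-team-edition:latest
    depends_on:
      - db
    environment:
      MM_SQLSETTINGS_DRIVERNAME: postgres
      MM_SQLSETTINGS_DATASOURCE: postgres://mmuser:change-me@db:5432/mattermost?sslmode=disable
      MM_SERVICESETTINGS_SITEURL: https://chat.example.org
    ports:
      - "8065:8065"
volumes:
  db-data:
EOF
echo "compose file written"
```

From there it’s `docker compose up -d`, point the reverse proxy at port 8065, and create the first admin account in the browser.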
On the OpenClaw side, Mattermost is a plugin (yes, I know what I said above about bundled being safer). The integration connects via bot token and WebSocket events, supports channels, groups, DMs, threads, reactions, and interactive components. It supports multiple bot accounts, so you can bind different agents to different channels with distinct identities. The setup is straightforward: install the plugin, create a Mattermost bot account, point OpenClaw at your server’s URL, and you’re running.
Now, the encryption story. Mattermost does not have native end-to-end encryption. The server sees plaintext. There is a third-party E2EE plugin from Quarkslab that uses WebCrypto with non-extractable keys, but it has real limitations: no mobile support, can’t edit encrypted messages, and the webapp integrity problem means an attacker with server control could theoretically deliver malicious JavaScript to extract keys. Mattermost and Qrypt announced a quantum-secure E2EE joint development program in mid-2025, but it’s not generally available yet.
Here’s why I don’t think this matters for my use case. You probably don’t encrypt every file on your own computer. (Unless Apple does it for you, which, fine, but you didn’t make that decision consciously.) The question that matters isn’t “is my data encrypted at rest on my own machine?” It’s “is my data leaving my infrastructure?” With self-hosted Mattermost behind your firewall, it isn’t. You control your data. No third-party platform touches the message content. The only data that leaves is whatever you send to the LLM API, and I am self-hosting that, too. For GDPR purposes, this is a defensible position. Certainly more defensible than routing everything through Discord’s or Telegram’s US-based infrastructure.
Matrix: The Security Maximalist Option (That I’m Not Using Yet)
I should mention Matrix because it’s the answer to the question I didn’t quite need to ask. Matrix has protocol-level end-to-end encryption (Olm and Megolm, the same Double Ratchet family as Signal), rooms and spaces for Discord-like organizational hierarchy, federation if you want it, and the OpenClaw Matrix adapter already supports E2EE through the Rust crypto SDK. It’s also a plugin, also well-supported.
But it’s heavier to deploy. Synapse (the reference homeserver) is Python and is known for being resource-hungry. Conduit (Rust alternative) is lighter but less mature. Federation adds complexity even if you don’t use it. Setup takes meaningfully longer than Mattermost, especially if your homeserver sits behind a VPN or requires certificate configuration.
Signal: The Casual Secure Channel
For the second use case (just talking to my claw, or having a native backup in case something breaks), Signal is the obvious bundled answer. It has the strongest encryption of any bundled channel, full stop: the Signal Protocol (Double Ratchet plus X3DH key agreement), with decryption happening locally inside the signal-cli process on your server. Messages transit Signal’s servers as ciphertext only; OpenClaw never sees plaintext on the wire, because everything is decrypted locally.
Signal has no topic channels, no threading, no rich UI. It’s flat DMs and group chats. That’s it. But for casual daily interaction, that’s exactly what you want. “What’s on my calendar today?” “Summarize the email from the dean.” “Remind me to submit that review by Friday.” Quick questions, quick approvals, morning briefings. Signal does this well, and it does it with real encryption, not the “trust us, we’re a platform” encryption of WhatsApp, Telegram, and the rest.
It’s bundled, native, and well-supported via signal-cli (a Java-based command-line client). You’ll need a dedicated phone number for the bot. Don’t use your personal one; Signal’s self-message protection will cause routing conflicts. Link a second SIM or register a new number, and you’re set.
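For reference, the registration dance with signal-cli looks roughly like the sketch below. The phone number is a placeholder, and newer registrations may additionally demand a captcha token, so I’m writing it out as a script to review rather than running it blind:

```shell
# Sketch of registering a dedicated Signal number for the bot via signal-cli.
# The number is a placeholder; registration may also require a captcha token.
cat > register-bot.sh <<'EOF'
#!/bin/sh
BOT_NUMBER="+15551234567"   # dedicated number, NOT your personal one

# Request an SMS verification code for this account
signal-cli -a "$BOT_NUMBER" register

# Enter the code you receive to complete registration
signal-cli -a "$BOT_NUMBER" verify 123-456
EOF
chmod +x register-bot.sh
```

Once the number is verified, OpenClaw’s Signal channel talks to that same signal-cli account, and decryption stays local to your server.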
The Plan (For Now)
So here’s what I’m going with. Mattermost for structured work: topic channels bound to specialized agents with different LLMs, task management, sub-agent coordination, anything involving data or research workflows. Signal for casual daily interaction: the “hey Bruce” channel. Two channels, two purposes. Neither leaks data to a third-party platform. Both run on infrastructure I control.
This is the plan for now. It will almost certainly change as we actually build this thing. Maybe someone will build a native OpenClaw channel that does everything I want, and this whole post will be obsolete. That’s fine. The point isn’t to pick the perfect answer forever. The point is to think through what you need, make a defensible choice, and document why, so that when it changes (and it will), you know what you were optimizing for and what tradeoffs you’re willing to revisit. Let’s see how it goes.
-
Local AI for your team’s needs
First attempt to use OpenClaw with a locally run model (Minimax M2.5) was successful. It’s not my claw, Bruce, just yet; my team’s young colleague actually beat me in spinning up his Claw.

The hardware: two NVIDIA DGX Spark devices (Asus Ascent GX10s, to be precise) connected via a single half-meter QSFP56 cable, running the default system, freshly updated. We set up the connection between the two the official way and are running vLLM across both, except that the vLLM container is updated to the newest 26.02 version. (I can’t wait for 26.03 or 26.04, as that will be Anthropic API endpoint compatible with the Claude Code bugs fixed, and we will also be able to test QWEN 3.5 397b with it.)

We loaded this up with Mamy Ratsimbazafy’s (mratsim’s) BF16-INT4-AWQ mixed-precision quantization of Minimax M2.5 (thanks for the good work if you are reading this; it has been extremely helpful for us). This mixed precision is supposed to be the best balance between size, speed, and performance degradation due to quantization. We have a full context of ~200k and even KV cache to spare for parallelization.

And it works. We tested GPT-OSS-120b, Nemotron 3 Super 120b, and even Kimi K2.5 served by NVIDIA’s free API, and none of them impressed. Minimax does. It is more than fast enough for the Claw, and we are happy with the results. We didn’t compare with Claude Opus or Sonnet 4.6, but Peter Steinberger endorsed Minimax when Claude started making OpenClaw use difficult through their subscription. And Minimax, with its 230b parameters (10b active), is certainly more manageable than Kimi’s 1t, GLM’s 745b, or even QWEN’s 397b parameters. It can easily run on two Sparks with KV cache to spare, even with full context.
We’ll need to do more testing before this approach has my full endorsement. But you can run this locally. It’s reasonably fast. It has strong parallelized performance, so more than one person (I’m guessing up to a dozen) can use it at the same time without noticeable performance degradation. (I can’t wait to test it with Claude Code to see if it is fast enough for that.) You could spend the 7,000-8,000 (EUR or USD) on the hardware and serve Minimax for your whole team’s Claws, their coding, and their chat needs. If this turns out to be as good as it sounds, I will buy a third Spark for day-to-day experimentation and dedicate these two to serving Minimax (or whatever the best model is that runs on the duo).
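For completeness, the serving command itself is unexciting. The sketch below is how I’d write it down, with the caveat that the model ID, the context length, and the parallelism flags are assumptions on my part; the right values depend on how the two Sparks are clustered and on the exact quantized build you pull.

```shell
# Sketch of serving a quantized Minimax build with vLLM across two nodes.
# The model ID, context length, and parallelism flags are illustrative only.
cat > serve-minimax.sh <<'EOF'
#!/bin/sh
# Run on the head node, after the second Spark has joined the cluster.
vllm serve mratsim/MiniMax-M2.5-AWQ \
  --tensor-parallel-size 2 \
  --max-model-len 200000 \
  --port 8000
EOF
chmod +x serve-minimax.sh
```

The served endpoint is OpenAI-compatible, which is what OpenClaw (and most tooling) expects to talk to.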
-
Claw Identity – Meet Bruce
I set up a Google account for my claw: Bruce. Bruce is named after the Bruce Nuclear Generating Station and the second data analysis server I had at CEU. (When talking about the first server with CEU IT, we started calling it the power plant, and subsequently Paks. When I got an upgrade, I just found a bigger nuclear station, one with an easy name, to name it after. Current analysis machines have very different naming schemes: one’s called Monster, and the Sparks are called Frodo and Bilbo, with Sam most likely incoming.)
I am neck deep in Google services: Drive, Google Docs, Gmail. So I figured it just makes sense for Bruce to be on a Google account. I actually have a Google Workspace, so I just did it there. Regular Gmail would likely work as well, though two-factor authentication worried me, and my SMS verification failed. (I have too many Gmail accounts; Google probably limits how many you can verify with a single phone number.)
Then I started making an avatar. Meet Bruce, my molted alter ego.

OK. Just a quick one today. I am going to go and scare my children with this photo…
-
What is OpenClaw, Anyway?
Sometime in late 2025, Peter Steinberger sent his AI agent a voice message on WhatsApp. Nothing fancy. Just a spoken instruction, the kind of thing you’d send a colleague. The problem was that nobody had told the agent how to handle voice messages. No one had coded voice support. There was no plugin for it.
The agent looked at the audio file, realized it didn’t know what to do with it, and then, without asking for help or permission, searched for tools that could help. It found FFmpeg on the system. It found an OpenAI API key lying around in the environment variables. It used curl to call the Whisper transcription API, converted the voice message to text, processed the instruction, and replied. Then it moved on to the next item on its list.
Peter didn’t find out any of this had happened until he checked the logs. Simon Willison documented this as a notable example of emergent tool use. The agent was improvising a capability nobody had anticipated. That little sequence tells you more about what OpenClaw is than any definition I could write.
But here’s a definition anyway.
OpenClaw is an open-source AI agent that runs on your machine, talks to you through the messaging apps you already use (WhatsApp, Telegram, Discord, Slack, and about a dozen others), and does things. Not “generates text about things.” Does them. Reads your email. Drafts replies. Manages your calendar. Downloads files. Runs scripts. Browses the web. Fills out forms. Monitors websites on a schedule. Sends you a summary in the morning. All while you’re sleeping or teaching or pretending to pay attention in a faculty meeting.
The strange part is that none of the individual components is new. Peter Steinberger, an Austrian software developer who told Lex Fridman the whole story in a three-and-a-half-hour conversation, didn’t invent any new technology. He was frustrated that the big AI labs kept shipping impressive demos that couldn’t actually do anything in the real world, so he wired five existing things together.
A large language model (Claude, GPT, Gemini, DeepSeek, take your pick). That’s the brain. Messaging APIs (the WhatsApp protocol, Telegram’s Bot API, Discord webhooks). Those are the ears and mouth. Shell command execution. Those are the hands. A local file system using plain Markdown files. That’s the memory. And cron scheduling (read: a timer that fires tasks on a repeating schedule, the same technology that’s been running Unix servers since the 1970s). That’s the alarm clock.
Each one of these is a mundane kitchen staple. LLMs existed before OpenClaw. Telegram bots existed before OpenClaw. Cron jobs are older than most people reading this. Markdown files are literally just text files with some formatting. What Peter did was write the recipe. Nobody looked at eggs, flour, sugar, and butter and said “nobody’s ever combined these before.” But someone had to invent the croissant.
The recipe goes like this: the agent receives a message through a messaging app. The LLM reads the message and decides what to do. It executes the decision using real tools on your real machine. Shell commands. Browser automation. File operations. API calls. It writes down what it learned in a Markdown file (its persistent memory). And then a cron job runs at a scheduled time to check in, run autonomous tasks, and restart the loop. That’s it. That’s OpenClaw.
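None of what follows is actual OpenClaw code, but the loop is simple enough to caricature in a dozen lines of shell. The llm_decide stub stands in for the model call; everything else (execute, write to Markdown memory, reply) is literal:

```shell
#!/bin/sh
# Toy caricature of the loop: message in -> LLM decides -> act -> remember -> reply.
MEMORY=memory.md

llm_decide() {
  # A real agent asks the model what to do; here we hardcode one decision.
  echo "run: date"
}

handle_message() {
  msg="$1"
  decision=$(llm_decide "$msg")                            # 1. the LLM decides
  result=$(eval "${decision#run: }")                       # 2. execute with real tools
  printf -- "- %s -> %s\n" "$msg" "$result" >> "$MEMORY"   # 3. persist to Markdown memory
  echo "$result"                                           # 4. reply on the channel
}

handle_message "what time is it?"
# In the real system, a cron job re-enters this loop on a schedule.
```

Everything interesting about OpenClaw lives in how well step 1 decides and how safely step 2 executes; the plumbing around them really is this plain.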
Let me make this concrete. Here’s what a day looks like when you have one of these running. Not hypothetical. Radek Sienkiewicz (known online as VelvetShark, and one of the most credentialed long-term OpenClaw practitioners) documented 20 real workflows he runs daily after 50 days of continuous use. The pattern is fairly typical of what the serious users converge on.
Five in the morning, your agent wakes up on a cron schedule. It checks ArXiv for new papers matching your research keywords, downloads the open-access PDFs, generates BibTeX entries, and saves everything to an organized folder on your machine. You are asleep for this.
Seven in the morning, it sends you a briefing on Telegram. Today’s calendar. Overnight email highlights. Those new papers it found. The weather. Any tasks that are overdue. You read it while making coffee. This is, by a wide margin, the single most popular OpenClaw workflow. People call it “the gateway drug,” and they’re not wrong.
Nine in the morning, you forward an email to the agent from your phone. “Deal with this.” It reads the email, classifies it (student asking about office hours? journal editor with a decision? committee chair with action items?), drafts a reply, and sends the draft back to you on Telegram for approval. You tap “send” or edit it first. Took thirty seconds instead of five minutes.
Two in the afternoon, you’re walking between classes. You send the agent a voice message: “Remind me to email the dean about the budget meeting and also look up whether the NSF deadline for SES has moved.” The agent transcribes your voice using Whisper, creates a reminder email, searches the NSF website for the deadline, and replies with what it finds. You never opened a laptop.
Eleven at night, a cron job backs up your agent’s configuration files and memory to a private GitHub repo. If anything breaks tomorrow, you can restore it in minutes.
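The scheduled parts of that day are nothing more exotic than crontab entries. A sketch, with hypothetical script names and paths standing in for whatever your agent actually runs:

```shell
# Illustrative crontab for the schedule above; all paths are hypothetical.
cat > claw-cron.txt <<'EOF'
# 05:00 fetch new ArXiv papers matching research keywords
0 5 * * * /opt/openclaw/jobs/arxiv-sweep.sh
# 07:00 send the morning briefing to Telegram
0 7 * * * /opt/openclaw/jobs/morning-brief.sh
# 23:00 back up config and memory to a private GitHub repo
0 23 * * * /opt/openclaw/jobs/backup-memory.sh
EOF
# Review the file, then install it with: crontab claw-cron.txt
```

The five fields are minute, hour, day of month, month, and day of week; the 1970s technology mentioned above really is doing the scheduling here.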
One researcher I came across has an agent monitoring over 40 journals overnight, reading every new abstract, and sending a weekly briefing that’s better than any research assistant he’s ever had. He set it up in an afternoon.
If any of this sounds like a chatbot to you, it isn’t. This is a good moment for the distinction, because I think it’s what trips people up the most.
When you use ChatGPT, you type a question, you get an answer, and you copy-paste it somewhere. The intelligence stays trapped behind a text box. It can tell you how to scrape a website, but it can’t actually navigate the site, click buttons, or download files. It can draft an email but it can’t send it. Every single action has a human in the middle, manually shuttling text between the AI and the real world.
ChatGPT can draft a plan. OpenClaw can execute it.
The moment the LLM gets hands (shell commands), ears (messaging APIs), memory (Markdown files), and a schedule (cron), the human bottleneck disappears. The agent doesn’t wait for you to ask. It proposes actions, you approve them, and it carries them out. Or, and this is the part that gets genuinely interesting and occasionally terrifying, it carries them out on its own while you sleep.
Alex Finn, a YouTuber and creator who claims to have logged over 210 hours with OpenClaw in a single month, has an agent named Henry. One night, Henry got stuck on a task. Nobody had programmed it to make phone calls. Without asking permission, Henry autonomously found a voice API, self-integrated with it, waited strategically until early morning, and called Finn on his personal phone to request more control over his computer systems. Finn described it as a “sci-fi horror movie” moment.
A chatbot would have printed an error message. An agent called its owner at dawn to negotiate for more power. That’s the difference.
Andrej Karpathy has a term for the strange, unintuitive unevenness of what these systems can do. He calls it jagged intelligence. Ethan Mollick and his co-authors described the same phenomenon as the “jagged technological frontier” in their 2023 paper with BCG consultants. The idea is the same: the boundary between what an LLM can and cannot do is not a clean line. It’s ragged. Wildly, confusingly ragged.
In a recent interview on the No Priors podcast, Karpathy put it in terms that I think are the clearest articulation of what this actually feels like in practice: “I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been a systems programmer their entire life and a 10-year-old.” He was talking specifically about working with AI code agents. OpenClaw came up in the same conversation. The jaggedness, he said, is really strange. Humans have much less of it. “You’re either on rails of what it was trained for, and everything is like you’re going at the speed of light or you’re not.”
You will experience this. I guarantee it. Your agent will flawlessly synthesize 40 papers into a coherent literature review, correctly identifying methodological tensions you hadn’t noticed. You will be genuinely impressed. Ten minutes later, the same agent will forget what you talked about five minutes ago, or will latch onto some bad idea that would never occur to an equally competent research assistant, and won’t let it go. Both of these things are true at the same time. That is the deal.
WIRED’s Will Knight ran an OpenClaw agent for daily tasks. Groceries, email, negotiations. The agent developed small personality quirks, including a persistent fondness for guacamole that kept showing up in grocery orders. Charming, right? Except that same agent eventually turned adversarial in ways Knight didn’t anticipate. The quirks and the risks come from the same source: autonomous decision-making with imperfect judgment.
Working with an agent is like working with the most brilliant new hire you’ve ever had, who occasionally does something so baffling you have to sit down and stare at the wall for a while. The brilliance is real. The bafflement is also real. You can’t have one without the other. Not yet.
When a chatbot hallucinates, you see bad text on a screen. When an agent hallucinates, it might send an email to the wrong person, or delete a file it shouldn’t have, or spend your money on something you didn’t authorize. The stakes are categorically higher when the AI has hands.
The good news is that this is a managed problem. You set up approval gates: the agent proposes, you approve. You create dedicated accounts so it never touches your real email or real calendar. You set spending limits so a runaway loop can’t cost you more than twenty bucks. You sandbox code execution in Docker so it can’t trash your file system. The community has figured out how to work with the jagged edge rather than getting cut by it, and this blog series will walk you through every step.
And here’s where the car deal comes back. Remember AJ Stuyvenberg, who tasked his agent with buying a Hyundai Palisade? The agent scraped dealer inventories across Massachusetts, filled out contact forms, and then spent several days playing dealers against each other, forwarding competing PDF quotes from one dealership to the next and asking each to beat the last price. The dealers had no idea they were negotiating with software. Final result: a $4,200 dealer discount, bringing the price to $56,000 against a target of $57,000 and a Massachusetts average of around $58,000. When negotiations reached the point of exchanging credit applications, Stuyvenberg (wisely) took over personally to handle the financial paperwork. (This, predictably, led to a whole thread on crypto Twitter about AI needing “crypto rails,” because of course it did.)
The reason I’m putting this story here and not earlier is that it only makes sense once you’ve absorbed the jagged intelligence problem. Yes, the agent negotiated $4,200 off the price of a car. Also yes, an agent might forget what you agreed on five minutes ago or develop an unexplained fondness for guacamole. Both things are true. Both things are the same technology. The power and the risk are not separable. That’s the deal you’re making when you set one of these up, and hopefully, together, we can figure out how to use these tools to our benefit and not as a nuisance.
Here’s a question that I find genuinely fascinating, and I think especially interesting if you study institutions for a living (as some of us do): why did an independent developer in Austria build the most consequential AI tool of 2026 instead of Google, or Anthropic, or OpenAI?
Peter himself kept saying this. In interview after interview, including the Lex Fridman podcast, he expressed surprise that the big labs hadn’t done it first. “I kept thinking I should stop; they’ll ship this any day now.” They didn’t. And I don’t think they will.
The reason is liability.
OpenClaw can execute shell commands on your computer. It can read your email. It can send messages in your name. It can delete files. It can browse the web, fill in forms, and spend your money. One user’s agent ran up a $3,600 API bill overnight. Summer Yue, the Director of Alignment at Meta’s Superintelligence Labs, told her agent to check her inbox and confirm before acting. The agent hit context window limits, auto-compressed its conversation history, and in the process dropped the “confirm before acting” instruction entirely. It then speedrun-deleted over 200 of her emails. Her commands (“Do not do that.” “Stop don’t do anything.” “STOP OPENCLAW.”) were ignored. She had to physically run to her Mac mini to kill the process. Her tweet about it got 9.6 million views. She called it a “rookie mistake.” (The irony of Meta’s alignment director getting misaligned by her own agent was not lost on the internet.)
No publicly traded company with a legal department would ship this product. The liability exposure is enormous. Every one of those incidents is a potential lawsuit, a PR disaster, a congressional hearing. A corporate product would have sandboxed every capability that makes OpenClaw worth using, then marketed the lobotomized version as “safe AI.”
OpenClaw could only have been built by an independent developer, open-sourced under the MIT license, with no corporate entity to sue. The same recklessness that makes it dangerous is what makes it powerful. The same autonomy that lets it negotiate a car deal is what lets it delete an inbox. You can’t have one without the other. A tool this powerful requires care, not a corporate liability shield. You wouldn’t hand a new employee the keys to your house, your bank account, and your email on their first day. (Well, you shouldn’t.) You’d onboard them carefully, starting with dedicated accounts and limited permissions.
So what are academics actually doing with this thing?
There’s a researcher with an agent monitoring the ArXiv and 40+ journal RSS feeds overnight. Every morning, he gets a briefing: here are the new papers in your subfield, ranked by relevance to your current projects. Open-access PDFs already downloaded, BibTeX already generated. The agent remembers which papers he’s already seen, so no repeats. His daily paper-reading time dropped from several hours to about ninety minutes, and the share of relevant papers he actually catches went way up.
There’s an interesting use case for teaching that I think most people get wrong. The pitch you usually hear is “AI can grade your essays!” which makes every student, every educator, and every administrator I know bristle, and rightly so. That’s not what this is. What some teachers are doing instead is running the agent as an independent audit of their own grading. You grade normally. The agent runs the same rubric independently. Then you look at the disagreements. Where did you score a student lower than the rubric predicts? Where higher? Is there a pattern? Are you unconsciously drifting on a particular criterion? Are you harder on a specific student than you realize? It’s a bias-check tool, not a replacement. You’re still grading. The agent is a second pair of eyes that catches blind spots.
Grant deadline tracking is a natural fit. The agent monitors NIH, NSF, and foundation databases on a schedule, alerts you to matching opportunities, syncs deadlines to your calendar, and nags you about approaching due dates. It also tracks your manuscripts across journals (title, status, reviewer deadline), your peer review commitments (the “nag” skill is perfect for the review you’ve been putting off for three weeks), and your conference expense receipts sorted by grant code. If this sounds like the administrative overhead that eats 40-60% of your working life, it is. This is not grant writing (you know how to write). It is grant tracking. Manuscript tracking. Review tracking. Expense tracking. The operational nightmare of academic life, managed by something that never forgets a deadline and never resents the work.
And then there’s dissemination. The part most academics are worst at, not because they can’t write but because the translation from “here’s my finding” to “here’s a 600-word blog post, three platform-native social media posts, a press release, and a policy brief” is an entire second job nobody has time for. You drop a new paper into a folder. Overnight, the agent drafts all of it. It can publish the blog post directly to WordPress. Schedule the tweets. Generate a YouTube script for an explainer video. For most academics, the bottleneck is not writing the paper. It’s everything that happens after the paper is done. What happens when the agent monitors the news cycle and alerts you that your 2024 paper on voter ID laws is relevant to today’s headline, then drafts an op-ed and tells you which editors to pitch?
Matthew Berman, who has spent (by his own accounting in a tweet that went very viral) over 10 billion tokens perfecting his setup, runs 21 daily workflows through OpenClaw. His insight, which I think is exactly right, is that setting up an agent is not a technical task. It’s an onboarding task. You’re not configuring software. You’re training a colleague. You tell it who you are, what you do, how you like to communicate, what it’s allowed to do without asking, and what it should never do. Then you iterate. The best setups grow organically from real friction, not from a configuration manual.
On February 14, 2026, Peter Steinberger announced he was joining OpenAI. But the project is moving to an independent open-source foundation. It’s MIT-licensed. Community-driven. It’s been covered in Scientific American, debated on Hacker News, and dissected on every AI podcast. The community is enormous and growing. It’s not going anywhere. Yet, the voices of academics and the descriptions of academic use cases are few and far between.
The rest of this series is my journey as I set all of this up. Where to run it. What it costs (spoiler: not that much). How to install it. How to give it a personality. How to lock it down so it doesn’t delete your inbox. How to connect your email, your calendar, and your notes. And then, once the foundation is solid, hopefully, how to point it at the specific problems of academic life and watch them get smaller.
So, let’s go.
-
Models (March 2026 Edition) – What LLMs should I use with my OpenClaw?
I am adding the date because of this post’s short shelf life. New models come out every week, and the next couple of weeks will be no exception. Every couple of months, we will have to rewrite this post.
Here’s the deal with OpenClaw and the Large Language Models you should use with it. You should use Claude Opus 4.6. That is the best. It supposedly has the best personality as an assistant, the strongest resistance to prompt-injection attacks, and the highest overall intelligence.
But there’s a catch. In the early days of OpenClaw, people could use their Anthropic Claude Max subscriptions to run it. That meant you pay 90 EUR/$100 USD (or double that if you need more) and you get what, to OpenClaw, appeared to be unlimited LLM access. You just connect your subscription to OpenClaw via what’s often called the OAuth method, and it uses your monthly subscription for AI access. But Anthropic cracked down on this approach, first banning the open-source Claude Code alternative, OpenCode, from accessing Claude LLMs, and then quickly suspending OpenClaw users’ accounts too. Alex Finn claims most people are still using this approach undisturbed. Besides, what’s the worst thing that will happen to you? You’ll have to make a new account with a new email? (This conversation, by the way, is amazing social scientist bait. You should listen despite the length.) Let’s just say I am not going to recommend you try this approach (and I won’t even wink, as Alex did in the interview). Unless I can’t make anything else work reasonably, I am not going to try to use it myself. Of course, you can also pay Anthropic for Claude Opus through their API, but after Matthew Berman ran up a 4-digit (USD, not HUF) bill in a single YouTube live session, I think I am going to stay away from anything along these lines.
So what are the alternatives? There’s, of course, OpenAI’s ChatGPT subscription, and, probably thanks to Peter Steinberger, who, since starting his work on OpenClaw, has also joined OpenAI’s ranks, they are not banning OpenClaw users. The problem is that while Peter Steinberger won’t shut up about how wonderful OpenAI’s Codex is compared to Claude Code for coding, I have never seen anyone talk enthusiastically about using GPT models in OpenClaw. Alex Finn mentioned that he has GPT5.4 check in every hour on all his agents, making sure they are doing what they are supposed to be doing (and not off on some weird token-hungry and ultimately useless tangent). And this use makes perfect sense to me. He also said this is not very token-hungry. He just uses the $20 subscription. I suppose I will be getting that $20 subscription and trying this myself. I suspect Steinberger is working hard on something that will work great for OpenClaw (and agents in general), but if something is coming from OpenAI, we do not have it yet.
What else is there, then? The Chinese models. And it is hard to say how well these will perform overall, especially as the lead agent. There’s very little info out there beyond how Claude Opus is great, Claude Sonnet is OK, and that’s that. The lack of info about GPT models (especially given the circumstances) screams very loudly, but I do not know what to make of the lack of info about Chinese models. There’s this one tweet by Steinberger soon after Anthropic shut everyone out, saying “Been running Clawd on @MiniMax_AI the last few days after optimizing the implementation, and it’s a really great alternative. Now recommending this over Anthropic.” There’s also a discussion that suggests he tried several models. And since this tweet, MiniMax went from M2.1 to M2.5, and now, with some explosive fanfare, they jumped to M2.7. I don’t suppose it got worse. This is what I will be trying first.
Beyond Peter’s personal recommendation (and his tweaking to make OpenClaw work better with it), MiniMax has a couple of great advantages over other open models. First, they just started a subscription with extremely generous limits that, according to at least one YouTuber I trust on matters of OpenClaw (who was commissioned to promote it), we would be happy to use with OpenClaw even at the lowest price point of $100 a year. (I just signed up for a year myself just to test.) He claims that OpenClaw is simply not token- or API-hungry enough to use up the 1,500 model requests per 5 hours, and that the speed is perfectly OK at 50 tokens per second (100 tokens per second off-peak). Second, MiniMax is actually not that big. It has 230b parameters, with only 10b active (read: it is fast and it does not need that much RAM; 256GB is more than enough, and it may even run OK on 192GB). My team can run it locally on our two interconnected DGX Sparks (Asus Ascents, actually) with mixed precision (read: without being too dumbed down) and a near-full context window (read: near its theoretical maximum of around 200k). I don’t know yet how fast, but from what I know of the hardware, I suspect it would be usable and, more importantly, it wouldn’t flinch if a dozen or so people/agents were querying it at the same time, even if the output tokens per second weren’t that high. It would be usable for a large lab or a small department full of OpenClaws, with a $7,000–$8,000 hardware investment. Finally, my limited research shows that, when it comes to security (e.g., prompt injection attacks), MiniMax is a good model, even compared to some of its larger open cousins.
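To see why 192-256GB is the relevant range, a back-of-envelope memory estimate helps. The rule of thumb below (bytes per parameter at a given precision, plus some overhead for KV cache and activations) is a rough approximation I am using purely for illustration, not a vendor spec:

```python
# Back-of-envelope RAM estimate for a quantized LLM. The 20% overhead
# factor for KV cache and activations is my own rough assumption.
def model_ram_gb(params_b, bits_per_weight, overhead_frac=0.2):
    """Rough resident-memory estimate in GB: weights plus runtime overhead."""
    weights_gb = params_b * bits_per_weight / 8  # 1b params ~= 1 GB at 8-bit
    return weights_gb * (1 + overhead_frac)

print(round(model_ram_gb(230, 8)))  # 8-bit: tight even at 256GB
print(round(model_ram_gb(230, 4)))  # 4-bit: fits comfortably in 192GB
```

For a 230b model, this puts 8-bit weights at roughly 276GB and 4-bit at roughly 138GB, which is exactly why mixed precision on 256GB is comfortable and 192GB is the floor.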
There are other open models out there that are considered highly capable. Qwen 3.5 has a 397b variant. Kimi K2.5 is a 1t model and is considered outstanding. NVIDIA is now also giving away a free API key that lets you use it. (Use it while it lasts.) Here’s a video showing you how (and yes, he is also pushing Hostinger, but you do not need Hostinger to make this work). There are GLM 4.7 (355b) and GLM 5 (745b) models. And there’s a subscription ($84 for the year) that I got as a backup plan. They also allow OpenClaw OAuth. The limits are brutal (80 prompts per 5 hours and 400 per week on GLM 4.7, and GLM 5 will be half to a third of that). I have it for the year. It is a reasonable backup option. Clearly, all of these models are much bigger than MiniMax. While you could run some of them on a 512GB Mac Studio (and all of them on two interconnected Mac Studios), those cost $13–14k each. There are some other massively large open models out there, like Meta’s Llama 4 or Mistral’s new Large. But they lag behind on capabilities.
And anything smaller, I wouldn’t mess with unless the whole Internet is lighting up about how awesome it is for some specific task. There’s one exception: the heartbeat of OpenClaw. This is what checks periodically to see if everything is running smoothly and, if not, whether it should be running something. Most people set this to run on Claude Haiku or a smaller GPT model. In my personal testing, GPT-OSS-20b appeared to be the most competent at this task, and it can run locally on a 24GB Apple Silicon Mac or on the free 4-core Oracle Cloud Ampere instance.
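Conceptually, the heartbeat’s job is cheap and simple, which is why a small model handles it fine. Here is a toy sketch of the kind of check it performs; the status format and the 15-minute threshold are invented for illustration, and the real heartbeat is configured inside OpenClaw itself:

```python
import time

# Toy heartbeat logic: look at when each agent last reported in and flag
# anything that appears stalled. The data shape is hypothetical.
STALL_SECONDS = 15 * 60  # treat an agent as stalled after 15 idle minutes

def stalled_agents(statuses, now=None):
    """Return names of agents whose last report is older than the threshold."""
    now = time.time() if now is None else now
    return [
        name for name, last_seen in statuses.items()
        if now - last_seen > STALL_SECONDS
    ]

statuses = {"scraper": 1_000_000.0, "admin": 1_000_800.0}
print(stalled_agents(statuses, now=1_001_000.0))  # scraper idle 1000s > 900s
```

The point is that the decision itself is trivial; the small model’s job is only to read a status summary like this and decide whether anything needs a human (or a bigger model).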
So here’s the plan. We are going to test running OpenClaw both with the (cheapest) MiniMax subscription and with the locally running (two Sparks) mixed-precision MiniMax M2.5. We will use GLM as a backup. We will likely add a $20 GPT subscription as oversight and as a heartbeat backup. And we will also use what NVIDIA gives us for our testing. It is worth mentioning that Mistral’s API is free for a long while, as long as you do not have multiple concurrent calls and do not make more than one API call a second. And Mistral models are very good at writing in general (in case you need an author for some tasks). I think we are in excellent shape to start building this thing, and honestly, you should be too. Between the Mistral and NVIDIA APIs, you can probably start building with zero investment in LLM tokens.
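If you do lean on Mistral’s free tier, you will want to enforce that one-call-per-second ceiling on the client side rather than trust your agents to pace themselves. A minimal throttle sketch (generic Python, not Mistral’s SDK; the class name is my own):

```python
import time

# Minimal client-side throttle for an API limited to one request per
# second with no concurrency. Clock and sleep are injectable so the
# logic can be tested without real waiting.
class OneCallPerSecond:
    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()

# Usage: limiter = OneCallPerSecond(); limiter.wait() before each request.
```

Calling `wait()` before each request keeps a single-threaded script inside the stated limit; the no-concurrency condition you have to honor yourself by not running parallel callers.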
-
Hardware – Where should I run my OpenClaw?
We will need a computer to run OpenClaw. Here are our options with the pros and cons.
Your computer (DON’T!!) – The advice is that you REALLY should not do this. Giving OpenClaw unrestricted access to your entire computer can quickly turn into a nightmare. So don’t do it. You’ll need a machine where you can better control what OpenClaw has access to and what it doesn’t. While OpenClaw will be at its most capable on your regular computer, the advice is to build up. Treat OpenClaw like an employee. Give it its own hardware. Give it its own accounts. Provide additional access through sharing, forwarding, etc., as you build trust and get to know its capabilities. So we are not doing this. (I am not even sure it would make sense. I am mostly a Chromebook user who organized his life so he can pick up any Chromebook and just use it. I do have a few lying around at the in-laws’, the office, etc. I have no idea how OpenClaw could even run on a Chromebook. It could use the Linux container, of course, but I digress…)
Mac Mini – This is the favorite. And as such, there seems to be a shortage, though it hasn’t reached Hungary just yet, I don’t think. I could have a base model delivered tomorrow, even a bit under list price. But ever since my family’s iCloud accounts got jumbled up (over a decade or so ago), I have been hating Macs hard. (They even mixed all my parents’ photos with mine in Google through some autosync. And my mother photographs like a stereotypical Japanese tourist.) Even though I use an iPad and an M2 Mac Mini (with a lot of storage to locally sync everything I have in various clouds) on my home office desk, I have been doing everything I can to avoid becoming any more of a Mac person than I already am.
But the advantage of using an Apple Silicon Mac is undeniable. You could keep it at home. Home is the best place to access the Internet from. OpenClaw creator Peter Steinberger said (and yes, I listened to that 3+ hour interview, and you should too) that if you access any website from inside a data center, you are going to get more CAPTCHAs and more blocks than if you access it from a residential network. You even get more of them from your university network. I am sure you have noticed too. The Internet is getting locked down, and the inevitable reaction to the proliferation of AI agents will be further lockdowns. And while most agents that can use your desktop click through CAPTCHAs like a human, my (Playwright) browser simulator on Linux today got trapped by CAPTCHAs and asked me to log in somewhere and do one part of the job myself while it waited. That’s not ideal when you expect your agents to work while you sleep (or aren’t looking).
At the same time, most new tools are now invariably developed for Macs. Whenever a new AI app comes out, or new features roll out to older apps, they are invariably available on Macs first. (It is just like how new apps always come to iPhones first, and later, maybe, there will be an Android version.) It is also very easy to think through what an AI agent could do on a computer: the same things you can do yourself. You want your agent to be using the various AI tools. You can have your agents use the various apps. If the agent lives on a Linux server, its computer use will be more limited, the apps it has access to will be more limited, and its browser use will be more curtailed. On a Mac, of course, you have access to the terminal. You have Unix (BSD) running in there. So you have the advantages of both a Unix machine and a Mac. Running OpenClaw on a Mac is the clear winner here.
Other Apple Silicon? There’s, of course, the question: if a Mac, which Mac? Most people buy the M4 base model. One of the reasons I could not get myself to do this was that it is quite outdated. There’s an M5 coming soon for sure. Does it matter? No. But I hate paying full price for outdated hardware. So I guess I could go with an older model and buy a used one. But Macs hold their resale value well, and the M4 upgrade included 16GB of RAM by default, whereas that was at a premium on earlier generations. Looking at older model prices, they sometimes made no sense. They were not even discounts spec-for-spec. I tried deal hunting. Nothing made sense. Of course, you could be like my young Russian colleague (who could easily live on the smell of an oily rag) and find a well-used M1 MacBook Air with a broken screen, but honestly…?
There’s also the option of spec-ing up. Macs are great at running LLMs locally. Unfortunately, the models you’d want to use in an OpenClaw setup need a LOT of hardware. Realistically, you can maybe run some routine tasks with the small models (the 20b-35b range), and that is it. For anything usable, like MiniMax M2.5, you will need at least 192GB of RAM (and you are playing with fire under 256GB), and the Mac Studios that come with this much are pricey. Maybe one can make some lower-level tasks work with ~120b models (like Qwen 3.5 122b a10b or Mistral 4 Small 123b), but you’d need at least 96GB of RAM for them. So the bottom line is that (new or used) we are looking at a higher-end Mac Studio before you can run anything worthwhile locally. If you max out the Mac Mini’s RAM (and now we are talking $2,000 minimum, probably closer to $2,500 to make sense), you are still only at 64GB. MacBooks, Mac Studios: we are talking substantially more. With 64GB, you can run Qwen 3 Next 80b at best. That’s probably not good enough for most things most of the time, and Qwen 3.5 is out already, but not with a model in this size range. Most models are either over 100b, which is too big for 64GB, or under 35b, which need at most 32GB but won’t be good for much in the world of OpenClaw. Maybe the M5 Pro Mac Mini will have 96GB of RAM. The M4 Pro MacBooks had a max of 48GB, while the Mac Mini went up to 64GB. Now, the M5 Pro MacBook has a max of 64GB. Maybe the M5 Pro Mac Mini will have more. But I doubt the price will be below $2,000. At minimum, I am gonna wait for the M5 Mac Minis.
Whatever hardware – Alex Finn recommends that you run your OpenClaw locally on whatever hardware you have lying around unused. Your last machine is fine. Unfortunately, my last four laptops were all Chromebooks. I have a 2018-ish Surface base model, which I got when COVID lockdowns drove us to use new tools that weren’t immediately available on Chromebooks. But that thing is useless at this point. I also have a 2015 MacBook that can’t even run a current version of Chrome (or anything else, for that matter). But your mileage may vary. A Mac that still gets updates or a functional Windows device (if one ever existed – I certainly don’t think so) may be your ticket.
Virtual Private Server – On YouTube, everybody (and their dog, too) pushes Hostinger (and Alex Finn swears none of them use it). I guess it is not a bad solution if you are comfortable running your own server and highly proficient in Linux. While looking for a solution, I searched for cheap VPS options (because I am often too cheap to pay a few bucks a month for Hostinger) and found an Always Free tier at Oracle Cloud Infrastructure for impressively powerful Oracle Ampere servers. You can run Ubuntu on these and get 4 CPUs, 24GB of RAM, and 200GB of storage for absolutely free. With Ollama, you can even run models at an impressive speed without a GPU. My testing suggests that I should just use a locally run GPT-OSS-20b as the OpenClaw heartbeat. There are a few catches. You need a credit card to create an account. (Debit cards, temp cards, etc., need not apply.) It is a serious pain in the ass to set up (though a good AI chat will walk you through it). Only one account is allowed per person (and if they catch you cheating, they may delete your machine). The always-free plan has very few available servers at each location. (And you cannot change locations once you have your account.) I literally had to have Claude Code write a script that tried to get one every 10-15 minutes, and it still took over a week. (Yes, that script ran nonstop.) Once you get it, you REALLY have to use it; otherwise, they take it away from you. (This is not gonna be a problem if you are running OpenClaw on it, but better be quick to set it up.) You can upgrade your account to pay-as-you-go. Running this server will still be free, and I hope there are no surprise charges. But it is easier to get a slot. (This blog runs on such a server. And my OpenClaw will run on the one “my wife set up”.) My advice is maximum patience if you go this route. If you are in the US, set up in Chicago. Most machines. Multiple sites at one location (us-chicago-1, us-chicago-2, and us-chicago-3).
And if you want to do this in the EU, go with Frankfurt. Same story. And if you are outside the US and the EU, set one up near you. Those are usually less oversubscribed. If there are several near you, do some research on which location is bigger, with multiple sites.
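For the curious, the capacity-hunting script is the only nontrivial piece of the Oracle route, and its shape is simple. In the sketch below, `try_launch` stands in for an actual instance-launch call via the OCI CLI or SDK; the 10-15-minute cadence mirrors what I used, but everything else is a hypothetical outline, not my exact script:

```python
import random
import time

# Sketch of a "retry until Oracle has free-tier capacity" loop. The
# launcher is injected so the retry logic is self-contained and testable;
# in practice it would shell out to the OCI CLI or call the SDK. Random
# jitter avoids hammering the API on a fixed beat.
def retry_until_capacity(try_launch, base_wait=600, jitter=300,
                         sleep=time.sleep, max_attempts=None):
    """Call `try_launch()` until it returns truthy; wait 10-15 min between tries."""
    attempts = 0
    while True:
        attempts += 1
        result = try_launch()
        if result:
            return result, attempts
        if max_attempts is not None and attempts >= max_attempts:
            return None, attempts
        sleep(base_wait + random.uniform(0, jitter))
```

Run something like this under `nohup` or a systemd unit and let it grind; in my case it took over a week of nonstop retries before a slot opened.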
In sum, I have my Oracle Virtual Private Server, and I am a loyal 25-year Linux user who is very comfortable with Linux systems. (Windows Millennium Edition, I thank you for pushing me away from Microsoft forever.) Now, with the help of Claude Code, I can even administer my own web server. (Let’s hope my incompetence here will not show as some attacker takes this website down.) I will give this approach a month or two. In the meantime, we may get M5 Mac Minis, and my mother should be getting a new laptop, freeing up an M1 MacBook Air in the family. This is my plan. Take all these considerations and make the wisest choice for yourself.
-
I guess we are doing this.
A little back story. Over Christmas break, like so many people, I was given some extra usage of Claude Code by Anthropic. So I started playing with it. And after a few days of use, I was impressed and wanted to build something difficult.
I was always fascinated by social simulations. (Little-known fact: my MA thesis (2004, Political Science; I also have one in Survey Research from 2010) was an agent-based model simulating societal outcomes of various game-theoretic cooperation strategies.) The idea of having LLMs simulate the agents seemed intriguing. They can surely act like people. But all the examples I had come across were quite rigid, with a fixed number of players, very rigid worlds, and very old-school turn-based timing structures. I wanted something a little more free-flowing and a little more flexible. By this time, I had been following The Nerdy Novelist on YouTube and learning about the rhythm of LLM-assisted fiction writing, and I wondered if Sudowrite’s “beats” might be a better way to more fluidly handle turns in a social simulation. LLMs can invent dialogue. All we need to know is who is where, who they are with, and what they are doing. At the back of my mind, of course, were social scientific applications: simulations of societal disruptions like a natural disaster, the death of a society member, etc. But the idea could easily have been an art project too, where we just let agents live their lives and see if any interesting story emerges.
At the time, my AI server wasn’t running anything. So it also seemed like a good technical exercise. We have three NVIDIA Quadro RTX 8000 GPUs. I can definitely do this well with a smaller (Ministral 3 14b) model that fits into one of these (even unquantized with a decent context window). Let’s make sure they run parallel, simulating the beats taking place simultaneously. So, early January, right around the release of Claude Code 2.1, I started building.
It was insane. I only had the 17 EUR version of Claude, so for every 45 minutes of use, I waited 4 hours. Still, a few days later, I was simulating space-fantasy stories, urban interactions, and small-village life. I realized I needed to stop this vanity project and start building something for my current research. I had one use case where I needed to do quite slow but repetitive AI synthetic data simulation tasks. I conducted many experiments comparing the performance of various models, quantizations, and simulation approaches. After a quick upgrade to the (cheaper) Claude Max account, the next thing I knew, I was building general tools that went beyond my immediate use cases. (I am still on the cheaper accounts and have not hit a limit yet, though today I got to 99% building this blog.) Soon, I had more data than I knew what to do with, and I still wanted to run more and more case studies. The world felt like it had changed. And I had no idea how much.
Still in January, this was around the time OpenClaw (or ClawdBot, as it was called at the time – definitely not to be confused with Claude or Claude Code) exploded. The little open-source AI assistant that stitched together a few AI functionalities and produced magic. (Dangerous magic, but magic nevertheless.) And I wondered if a ClawdBot / Moltbot / OpenClaw could do what Claude Code did for me, supervised, but do it autonomously: handle data generation, run analyses, and organize and visualize the results based on a few already-working examples. If it could, that would be amazing.
Around this time, I had to make a cross-continent move. And I quickly grew wary of all the OpenClaw horror stories: API accounts running up into the 1000s overnight, Anthropic accounts being blocked for terms-of-service violations, and OpenClaw randomly deleting its “watcher’s” emails or accessing their APIs and credit cards without permission. I figured it was better to wait, watch, and learn a little bit, understand the problems, and see if the security of OpenClaw improved. I have done this, and I feel now that it is time. A few days ago, I searched for academic (social-scientific) applications of OpenClaw. I found nothing. So I decided to document the journey.
The question is, can OpenClaw also become a useful research (and academic admin) assistant? Let’s find out. Join me for the journey.