Category: AI Models

Models, providers, and prompts

  • Local AI for your team’s needs

    First attempt to use OpenClaw with a locally run model (Minimax M2.5) was successful. It’s not my claw, Bruce, just yet. My team’s young colleague actually beat me in spinning up his Claw. We have two NVIDIA DGX Spark (like, Asus Ascent GX10) devices connected via a single half-meter QSFP56 cable. It is running the default system, just updated. We set up the connection between the two in the official way. We are running vLLM across the two, except that the vLLM container is updated to the newest 26.02 version. (I can’t wait for 26.03 or 04 as that will be Antropic API endpoint compatible with the Claude Code bugs fixed, and we will also be able to test QWEN 3.5 397b with it). We loaded this up with Mamy Ratsimbazafy’s (mratsim’s) (and thanks for the good work if you are reading this – it has been extremely helpful for us) BF16-INT4-AWQ mixed precision quantization of Minimax M2.5. (This mixed precision is supposed to be the best balance between size, speed, and performance degradation due to quantization.) We have a full context of ~200k and even KV cache to spare for parallelization. And it works. We tested GPT-OSS-120b, Nemotron 3 Super 120b, and even Kimi K2.5 served by NVIDIA’s free API, and none of them impressed. Minimax does. It is more than fast enough for the Claw. We are happy with the results. We didn’t compare with Claude Opus or Sonnet 4.6, but Peter Steinberger endorsed Minimax when Claude started making OpenClaw use difficult through their subscription. And Minimax, with its 230b parameters (10 active), is certainly more manageable than Kimi’s 1t, GLM’s 745b, or even QWEN’s 397b parameters. It can easily run on two sparks with KV Cache to spare, even with full context.

    We’ll need to do more testing before this approach will have my full endorsement. But you can run this locally. It’s reasonably fast. It has strong parallelized performance, so more than one person (I am guessing up to a dozen) can use it at the same time without noticeable performance degradation. (I can’t wait to test it with Claude Code to see if it is fast enough for that.) You could spend the 7000-8000 (EUR or USD) on the hardware and serve Minimax for your whole team’s Claws, their coding, and chat needs. If this turns out to be as good as it sounds, I will buy a third Spark for day-to-day experimentation and dedicate these two to serving Minimax (or whatever the best model is that runs on the duo).

  • Models (March 2026 Edition) – What LLMs should I use with my OpenClaw

    I am adding the date due to the short shelf life nature of the post. New models are coming out every week. The next couple of weeks will not be an exception. Every couple of months, we will have to rewrite this post.

    Here’s the deal with OpenClaw and the Large Language Models you should use with it. You should use Claude Opus 4.6. That is the best. It supposedly has the best personality as an assistant, the strongest resistance to prompt-injection attacks, and the highest overall intelligence.

    But there’s a catch. In the early days of OpenClaw, people could use their Anthropic Claude Max subscriptions to run it. That meant you pay 90 EUR/$100 USD (or double that if you need more) and you get what, to OpenClaw, appeared to be unlimited LLM access. You just connect your subscription to OpenClaw via what’s often called the OAuth method, and it uses your monthly subscription for AI access. But Anthropic cracked down on this approach, first banning the open-source Claude Code alternative, OpenCode, from accessing Claude LLMs, and then quickly suspending OpenClaw users’ accounts too. Alex Finn claims most people are still using this approach undisturbed. Besides, what’s the worst thing that will happen to you? You’ll have to make a new account with a new email? (This conversation, by the way, is amazing social scientist bait. You should listen despite the length.) Let’s just say I am not going to recommend you try this approach (and I won’t even wink, as Alex did in the interview). Unless I can’t make anything else work reasonably, I am not going to try to use it myself. Of course, you can also pay Anthropic for Claude Opus through their API, but after Matthew Berman ran up a 4-digit (USD, not HUF) bill in a single YouTube live session, I think I am going to stay away from anything along these lines.

    So what are the alternatives? There’s, of course, OpenAI’s ChatGPT subscription, and, probably thanks to Peter Steinberger, who, since starting his work on OpenClaw, also joined OpenAI’s ranks, they are not banning OpenClaw users. The problem is that while Peter Steinberger won’t shut up about how wonderful OpenAI’s Codex is as opposed to Claude Code for coding, I have never seen anyone talk enthusiastically about using GPT models in OpenClaw. Alex Finn mentioned that he has GPT5.4 check in every hour on all his agents, making sure they are doing what they are supposed to be doing (and not off on some weird token-hungry and ultimately useless tangent). And this use makes perfect sense to me. He also said this is not very token hungry. He just uses the $20 subscription. I suppose I will be getting that $20 subscriotuib and trying this myself. I suspect Steinberger is working hard on something that will work great for OpenClaw (and agents in general), but if something is coming from OpenAI, we do not have it yet.

    So what are the alternatives? There are the Chinese models. And it is hard to say how well these will perform overall, especially as the lead agent. There’s very little info out there beyond how Claude Opus is great, Claude Sonnett is OK, and that’s that. The lack of info about GPT models (especially given the circumstances) screams very loudly, but I do not know what to make of the lack of info about Chinese models. There’s this one tweet by Steinberger soon after Anthropic shut everyone out, saying “Been running Clawd on  @MiniMax_AI the last few days after optimizing the implementation, and it’s a really great alternative. Now recommending this over Anthropic.” There’s also a discussion that suggests he tried several models. And since this tweet, Minimax went from M2.1 to M2.5, and now, with some explosive fanfare, they jumped to M2.7. I don’t suppose it got worse. This is what I will be trying first.

    Beyond Peter’s personal recommendation (and tweaking to make OpenClaw work better with it), there are a couple of great advantages of MiniMax over other open models. First, they just started a subscription with extremely generous limits that, at least one YouTuber I trust on matters of OpenClaw, who was commissioned to promote it, claims we would be happy to use with OpenClaw even at the lowest price point of $100 a year. (I just signed up for a year myself just to test.) He claims that OpenClaw is simply not that Token- or API-hungry to use up the 1500 model requests / 5 hours, and that the speed is perfectly OK at 50 tokens per second (with 100 tokens per second off-peak). Secondly, Minimax is actually not that big. It has 230b parameters, with only 10b active (read: it is fast and it does not need that much RAM, 256GB is more than enough, and it may even run OK on 192GB). My team can run it locally on our two interconnected DGX Sparks (Asus Ascents, actually) with mixed precision (read: without being too dumbed down) and a near-full context window (read: context window is near its theoretical maximum of around 200k). I don’t know yet how fast, but from what I know of the hardware, I suspect it would be usable and, more importantly, it wouldn’t flinch if a dozen or so people/agents were querying it at the same time, even if the output tokens-per-second weren’t that high. It would be usable for a large lab or a small department full of OpenClaws, with a $7,000–$8,000 hardware investment. Finally, my limited research shows that, when it comes to security (e.g., prompt injection attacks), Minimax is a good model, even compared to some of the larger open cousins.

    There are other open models out there that are considered highly capable. Qwen 3.5 has a 397b variant. Kimi K2.5 is a 1t model and is considered outstanding. NVIDIA is now also giving away a free API key that lets you use it. (Use it while it lasts.) Here’s a video showing you how (and yes, he is also pushing Hostinger, but you do not need Hostinger to make this work). There are GLM 4.7 (355b) and GLM5 (745b) models. And there’s a subscription ($84 for the year) that I got to have a backup plan. They also allow OpenClaw OAuth. The limits are brutal (80 prompts per 5h and 400 per week on GPT 4.7, and GPT5 will be half to a third that). I have it for the year. It is a reasonable backup option. Clearly, all of these models are much bigger than Minimax. While you could run some of them on a 512GB Mac Studio (and all of them on two interconnected Mac Studios), those cost $13-14k each. There are some other massively large open models out there, like Meta’s Llama 4 or Mistral’s new Large. But they lag behind on capabilities.

    And anything smaller, I wouldn’t mess with unless the whole Internet is lighting up about how awesome it is for some specific task. There’s one exception: the heartbeat of OpenClawl. This is what checks periodically to see if everything is running smoothly and, if not, whether it should be running something. Most people are setting this to run on Claude Haiku or a smaller GPT model.  In my personal testing, GPT-OSS-20b appeared to be the most competent at this task, and it can run locally on a 24GB RAM Apple Silicon Mac or on the free Oracle Cloud 4 Amere instance.

    So here’s the plan. We are going to test running OpenClaw both with the (cheapest) MiniMax subscription and with the locally running (2 Sparks) mixed prevision Minimax M2.5. We will use GLM as a backup. We will likely add a $20 GPT subscription as oversight and as a heartbeat backup. And we will also use what NVIDIA gives us for our testing. It is worth mentioning that Mistral’s API is free for a long while, as long as you do not have multiple concurrent calls and do not make more than one API call a second. And Mistral models are very good at writing, in general (in case you need an author for some tasks). I think we are in excellent shape to start building this thing, and honestly, you should be too. Between the Mistral and NVidia APIs, you can probably start building with zero investment in the LMM tokens.