Category: Building
Building the Lobster
-
Local AI for your team’s needs
My first attempt to use OpenClaw with a locally run model (Minimax M2.5) was a success. It's not my claw, Bruce, just yet: my team's young colleague actually beat me to spinning up his Claw.

The hardware: two NVIDIA DGX Spark devices (Asus Ascent GX10s, to be precise) connected by a single half-meter QSFP56 cable, running the stock system, freshly updated, with the link between the two set up the official way. We run vLLM across the pair, except that the vLLM container is updated to the newest 26.02 release. (I can't wait for 26.03 or 04, as that will be Anthropic-API-endpoint compatible with the Claude Code bugs fixed, and we will also be able to test QWEN 3.5 397b with it.) Onto this we loaded Mamy Ratsimbazafy's (mratsim's) BF16-INT4-AWQ mixed-precision quantization of Minimax M2.5 (and thanks for the good work if you are reading this: it has been extremely helpful for us). This mixed precision is supposed to be the best balance between size, speed, and the performance degradation quantization causes. We get the full ~200k context and even KV cache to spare for parallelization.

And it works. We tested GPT-OSS-120b, Nemotron 3 Super 120b, and even Kimi K2.5 served from NVIDIA's free API, and none of them impressed. Minimax does. It is more than fast enough for the Claw, and we are happy with the results. We didn't compare it against Claude Opus or Sonnet 4.6, but Peter Steinberger endorsed Minimax when Anthropic started making OpenClaw use difficult through their subscription. And Minimax, at 230b parameters (10b active), is certainly more manageable than Kimi's 1t, GLM's 745b, or even QWEN's 397b parameters: it easily runs on two Sparks with KV cache to spare, even at full context.
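For anyone wanting to reproduce the setup, the two-node launch looks roughly like the sketch below. This is not our exact invocation: the hostname, the Hugging Face repo id for mratsim's quant, and the context length are placeholders, and the flags assume vLLM's Ray-based multi-node mode rather than whatever the NVIDIA container wires up for you.

```shell
# On the first Spark: start a Ray head node (vLLM's multi-node backend).
# "spark-head" and the port are placeholders for your own network setup.
ray start --head --port=6379

# On the second Spark: join the Ray cluster over the QSFP56 link.
ray start --address=spark-head:6379

# Back on the head node: serve the model sharded across both machines.
# The repo id is a placeholder for mratsim's BF16-INT4-AWQ quant.
vllm serve mratsim/MiniMax-M2.5-AWQ \
  --tensor-parallel-size 2 \
  --distributed-executor-backend ray \
  --max-model-len 200000
```

Once up, vLLM exposes an OpenAI-compatible API (port 8000 by default), which is what the Claws talk to.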
We'll need to do more testing before this approach gets my full endorsement. But you can run this locally, it's reasonably fast, and it parallelizes well, so more than one person (my guess: up to a dozen) can use it at the same time without noticeable performance degradation. (I can't wait to test it with Claude Code to see if it is fast enough for that.) You could spend the 7,000-8,000 (EUR or USD) on the hardware and serve Minimax for your whole team's Claws, their coding, and their chat needs. If this turns out to be as good as it sounds, I will buy a third Spark for day-to-day experimentation and dedicate these two to serving Minimax (or whatever the best model is that runs on the duo).
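To make "serving the whole team" concrete: since vLLM speaks the OpenAI chat-completions protocol, any client on the network can hit the same endpoint. Here is a minimal sketch; the URL and model id are placeholders for whatever your vLLM instance actually registers.

```shell
# Placeholder endpoint: wherever vLLM is listening on your network.
VLLM_URL="http://spark-head:8000/v1/chat/completions"

# An OpenAI-style chat-completions payload; the model id is a placeholder
# for the name vLLM registered the quant under (check GET /v1/models).
PAYLOAD='{
  "model": "mratsim/MiniMax-M2.5-AWQ",
  "messages": [{"role": "user", "content": "Summarize this repo."}],
  "max_tokens": 256
}'
echo "$PAYLOAD"

# With the server up, the actual request would be:
# curl -s "$VLLM_URL" -H "Content-Type: application/json" -d "$PAYLOAD"
```

Point each teammate's Claw (or editor plugin, or chat client) at the same base URL and vLLM's continuous batching handles the concurrency, which is where the spare KV cache earns its keep.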