How the Hermes Agent self-improving loop works (and why it is changing AI agents)

May 25, 2026 · 8 min read · Claudio Branno

Hermes overtook OpenClaw on OpenRouter in under 90 days. The do-learn-improve loop, the Tenacity release, and a three-layer memory system explain why.

In May 2026, Hermes Agent did something most observers did not expect: it overtook OpenClaw as the most-used AI agent on OpenRouter's global daily rankings, with 224 billion tokens per day against OpenClaw's 186 billion. The project had launched in February. It reached the top of the leaderboard in under 90 days.

OpenClaw had everything: first-mover advantage, a massive skill ecosystem, 370,000+ GitHub stars, and years of community momentum. Hermes had none of that. What it had was a self-improving loop. And that was enough.

Here is what that loop is, how it works technically, and why it is a fundamentally different approach to AI agents.

The core problem with most AI agents

Most AI agents treat every task as a fresh start. You give them a task, they execute it, and the work disappears. Tomorrow you give them a similar task and they start from scratch again. The AI is smart, but it does not compound. Every interaction is independent.

This creates a ceiling. You get good results when you prompt well, and mediocre results when you do not. The agent never gets better at your specific workflows. It is a capable tool, not a developing employee.

What Hermes does differently: Do → Learn → Improve

Hermes is built around what its creators at Nous Research call the do-learn-improve loop. It sounds simple. The implications are significant.

Do. You give Hermes a task. It executes it using the available tools: file editing, web search, email, code, whatever the task requires. This part looks like any other AI agent.

Learn. After it completes the task, Hermes reflects on what happened. It does not just discard the experience. It asks: what was the useful pattern here? What procedure did I follow? What could be reused?

Improve. It writes a skill file. A markdown file containing the reusable procedure for this type of task. The next time you give it a similar task, it triggers that skill first instead of reasoning from scratch. The task gets faster. The output gets better. The agent gets more specific to your work.

This is the loop. Each iteration makes the next one better. Over time, Hermes builds a library of skills specific to you, your workflows, your preferences, and your business. It becomes harder to replace every time you use it.
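To make the loop concrete, here is a minimal sketch in Python. It is not Hermes's actual implementation: the function names, the skill file layout, and the idea of passing the "lesson" in by hand are all assumptions made for illustration. In a real agent, the model would execute the task and produce the reflection itself.

```python
import re
import tempfile
from pathlib import Path


def slugify(task_type: str) -> str:
    """Turn a task type such as 'Email Draft' into a file-safe name."""
    return re.sub(r"[^a-z0-9]+", "-", task_type.lower()).strip("-")


def run_task(skills_dir: Path, task_type: str, request: str, lesson: str) -> str:
    """One pass through a hypothetical do-learn-improve cycle."""
    skill_path = skills_dir / f"{slugify(task_type)}.md"

    # Do: load the existing skill (if any) before executing the task.
    skill = skill_path.read_text() if skill_path.exists() else ""
    mode = "skill" if skill else "scratch"
    result = f"[{mode}] {request}"  # stand-in for real tool use

    # Learn: a real agent would reflect on the run here.
    # In this sketch the caller supplies the lesson directly.

    # Improve: write the lesson back so the next run starts from it.
    if not skill:
        skill = f"# Skill: {task_type}\n\n## Procedure\n"
    if lesson not in skill:
        skill += f"- {lesson}\n"
    skill_path.write_text(skill)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skills = Path(tmp)
        print(run_task(skills, "Email Draft", "reply to Ana", "Keep replies under 120 words"))
        print(run_task(skills, "Email Draft", "reply to Ben", "Sign off with first name only"))
        print((skills / "email-draft.md").read_text())
```

The first call runs from scratch and creates the skill file. The second call finds the file, starts from it, and appends a second lesson. That is the compounding in miniature.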

What this looks like in practice

When you watch Hermes execute a task, you can see this happening. It fetches the relevant skill, executes the task, then reviews what it did, writes improvements back to the skill file, and saves the updated version.
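The "fetches the relevant skill" step is worth a closer look. The source does not say how Hermes matches a request to a skill, so the sketch below swaps in the simplest technique that could work: plain word overlap between the request and each skill's description. The names `pick_skill` and `min_overlap` are hypothetical.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_skill(request: str, skills: dict[str, str], min_overlap: int = 2) -> Optional[str]:
    """Return the skill whose description shares the most words with the
    request, or None when no skill shares at least min_overlap words."""
    wanted = tokens(request)
    best_name, best_score = None, 0
    for name, description in skills.items():
        score = len(wanted & tokens(description))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_overlap else None


if __name__ == "__main__":
    library = {
        "transcript-analysis": "analyze a meeting transcript and list key decisions",
        "email-draft": "draft a reply email in the user's tone",
    }
    print(pick_skill("Analyze this transcript from Monday", library))
    print(pick_skill("What is the weather", library))
```

When nothing clears the threshold, the function returns None, which corresponds to the agent reasoning from scratch and, per the loop, writing a new skill afterwards.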

Users who have set up Hermes for content workflows report that after a few sessions, it stops asking how to structure a transcript analysis and just does it the way they prefer. After a few weeks of email drafting, it stops asking about tone and length. It has learned their voice.

This is the kind of behavior you would normally only get from a well-onboarded human assistant. Hermes gets there through repetition and skill file refinement.

The Tenacity release: what made it production-ready

The self-improving loop is not new to Hermes. But the May 2026 Tenacity release made it substantially more robust. The release comprised 864 commits, 588 merged pull requests, and contributions from 295 developers in one week. This was not a patch.

Key additions that matter for the learn-improve loop:

Durable multi-agent Kanban system: Hermes can now manage tasks across multiple agents with heartbeat monitoring, retry budgets, and zombie worker recovery. Long-running tasks do not silently fail anymore.
/goal command: Keeps the agent locked on a long-term objective across turns. Agents often get distracted by intermediate steps. A persistent goal prevents that drift.
Session recall: Hermes can recall every session you have ever had with it, including what you worked on on any specific date, what decisions were made, and what tasks ran. Retrieval is programmatic, so it does not spend AI tokens.
Background tasks: True multitasking without spinning up multiple agents. You can give it 5 background tasks and still have a conversation with it while they run.
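Heartbeat monitoring, retry budgets, and zombie worker recovery fit together in a fairly standard way. The sketch below shows the general pattern, not Hermes's code: the `Task` fields, the 30-second timeout, and the default budget of two retries are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Seconds of silence before a worker counts as a zombie (assumed value).
HEARTBEAT_TIMEOUT = 30.0


@dataclass
class Task:
    name: str
    retries_left: int = 2      # retry budget (assumed default)
    status: str = "queued"     # queued | running | done | failed
    last_heartbeat: Optional[float] = None


def claim(task: Task, now: float) -> None:
    """A worker picks up the task and sends its first heartbeat."""
    task.status = "running"
    task.last_heartbeat = now


def heartbeat(task: Task, now: float) -> None:
    """A live worker reports in."""
    task.last_heartbeat = now


def recover_zombies(tasks: list[Task], now: float) -> list[str]:
    """Requeue running tasks whose worker went silent.
    Tasks with no retry budget left are marked failed instead."""
    touched = []
    for task in tasks:
        if task.status != "running" or task.last_heartbeat is None:
            continue
        if now - task.last_heartbeat <= HEARTBEAT_TIMEOUT:
            continue  # worker is still alive
        if task.retries_left > 0:
            task.retries_left -= 1
            task.status = "queued"
        else:
            task.status = "failed"  # surfaced explicitly, not lost silently
        task.last_heartbeat = None
        touched.append(task.name)
    return touched


if __name__ == "__main__":
    board = [Task("summarize"), Task("deploy", retries_left=0)]
    for t in board:
        claim(t, now=0.0)
    heartbeat(board[0], now=25.0)
    print(recover_zombies(board, now=40.0), [t.status for t in board])
```

The point of the design is that a stalled task always ends in a visible state: either back in the queue with one less retry, or explicitly failed.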

The three-layer memory system

The skill files are only one layer. Hermes runs a three-layer memory architecture:

Session memory: What is happening in the current work session.
Episodic memory: Past sessions, stored via SQLite, searchable by date and content. This is what powers session recall.
Procedural memory: The skill files. Reusable patterns extracted from past tasks.

Together, these three layers give Hermes a memory that spans sessions, builds over time, and improves through use. Most AI agents have session memory only. Some have basic long-term memory through vector stores. None have the procedural layer that automatically writes reusable skills.
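The episodic layer is the easiest one to picture in code. The article says past sessions are stored in SQLite and searchable by date and content; the schema and function names below are assumptions, not Hermes's real layout. Note that recall here is a plain SQL query, which is why it needs no model call.

```python
import sqlite3
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical episodic memory store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sessions (
               id INTEGER PRIMARY KEY,
               day TEXT NOT NULL,
               summary TEXT NOT NULL
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, day: str, summary: str) -> None:
    """Store one session summary under an ISO date such as 2026-05-20."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO sessions (day, summary) VALUES (?, ?)", (day, summary)
        )


def recall(
    conn: sqlite3.Connection,
    day: Optional[str] = None,
    keyword: Optional[str] = None,
) -> list[tuple[str, str]]:
    """Look up sessions by date, by content, or both."""
    query = "SELECT day, summary FROM sessions WHERE 1=1"
    params: list[str] = []
    if day is not None:
        query += " AND day = ?"
        params.append(day)
    if keyword is not None:
        query += " AND summary LIKE ?"
        params.append(f"%{keyword}%")
    query += " ORDER BY day, id"
    return conn.execute(query, params).fetchall()


if __name__ == "__main__":
    memory = open_memory()
    remember(memory, "2026-05-20", "Drafted launch email")
    remember(memory, "2026-05-21", "Refactored billing script")
    print(recall(memory, day="2026-05-20"))
    print(recall(memory, keyword="billing"))
```

A production store would likely add full-text indexing and richer metadata, but the principle is the same: dates and text go in, and retrieval is an ordinary database lookup.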

Why it runs on your own infrastructure

Hermes is local-first. It runs on your own machine, server, VPS, or cloud environment. MIT licensed. No forced cloud lock-in. No dependency on a single model provider. It works with Anthropic, OpenAI, Grok, Kimi, GLM, Ollama, local models, and custom endpoints.

This matters for the self-improving loop because your skill files are yours. They live on your machine. They accumulate over months. They become a library of reusable intelligence built from your actual work. If Hermes ever shuts down, your skills do not disappear.

The local-first design is also why it is spreading so fast. Developers want an agent that lives on their own infrastructure. Your data stays under your control. The accumulated intelligence stays with you.

How this compares to OpenClaw

OpenClaw has a massive skill ecosystem built by its community. The skills are good. But they are created upfront by humans, and they are general-purpose. They do not adapt to your specific workflows.

Hermes creates skills from your actual work. They are specific to how you do things, not how the average user does things. The more niche your workflows, the more pronounced the advantage becomes.

OpenClaw's model is build-and-share. Hermes's model is use-and-refine. Both are valid. But for users who want an agent that increasingly understands their specific work, Hermes's approach compounds in a way OpenClaw's does not.

What this means if you are setting up an agent now

The self-improving loop changes how you should think about setup. The early sessions are an investment. You are seeding the skill library. The more varied and representative tasks you give it early, the faster it builds reusable patterns.

The practical implication: do not cherry-pick easy tasks for the first few weeks. Give it your real workflows, even if the early results are not perfect. Each imperfect execution becomes a better skill file. Each better skill file makes the next execution sharper.

At month one, you have an agent that is decent at your workflows. At month three, you have an agent that is better at your workflows than most tools are at anyone's. At six months, you have something genuinely difficult to replace.

That is the compounding effect. And it starts on day one.