
Hermes Agent Review 2026: The Self-Improving AI Agent That Actually Remembers You


Utilo Team

4/15/2026

#hermes-agent #ai-agent #review #open-source #nous-research

You've probably seen Hermes Agent on GitHub Trending — 73,000+ stars, climbing fast. Built by Nous Research (the lab behind the Hermes and Nomos model families), it's an open-source AI agent that runs on your own hardware. Not a chatbot wrapper. Not an IDE plugin. A full autonomous agent with memory, scheduling, tool use, and a learning loop that gets better the longer it runs.

This isn't a press-release recap. This is a practical deep dive: what Hermes actually does, how to set it up, what works well, what doesn't, and whether it's worth your time. Every feature described here comes with a real usage scenario you can try today.


What Hermes Agent Actually Is

Hermes Agent is a self-hosted AI agent that lives on your server (or laptop, or a $5 VPS) and talks to you through a terminal, Telegram, Discord, Slack, WhatsApp, Signal — 15+ platforms from a single gateway process. It uses whatever LLM you point it at: OpenAI, Anthropic, DeepSeek, Nous Portal, OpenRouter with 200+ models, or your own local endpoint.

The pitch that makes it different from "yet another agent framework": it has a closed learning loop. It remembers things across sessions, creates reusable skills from complex tasks, improves those skills during use, and builds a profile of who you are over time. Most agents start fresh every conversation. Hermes accumulates context.

It's MIT-licensed, which matters if you're building on top of it.

Key numbers:

  • 73,600+ GitHub stars (as of April 2026)
  • 647 skills across 4 registries (79 built-in, 47 optional, 521 community-contributed)
  • 15+ messaging platforms supported from one gateway
  • 6 terminal backends: local, Docker, SSH, Daytona, Singularity, Modal
  • Minimum context requirement: 64K tokens (models below this are rejected at startup)

Installation: 60 Seconds, No Joke

# One-line install — Linux, macOS, WSL2, even Android via Termux
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Reload your shell
source ~/.bashrc  # or: source ~/.zshrc

# Start chatting
hermes

The installer handles everything: Python 3.11 (via uv, no sudo), Node.js v22, ripgrep, ffmpeg. The only prerequisite is git.

After install, you get a set of CLI commands that cover most configuration:

hermes model    # Pick your LLM provider interactively
hermes tools    # Enable/disable tool groups
hermes setup    # Full setup wizard (does everything at once)
hermes gateway  # Start the messaging gateway
hermes doctor   # Diagnose issues
hermes update   # Update to latest version

Works on Android too. Termux gets a dedicated install path with a curated .[termux] extra that skips Android-incompatible voice dependencies. You can literally run an AI agent from your phone.


Choosing a Model Provider

Hermes doesn't lock you into any provider. Run hermes model and pick from the list:

Provider | What It Is | Auth Method
--- | --- | ---
Nous Portal | Subscription, zero-config | OAuth login
OpenAI Codex | ChatGPT OAuth, Codex models | Device code auth
Anthropic | Claude models directly | Claude Code auth or API key
OpenRouter | 200+ models, multi-provider | API key
DeepSeek | Direct API | API key
GitHub Copilot | GPT-5.x, Claude, Gemini via Copilot | OAuth
Hugging Face | 20+ open models | HF_TOKEN
Custom Endpoint | vLLM, SGLang, Ollama, any OpenAI-compatible | Base URL + key

Plus: Z.AI/GLM, Kimi/Moonshot, MiniMax, Alibaba Cloud/DashScope, Arcee AI, and more.

The 64K rule: Hermes requires at least 64,000 tokens of context. Models with less get rejected at startup. This makes sense — multi-step tool-calling workflows eat context fast, and a small window means the agent loses track of what it's doing halfway through a task. If you're running a local model, set --ctx-size 65536 or higher.
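The check itself is easy to picture. Here is a sketch of that startup validation, illustrative only and not Hermes's actual code:

```python
MIN_CONTEXT_TOKENS = 64_000  # documented floor for Hermes

def validate_context_window(model_name: str, ctx_tokens: int) -> None:
    """Reject models whose context window is below the minimum.

    Illustrative sketch; the real check lives inside Hermes.
    """
    if ctx_tokens < MIN_CONTEXT_TOKENS:
        raise ValueError(
            f"{model_name}: context window {ctx_tokens:,} is below the "
            f"{MIN_CONTEXT_TOKENS:,}-token minimum; raise --ctx-size "
            "or pick a larger model"
        )

validate_context_window("local-llama", 65_536)  # a 64K+ window passes
```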

Switch providers any time with hermes model. No code changes, no lock-in.


The Memory System: Small, Bounded, and Deliberate

This is where Hermes diverges from most agent frameworks. Instead of dumping everything into a vector database, Hermes uses two tiny, character-limited files:

File | Purpose | Limit
--- | --- | ---
MEMORY.md | Agent's notes — environment facts, conventions, lessons learned | 2,200 chars (~800 tokens)
USER.md | User profile — your preferences, communication style | 1,375 chars (~500 tokens)

Both live in ~/.hermes/memories/ and get injected into the system prompt as a frozen snapshot at session start.

How memory actually works in practice:

══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§This machine runs Ubuntu 22.04, has Docker and Podman installed
§User prefers concise responses, dislikes verbose explanations

The agent manages its own memory through three actions:

  • add — Store a new fact
  • replace — Update an existing entry (substring matching)
  • remove — Delete something no longer relevant

The frozen snapshot catch: When Hermes writes to memory during a session, changes are persisted to disk immediately — but they won't appear in the system prompt until the next session starts. This is intentional (preserves LLM prefix cache for performance), but it means the agent might "forget" something it just learned if you keep talking in the same session.

When memory fills up, the agent gets an error with current entries and usage stats, then has to consolidate or replace entries to make room. It's like a human with a notebook that only has 15 lines — you learn to be selective about what you write down.
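To make those mechanics concrete, here is a toy model of a character-limited store with the same three actions. This is illustrative code, not Hermes's implementation; only the 2,200-character cap comes from the docs:

```python
class BoundedMemory:
    """Toy model of Hermes-style bounded memory (illustrative only)."""

    LIMIT = 2_200  # documented MEMORY.md cap, roughly 800 tokens

    def __init__(self) -> None:
        self.entries: list[str] = []

    def used(self) -> int:
        return sum(len(e) for e in self.entries)

    def add(self, fact: str) -> None:
        if self.used() + len(fact) > self.LIMIT:
            # Mirrors the described behavior: on overflow the agent gets
            # an error with usage stats and must consolidate first.
            raise RuntimeError(
                f"memory full ({self.used()}/{self.LIMIT} chars): "
                "consolidate or remove entries to make room"
            )
        self.entries.append(fact)

    def replace(self, substring: str, new_fact: str) -> None:
        # Substring matching, as described above.
        for i, entry in enumerate(self.entries):
            if substring in entry:
                self.entries[i] = new_fact
                return
        raise KeyError(substring)

    def remove(self, substring: str) -> None:
        self.entries = [e for e in self.entries if substring not in e]

mem = BoundedMemory()
mem.add("User prefers concise responses")
mem.replace("concise", "User prefers concise responses with code examples")
```

The overflow error is the forcing function: the agent has to decide what is worth keeping, exactly like the 15-line notebook.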

What to save vs skip:

  • ✅ Save: User preferences, environment facts, project conventions, corrections, workflow patterns
  • ❌ Skip: Trivial info, easily-searchable facts, large code blocks, session-specific temp data

This bounded approach is refreshing. Most agent memory systems either have no limits (and fill up with noise) or use vector retrieval (which hallucinates relevance). Hermes forces discipline.


Skills: Procedural Memory the Agent Creates Itself

Skills are Hermes's answer to "how do you get better at recurring tasks?" When the agent completes something complex, it can create a skill — essentially a SKILL.md file with instructions for next time. Skills self-improve during use.

The ecosystem is surprisingly large: 647 skills across 4 registries. The built-in ones cover:

  • Coding agents: Claude Code, Codex, OpenCode delegation
  • Creative tools: ASCII art, p5.js generative art, Manim math animations, Excalidraw diagrams, architecture diagrams
  • Platform integrations: Apple Notes, Apple Reminders, FindMy, iMessage
  • Fun stuff: Minecraft modpack server setup, Pokemon player (yes, it plays Pokemon autonomously via headless emulation)
  • Design: 54 production-quality design system templates extracted from real websites (Stripe, Linear, Vercel, Notion, Airbnb…)

Skills follow the agentskills.io open standard, so they're portable and community-shareable.

Real scenario: You ask Hermes to set up a Docker Compose stack for a Postgres + Redis + Node app. It does it, then creates a skill called "docker-compose-setup" with the template, common gotchas, and port conventions it discovered. Next time you ask for a similar stack, it loads the skill and gets it done in half the steps.
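For a feel of what such a skill file might contain, here is an illustrative SKILL.md in the spirit of the agentskills.io format. The frontmatter fields and contents are assumptions for illustration; the exact file Hermes generates may differ:

```markdown
---
name: docker-compose-setup
description: Scaffold a Compose stack with an app, Postgres, and Redis
---

# Docker Compose Setup

1. Write docker-compose.yml with three services: app, postgres, redis.
2. Keep the standard ports: Postgres 5432, Redis 6379.
3. Gotcha learned last time: give postgres a healthcheck and make the app
   depend on it, so the app container doesn't race the database at startup.
```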


Tools: 47 Built-in, Organized by Category

Hermes ships with a broad tool registry. You enable/disable groups with hermes tools:

Category | Examples | What For
--- | --- | ---
Web | web_search, web_extract | Search and scrape the web
Terminal & Files | terminal, process, read_file, patch | Run commands, edit files
Browser | browser_navigate, browser_snapshot, browser_vision | Full browser automation
Media | vision_analyze, image_generate, text_to_speech | Image analysis, generation, TTS
Agent orchestration | todo, execute_code, delegate_task | Planning, subagents, code execution
Memory & recall | memory, session_search | Persistent memory, cross-session search
Automation | cronjob, send_message | Scheduled tasks, outbound messaging

Quick enable/disable:

# Start with only web and terminal tools
hermes chat --toolsets "web,terminal"

# Or configure interactively
hermes tools
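Conceptually, toolset selection is just a filter over a registry of named groups. A toy sketch with hypothetical data, not Hermes internals:

```python
# Hypothetical subset of the registry, keyed by toolset name.
TOOL_REGISTRY = {
    "web": ["web_search", "web_extract"],
    "terminal": ["terminal", "process", "read_file", "patch"],
    "automation": ["cronjob", "send_message"],
}

def active_tools(enabled_groups: list[str]) -> list[str]:
    """Flatten the enabled groups into the tools the agent may call."""
    return [tool for group in enabled_groups for tool in TOOL_REGISTRY[group]]

# Equivalent in spirit to: hermes chat --toolsets "web,terminal"
print(active_tools(["web", "terminal"]))
```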

Terminal Backends: Run Anywhere, Safely

This is one of Hermes's strongest practical features. You can choose where the agent's terminal commands actually execute:

Backend | Use Case
--- | ---
local | Default — runs on your machine
docker | Isolated containers — safe for untrusted tasks
ssh | Remote server — agent can't touch its own code
daytona | Cloud sandbox — persistent, hibernates when idle
modal | Serverless — scaled, pay-per-use
singularity | HPC containers — rootless cluster computing

Example config for the Docker backend:

# ~/.hermes/config.yaml
terminal:
  backend: docker
  docker_image: python:3.11-slim
  container_persistent: true  # packages survive across sessions
  container_cpu: 1
  container_memory: 5120      # 5GB

The SSH backend is the security sweet spot: the agent works on a remote machine and literally cannot modify its own code or config. Container backends (Docker, Singularity, Modal) add further hardening: read-only root filesystem, all Linux capabilities dropped, no privilege escalation, PID limits, full namespace isolation.

Practical tip: If you're running Hermes on a VPS and giving it real tasks, start with docker backend. If you trust the tasks but want separation, use ssh. Only use local for development or tasks you'd run yourself.


Cron: Built-in Scheduled Automation

Hermes has a built-in cron scheduler. No external tools needed. Create jobs in natural language or cron expressions, and results get delivered to any messaging platform.

# From the chat
/cron add "every 6h" "Check GitHub trending repos in Python and summarize the top 5 new ones. If nothing interesting, respond with [SILENT]." --name "GitHub watcher" --deliver telegram

# From the CLI
hermes cron create "0 9 * * 1" \
  "Generate a weekly report of top AI news, trending ML repos, and most-discussed HN posts." \
  --name "Weekly AI digest" \
  --deliver telegram

The critical thing to understand: Cron jobs run in fresh agent sessions with no memory of your current chat. Prompts must be completely self-contained. This trips people up — they write a cron prompt like "do that thing we discussed" and wonder why the agent has no idea what they mean.

The --script parameter is the power move. You can attach a Python script that runs before each execution. Its stdout becomes context for the agent:

# ~/.hermes/scripts/watch-site.py
import hashlib, json, os, urllib.request

URL = "https://example.com/pricing"
STATE_FILE = os.path.expanduser("~/.hermes/scripts/.watch-state.json")

content = urllib.request.urlopen(URL, timeout=30).read().decode()
current_hash = hashlib.sha256(content.encode()).hexdigest()

# Load previous state
prev_hash = None
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        prev_hash = json.load(f).get("hash")

# Save current state
with open(STATE_FILE, "w") as f:
    json.dump({"hash": current_hash, "url": URL}, f)

if prev_hash and prev_hash != current_hash:
    print(f"CHANGE DETECTED on {URL}")
    print(f"Content preview:\n{content[:2000]}")
else:
    print("NO_CHANGE")

Then attach the script to a job:

/cron add "every 1h" "If script says CHANGE DETECTED, summarize what changed. If NO_CHANGE, respond with [SILENT]." --script ~/.hermes/scripts/watch-site.py --name "Pricing monitor" --deliver telegram

The [SILENT] trick: When the agent's response contains [SILENT], delivery is suppressed. You only get notified when something actually happens. No spam.
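The mechanism is simple enough to sketch in a few lines (assumed behavior based on the docs, not Hermes's source):

```python
SILENT_MARKER = "[SILENT]"

def should_deliver(agent_response: str) -> bool:
    """Deliver only when the agent did not opt out with [SILENT]."""
    return SILENT_MARKER not in agent_response

assert should_deliver("Pricing changed: the pro tier moved to annual billing")
assert not should_deliver("[SILENT]")
```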


Messaging Gateway: Talk to It from Your Phone

hermes gateway setup  # interactive — picks your platform
hermes gateway        # starts the gateway process

Hermes supports 15+ messaging platforms from one gateway: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, BlueBubbles, Home Assistant, and Open WebUI.

Telegram setup example (the most common):

  1. Create a bot via @BotFather (/newbot)
  2. Get your user ID via @userinfobot
  3. Run hermes gateway setup, select Telegram, paste token and user ID
  4. Start the gateway: hermes gateway

That's it. Now you can chat with your agent from your phone while it works on your server.

Voice memos work too — send a voice message on Telegram, Hermes auto-transcribes it with faster-whisper (runs locally, free) and responds to the text.

Group chat tip: Telegram bots have privacy mode enabled by default — the bot only sees /commands and direct replies. To let it see all messages in a group, either disable privacy mode in BotFather or promote the bot to admin.


MCP Integration: Extend with External Tools

Hermes supports the Model Context Protocol (MCP) — connect to any MCP server to add tools:

# ~/.hermes/config.yaml
mcp:
  servers:
    - name: "github"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-github"]
      env:
        GITHUB_TOKEN: "your-token"

MCP tools show up alongside built-in tools. You can filter which MCP tools the agent can use to avoid tool overload.


Security: Seven Layers Deep

Hermes has a genuine defense-in-depth model, not just "we added an approval prompt":

  1. User authorization — allowlists control who can talk to the agent
  2. Dangerous command approval — human-in-the-loop for destructive operations (rm -rf, chmod 777, etc.)
  3. Container isolation — Docker/Singularity/Modal with hardened settings
  4. MCP credential filtering — env var isolation for MCP subprocesses
  5. Context file scanning — prompt injection detection in project files
  6. Cross-session isolation — sessions can't access each other's data
  7. Input sanitization — working directory parameters validated against allowlist

Approval modes:

# ~/.hermes/config.yaml
approvals:
  mode: manual   # manual | smart | off
  timeout: 60    # seconds before auto-deny

  • manual (default): Always asks before dangerous commands
  • smart: Uses an auxiliary LLM to assess risk — auto-approves low-risk, auto-denies dangerous, escalates uncertain
  • off / --yolo: Bypasses all checks. Use in CI/CD or disposable containers only.

The timeout is fail-closed: If you don't respond within 60 seconds, the command is denied. Not approved. This is the right default.


Subagents: Delegate and Parallelize

Hermes can spawn isolated subagents for parallel workstreams:

❯ Research these three topics simultaneously:
  1. Latest Rust async runtime benchmarks
  2. PostgreSQL 17 new features
  3. Best practices for LLM caching in production

Each subagent gets its own session, tools, and context. Results come back to the parent. This is useful for tasks that are naturally parallel — research, batch processing, multi-repo operations.

You can also use execute_code to write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.
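The win is that intermediate results stay inside the script instead of round-tripping through the model's context on every step. A generic sketch of the pattern, with a stubbed stand-in for the tool RPC (the real interface is Hermes-internal):

```python
def call_tool(name: str, **kwargs) -> dict:
    """Stub standing in for the agent's tool RPC (hypothetical)."""
    stubs = {
        "web_search": {"urls": ["https://example.com/a", "https://example.com/b"]},
        "web_extract": {"text": "stubbed page text"},
    }
    return stubs[name]

# One script, one agent turn: search, then extract every hit.
# Only the final aggregate goes back into the model's context.
hits = call_tool("web_search", query="PostgreSQL 17 new features")
pages = [call_tool("web_extract", url=url)["text"] for url in hits["urls"]]
print(f"collected {len(pages)} pages for one final summarization pass")
```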


Real Drawbacks (The Honest Part)

Every review that only says good things is useless. Here's what actually hurts:

1. Memory Is Tiny and Requires Active Management

2,200 characters for agent memory. 1,375 for user profile. That's roughly 20 short entries total. For a personal assistant that's supposed to "grow with you," hitting the ceiling is frustratingly fast. You'll find the agent spending turns consolidating and replacing memory entries instead of doing actual work. The bounded approach is philosophically sound, but in practice it means the agent forgets things you wish it hadn't.

2. The Frozen Snapshot Creates a "Memory Lag"

Memory changes during a session only take effect next session. This means if you tell the agent "remember I switched to PostgreSQL 17," it writes it to disk — but if you ask about your database setup later in the same conversation, the system prompt still shows the old info. The agent can check live state via tool responses, but it doesn't always think to. This leads to confusing moments where the agent seems to have forgotten what you just told it.

3. Cron Prompts Must Be Fully Self-Contained

Every cron job runs in a blank session. No memory, no conversation history, no context from previous runs. This means your cron prompts need to spell out everything — what to do, how to do it, what output format to use, where to deliver. Writing good cron prompts is its own skill, and the first few attempts usually produce useless results because people underspecify.

4. 64K Context Minimum Locks Out Smaller Local Models

If you want to run fully local with a 7B or 13B model, you're likely out of luck unless you can afford the RAM for 64K context. This is a reasonable engineering decision (small context = broken agent loops), but it means Hermes isn't truly "runs on anything" — it runs on anything that can serve a 64K-context model.

5. Gateway Restart Drops Connections

If you need to restart the gateway (update, config change, crash recovery), all active messaging sessions disconnect. There's no graceful handoff. Users on Telegram/Discord just see the bot go silent, then come back. For personal use this is fine; for team deployments it's a rough edge.


Where Hermes Fits: 3 Quick Comparisons

These aren't full reviews — just positioning notes so you know when to pick what.

Hermes vs OpenClaw: Both are self-hosted personal AI agents with messaging gateway, cron, memory, and tool use. OpenClaw is Node.js-based with a focus on channel diversity and plugin architecture. Hermes is Python-based with a focus on the learning loop (skills, self-improvement) and research-readiness (trajectory export, RL training). If you want "growing agent intelligence," lean Hermes. If you want "stable message routing across 15 platforms with extensive plugin ecosystem," lean OpenClaw.

Hermes vs LangGraph: LangGraph is a framework for building agent workflows — you write the graph, define the nodes, handle the state. Hermes is a ready-to-use agent — install and chat. If you need custom multi-agent orchestration for a product, use LangGraph. If you need a personal agent that works out of the box, use Hermes.

Hermes vs CrewAI: CrewAI focuses on multi-agent role-playing ("researcher," "writer," "editor" agents collaborating). Hermes is a single agent with subagent delegation. CrewAI is better for predefined team workflows. Hermes is better for open-ended personal assistance where the task isn't known in advance.


Quick Reference Cheat Sheet

Essential Commands

hermes                    # Start chatting
hermes model              # Switch LLM provider
hermes tools              # Enable/disable toolsets
hermes gateway setup      # Configure messaging platforms
hermes gateway            # Start the messaging gateway
hermes cron list          # List scheduled jobs
hermes config set KEY VAL # Set a config value
hermes doctor             # Diagnose issues
hermes update             # Update to latest
hermes --continue         # Resume last session
hermes --yolo             # Bypass command approval (careful!)

Recommended Config

# ~/.hermes/config.yaml

# Use Docker for safety
terminal:
  backend: docker
  docker_image: python:3.11-slim
  container_persistent: true

# Keep approval prompts on
approvals:
  mode: manual
  timeout: 60

Common Gotchas

Problem | Cause | Fix
--- | --- | ---
Agent ignores memory you just added | Frozen snapshot — memory only loads at session start | Start a new session (hermes)
Cron job produces garbage output | Prompt isn't self-contained | Spell out everything in the cron prompt
Bot doesn't see group messages | Telegram privacy mode | Disable in BotFather, then re-add bot to group
Model rejected at startup | Context window < 64K | Use a larger model or increase --ctx-size
hermes: command not found after install | Shell not reloaded | Run source ~/.bashrc

Bottom Line

Hermes Agent is the most complete open-source personal AI agent available in April 2026. The learning loop (memory + skills + user modeling) is genuinely novel — most competing agents don't even attempt cross-session improvement. The 6 terminal backends give you real deployment flexibility. The 647-skill ecosystem means you're not starting from zero.

The tradeoffs are real: tiny memory limits, frozen snapshot lag, cron prompt overhead, and a 64K context floor. But these are engineering choices, not bugs — they keep the system bounded and predictable.

If you want an AI agent that lives on your server, talks to you from Telegram, runs scheduled tasks, and actually gets better over time — Hermes is the one to try. Install takes 60 seconds. You'll know within an hour whether it fits your workflow.
