Hermes Agent Review 2026: The Self-Improving AI Agent That Actually Remembers You
Utilo Team
4/15/2026

Hermes Agent Deep Dive 2026: The Self-Improving AI Agent That Actually Remembers You
You've probably seen Hermes Agent on GitHub Trending — 73,000+ stars, climbing fast. Built by Nous Research (the lab behind the Hermes and Nomos model families), it's an open-source AI agent that runs on your own hardware. Not a chatbot wrapper. Not an IDE plugin. A full autonomous agent with memory, scheduling, tool use, and a learning loop that gets better the longer it runs.
This isn't a press-release recap. This is a practical deep dive: what Hermes actually does, how to set it up, what works well, what doesn't, and whether it's worth your time. Every feature described here comes with a real usage scenario you can try today.
What Hermes Agent Actually Is
Hermes Agent is a self-hosted AI agent that lives on your server (or laptop, or a $5 VPS) and talks to you through a terminal, Telegram, Discord, Slack, WhatsApp, Signal — 15+ platforms from a single gateway process. It uses whatever LLM you point it at: OpenAI, Anthropic, DeepSeek, Nous Portal, OpenRouter with 200+ models, or your own local endpoint.
The pitch that makes it different from "yet another agent framework": it has a closed learning loop. It remembers things across sessions, creates reusable skills from complex tasks, improves those skills during use, and builds a profile of who you are over time. Most agents start fresh every conversation. Hermes accumulates context.
It's MIT-licensed, which matters if you're building on top of it.
Key numbers:
- 73,600+ GitHub stars (as of April 2026)
- 647 skills across 4 registries (79 built-in, 47 optional, 521 community-contributed)
- 15+ messaging platforms supported from one gateway
- 6 terminal backends: local, Docker, SSH, Daytona, Singularity, Modal
- Minimum context requirement: 64K tokens (models below this are rejected at startup)
Installation: 60 Seconds, No Joke
```bash
# One-line install — Linux, macOS, WSL2, even Android via Termux
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Reload your shell
source ~/.bashrc   # or: source ~/.zshrc

# Start chatting
hermes
```
The installer handles everything: Python 3.11 (via uv, no sudo), Node.js v22, ripgrep, ffmpeg. The only prerequisite is git.
After install, you get a set of CLI commands that cover most configuration:
```bash
hermes model     # Pick your LLM provider interactively
hermes tools     # Enable/disable tool groups
hermes setup     # Full setup wizard (does everything at once)
hermes gateway   # Start the messaging gateway
hermes doctor    # Diagnose issues
hermes update    # Update to latest version
```
Works on Android too. Termux gets a dedicated install path with a curated .[termux] extra that skips Android-incompatible voice dependencies. You can literally run an AI agent from your phone.
Choosing a Model Provider
Hermes doesn't lock you into any provider. Run hermes model and pick from the list:
| Provider | What It Is | Auth Method |
|---|---|---|
| Nous Portal | Subscription, zero-config | OAuth login |
| OpenAI Codex | ChatGPT OAuth, Codex models | Device code auth |
| Anthropic | Claude models directly | Claude Code auth or API key |
| OpenRouter | 200+ models, multi-provider | API key |
| DeepSeek | Direct API | API key |
| GitHub Copilot | GPT-5.x, Claude, Gemini via Copilot | OAuth |
| Hugging Face | 20+ open models | HF_TOKEN |
| Custom Endpoint | VLLM, SGLang, Ollama, any OpenAI-compatible | Base URL + key |
Plus: Z.AI/GLM, Kimi/Moonshot, MiniMax, Alibaba Cloud/DashScope, Arcee AI, and more.
The 64K rule: Hermes requires at least 64,000 tokens of context. Models with less get rejected at startup. This makes sense — multi-step tool-calling workflows eat context fast, and a small window means the agent loses track of what it's doing halfway through a task. If you're running a local model, set --ctx-size 65536 or higher.
Switch providers any time with hermes model. No code changes, no lock-in.
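The 64K floor is easy to mirror in your own tooling if you wrap Hermes in scripts. Here is a minimal Python sketch of such a startup check; the names (MIN_CONTEXT_TOKENS, validate_model) are illustrative, not part of Hermes's actual codebase:

```python
# Illustrative startup check mirroring the 64K context floor described above.
# Models whose advertised context window falls below the floor are rejected
# before any session starts.
MIN_CONTEXT_TOKENS = 64_000

def validate_model(name: str, context_window: int) -> None:
    """Raise if the model's context window is below the 64K floor."""
    if context_window < MIN_CONTEXT_TOKENS:
        raise ValueError(
            f"{name}: context window {context_window} < required "
            f"{MIN_CONTEXT_TOKENS}. Use a larger model or raise --ctx-size."
        )

validate_model("local-llama", 65_536)  # a 64K+ local model passes
```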
The Memory System: Small, Bounded, and Deliberate
This is where Hermes diverges from most agent frameworks. Instead of dumping everything into a vector database, Hermes uses two tiny, character-limited files:
| File | Purpose | Limit |
|---|---|---|
| MEMORY.md | Agent's notes — environment facts, conventions, lessons learned | 2,200 chars (~800 tokens) |
| USER.md | User profile — your preferences, communication style | 1,375 chars (~500 tokens) |
Both live in ~/.hermes/memories/ and get injected into the system prompt as a frozen snapshot at session start.
How memory actually works in practice:
```text
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§This machine runs Ubuntu 22.04, has Docker and Podman installed
§User prefers concise responses, dislikes verbose explanations
```
The agent manages its own memory through three actions:
- add — Store a new fact
- replace — Update an existing entry (substring matching)
- remove — Delete something no longer relevant
The frozen snapshot catch: When Hermes writes to memory during a session, changes are persisted to disk immediately — but they won't appear in the system prompt until the next session starts. This is intentional (preserves LLM prefix cache for performance), but it means the agent might "forget" something it just learned if you keep talking in the same session.
When memory fills up, the agent gets an error with current entries and usage stats, then has to consolidate or replace entries to make room. It's like a human with a notebook that only has 15 lines — you learn to be selective about what you write down.
What to save vs skip:
- ✅ Save: User preferences, environment facts, project conventions, corrections, workflow patterns
- ❌ Skip: Trivial info, easily-searchable facts, large code blocks, session-specific temp data
This bounded approach is refreshing. Most agent memory systems either have no limits (and fill up with noise) or use vector retrieval (which hallucinates relevance). Hermes forces discipline.
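To make the bounded design concrete, here is a small Python sketch of a character-capped store with the three actions described above. This is an illustration of the concept only; class and method names are mine, not Hermes's implementation:

```python
# Illustrative model of a hard-capped memory file with add / replace / remove
# actions, mirroring the article's description. Not Hermes's real code.
MEMORY_LIMIT = 2_200  # chars, per the MEMORY.md cap

class BoundedMemory:
    def __init__(self, limit: int = MEMORY_LIMIT):
        self.limit = limit
        self.entries: list[str] = []

    def _size(self) -> int:
        # One newline per entry counts against the budget
        return sum(len(e) + 1 for e in self.entries)

    def add(self, fact: str) -> None:
        if self._size() + len(fact) + 1 > self.limit:
            # The agent sees an error with usage stats and must consolidate
            raise MemoryError(
                f"memory full ({self._size()}/{self.limit} chars) — "
                "consolidate or remove entries first"
            )
        self.entries.append(fact)

    def replace(self, old_substring: str, new: str) -> None:
        # Substring matching, as described above
        for i, entry in enumerate(self.entries):
            if old_substring in entry:
                self.entries[i] = new
                return
        raise KeyError(old_substring)

    def remove(self, substring: str) -> None:
        self.entries = [e for e in self.entries if substring not in e]

mem = BoundedMemory()
mem.add("User prefers concise responses")
mem.replace("concise", "User prefers concise responses with code examples")
```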
Skills: Procedural Memory the Agent Creates Itself
Skills are Hermes's answer to "how do you get better at recurring tasks?" When the agent completes something complex, it can create a skill — essentially a SKILL.md file with instructions for next time. Skills self-improve during use.
The ecosystem is surprisingly large: 647 skills across 4 registries. The built-in ones cover:
- Coding agents: Claude Code, Codex, OpenCode delegation
- Creative tools: ASCII art, p5.js generative art, Manim math animations, Excalidraw diagrams, architecture diagrams
- Platform integrations: Apple Notes, Apple Reminders, FindMy, iMessage
- Fun stuff: Minecraft modpack server setup, Pokemon player (yes, it plays Pokemon autonomously via headless emulation)
- Design: 54 production-quality design system templates extracted from real websites (Stripe, Linear, Vercel, Notion, Airbnb…)
Skills follow the agentskills.io open standard, so they're portable and community-shareable.
Real scenario: You ask Hermes to set up a Docker Compose stack for a Postgres + Redis + Node app. It does it, then creates a skill called "docker-compose-setup" with the template, common gotchas, and port conventions it discovered. Next time you ask for a similar stack, it loads the skill and gets it done in half the steps.
Tools: 47 Built-in, Organized by Category
Hermes ships with a broad tool registry. You enable/disable groups with hermes tools:
| Category | Examples | What For |
|---|---|---|
| Web | web_search, web_extract | Search and scrape the web |
| Terminal & Files | terminal, process, read_file, patch | Run commands, edit files |
| Browser | browser_navigate, browser_snapshot, browser_vision | Full browser automation |
| Media | vision_analyze, image_generate, text_to_speech | Image analysis, generation, TTS |
| Agent orchestration | todo, execute_code, delegate_task | Planning, subagents, code execution |
| Memory & recall | memory, session_search | Persistent memory, cross-session search |
| Automation | cronjob, send_message | Scheduled tasks, outbound messaging |
Quick enable/disable:
```bash
# Start with only web and terminal tools
hermes chat --toolsets "web,terminal"

# Or configure interactively
hermes tools
```
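Conceptually, a --toolsets filter just expands group names into concrete tool names and hides everything else. A hedged Python sketch, with group contents taken from the table above and the function name being hypothetical:

```python
# Sketch of how a toolset filter could work. The grouping below is taken
# from the article's tool table; the function itself is illustrative,
# not Hermes's actual API.
TOOL_GROUPS = {
    "web": ["web_search", "web_extract"],
    "terminal": ["terminal", "process", "read_file", "patch"],
    "browser": ["browser_navigate", "browser_snapshot", "browser_vision"],
    "automation": ["cronjob", "send_message"],
}

def enabled_tools(toolsets: str) -> list[str]:
    """Expand a comma-separated toolsets value into concrete tool names."""
    names: list[str] = []
    for group in toolsets.split(","):
        names.extend(TOOL_GROUPS.get(group.strip(), []))
    return names

print(enabled_tools("web,terminal"))
```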
Terminal Backends: Run Anywhere, Safely
This is one of Hermes's strongest practical features. You can choose where the agent's terminal commands actually execute:
| Backend | Use Case |
|---|---|
| local | Default — runs on your machine |
| docker | Isolated containers — safe for untrusted tasks |
| ssh | Remote server — agent can't touch its own code |
| daytona | Cloud sandbox — persistent, hibernates when idle |
| modal | Serverless — scaled, pay-per-use |
| singularity | HPC containers — rootless cluster computing |
```yaml
# ~/.hermes/config.yaml
terminal:
  backend: docker
  docker_image: python:3.11-slim
  container_persistent: true   # packages survive across sessions
  container_cpu: 1
  container_memory: 5120       # 5 GB
```
The SSH backend is the security sweet spot: the agent works on a remote machine and literally cannot modify its own code or config. Container backends (Docker, Singularity, Modal) add further hardening: read-only root filesystem, all Linux capabilities dropped, no privilege escalation, PID limits, full namespace isolation.
Practical tip: If you're running Hermes on a VPS and giving it real tasks, start with docker backend. If you trust the tasks but want separation, use ssh. Only use local for development or tasks you'd run yourself.
Cron: Built-in Scheduled Automation
Hermes has a built-in cron scheduler. No external tools needed. Create jobs in natural language or cron expressions, and results get delivered to any messaging platform.
```bash
# From the chat
/cron add "every 6h" "Check GitHub trending repos in Python and summarize the top 5 new ones. If nothing interesting, respond with [SILENT]." --name "GitHub watcher" --deliver telegram

# From the CLI
hermes cron create "0 9 * * 1" \
  "Generate a weekly report of top AI news, trending ML repos, and most-discussed HN posts." \
  --name "Weekly AI digest" \
  --deliver telegram
```
The critical thing to understand: Cron jobs run in fresh agent sessions with no memory of your current chat. Prompts must be completely self-contained. This trips people up — they write a cron prompt like "do that thing we discussed" and wonder why the agent has no idea what they mean.
The --script parameter is the power move. You can attach a Python script that runs before each execution. Its stdout becomes context for the agent:
```python
# ~/.hermes/scripts/watch-site.py
import hashlib, json, os, urllib.request

URL = "https://example.com/pricing"
STATE_FILE = os.path.expanduser("~/.hermes/scripts/.watch-state.json")

content = urllib.request.urlopen(URL, timeout=30).read().decode()
current_hash = hashlib.sha256(content.encode()).hexdigest()

# Load previous state
prev_hash = None
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        prev_hash = json.load(f).get("hash")

# Save current state
with open(STATE_FILE, "w") as f:
    json.dump({"hash": current_hash, "url": URL}, f)

if prev_hash and prev_hash != current_hash:
    print(f"CHANGE DETECTED on {URL}")
    print(f"Content preview:\n{content[:2000]}")
else:
    print("NO_CHANGE")
```
```bash
/cron add "every 1h" "If script says CHANGE DETECTED, summarize what changed. If NO_CHANGE, respond with [SILENT]." --script ~/.hermes/scripts/watch-site.py --name "Pricing monitor" --deliver telegram
```
The [SILENT] trick: When the agent's response contains [SILENT], delivery is suppressed. You only get notified when something actually happens. No spam.
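The suppression rule is deliberately simple, which is part of its appeal. A one-function Python sketch (the function name is mine, not Hermes's API):

```python
# Minimal sketch of the [SILENT] convention: delivery is suppressed whenever
# the marker appears anywhere in the agent's response.
SILENT_MARKER = "[SILENT]"

def should_deliver(response: str) -> bool:
    """Deliver only when the response doesn't opt out via [SILENT]."""
    return SILENT_MARKER not in response

assert should_deliver("CHANGE DETECTED on https://example.com/pricing")
assert not should_deliver("[SILENT]")
```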
Messaging Gateway: Talk to It from Your Phone
```bash
hermes gateway setup   # interactive — picks your platform
hermes gateway         # starts the gateway process
```
Hermes supports 15+ messaging platforms from one gateway: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, BlueBubbles, Home Assistant, and Open WebUI.
Telegram setup example (the most common):
1. Create a bot via @BotFather (/newbot)
2. Get your user ID via @userinfobot
3. Run hermes gateway setup, select Telegram, paste token and user ID
4. Start the gateway: hermes gateway
That's it. Now you can chat with your agent from your phone while it works on your server.
Voice memos work too — send a voice message on Telegram, Hermes auto-transcribes it with faster-whisper (runs locally, free) and responds to the text.
Group chat tip: Telegram bots have privacy mode enabled by default — the bot only sees /commands and direct replies. To let it see all messages in a group, either disable privacy mode in BotFather or promote the bot to admin.
MCP Integration: Extend with External Tools
Hermes supports the Model Context Protocol (MCP) — connect to any MCP server to add tools:
```yaml
# ~/.hermes/config.yaml
mcp:
  servers:
    - name: "github"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-github"]
      env:
        GITHUB_TOKEN: "your-token"
```
MCP tools show up alongside built-in tools. You can filter which MCP tools the agent can use to avoid tool overload.
Security: Seven Layers Deep
Hermes has a genuine defense-in-depth model, not just "we added an approval prompt":
- User authorization — allowlists control who can talk to the agent
- Dangerous command approval — human-in-the-loop for destructive operations (rm -rf, chmod 777, etc.)
- Container isolation — Docker/Singularity/Modal with hardened settings
- MCP credential filtering — env var isolation for MCP subprocesses
- Context file scanning — prompt injection detection in project files
- Cross-session isolation — sessions can't access each other's data
- Input sanitization — working directory parameters validated against allowlist
Approval modes:
```yaml
# ~/.hermes/config.yaml
approvals:
  mode: manual   # manual | smart | off
  timeout: 60    # seconds before auto-deny
```
- manual (default): Always asks before dangerous commands
- smart: Uses an auxiliary LLM to assess risk — auto-approves low-risk, auto-denies dangerous, escalates uncertain
- off / --yolo: Bypasses all checks. Use in CI/CD or disposable containers only.
The timeout is fail-closed: If you don't respond within 60 seconds, the command is denied. Not approved. This is the right default.
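Fail-closed approval is easy to get wrong if the timeout path defaults to "allow". A minimal Python sketch of the correct shape, using a queue as a stand-in for however Hermes actually receives approvals (all names here are illustrative):

```python
# Sketch of a fail-closed approval wait: an explicit True approves, anything
# else (including silence past the timeout) denies. Illustrative only.
import queue

def await_approval(responses: "queue.Queue[bool]", timeout: float = 60.0) -> bool:
    """Return True only on an explicit approval; a timeout means deny."""
    try:
        return responses.get(timeout=timeout)
    except queue.Empty:
        return False  # fail-closed: silence is a denial, never an approval

# No response arrives within the window -> denied
pending: "queue.Queue[bool]" = queue.Queue()
assert await_approval(pending, timeout=0.1) is False
```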
Subagents: Delegate and Parallelize
Hermes can spawn isolated subagents for parallel workstreams:
```text
❯ Research these three topics simultaneously:
  1. Latest Rust async runtime benchmarks
  2. PostgreSQL 17 new features
  3. Best practices for LLM caching in production
```
Each subagent gets its own session, tools, and context. Results come back to the parent. This is useful for tasks that are naturally parallel — research, batch processing, multi-repo operations.
You can also use execute_code to write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.
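To illustrate why this collapses context cost: one script can drive several tool calls and hand back a single result, instead of spending one agent turn per call. In this sketch, call_tool is a hypothetical stand-in for whatever RPC interface Hermes exposes, not a real API:

```python
# Hypothetical illustration of the execute_code pattern. `call_tool` is a
# stub standing in for an RPC back to the agent's tool registry.
def call_tool(name: str, **kwargs) -> str:
    # Stub: a real implementation would dispatch to the agent's tools
    return f"<result of {name}({kwargs})>"

def summarize_repos(queries: list[str]) -> str:
    """Fan out several searches, then collapse them into one string."""
    results = [call_tool("web_search", query=q) for q in queries]
    return "\n".join(results)

report = summarize_repos([
    "Rust async runtime benchmarks",
    "PostgreSQL 17 new features",
])
print(report)
```

The point is structural: the agent pays context for one tool call (running the script) rather than one per search.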
Real Drawbacks (The Honest Part)
Every review that only says good things is useless. Here's what actually hurts:
1. Memory Is Tiny and Requires Active Management
2,200 characters for agent memory. 1,375 for user profile. That's roughly 20 short entries total. For a personal assistant that's supposed to "grow with you," hitting the ceiling is frustratingly fast. You'll find the agent spending turns consolidating and replacing memory entries instead of doing actual work. The bounded approach is philosophically sound, but in practice it means the agent forgets things you wish it hadn't.
2. The Frozen Snapshot Creates a "Memory Lag"
Memory changes during a session only take effect next session. This means if you tell the agent "remember I switched to PostgreSQL 17," it writes it to disk — but if you ask about your database setup later in the same conversation, the system prompt still shows the old info. The agent can check live state via tool responses, but it doesn't always think to. This leads to confusing moments where the agent seems to have forgotten what you just told it.
3. Cron Prompts Must Be Fully Self-Contained
Every cron job runs in a blank session. No memory, no conversation history, no context from previous runs. This means your cron prompts need to spell out everything — what to do, how to do it, what output format to use, where to deliver. Writing good cron prompts is its own skill, and the first few attempts usually produce useless results because people underspecify.
4. 64K Context Minimum Locks Out Smaller Local Models
If you want to run fully local with a 7B or 13B model, you're likely out of luck unless you can afford the RAM for 64K context. This is a reasonable engineering decision (small context = broken agent loops), but it means Hermes isn't truly "runs on anything" — it runs on anything that can serve a 64K-context model.
5. Gateway Restart Drops Connections
If you need to restart the gateway (update, config change, crash recovery), all active messaging sessions disconnect. There's no graceful handoff. Users on Telegram/Discord just see the bot go silent, then come back. For personal use this is fine; for team deployments it's a rough edge.
Where Hermes Fits: 3 Quick Comparisons
These aren't full reviews — just positioning notes so you know when to pick what.
Hermes vs OpenClaw: Both are self-hosted personal AI agents with messaging gateway, cron, memory, and tool use. OpenClaw is Node.js-based with a focus on channel diversity and plugin architecture. Hermes is Python-based with a focus on the learning loop (skills, self-improvement) and research-readiness (trajectory export, RL training). If you want "growing agent intelligence," lean Hermes. If you want "stable message routing across 15 platforms with extensive plugin ecosystem," lean OpenClaw.
Hermes vs LangGraph: LangGraph is a framework for building agent workflows — you write the graph, define the nodes, handle the state. Hermes is a ready-to-use agent — install and chat. If you need custom multi-agent orchestration for a product, use LangGraph. If you need a personal agent that works out of the box, use Hermes.
Hermes vs CrewAI: CrewAI focuses on multi-agent role-playing ("researcher," "writer," "editor" agents collaborating). Hermes is a single agent with subagent delegation. CrewAI is better for predefined team workflows. Hermes is better for open-ended personal assistance where the task isn't known in advance.
Quick Reference Cheat Sheet
Essential Commands
```bash
hermes                      # Start chatting
hermes model                # Switch LLM provider
hermes tools                # Enable/disable toolsets
hermes gateway setup        # Configure messaging platforms
hermes gateway              # Start the messaging gateway
hermes cron list            # List scheduled jobs
hermes config set KEY VAL   # Set a config value
hermes doctor               # Diagnose issues
hermes update               # Update to latest
hermes --continue           # Resume last session
hermes --yolo               # Bypass command approval (careful!)
```
Recommended First-Time Config
```yaml
# ~/.hermes/config.yaml

# Use Docker for safety
terminal:
  backend: docker
  docker_image: python:3.11-slim
  container_persistent: true

# Keep approval prompts on
approvals:
  mode: manual
  timeout: 60
```
Common Gotchas
| Problem | Cause | Fix |
|---|---|---|
| Agent ignores memory you just added | Frozen snapshot — memory only loads at session start | Start a new session (hermes) |
| Cron job produces garbage output | Prompt isn't self-contained | Spell out everything in the cron prompt |
| Bot doesn't see group messages | Telegram privacy mode | Disable in BotFather, then re-add bot to group |
| Model rejected at startup | Context window < 64K | Use a larger model or increase --ctx-size |
| hermes: command not found after install | Shell not reloaded | Run source ~/.bashrc |
Bottom Line
Hermes Agent is the most complete open-source personal AI agent available in April 2026. The learning loop (memory + skills + user modeling) is genuinely novel — most competing agents don't even attempt cross-session improvement. The 6 terminal backends give you real deployment flexibility. The 647-skill ecosystem means you're not starting from zero.
The tradeoffs are real: tiny memory limits, frozen snapshot lag, cron prompt overhead, and a 64K context floor. But these are engineering choices, not bugs — they keep the system bounded and predictable.
If you want an AI agent that lives on your server, talks to you from Telegram, runs scheduled tasks, and actually gets better over time — Hermes is the one to try. Install takes 60 seconds. You'll know within an hour whether it fits your workflow.
Links:
- GitHub: github.com/NousResearch/hermes-agent
- Docs: hermes-agent.nousresearch.com/docs
- Discord: discord.gg/NousResearch
- Skills Hub: agentskills.io
- License: MIT