The engine room.
"How an agent actually runs."
You don't pick a mode at install time. You pick it at runtime with one environment variable: AGENT_MODE.
"Submit a task, get a PR."
Submit work via REST POST /jobs, MCP run_task, or CLI agenticore run. AgentiCore creates a job in Redis, launches a background runner, clones the repo, carves out a bespoke worktree, materialises the profile, spawns claude -p "<task>" in the worktree, then auto-PRs the result. KEDA autoscales the pod fleet from one to a thousand based on queue depth. OTEL traces flow to Langfuse.
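From the client side the whole loop is one POST and a poll. A sketch with httpx, assuming a local instance on port 8200; the payload fields and the status endpoint are illustrative, not a published schema:

```python
# Sketch only: POST /jobs and port 8200 are as described above;
# the JSON field names and the GET endpoint are assumptions.
import httpx

resp = httpx.post(
    "http://localhost:8200/jobs",
    json={
        "repo": "https://github.com/acme/widgets",  # hypothetical target repo
        "task": "Fix the flaky integration test",   # becomes claude -p "<task>"
        "auto_pr": True,                            # open a PR on success
    },
)
job_id = resp.json()["id"]  # assumed field: the id comes back immediately
print(httpx.get(f"http://localhost:8200/jobs/{job_id}").json())  # poll status / PR URL
```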
"Sit down and talk to a specialist."
Set AGENT_MODE=true, point AGENTIHUB_AGENT at an agent identity, and the same binary loads the package from AgentiHub at startup and exposes POST /v1/chat/completions. OpenAI-SDK compatible — LibreChat, OpenWebUI, LiteLLM, raw curl. Streaming SSE deltas: thinking_delta, tool_use, tool_result, assistant text, all token-by-token.
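Concretely, any OpenAI SDK client can talk to it; a sketch where the base URL, API key, and model name are placeholders rather than documented values:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8200/v1", api_key="unused")  # placeholder URL
reply = client.chat.completions.create(
    model="agenticore",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarise the open incidents"}],
)
print(reply.choices[0].message.content)
```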
Submit: any of three surfaces hits AgentiCore directly on port 8200 — REST POST /jobs, MCP run_task, or CLI agenticore run. submit_job() creates a Redis+file job record, returns the job ID immediately, and launches the runner as a background asyncio task.
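In outline, submit_job() does something like the following (a sketch under the description above; the names and record fields are illustrative, not the actual source):

```python
import asyncio, json, uuid
import redis.asyncio as aioredis

async def run_job(record: dict) -> None:
    ...  # clone/fetch, worktree, claude -p, PR (see Execute and Ship below)

async def submit_job(r: aioredis.Redis, task: str, repo: str) -> str:
    """Create the job record, return the id immediately, run the rest in background."""
    job_id = uuid.uuid4().hex
    record = {"id": job_id, "task": task, "repo": repo, "status": "queued"}
    await r.set(f"job:{job_id}", json.dumps(record))  # Redis half of the Redis+file record
    asyncio.create_task(run_job(record))              # background asyncio runner
    return job_id
```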
Execute: the runner clones or fetches the repo (flock or Redis distributed lock), creates a bespoke worktree (agenticore-{job_id[:8]}), materialises the profile (which AgentiHooks installed at container boot), injects MCP configs into cwd/.mcp.json, spawns claude -p with profile flags as CLI args.
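A sketch of that sequence (locking elided; the worktree naming and flag pass-through follow the description above, the rest is illustrative):

```python
import json, subprocess
from pathlib import Path

def execute(job_id: str, repo_dir: Path, task: str, mcp_servers: dict, flags: list[str]) -> None:
    # Bespoke worktree named after the job id prefix, as above.
    wt = repo_dir.parent / f"agenticore-{job_id[:8]}"
    subprocess.run(["git", "-C", str(repo_dir), "worktree", "add", str(wt)], check=True)
    # Inject MCP configs into the worktree's cwd/.mcp.json.
    (wt / ".mcp.json").write_text(json.dumps({"mcpServers": mcp_servers}))
    # Spawn the Claude Code CLI headlessly, profile flags as CLI args.
    subprocess.run(["claude", "-p", task, *flags], cwd=wt, check=True)
```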
Ship: on success with auto_pr: true, AgentiCore stages changes, pushes the branch, runs gh pr create, stores the PR URL on the job record. State transitions: queued → running → succeeded / failed / cancelled.
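The ship step is plain git plus the gh CLI; a sketch in which the branch naming and commit message are assumptions:

```python
import subprocess

def ship(worktree: str, branch: str, title: str) -> str:
    subprocess.run(["git", "add", "-A"], cwd=worktree, check=True)
    subprocess.run(["git", "commit", "-m", title], cwd=worktree, check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], cwd=worktree, check=True)
    out = subprocess.run(
        ["gh", "pr", "create", "--title", title, "--body", ""],
        cwd=worktree, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()  # gh prints the PR URL; stored on the job record
```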
Scale: KEDA watches the Redis queue depth and scales pods 1 → 1000 to drain the work. Each worker steals from the same queue. OTEL traces ship to Langfuse and Postgres.
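Work-stealing is just a blocking pop against the shared list; a sketch with a hypothetical queue key:

```python
import json, redis

r = redis.Redis()

def worker_loop() -> None:
    while True:
        # Every replica blocks on the same list, so pods KEDA adds
        # start stealing jobs the moment they come up.
        _, raw = r.blpop("agenticore:jobs")  # hypothetical queue key
        run_pipeline(json.loads(raw))

def run_pipeline(job: dict) -> None:
    ...  # clone, worktree, claude -p, PR, as above
```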
Identity comes from AgentiHub. Set AGENTIHUB_URL + AGENTIHUB_AGENT; the initialiser clones the hub at startup and copies the requested agent's package/ into /app/package/. The agent boots with that identity loaded.
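The initialiser is small; a sketch assuming the hub is a git repo with per-agent directories (the layout is a guess):

```python
import os, shutil, subprocess

def load_identity() -> None:
    hub = os.environ["AGENTIHUB_URL"]
    agent = os.environ["AGENTIHUB_AGENT"]
    subprocess.run(["git", "clone", "--depth", "1", hub, "/tmp/agentihub"], check=True)
    # Copy the requested agent's package/ into /app/package/, as described above.
    shutil.copytree(f"/tmp/agentihub/{agent}/package", "/app/package", dirs_exist_ok=True)
```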
API surface. POST /v1/chat/completions — full OpenAI compatibility, streaming and non-streaming. Drop-in for LibreChat, OpenWebUI, LiteLLM model routing, any OpenAI SDK client, raw curl -N.
Live SSE deltas. Each chunk is a real chat.completion.chunk JSON. You see thinking_delta token-by-token (when extended thinking is on), then tool_use and tool_result blocks as the agent works, then the assistant text. Full transcript also written to disk for audit.
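Consuming the stream is ordinary SDK code, since every chunk is a real chat.completion.chunk; the base URL and model below are placeholders, and the thinking and tool deltas ride alongside the standard content field:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8200/v1", api_key="unused")  # placeholder URL
stream = client.chat.completions.create(
    model="agenticore",  # hypothetical model name
    messages=[{"role": "user", "content": "Audit the deploy scripts"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # assistant text, token-by-token
        print(delta.content, end="", flush=True)
```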
Sticky toggles. Slash tokens (/show-tools, /hide-tools, /show-thinking) control per-agent stream visibility, persisted in Redis. Configure once; every conversation respects it.
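Persistence is one Redis hash per agent, read back when each stream is assembled; the key and field names here are hypothetical:

```python
import redis

r = redis.Redis(decode_responses=True)

def apply_toggle(agent_id: str, token: str) -> None:
    # "/show-tools" -> field "tools", value "show" (illustrative encoding).
    action, _, field = token.lstrip("/").partition("-")
    r.hset(f"agent:{agent_id}:stream", field, action)

def stream_prefs(agent_id: str) -> dict:
    return r.hgetall(f"agent:{agent_id}:stream")  # e.g. {"tools": "hide"}
```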
Two AgentiCore-internal connectors extend the surface beyond MCP and REST. Both are built on the canonical ProgressSink ABC, so a Slack, Teams, or Discord drop-in is a few hundred lines away.
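The exact ABC isn't reproduced here, but a plausible shape (the method names are guesses) shows why a new surface stays small:

```python
from abc import ABC, abstractmethod

class ProgressSink(ABC):
    """One sink per chat surface; the methods below are illustrative, not the real API."""

    @abstractmethod
    async def tool_started(self, name: str) -> None: ...  # render a "▶" chip

    @abstractmethod
    async def tool_finished(self, name: str, ok: bool, secs: float) -> None: ...  # "✓" / "✗"

    @abstractmethod
    async def final_answer(self, text: str) -> None: ...  # replace the progress message
```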
Native, in-process, owner-locked.
Set TELEGRAM_BOT_TOKEN + TELEGRAM_OWNER_ID and any specialised agent becomes a Telegram chat. Built on aiogram v3 with an async polling loop inside the AgentiCore process — no separate webhook server.
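In aiogram v3 terms, the owner lock and the polling loop fit in a screenful; the handler body below is a sketch, not the shipped code:

```python
import asyncio, os
from aiogram import Bot, Dispatcher
from aiogram.types import Message

bot = Bot(os.environ["TELEGRAM_BOT_TOKEN"])
dp = Dispatcher()
OWNER = int(os.environ["TELEGRAM_OWNER_ID"])

@dp.message()
async def handle(message: Message) -> None:
    if message.from_user is None or message.from_user.id != OWNER:
        return  # owner-locked: everyone else is ignored
    await message.answer(await run_agent(message.text or ""))

async def run_agent(text: str) -> str:
    return f"(agent reply to: {text})"  # placeholder for the real agent call

asyncio.run(dp.start_polling(bot))
```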
What you see during a turn: a transient progress message with one chip per tool invocation (▶ Bash ···, ✓ Bash ··· (1.2s), ✗ Bash ···). When the agent finishes, the progress message is deleted and the final answer arrives as a clean persistent message. Slash commands: /start, /clear, /status, /voice [on|off].
STT / TTS, vendor-agnostic, one env var.
Set VOICE_SERVICE_URL. The adapter speaks a tiny HTTP protocol (POST /stt, POST /speak) so you can plug in ElevenLabs, Deepgram, Whisper, Cartesia, Anton's anton-voice, or your own service. Stateless singleton, 120 s timeout for long TTS.
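Both calls fit in one stateless adapter; the payload shapes below are assumptions about the protocol, not a published spec:

```python
import os, httpx

VOICE = os.environ["VOICE_SERVICE_URL"]
client = httpx.Client(timeout=120.0)  # generous budget for long TTS

def transcribe(audio: bytes) -> str:
    r = client.post(f"{VOICE}/stt", content=audio)  # assumed: raw audio in, text out
    return r.json()["text"]

def speak(text: str) -> bytes:
    r = client.post(f"{VOICE}/speak", json={"text": text})  # assumed: audio bytes back
    return r.content
```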
Voice notes flow: Telegram voice note → downloaded via Bot API → adapter.transcribe() → AgentExecutor → reply in current output mode. Per-conversation toggle (enable voice / disable voice). When voice mode is on, the chat goes silent during the turn (no chip stream) and TTS handles the final answer with a record_voice chat-action.
AgentiCore auto-discovers enabled features at runtime and prepends a capabilities block to the system prompt — so the LLM knows it has voice, Telegram, A2A, GitHub, and observability without manual prompt maintenance.
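Discovery is just env-var presence checks folded into the system prompt; the wording of the block is invented here:

```python
import os

FEATURES = {  # mirrors the table below
    "voice": "VOICE_SERVICE_URL",
    "telegram": "TELEGRAM_BOT_TOKEN",
    "agentibridge": "AGENTIBRIDGE_URL",
}

def capabilities_block() -> str:
    enabled = [name for name, var in FEATURES.items() if os.environ.get(var)]
    return "Enabled capabilities: " + ", ".join(enabled or ["none"])

system_prompt = capabilities_block() + "\n\n...rest of the persona prompt..."
```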
voice           Enabled by VOICE_SERVICE_URL
telegram        Enabled by TELEGRAM_BOT_TOKEN
agentibridge    Enabled by AGENTIBRIDGE_URL — A2A
litellm_mcp     External MCP gateway
agent_mode      AGENT_MODE=true — long-lived persona
redis           Session persistence + job queue
github          Repository interaction
observability   OpenTelemetry tracing
langfuse        LLM observability + evals

Open source. MIT. Container-native. Cloud-agnostic.
Read the AgentiCore docs
pip install agenticore