We run Scout, a computer vision platform that analyzes security camera feeds every 30 seconds. Every morning at 6am, an Oban job fires for each hub — it gives Claude Haiku 8 database tools, a question about what happened in the last 24 hours, and lets it query freely until it has enough data to write a daily insight. The LLM typically takes 5–7 rounds — discovering available data, querying observations, checking alert history, pulling specific analyses — then produces a structured summary.
When we first started building this, the instinct was to reach for a framework. LangChain has an Elixir port. There are agent SDKs. But the more we looked at what we actually needed — a loop that calls an LLM, executes tools, and recurses — the less a framework made sense. We’d be importing a dependency to do something Elixir is already good at: recursion with pattern matching and process isolation.
The whole thing runs on Elixir with no agent framework. The tool loop is ~90 lines. The authorization model is a closure. The circuit breaker is :fuse. If you’re already running Elixir in production, you have everything you need.
The authorization problem
The first thing we had to figure out was multi-tenancy. The LLM chooses which tools to call and what arguments to pass — but it shouldn’t be able to access data outside its tenant boundary. This is the core tension with agentic patterns: you’re giving an LLM autonomy over what to query, so you need ironclad control over where it can query.
Most frameworks solve this with middleware, intercepting tool calls at runtime and checking permissions before execution. The trouble with that approach is that it's easy to forget. Miss a middleware, and you've got a tenant isolation bug that's invisible until someone notices cross-org data in their insights.
We do it at construction time instead — and this is where Elixir’s functional nature pays off. Each tool captures organization_id in a closure when it’s built:
```elixir
defp list_observation_outputs_tool(organization_id) do
  ReqLLM.Tool.new!(%{
    name: "list_observation_outputs",
    description:
      "List all observation outputs available for analysis. Call this first to discover what data you can query.",
    parameter_schema: %{
      "type" => "object",
      "properties" => %{},
      "required" => []
    },
    callback: fn args ->
      ToolCallbacks.list_observation_outputs(args, organization_id)
    end
  })
end
```
The key line is the callback. The LLM passes args (here, none); the closure supplies the tenant context. There's no way for the model to influence which org it queries: by the time the tool is called, the security boundary is already sealed. No middleware to bypass, no runtime check to forget.
All 8 tools follow this pattern. all_tools/2 builds the full list, and each tool captures organization_id (and optionally hub_id) in its closure. This is the same structural isolation our Scope pattern (Scout.Scope) gives context functions, applied at the level of individual tool functions.
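To make the construction-time boundary concrete, here is a dependency-free sketch. It replaces ReqLLM.Tool with plain maps, and the `ScopedTools` module and its in-memory `@fake_db` are hypothetical stand-ins for the real tool definitions and queries. What it shows is the property described above: the callback's organization_id is fixed when the tool is built, so nothing in the model's arguments can change it.

```elixir
defmodule ScopedTools do
  # Hypothetical module; plain maps stand in for ReqLLM.Tool structs
  # and @fake_db stands in for organization-scoped database queries.
  @fake_db %{
    "org_a" => [%{id: 1, name: "person_count"}],
    "org_b" => [%{id: 2, name: "vehicle_count"}]
  }

  # Builds every tool for one tenant. Each callback closes over
  # organization_id, so the value is sealed before the LLM sees the tool.
  def all_tools(organization_id, _hub_id \\ nil) do
    [
      %{
        name: "list_observation_outputs",
        callback: fn _args -> list_observation_outputs(organization_id) end
      }
    ]
  end

  # The model's args never reach this function, so an "organization_id"
  # key in them has no effect on which tenant is queried.
  defp list_observation_outputs(organization_id) do
    {:ok, Map.get(@fake_db, organization_id, [])}
  end
end

# The model tries to smuggle in another tenant; the closure wins.
[tool] = ScopedTools.all_tools("org_a")
{:ok, [%{name: "person_count"}]} = tool.callback.(%{"organization_id" => "org_b"})
```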
The tool loop
The core of the agent is a recursive function with two clauses. The guard clause enforces a hard round limit — this was important to us early on, as we had a few cases where the LLM got stuck in a loop requesting the same data over and over:
```elixir
defp execute_tool_loop(_context, _tools, _req_opts, round, max_rounds, all_tool_calls)
     when round >= max_rounds do
  {:ok,
   %{
     content: "Maximum tool execution rounds reached. Please try a simpler question.",
     tool_calls: all_tool_calls,
     rounds: round,
     usage: %{}
   }}
end
```
The main clause does one iteration of the loop:
```elixir
defp execute_tool_loop(context, tools, req_opts, round, max_rounds, all_tool_calls) do
  opts = Keyword.merge(req_opts, tools: tools)

  case ReqLLM.generate_text("anthropic:claude-haiku-4-5", context, opts) do
    {:ok, response} ->
      case ReqLLM.Response.tool_calls(response) do
        # A non-empty list of tool calls: run them and go round again
        [_ | _] = tool_calls ->
          # Build tool call history (arguments arrive as JSON strings)
          tool_call_history =
            Enum.map(tool_calls, fn tc ->
              %{id: tc.id, name: tc.function.name, arguments: Jason.decode!(tc.function.arguments)}
            end)

          # Append assistant message with tool calls to context
          assistant_msg = ReqLLM.Context.assistant("", tool_calls: tool_calls)
          context = ReqLLM.Context.append(context, assistant_msg)

          # Execute tools and append results
          context = ReqLLM.Context.execute_and_append_tools(context, tools, tool_calls)

          # Recurse
          execute_tool_loop(
            context, tools, req_opts,
            round + 1, max_rounds,
            all_tool_calls ++ tool_call_history
          )

        _ ->
          # No tool calls: the LLM is done, extract the final text.
          # Note: response.usage covers only this final request, not earlier rounds.
          content = ReqLLM.Response.text(response)
          {:ok, %{content: content, tool_calls: all_tool_calls, rounds: round, usage: response.usage}}
      end

    {:error, reason} ->
      {:error, reason}
  end
end
```
Send messages to the LLM. If it responds with tool calls, execute them, append the results to the conversation, and recurse. If it responds with text, you’re done. That’s the entire agent loop.
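The two-clause structure can be exercised without an API key. In this sketch an injected `llm` function replaces ReqLLM.generate_text, messages are plain maps, and `MiniLoop`, the message shapes, and the scripted responses are all hypothetical. The guard clause and the recurse-or-return decision mirror the production loop.

```elixir
defmodule MiniLoop do
  # Hypothetical offline version of the tool loop.
  # llm.(messages) must return {:ok, {:tool_calls, [call, ...]}},
  # {:ok, {:text, binary}}, or {:error, term}.

  # Guard clause: hard stop once the round budget is spent.
  def run(_messages, _llm, _tools, round, max_rounds, calls) when round >= max_rounds do
    {:ok, %{content: "Maximum tool execution rounds reached.", tool_calls: calls, rounds: round}}
  end

  # Main clause: one LLM call per round.
  def run(messages, llm, tools, round, max_rounds, calls) do
    case llm.(messages) do
      {:ok, {:tool_calls, [_ | _] = tool_calls}} ->
        # Run each requested tool (looked up by name) and record its result.
        results =
          Enum.map(tool_calls, fn %{name: name, args: args} ->
            %{role: :tool, name: name, content: Map.fetch!(tools, name).(args)}
          end)

        messages = messages ++ [%{role: :assistant, tool_calls: tool_calls}] ++ results
        run(messages, llm, tools, round + 1, max_rounds, calls ++ tool_calls)

      {:ok, {:text, text}} ->
        {:ok, %{content: text, tool_calls: calls, rounds: round}}

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Scripted model: ask for a tool once, then answer once a tool result exists.
llm = fn messages ->
  if Enum.any?(messages, &(&1.role == :tool)) do
    {:ok, {:text, "2 outputs found"}}
  else
    {:ok, {:tool_calls, [%{name: "list_outputs", args: %{}}]}}
  end
end

tools = %{"list_outputs" => fn _args -> [:person_count, :vehicle_count] end}
{:ok, %{content: "2 outputs found", rounds: 1}} =
  MiniLoop.run([%{role: :user, content: "What data exists?"}], llm, tools, 0, 10, [])
```

A model that never stops requesting tools hits the guard clause instead: with max_rounds set to 3, the call returns `rounds: 3` and three accumulated tool calls.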
ReqLLM.Context.execute_and_append_tools does the work of matching each tool call to its callback by name, executing the closure, and appending the result as a tool response message. One line in our code, but it’s the bridge between “the LLM asked for data” and “the data is in the conversation.”
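For intuition about the bridge that execute_and_append_tools provides, here is a hypothetical, dependency-free equivalent: look each call up by name, run its callback, and wrap the result in a tool message. This is not ReqLLM's implementation; the `ToolDispatch` module, the message shape, and the unknown-tool behaviour are our assumptions. Reporting an unknown tool as data, rather than raising, keeps the loop alive in the same way as the errors-as-data approach.

```elixir
defmodule ToolDispatch do
  # Hypothetical sketch, not ReqLLM's code.
  # tools is a map of tool name => one-argument callback.
  # Returns one :tool message per call, tagged with the call's id.
  def execute_all(tool_calls, tools) do
    Enum.map(tool_calls, fn %{id: id, name: name, arguments: args} ->
      result =
        case Map.fetch(tools, name) do
          {:ok, callback} -> callback.(args)
          # Unknown tool: tell the model, as data, instead of crashing the loop.
          :error -> {:ok, %{error: "Unknown tool: #{name}"}}
        end

      %{role: :tool, tool_call_id: id, content: result}
    end)
  end
end
```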
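The public entry point sets the defaults and starts execute_tool_loop/6 at round zero (the production version also wraps it with timing and logging, omitted here):

```elixir
def chat_with_tools(api_key, messages, tools, opts \\ []) do
  # Pull our own option out so it isn't forwarded to ReqLLM.generate_text
  {max_rounds, req_opts} = Keyword.pop(opts, :max_tool_rounds, 10)
  # Per-request key, so the api_key argument is actually used
  req_opts = Keyword.put(req_opts, :api_key, api_key)
  context = ReqLLM.Context.normalize!(messages, [])
  execute_tool_loop(context, tools, req_opts, 0, max_rounds, [])
end
```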
Default max rounds is 10: enough for simple queries, conservative enough to bound runaway conversations. We landed on this after watching the telemetry for a few weeks; most insights complete in 5–7 rounds, and anything that hasn't converged by 10 is usually stuck. Callers can raise the limit through the :max_tool_rounds option; Phase 1 of the daily insight job uses 25.
The tools themselves
All 8 tools are read-only. The LLM has full analytical access to scoped data but can’t modify anything — least privilege by design. We considered giving it write access (e.g. to flag observations or create notes) but decided to keep the first iteration simple and see how far read-only gets us.
Discovery tools run first. list_observation_outputs takes no parameters — it queries all observation outputs for the org and returns a list of {id, name, description}. The LLM calls this to learn what data exists before querying it. list_cameras and list_alert_monitors serve a similar purpose for cameras and alert rules.
Query tools do the heavy lifting. query_observations is the most complex — it takes observation output IDs, a time range, and optional camera filters. The callback automatically compresses results based on the time span:
- ≤24 hours: run-length encoding (consecutive identical observations collapse into {start_time, end_time, value, duration_minutes} events, limited to the 30 most significant per output)
- 1–7 days: hourly buckets with value distributions
- 7–30 days: daily buckets
- >30 days: weekly buckets
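The span thresholds map onto a multi-clause function with guards. A sketch, assuming the callback measures the requested range in seconds with DateTime.diff/3; the `Compression` module and the strategy atoms are hypothetical names.

```elixir
defmodule Compression do
  # Hypothetical helper: picks a compression strategy from the
  # width of the requested time range.
  def strategy(%DateTime{} = from, %DateTime{} = to) do
    to
    |> DateTime.diff(from, :second)
    |> Kernel./(3600)
    |> for_hours()
  end

  # Span in hours (a float) => strategy; boundaries follow the list above.
  defp for_hours(h) when h <= 24, do: :run_length
  defp for_hours(h) when h <= 7 * 24, do: :hourly_buckets
  defp for_hours(h) when h <= 30 * 24, do: :daily_buckets
  defp for_hours(_h), do: :weekly_buckets
end
```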
This adaptive compression is something we arrived at after running into token cost issues. A 24-hour query for one output at 30-second intervals produces 2,880 data points. Sending all of them as JSON wouldn’t hit the context limit (Haiku has 200k tokens), but the token cost adds up across multiple tool calls and the signal-to-noise ratio tanks. Run-length encoding collapses “person_count was 0 from 2am to 6am” into a single event — the LLM gets the same insight in a fraction of the tokens.
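Run-length encoding itself is a single pass with Enum.chunk_by/2. This sketch assumes samples arrive as time-sorted `{%DateTime{}, value}` tuples and reads "the 30 most significant" as "the 30 longest-running"; both are assumptions, and the `RLE` module is hypothetical. Durations are measured from the first to the last sample of each run.

```elixir
defmodule RLE do
  # Hypothetical encoder: collapses consecutive identical values into
  # %{start_time, end_time, value, duration_minutes} events.
  # Expects samples as [{%DateTime{}, value}] sorted by time.
  def encode(samples, limit \\ 30) do
    samples
    |> Enum.chunk_by(fn {_ts, value} -> value end)
    |> Enum.map(&to_event/1)
    # Assumption: "most significant" means longest-running.
    |> Enum.sort_by(& &1.duration_minutes, :desc)
    |> Enum.take(limit)
    # Restore chronological order for the LLM.
    |> Enum.sort_by(& &1.start_time, DateTime)
  end

  defp to_event([{start_ts, value} | _] = run) do
    {end_ts, _value} = List.last(run)

    %{
      start_time: start_ts,
      end_time: end_ts,
      value: value,
      duration_minutes: div(DateTime.diff(end_ts, start_ts, :second), 60)
    }
  end
end
```

Under those assumptions, a full day of 2,880 identical 30-second samples comes back as a single event.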
Detail tools like get_analysis_summary pull a single analysis with its full context — camera name, timestamp, natural language summary, all observation values. The LLM typically calls this after spotting something interesting in the aggregate data.
Every tool callback returns results via Toon.encode!/1, a token-optimized serialization format we built that's 30–60% more compact than JSON. Because each tool result stays in the conversation and is re-sent with every later round, the savings compound across a 5–7 round insight.
Errors as data
This one took us a while to get right. Our first instinct was to have tool callbacks raise exceptions or return {:error, ...} tuples on failure, the idiomatic Elixir approach. The trouble is that this kills the loop. If a tool raises, you need error-handling logic to decide whether to retry, skip, or abort. That's a lot of branching for something the LLM can handle itself.
Instead, tool callbacks never raise. When something goes wrong, the error is returned as a normal result:
```elixir
# Invalid datetime format
{:ok, %{error: "Invalid start_time format. Please provide ISO 8601 datetime (e.g., 2024-01-15T10:00:00Z)"}}

# Not found or unauthorized
{:ok, %{error: "Analysis not found or access denied"}}
```
Note these return {:ok, ...} — the tool “succeeded” in returning a result, but the result contains an error message. The tool execution layer feeds this back to the LLM as a normal tool response, and the model self-corrects. It sees “Invalid start_time format” and retries with the right format. It sees “not found” and tries a different analysis ID.
This keeps the loop alive. By returning errors as data, the LLM handles the recovery — which it’s surprisingly good at. We were skeptical initially, but in practice Haiku almost always self-corrects within one retry.
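A callback written this way validates its inputs with `with` and converts every failure into an `{:ok, %{error: ...}}` result. A sketch, assuming string-keyed JSON arguments; `ObservationCallbacks` and `fetch_window/3` are hypothetical, and DateTime.from_iso8601/1 is the only library call.

```elixir
defmodule ObservationCallbacks do
  # Hypothetical tool callback: never raises; every failure comes back
  # as {:ok, %{error: message}} so the LLM can read it and retry.
  def query_observations(args, organization_id) do
    with {:ok, start_time} <- parse_time(args["start_time"], "start_time"),
         {:ok, end_time} <- parse_time(args["end_time"], "end_time") do
      {:ok, fetch_window(organization_id, start_time, end_time)}
    else
      {:tool_error, message} -> {:ok, %{error: message}}
    end
  end

  defp parse_time(value, field) when is_binary(value) do
    case DateTime.from_iso8601(value) do
      {:ok, datetime, _offset} -> {:ok, datetime}
      {:error, _reason} -> invalid(field)
    end
  end

  # Missing or non-string argument.
  defp parse_time(_value, field), do: invalid(field)

  defp invalid(field) do
    {:tool_error,
     "Invalid #{field} format. Please provide ISO 8601 datetime (e.g., 2024-01-15T10:00:00Z)"}
  end

  # Hypothetical stand-in for the real, organization-scoped query.
  defp fetch_window(organization_id, start_time, end_time) do
    %{organization_id: organization_id, from: start_time, to: end_time, events: []}
  end
end
```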
Why two phases instead of one
This was a lesson we learned the hard way. Our first version tried to do everything in one pass — give the LLM tools and a JSON schema for the output. The idea was: gather data and produce structured output in a single conversation.
The problem is that structured output and tool use don’t mix well. When the LLM is constrained to produce JSON, it tends to stop exploring and commit to an answer too early. We’d see it make 2–3 tool calls and then produce a shallow insight, when the data warranted deeper investigation.
So we split it into two phases:
Phase 1: Data gathering. Claude Haiku gets the 8 tools and a broad question: “Analyze the last 24 hours of operations for this hub.” It queries freely over multiple rounds (up to 25), gathering observations, alert history, and specific analyses. The output is unstructured text — a narrative summary of what happened.
Phase 2: Structured output. The Phase 1 summary plus the full tool call history are sent to Claude Haiku again, this time with a JSON schema via generate_object. No tools — just “given this data, produce a structured insight with these fields.” The schema enforces the output format: title, summary, key findings, recommendations.
The difference in quality was immediately noticeable. Phase 1 can use a generous round limit (25) without worrying about output format, while Phase 2 is a single fast call that just shapes what’s already been gathered.
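The two phases compose in a `with` chain, which also gives a natural place to fail when Phase 1 exhausts its rounds. In this sketch both phases are injected functions so it runs offline; `TwoPhase`, the `gather`/`structure` contracts, the field check (a belt-and-braces guard, since the schema already enforces the format), and the `:max_rounds_reached` error are hypothetical. In production the first function would wrap chat_with_tools/4 and the second a generate_object call.

```elixir
defmodule TwoPhase do
  # Hypothetical orchestration of the two phases.
  @required ~w(title summary key_findings recommendations)a

  # gather.(question, max_rounds) -> {:ok, %{content:, tool_calls:, rounds:}}  (Phase 1, tools on)
  # structure.(%{summary:, tool_calls:}) -> {:ok, map} | {:error, term}        (Phase 2, schema, no tools)
  def run(question, gather, structure, max_rounds \\ 25) do
    # The loop's guard clause reports rounds == max_rounds when it gives up.
    with {:ok, %{rounds: rounds} = phase1} when rounds < max_rounds <- gather.(question, max_rounds),
         {:ok, insight} <- structure.(%{summary: phase1.content, tool_calls: phase1.tool_calls}),
         [] <- Enum.reject(@required, &Map.has_key?(insight, &1)) do
      {:ok, insight}
    else
      {:ok, %{rounds: _}} -> {:error, :max_rounds_reached}
      missing when is_list(missing) -> {:error, {:missing_fields, missing}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```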
Both phases use Claude Haiku — fast enough for 25 rounds to complete well within the job’s 5-minute timeout, cheap enough that a full insight costs fractions of a cent. The model choice matters when this runs for every hub every morning.
Circuit breaking
LLM APIs fail. When they do, you don’t want every insight job hammering a dead endpoint. We learned this during an early Gemini outage — 12 concurrent workers all retrying against a down API, each with exponential backoff, but collectively still putting meaningful load on a service that was already struggling.
We use :fuse for circuit breaking on the high-volume path:
```elixir
# Application startup
:fuse.install(:gemini_api_circuit, {{:standard, 5, 60_000}, {:reset, 30_000}})
```
With these options the fuse tolerates five failures within a 60-second window and blows on the next one; once blown, it resets after 30 seconds. Before every Gemini call (image analysis, embeddings), the code checks the fuse:
```elixir
case :fuse.ask(:gemini_api_circuit, :sync) do
  :blown -> {:error, :circuit_open}
  _ -> do_generate_embedding(api_key, text, opts)
end
```
On failure, :fuse.melt(:gemini_api_circuit) increments the failure counter.
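To show the state machine :fuse manages for us, here is a deliberately simplified, pure-functional stand-in. It is not :fuse and does not reproduce its exact semantics or its process-based storage; the `Breaker` module, its fields, and `call/3` are hypothetical, and timestamps (in milliseconds) are passed in explicitly so the logic is easy to test. `call/3` follows the same ask-then-melt pattern as the snippet above.

```elixir
defmodule Breaker do
  # Hypothetical, simplified circuit breaker; not the :fuse library.
  defstruct max_melts: 5, window_ms: 60_000, reset_ms: 30_000, melts: [], blown_at: nil

  # Returns {:ok | :blown, breaker}; a blown breaker heals after reset_ms.
  def ask(%__MODULE__{blown_at: nil} = b, _now), do: {:ok, b}

  def ask(%__MODULE__{blown_at: at, reset_ms: reset} = b, now) when now - at >= reset,
    do: {:ok, %{b | blown_at: nil, melts: []}}

  def ask(%__MODULE__{} = b, _now), do: {:blown, b}

  # Records one failure; blows once more than max_melts fall inside the window.
  def melt(%__MODULE__{} = b, now) do
    melts = [now | Enum.filter(b.melts, &(now - &1 < b.window_ms))]
    b = %{b | melts: melts}
    if length(melts) > b.max_melts, do: %{b | blown_at: now}, else: b
  end

  # Check first, short-circuit when blown, melt on {:error, _}.
  def call(%__MODULE__{} = b, now, fun) do
    case ask(b, now) do
      {:blown, b} ->
        {{:error, :circuit_open}, b}

      {:ok, b} ->
        case fun.() do
          {:ok, _} = ok -> {ok, b}
          {:error, _} = err -> {err, melt(b, now)}
        end
    end
  end
end
```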
One honest gap: the fuse currently protects Gemini calls (image analysis every 30 seconds across all cameras) but not the Anthropic calls in the tool loop. The Gemini path is where cascade failures actually happen — high volume, concurrent workers, shared connection pool. The insight jobs are low volume (one per hub per day), so the blast radius of an Anthropic outage is small. But adding an :anthropic_api_circuit fuse would be the right thing to do — it’s on our list.
Putting it together
The entry point is an Oban job. A coordinator runs at 6am daily, iterates over all organizations and their hubs, and enqueues one DailyInsightJob per hub:
```elixir
Enum.flat_map(organizations, fn org ->
  scope = Scout.Scope.for_organization(org)
  hubs = Scout.Hubs.list_hubs(scope)

  Enum.map(hubs, fn hub ->
    %{organization_id: org.id, hub_id: hub.id}
    |> Scout.Jobs.DailyInsightJob.new()
    |> Oban.insert()
  end)
end)
```
Each job has a 5-minute timeout (def timeout(_job), do: :timer.minutes(5)) and uses exponential backoff on retries. If Phase 1 hits max rounds without producing a usable result, the job fails — if you couldn’t gather enough data in 25 rounds, the insight isn’t worth generating. Unique constraint violations are treated as success — if the job retries and the insight already exists, it’s idempotent.
On success, the job stores the insight along with token usage and tool call count (emitted as telemetry for operational visibility), then broadcasts via PubSub. LiveView dashboards subscribe to the organization’s topic and update in real time — the insight appears without a page refresh.
Trade-offs and what’s next
About 200 lines of purposeful code across the tool loop, tool definitions, and wiring. No framework, no DSL, no agent-specific dependency beyond the LLM client. The authorization is a closure. The loop is recursion with a guard clause. The circuit breaker is a few :fuse calls.
There are trade-offs to not using a framework, though:
- No built-in observability. We had to wire up our own telemetry for token usage, round counts, and tool call tracking. Frameworks like LangChain give you this for free — although in our experience “for free” usually means “in a format you’ll eventually want to customize anyway”
- No retry/fallback orchestration. If we wanted to fall back to a different model when Anthropic is down, we’d have to build that ourselves. Right now we just let the job retry via Oban
- No memory or conversation persistence. Each insight job starts fresh. For our use case that’s fine — daily insights don’t need conversation history — but it’d be more work if we wanted a stateful agent
The upside is we understand every line. When something breaks at 6am, the stack trace points to our code, not a framework’s internal dispatch. When we want to change behaviour — like adding the two-phase split — it’s a straightforward code change, not fighting an abstraction.
If you’re running Elixir, these are patterns you already know (recursion, closures, pattern matching, guard clauses, :fuse); they just happen to compose into a production AI agent. That’s the thing that struck me most while building this: we didn’t need to learn “AI engineering” patterns. The tool loop is recursion. The authorization model is a closure. The circuit breaker is a small Erlang library built on OTP. The job orchestration is Oban. Elixir’s functional building blocks are the building blocks for agentic systems; you don’t need a framework to bridge between them.