
Tool Use and Function Calling Patterns for AI Agents

Polystreak Team · 2026-04-05 · 13 min read

Function calling (often called tool use) is the bridge between probabilistic language models and deterministic systems. When it works, agents feel omniscient. When it fails, you get silent wrong answers, runaway loops, or bills that scale with mistakes. This post is a field guide for engineers shipping that bridge: schemas, routing, failure modes, concurrency, and where Redis Cloud and MongoDB Atlas fit in observability and control planes.

What “tool use” actually is in production

At runtime, tool use is a contract: the model emits structured arguments, your runtime validates them, executes side effects, and returns a string (or JSON) back into the conversation. The LLM never “calls HTTP” directly; your orchestrator does. That separation is what lets you enforce authz, quotas, timeouts, and audit trails.
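That contract can be made concrete in a few lines. The sketch below assumes an OpenAI-style tool-call shape (`id`, `name`, `arguments` as a JSON string) and a hypothetical `get_order` handler; the point is that the runtime, not the model, validates arguments and performs I/O:

```python
import json

# Hypothetical tool registry: name -> (handler, required argument keys).
TOOLS = {
    "get_order": (lambda args: {"order_id": args["order_id"], "status": "shipped"},
                  {"order_id"}),
}

def run_tool_call(tool_call: dict) -> dict:
    """Validate and execute one model-proposed tool call.

    `tool_call` mimics an OpenAI-style shape: {"id", "name", "arguments"},
    where `arguments` arrives as a JSON string the model produced.
    """
    name = tool_call["name"]
    if name not in TOOLS:                      # the allowlist/authz boundary
        return {"tool_call_id": tool_call["id"],
                "content": json.dumps({"error_code": "UNKNOWN_TOOL"})}
    handler, required = TOOLS[name]
    try:
        args = json.loads(tool_call["arguments"])   # model output is untrusted
    except json.JSONDecodeError:
        return {"tool_call_id": tool_call["id"],
                "content": json.dumps({"error_code": "BAD_JSON"})}
    missing = required - args.keys()
    if missing:
        return {"tool_call_id": tool_call["id"],
                "content": json.dumps({"error_code": "MISSING_ARGS",
                                       "missing": sorted(missing)})}
    result = handler(args)                     # the runtime, not the LLM, does I/O
    return {"tool_call_id": tool_call["id"], "content": json.dumps(result)}
```

Everything the model sees comes back as a string in the `content` field, including structured errors it can reason about on the next turn.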

OpenAI-style schemas in one glance

Most stacks converge on JSON Schema-like tool definitions: a name, a human description the model reads, and strict parameter shapes. Descriptions matter as much as types; ambiguous wording produces creative argument values. Keep enums small, prefer explicit required fields, and document units (milliseconds vs seconds, currency codes, time zones).

  • One tool per user-visible capability, not one tool per REST endpoint (avoid combinatorial explosion in the prompt).
  • Normalize IDs and timestamps at the boundary; never ask the model to invent internal primary keys.
  • Return tool outputs as concise JSON or bullet summaries; dumping raw HTML or 500-line payloads poisons the next reasoning step.
  • Version tools in the name or namespace when behavior changes (search_v2) so logs and regressions stay interpretable.
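Putting those rules together, a definition for the hypothetical search_v2 tool above might look like this (OpenAI-style JSON Schema, expressed as a Python dict):

```python
# Hypothetical search_v2 definition: versioned name, small closed enum,
# explicit required fields, units documented in the description.
SEARCH_V2 = {
    "name": "search_v2",                       # version in the name
    "description": (
        "Search the product catalog. Returns at most `limit` results. "
        "All timestamps are UTC ISO-8601; prices use ISO-4217 currency codes."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text query."},
            "category": {
                "type": "string",
                "enum": ["books", "music", "games"],   # keep enums small
                "description": "Catalog category to restrict to.",
            },
            "limit": {
                "type": "integer", "minimum": 1, "maximum": 20,
                "description": "Maximum result count (items, not pages).",
            },
        },
        "required": ["query", "limit"],
        "additionalProperties": False,         # reject invented arguments
    },
}
```

`additionalProperties: False` is doing real work here: it turns a creative hallucinated argument into a validation error instead of a silently ignored field.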
| Pattern | Typical extra latency | When to use |
| --- | --- | --- |
| Single-turn tool (one call, answer) | +120–450 ms round-trip to your API | Simple lookups, deterministic workflows |
| ReAct-style loop (2–6 steps) | +400 ms–3 s total wall time | Research, multi-source synthesis |
| Parallel fan-out (2–8 tools) | Dominated by slowest dependency | Independent reads (not writes with ordering) |
| Deferred tool (queue + worker) | Seconds to minutes | Heavy jobs: PDF extraction, batch ETL |

Latency numbers assume a hosted LLM with ~200–800 ms time-to-first-token for the planning step, plus your own service in the same region. Cross-region tool hops routinely add 50–150 ms per call; co-locate orchestrators, Redis, and primary APIs when you are chasing p95.

Tool routing: from model intent to the right backend

Routing is the policy layer on top of raw function names. A router decides which implementation runs, with what credentials, and under which budget. Small agents use a flat map from tool name to handler. Larger systems introduce tiers: a fast classifier or rules engine narrows the candidate set before the full tool list hits the model context.
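A minimal sketch of that tiering, with hypothetical tool names and a keyword gate standing in for the fast classifier:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Route:
    handler: Callable[[dict], dict]
    roles: set = field(default_factory=set)    # RBAC: roles allowed to invoke

class Router:
    """Flat name -> handler map plus a cheap rule layer that narrows the
    candidate set before the full tool list ever reaches model context."""

    def __init__(self, routes: dict):
        self.routes = routes

    def candidates(self, user_text: str, role: str) -> list:
        # Stand-in for a rules engine or fast classifier; real systems
        # would use something richer than a keyword gate.
        names = [n for n, r in self.routes.items() if role in r.roles]
        if "refund" not in user_text.lower():
            names = [n for n in names if n != "issue_refund"]
        return names

    def dispatch(self, name: str, args: dict, role: str) -> dict:
        route = self.routes.get(name)
        if route is None or role not in route.roles:
            # Explainable denial: the router says why, not just "no".
            return {"error_code": "DENIED", "reason": f"{name} not allowed for role {role}"}
        return route.handler(args)
```

Credentials and per-call budgets would hang off `Route` in the same way; the key property is that denial paths return a reason you can log.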

Allowlists, RBAC, and shadow tools

Never expose every internal microservice as a tool. Ship an allowlist per tenant or role, and treat “admin” tools as a separate deployment surface. Shadow or dry-run modes let you log what the model would have invoked without committing side effects, which is invaluable when tuning prompts during incidents.
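Allowlist plus shadow mode fits in one small guard; this is a sketch with a hypothetical `guarded_dispatch` wrapper, not a prescription:

```python
import logging

def guarded_dispatch(name, args, *, allowlist, shadow, execute,
                     log=logging.getLogger("tools")):
    """Allowlist check plus shadow (dry-run) mode.

    In shadow mode we log what the model *would* have invoked and commit
    no side effects, which is what you want while tuning prompts mid-incident.
    """
    if name not in allowlist:
        log.warning("denied tool=%s", name)
        return {"error_code": "DENIED", "tool": name}
    if shadow:
        log.info("shadow tool=%s args=%s", name, args)
        return {"shadow": True, "tool": name}
    return execute(name, args)
```

Flipping `shadow` per tenant (a feature flag works) lets you rehearse a new tool surface against live traffic before any write actually lands.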

The model proposes; the runtime disposes. If your router cannot explain why a call was denied, you do not have governance—you have vibes.

For HTTP-backed tools, generate clients from OpenAPI where possible and validate responses against the same schema mindset. For database tools, prefer stored procedures or constrained query builders over free-form SQL from the model, even when the model is “just generating parameters.”

Errors, retries, and timeouts

| Failure class | User-visible behavior | Engineering response |
| --- | --- | --- |
| 4xx from dependency | Clear, non-technical message | Fix schema or validation; do not blind retry |
| 5xx / network blip | One bounded retry | Exponential backoff, max 2 attempts for reads |
| Timeout (e.g., 2.5 s) | Partial answer or escalate | Short client timeouts, longer worker queue for heavy work |
| Rate limit (429) | Backoff + jitter | Token bucket in Redis; shed load before the LLM retries |

Instrument each tool with RED-style metrics: rate, errors, duration. Aim for sub-250 ms p50 on read tools that sit on the critical path of a single user turn; anything slower should move async or be cached. For writes, default to idempotency keys so duplicate model attempts do not double-charge or double-post.

  • Use a hard cap on tool calls per user message (commonly 3–8) and per session hour to contain cost.
  • Return structured error payloads to the model: { "error_code": "RATE_LIMIT", "retry_after_ms": 890 } beats prose paragraphs.
  • Separate "model retry" from "HTTP retry"; letting the model loop on the same mistake burns tokens fast.
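The HTTP-retry half of that separation can be as small as this; `Retryable` is a hypothetical marker exception standing in for 5xx/network failures, and `ValueError` stands in for a 4xx:

```python
import random
import time

class Retryable(Exception):
    """5xx / network blip: eligible for one bounded retry."""

def call_with_retry(fn, *, max_attempts=2, base_delay=0.2, sleep=time.sleep):
    """Bounded HTTP-layer retry with exponential backoff and jitter.

    On exhaustion, returns a structured error payload the model can reason
    about, rather than a prose paragraph. Model-level retries are a separate
    budget handled by the orchestrator.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Retryable:
            if attempt == max_attempts:
                return {"error_code": "UPSTREAM_UNAVAILABLE", "attempts": attempt}
            sleep(base_delay * (2 ** (attempt - 1)) * (0.5 + random.random()))
        except ValueError as e:                # stand-in for a 4xx: never retried
            return {"error_code": "BAD_REQUEST", "detail": str(e)}
```

Injecting `sleep` keeps the helper testable and makes the backoff policy explicit in one place.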

Parallel vs sequential tool calls

Parallelism wins when calls are independent reads: fetching CRM, billing, and support tickets in one fan-out can shave 400–900 ms versus serial execution. Sequential calls are mandatory when outputs depend on prior results (create ticket then attach file) or when business rules require ordering (inventory checks before capture).

Concurrency ceilings

Unbounded Promise.all on tools is a denial-of-wallet attack against your own dependencies. Cap concurrency per tenant (often 4–8 parallel calls), propagate cancellation when the user aborts, and prefer a small pool of warm connections to external APIs to avoid TLS handshake storms during spikes.
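In Python terms, unbounded `asyncio.gather` is the same denial-of-wallet hazard as unbounded Promise.all; a semaphore gives you the per-tenant ceiling (names here are illustrative):

```python
import asyncio

async def fan_out(tool_calls, run_tool, *, max_concurrency=4):
    """Parallel fan-out for independent reads, capped per tenant.

    `run_tool` is an async callable; `max_concurrency` is the per-tenant
    ceiling (often 4-8 in practice, per the text above).
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(call):
        async with sem:
            return await run_tool(call)

    # gather preserves input order and propagates cancellation to children
    # when the caller aborts, which is exactly the behavior we want here.
    return await asyncio.gather(*(bounded(c) for c in tool_calls))
```

Cancellation propagation comes for free: cancelling the task that awaits `fan_out` cancels every in-flight tool call behind the semaphore.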

| Topology | Token/cost impact | Operational note |
| --- | --- | --- |
| Serial tools | Higher wall time, similar tokens if prompts are tight | Easier debugging, clearer causality |
| Parallel tools | Slightly higher prompt size if you summarize many outputs | Watch connection pools and downstream QPS |
| Speculative parallel | Can waste tokens on unused branches | Only for read-only probes with cheap tools |

Rough order-of-magnitude for mid-2026 hosted models: a multi-tool turn that plans once and consumes 1.5k–4k completion tokens might land between $0.004 and $0.03 per turn depending on tier and caching. Tool output tokenization matters—trim fields aggressively before they re-enter context.

Tool output caching with Redis Cloud

Many tool results are read-mostly and stable for seconds to hours: geocoding, feature flags, reference data, embedding neighbor lists. Redis Cloud gives you a low-latency, TTL-friendly cache tier in front of slower HTTP or database paths. Key tools by a hash of normalized arguments plus tool version, set TTLs from data freshness requirements, and use probabilistic early expiration on hot keys to prevent thundering herds.

  • Cache hit targets: sub-5 ms p99 over a local network to Redis; end-to-end user-visible savings are often 80–300 ms per avoided HTTP call.
  • Use Redis for rate limiting tool invocations per user, per tenant, and per tool with sliding windows.
  • Store short-lived idempotency records (write tools) with SET NX and TTL to dedupe duplicate model retries.
Treat Redis as the agent’s reflexes: fast, ephemeral, and explicit about expiry. Long-term truth still belongs in a database.
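The keying and dedupe patterns above are small; this sketch uses the redis-py `get`/`set(..., nx=True, ex=ttl)` API, with the client injected so the logic is testable (`cached_call` and `reserve_idempotency` are illustrative names):

```python
import hashlib
import json

def cache_key(tool_name: str, version: str, args: dict) -> str:
    """Stable key: hash of normalized (sorted) arguments plus tool version."""
    digest = hashlib.sha256(
        json.dumps(args, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()[:16]
    return f"tool:{tool_name}:{version}:{digest}"

def cached_call(r, tool_name, version, args, fetch, ttl_s=300):
    """Read-through cache; `r` is a redis-py-style client."""
    key = cache_key(tool_name, version, args)
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)
    result = fetch(args)                        # the slow HTTP/DB path
    r.set(key, json.dumps(result), ex=ttl_s)    # TTL from freshness needs
    return result

def reserve_idempotency(r, key, ttl_s=900):
    """SET NX + TTL: first write wins; duplicate model retries get False."""
    return bool(r.set(f"idem:{key}", "1", nx=True, ex=ttl_s))
```

Bumping the `version` component on a behavior change invalidates the old entries implicitly, with no scan-and-delete pass.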

MongoDB Atlas for logs, registries, and audit trails

MongoDB Atlas is a natural home for append-only tool telemetry: each invocation document captures tool name, arguments hash, latency, status code, tenant id, trace id, and model id. That dataset powers post-incident replay, prompt regression tests, and billing reconciliation. Store full arguments only when policy allows; otherwise store redacted shapes plus secure pointers.

A tool registry collection can track active definitions, JSON Schema versions, owners, and rollout percentages. Compound indexes on (tenant_id, created_at) and (tool_name, status) keep dashboards snappy at millions of events per day. TTL indexes can automatically drop high-volume debug logs after 30 days while retaining summarized aggregates.
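Concretely, the invocation document and index specs described above might look like this (shapes only; with pymongo the index tuples would go to `collection.create_index(...)` against a live Atlas cluster):

```python
from datetime import datetime, timezone

def invocation_doc(*, tool_name, args_hash, latency_ms, status,
                   tenant_id, trace_id, model_id):
    """Append-only telemetry document for one tool invocation."""
    return {
        "tool_name": tool_name,
        "args_hash": args_hash,        # redacted shape, not raw arguments
        "latency_ms": latency_ms,
        "status": status,              # stable code, e.g. "OK" / "RATE_LIMIT"
        "tenant_id": tenant_id,
        "trace_id": trace_id,          # joins to the OpenTelemetry span
        "model_id": model_id,
        "created_at": datetime.now(timezone.utc),
    }

# Index specs matching the dashboard queries in the text. The TTL index
# drops high-volume debug logs after 30 days.
INDEXES = [
    ([("tenant_id", 1), ("created_at", -1)], {}),
    ([("tool_name", 1), ("status", 1)], {}),
    ([("created_at", 1)], {"expireAfterSeconds": 30 * 24 * 3600}),
]
```

Keeping `args_hash` instead of raw arguments makes the redaction policy a property of the schema rather than of every call site.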

Designing for queryability

  • Log structured tool errors with stable codes; text search across free-form failures does not scale.
  • Materialize nightly aggregates: error rate by tool, p95 latency by dependency, top argument patterns causing 4xx.
  • Correlate with OpenTelemetry traces so one click goes from a bad answer to the exact tool span.
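One of those nightly aggregates, error rate by tool, is a short pipeline you would hand to `collection.aggregate(...)`; the window bound is left as a placeholder:

```python
# Nightly rollup: error rate by tool over the invocation-log collection.
# "<window start>" is a placeholder for the aggregation window's lower bound.
ERROR_RATE_BY_TOOL = [
    {"$match": {"created_at": {"$gte": "<window start>"}}},
    {"$group": {
        "_id": "$tool_name",
        "calls": {"$sum": 1},
        "errors": {"$sum": {"$cond": [{"$ne": ["$status", "OK"]}, 1, 0]}},
    }},
    {"$project": {
        "calls": 1,
        "error_rate": {"$divide": ["$errors", "$calls"]},
    }},
]
```

Because `status` is a stable code (per the first bullet above), the `$cond` stays a cheap equality test instead of a text match.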

Putting it together

Production tool use is part schema design, part traffic engineering, part observability. Start strict: small tool surfaces, explicit validation, bounded loops, and metrics on every call. Layer Redis Cloud in front of hot reads and limits, and MongoDB Atlas underneath for durable history and registry metadata. The model will keep getting smarter; your obligation is to make the machinery around it boring, fast, and auditable.