
diy ai agent framework

  This post gives an in-depth explanation of each Rust library useful for building a performant, secure, and robust AI agent framework in Rust. Each section explains how the library works, how it would be used in an agent context, and the main design considerations.


  ---

  1. reqwest – HTTP Client for LLM and API Interaction

  Purpose:
  reqwest is a popular asynchronous HTTP client in Rust. It's used to make API requests — essential for querying LLMs like OpenAI, Anthropic, or even internal REST endpoints in your agent framework.

  Use in AI Agent:

  LLM Integration: Use reqwest to call external LLM APIs like https://api.openai.com/v1/chat/completions.

  Tool Use APIs: Agents can invoke APIs like DNS setup, certificate provisioning, or server status endpoints.

  Scraping or Retrieval: Though not ideal for scraping (use scraper or browser engines), it can pull structured data (e.g., JSON) from documentation, REST APIs, or websites.


  Design Considerations:

  Use reqwest::Client for reusability across requests (persistent connections, headers).

  Set proper timeouts, headers, and exponential backoff for resilience (a retry sketch follows the example below).

  Use serde to deserialize JSON responses into Rust structs, improving safety.


  Example:

  let client = reqwest::Client::new();
  let res = client
      .post("https://api.openai.com/v1/chat/completions")
      .bearer_auth("your_api_key")
      .json(&your_request_struct)
      .send()
      .await?;
  let response: OpenAIResponse = res.json().await?;
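
  As a sketch of the resilience points above, the client can be built once with a timeout and wrapped in a simple retry loop with exponential backoff. The three-attempt policy is an assumption for illustration; OpenAIResponse and your_request_struct are the placeholder types from the example, and the snippet assumes an async function that returns a Result.

  use std::time::Duration;

  // Build the client once and reuse it (connection pooling, default timeout).
  let client = reqwest::Client::builder()
      .timeout(Duration::from_secs(30))
      .build()?;

  // Hypothetical retry policy: three attempts with exponential backoff (1s, 2s, 4s).
  let mut response = None;
  for attempt in 0..3u32 {
      let sent = client
          .post("https://api.openai.com/v1/chat/completions")
          .bearer_auth("your_api_key")
          .json(&your_request_struct)
          .send()
          .await;
      match sent {
          Ok(res) if res.status().is_success() => {
              response = Some(res.json::<OpenAIResponse>().await?);
              break;
          }
          _ => tokio::time::sleep(Duration::from_secs(1u64 << attempt)).await,
      }
  }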

  reqwest will be the lifeblood of external communication for your AI agent, whether talking to models or orchestrating actions across your infrastructure.


  ---

  2. tokio – Asynchronous Runtime

  Purpose:
  tokio is the de facto standard for asynchronous programming in Rust. It provides a runtime, async I/O, timers, and task orchestration. Most modern async Rust libraries, including reqwest, are built on or compatible with tokio.

  Use in AI Agent:

  Parallel API Calls: Call multiple LLMs, run shell commands, or interact with tools in parallel.

  Timers/Intervals: Delay execution, retry on failure, or handle rate-limiting via async time.

  Background Tasks: Fetch memory, update logs, monitor environments without blocking the agent’s main thread.

  Web server or gRPC server: Add an agent interface via HTTP/gRPC using tokio-compatible frameworks like warp or tonic.


  Design Considerations:

  Avoid spawning blocking tasks (like file I/O or heavy computation) directly in tokio threads — use tokio::task::spawn_blocking for those.

  Structure with async fn from the ground up for all I/O and interactions.


  Example Use:

  tokio::spawn(async {
      run_agent_loop().await;
  });
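
  To make the parallelism and blocking-work points concrete, here is a minimal sketch. call_llm, check_server_status, and parse_large_file are hypothetical agent functions (as are the URL and file path), and the snippet assumes an async function returning anyhow::Result.

  // Run two independent async calls concurrently and fail fast on the first error.
  let (llm_reply, status) = tokio::try_join!(
      call_llm("Summarize the deployment log"),
      check_server_status("https://example.com/health")
  )?;

  // Push CPU-heavy or blocking work off the async worker threads.
  let parsed = tokio::task::spawn_blocking(|| parse_large_file("/var/log/agent.log")).await?;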

  For an agent that needs to manage multiple subprocesses, browser calls, API lookups, and file edits simultaneously, tokio is non-negotiable.


  ---


  3. serde / serde_json – Serialization & Deserialization

  Purpose:
  serde (Serialize/Deserialize) is Rust’s standard framework for converting data structures to and from formats like JSON, YAML, and TOML. serde_json is the crate for working with JSON specifically.

  Use in AI Agent:

  Handling API Payloads: Parse JSON responses from LLMs (OpenAI, Claude, etc.) into typed Rust structures.

  Tool I/O: Represent shell tool inputs and outputs in structured JSON.

  Memory Storage: Persist agent memory, goals, or task history in JSON.

  Plan Representation: Agents often use structured "thought-action-observation" chains — serde helps encode/decode them safely.


  Example:

  use serde::{Deserialize, Serialize};

  #[derive(Serialize, Deserialize)]
  struct Message {
      role: String,
      content: String,
  }

  #[derive(Serialize, Deserialize)]
  struct OpenAIRequest {
      model: String,
      messages: Vec<Message>,
      temperature: f32,
  }

  let request = OpenAIRequest { /* ... */ };
  let json_body = serde_json::to_string(&request)?;

    Design Considerations:

    serde makes agents robust to change: if an API evolves slightly, attributes like #[serde(default)] or #[serde(skip_serializing_if = "Option::is_none")] keep parsing and serialization flexible (see the sketch below).

    Use Value (untyped JSON) only when necessary; prefer strong typing for safety.
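
    A minimal sketch of those attributes on a hypothetical tool-result type:

    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize)]
    struct ToolResult {
        name: String,
        // Missing from the JSON? Fall back to Default (an empty string).
        #[serde(default)]
        stdout: String,
        // Omitted from the serialized output when there is no error.
        #[serde(skip_serializing_if = "Option::is_none")]
        error: Option<String>,
    }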


    In your agent, serde will act as the glue for data exchange between the LLM interface, tools, memory, and user inputs.


    ---

    4. duct – Shell Process Execution

    Purpose:
    duct is a clean, composable crate for running shell commands in Rust. It wraps around std::process::Command, but with a more ergonomic API and Unix-like piping behavior.

    Use in AI Agent:

    Tool Execution: Run Nginx setup scripts, Docker commands, Certbot, or SSH commands from your agent.

    Shell Automation: Allow agents to dynamically generate and run shell commands.

    Chainable Commands: Pipe output of one tool into another, similar to bash.


    Example:

    use duct::cmd;

    let output = cmd!("ls", "-la")
        .pipe(cmd!("grep", "html"))
        .read()?;

    Design Considerations:

    Never pass LLM-generated command strings to duct without validation (for example, against an allow-list of permitted binaries), to prevent command injection; a sketch follows at the end of this section.

    Combine with serde_json for structured tool output.

    Use timeouts and exit code checks to control flow.


    For agents that operate on a virtual server or interact with local CLI tools (e.g., Nginx, git, systemctl), duct is a natural and high-level choice. It lets the AI act in the real world — not just plan.
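
    A minimal sketch of that validation idea, assuming a hypothetical allow-list of tool names (the list, function, and error handling are illustrative, not part of duct):

    use duct::cmd;

    // Hypothetical allow-list: the agent may only run these binaries.
    const ALLOWED: &[&str] = &["ls", "git", "systemctl", "nginx"];

    fn run_tool(program: &str, args: &[String]) -> anyhow::Result<String> {
        if !ALLOWED.contains(&program) {
            anyhow::bail!("command '{}' is not on the allow-list", program);
        }
        // duct captures stdout and returns an error on a non-zero exit code.
        let output = cmd(program, args).read()?;
        Ok(output)
    }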


    ---

    The next three libraries, qdrant-client, tantivy, and tree-sitter, cover memory, indexing, and code understanding: foundational capabilities for a capable AI agent.


    ---

    5. qdrant-client – Vector Store for Semantic Memory

    Purpose:
    qdrant-client is a Rust client for interacting with the Qdrant vector database, which is optimized for similarity search on high-dimensional vectors (embeddings). This gives your AI agent long-term memory — a place to remember facts, tasks, files, and goals.

    Use in AI Agent:

    Embedding Storage: Store sentence embeddings from LLMs or local models.

    Semantic Retrieval: Retrieve most relevant past documents, plans, or API docs before forming a new response.

    Persistent Memory: Save and recall files, web searches, command outputs, or conversations.

    Tool Contextualization: Before invoking a tool, the agent can retrieve similar past invocations.


    Example:

    use qdrant_client::prelude::*;

    // Note: constructor and request-field names vary between qdrant-client
    // versions (recent releases use a config builder such as QdrantClient::from_url);
    // check the docs for the version you pin.
    let client = QdrantClient::new(Some("http://localhost:6333".to_string()))?;
    let search_result = client
        .search_points(&SearchPoints {
            collection_name: "agent-memory".into(),
            vector: your_vector,
            limit: 5,
            ..Default::default()
        })
        .await?;

    Design Considerations:

    Use OpenAI, Cohere, or local embeddings (e.g., sentence-transformers) for vector generation.

    Structure memory as { vector, metadata, content } to allow flexible querying.

    Combine with serde to serialize metadata and tool results.


    Qdrant adds fast, scalable retrieval of the agent's history, enabling deeper context and continuity across long sessions or multiple servers.
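
    One way to follow the { vector, metadata, content } layout above is a small record type serialized into the point payload. The struct and field names here are illustrative choices, not a qdrant-client requirement:

    use serde::{Deserialize, Serialize};

    // Hypothetical shape for one memory entry stored alongside its embedding.
    #[derive(Serialize, Deserialize)]
    struct MemoryRecord {
        content: String,   // raw text the embedding was computed from
        source: String,    // e.g., "shell", "web", "conversation"
        timestamp: i64,    // unix seconds, useful for recency weighting
    }

    let record = MemoryRecord {
        content: "Provisioned TLS certificate for example.org".to_string(),
        source: "shell".to_string(),
        timestamp: 1_700_000_000,
    };
    let payload_json = serde_json::to_value(&record)?;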


    ---

    6. tantivy – Full-Text Search for Symbolic Memory

    Purpose:
    tantivy is a high-performance, full-text search engine written in Rust (similar to Apache Lucene). Unlike Qdrant, which is semantic, tantivy is symbolic — it indexes tokens and phrases.

    Use in AI Agent:

    Search over Logs: Quickly find specific commands, file paths, or output lines in structured logs.

    Fast Retrieval of File Contents: If your agent ingests hundreds of files, use tantivy to index and search over them.

    Autocomplete and Keyword Tool Matching: Allow fuzzy matching on commands, options, or arguments.

    Symbolic Plan Recall: Store and query past agent task plans and subtask metadata.


    Example:

    use tantivy::schema::{Schema, STORED, TEXT};
    use tantivy::Index;

    let mut schema_builder = Schema::builder();
    let content = schema_builder.add_text_field("content", TEXT | STORED);
    let schema = schema_builder.build();
    let index = Index::create_in_ram(schema);
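
    Continuing from that schema, documents can be added and searched back symbolically. The log line and query are illustrative, and exact method signatures differ slightly across tantivy versions:

    use tantivy::collector::TopDocs;
    use tantivy::doc;
    use tantivy::query::QueryParser;

    // Index a document (e.g., one log line or file snippet).
    let mut writer = index.writer(50_000_000)?;
    writer.add_document(doc!(content => "ran: certbot renew --nginx"))?;
    writer.commit()?;

    // Search it back with an exact/keyword query.
    let reader = index.reader()?;
    let searcher = reader.searcher();
    let query = QueryParser::for_index(&index, vec![content]).parse_query("certbot")?;
    let top_docs = searcher.search(&query, &TopDocs::with_limit(5))?;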

    Design Considerations:

    Use it alongside Qdrant: tantivy for exact text, qdrant for semantic similarity.

    You can store file names, command names, etc., and let the agent search symbolically before choosing the correct action.


    Incorporating tantivy turns your AI agent into a knowledgeable software librarian, able to answer: “When did I run that command? What file contains this class?”


    ---

    7. tree-sitter – Language-Aware Code Parsing

    Purpose:
    tree-sitter is a parser generator and incremental parsing system for programming languages. It builds concrete syntax trees from source code — allowing LLMs and agents to work with structured code rather than plain text.

    Use in AI Agent:

    Code Navigation: Identify function, class, or variable definitions programmatically.

    Code Refactoring: Modify only parts of the codebase safely without brittle regex hacks.

    Safe Insertion: Insert code only at valid syntactic positions (e.g., after imports, in function body).

    Diffing and Diagnostics: Let the agent determine what changed in a file and where.


    Example:

    use tree_sitter::Parser;

    let mut parser = Parser::new();
    // The grammar comes from the tree-sitter-rust crate
    // (newer releases expose a LANGUAGE constant instead of language()).
    parser.set_language(tree_sitter_rust::language()).unwrap();
    let tree = parser.parse(code_str, None).unwrap();
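
    As a sketch of the code-navigation use case, the resulting tree can be walked to list top-level function definitions (the "function_item" kind name comes from the Rust grammar; error handling is elided):

    let root = tree.root_node();
    let mut cursor = root.walk();
    for node in root.children(&mut cursor) {
        if node.kind() == "function_item" {
            let text = node.utf8_text(code_str.as_bytes()).unwrap_or("");
            println!("function at byte {}: {}", node.start_byte(), text.lines().next().unwrap_or(""));
        }
    }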

    Design Considerations:

    Grammars are published as separate crates and repositories (e.g., tree-sitter-rust, tree-sitter-python); pick the ones for the languages your agent edits.

    Combine with an edit engine: modify syntax trees and write them back out.

    Integrate with agent memory: when editing a file, let the agent fetch AST and work structurally.


    This makes the AI agent code-structure-aware, enabling complex refactors, targeted bug fixes, or class injections — all without breaking code.


    ---

    The final set of Rust libraries rounds out the toolkit with tokenization, on-device LLM inference, and model orchestration, which are critical for agents that need low-latency, local reasoning.


    ---

    8. rust-tokenizers – Tokenization for LLMs

    Purpose:
    rust-tokenizers provides fast Rust implementations of the tokenization schemes used by Transformer models, supporting BPE, WordPiece, and SentencePiece (unigram) algorithms for models such as GPT-2, BERT, and RoBERTa. (Hugging Face's tokenizers library, itself written in Rust, is an alternative crate.)

    Use in AI Agent:

    Token Counting: Before sending prompts to LLMs (especially with a max token limit), calculate usage.

    Chunking Input: Break large documents into token-bound chunks (e.g., 512-token sliding window).

    Embedding Preparation: Preprocess inputs before computing sentence embeddings.

    Streaming Tokens: Monitor output in real-time if using an autoregressive decoder.


    Example:

    use rust_tokenizers::tokenizer::{Gpt2Tokenizer, TruncationStrategy, Tokenizer};

    let tokenizer = Gpt2Tokenizer::from_file("gpt2-vocab.json", "gpt2-merges.txt", true).unwrap();
    let tokens = tokenizer.encode("Build me a website", None, 512, &TruncationStrategy::LongestFirst, 0);
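
    Building on that, a hedged sketch of token counting for prompt budgeting; prompt_text is a hypothetical input, the 3,500-token budget is an arbitrary example, and field names on the returned TokenizedInput can differ between crate versions:

    let encoded = tokenizer.encode(prompt_text, None, 4096, &TruncationStrategy::DoNotTruncate, 0);
    let token_count = encoded.token_ids.len();
    if token_count > 3_500 {
        // Hypothetical policy: chunk or summarize the input before calling the model.
        println!("prompt too long: {} tokens", token_count);
    }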

    Design Considerations:

    Use the same tokenizer as the model you’re calling — mismatches will cause nonsense behavior.

    Store vocabulary and merge files alongside your agent’s runtime to avoid network dependence.


    This crate is crucial for ensuring cost control, compatibility, and prompt efficiency when using local or remote models.


    ---

    9. llama-rs – Inference with LLaMA Models on CPU/GPU

    Purpose:
    llama-rs (since folded into the rustformers llm project) is a Rust implementation of loading and inference for LLaMA-family models. It runs quantized GGML-format models on the CPU, leveraging ggml under the hood, and covers LLaMA-style architectures and their derivatives via quantized weights.

    Use in AI Agent:

    Offline Reasoning: Run the agent entirely locally without calling OpenAI or Anthropic.

    Faster Prototyping: Eliminate latency and rate limits for internal reasoning.

    Contextual Inference: Perform code completion, prompt evaluation, or self-reflection without internet.

    Sandboxed Testing: Run LLMs in secure, reproducible environments.


    Example:

    // Illustrative only: the actual llama-rs / llm API loads a model with explicit
    // parameters and runs inference through a session and callback, not a single call.
    let model = llama_rs::Model::load("llama.gguf")?;
    let result = model.infer(prompt)?;

    Design Considerations:

    Use quantized models (e.g., Q4_K_M) to optimize memory and speed.

    Ensure llama.cpp compatibility if building cross-platform (e.g., Apple M1, Linux AMD).

    Add a layer for streaming tokens to simulate real-time inference.


    llama-rs gives your agent LLM autonomy — removing external dependencies and enabling private, controllable deployments.


    ---

    10. llm (rustformers) – Generalized GGML Model Runner

    Purpose:
    llm is a Rust crate from the rustformers project that runs multiple GGML-format model architectures (for example LLaMA, GPT-J, GPT-NeoX, and MPT). It provides a modular interface for loading, running, and sampling from models.

    Use in AI Agent:

    Pluggable Backend: Use llm to abstract over multiple model types with a common API.

    Inference Serving: Power a background reasoning thread or server that accepts tasks from the main agent loop.

    Prompt Completion: Let agent tasks like tool generation or plan creation run against local LLMs using consistent token streams.


    Example:

    use llm::{Model, InferenceSession};

    // Pseudocode: the actual llm API selects a model architecture and tokenizer
    // source at load time and streams inference output through a callback.
    let model = llm::load("mistral.gguf")?;
    let mut session = model.start_session(Default::default());
    let response = session.infer("Build a Dockerfile for a Flask app.")?;

    Design Considerations:

    Integrate with tokio for async serving.

    Choose between GPU/CPU and match system memory to model size.

    Add prompt history, streaming output, and embedding layers for full integration.


    When combined with rust-tokenizers, tokio, and serde, llm enables you to build a standalone LLM agent with no external cloud requirement.
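
    As a sketch of the pluggable-backend idea, the agent core can depend on a small trait and swap remote and local implementations behind it. The trait and type names are hypothetical, not part of the llm crate:

    /// Hypothetical abstraction the agent core depends on.
    trait LanguageModel {
        fn complete(&self, prompt: &str) -> anyhow::Result<String>;
    }

    struct RemoteApiModel { /* reqwest client, endpoint, API key */ }
    struct LocalGgmlModel { /* loaded llm / llama-rs model handle */ }

    impl LanguageModel for RemoteApiModel {
        fn complete(&self, _prompt: &str) -> anyhow::Result<String> {
            // Call the hosted API (see the reqwest section above).
            todo!()
        }
    }

    impl LanguageModel for LocalGgmlModel {
        fn complete(&self, _prompt: &str) -> anyhow::Result<String> {
            // Run a local inference session (see sections 9 and 10).
            todo!()
        }
    }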


    ---

    ✅ Final Thoughts

    With these 10 libraries, you can build an AI agent framework in Rust that is:

    ✅ Fully asynchronous (tokio)

    ✅ LLM-capable (reqwest, llm, llama-rs)

    ✅ Structured and typed (serde, serde_json)

    ✅ Tool-using (duct)

    ✅ Memory-backed (qdrant-client, tantivy)

    ✅ Code-aware (tree-sitter)

    ✅ Token-efficient (rust-tokenizers)


    Below is a clean, modular Rust project scaffold for building an AI agent framework with the libraries discussed above. It is designed for clarity, scalability, and async-first operation, with separate modules for each core concern.


    ---

    Project Scaffold: rust-ai-agent

    rust-ai-agent/
    ├── Cargo.toml
    ├── src/
    │   ├── main.rs                  # Entry point: agent runtime loop
    │   ├── agent/                   # Core agent logic: planning, reasoning
    │   │   ├── mod.rs
    │   │   ├── planner.rs           # LLM prompt crafting & plan generation
    │   │   ├── executor.rs          # Tool / shell command executor
    │   │   ├── memory.rs            # Long-term memory interface
    │   │   └── tokenizer.rs         # Tokenizer wrapper (rust-tokenizers)
    │   ├── tools/                   # Tool integrations (shell, API clients)
    │   │   ├── mod.rs
    │   │   ├── shell.rs             # Shell command abstraction (duct)
    │   │   ├── http_client.rs       # HTTP requests (reqwest)
    │   │   ├── qdrant.rs            # Vector DB client
    │   │   └── file_ops.rs          # File system utilities (read/write)
    │   ├── llm/                     # LLM model interaction layer
    │   │   ├── mod.rs
    │   │   ├── api.rs               # Remote LLM API wrappers (OpenAI, Anthropic)
    │   │   ├── local.rs             # Local model inference (llama-rs, llm)
    │   │   └── tokenizers.rs        # Tokenization helpers
    │   ├── parsing/                 # Code parsing (tree-sitter)
    │   │   ├── mod.rs
    │   │   └── parser.rs
    │   ├── memory/                  # Symbolic and semantic memory implementations
    │   │   ├── mod.rs
    │   │   ├── vector_store.rs      # qdrant client wrapper
    │   │   └── full_text_search.rs  # tantivy wrapper
    │   └── utils/                   # Utilities (logging, config, errors)
    │       ├── mod.rs
    │       ├── config.rs
    │       └── logging.rs
    ├── examples/                    # Example usages and tests
    │   └── basic_agent.rs
    ├── scripts/                     # Helper scripts for setup, deployment
    │   └── setup_env.sh
    ├── README.md
    └── .env                        # Environment variables (API keys, tokens)
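
    To make the tree concrete, the module declarations at the top of src/main.rs would mirror the directories (a sketch, assuming the mod.rs files shown above):

    // src/main.rs
    mod agent;
    mod llm;
    mod memory;
    mod parsing;
    mod tools;
    mod utils;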


    ---

    Description of key components:

    main.rs: Launches the async tokio runtime and runs the agent control loop, coordinates planning and execution.

    agent/planner.rs: Creates prompts, calls LLM APIs or local models, interprets results into executable plans.

    agent/executor.rs: Runs shell commands (via duct), calls tools, manages subprocesses.

    agent/memory.rs: Wraps vector and symbolic memory, interfaces with qdrant-client and tantivy.

    tools/shell.rs: Low-level shell command abstraction.

    tools/http_client.rs: Wraps reqwest for robust HTTP with retries and auth.

    llm/api.rs: Handles OpenAI/Anthropic API interactions.

    llm/local.rs: Local LLM inference using llama-rs or llm.

    parsing/parser.rs: Uses tree-sitter to parse source code for safe edits.

    utils/config.rs: Loads environment variables and configuration securely.

    utils/logging.rs: Centralized async logging setup.



    ---

    Example main.rs skeleton snippet

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        let config = utils::config::load().await?;
        utils::logging::init(&config)?;

        let mut agent = agent::Agent::new(config).await?;
        agent.run_loop().await?;

        Ok(())
    }


      ---

      This scaffold gives you a clean separation of concerns, testability, and async safety. You can build incrementally:

      Start with a simple loop calling remote LLM with reqwest

      Add shell command execution with duct

      Integrate qdrant and tantivy as memory layers

      Add local inference with llama-rs or llm

      Enable source code editing with tree-sitter
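
      Finally, a possible shape for agent::Agent::run_loop, assuming hypothetical planner, executor, and memory components from the scaffold (every name and signature here is illustrative):

      impl Agent {
          pub async fn run_loop(&mut self) -> anyhow::Result<()> {
              loop {
                  // 1. Recall relevant context from memory (qdrant/tantivy wrappers).
                  let context = self.memory.recall(&self.current_goal).await?;

                  // 2. Ask the LLM (remote or local) for the next plan step.
                  let plan = self.planner.next_step(&self.current_goal, &context).await?;

                  // 3. Execute the chosen tool (shell via duct, HTTP via reqwest, ...).
                  let observation = self.executor.execute(&plan).await?;

                  // 4. Store the observation for future recall.
                  self.memory.store(&plan, &observation).await?;

                  if plan.is_final() {
                      break;
                  }
              }
              Ok(())
          }
      }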