Exploring Agentic AI and Building a Personal Agent Stack
This document is a running exploration of modern agentic AI systems and the practical path toward building a small personal agent stack.
Traditional AI systems are primarily designed to answer prompts. Agentic AI systems go further: they pursue goals, break work into steps, use tools, remember context, and in some cases take actions in the real world.
Agentic systems can:
- plan tasks
- use external tools
- maintain memory
- collaborate with other agents
- evaluate their own work
- iterate toward complex goals
This document is intentionally topic-oriented so readers can jump directly to the sections most relevant to them.
Topics
- Topic 1 — Real Agentic AI Systems You Can Run Today
- Topic 2 — Most Powerful Open-Source Agent Frameworks
- Topic 3 — Building Your Own AI Agent in ~100 Lines of Python
- Topic 4 — The Most Advanced Agent Tools Available Today
- Topic 5 — Modern Agent Architecture
- Topic 6 — Comparison of Major Agent Frameworks
- Topic 7 — Real-World Agent Workflow Example
- Topic 8 — Where Agentic AI Is Going
- Topic 9 — Where an Agent Stack Actually Lives
- Topic 10 — Sample Walkthrough: Building a Small Personal Agent Stack
Topic 1 — Real Agentic AI Systems You Can Run Today
These are working agent platforms you can study or run today. They differ in architecture and maturity, but they all help illustrate the move from prompt-response systems to goal-driven AI execution.
Full Autonomous AI Agents
These attempt something close to end-to-end goal completion.
OpenDevin
OpenDevin (since renamed OpenHands) is one of the best-known open-source efforts to reproduce the capabilities demonstrated by autonomous coding agents such as Devin.
Capabilities include:
- writing software
- running code
- debugging
- browsing the web
- interacting with files
- executing terminal commands
Example goal:
Build a website listing Ohio metal roofing suppliers.
A system like this might:
- research suppliers
- generate code
- run tests
- fix bugs
- deploy the application
Repository
https://github.com/OpenDevin/OpenDevin
AutoGPT (modern versions)
AutoGPT was one of the earliest agent systems to popularize the idea of autonomous task loops. Early versions were often unstable, but later versions became more structured.
Capabilities include:
- multi-step planning
- task persistence
- tool use
- web research
- long task execution
Example task:
Research skylight panel replacements compatible with Regal Rib roofing.
Repository
https://github.com/Significant-Gravitas/AutoGPT
MetaGPT
MetaGPT models a task as if it were being handled by an entire AI software company.
Typical roles include:
| Agent Role | Responsibility |
|---|---|
| CEO | defines project goal |
| Product Manager | writes requirements |
| Engineer | implements code |
| QA | tests output |
Example task:
Create a marketplace for building materials.
Repository
https://github.com/geekan/MetaGPT
Multi-Agent Collaboration Systems
These systems use teams of specialized agents instead of one general-purpose agent.
CrewAI
CrewAI has become one of the most intuitive frameworks for modeling AI systems as teams.
Example crew:
Researcher
Engineer
Writer
Manager
Example goal:
Find compatible skylight panels for Regal Rib roofing.
Possible workflow:
- research agent gathers panel specifications
- engineering agent compares profile compatibility
- writer agent drafts supplier outreach emails
Repository
https://github.com/joaomdmoura/crewAI
Microsoft AutoGen
AutoGen is designed around agents talking to one another in structured loops.
Typical interaction pattern:
User
↓
Planner
↓
Executor
↓
Critic
The system can use one agent to propose an approach, another to execute it, and another to evaluate or improve the result.
Repository
https://github.com/microsoft/autogen
Tool-Using Agent Systems
These systems are especially focused on letting the model act through tools.
LangGraph
LangGraph is one of the most important frameworks in modern agent engineering because it combines LLM reasoning with structured workflow control.
Capabilities include:
- deterministic execution
- branching workflows
- retries
- memory integration
- tool execution
Repository
https://github.com/langchain-ai/langgraph
Open Interpreter
Open Interpreter allows an AI system to interact with a computer environment more directly.
Capabilities include:
- running Python
- executing shell commands
- modifying files
- launching applications
Example task:
Analyze a spreadsheet and generate a dashboard.
Repository
https://github.com/OpenInterpreter/open-interpreter
Knowledge and Research Agents
These systems emphasize retrieval and reasoning over information sources.
LlamaIndex Agents
LlamaIndex focuses on connecting AI systems to documents, archives, and knowledge stores.
Example use cases:
- document analysis
- research automation
- internal knowledge assistants
- engineering documentation lookup
Repository
https://github.com/run-llama/llama_index
SuperAGI
SuperAGI is more of an orchestration platform for managing multiple agent workflows.
Features include:
- dashboards
- task monitoring
- memory systems
- tool integration
Repository
https://github.com/TransformerOptimus/SuperAGI
Experimental Systems
BabyAGI (modern versions)
BabyAGI explored recursive task generation:
Task
↓
Agent executes
↓
Agent generates new tasks
↓
Loop
This style of self-expanding task tree was an influential idea in early agent experimentation.
Repository
https://github.com/yoheinakajima/babyagi
Most Practical Systems Right Now
For real-world projects, the most practical and influential systems in this landscape include:
- CrewAI
- LangGraph
- AutoGen
- OpenDevin
- Open Interpreter
These represent the shift from chat-oriented AI toward goal-oriented automation systems.
Topic 2 — Most Powerful Open-Source Agent Frameworks
Agent frameworks provide the infrastructure required to build reliable AI agents. They are the layer that helps coordinate:
- reasoning
- tool execution
- memory
- workflows
- collaboration
- monitoring
The most important open-source frameworks today tend to fall into a few categories.
LangGraph
LangGraph is arguably the most important modern framework for production-grade agent systems.
It was built to solve a core problem:
LLM agents are powerful, but they are often unpredictable.
LangGraph addresses that by combining:
- deterministic workflows
- graph-based execution
- state tracking
- LLM reasoning
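Instead of relying on an open-ended loop, it structures execution as an explicit graph of nodes that read and update shared state. The control flow is deterministic even though the individual model calls are not.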
Example:
User Request
↓
Planning Node
↓
Tool Execution Node
↓
Evaluation Node
↓
Final Output
Each node can:
- call a model
- execute a tool
- write memory
- branch to another state
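The graph-of-states idea can be illustrated without any framework. The sketch below is plain Python and does not use LangGraph's API; the node functions, the `NODES` table, and `run_graph` are hypothetical names chosen for illustration, and the tool node is a stub that fails once so that the evaluation node has a reason to branch.

```python
from typing import Any, Callable, Dict, Optional, Tuple

State = Dict[str, Any]
# A node receives the shared state and returns (updated state, next node name);
# returning None as the next node ends the run.
Node = Callable[[State], Tuple[State, Optional[str]]]


def planning_node(state: State) -> Tuple[State, Optional[str]]:
    state["plan"] = f"look up: {state['request']}"
    return state, "tool"


def tool_node(state: State) -> Tuple[State, Optional[str]]:
    # Stand-in for a real tool call. It "fails" on the first attempt so the
    # evaluation node has something to branch on.
    state["attempts"] = state.get("attempts", 0) + 1
    state["result"] = "ok" if state["attempts"] >= 2 else None
    return state, "evaluate"


def evaluation_node(state: State) -> Tuple[State, Optional[str]]:
    # Branch: retry the tool if there is no result yet, otherwise finish.
    if state["result"] is None:
        return state, "tool"
    return state, None


NODES: Dict[str, Node] = {
    "plan": planning_node,
    "tool": tool_node,
    "evaluate": evaluation_node,
}


def run_graph(start: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from `start`, recording each visited node in state['trace']."""
    state["trace"] = []
    current: Optional[str] = start
    steps = 0
    while current is not None and steps < max_steps:
        state["trace"].append(current)
        state, current = NODES[current](state)
        steps += 1
    return state


final_state = run_graph("plan", {"request": "population of Columbus, Ohio"})
print(final_state["trace"])
```

The `max_steps` cap is a design choice worth keeping in any version of this pattern: a graph with a retry edge can otherwise cycle forever if the tool never succeeds.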
That makes LangGraph well suited for:
- research assistants
- automation pipelines
- coding agents
- production orchestration
Repository
https://github.com/langchain-ai/langgraph
CrewAI
CrewAI specializes in collaborating teams of agents.
It maps naturally to the idea of a human team, where each participant has:
- a role
- a goal
- tools
- responsibilities
Example crew:
| Agent | Role |
|---|---|
| Researcher | finds information |
| Analyst | processes results |
| Writer | creates output |
| Manager | coordinates tasks |
This makes CrewAI especially approachable for users who think in terms of delegated responsibilities.
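The role-based mental model can be sketched in a few lines of plain Python. This is not CrewAI's API; the `Agent` and `Crew` classes are hypothetical, each agent's `work` function stands in for a model call made in that role, and the sequential `Crew.run` method plays the coordinating part that the table assigns to a manager.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    role: str
    goal: str
    work: Callable[[str], str]  # stand-in for a model call made in this role


@dataclass
class Crew:
    agents: List[Agent]

    def run(self, task: str) -> List[str]:
        """Pass the task through each agent in order, logging each hand-off."""
        log: List[str] = []
        payload = task
        for agent in self.agents:
            payload = agent.work(payload)
            log.append(f"{agent.role}: {payload}")
        return log


crew = Crew(agents=[
    Agent("Researcher", "find information", lambda t: f"notes on {t}"),
    Agent("Analyst", "process results", lambda t: f"summary of {t}"),
    Agent("Writer", "create output", lambda t: f"report from {t}"),
])

for entry in crew.run("skylight panels"):
    print(entry)
```

Each agent only sees the output of the previous one, which is the simplest possible hand-off policy; real frameworks typically allow richer delegation.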
Repository
https://github.com/joaomdmoura/crewAI
Microsoft AutoGen
AutoGen is one of the most flexible frameworks for agent-to-agent conversations.
A common pattern is:
User
↓
Planner Agent
↓
Executor Agent
↓
Critic Agent
That structure is useful when a task benefits from iterative refinement, debate, or review.
AutoGen is especially strong for:
- coding agents
- complex reasoning loops
- collaborative problem solving
Repository
https://github.com/microsoft/autogen
Semantic Kernel
Semantic Kernel is Microsoft’s enterprise-oriented orchestration framework.
It emphasizes:
- reliability
- structured integrations
- business workflows
- internal tools and APIs
Supported languages include:
- Python
- C#
- Java
It is especially relevant when an organization wants AI systems integrated with:
- cloud platforms
- internal services
- databases
- business systems
Repository
https://github.com/microsoft/semantic-kernel
LlamaIndex
LlamaIndex focuses on knowledge retrieval and reasoning.
Its core value is helping agents connect to external knowledge sources and use them effectively. That makes it central to many RAG-oriented systems.
Typical use cases include:
- document assistants
- research agents
- engineering documentation search
- enterprise knowledge bases
Repository
https://github.com/run-llama/llama_index
Haystack
Haystack is a strong framework for search-heavy AI systems.
It is particularly useful for:
- document pipelines
- search-driven assistants
- RAG systems
- enterprise question answering
Repository
https://github.com/deepset-ai/haystack
Core Comparison
| Framework | Primary Strength |
|---|---|
| LangGraph | workflow orchestration |
| CrewAI | multi-agent collaboration |
| AutoGen | structured agent conversation |
| Semantic Kernel | enterprise orchestration |
| LlamaIndex | knowledge retrieval |
| Haystack | search pipelines |
Key Insight
These frameworks are not always best thought of as direct competitors. In many real systems, they become complementary layers in a larger stack.
Topic 3 — Building Your Own AI Agent in ~100 Lines of Python
One of the most useful things to understand is that an agent is not magic. At its core, an agent is often just a loop.
A minimal agent usually follows a pattern like this:
Goal
↓
Plan
↓
Use tools
↓
Observe result
↓
Repeat until done
This is closely related to the ReAct pattern:
Reason → Act → Observe
Core Components of a Minimal Agent
A small agent generally needs five things:
| Component | Purpose |
|---|---|
| LLM | reasoning and planning |
| Tools | actions the agent can take |
| Memory | track previous results |
| Planner | determine next step |
| Loop | repeat until the task is complete |
Minimal Conceptual Example
A tiny agent can:
- accept a goal
- ask a model for the next step
- execute an action
- store the result
- repeat
Example behavior:
Step 1: Search population of Columbus Ohio
Step 2: Write result to a file
Step 3: Return DONE
That may not look glamorous, but conceptually it captures the essence of an agent.
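The three-step behavior above can be reproduced with a short loop. In this sketch the model is replaced by `fake_model`, a hypothetical scripted stand-in, so the example runs without an API key; the `search` and `write` tools are stubs, and the `name:argument` action format is an assumption made for illustration only.

```python
from typing import Callable, Dict, List


def fake_model(goal: str, memory: List[str]) -> str:
    """Stand-in for an LLM: returns one scripted action per step."""
    script = ["search:population of Columbus Ohio", "write:result.txt", "DONE"]
    return script[min(len(memory), len(script) - 1)]


def search(arg: str, memory: List[str]) -> str:
    return f"stub search result for '{arg}'"


def write(arg: str, memory: List[str]) -> str:
    # Pretend to write the most recent observation to a file.
    return f"would write {len(memory[-1])} characters to {arg}"


TOOLS: Dict[str, Callable[[str, List[str]], str]] = {"search": search, "write": write}


def run_agent(goal: str,
              model: Callable[[str, List[str]], str] = fake_model,
              max_steps: int = 10) -> List[str]:
    memory: List[str] = []
    for _ in range(max_steps):
        action = model(goal, memory)            # ask the model for the next step
        if action == "DONE":
            break
        name, _sep, arg = action.partition(":")
        observation = TOOLS[name](arg, memory)  # execute the action
        memory.append(observation)              # store the result
    return memory


history = run_agent("Record the population of Columbus Ohio")
for item in history:
    print(item)
```

Swapping `fake_model` for a function that calls a real model is the only change needed to turn this into a live agent, which is why the loop itself stays so small.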
Why This Matters
Many people imagine agents as requiring huge frameworks from the start. In practice, most real systems begin with something much simpler:
LLM
+ loop
+ tools
+ memory
Everything else is an engineering refinement of that core pattern.
Modern Improvements
Production systems extend this pattern using:
- structured tool calling
- persistent memory
- multi-step planning
- self-reflection
- retries
- evaluation loops
That gives us the modern agent stack, but the minimal loop remains the foundation.
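Of the refinements listed, retries are the easiest to show in isolation. The following is a minimal sketch of retrying a tool call with exponential backoff; `with_retries` and `flaky_tool` are hypothetical names, and the delay is left at zero so the example finishes instantly.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> T:
    """Call fn until it succeeds, doubling the delay after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # a real agent would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


calls: List[int] = []


def flaky_tool() -> str:
    """Hypothetical tool that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "tool output"


result = with_retries(flaky_tool, attempts=3)
print(result, "after", len(calls), "calls")
```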
Topic 4 — The Most Advanced Agent Tools Available Today
While many frameworks and projects exist, a smaller set of tools stands out because it represents the leading edge of what agents can do in practice.
OpenDevin
OpenDevin is one of the most visible open-source attempts to approximate an autonomous AI software engineer.
Capabilities include:
- browsing the web
- writing code
- running commands
- editing files
- debugging programs
It is especially interesting because it treats software engineering as a multi-step autonomous workflow rather than a single prompt.
Repository
https://github.com/OpenDevin/OpenDevin
LangGraph
LangGraph is included here again because it is not just a framework in the abstract; it has become one of the most important practical tools for orchestrating reliable AI behavior.
It supports:
- long-running tasks
- branching workflows
- persistent state
- controlled execution
Repository
https://github.com/langchain-ai/langgraph
CrewAI
CrewAI matters because it provides a very usable abstraction for delegated agent teams.
This makes it attractive for users who want systems like:
- researcher agents
- analyst agents
- writer agents
- manager agents
Repository
https://github.com/joaomdmoura/crewAI
Open Interpreter
Open Interpreter is especially important because it blurs the line between a model and a computer operator.
It enables things like:
- terminal execution
- file editing
- Python execution
- local automation
Repository
https://github.com/OpenInterpreter/open-interpreter
AutoGen
AutoGen is powerful because it treats agent systems as conversations among specialized participants.
That makes it strong for tasks where:
- one agent proposes a plan
- another executes it
- another critiques it
- the loop repeats until the result improves
Repository
https://github.com/microsoft/autogen
Emerging Combined Stack
A modern stack often combines several of these:
LLM
+ LangGraph
+ CrewAI
+ LlamaIndex
+ tools
The important shift is that these tools are moving AI away from answering questions and toward executing goals.
Topic 5 — Modern Agent Architecture
Most modern agent systems follow a layered architecture. That architecture is important because it helps explain why agents behave differently from ordinary chat systems.
A common mental model looks like this:
User / Goal
↓
Planner
↓
Agent Loop
↓
Tools
↓
Environment
↓
Memory
Goal Layer
Everything starts with a goal.
Example:
Identify compatible skylight panels for Regal Rib roofing and contact suppliers.
The agent needs to convert that goal into smaller tasks.
Planning Layer
The planning layer breaks a high-level objective into steps.
Example:
1. Identify Regal Rib panel profile
2. Search skylight panel manufacturers
3. Compare compatibility
4. Collect supplier contacts
5. Draft outreach email
Some systems plan once at the beginning. Others re-plan dynamically after each step.
Agent Execution Loop
The execution loop is the core engine.
Typical pattern:
Think
↓
Choose tool
↓
Execute
↓
Observe result
↓
Update memory
↓
Repeat
That loop is what makes the system feel agentic rather than conversational.
Tools Layer
Tools are what allow the agent to do things in the world.
Examples include:
| Tool | Purpose |
|---|---|
| web search | gather information |
| Python | calculations and analysis |
| browser | interact with websites |
| filesystem | read and write files |
| email | communicate externally |
Without tools, an agent is really just a chatbot with ambition. Tools turn it into an actor.
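One common way to wire tools into an agent is a registry plus a dispatcher that executes structured calls. The sketch below assumes the model emits JSON of the form `{"tool": name, "args": {...}}`; that format, the `tool` decorator, and the two example tools are illustrative choices, not any particular vendor's tool-calling protocol.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the agent can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(call_json: str) -> Dict[str, Any]:
    """Run a tool call of the (assumed) form {"tool": name, "args": {...}}."""
    call = json.loads(call_json)
    name = call.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**call.get("args", {}))}
    except TypeError as exc:  # wrong or missing arguments
        return {"ok": False, "error": str(exc)}


print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))
print(dispatch('{"tool": "launch_rocket", "args": {}}'))
```

Returning an error record instead of raising lets the agent loop feed the failure back to the model as an observation.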
Memory Layer
Memory is what allows multi-step continuity.
Two common forms:
Short-Term Memory
Used for:
- current context
- recent steps
- temporary task reasoning
Long-Term Memory
Used for:
- persistent knowledge
- saved facts
- execution histories
- vector retrieval
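The two forms of memory can be sketched with standard-library containers. Here short-term memory is a bounded `deque` that forgets the oldest steps, and long-term memory is a plain dictionary standing in for a database or vector store; the `AgentMemory` class and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class AgentMemory:
    def __init__(self, short_term_size: int = 3) -> None:
        # Short-term: only the most recent steps are kept.
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        # Long-term: facts saved explicitly; a dict stands in for real storage.
        self.long_term: Dict[str, str] = {}

    def observe(self, step: str) -> None:
        self.short_term.append(step)

    def remember(self, key: str, fact: str) -> None:
        self.long_term[key] = fact

    def context(self) -> str:
        """Build the text an agent might prepend to its next prompt."""
        recent = "\n".join(f"- {s}" for s in self.short_term)
        facts = "\n".join(f"- {k}: {v}" for k, v in self.long_term.items())
        return f"Recent steps:\n{recent}\nSaved facts:\n{facts}"


memory = AgentMemory(short_term_size=3)
for i in range(1, 6):
    memory.observe(f"step {i}")
memory.remember("roof_profile", "Regal Rib")
print(memory.context())
```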
Why This Architecture Matters
A chat system usually looks like this:
Prompt → Response
An agent system looks more like this:
Goal → Plan → Act → Evaluate → Repeat
That difference is the heart of agentic AI.
Topic 6 — Comparison of Major Agent Frameworks
Different frameworks solve different parts of the agent problem. Understanding that makes it easier to choose the right one.
Framework Categories
Most frameworks fall into a few broad categories.
| Category | Purpose |
|---|---|
| workflow orchestration | control execution flow |
| multi-agent coordination | organize teams of agents |
| knowledge systems | retrieval and reasoning |
| enterprise orchestration | reliability and integration |
LangGraph
Category: workflow orchestration
LangGraph is optimized for controlled, structured execution. It is best when the task requires:
- state tracking
- branching
- retries
- reliability
- persistence
Best use cases:
- automation pipelines
- research agents
- coding workflows
CrewAI
Category: multi-agent coordination
CrewAI is best when you want a clear mental model of specialized roles.
Best use cases:
- research and writing teams
- analysis pipelines
- delegated task systems
AutoGen
Category: agent conversation framework
AutoGen shines when you want agents to critique, debate, and refine results.
Best use cases:
- collaborative reasoning
- coding assistants
- iterative problem solving
Semantic Kernel
Category: enterprise orchestration
Semantic Kernel is strongest when integration with business systems matters.
Best use cases:
- enterprise copilots
- internal workflow automation
- business process tooling
LlamaIndex
Category: knowledge retrieval framework
LlamaIndex is designed to connect models with documents and knowledge stores.
Best use cases:
- document analysis
- research assistants
- internal knowledge systems
Haystack
Category: search and retrieval pipelines
Haystack is especially valuable when search is central to the system.
Best use cases:
- enterprise search
- knowledge discovery
- support automation
Capability Comparison Table
| Framework | Strength | Primary Role |
|---|---|---|
| LangGraph | workflow control | execution engine |
| CrewAI | agent teams | task collaboration |
| AutoGen | agent conversation | reasoning loops |
| Semantic Kernel | enterprise orchestration | production systems |
| LlamaIndex | knowledge retrieval | RAG systems |
| Haystack | search pipelines | document discovery |
Practical Rule of Thumb
| If you want… | Use |
|---|---|
| reliable workflows | LangGraph |
| teams of agents | CrewAI |
| agent debates | AutoGen |
| enterprise systems | Semantic Kernel |
| document reasoning | LlamaIndex |
| search-heavy pipelines | Haystack |
Topic 7 — Real-World Agent Workflow Example
A practical workflow helps tie all the abstractions together.
Suppose the system receives this goal:
Identify skylight panels compatible with Regal Rib roofing and contact suppliers.
This is a good example because it is not a single-answer question. It requires:
- research
- analysis
- memory
- drafting communication
- possibly taking action
Step 1 — Goal Intake
The request enters the system as a task:
Goal:
Find compatible skylight panels for Regal Rib roofing
and contact suppliers.
Step 2 — Planning
The planner decomposes the task:
1. Identify Regal Rib panel profile
2. Search skylight panel manufacturers
3. Compare rib spacing compatibility
4. Collect supplier contact information
5. Draft supplier outreach email
Step 3 — Research Phase
The agent uses tools such as:
- web search
- document retrieval
- image analysis
- specification lookup
Example:
Search:
"skylight panels compatible with 12 inch rib metal roofing"
Step 4 — Analysis Phase
The agent compares what it found.
Example reasoning:
Regal Rib spacing = 12 inches
Candidate compatible profiles:
- AG Panel
- PBR Panel
- R Panel
Step 5 — Supplier Identification
The system compiles a structured list:
| Supplier | Product Type |
|---|---|
| ABC Metal Roofing | skylight panel |
| Metal Sales | polycarbonate panel |
| Fabral | translucent panel |
Step 6 — Action Phase
Now the system can act.
Example:
Draft outreach email to suppliers
If the system is connected to an email capability, it could generate and send supplier messages, possibly with approval gates.
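An approval gate can be as small as a function that sits between the draft and the send. In this sketch `send_email` only records the message in a list, the recipient address is a placeholder, and `gated_send` is a hypothetical helper; the approver is passed in as a callable so it could be a person at a prompt or an automated policy.

```python
from typing import Callable, Dict, List

Email = Dict[str, str]
outbox: List[Email] = []


def send_email(email: Email) -> str:
    """Simulated send: records the message instead of contacting a mail server."""
    outbox.append(email)
    return f"sent to {email['to']}"


def gated_send(email: Email, approve: Callable[[Email], bool]) -> str:
    """Only call send_email if the approver accepts the draft."""
    if not approve(email):
        return "rejected: draft was not sent"
    return send_email(email)


draft = {
    "to": "sales@example.com",  # placeholder address
    "subject": "Skylight panel inquiry",
    "body": "Do you stock panels compatible with Regal Rib roofing?",
}

# In an interactive setting `approve` could prompt a person; fixed policies
# are used here so the sketch runs without input.
print(gated_send(draft, approve=lambda e: False))
print(gated_send(draft, approve=lambda e: e["to"].endswith("@example.com")))
```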
Step 7 — Evaluation
The agent checks progress:
Goal progress: 80%
Remaining tasks:
- confirm compatibility
- finalize supplier contact list
Workflow Summary
User Goal
↓
Planning
↓
Research Tools
↓
Analysis
↓
Action
↓
Evaluation
↓
Repeat
This is what distinguishes agentic AI from a normal chatbot.
Topic 8 — Where Agentic AI Is Going
The systems we have now are probably the first generation of practical AI agents. Several trends point toward where the next generation is going.
Persistent Agents
Most current agents are task-bounded:
User request → agent runs → task ends
Future agents are likely to run more continuously:
Agent runs continuously
↓
monitors environment
↓
detects events
↓
acts autonomously
Examples include:
- monitoring markets
- tracking infrastructure
- watching data feeds
- managing systems continuously
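The monitor, detect, act cycle can be sketched as a function over a stream of readings. A real persistent agent would be driven by a live feed or a scheduler; here a finite list stands in for the feed, and `watch`, `raise_alert`, and the threshold value are all hypothetical.

```python
from typing import Callable, Iterable, List


def watch(readings: Iterable[float], threshold: float,
          act: Callable[[int, float], str]) -> List[str]:
    """Scan a stream of readings and act whenever one exceeds the threshold."""
    actions: List[str] = []
    for tick, value in enumerate(readings):   # monitor the environment
        if value > threshold:                 # detect an event
            actions.append(act(tick, value))  # act on it
    return actions


def raise_alert(tick: int, value: float) -> str:
    return f"tick {tick}: alert, reading {value} above threshold"


feed = [0.2, 0.4, 0.95, 0.3, 1.1]
alerts = watch(feed, threshold=0.9, act=raise_alert)
for line in alerts:
    print(line)
```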
Hierarchical Agent Systems
Future systems are likely to involve agents coordinating other agents.
Example:
Manager Agent
↓
Research Agents
↓
Analysis Agents
↓
Execution Agents
This begins to resemble a digital organization.
Agent Operating Systems
An emerging idea is that AI becomes the orchestrator for software, rather than merely one feature inside software.
Example:
User intent
↓
AI OS
↓
Apps and APIs
↓
Real-world actions
Autonomous Research Agents
These systems can increasingly:
- read papers
- extract findings
- compare sources
- generate analyses
- propose conclusions
This is relevant to:
- science
- finance
- intelligence analysis
- technical research
Agent Economies
Some researchers are exploring ecosystems where agents exchange services or information.
Example:
Agent A needs data
↓
Agent B provides data
↓
Agent C performs analysis
This is still experimental, but conceptually important.
Self-Improving Agents
A future direction is agents that refine strategies over time:
Execute task
↓
Evaluate outcome
↓
Update strategy
↓
Try again
Key Long-Term Shift
The big transition is from:
Information tools
to
Autonomous systems
That is the deeper significance of agentic AI.
Topic 9 — Where an Agent Stack Actually Lives
One of the most confusing things for new builders is understanding where an “agent stack” actually exists.
It is usually not one thing in one place. Instead, it is spread across multiple layers.
Local Machine Layer
This is what lives on your PC.
Examples include:
- Python environment
- project files
- source code
- local databases
- config files
- installed frameworks
- Docker containers
- scripts and tools
This is the part you control directly.
Session / Runtime Layer
This exists only while the agent is running.
Examples include:
- current reasoning context
- temporary plans
- tool outputs from the current run
- in-memory state
- intermediate files
This often disappears unless you save it.
Persistent Storage Layer
This is what survives across runs.
Examples include:
- .env files
- SQLite or Postgres
- vector databases
- saved memory
- logs
- Git repositories
This is what gives the system continuity.
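The difference between runtime state and persistent state is easy to see with a JSON file. The sketch below simulates two separate agent runs that share one state file; the file name `agent_state.json`, the state keys, and the helper functions are assumptions for illustration, and a temporary directory is used so nothing is left behind.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path) -> Dict[str, Any]:
    """Return saved state, or an empty starting state on the first run."""
    if not path.exists():
        return {"runs": 0, "notes": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: Dict[str, Any]) -> None:
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def run_once(path: Path, note: str) -> Dict[str, Any]:
    """One simulated agent run: load, update, save."""
    state = load_state(path)
    state["runs"] += 1
    state["notes"].append(note)
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "agent_state.json"
    run_once(state_file, "first run")
    second = run_once(state_file, "second run")

print(second)
```

The second run sees the first run's note only because it was written to disk; anything held in local variables during the first run is gone.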
Cloud Services Layer
Many agent stacks also depend on external services.
Examples include:
- LLM APIs
- email APIs
- search APIs
- hosted databases
- cloud storage
These are the outside services your stack can call.
Mental Model
A useful metaphor is:
Your PC = workshop
Runtime = what is currently on the workbench
Persistent storage = the shelves and filing cabinets
Cloud APIs = outside vendors and tools
Building the Stack Layer by Layer
The easiest way to think about building an agent stack is in layers.
Layer 1 — Model Layer
This is the reasoning engine.
Examples:
- OpenAI
- Anthropic
- local model via Ollama
Question:
What will do the reasoning?
Layer 2 — Orchestration Layer
This manages the workflow.
Examples:
- plain Python
- LangGraph
- CrewAI
- AutoGen
Question:
How will tasks be coordinated?
Layer 3 — Tool Layer
This gives the agent actions.
Examples:
- web search
- filesystem access
- Python execution
- APIs
- browser automation
Question:
What can the agent actually do?
Layer 4 — Memory Layer
This determines what persists.
Examples:
- JSON
- SQLite
- Postgres
- vector DB
Question:
What should survive between runs?
Layer 5 — Interface Layer
This is how you interact with the system.
Examples:
- terminal
- chat UI
- web dashboard
- scheduled jobs
Question:
How will the user talk to the agent?
Can Agents Automatically Configure the Stack?
Partially, yes.
Modern coding agents can often help with:
- project scaffolding
- writing config files
- installing packages
- generating integration code
- debugging errors
But they still struggle with:
- OS-specific issues
- Windows path problems
- permissions
- authentication flows
- credential setup
- local environment quirks
So today the realistic model is usually:
AI-assisted setup, not fully hands-off autonomous setup.
Practical Build Order
A sensible progression is:
1. Pick one model
2. Build one simple loop
3. Add one or two tools
4. Add persistence
5. Add a nicer interface
6. Add multiple agents later
That is much safer than beginning with a giant multi-agent stack.
Topic 10 — Sample Walkthrough: Building a Small Personal Agent Stack
This section turns the ideas in the document into a practical walkthrough.
The purpose is not to build a massive production system immediately. The purpose is to show how a useful personal stack can evolve step by step.
We will use a phased progression that follows the build order described above:
Phase A — Minimal agent
Phase B — Add persistent memory
Phase C — Add a real external tool
Phase D — Expand into multi-agent orchestration
Suggested Project Layout
agent-stack/
├─ phase-a-agent.py
├─ phase-b-memory-agent.py
├─ phase-c-tool-agent.py
├─ agent_memory.db
└─ .env
Phase A — Minimal Agent
This phase creates the smallest useful agent.
Stack
Python
+
LLM API
+
simple reasoning loop
+
terminal interface
What It Does
This version can:
- receive a goal
- reason about the goal
- suggest a next step
Each run makes one model call and then exits; nothing is persisted between runs yet.
Architecture
User Goal
↓
Agent Loop
↓
LLM Reasoning
↓
Output
Runnable Example
# /// script
# dependencies = ["openai"]
# ///
from openai import OpenAI
client = OpenAI()
goal = input("Enter a goal for the agent: ").strip()
prompt = f"""
You are a simple AI agent.
Goal:
{goal}
Suggest the single best next step toward completing this goal.
Be concise and practical.
"""
response = client.responses.create(
    model="gpt-4.1",
    input=prompt,
)
print("\nAgent suggestion:\n")
print(response.output_text)
Run It
uv run phase-a-agent.py
Why This Matters
Even though it is tiny, this already demonstrates the core agent idea:
The system is reasoning about how to pursue a goal, not merely answering a question.
Phase B — Add Persistent Memory
The next step is to give the agent continuity.
Updated Stack
Python
+
LLM API
+
agent loop
+
SQLite memory
What Changes
Now the system can store:
- previous goals
- previous outputs
- notes from past runs
Architecture
User Goal
↓
Agent Loop
↓
LLM Reasoning
↓
Memory Store (SQLite)
↓
Output
Runnable Example
# /// script
# dependencies = ["openai"]
# ///
import sqlite3
from openai import OpenAI
client = OpenAI()
conn = sqlite3.connect("agent_memory.db")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        goal TEXT NOT NULL,
        note TEXT NOT NULL
    )
""")
conn.commit()
goal = input("Enter a goal for the agent: ").strip()
cursor.execute(
    "SELECT goal, note FROM memory ORDER BY id DESC LIMIT 5"
)
recent_items = cursor.fetchall()
recent_context = "\n".join(
    f"- Goal: {g}\n Note: {n}" for g, n in recent_items
) or "No prior memory."
prompt = f"""
You are a simple AI agent.
Current goal:
{goal}
Recent memory:
{recent_context}
Suggest the single best next step toward completing the current goal.
Be concise and practical.
"""
response = client.responses.create(
    model="gpt-4.1",
    input=prompt,
)
suggestion = response.output_text.strip()
cursor.execute(
    "INSERT INTO memory (goal, note) VALUES (?, ?)",
    (goal, suggestion)
)
conn.commit()
print("\nAgent suggestion:\n")
print(suggestion)
conn.close()
Run It
uv run phase-b-memory-agent.py
Why This Matters
This is the point where the system starts to feel less like a stateless prompt and more like a persistent assistant.
Phase C — Add a Real External Tool
Now we add a capability that allows the system to do something beyond thinking and remembering.
For a safe introductory example, the tool will draft and simulate sending an email rather than actually sending one. That keeps the example simple while still showing the mechanics.
Updated Stack
Python
+
LLM API
+
agent loop
+
SQLite memory
+
tool execution
Architecture
User Goal
↓
Agent Reasoning
↓
Draft Message
↓
Tool Call
↓
Stored Result
Runnable Example
# /// script
# dependencies = ["openai"]
# ///
import sqlite3
from openai import OpenAI
client = OpenAI()
conn = sqlite3.connect("agent_memory.db")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        goal TEXT NOT NULL,
        note TEXT NOT NULL
    )
""")
conn.commit()
def send_email(to: str, subject: str, body: str) -> str:
    """Simulated email tool: prints the message instead of sending it."""
    print("\n--- EMAIL TOOL (SIMULATED) ---")
    print("To:", to)
    print("Subject:", subject)
    print(body)
    print("--- END EMAIL TOOL ---\n")
    return f"Simulated email sent to {to} with subject '{subject}'"
goal = input("Enter a goal for the agent: ").strip()
cursor.execute(
    "SELECT goal, note FROM memory ORDER BY id DESC LIMIT 5"
)
recent_items = cursor.fetchall()
recent_context = "\n".join(
    f"- Goal: {g}\n Note: {n}" for g, n in recent_items
) or "No prior memory."
prompt = f"""
You are a practical AI agent.
Current goal:
{goal}
Recent memory:
{recent_context}
A tool is available:
send_email(to, subject, body)
If drafting an email would help move the goal forward, produce:
TO: ...
SUBJECT: ...
BODY:
...
Otherwise, produce:
NEXT_STEP: ...
Be practical and concise.
"""
response = client.responses.create(
    model="gpt-4.1",
    input=prompt,
)
decision = response.output_text.strip()
print("\nAgent output:\n")
print(decision)
if decision.startswith("TO:"):
    # Parse the TO / SUBJECT / BODY layout requested in the prompt.
    lines = decision.splitlines()
    to = lines[0].replace("TO:", "", 1).strip()
    # Guard against a reply that contains only the TO line.
    subject = lines[1].replace("SUBJECT:", "", 1).strip() if len(lines) > 1 else ""
    body_lines = []
    body_started = False
    for line in lines[2:]:
        if not body_started and line.startswith("BODY:"):
            body_started = True
            body_lines.append(line.replace("BODY:", "", 1).lstrip())
        elif body_started:
            body_lines.append(line)
    body = "\n".join(body_lines).strip()
    tool_result = send_email(to, subject, body)
    note = f"{decision}\n\nTool result: {tool_result}"
else:
    # No email requested: store the suggested next step as the note.
    note = decision
cursor.execute(
    "INSERT INTO memory (goal, note) VALUES (?, ?)",
    (goal, note)
)
conn.commit()
conn.close()
Run It
uv run phase-c-tool-agent.py
Why This Matters
At this point the system becomes an action-capable agent.
The conceptual shift is:
LLM reasoning
→ tool execution
That is the moment where the system stops being just a smart text generator and becomes something more operational.
Phase D — Expand the System
Only after the single-agent version works reliably does it make sense to expand into a larger stack.
Possible additions include:
LangGraph → workflow orchestration
CrewAI → specialized agent teams
vector databases → richer memory
web search tools → automated research
approval gates → human review before actions
The architecture might evolve into:
User Interface
↓
Agent Framework
↓
Planner Agent
↓
Worker Agents
↓
Tools and APIs
↓
Persistent Memory
Why This Build Order Is Important
Many people try to begin here:
LangGraph
CrewAI
AutoGen
vector DB
search API
browser tools
email APIs
But a more realistic and durable path is:
Python
+
LLM
+
one tool
Then grow the system gradually.
Core Takeaway
Most strong agent stacks do not begin as giant systems. They begin small, prove one capability at a time, and expand only after each layer works.
Final Thoughts
Agentic AI represents a shift from software that merely responds to software that can pursue goals, coordinate actions, and complete tasks.
The most useful way to understand this space is not just through hype or tool lists, but through a layered view:
- what agent systems exist
- how the frameworks differ
- how agent architecture works
- where the stack actually lives
- how to build a small version yourself
This document is intended to grow over time and can later be reorganized for the best final narrative flow.