How to Get Started in AI: Building an AI Agent from Scratch

Posted by: Bhanu Chaddha. Posted on: October 12, 2024.

Artificial intelligence (AI) agents are revolutionizing software development by allowing developers to offload complex, multi-step processes to a system that can reason, act, and respond autonomously. These agents can interact with tools, perform calculations, and automate decision-making in ways that make software more efficient and scalable. One key aspect of modern AI agents is their ability to reason through problems and choose which tools to use to solve them.

In my previous post, “AI vs Traditional Application Development”, I discussed how artificial intelligence (AI) is shifting the landscape of software development. Unlike traditional systems where developers explicitly program every decision path, AI-driven systems, such as agents, can reason and act autonomously. This capability frees developers from writing exhaustive code for every potential scenario and allows agents to dynamically handle complex tasks.

Continuing from that post, this article dives deeper into how to build an AI agent from scratch. This agent will be able to reason through tasks, select the appropriate tools, and provide meaningful output based on user queries. By automating decision-making processes, AI agents can significantly modernize the software development process.

In this article, we will cover:

  1. Tool Integration: Defining the tools the agent can use.
  2. Task Breakdown: Enabling the agent to reason about and decompose tasks.
  3. Execution: Allowing the agent to select and use the appropriate tool or function.
  4. Response Generation: How the agent outputs structured, meaningful responses based on its actions.

Finally, we will create an AI agent that calls these tools in a loop to achieve the desired result.

Step 1: Defining Core Functions

To start, we define two essential functions that our agent will use to solve user queries. The first function calculates the distance between cities, and the second estimates the time it will take to travel a given distance using a specific vehicle.

Distance Calculation

This function returns predefined distances between pairs of cities. The agent will use this data to calculate travel times based on user input.

def distance_between(city_a, city_b):
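    # Predefined lookup table. Each stored value is one tenth of the real-world
    # distance noted in the trailing comment; the agent reads the stored value as km.
    distances = {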
        ("Paris", "London"): 34,   # 340 km
        ("Paris", "New York"): 584,  # 5840 km
        ("London", "New York"): 560,  # 5600 km
        ("London", "Tokyo"): 955,  # 9550 km
        ("New York", "Tokyo"): 1084,  # 10840 km
        ("Paris", "Tokyo"): 971  # 9710 km
    }

    city_pair = (city_a, city_b)
    reverse_pair = (city_b, city_a)

    if city_pair in distances:
        return distances[city_pair]
    elif reverse_pair in distances:
        return distances[reverse_pair]
    else:
        return -1  # Not found
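
As a design note, the two-way check above can also be expressed with order-independent keys. The following is a minimal sketch, not the article's implementation: `DISTANCES` and `distance_between_sketch` are hypothetical names, and only two of the stored pairs are copied for brevity.

```python
# Hypothetical variant of the lookup: frozenset keys ignore city order,
# so one dictionary lookup covers both (a, b) and (b, a).
DISTANCES = {
    frozenset(("Paris", "London")): 34,   # stored values copied from the table above
    frozenset(("Paris", "Tokyo")): 971,
}


def distance_between_sketch(city_a, city_b):
    # Trim stray spaces, since tool arguments may arrive as "Paris, Tokyo".
    key = frozenset((city_a.strip(), city_b.strip()))
    return DISTANCES.get(key, -1)  # -1 mirrors the "not found" convention


print(distance_between_sketch("Tokyo", "Paris"))   # 971
print(distance_between_sketch("Paris", "Rome"))    # -1
```

The trade-off is that a pair of identical cities collapses to a one-element set, which this sketch simply reports as not found.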

Time Calculation

This function calculates the time it will take to travel a specified distance using a particular vehicle (cycle, car, train, or plane).

def time_taken_by_vehicle(vehicle, distance_km):
    # Average speeds in km/h, so distance_km / speed yields hours
    vehicle_speeds = {
        "cycle": 15,
        "car": 100,
        "train": 150,
        "plane": 900
    }

    try:
        distance_km = float(distance_km)
    except (TypeError, ValueError):  # non-numeric or missing distance
        return -1

    if vehicle in vehicle_speeds:
        speed = vehicle_speeds[vehicle]
        time_hours = distance_km / speed
        return round(time_hours, 2)
    else:
        return -1  # Unknown vehicle
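
To make the arithmetic concrete, here is a small check of the formula time = distance / speed using the speeds listed above. `hours_for` is a hypothetical helper used only for this illustration; it accepts the distance as a string because the agent passes tool arguments as text.

```python
# Speeds copied from vehicle_speeds above, read as km/h.
speeds_kmh = {"cycle": 15, "car": 100, "train": 150, "plane": 900}


def hours_for(vehicle, distance_km):
    # float() accepts both numbers and numeric strings such as "971".
    return round(float(distance_km) / speeds_kmh[vehicle], 2)


print(hours_for("cycle", "971"))   # 971 / 15   = 64.733... -> 64.73
print(hours_for("train", 1005))    # 1005 / 150 = 6.7
```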

Known Actions

These two functions are part of the agent’s “toolbox.” The agent can call these functions dynamically based on the user’s request.

known_actions = {
    "distance_between": distance_between,
    "time_taken_by_vehicle": time_taken_by_vehicle
}
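
The idea behind such a registry is name-based dispatch: the model emits a tool name and a comma-separated argument string, and the program looks the name up and calls the function. The sketch below illustrates this with two stand-in tools; `add`, `shout`, `actions`, and `dispatch` are hypothetical names that do not appear in the article's code.

```python
# Stand-in tools, used only to show the dispatch pattern.
def add(a, b):
    return float(a) + float(b)


def shout(text):
    return text.upper()


actions = {"add": add, "shout": shout}


def dispatch(name, raw_input):
    # Reject names that are not registered instead of calling arbitrary code.
    if name not in actions:
        raise KeyError(f"Unknown action: {name}")
    # Arguments arrive as one string, e.g. "2, 3"; split and trim them.
    args = [part.strip() for part in raw_input.split(",")]
    return actions[name](*args)


print(dispatch("add", "2, 3"))   # 5.0
print(dispatch("shout", "hi"))   # HI
```

Keeping the registry explicit means the model can only trigger functions that were deliberately exposed.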

Step 2: Building the AI Agent

How to Get Started in AI: Building an AI Agent from Scratch Diagram

Now, we move on to constructing the Agent class and the query function. Together they receive user queries, send them to the model, and call the available tools to produce an output.

System Message

The agent uses a system message to guide its reasoning process. It defines how the agent will loop through Thought, Action, and Observation to reach a result.

system_message="""
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:

distance_between:
e.g. distance_between: a,b
returns the distance between city a and b in km

time_taken_by_vehicle:
e.g. time_taken_by_vehicle: cycle,5
returns the time taken to travel the given distance by the provided vehicle

Example Session:
Question: What is the distance between City1 and City2?
Thought: I can use distance_between to look up the distance between City1 and City2.
Action: distance_between: City1,City2
PAUSE

Observation: 5

You then output:

Answer: 5 km

""".strip()

Agent Class

The Agent class takes the input and reasons through it, guided by the system message. It keeps the history of all messages and sends the complete history to the LLM on every call.

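from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

class Agent: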
    def __init__(self, system_message):
        self.system_message = system_message
        self.all_messages = []
        if system_message:
            self.all_messages.append({"role": "system", "content": system_message})

    def __call__(self, message):
        self.all_messages.append({"role": "user", "content": message})
        response = self.call_llm()
        self.all_messages.append({"role": "assistant", "content": response})
        return response

    def call_llm(self):
        chat_completion = client.chat.completions.create(
            model="gpt-4o",
            temperature=0,
            messages=self.all_messages
        )
        result = chat_completion.choices[0].message.content
        return result
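
To see how the message history grows without calling a real model, the following sketch swaps the LLM call for an injected function. `HistoryAgent` and `echo_llm` are hypothetical names introduced for this illustration only; the append-call-append pattern is the same as in the Agent class above.

```python
class HistoryAgent:
    """Same bookkeeping as Agent, but the model call is injected."""

    def __init__(self, system_message, llm):
        self.llm = llm
        self.all_messages = []
        if system_message:
            self.all_messages.append({"role": "system", "content": system_message})

    def __call__(self, message):
        self.all_messages.append({"role": "user", "content": message})
        response = self.llm(self.all_messages)  # the model sees the full history
        self.all_messages.append({"role": "assistant", "content": response})
        return response


def echo_llm(messages):
    # Stand-in for a model: reports how many messages it received.
    return f"seen {len(messages)} messages"


agent = HistoryAgent("You are a test.", echo_llm)
first = agent("hello")             # history at call time: system, user
second = agent("Observation: 5")   # history at call time: system, user, assistant, user
print(first, "|", second)
```

Because every turn resends the whole list, the model can refer back to earlier thoughts and observations, at the cost of a prompt that grows with each turn.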

Step 3: The query Function: Representing the Agent

Now, let’s introduce the query function, which serves as a loop that represents how the agent operates. It processes the user’s question, interacts with the agent, and executes any actions returned by the agent.

What Happens in the query Function

The query function:

  • Initializes the agent and takes a user question.
  • Processes the agent’s response to extract the required action.
  • Executes the action by calling the appropriate function (e.g., distance_between or time_taken_by_vehicle).
  • Loops through these steps until the agent resolves the query or hits the maximum number of turns.
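
The extraction step in the second bullet can be sketched in isolation. The example below assumes the model follows the "Action: name: arguments" format from the system message; `sample_response` is an invented reply used only for illustration.

```python
import re

# One action per line, in the form "Action: <tool name>: <comma-separated arguments>"
action_re = re.compile(r"Action:\s*(\w+):\s*(.+)")

sample_response = (
    "Thought: I need the distance first.\n"
    "Action: distance_between: Paris,Tokyo\n"
    "PAUSE"
)

# match() anchors at the start of each line, so only lines that begin with
# "Action:" are picked up.
matches = [m for m in (action_re.match(line) for line in sample_response.split("\n")) if m]

name, raw_args = matches[0].groups()
print(name)       # distance_between
print(raw_args)   # Paris,Tokyo
```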

query Code Example

import re

# Matches lines such as "Action: distance_between: Paris,Tokyo"
action_re = re.compile(r"Action:\s*(\w+):\s*(.+)")

def query(question, max_turns=5):
    i = 0
    bot = Agent(system_message)
    next_prompt = question
    while i < max_turns:
        i += 1
        result = bot(next_prompt)
        print(result)

        # Find all actions in the result
        actions = []
        for line in result.split('\n'):
            match = action_re.match(line)
            if match:
                actions.append(match)

        if actions:
            # Extract the first action and input
            action, raw_input = actions[0].groups()

            if action not in known_actions:
                raise Exception(f"Unknown action: {action}: {raw_input}")

            # Split the raw input (e.g., 'cycle,5') by commas and trim stray spaces
            action_input = [arg.strip() for arg in raw_input.split(',')]

            # Pass the arguments to the action
            print(f" -- running {action} with input {action_input}")
            if len(action_input) == 1:
                observation = known_actions[action](action_input[0])  # Single parameter
            else:
                observation = known_actions[action](*action_input)  # Multiple parameters

            print("Observation:", observation)

            # Update the next prompt for the bot
            next_prompt = f"Observation: {observation}"
        else:
            print("DEBUG: No action found. Exiting")
            # No more actions, end the loop
            return
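
The same loop can be exercised offline by replacing the model with scripted replies. This is a sketch under stated assumptions, not the article's code: `fake_llm`, `SCRIPT`, `TOOLS`, `double`, `find_action`, and `run` are hypothetical names, and the scripted replies stand in for what a model might return.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+):\s*(.+)")


def double(x):
    return float(x) * 2


TOOLS = {"double": double}

# Scripted replies standing in for the model, consumed one per turn.
SCRIPT = iter([
    "Thought: I should double 21.\nAction: double: 21\nPAUSE",
    "Answer: 42.0",
])
prompts_seen = []


def fake_llm(prompt):
    prompts_seen.append(prompt)
    return next(SCRIPT)


def find_action(reply):
    # Return (tool name, raw arguments) for the first Action line, else None.
    for line in reply.split("\n"):
        match = ACTION_RE.match(line)
        if match:
            return match.groups()
    return None


def run(question, max_turns=5):
    prompt = question
    for _ in range(max_turns):
        reply = fake_llm(prompt)
        action = find_action(reply)
        if action is None:
            return reply  # no action requested: treat the reply as final
        name, raw = action
        args = [arg.strip() for arg in raw.split(",")]
        prompt = f"Observation: {TOOLS[name](*args)}"  # feed the result back
    return None  # turn limit reached without a final answer


final = run("What is 21 doubled?")
print(final)          # Answer: 42.0
print(prompts_seen)   # ['What is 21 doubled?', 'Observation: 42.0']
```

Returning the final reply, rather than only printing it, makes the loop easier to test and to reuse from other code.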

Step 4: Running the Agent

Query:

query("how much time it will take to travel between Paris and Tokyo via cycle")

Output:

Thought: To find out the time it will take to travel between Paris and Tokyo via cycle, I first need to determine the distance between these two cities. Then, I can use the time_taken_by_vehicle action to calculate the time based on that distance and the speed of a cycle.
Action: distance_between: Paris,Tokyo
PAUSE
 -- running distance_between with input ['Paris', 'Tokyo']
Observation: 971
Thought: Now that I have the distance between Paris and Tokyo, which is 971 km, I can calculate the time it will take to travel this distance by cycle using the time_taken_by_vehicle action.
Action: time_taken_by_vehicle: cycle,971
PAUSE
 -- running time_taken_by_vehicle with input ['cycle', '971']
Observation: 64.73
Answer: It will take approximately 64.73 hours to travel between Paris and Tokyo via cycle.
DEBUG: No action found. Exiting

Query:

query("how much time it will take to travel from Tokyo to London via Paris on cycle")

Output:

Thought: To find the total time taken to travel from Tokyo to London via Paris on a cycle, I need to first find the distance from Tokyo to Paris, then from Paris to London. After that, I will calculate the time taken for each leg of the journey using a cycle and sum them up.

Action: distance_between: Tokyo,Paris
PAUSE
 -- running distance_between with input ['Tokyo', 'Paris']
Observation: 971
Action: distance_between: Paris,London
PAUSE
 -- running distance_between with input ['Paris', 'London']
Observation: 34
Action: time_taken_by_vehicle: cycle,971
PAUSE
 -- running time_taken_by_vehicle with input ['cycle', '971']
Observation: 64.73
Action: time_taken_by_vehicle: cycle,34
PAUSE
 -- running time_taken_by_vehicle with input ['cycle', '34']
Observation: 2.27
Answer: The total time taken to travel from Tokyo to London via Paris on a cycle is approximately 67 hours.
DEBUG: No action found. Exiting
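
The final figure in this run can be checked by hand. The sketch below recomputes it two ways from the observations above: summing the per-leg times, and dividing the summed distance by the cycle speed. The variable names are introduced here for illustration.

```python
cycle_kmh = 15
legs_km = [971, 34]  # Tokyo-Paris and Paris-London, as returned in the run above

# Per-leg times, rounded the same way time_taken_by_vehicle rounds them.
per_leg_hours = [round(km / cycle_kmh, 2) for km in legs_km]
total_from_legs = round(sum(per_leg_hours), 2)

# Alternative: one calculation on the total distance.
total_from_sum = round(sum(legs_km) / cycle_kmh, 2)

print(per_leg_hours)      # [64.73, 2.27]
print(total_from_legs)    # 67.0
print(total_from_sum)     # 67.0
```

Both routes agree here. In general, rounding each leg before summing can differ slightly from rounding once at the end.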

Query:


query("how much time it will take to travel from Tokyo to London via Paris. I can travel via plane or train. calculate time and give me conparison.",10)

Output:

Thought: To calculate the travel time from Tokyo to London via Paris, I need to find the distances between Tokyo and Paris, and Paris and London. Then, I will calculate the time taken by both plane and train for these distances.

Action: distance_between: Tokyo,Paris
PAUSE
 -- running distance_between with input ['Tokyo', 'Paris']
Observation: 971
Thought: The distance from Tokyo to Paris is 971 km. Now, I need to find the distance from Paris to London.

Action: distance_between: Paris,London
PAUSE
 -- running distance_between with input ['Paris', 'London']
Observation: 34
Thought: The distance from Paris to London is 34 km. Now, I will calculate the time taken to travel these distances by plane and train.

First, I will calculate the time taken by plane for the total distance (Tokyo to Paris to London).

Action: time_taken_by_vehicle: plane,1005
PAUSE
 -- running time_taken_by_vehicle with input ['plane', '1005']
Observation: 1.12
Thought: The time taken to travel from Tokyo to London via Paris by plane is 1.12 hours. Now, I will calculate the time taken by train for the same route.

Action: time_taken_by_vehicle: train,1005
PAUSE
 -- running time_taken_by_vehicle with input ['train', '1005']
Observation: 6.7
Answer: The time taken to travel from Tokyo to London via Paris is approximately 1.12 hours by plane and 6.7 hours by train.
DEBUG: No action found. Exiting

Conclusion: Agents and Automation in Software Development

This article showed how an AI agent can reason, call tools, and iterate in a loop to answer user queries. The query function drives that loop: it sends the conversation to the model, runs the requested tool, and feeds the observation back until the model produces an answer. This loop is at the core of how AI agents can automate parts of software development.

In future articles, we will explore how to simplify this process using LangGraph, making it easier to build and deploy agents like the one we’ve created here.
