How to Talk to Robots: A Guide to Interacting with AI (and Code)
Artificial Intelligence, LLM, Agents


The landscape of technology has undergone a seismic shift with the rise of AI-driven systems, particularly those powered by Large Language Models (LLMs). These models, such as OpenAI's GPT, Anthropic's Claude, and Google's Bard, have transformed how we interact with machines. Whether you're a developer seeking efficient ways to write code or a business user exploring AI-powered chatbots, understanding how to talk to these robots effectively can unlock immense potential.

This blog will dive into the nuts and bolts of LLMs, their applications, prompt strategies, tool integration, limitations, and frameworks like LangChain that simplify building complex workflows. By the end, you'll have actionable insights into interacting with AI systems, crafting better prompts, and overcoming common challenges.


What Are Large Language Models (LLMs)?

Large Language Models are AI systems trained on vast datasets to understand and generate human-like text. They work by predicting the next token (a word, part of a word, or a symbol) in a sequence, which allows them to perform tasks ranging from writing essays to assisting with coding.

Key Characteristics of LLMs:

  1. Pre-trained: LLMs are trained on enormous datasets and fine-tuned for specific tasks.
  2. Stateless: They don’t have long-term memory; they rely only on the context provided in the current session.
  3. Context Window: They can process only a limited number of tokens at a time, typically a few thousand, never unlimited (see the token-counting sketch below).
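
Because the context window is finite, it is often worth measuring how many tokens a prompt will consume before sending it. Here is a minimal sketch, assuming you are targeting an OpenAI model and have the tiktoken library installed:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Count how many tokens a piece of text consumes for a given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "What is the capital of France?"
print(f"{count_tokens(prompt)} tokens")  # small prompts fit comfortably in the context window
```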

Applications of LLMs

LLMs have versatile use cases across industries:

  • Text Generation: Crafting essays, marketing copy, or creative content.
  • Natural Language Understanding: Sentiment analysis, summarization, and text classification.
  • Code Assistance: Debugging, autocompletion, and explanation of code snippets.
  • Chatbots: Building customer service bots that simulate human-like conversations.
  • Search and Retrieval: Enhancing search engines by understanding user intent.

Popular LLM Providers

  1. Anthropic's Claude: Known for safety and nuanced responses.
  2. OpenAI Models: Flagship models like GPT-4 and GPT-3.5 Turbo, widely used for general-purpose tasks.
  3. Meta's Llama: Focused on open-source availability for customizable deployments.

Other notable options include Google's Gemini, OpenAI's Whisper for speech-to-text tasks, and Microsoft's Phi-2 for lightweight, on-device AI.


How Do LLMs Work?

From an application's point of view, interacting with a hosted LLM follows a simple client-server pattern:

  • Client: Sends a message (query or command) to the LLM.
  • LLM: Processes the message and returns a response.

Example:

Basic Text Completion:

```python
import openai
from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY from a .env file
client = openai.OpenAI()

res = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(res.choices[0].message.content)
```

Chat Context: To maintain a chat-like experience, developers use a chat history where all previous messages are appended as context:

```python
import openai
from dotenv import load_dotenv

load_dotenv()
client = openai.OpenAI()

messages = []  # running chat history, re-sent to the model on every turn
while True:
    user_input = input("You: ")
    if user_input.lower() in ["quit", "exit", "bye"]:
        print("Assistant: Goodbye!")
        break
    if user_input.strip() == "":
        continue

    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    reply = response.choices[0].message.content
    print(f"Assistant: {reply}")
    messages.append({"role": "assistant", "content": reply})
```

Prompt Strategies: Getting the Best Out of LLMs

Prompts are the instructions you provide to guide the model’s behavior. Well-crafted prompts improve the quality of responses significantly.

Common Strategies:

  1. Zero-Shot Prompting: Ask for the task directly, with no examples.

    • Example: Translate this text to French: "Hello, world."
  2. Few-Shot Prompting: Provide a few examples in the prompt to guide the model (see the sketch after this list).

    • Example: Convert temperatures: 0°C is 32°F; 100°C is ?
  3. Chain of Thought: Encourage step-by-step reasoning.

    • Example: A train travels 60 miles per hour for 2.5 hours. Calculate the distance. Let's think step by step.
  4. Structured Responses: Use prompts that enforce an output structure.

    • Example: Provide a JSON object with the following details: { "title": "", "author": "", "summary": "" }
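
To make these strategies concrete, here is a minimal sketch, reusing the same openai client setup as the earlier examples, that sends a few-shot prompt and a structured-output prompt as plain chat messages (the book title is just a placeholder):

```python
import openai
from dotenv import load_dotenv

load_dotenv()
client = openai.OpenAI()

# Few-shot: show the pattern before asking the real question.
few_shot = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Convert temperatures from Celsius to Fahrenheit."},
        {"role": "user", "content": "0°C"},
        {"role": "assistant", "content": "32°F"},
        {"role": "user", "content": "100°C"},
    ],
)
print(few_shot.choices[0].message.content)

# Structured response: ask for JSON so the output is easy to parse downstream.
structured = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": 'Summarize the book "Dune" as a JSON object with keys "title", "author", and "summary".',
    }],
)
print(structured.choices[0].message.content)
```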

Augmenting LLMs with Tools

While LLMs are powerful, they can be further enhanced by integrating tools to overcome limitations. For instance, they might use:

  • Web Search: Fetching live data.
  • Database Queries: Retrieving structured information.
  • IoT Integration: Controlling smart devices.

Example:

Showing the current time:

```python
import json
import time

import openai
from dotenv import load_dotenv

load_dotenv()
client = openai.OpenAI()

messages = [{"role": "system", "content": "You are a chat assistant. You can answer questions and provide information."}]

# Describe the tool so the model knows when (and how) to call it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
]

def get_current_time() -> str:
    """Return the current date and time as a string."""
    return time.strftime("%A, %m/%d/%Y, %H:%M:%S")

print("ASK: What is the current time?")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["quit", "exit", "bye"]:
        print("Assistant: Goodbye!")
        break
    if user_input.strip() == "":
        continue

    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    message = response.choices[0].message

    if message.tool_calls:
        # The model asked us to run one or more tools before answering.
        messages.append(message.model_dump())
        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            print(f"Calling function: {function_name} {function_args}")
            result = globals()[function_name](**function_args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": str(result),
            })
        # Send the tool results back so the model can compose its final answer.
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

    reply = response.choices[0].message.content
    print(f"Assistant: {reply}")
    messages.append({"role": "assistant", "content": reply})
```

Here, the model decides to call get_current_time, the client executes it locally, and the result is fed back so the model can answer the user's question.


Overcoming LLM Limitations

Despite their utility, LLMs have limitations:

  1. Context Window: They can only process a limited number of tokens.
  2. Hallucinations: Models sometimes generate incorrect but plausible-sounding information, especially about facts outside their training data.

Solution: Retrieval-Augmented Generation (RAG)

RAG mitigates these issues by combining LLMs with a Vector Database:

  • Store relevant documents as embeddings.
  • Retrieve the most relevant ones for the LLM to use.

Workflow:

  1. Query → Vector DB retrieves relevant documents.
  2. Documents + Query → LLM generates a more informed response (see the sketch below).
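
The sketch below illustrates this workflow in miniature; it skips a real vector database and instead embeds a handful of in-memory placeholder documents with OpenAI's embeddings endpoint, then hands the best match to the chat model:

```python
import numpy as np
import openai
from dotenv import load_dotenv

load_dotenv()
client = openai.OpenAI()

documents = [
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of the return being received.",
    "Premium subscribers get access to the beta features page.",
]

def embed(texts):
    """Embed a list of texts with OpenAI's embedding endpoint."""
    res = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in res.data])

doc_vectors = embed(documents)

query = "How long do refunds take?"
query_vector = embed([query])[0]

# Retrieve the most relevant document by cosine similarity.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best_doc = documents[int(np.argmax(scores))]

# Give the retrieved context to the LLM along with the question.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using only this context: {best_doc}"},
        {"role": "user", "content": query},
    ],
)
print(answer.choices[0].message.content)
```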

LangChain: Abstracting LLM Workflows

LangChain is an open-source framework that simplifies building applications powered by LLMs by chaining together prompts, tools, and workflows.

LangChain Features:

  1. Chains: Sequential steps like question answering → translation → audio conversion (a small chaining sketch follows this list).
  2. Agents: Dynamically decide which tools to invoke based on user input.
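
As a taste of the chaining idea, here is a minimal sketch using LangChain's expression language, assuming the langchain-openai and langchain-core packages are installed, that answers a question and then translates the answer:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-3.5-turbo")

# Step 1: answer the question.
answer_prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
# Step 2: translate the answer, feeding in the output of step 1.
translate_prompt = ChatPromptTemplate.from_template("Translate this into French: {text}")

answer_chain = answer_prompt | llm | StrOutputParser()
translate_chain = translate_prompt | llm | StrOutputParser()

answer = answer_chain.invoke({"question": "What is the capital of France?"})
print(translate_chain.invoke({"text": answer}))
```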

Example: Multi-Agent Collaboration

Agents can work together in parallel workflows:

  1. One agent fetches live weather data.
  2. Another translates the data into the user's language.

Challenges and Opportunities

  1. Challenges:

    • Handling different client behaviors.
    • Balancing cost and performance in high-load scenarios.
  2. Opportunities:

    • Full control over model parameters.
    • Building innovative workflows with custom agents and tools.

Future of LLMs

The field evolves rapidly:

  • Multi-Context Protocols: Allowing broader context handling.
  • Lightweight Models: Efficient models like Phi-2 enable on-device and mobile deployment.

As the technology matures, understanding how to effectively "talk to robots" will remain a crucial skill for developers, researchers, and businesses alike.


Conclusion

Interacting with AI through LLMs requires more than just a technical understanding; it involves crafting effective prompts, leveraging tools, and addressing limitations. Whether you're building the next chatbot or exploring novel use cases, mastering these concepts will help you unlock the full potential of AI systems.

Remember: AI is a tool, and its value lies in how well we wield it. So, the next time you talk to a robot, think carefully about what you ask—it might just surprise you with the answer.
