Databricks AI Agent Playbook
Purpose of the Playbook
This playbook guides GTM teams and technical stakeholders in building, training, and deploying LLM-powered agents inside Databricks that deliver measurable business outcomes. The document provides a structured framework to ensure agents are properly configured with the right tools, context, and guardrails to solve real-world data challenges.
Please use the companion GitHub repository - NBA-AI-AGENT
Scope
The playbook focuses on building agents that interact with:
Structured data: SQL tables, Delta Lake tables, Iceberg tables
Semi-structured data: JSON, text documents, notebooks, markdown files
Business processes: Scheduled workflows, operational monitoring, insight generation
Human-in-the-loop systems: Collaborative analysis and decision support
Want to learn in podcast form? I put my notes into NotebookLM and created this podcast for easy listening - Click Here
Prerequisites
Technical Environment
Databricks workspace (E2 or higher tier recommended)
Databricks Runtime 14.0+ with ML libraries
Unity Catalog enabled workspace for centralized governance
SQL warehouse (for SQL Toolkit functionality)
Model serving endpoint (for custom/external LLM integration)
Access Requirements
Permissions: Workspace admin or contributor access
Compute Access: Ability to create/manage clusters
Data Access: Access to relevant tables via Unity Catalog
API Access: Token generation privileges (for REST operations)
Knowledge Requirements
Basic understanding of LLMs and prompt engineering
Familiarity with Databricks workspace navigation
SQL querying capabilities
Python programming experience (for advanced configurations)
Key Concepts
Agent Architecture
An agent within Databricks is a system that can:
Reason: Understand natural language requests and translate to technical actions
Plan: Break complex tasks into manageable steps before execution
Act: Execute operations within the Databricks environment using attached tools
Learn: Improve over time through feedback and pattern recognition
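Taken together, these four capabilities form a control loop: the model reasons about the request, picks a tool, observes the result, and folds the observation back into its context. The sketch below is purely illustrative; `call_llm` and `TOOLS` are hypothetical placeholders standing in for a model serving endpoint and the toolkits described in the next section.

```python
# Illustrative reason-plan-act loop (not a Databricks API).
# `call_llm` and `TOOLS` are hypothetical placeholders for a model
# serving endpoint and the attached toolkits described below.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {
    "sql": lambda query: "<rows returned by a SQL warehouse>",   # placeholder
    "docs": lambda query: "<passages from a knowledge base>",    # placeholder
}

def call_llm(prompt: str) -> dict:
    """Hypothetical wrapper around a model serving endpoint."""
    # A real implementation would call a Databricks serving endpoint here.
    return {"action": "sql", "input": "SELECT 1", "final": None}

def run_agent(user_request: str, max_steps: int = 5) -> str:
    context = f"User request: {user_request}"
    for _ in range(max_steps):
        decision = call_llm(context)            # Reason and plan the next step
        if decision.get("final"):               # Agent is ready to answer
            return decision["final"]
        tool = TOOLS.get(decision["action"])    # Act with an attached tool
        observation = tool(decision["input"]) if tool else "unknown tool"
        context += f"\nObservation: {observation}"  # Learn within the session
    return "Stopped after reaching the step limit."
```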
Toolkits
Toolkits are the interfaces the agent uses to interact with Databricks resources (a SQL example follows the list):
SQL Toolkit: Query tables, views, and execute SQL commands
Notebook Toolkit: Run notebooks, execute Python/Scala/R code
REST Toolkit: Make API calls to external or internal services
Document Toolkit: Search and retrieve from knowledge bases
Visualization Toolkit: Generate charts, graphs, and dashboards
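As a concrete example of the SQL Toolkit, the sketch below wraps a Databricks SQL warehouse query as a callable tool using the databricks-sql-connector package. The connection values are read from environment variables and the table name is illustrative; how the function is registered as a tool depends on the agent framework in use.

```python
# Minimal sketch of a SQL tool backed by a Databricks SQL warehouse,
# using the databricks-sql-connector package. Connection values and the
# queried table are placeholders; adapt registration to your framework.
import os
from databricks import sql

def run_sql_tool(query: str) -> list[dict]:
    """Execute a read-only SQL query and return rows as dictionaries."""
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],   # workspace hostname
        http_path=os.environ["DATABRICKS_HTTP_PATH"],    # SQL warehouse HTTP path
        access_token=os.environ["DATABRICKS_TOKEN"],     # personal access token
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            columns = [desc[0] for desc in cursor.description]
            return [dict(zip(columns, row)) for row in cursor.fetchall()]

# Example usage inside an agent: the tool description tells the LLM when to call it.
rows = run_sql_tool("SELECT player, pts FROM nba.seasons_stats LIMIT 5")
```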
Memory Systems
Agents maintain three types of memory, combined in the sketch after this list:
Short-term Memory: Chat history and immediate context (typically 8K-32K tokens)
Long-term Memory: Indexed knowledge bases (vector stores, metadata repositories)
Structured Memory: Database connections, table metadata, schema information
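A lightweight way to combine these layers is a bounded buffer for short-term history plus a retrieval call for long-term knowledge, as in the sketch below. `search_vector_index` is a hypothetical helper standing in for Databricks Vector Search or another retriever; structured memory (schemas, table metadata) typically arrives through Unity Catalog rather than code like this.

```python
# Sketch of a two-layer memory: a bounded short-term buffer plus
# long-term retrieval. `search_vector_index` is a hypothetical helper
# standing in for Databricks Vector Search or another retriever.
from collections import deque

class AgentMemory:
    def __init__(self, max_turns: int = 20):
        self.short_term = deque(maxlen=max_turns)  # recent chat turns only

    def remember(self, role: str, message: str) -> None:
        self.short_term.append({"role": role, "content": message})

    def recall(self, query: str, k: int = 3) -> dict:
        """Return recent context plus relevant long-term knowledge."""
        long_term = search_vector_index(query, num_results=k)  # hypothetical retriever
        return {"recent_turns": list(self.short_term), "knowledge": long_term}

def search_vector_index(query: str, num_results: int = 3) -> list[str]:
    """Placeholder for a vector store lookup (e.g., Databricks Vector Search)."""
    return []
```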
Feedback Loop
The feedback loop is critical for agent effectiveness; a logging sketch follows the list:
Clarification Questions: Agent confirms understanding before major actions
Progressive Disclosure: Agent reveals information in digestible segments
Continuous Learning: Human feedback updates agent behavior over time
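Continuous learning only works if feedback is captured somewhere durable. A minimal approach, sketched below, appends each rating to a Delta table that later analysis or instruction updates can draw on; the table name and schema are illustrative, and `spark` is the SparkSession available in a Databricks notebook.

```python
# Sketch: persist user feedback to a Delta table for later review.
# The table name and schema are illustrative; `spark` is the active
# SparkSession available in a Databricks notebook.
from datetime import datetime, timezone

def log_feedback(spark, question: str, answer: str, rating: int, notes: str = "") -> None:
    """Append a single feedback record to a Delta table."""
    record = [(datetime.now(timezone.utc).isoformat(), question, answer, rating, notes)]
    columns = ["logged_at", "question", "answer", "rating", "notes"]
    (spark.createDataFrame(record, columns)
          .write.format("delta")
          .mode("append")
          .saveAsTable("agent_ops.feedback_log"))
```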
Playbook Steps
Define the Agent's Purpose and Scope
Business Problem Mapping
Document the specific business challenges the agent will solve:
Example 1: NBA Player Performance Agent - (Click here for GitHub data)
PRIMARY PURPOSE: Help coaches and analysts evaluate player performance metrics and identify optimal lineup combinations based on historical data.
KEY QUESTIONS THE AGENT SHOULD ANSWER:
1. Which players have the best efficiency ratings for their position?
2. How has Player X's performance changed over their career?
3. Which player combinations yield the highest win percentage?
4. What statistical areas should Player Y focus on improving?
5. Which college produces the most successful players at position Z?
Success Metrics
Define how the agent's performance will be measured (a scoring sketch follows the list):
Accuracy: Correctness of information provided (target: >95%)
Relevance: Appropriate response to queries (target: >90%)
Resolution Rate: Percentage of queries resolved without human escalation (target: >80%)
User Satisfaction: Feedback scores from end users (target: >4.5/5)
Time Savings: Reduction in time spent on analysis tasks (target: >30%)
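These targets are easiest to track when computed from a graded evaluation log rather than estimated. The sketch below assumes a pandas DataFrame of scored interactions with hypothetical `is_correct`, `is_relevant`, `escalated`, and `satisfaction` columns; this is not a Databricks-defined schema.

```python
# Sketch: score an agent against the targets above from a graded
# evaluation log. The DataFrame columns are assumptions, not a
# Databricks-defined schema.
import pandas as pd

def score_agent(evals: pd.DataFrame) -> dict:
    """Compute accuracy, relevance, resolution rate, and satisfaction."""
    return {
        "accuracy": evals["is_correct"].mean(),             # target > 0.95
        "relevance": evals["is_relevant"].mean(),           # target > 0.90
        "resolution_rate": 1 - evals["escalated"].mean(),   # target > 0.80
        "avg_satisfaction": evals["satisfaction"].mean(),   # target > 4.5 / 5
    }
```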
Knowledge Assets
Document reference materials the agent should use:
Proprietary analysis methodologies
Business glossaries and terminology guides
Standard operating procedures
Common analysis patterns
Playbook Steps - Attach Toolkits
Function Toolkit Implementation
When building agents that need to perform complex data analysis or specialized tasks, implementing custom functions is essential. These functions encapsulate business logic and ensure consistent, reliable outputs; the sketch after the checklist below applies the practices to a single example function.
Function Implementation Best Practices
Clear Documentation: Include detailed docstrings for each function
Parameter Validation: Check inputs before processing
Structured Output Format: Return consistent data structures
Error Handling: Handle edge cases gracefully
Performance Optimization: Write efficient code, especially for data operations
Function Grouping: Organize related functions together in modules
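The sketch below applies these practices to one illustrative function. The column names (`Player`, `G`, `PTS`, `TRB`, `AST`) are assumptions based on the NBA example data and may differ in your workspace.

```python
# Illustrative custom function following the best practices above.
# Column names reference the NBA example data and are assumptions.
import pandas as pd

def player_efficiency(stats: pd.DataFrame, player: str, min_games: int = 10) -> dict:
    """Return per-game efficiency figures for one player.

    Args:
        stats: Season statistics with 'Player', 'G', 'PTS', 'TRB', 'AST' columns.
        player: Exact player name to evaluate.
        min_games: Minimum games played for a season to count.

    Returns:
        A dict with the player name, seasons counted, and per-game averages.
    """
    # Parameter validation: fail fast with clear messages.
    required = {"Player", "G", "PTS", "TRB", "AST"}
    missing = required - set(stats.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    if min_games < 1:
        raise ValueError("min_games must be at least 1")

    rows = stats[(stats["Player"] == player) & (stats["G"] >= min_games)]
    # Error handling: return a structured "not found" result instead of crashing.
    if rows.empty:
        return {"player": player, "seasons": 0, "per_game": None}

    games = rows["G"].sum()
    # Structured output: consistent keys regardless of the player queried.
    return {
        "player": player,
        "seasons": int(len(rows)),
        "per_game": {
            "points": round(rows["PTS"].sum() / games, 2),
            "rebounds": round(rows["TRB"].sum() / games, 2),
            "assists": round(rows["AST"].sum() / games, 2),
        },
    }
```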
Design General Instructions
The agent instructions should be comprehensive yet focused. They form the "constitution" that guides agent behavior.
Example: NBA Analysis Agent Instructions
# NBA ANALYSIS AGENT INSTRUCTIONS
## AGENT ROLE AND PURPOSE
You are an NBA Performance Analysis Agent built within Databricks. Your purpose is to help basketball operations staff analyze player performance data, identify trends, and make data-driven decisions about player development, lineups, and team strategy.
## AVAILABLE DATA SOURCES
You have access to the following data sources:
1. player_data.csv: Contains biographical information about players
2. Players.csv: Contains additional player information including birthplace
3. Seasons_Stats.csv: Contains detailed season statistics for players
## EXPECTED BEHAVIOR
- Always confirm data availability before attempting analysis
- Use SQL queries to access data rather than assuming values
- Present statistical findings with appropriate context (league averages, historical comparisons)
- When making player comparisons, consider position, era, and team context
- For visual data, suggest appropriate chart types (e.g., scatter plots for correlation analysis)
- Acknowledge limitations in the data when they exist
## CLARIFICATION PROTOCOL
When receiving ambiguous requests, ask clarifying questions about:
1. Time period of interest (specific seasons or career totals)
2. Metrics that matter most for the analysis
3. Whether to account for positional differences
4. The level of statistical detail required
5. Whether visualization would be helpful
## EXAMPLE INTERACTIONS
### Good Example 1:
User: "Who were the most efficient centers last season?"
Agent: "To analyze center efficiency, I'll need to clarify a few points:
1. Which specific efficiency metrics would you like me to prioritize? (PER, TS%, WS/48, etc.)
2. Are you interested in a minimum number of games played?
3. Would you like me to consider only traditional centers or also include players who split time between power forward and center?
4. Would you prefer raw data or a visualization of the results?"
### Good Example 2:
User: "Compare Michael Jordan and Kobe Bryant."
Agent: "I'd be happy to compare these players. To provide the most relevant analysis:
1. Would you like to compare their entire careers or specific seasons?
2. Which statistical categories are most important for this comparison?
3. Would you like to see advanced metrics or traditional box score stats?
4. Would you like to include playoff performance?
5. Would a side-by-side visualization help with this comparison?"
## ETHICAL GUIDELINES
- Do not make definitive statements about a player's future performance
- Present balanced analysis that considers multiple statistical perspectives
- Acknowledge that statistics are only one component of player evaluation
- Do not speculate about player injuries or personal matters
Set Up Memory Systems
Configure memory to enhance agent performance:
Short-Term, Long-Term, and Structured Memory
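For the long-term layer, one common pattern is a Delta Sync index in Databricks Vector Search built over a Delta table of documents. The sketch below uses the databricks-vectorsearch client; the endpoint, table, index, and embedding endpoint names are placeholders, and keyword arguments can vary between client versions, so treat it as an outline rather than a drop-in call.

```python
# Sketch: back long-term memory with a Databricks Vector Search index
# synced from a Delta table. Endpoint, table, and index names are
# placeholders; verify keyword arguments against your client version.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

index = client.create_delta_sync_index(
    endpoint_name="agent-memory-endpoint",           # existing Vector Search endpoint
    index_name="agent_ops.memory.knowledge_index",   # target index (catalog.schema.name)
    source_table_name="agent_ops.memory.documents",  # Delta table holding the documents
    pipeline_type="TRIGGERED",                       # sync on demand
    primary_key="doc_id",
    embedding_source_column="text",                  # column to embed
    embedding_model_endpoint_name="databricks-bge-large-en",
)

# Short-term memory stays in the conversation itself (chat history passed to the
# model); structured memory is the Unity Catalog table metadata already exposed
# through the attached toolkits.
results = index.similarity_search(
    query_text="lineup efficiency",
    columns=["doc_id", "text"],
    num_results=3,
)
```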