Databricks AI Agent Playbook

Purpose of the Playbook

This comprehensive playbook guides GTM teams and technical stakeholders in building, training, and deploying LLM-powered Agents inside Databricks that deliver measurable business outcomes. The document provides a structured framework to ensure agents are properly configured with the right tools, context, and guardrails to solve real-world data challenges.

Companion GitHub repository: NBA-AI-AGENT

Scope

The playbook focuses on building agents that interact with:

  • Structured data: SQL tables, Delta Lake tables, Iceberg tables

  • Semi-structured data: JSON, text documents, notebooks, markdown files

  • Business processes: Scheduled workflows, operational monitoring, insight generation

  • Human-in-the-loop systems: Collaborative analysis and decision support

Prefer a podcast? I put my notes into NotebookLM and generated an audio version of this playbook for easy listening.

Prerequisites

Technical Environment

  • Databricks workspace (Premium tier or higher recommended; on AWS, an E2-platform workspace)

  • Databricks Runtime 14.0+ with ML libraries

  • Unity Catalog enabled workspace for centralized governance

  • SQL warehouse (for SQL Toolkit functionality)

  • Model serving endpoint (for custom/external LLM integration)

Access Requirements

  • Permissions: Workspace admin or contributor access

  • Compute Access: Ability to create/manage clusters

  • Data Access: Access to relevant tables via Unity Catalog

  • API Access: Token generation privileges (for REST operations)

Knowledge Requirements

  • Basic understanding of LLMs and prompt engineering

  • Familiarity with Databricks workspace navigation

  • SQL querying capabilities

  • Python programming experience (for advanced configurations)

Key Concepts

Agent Architecture

An agent within Databricks is a system that can:

  • Reason: Understand natural language requests and translate them into technical actions

  • Plan: Break complex tasks into manageable steps before execution

  • Act: Execute operations within the Databricks environment using attached tools

  • Learn: Improve over time through feedback and pattern recognition

Toolkits

Toolkits are interfaces the agent uses to interact with Databricks resources:

  • SQL Toolkit: Query tables, views, and execute SQL commands

  • Notebook Toolkit: Run notebooks, execute Python/Scala/R code

  • REST Toolkit: Make API calls to external or internal services

  • Document Toolkit: Search and retrieve from knowledge bases

  • Visualization Toolkit: Generate charts, graphs, and dashboards
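The idea of a toolkit as a named interface the agent can call might be sketched as follows. This is a minimal illustration in plain Python, not a Databricks API: `Tool`, `ToolRegistry`, and `fake_sql_query` are hypothetical names, and the SQL function is a stand-in that does not touch a real warehouse.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """A named capability the agent may call (hypothetical structure)."""
    name: str
    description: str
    run: Callable[..., Any]


class ToolRegistry:
    """Holds the tools attached to an agent and dispatches calls by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """Render a tool list suitable for inclusion in the agent prompt."""
        return "\n".join(f"- {t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name].run(**kwargs)


def fake_sql_query(statement: str) -> list:
    """Stand-in for a SQL toolkit; a real one would query a SQL warehouse."""
    return [{"statement": statement, "rows": 0}]


registry = ToolRegistry()
registry.register(Tool("sql_query", "Run a read-only SQL statement", fake_sql_query))
result = registry.call("sql_query", statement="SELECT 1")
print(registry.describe())
```

Keeping the description next to the callable means the same registry can both advertise tools to the model and execute the call the model selects.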

Memory Systems

Agents maintain three types of memory:

  • Short-term Memory: Chat history and immediate context (typically 8K-32K tokens)

  • Long-term Memory: Indexed knowledge bases (vector stores, metadata repositories)

  • Structured Memory: Database connections, table metadata, schema information
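One way to keep short-term memory inside a token budget is a sliding window that drops the oldest messages first. The sketch below assumes a whitespace word count as a rough token proxy (a real deployment would use the model's tokenizer), and `ShortTermMemory` is a hypothetical class name.

```python
from collections import deque
from typing import Deque, Dict, List


def rough_token_count(text: str) -> int:
    """Approximate tokens by whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


class ShortTermMemory:
    """Sliding window over chat history, bounded by an approximate token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self._messages: Deque[Dict[str, str]] = deque()
        self._tokens = 0

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})
        self._tokens += rough_token_count(content)
        # Drop the oldest messages until within budget, always keeping the newest.
        while self._tokens > self.max_tokens and len(self._messages) > 1:
            dropped = self._messages.popleft()
            self._tokens -= rough_token_count(dropped["content"])

    def window(self) -> List[Dict[str, str]]:
        return list(self._messages)


memory = ShortTermMemory(max_tokens=8)
memory.add("user", "Who were the most efficient centers?")
memory.add("assistant", "Which metric should I use?")
memory.add("user", "Use PER please")
print([m["role"] for m in memory.window()])
```

With a budget of 8 approximate tokens, the first user message is evicted once the assistant reply arrives, and the last two messages fit exactly.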

Feedback Loop

The feedback loop is critical for agent effectiveness:

  • Clarification Questions: Agent confirms understanding before major actions

  • Progressive Disclosure: Agent reveals information in digestible segments

  • Continuous Learning: Human feedback updates agent behavior over time
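The clarification step can be illustrated as a simple gate: before acting, the agent checks which required details are missing and asks about those. The slot names and question wording below are hypothetical, chosen only to show the pattern.

```python
from typing import Dict, List, Optional

# Hypothetical slots an analysis request needs, each paired with the
# question to ask when the slot is missing.
REQUIRED_SLOTS: Dict[str, str] = {
    "time_period": "Which seasons should the analysis cover?",
    "metric": "Which metric matters most for this analysis?",
}


def clarification_questions(request: Dict[str, Optional[str]]) -> List[str]:
    """Return the questions to ask before acting; an empty list means proceed."""
    return [question for slot, question in REQUIRED_SLOTS.items() if not request.get(slot)]


questions = clarification_questions({"metric": "PER", "time_period": None})
print(questions)
```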

Playbook Steps

Define the Agent's Purpose and Scope

Business Problem Mapping

Document the specific business challenges the agent will solve:

Example 1: NBA Player Performance Agent (data available in the NBA-AI-AGENT GitHub repository)

PRIMARY PURPOSE: Help coaches and analysts evaluate player performance metrics and identify optimal lineup combinations based on historical data.

KEY QUESTIONS THE AGENT SHOULD ANSWER:
1. Which players have the best efficiency ratings for their position?
2. How has Player X's performance changed over their career?
3. Which player combinations yield the highest win percentage?
4. What statistical areas should Player Y focus on improving?
5. Which college produces the most successful players at position Z?
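Question 1 above maps naturally to a SQL query. The sketch below uses an in-memory SQLite database as a stand-in for a Databricks SQL warehouse. The table name, column names, and the placeholder rows ("Player A" and so on) are assumptions for illustration, not the actual schema or data of the example dataset.

```python
import sqlite3

# In-memory stand-in for a warehouse table; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE season_stats (player TEXT, pos TEXT, year INTEGER, per REAL)")
conn.executemany(
    "INSERT INTO season_stats VALUES (?, ?, ?, ?)",
    [
        ("Player A", "C", 2017, 24.1),
        ("Player B", "C", 2017, 19.8),
        ("Player C", "PG", 2017, 22.5),
        ("Player D", "PG", 2017, 26.3),
    ],
)

# For each position, keep the row whose efficiency rating equals the position's maximum.
TOP_BY_POSITION = """
SELECT s.pos, s.player, s.per
FROM season_stats AS s
JOIN (
    SELECT pos, MAX(per) AS best_per
    FROM season_stats
    WHERE year = ?
    GROUP BY pos
) AS best
  ON best.pos = s.pos AND best.best_per = s.per
WHERE s.year = ?
ORDER BY s.pos
"""

top_players = conn.execute(TOP_BY_POSITION, (2017, 2017)).fetchall()
print(top_players)
```

Passing the season as a bound parameter rather than formatting it into the string keeps the query safe when the value comes from a user request.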

Success Metrics

Define how the agent's performance will be measured:

  • Accuracy: Correctness of information provided (target: >95%)

  • Relevance: Appropriate response to queries (target: >90%)

  • Resolution Rate: Percentage of queries resolved without human escalation (target: >80%)

  • User Satisfaction: Feedback scores from end users (target: >4.5/5)

  • Time Savings: Reduction in time spent on analysis tasks (target: >30%)
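The targets listed above can be checked mechanically once measurements exist. The following is a minimal sketch: the metric keys are hypothetical names for the five metrics, percentages are expressed as fractions, and the measured values are made-up inputs used only to exercise the function.

```python
from typing import Dict

# Targets from the success metrics list; fractions for percentages,
# a 1-5 scale for user satisfaction.
TARGETS: Dict[str, float] = {
    "accuracy": 0.95,
    "relevance": 0.90,
    "resolution_rate": 0.80,
    "user_satisfaction": 4.5,
    "time_savings": 0.30,
}


def evaluate(measured: Dict[str, float]) -> Dict[str, bool]:
    """Map each metric to True when the measured value strictly exceeds its target."""
    missing = set(TARGETS) - set(measured)
    if missing:
        raise ValueError(f"Missing metrics: {sorted(missing)}")
    return {name: measured[name] > target for name, target in TARGETS.items()}


# Hypothetical measurements, for illustration only.
report = evaluate({
    "accuracy": 0.97,
    "relevance": 0.88,
    "resolution_rate": 0.82,
    "user_satisfaction": 4.6,
    "time_savings": 0.35,
})
print(report)
```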

Knowledge Assets

Document reference materials the agent should use:

  • Proprietary analysis methodologies

  • Business glossaries and terminology guides

  • Standard operating procedures

  • Common analysis patterns

Attach Toolkits

Function Toolkit Implementation

When building agents that need to perform complex data analysis or specialized tasks, implementing custom functions is essential. These functions encapsulate business logic and ensure consistent, reliable outputs.

Function Implementation Best Practices

  1. Clear Documentation: Include detailed docstrings for each function

  2. Parameter Validation: Check inputs before processing

  3. Structured Output Format: Return consistent data structures

  4. Error Handling: Handle edge cases gracefully

  5. Performance Optimization: Write efficient code, especially for data operations

  6. Function Grouping: Organize related functions together in modules
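Several of these practices can be seen together in one small function. The sketch below computes true shooting percentage with the standard formula PTS / (2 × (FGA + 0.44 × FTA)). The function name and the shape of the returned dictionary are illustrative choices, not a prescribed interface.

```python
from typing import Dict, Union

Number = Union[int, float]


def true_shooting_pct(points: Number, fga: Number, fta: Number) -> Dict[str, object]:
    """Compute true shooting percentage: PTS / (2 * (FGA + 0.44 * FTA)).

    Args:
        points: Total points scored.
        fga: Field goal attempts.
        fta: Free throw attempts.

    Returns:
        A dict with keys "metric", "value", and "error". The same keys are
        present on success and failure so callers can handle both uniformly.
    """
    result: Dict[str, object] = {"metric": "TS%", "value": None, "error": None}

    # Parameter validation: reject non-numeric and negative inputs.
    for name, value in (("points", points), ("fga", fga), ("fta", fta)):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            result["error"] = f"{name} must be a number"
            return result
        if value < 0:
            result["error"] = f"{name} must be non-negative"
            return result

    # Edge case: no attempts means the ratio is undefined.
    attempts = fga + 0.44 * fta
    if attempts == 0:
        result["error"] = "no shot attempts; TS% is undefined"
        return result

    result["value"] = round(points / (2 * attempts), 3)
    return result


ok = true_shooting_pct(points=30, fga=20, fta=10)
bad = true_shooting_pct(points=10, fga=0, fta=0)
print(ok, bad)
```

Returning an error field instead of raising lets the agent relay a readable explanation to the user rather than a stack trace. Raising exceptions is an equally valid design when the caller is other code.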

Design General Instructions

The agent instructions should be comprehensive yet focused. They form the "constitution" that guides agent behavior.

Example: NBA Analysis Agent Instructions

# NBA ANALYSIS AGENT INSTRUCTIONS

## AGENT ROLE AND PURPOSE
You are an NBA Performance Analysis Agent built within Databricks. Your purpose is to help basketball operations staff analyze player performance data, identify trends, and make data-driven decisions about player development, lineups, and team strategy.

## AVAILABLE DATA SOURCES
You have access to the following data sources:
1. player_data.csv: Contains biographical information about players
2. Players.csv: Contains additional player information including birthplace
3. Seasons_Stats.csv: Contains detailed season statistics for players

## EXPECTED BEHAVIOR
- Always confirm data availability before attempting analysis
- Use SQL queries to access data rather than assuming values
- Present statistical findings with appropriate context (league averages, historical comparisons)
- When making player comparisons, consider position, era, and team context
- For visual data, suggest appropriate chart types (e.g., scatter plots for correlation analysis)
- Acknowledge limitations in the data when they exist

## CLARIFICATION PROTOCOL
When receiving ambiguous requests, ask clarifying questions about:
1. Time period of interest (specific seasons or career totals)
2. Metrics that matter most for the analysis
3. Whether to account for positional differences
4. The level of statistical detail required
5. Whether visualization would be helpful

## EXAMPLE INTERACTIONS

### Good Example 1:
User: "Who were the most efficient centers last season?"
Agent: "To analyze center efficiency, I'll need to clarify a few points:
1. Which specific efficiency metrics would you like me to prioritize? (PER, TS%, WS/48, etc.)
2. Are you interested in a minimum number of games played?
3. Would you like me to consider only traditional centers or also include players who split time between power forward and center?
4. Would you prefer raw data or a visualization of the results?"

### Good Example 2:
User: "Compare Michael Jordan and Kobe Bryant."
Agent: "I'd be happy to compare these players. To provide the most relevant analysis:
1. Would you like to compare their entire careers or specific seasons?
2. Which statistical categories are most important for this comparison?
3. Would you like to see advanced metrics or traditional box score stats?
4. Would you like to include playoff performance?
5. Would a side-by-side visualization help with this comparison?"

## ETHICAL GUIDELINES
- Do not make definitive statements about a player's future performance
- Present balanced analysis that considers multiple statistical perspectives
- Acknowledge that statistics are only one component of player evaluation
- Do not speculate about player injuries or personal matters

Set Up Memory Systems

Configure memory to enhance agent performance:

Short-term, Long-term, and Structured Memory
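As one illustration of long-term memory, retrieval from an indexed knowledge base can be sketched with a toy similarity search. The bag-of-words "embedding" below is an assumption made to keep the example self-contained. A production setup would use an embedding model and a vector store, and `LongTermMemory` is a hypothetical class name.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption; a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Stores documents with their vectors and returns the closest matches to a query."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self._docs, key=lambda doc: cosine(q, doc[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = LongTermMemory()
store.add("PER is player efficiency rating and summarizes per-minute production")
store.add("TS% is true shooting percentage and accounts for free throws and threes")
best = store.search("what does true shooting percentage mean")
print(best)
```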

Build & Register the Agent!
