Structured Approach to Generative AI in Demand Planning
A production-ready framework for precision, explainability, and scale, the qualities essential to planning
Demand planning requires precision and consistency in forecasting, with analysts needing to identify significant variances and explain their causes. While generative AI offers promising capabilities for this domain, its implementation must be carefully structured to overcome inherent limitations. This article proposes a production-ready framework for integrating generative AI into demand planning workflows. By combining deterministic calculations, AI-guided analysis, and a traceable state model, we ensure precision, consistency, and explainability in generating insights.
Challenges in Demand Planning
Demand planning analytics typically involves several key requirements:
Time-Period Comparisons: Analysts need to compare forecasts across cycles or against actual values, identifying significant deviations.
High-Volume Data Processing: Planning data often involves millions of rows across multiple dimensions, requiring efficient filtering to identify meaningful variances.
Threshold Determination: What constitutes a "significant" variance involves both statistical methods and domain knowledge – for example, determining if a 10% error on a high-volume product is more important than a 50% error on a low-volume one.
Root Cause Analysis: Once variances are identified, planners must trace them to their sources using multiple datasets and causal reasoning.
Consistent Presentation: Analysis must be formatted consistently for stakeholder consumption, typically following organizational standards.
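The threshold question above can be made concrete with a small sketch. One possible approach, assumed here for illustration only, is to weight each percentage error by its unit volume so that misses on high-volume products surface first:

```python
# One way to compare a "10% error on a high-volume product" against a
# "50% error on a low-volume one": weight percentage errors by volume.
# The weighting scheme is an illustrative assumption, not a standard.

def weighted_error(pct_error, volume):
    """Absolute-error proxy: percentage error scaled by unit volume."""
    return abs(pct_error) * volume

high_vol = weighted_error(0.10, 50_000)  # 10% miss on 50k units -> 5000 units
low_vol = weighted_error(0.50, 2_000)    # 50% miss on 2k units  -> 1000 units
```

Under this weighting, the 10% error on the high-volume product outranks the 50% error on the low-volume one, which is the kind of judgment a purely percentage-based threshold would miss.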
Limitations of Purely Prompt-Based Generative AI
While generative AI can conceptually handle these tasks, several limitations emerge:
Numerical Imprecision: LLMs aren't designed for high-precision calculations. For example, when calculating error concentrations, they might round inconsistently or make arithmetic errors.
Consistency Issues: Simply asking an LLM to "Find forecast errors and explain them" produces different outputs on each run, making the analysis unreliable for stakeholders.
Black-Box Reasoning: The inability to trace how an AI reached its conclusions undermines confidence in the analysis, especially when stakeholders ask specific questions about methodology.
Multi-Step Logic Challenges: Complex calculations, like calculating error concentration metrics, often get muddled when handled solely within prompts.
Plausible but Incorrect Explanations: LLMs can generate convincing-sounding explanations that attribute variance to incorrect business drivers, potentially misleading decision-makers.
A Structured Approach: Graph-constrained Agent Workflows with State Management
Rather than relying solely on prompts, a more effective approach uses a structured workflow with clearly defined roles:
1. Workflow Graph with Validation Cycles and Logic Loopbacks
The core innovation is implementing a cyclic workflow graph where:
Data Acquisition → Data Validation → Calculation → Analysis → Presentation
        ↑                  │
        └──(if validation fails)──┘

This pattern enables powerful control-loop mechanisms where agents can:
Self-Correct through Feedback: Agents can evaluate their own outputs and initiate remedial actions when quality thresholds aren't met
Implement Adaptive Sampling: Dynamically adjust data granularity based on detected error patterns
Enforce Quality Gates: Establish validation checkpoints that must be passed before proceeding to subsequent analysis stages
Create Learning Loops: Track successful analysis patterns to improve performance over time
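A minimal sketch of such a cyclic graph can be built with plain functions: nodes mutate a shared state, and edge functions decide the next node, looping back to acquisition when validation fails. All node and field names here are illustrative, not tied to any specific framework:

```python
# Minimal sketch of a workflow graph with a validation loopback.
# Nodes mutate a shared state dict; edge functions choose the next node.

def run_graph(nodes, edges, start, state, max_steps=20):
    """Walk the graph until an edge function returns None."""
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)
        state["analysis_path"].append(current)  # audit trail of every step
        current = edges[current](state)
        if current is None:
            return state
    raise RuntimeError("workflow did not terminate")

def acquire(state):
    state["attempts"] = state.get("attempts", 0) + 1
    # simulate a first fetch that comes back empty, then a good one
    state["raw_data"] = [] if state["attempts"] == 1 else [1, 2, 3]
    return state

def validate(state):
    state["valid"] = bool(state["raw_data"])
    return state

def calculate(state):
    state["total"] = sum(state["raw_data"])
    return state

nodes = {"acquire": acquire, "validate": validate, "calculate": calculate}
edges = {
    "acquire": lambda s: "validate",
    # loop back to acquisition when validation fails
    "validate": lambda s: "calculate" if s["valid"] else "acquire",
    "calculate": lambda s: None,
}

result = run_graph(nodes, edges, "acquire", {"analysis_path": []})
```

Running this traverses acquire → validate, fails validation, loops back, and only then proceeds to calculation, with the full path recorded in `analysis_path`.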
For example, if a data query returns unexpected results, the system can retry with clarifications:
// Pseudocode example of retry logic
if (nielsen_data is invalid or empty):
    increment retry_count
    if (retry_count < 3):
        add_context("Previous query failed, please ensure it includes all required metrics")
        retry_query()
    else:
        log_error("Unable to retrieve valid Nielsen data after 3 attempts")
        proceed_with_available_data()

2. Global State Management
A well-structured global state management system is crucial for maintaining context and tracking execution. The example system uses an AnalysisState TypedDict with these key components:
# An excerpt of our AnalysisState TypedDict:
class AnalysisState(TypedDict):
    # Tracing and debugging
    analysis_path: List[str]                   # Records each step in the execution
    # Analysis artifacts
    raw_data: Dict[str, pd.DataFrame]          # Original query results
    processed_data: Dict[str, pd.DataFrame]    # Calculated metrics
    summary_stats: Dict[str, Dict[str, Dict]]  # Statistical summaries
    # Error handling
    analysis_attempts: int                     # Retry counter
    last_analysis_error: Optional[str]         # Last error message
    # Data sources
    nielsen_data: Dict[str, pd.DataFrame]      # Market share data by level

This state object allows comprehensive tracking of the analysis process, making the system's reasoning transparent and debuggable. But this approach to state management requires careful consideration of memory tradeoffs:
Raw vs. Consolidated Data: For high-volume datasets, we store both raw query results and processed metrics, allowing for reanalysis without re-querying while maintaining access to original data
Summary vs. Full Data: The system strategically combines both approaches - storing full data for active analysis levels but using statistical summaries for cross-level comparison
In-Memory vs. Persistent Storage: Time-critical data remains in memory for fast access during the analysis workflow, while embeddings or large historical datasets might be offloaded to file system storage when not actively needed
These design decisions ensure the workflow remains responsive even with large datasets while maintaining a complete audit trail of the analysis process.
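The in-memory vs. persistent tradeoff can be sketched with a pair of helpers that move a dataset out of the state onto disk and restore it on demand, leaving only a file path in memory. The key names and pickle-based storage are illustrative assumptions:

```python
# Sketch of the in-memory vs. persistent tradeoff: datasets not needed by
# the active analysis level are pickled to disk and replaced in the state
# by a file path. Field names and storage format are illustrative.
import os
import pickle
import tempfile

def offload(state, key, directory):
    """Move state[key] to disk, keeping only its path in memory."""
    path = os.path.join(directory, f"{key}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state[key], f)
    state[key] = {"offloaded_to": path}
    return state

def reload(state, key):
    """Bring an offloaded dataset back into memory on demand."""
    path = state[key]["offloaded_to"]
    with open(path, "rb") as f:
        state[key] = pickle.load(f)
    return state

tmp = tempfile.mkdtemp()
state = {"nielsen_history": [("2023-01", 0.31), ("2023-02", 0.29)]}
offload(state, "nielsen_history", tmp)  # only a path remains in memory
reload(state, "nielsen_history")        # restored when actively needed
```

In a real workflow the same pattern would apply to large historical DataFrames or embeddings, with the audit trail in `analysis_path` recording each offload and reload.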
3. Task Separation
Rather than a monolithic AI system, key tasks are delegated to specialized components:
SQL Generation: The LLM translates structured prompts into precise SQL queries, executes them, and performs initial validation to ensure completeness and correctness.
Metric Calculation: All statistical computations are handled deterministically using pandas, ensuring precision and repeatability without relying on the LLM.
Threshold Application: The LLM applies domain-informed judgment on top of the calculated metrics, flagging significant contributors where purely statistical cutoffs would fall short.
Market Context Integration: Leveraging the existing output, the LLM dynamically determines which supplementary data sources to query (e.g., Nielsen, DAR) and how to enrich the analysis with them.
Natural Language Insights: The LLM interprets the structured data and generates contextualized, human-readable explanations to communicate root causes and patterns clearly.
For example, when calculating forecast error metrics, the system uses deterministic functions rather than asking the LLM to perform math:
// Pseudocode for error metric calculation
function compute_forecast_error_metrics(data):
    data["error"] = data["forecast"] - data["actual"]
    data["abs_error"] = absolute_value(data["error"])
    total_error = sum(data["abs_error"])
    data["contribution_percentage"] = (data["abs_error"] / total_error) * 100
    data["error_direction"] = if data["error"] > 0 then "Over" else "Under"
    return data

4. Data-First Approach
The workflow prioritizes data collection and structuring before analysis:
Query base forecast vs. actual data
Calculate error metrics (MAPE, bias, etc.)
Identify significant contributors based on statistical thresholds
Pull in contextual data (Nielsen market share, DAR drivers)
Only then perform analysis and generate insights
This ensures the AI has all relevant information available when forming conclusions, rather than speculating based on incomplete data.
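The first three steps of this ordering can be sketched without any dependencies, using plain dicts in place of DataFrames; field names and the 25% contribution threshold are illustrative assumptions:

```python
# Dependency-free sketch of the data-first ordering: compute error
# metrics first (steps 1-2), then flag significant contributors (step 3).
# Field names and the contribution threshold are illustrative.

def compute_metrics(rows):
    """Add absolute error, percentage error, contribution, and direction."""
    total_abs = sum(abs(r["forecast"] - r["actual"]) for r in rows)
    for r in rows:
        r["error"] = r["forecast"] - r["actual"]
        r["abs_error"] = abs(r["error"])
        r["ape"] = r["abs_error"] / r["actual"] * 100 if r["actual"] else None
        r["contribution_pct"] = r["abs_error"] / total_abs * 100
        r["direction"] = "Over" if r["error"] > 0 else "Under"
    return rows

def significant_contributors(rows, contribution_threshold=25.0):
    """Keep records whose share of total absolute error exceeds the threshold."""
    return [r for r in rows if r["contribution_pct"] >= contribution_threshold]

rows = compute_metrics([
    {"sku": "A", "forecast": 120, "actual": 100},
    {"sku": "B", "forecast": 95, "actual": 100},
    {"sku": "C", "forecast": 100, "actual": 101},
])
top = significant_contributors(rows)  # only SKU "A" carries enough error
```

Only after this deterministic pass would contextual data be pulled in and the LLM asked to interpret the results.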
5. Analysis at the Right Stage
The AI's strengths in pattern recognition and natural language generation are applied after data is properly structured:
// Pseudocode for insight generation
function generate_level_output(level, stats, contributors, dar_data, nielsen_data):
    // Combine all preprocessed data into context
    context = {
        "level": level,
        "summary_statistics": stats,
        "significant_contributors": contributors,
        "market_drivers": dar_data,
        "market_share": nielsen_data
    }
    // Now let the LLM generate insights with all data available
    return ask_llm_for_analysis(context)

Positioning analysis at the end of the workflow creates multiple benefits:
Rapid Iteration on Analysis Logic: With all data pre-collected and structured, analysts can quickly modify prompts to generate different insights without re-running the entire data pipeline. For example, changing the analysis focus from "identify over-forecasted products" to "analyze base vs. incremental components" requires only prompt adjustments, not data restructuring.
Complex Analysis Without Complex Code: The LLM can generate sophisticated multi-factor analyses by examining relationships across various data sources. For instance, it can correlate Nielsen market share data with forecast errors to identify external market factors - all without explicit programming of these relationships.
Contextual Adaptation: Different business scenarios may require different analysis approaches. A structured system can adjust prompts based on detected patterns - for example, using different analysis logic when errors are highly concentrated versus widely distributed.
Prompt Engineering Efficiency: Since all relevant data is already in the context, prompt engineers can focus on analysis quality rather than data extraction logic. This separation allows non-technical business experts to refine analysis prompts without changing the underlying data pipeline.
Maximum Leverage of Context Windows: Modern LLMs with large context windows can process all relevant data simultaneously, enabling cross-referencing between different metrics and levels that would require complex logic in traditional programming.
A practical example of this approach's flexibility: when a demand planner needs to incorporate new logic that accounts for pipefill volume in error calculations, they can simply update the prompt template to include this dimension without modifying the data collection or calculation code. The system will leverage the already-collected metrics to generate new insights immediately.
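A small sketch of why this works: when analysis logic lives in a template rather than code, extending it is a text edit. The template text and the pipefill instruction below are hypothetical, purely to show the mechanism:

```python
# Sketch of prompt-template-only extension: adding a new analysis
# dimension (a hypothetical "pipefill volume" instruction) means editing
# the template, not the data pipeline. All template text is illustrative.

BASE_TEMPLATE = (
    "Analyze forecast errors for {level}.\n"
    "Metrics: {metrics}\n"
    "{extra_instructions}"
)

def build_prompt(level, metrics, extra_instructions=""):
    return BASE_TEMPLATE.format(
        level=level, metrics=metrics, extra_instructions=extra_instructions
    )

# Original analysis prompt
p1 = build_prompt("category", "MAPE=12%, bias=+4%")

# Extended analysis: the planner adds pipefill logic purely via the template
p2 = build_prompt(
    "category",
    "MAPE=12%, bias=+4%",
    "Treat pipefill volume as a separate error component when explaining bias.",
)
```

Both prompts draw on the same pre-collected metrics; only the instructions handed to the LLM change.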
6. Prescriptive Prompting for Structured Reasoning
Another key insight in our system is the implementation of structured causal reasoning frameworks within prompts that guide the AI through domain-specific analytical pathways.
Just as we use carefully designed workflows to process data, we apply equally rigorous structure to the reasoning process itself. This prescriptive approach ensures that AI analysis follows expert human reasoning patterns rather than taking unpredictable paths that may sound plausible but lack grounding in domain expertise.
For example, when analyzing forecast errors, we don't simply ask the AI to "explain the variance." Instead, we provide a step-by-step causal reasoning framework that mirrors how an experienced demand planner would approach the problem:
# Example of structured reasoning instructions in a prompt
prompt = """
Analyze the forecast error patterns in this data and identify likely causes.
For significant over-forecasting (where forecast > actual):
* First check market share: Did our share decline more than historical trend?
* Then check category performance: Did the category decline unexpectedly?
* Examine pricing factors: Were you expecting a price decrease that did not materialize?
* Assess marketing impact: Did expected marketing contribution not materialize?
For market dynamics impact:
* If over-forecast, negative market dynamics represents a gap between forecast and actual
* Test whether the movement year-over-year is consistent with trend/share patterns
* Check if there are other external factors reflected in the data
Base your reasoning on the actual data provided, not general assumptions.
Provide specific, data-backed explanations that reference the metrics shown.
"""This prescriptive approach guides the AI to follow the same reasoning processes a human analyst would, while staying grounded in the actual data.
The Workflow in Action: A Demand Planning Example
Let's walk through how this structured approach works in practice:
1. Data Collection and Validation
The workflow begins by gathering forecast vs. actual data at different hierarchy levels. The graph's first stage queries the database for each level (brand, category, SKU), validates the results, and stores the raw data in the state:
State["analysis_path"]:
- "initialize: created analysis queue with hierarchy levels"
- "select: starting analysis for category level"
- "calculate: starting metrics calculation for category"Each level is processed systematically with validation to ensure data quality.
2. Metric Calculation and Contributor Identification
Once raw data is available, the workflow calculates forecast error metrics deterministically:
State["analysis_path"]:
- "calculate: completed metrics calculation"
- "identify_contributors: starting identification for category"For significant contributor identification, the system:
Calculates a concentration ratio to understand error distribution
Dynamically adjusts thresholds based on this concentration
Selects records meeting either coverage or contribution criteria
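These three steps can be sketched in a few lines. The concentration ratio here (share of error held by the top few records) and the cutoff values are illustrative assumptions, not the system's actual parameters:

```python
# Sketch of contributor identification: a concentration ratio drives a
# dynamic threshold. Ratio definition and cutoffs are illustrative.

def identify_contributors(contributions, top_n=3):
    """contributions: per-record shares of total absolute error (percent)."""
    ranked = sorted(contributions, reverse=True)
    concentration = sum(ranked[:top_n]) / 100  # share held by the top N records
    # concentrated errors -> raise the bar to a few large contributors;
    # diffuse errors -> lower the bar so coverage is still achieved
    cutoff = 15.0 if concentration >= 0.6 else 5.0
    return [c for c in contributions if c >= cutoff], cutoff

# Top 3 records hold 75% of the error, so only large contributors pass
flagged, cutoff = identify_contributors([40.0, 25.0, 10.0, 10.0, 8.0, 7.0])
```

With diffusely distributed errors the same function would lower the cutoff, flagging more records to preserve coverage.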
3. Contextual Data Integration
To enable root cause analysis, the workflow conditionally fetches additional data sources:
State["analysis_path"]:
- "decision: fetching DAR data"
- "dar: successfully fetched DAR data"
- "nielsen: starting data fetch for category"These retry loops ensure robustness against temporary database issues, implementing automatic retries with increasingly detailed context:
// Pseudocode example of retry context
if (nielsen_attempt > 0):
    context = f"Previous query failed with error: {last_nielsen_error}. Ensure you group by the correct category columns."

4. Analysis with Structured Reasoning
Finally, when all data is available, the analysis step applies structured reasoning:
State["analysis_path"]:
- "output: generating formatted analysis for category"The AI is given precise instructions on how to approach the reasoning:
"First identify the direction of forecast error. If we over-forecasted, examine whether market share declined more than expected. Check the nielsen_data for category-level share trends. Then examine year-over-year patterns in the dar_data to identify if marketing, pricing, or distribution factors contributed."This structured approach ensures the reasoning follows business logic and remains consistent across analyses.
Conclusion
Successful integration of generative AI in demand planning requires a structured approach that:
Separates responsibilities between deterministic calculation and AI interpretation
Implements validation cycles to ensure data quality
Maintains comprehensive state to track execution and enable debugging
Prioritizes data structuring before applying AI for insight generation
By combining the precision of traditional analytics with the interpretive capabilities of generative AI, organizations can produce more insightful, consistent, and reliable demand planning analyses.
The key is finding the right balance – using generative AI where it excels (pattern recognition, natural language, explanation) while relying on deterministic methods for the precise calculations that form the foundation of trustworthy analysis.

