AI Learning Ramp | Course 6

System-Design Frame

Assume a BigQuery analytics copilot receives a question like "which enterprise accounts expanded after onboarding?" This course draws on widely used LLM-era sources: DAIL-SQL for prompt engineering, token efficiency, and benchmark evaluation; DIN-SQL for decomposition and self-correction; an AWS enterprise tutorial for schema discovery and operational architecture; and PICARD as an optional constrained-decoding refresher.

Course 6: Text-To-SQL Systems

One-hour objective: defend a text-to-SQL architecture grounded in LLM-era sources that are either highly cited or practically canonical, covering prompt design, decomposition, enterprise tool flow, constrained decoding, and permission-safe BigQuery access.

0-5 min

Frame the popularity filter.

Write down why this course avoids obscure papers: the reading set should give you vocabulary interviewers recognize and architecture ideas you can actually reuse.

5-19 min

Read DAIL-SQL for LLM-era prompting.

Extract how question representation, example selection, example organization, token efficiency, and execution accuracy shape a practical text-to-SQL system.
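To make example selection and token efficiency concrete, here is a minimal sketch of picking few-shot examples under a token budget. It is not DAIL-SQL's method: the word-overlap (Jaccard) score and the whitespace word count are stand-ins for a real similarity measure and a real tokenizer, and `POOL`, `select_examples`, and the other names are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set; a stand-in for a real similarity representation."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def approx_token_count(text):
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())

def select_examples(question, pool, token_budget):
    """Rank examples by similarity, then keep those that fit the budget."""
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex["question"]), reverse=True)
    chosen, used = [], 0
    for ex in ranked:
        cost = approx_token_count(ex["question"]) + approx_token_count(ex["sql"])
        if used + cost > token_budget:
            continue  # skip examples that would overflow the prompt
        chosen.append(ex)
        used += cost
    return chosen

# Hypothetical example pool for illustration only.
POOL = [
    {"question": "which accounts churned after onboarding",
     "sql": "SELECT account_id FROM accounts WHERE churned_at > onboarded_at"},
    {"question": "total revenue by region last quarter",
     "sql": "SELECT region, SUM(revenue) FROM orders GROUP BY region"},
    {"question": "list enterprise accounts by seat count",
     "sql": "SELECT account_id, seats FROM accounts WHERE tier = 'enterprise'"},
]

picked = select_examples("which enterprise accounts expanded after onboarding?", POOL, 30)
for ex in picked:
    print(ex["question"])
```

The budget forces the tradeoff the paper is read for: each extra example costs tokens, so the ranking decides which examples earn their place in the prompt.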

19-34 min

Use DIN-SQL for LLM-era architecture.

Focus on decomposition, schema-linking hints, SQL generation, and self-correction as a practical control flow for an analytics assistant.
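The control flow can be sketched as a small staged pipeline. This is an illustration under assumptions, not the paper's implementation: `llm` is a hypothetical callable that takes a stage name, `fake_llm` is a stub so the example runs without a model, and driving self-correction from a `check` function's error message is this sketch's adaptation for an analytics assistant.

```python
def din_style_pipeline(question, schema, llm, check, max_corrections=2):
    """Run schema linking, classification, generation, then bounded correction."""
    trace = []
    links = llm("schema_link", question=question, schema=schema)
    trace.append(("schema_link", links))
    difficulty = llm("classify", question=question, links=links)
    trace.append(("classify", difficulty))
    sql = llm("generate", question=question, links=links, difficulty=difficulty)
    trace.append(("generate", sql))
    for _ in range(max_corrections):
        error = check(sql)
        if error is None:
            break  # the draft passed the check; stop correcting
        sql = llm("self_correct", sql=sql, error=error)
        trace.append(("self_correct", sql))
    return sql, trace

def fake_llm(stage, **kwargs):
    """Stub model: returns canned outputs, including one deliberate typo."""
    if stage == "schema_link":
        return ["accounts.account_id", "accounts.tier"]
    if stage == "classify":
        return "easy"
    if stage == "generate":
        return "SELEC account_id FROM accounts"
    if stage == "self_correct":
        return kwargs["sql"].replace("SELEC ", "SELECT ", 1)
    raise ValueError(f"unknown stage: {stage}")

def check(sql):
    """Toy validator: returns an error string, or None when the SQL passes."""
    if sql.upper().startswith("SELECT "):
        return None
    return "query must start with SELECT"

final_sql, trace = din_style_pipeline(
    "which enterprise accounts expanded after onboarding?",
    {"accounts": ["account_id", "tier"]},
    fake_llm,
    check,
)
print(final_sql)
print([stage for stage, _ in trace])
```

Keeping each stage as a separate call makes the trace inspectable, which is what lets you measure schema-linking accuracy separately from SQL generation.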

34-47 min

Translate AWS's tutorial into a BigQuery design.

Map Bedrock Agents concepts to your world: schema discovery tools, query execution tools, validation, error handling, auth boundaries, and approval gates.
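One way to picture that mapping is a pair of tools that enforce permissions and approval outside the model and return structured errors for the repair loop. This is a hedged sketch, not the AWS tutorial's code or a BigQuery client: `UserContext`, `CATALOG`, `discover_schema`, `execute_query`, and the byte threshold are all hypothetical, and the query is never actually run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserContext:
    user_id: str
    allowed_datasets: frozenset

# Hypothetical in-memory catalog standing in for warehouse metadata.
CATALOG = {
    "sales": {"accounts": ["account_id", "tier", "onboarded_at"]},
    "hr": {"salaries": ["employee_id", "amount"]},
}

def tool_error(kind, message):
    """Structured error so a repair loop can branch on `kind`."""
    return {"ok": False, "error": {"kind": kind, "message": message}}

def discover_schema(user, dataset):
    if dataset not in user.allowed_datasets:
        return tool_error("permission_denied", f"{user.user_id} cannot read {dataset}")
    if dataset not in CATALOG:
        return tool_error("not_found", f"unknown dataset {dataset}")
    return {"ok": True, "tables": CATALOG[dataset]}

def execute_query(user, dataset, sql, estimated_bytes,
                  approved=False, approval_threshold=10**9):
    if dataset not in user.allowed_datasets:
        return tool_error("permission_denied", f"{user.user_id} cannot query {dataset}")
    if not sql.lstrip().upper().startswith("SELECT"):
        return tool_error("validation", "only SELECT statements are allowed")
    if estimated_bytes > approval_threshold and not approved:
        return tool_error("approval_required", "scan size exceeds the approval threshold")
    # A real tool would run the query here with service-held credentials;
    # the model only ever sees the tool name, arguments, and this result.
    return {"ok": True, "sql": sql, "rows": []}

# The model is given tool names only; identity comes from the session.
TOOLS = {"discover_schema": discover_schema, "execute_query": execute_query}

analyst = UserContext("ana", frozenset({"sales"}))
print(TOOLS["discover_schema"](analyst, "hr"))
```

The design choice worth defending in an interview is that `user` is injected by the service from the authenticated session, so a prompt cannot widen its own access.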

47-55 min

Skim PICARD for constrained decoding.

Capture the core idea: if SQL is structured code, the decoder or validator can reject invalid partial outputs before bad SQL reaches the warehouse.
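The idea of rejecting invalid partial outputs can be illustrated with a toy prefix checker. This is not PICARD's parser: it assumes a deliberately tiny grammar (`SELECT col [, col]* FROM table`) and a hypothetical schema, and only shows how candidate next tokens are filtered so the partial SQL stays viable.

```python
def is_viable_prefix(tokens, columns, tables):
    """True if `tokens` could still grow into SELECT col[, col]* FROM table."""
    state = "start"
    for tok in tokens:
        upper = tok.upper()
        if state == "start" and upper == "SELECT":
            state = "need_col"
        elif state == "need_col" and tok in columns:
            state = "after_col"
        elif state == "after_col" and tok == ",":
            state = "need_col"
        elif state == "after_col" and upper == "FROM":
            state = "need_table"
        elif state == "need_table" and tok in tables:
            state = "done"
        else:
            return False  # no valid continuation from this state
    return True

def filter_candidates(prefix, candidates, columns, tables):
    """Keep only next tokens that leave the partial query viable."""
    return [c for c in candidates if is_viable_prefix(prefix + [c], columns, tables)]

# Hypothetical schema for illustration.
COLUMNS = {"account_id", "tier"}
TABLES = {"accounts"}

print(filter_candidates(["SELECT", "account_id"], [",", "FROM", "DROP", "tier"], COLUMNS, TABLES))
print(filter_candidates(["SELECT"], ["account_id", "password", "FROM"], COLUMNS, TABLES))
```

The same check can run after generation as a validator when you cannot hook into the decoder, which is the more common situation with hosted models.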

55-60 min

Deliver the interview synthesis.

Walk through the request path from intent interpretation to schema retrieval, decomposition, SQL generation, constrained validation, dry-run repair, execution, answer provenance, and evals.

Course 6 Reading List

Use three required sources: two LLM-era text-to-SQL papers plus one practical enterprise tutorial. Citation counts below are Semantic Scholar snapshots from this update and should be treated as popularity signals, not exact permanent numbers.

Required

DAIL-SQL: Text-to-SQL Empowered by Large Language Models

A popular LLM-era text-to-SQL paper from PVLDB. Semantic Scholar listed roughly 542 citations during this update. The paper compares prompt representations, example selection, example organization, token efficiency, and GPT-4 execution accuracy.

Read for: the practical prompt-engineering and cost-quality tradeoffs behind modern LLM-based SQL generation.

Required

DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

A popular LLM-era text-to-SQL paper. Semantic Scholar listed roughly 702 citations during this update, and the decomposition pattern maps cleanly to agentic query planning.

Read for: how to split the task into schema linking, decomposition, SQL generation, and self-correction rather than betting on one prompt.

Required

AWS: Dynamic Text-to-SQL for Enterprise Workloads

A practical, official architecture tutorial for enterprise text-to-SQL with Amazon Bedrock Agents, dynamic schema discovery, generated query execution, error handling, and guardrail-style controls.

Read for: production design moves you can translate to BigQuery: tools, permissions, schema context, execution boundaries, and repair loops.

Optional Refresher

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

A widely cited constrained-decoding paper for text-to-SQL. Semantic Scholar listed roughly 579 citations during this update.

Skim for: the validation instinct that generated SQL is structured code, so the system can constrain or reject invalid outputs before execution.

Readiness Checklist

You are ready for the interview version of this topic when you can connect the canonical benchmark story to a practical enterprise architecture.

You can explain DAIL-SQL's LLM-era prompt design lessons: question representation, example selection, example organization, token efficiency, and execution accuracy.
You can use DIN-SQL's decomposition pattern to separate schema linking, question classification, query decomposition, SQL generation, and self-correction.
You can translate the AWS Bedrock Agents tutorial into a BigQuery architecture with schema tools, query tools, auth boundaries, validation, and error handling.
You can explain where constrained decoding or constrained validation fits before a warehouse call, using PICARD as the reference point.
You can design a bounded repair loop that distinguishes parser failures, dry-run failures, runtime failures, semantic wrong-result failures, and uncertainty that needs human escalation.
You can propose evals for execution accuracy, schema-linking accuracy, tool-call precision, query cost limits, permission denial, repair success, and answer provenance.
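The bounded repair loop from the checklist can be sketched locally. This example makes several assumptions: SQLite's `EXPLAIN` stands in for a warehouse dry run, the failure kind is inferred from SQLite's error text, and `repair` is a hypothetical callable (in practice a model call). It covers only parser and dry-run failures; runtime and wrong-result failures need real execution and golden data.

```python
import sqlite3

def classify_failure(sql, conn):
    """Compile the query without running it and label any failure."""
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        message = str(exc)
        if "syntax error" in message:
            return "parser", message
        return "dry_run", message  # e.g. unknown table or column
    return None, ""

def repair_loop(sql, conn, repair, max_attempts=2):
    """Try up to `max_attempts` repairs, then escalate to a human."""
    history = []
    for attempt in range(max_attempts + 1):
        kind, message = classify_failure(sql, conn)
        history.append((kind, sql))
        if kind is None:
            return {"status": "ok", "sql": sql, "history": history}
        if attempt == max_attempts:
            break  # budget exhausted; do not loop forever
        sql = repair(sql, kind, message)
    return {"status": "escalate", "sql": sql, "history": history}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id TEXT, tier TEXT)")

def rename_column(sql, kind, message):
    # Hypothetical repair: a real system would prompt the model with `message`.
    return sql.replace("acct_id", "account_id")

result = repair_loop("SELECT acct_id FROM accounts", conn, rename_column)
print(result["status"], result["sql"])
```

Recording the failure kind at each attempt is what lets an eval report repair success per category instead of one blended number.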

Interview Drill: AI Infra System Design

Prompt: design a BigQuery text-to-SQL assistant using the popular-source stack from this course: DAIL-SQL-style prompting, DIN-SQL-style decomposition, AWS-style enterprise tools, and PICARD-style validation.

Start with the contract: accept natural language, user identity, workspace, candidate datasets, and clarification state; return SQL, validation receipts, result summary, and provenance.
Design prompting with DAIL-SQL vocabulary: choose schema representation, select high-similarity examples, control token budget, and keep cross-domain examples from polluting the prompt.
Use DIN-SQL's control flow: classify the question, decompose hard requests, select relevant schema context, generate SQL, and run a bounded self-correction loop.
Borrow the AWS tutorial's production shape: expose schema discovery and query execution as tools, keep credentials outside the model, validate generated SQL, and capture structured errors for repair.
Add PICARD-style safety: reject invalid SQL structures early, then dry-run for syntax and cost before any execution against customer data.
Close with evals and operations: golden text-to-SQL tasks, execution accuracy, schema-linking accuracy, permission probes, cost-limit tests, trace receipts, and human review for high-impact queries.
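For the eval step, execution accuracy can be sketched as a comparison of result sets between predicted and gold SQL. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for the warehouse; the table, the cases, and the function names are hypothetical, and rows are compared as an unordered multiset, so queries whose correctness depends on `ORDER BY` would need an order-sensitive comparison.

```python
import sqlite3
from collections import Counter

def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries return the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)

def execution_accuracy(conn, cases):
    """Fraction of cases whose predicted SQL matches the gold result."""
    hits = sum(execution_match(conn, c["predicted"], c["gold"]) for c in cases)
    return hits / len(cases)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("a1", "enterprise"), ("a2", "smb")],
)

# Hypothetical golden tasks: one correct prediction, one that over-selects.
CASES = [
    {"predicted": "SELECT account_id FROM accounts WHERE tier = 'enterprise'",
     "gold": "SELECT account_id FROM accounts WHERE tier='enterprise'"},
    {"predicted": "SELECT account_id FROM accounts",
     "gold": "SELECT account_id FROM accounts WHERE tier = 'enterprise'"},
]

print(execution_accuracy(conn, CASES))
```

Comparing results rather than SQL strings is the point: two differently written queries count as equal when they return the same rows.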

Text-to-SQL systems through the papers and tutorials people actually use.
