System-Design Frame
Assume your BigQuery copilot now has retrieval, SQL drafting, dry runs, approval gates, execution, summarization, and eval-based release checks. The interview question is how you debug production behavior when a run loops intermittently, chooses the wrong table, violates a latency SLO, or gives an unsupported answer. Design the observability plane in four parts: structured traces for every model call, tool call, retrieval, policy decision, and state transition; receipts that prove what actually ran; metrics that connect latency, tokens, and cost to quality signals; and replay bundles that let engineers reproduce failures without exposing unnecessary customer data.
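To make the four parts concrete, here is a minimal Python sketch of one possible shape for them. Every name in it (`Span`, `make_receipt`, `rollup`, `repeated_steps`, `replay_bundle`, and attribute keys such as `input_sha256`) is a hypothetical chosen for illustration, not part of BigQuery or of any tracing library. It also assumes spans are sequential leaf steps, so summed latency approximates time spent per step kind rather than wall-clock time for the run.

```python
import hashlib
import json
import uuid
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical span kinds, mirroring the categories named in the text.
SPAN_KINDS = {"model", "tool", "retrieval", "policy", "state"}


@dataclass
class Span:
    """One traced step in a copilot run (illustrative schema only)."""
    run_id: str
    kind: str
    name: str
    latency_ms: float = 0.0
    tokens: int = 0
    cost_usd: float = 0.0
    attrs: Dict[str, Any] = field(default_factory=dict)
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        if self.kind not in SPAN_KINDS:
            raise ValueError(f"unknown span kind: {self.kind}")


def make_receipt(run_id: str, sql: str, job_id: str, bytes_billed: int) -> Dict[str, Any]:
    """Record what actually ran: a hash of the exact SQL plus the job identifier."""
    return {
        "run_id": run_id,
        "sql_sha256": hashlib.sha256(sql.encode("utf-8")).hexdigest(),
        "job_id": job_id,
        "bytes_billed": bytes_billed,
    }


def rollup(spans: List[Span]) -> Dict[str, Dict[str, float]]:
    """Aggregate count, latency, tokens, and cost per span kind."""
    out: Dict[str, Dict[str, float]] = {}
    for s in spans:
        row = out.setdefault(
            s.kind, {"count": 0, "latency_ms": 0.0, "tokens": 0, "cost_usd": 0.0}
        )
        row["count"] += 1
        row["latency_ms"] += s.latency_ms
        row["tokens"] += s.tokens
        row["cost_usd"] += s.cost_usd
    return out


def repeated_steps(spans: List[Span], threshold: int = 3) -> List[Tuple[str, str, Any]]:
    """Flag possible loops: the same step with the same input hash seen repeatedly."""
    counts = Counter((s.kind, s.name, s.attrs.get("input_sha256")) for s in spans)
    return [key for key, n in counts.items() if n >= threshold]


def replay_bundle(
    spans: List[Span], receipt: Dict[str, Any], redact: Tuple[str, ...] = ("rows", "prompt")
) -> str:
    """Serialize spans plus receipt, masking attribute keys that may hold customer data."""
    safe = []
    for s in spans:
        d = asdict(s)
        d["attrs"] = {k: ("<redacted>" if k in redact else v) for k, v in s.attrs.items()}
        safe.append(d)
    return json.dumps({"receipt": receipt, "spans": safe}, sort_keys=True)
```

In this sketch the receipt stores a hash of the SQL rather than the text itself, so an engineer can confirm that the approved draft and the executed statement match without the bundle carrying the query. The redaction list is deliberately naive; a real system would more likely use an allowlist of safe attributes than a denylist.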