Use exactly three required readings: one provider-limit source, one overload-control source, and one rollout-safety source.
Required
OpenAI's guide to organization, project, model, and usage limits, including response headers, error mitigation, per-user caps, and retry behavior.
Read for: how provider limits should shape admission control, backoff, retry budgets, customer messaging, and workload isolation.
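To make "backoff" and "retry budgets" concrete before reading, here is a minimal sketch of capped exponential backoff with full jitter and a per-request attempt limit. The names `RateLimited`, `backoff_delay`, and `call_with_retries` are hypothetical and are not taken from the guide or from any provider SDK; the base, cap, and attempt values are placeholders.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a client wrapper on a rate-limit response."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    # Full jitter: a random delay between 0 and min(cap, base * 2**attempt).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(fn, max_attempts: int = 4, sleep=time.sleep):
    """Call fn, retrying on RateLimited until the per-request budget is spent."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # budget spent: surface the error instead of retrying forever
            sleep(backoff_delay(attempt))
```

The jitter spreads retries from many clients over time, and the attempt cap keeps a single request from multiplying load on a provider that is already limiting it.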
Required
A reliability classic on degraded responses, direct capacity signals, per-customer limits, client-side throttling, criticality, utilization, and retry budgets.
Read for: the vocabulary to discuss backpressure, load shedding, and safe degradation for expensive AI query workloads.
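As a warm-up for the "client-side throttling" vocabulary, the sketch below shows one well-known formulation: the client rejects requests locally with probability max(0, (requests - K * accepts) / (requests + 1)). Treating this as the formulation the reading has in mind is an assumption, as are the class name `AdaptiveThrottle` and the default `k`. Production versions usually count over a sliding time window; this sketch counts over the object's lifetime to stay short.

```python
import random


class AdaptiveThrottle:
    """Client-side throttle: shed load locally as the backend accepts less."""

    def __init__(self, k: float = 2.0):
        self.k = k          # hypothetical multiplier; lower values shed load sooner
        self.requests = 0   # attempts the application made
        self.accepts = 0    # attempts the backend accepted

    def reject_probability(self) -> float:
        # max(0, (requests - k * accepts) / (requests + 1))
        return max(0.0, (self.requests - self.k * self.accepts) / (self.requests + 1))

    def should_send(self, rng=random.random) -> bool:
        # Send only when the random draw clears the current rejection probability.
        return rng() >= self.reject_probability()

    def record(self, accepted: bool) -> None:
        self.requests += 1
        if accepted:
            self.accepts += 1
```

While the backend accepts most requests the rejection probability stays at zero; once accepts fall well below requests, the client starts dropping work before it reaches the overloaded backend.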
Required
A practical deployment-safety guide covering canary population, duration, SLO and error-budget risk, metric choice, control comparison, isolation, and rollback.
Read for: how to safely ship model, prompt, retriever, tool, and policy changes without treating production traffic as one giant experiment.
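The control-comparison and rollback ideas can be pictured with a small sketch: compare the canary's error rate with a control cohort's, and decide only once both have enough traffic. The names `Cohort` and `canary_verdict` and both thresholds are hypothetical placeholders, not values from the guide; a real gate would also weigh latency, SLO burn, and statistical significance.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(canary: Cohort, control: Cohort,
                   min_requests: int = 1000, max_delta: float = 0.005) -> str:
    """Return 'wait', 'rollback', or 'promote' for a canary compared with control."""
    if canary.requests < min_requests or control.requests < min_requests:
        return "wait"  # too little traffic in one cohort to judge
    if canary.error_rate - control.error_rate > max_delta:
        return "rollback"  # canary is worse than control by more than the tolerance
    return "promote"
```

Comparing against a concurrent control rather than a historical baseline keeps time-of-day and traffic-mix effects from being mistaken for a regression in the change under test.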
Optional Refresher
A current SRE perspective on AI agents in production operations, including transparency, real-time risk evaluation, progressive authorization, memory, eval data, and tool guardrails.
Skim for: language you can use in an AI infra interview to connect agentic autonomy, production controls, and incident response.