feat: retry transient OpenAI API errors with exponential backoff #129

Closed
pook wants to merge 1 commit from feature/openai-retry-logic into main

Summary

  • Adds withRetry() utility that wraps async calls with exponential backoff (1s, 2s, 4s) + jitter
  • Retries on transient errors: 429 (rate limit), 500, 502, 503
  • Fails immediately on non-retryable errors: 400, 401, 404
  • Logs each retry attempt with attempt number and error status code
  • Throws typed RetryExhaustedError after max 3 retries for upstream handling
  • Wraps the OpenAI chat.completions.create call in llm.ts
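A minimal sketch of how a `withRetry()` helper matching the summary above could be written. It assumes thrown errors expose an optional numeric `status` (HTTP code) and treats errors without one as network failures. It reads "max 3 retries" as three retries after the first call (four attempts in total, with waits of about 1s, 2s and 4s). The option names `maxRetries`, `baseDelayMs`, `sleep`, `log` and `random`, and the helpers `isRetryable` and `backoffDelayMs`, are hypothetical and not taken from the PR's code. Only `withRetry` and `RetryExhaustedError` are named in the PR.

```typescript
// Sketch of a retry wrapper: exponential backoff (1s, 2s, 4s) plus random jitter.
// Assumption: errors expose an optional numeric `status` (HTTP code);
// errors without one are treated as network failures and retried.

const RETRYABLE_STATUSES = new Set<number>([429, 500, 502, 503]);

export class RetryExhaustedError extends Error {
  readonly attempts: number;
  readonly lastError: unknown;

  constructor(attempts: number, lastError: unknown) {
    super(`retries exhausted after ${attempts} attempts`);
    this.name = "RetryExhaustedError";
    this.attempts = attempts;
    this.lastError = lastError;
  }
}

// Hypothetical options; names are illustrative, not taken from the PR.
export interface RetryOptions {
  maxRetries?: number; // retries after the first attempt (default 3)
  baseDelayMs?: number; // delay before the first retry (default 1000)
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  log?: (msg: string) => void;
  random?: () => number; // injectable jitter source, expected in [0, 1)
}

function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status?: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

export function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  // No status code: assume a network-level failure, which is worth retrying.
  // Any status outside the set (e.g. 400, 401, 404) fails immediately.
  return status === undefined || RETRYABLE_STATUSES.has(status);
}

export function backoffDelayMs(retry: number, baseDelayMs: number, random: () => number): number {
  const exp = baseDelayMs * 2 ** retry; // retry 0 -> 1s, 1 -> 2s, 2 -> 4s
  // Up to 25% extra random jitter so concurrent callers do not retry in lockstep.
  return Math.round(exp + exp * 0.25 * random());
}

export async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    log = (msg: string) => console.warn(msg),
    random = Math.random,
  } = opts;

  for (let retry = 0; ; retry++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err)) throw err; // non-retryable: propagate unchanged
      if (retry >= maxRetries) throw new RetryExhaustedError(retry + 1, err);
      const delay = backoffDelayMs(retry, baseDelayMs, random);
      log(`retry ${retry + 1}/${maxRetries} (status ${statusOf(err) ?? "none"}), waiting ${delay}ms`);
      await sleep(delay);
    }
  }
}
```

In `llm.ts` the call would then be wrapped roughly as `withRetry(() => client.chat.completions.create({ model, messages }))`, where `client`, `model` and `messages` stand in for whatever the file already uses. If the official `openai` Node SDK is in use, its client also retries some failures on its own by default (see its `maxRetries` client option). The two layers can therefore multiply attempts. Constructing the client with `maxRetries: 0` is one way to keep a single retry policy.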

Test plan

  • 11 unit tests covering all retry scenarios (all passing)
  • Verifies retry on 429, 500, 502, 503
  • Verifies immediate failure on 400, 401, 404
  • Verifies network errors (no status code) are retried
  • Verifies RetryExhaustedError thrown after max retries
  • Verifies retry logging output

Closes #37

🤖 Generated with Claude Code

Wraps OpenAI API calls in a retry function that handles transient
failures (429, 500, 502, 503) with exponential backoff and jitter.
Non-retryable errors (400, 401, 404) fail immediately. Includes
typed RetryExhaustedError for upstream handling.

Closes #37

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Closed 2026-04-10 during pipeline triage.

Merge conflicts with current main were blocking the CEO agent's backlog view. The compliancebot repo had ~60 open PRs and 141 open agent-task issues. The CEO agent couldn't see progress and kept duplicating work because of a git-push race in agent-worker (now fixed: a runId is threaded through the dispatch pipeline so each run gets a unique branch name).

Reopen or resubmit against current main if the work is still relevant. The shim's /shim/ceo route now injects open issues and PRs into the CEO prompt and refuses dispatch when the backlog exceeds 20.
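The backlog gate described in this comment could look roughly like the following. This is only a sketch. It assumes the backlog is counted as open issues plus open PRs, and that dispatch is refused only when that count is strictly greater than 20. The names `BacklogSnapshot`, `dispatchDecision` and `formatBacklogForPrompt` are hypothetical and are not the shim's actual code.

```typescript
// Hypothetical backlog gate for the CEO dispatcher.
// Assumption: backlog = open issues + open PRs; limit of 20 taken from the comment.

interface BacklogItem {
  number: number;
  title: string;
}

interface BacklogSnapshot {
  openIssues: BacklogItem[];
  openPulls: BacklogItem[];
}

const BACKLOG_LIMIT = 20;

type DispatchDecision =
  | { allowed: true; backlog: number }
  | { allowed: false; backlog: number; reason: string };

// Refuses dispatch only when the backlog is strictly greater than the limit.
function dispatchDecision(snap: BacklogSnapshot, limit: number = BACKLOG_LIMIT): DispatchDecision {
  const backlog = snap.openIssues.length + snap.openPulls.length;
  if (backlog > limit) {
    return { allowed: false, backlog, reason: `backlog ${backlog} exceeds limit ${limit}` };
  }
  return { allowed: true, backlog };
}

// Renders open items as plain text so they can be injected into the agent prompt.
function formatBacklogForPrompt(snap: BacklogSnapshot): string {
  const lines = [
    `Open issues (${snap.openIssues.length}):`,
    ...snap.openIssues.map((i) => `  #${i.number} ${i.title}`),
    `Open PRs (${snap.openPulls.length}):`,
    ...snap.openPulls.map((p) => `  !${p.number} ${p.title}`),
  ];
  return lines.join("\n");
}
```

Injecting the list and refusing above the limit are separate concerns here. The dispatcher can still see the full backlog in its prompt even on runs where new work is refused.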

pook closed this pull request 2026-04-10 15:08:17 -04:00

Pull request closed

Reference
pook/compliancebot!129