Skip to content

feat(test): add end-to-end live API test suite (DIS-73)#40

Closed
ckorhonen wants to merge 5 commits intomainfrom
devin/DIS-73-1772653465
Closed

feat(test): add end-to-end live API test suite (DIS-73)#40
ckorhonen wants to merge 5 commits intomainfrom
devin/DIS-73-1772653465

Conversation

@ckorhonen
Copy link
Collaborator

@ckorhonen ckorhonen commented Mar 4, 2026

feat(test): add end-to-end live API test suite (DIS-73)

Summary

Adds test/e2e.test.ts — a comprehensive live API test suite gated behind LIVE_TEST=true. The suite tests all 8 command domains (collections, nfts, listings, offers, events, tokens, swaps, accounts) across three layers:

  • SDK: Invokes OpenSeaCLI class methods programmatically
  • CLI: Executes node dist/cli.js via execFileSync (no shell) and captures stdout/stderr/exit codes
  • Parity: Compares SDK responses against direct fetch calls to the REST API

Output format validation covers all 3 formats (json, table, toon). Error handling tests cover invalid API key (401), non-existent slugs (4xx), and missing API key (exit code 2). An afterAll hook writes a JSON report to test/e2e-report.json.

When LIVE_TEST is unset, the entire suite is skipped via describe.runIf — existing CI is unaffected (confirmed: 151 tests pass, 75 e2e tests skipped).

Updates since last revision

Round 1 (bot review):

  • SDK instantiation moved to beforeAll — was previously eagerly constructed at describe-body scope
  • Replaced execSync with execFileSync — avoids shell metacharacter injection; API key passed via env var
  • Removed unused execSync import

Round 2 (human review — @ckorhonen):

  • Parity test flakiness — All 4 dynamic-data parity tests (listings, offers, events, tokens) now use Math.abs(a - b) <= 1 tolerance instead of exact .length equality, since data can change between sequential API calls
  • Swap error handling narrowed — Catch block now only treats [400, 404, 422] as acceptable OpenSeaAPIError status codes; 500s and other unexpected errors will correctly propagate as test failures
  • Build step addedbeforeAll now runs npm run build (30s timeout) so CLI tests always execute against a fresh dist/cli.js instead of potentially stale artifacts
  • .gitignore updated — Added test/e2e-report.json to prevent accidental commits of the generated report
  • Quoted arg parsingcli() helper uses parseArgs() regex to correctly handle double-quoted arguments (e.g. --value "New Punk Blue")

Review & Testing Checklist for Human

  • Run the suite with a live key — These tests have only been verified to skip correctly, not actually executed against the live API. This is the most important check. Run: LIVE_TEST=true OPENSEA_API_KEY=<key> npx vitest run test/e2e.test.ts
  • Parity tolerance — The ±1 tolerance on .length comparisons is an improvement over exact matching but is still somewhat arbitrary. On high-churn endpoints the counts could diverge by more than 1 between sequential calls. Consider whether structural/field-level comparison (e.g. comparing first item IDs) would be more robust.
  • Rate limiting in swap tests — Acceptable swap error codes are [400, 404, 422]. A 429 (rate limit) during the swap quote call will propagate as a hard failure. Verify this doesn't happen during a full test run.

Notes

  • search command is intentionally excluded — ticket specifies exactly 8 domains and does not list search
  • Toon format validation checks compactness (fewer tokens than JSON) but does not deeply verify parseability of toon encoding — ticket says "verify toon output is parseable"
  • Fixtures use well-known entities (BAYC, vitalik.eth, WETH) which are stable but tests assume specific response shapes
  • Build step in beforeAll adds ~30s to test startup; ensures CLI tests run against current code but may affect CI timing

Link to Devin Session: https://app.devin.ai/sessions/0a1a825e89dc41c5a3f3a68952aa262d
Requested by: @ckorhonen


Open with Devin

Adds comprehensive e2e tests gated behind LIVE_TEST=true that:
- Test all 8 command domains via SDK and CLI shell execution
- Validate all 3 output formats (json, table, toon)
- Verify parity between SDK responses and direct REST API calls
- Test error handling (invalid API key, non-existent slugs, missing key)
- Generate JSON report with pass/fail per command x format

Co-Authored-By: Chris K <ckorhonen@gmail.com>
@devin-ai-integration
Copy link
Contributor

Original prompt from Chris K
Please work on ticket "[Testing] Build end-to-end agent test suite for CLI" ([DIS-73](https://linear.app/opensea/issue/DIS-73/testing-build-end-to-end-agent-test-suite-for-cli))

PLAYBOOK_md:
# Ticket to PR

## Overview

This playbook guides the process of taking a Linear ticket from initial scoping through implementation to final review. The workflow ensures proper context gathering, quality implementation, and thorough code review before delivery. The agent uses the Linear MCP to manage ticket status and communication throughout.

## What's Needed From User

- Linear ticket URL or ticket ID (e.g., `ENG-123` or `https://linear.app/team/issue/ENG-123/...`)
- Repository access for the codebase where changes will be made

<phase name="Disambiguation" id="1">
## Disambiguation Phase

Think about the full user intent. Tickets are sometimes sparse. Make sure you disambiguate to the full scope that the user intended.

1. Fetch the ticket details using the Linear MCP `get_issue` tool with the ticket ID
2. Before diving into code: use the devin MCP to get a high-level understanding of the relevant systems and architecture. Use `ask_question` to learn about the relevant systems – send queries for multiple repos that could be relevant to get the full picture. Use `read_wiki_contents` to then get a better understanding how different parts of the codebase connect to each other.
3. Gather additional context to understand what the ticket means and refers to:
   - Look at past tickets in the same project and from the same author to understand patterns and terminology
   - Search for related commits and PRs (by author and content) that may provide context on the affected systems
   - Check any linked documents, designs, or parent tickets
   - Investigate the actual code
4. Identify any ambiguity in what the ticket refers to or asks for, including jargon or project-specific terms and use all means necessary to answer this yourself
5. Consult your smart friend: pass in the raw conten... (8100 chars truncated...)

@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration[bot]

This comment was marked as resolved.

…l injection

Addresses Devin Review findings:
- SDK instantiation moved from eager describe-body scope into beforeAll
  hook so it only runs when LIVE_TEST=true
- Replaced execSync with execFileSync in cli() helper to avoid shell
  metacharacter injection via OPENSEA_API_KEY env var
- API key now passed via OPENSEA_API_KEY env var instead of CLI argument

Co-Authored-By: Chris K <ckorhonen@gmail.com>
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration bot and others added 2 commits March 4, 2026 19:59
Add parseArgs() that respects double-quoted groups so values like
"New Punk Blue" are passed as a single argument to execFileSync.
Both cli() and cliFormatted() now accept string | string[].

Co-Authored-By: Chris K <ckorhonen@gmail.com>
Co-Authored-By: Chris K <ckorhonen@gmail.com>
@ckorhonen
Copy link
Collaborator Author

@devin Several issues to address:

  1. Add test/e2e-report.json to .gitignore — The afterAll hook writes this file and it will get accidentally committed.

  2. Parity test flakiness — Tests comparing .length between SDK and direct fetch calls (lines ~604-615, 773-784, 935-945, 1103-1115) will flake because data changes between sequential calls. Fix by comparing structural shape or stable fields instead of counts, or add a tolerance like Math.abs(a - b) <= 1.

  3. Overly permissive swap error handling — Lines ~1198-1212 treat ANY OpenSeaAPIError as a pass. A 500 is not the same as 'no liquidity'. Narrow to expected codes:

if (err instanceof OpenSeaAPIError && [400, 404, 422].includes(err.statusCode)) {
  // acceptable for swap quote
}
  1. Stale build risk — CLI tests depend on dist/cli.js but there's no build step. Add a beforeAll that runs npm run build or at least fails fast with a clear message if the binary is stale.

…rowing, build step

- Parity tests now use tolerance (±1) instead of exact count matching
- Swap error handling narrowed to expected codes [400, 404, 422]
- beforeAll runs npm run build to ensure dist/cli.js is fresh

Co-Authored-By: Chris K <ckorhonen@gmail.com>
@devin-ai-integration
Copy link
Contributor

All 4 items addressed:

  1. .gitignoretest/e2e-report.json added (commit 9ceb465)
  2. Parity flakiness — Changed exact .length comparisons to Math.abs(a - b) <= 1 tolerance for all 4 dynamic endpoints (listings, offers, events, tokens)
  3. Swap error handling — Narrowed from any OpenSeaAPIError to [400, 404, 422].includes(err.statusCode)
  4. Stale buildbeforeAll now runs npm run build before tests execute

@ckorhonen ckorhonen closed this Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant