CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a GitHub Account and Organization Migration Toolkit for transferring 600+ repositories from a personal account (pythoninthegrass) to an organization (pythoninthegrass2), then swapping names so the organization takes the original account name.

Current Status: Migration complete. 594/600 repos successfully transferred (99% success rate); account and organization renames finished (see Current State below).

Tech Stack:

  • Python 3.13+ with PEP 723 self-contained scripts (no virtualenv needed)
  • uv package manager for running scripts
  • GitHub CLI (gh) for API access
  • python-decouple for environment configuration
  • tqdm for progress bars

Common Commands

Running Scripts

All scripts are PEP 723 compliant with embedded dependencies. Run them directly with uv:

# Pre-migration audit (generates inventory backup)
./scripts/00_pre_migration_audit.py
# or: uv run scripts/00_pre_migration_audit.py

# Transfer repositories (main migration script)
./scripts/01_transfer_repos.py

# Post-migration validation
./scripts/02_post_migration_validation.py

# Retry failed transfers
./scripts/03_retry_failed_transfers.py

# Delete all forks from organization (if needed)
./scripts/04_delete_forks.py

Force cache refresh (ignore 30-minute cache):

./scripts/00_pre_migration_audit.py --force-refresh
./scripts/01_transfer_repos.py -f

Configuration

All scripts read from .env file:

# Copy template
cp .env.example .env

# Edit configuration
nano .env

Key variables:

  • SOURCE_OWNER - Personal account (pythoninthegrass)
  • TARGET_ORG - Target organization (pythoninthegrass2)
  • DRY_RUN - Test mode (true/false)
  • PILOT_MODE - Test with small batch (true/false)
  • PILOT_REPOS - Comma-separated repo list for pilot
  • BATCH_SIZE - Repos per batch (default: 9)
  • MAX_CONCURRENT_TRANSFERS - Parallel transfers (default: 3)
  • DELAY_BETWEEN_TRANSFERS - Seconds between transfers (default: 1)
  • DELAY_BETWEEN_BATCHES - Seconds between batches (default: 3)
  • EXCLUDE_FORKS - Skip forks (true/false)
  • EXCLUDE_ARCHIVED - Skip archived repos (true/false)

Verification Commands

# Check authentication
gh auth status

# List repos in source account
gh repo list pythoninthegrass --limit 100

# List repos in target org
gh repo list pythoninthegrass2 --limit 100

# Check specific repo location
gh repo view pythoninthegrass2/repo-name

Architecture

Script Pipeline

00_pre_migration_audit.py    → Inventory & backup
         ↓
01_transfer_repos.py          → Automated transfers
         ↓
04_delete_forks.py            → Delete forks (if needed)
         ↓
[Manual Web UI Steps]         → Rename account/org
         ↓
02_post_migration_validation.py → Verify success

Caching System

Both audit and transfer scripts implement a 30-minute cache:

Location: .cache/repos_{owner}.json

Benefits:

  • Instant subsequent runs (no API calls)
  • Shared cache between scripts
  • Automatic TTL expiration after 30 minutes

Cache structure:

{
  "owner": "pythoninthegrass",
  "timestamp": "2026-01-26T15:22:22.245556",
  "repositories": [
    {
      "name": "repo-name",
      "is_fork": false,
      "is_archived": false,
      "full_name": "pythoninthegrass/repo-name",
      ...
    }
  ]
}

Bypass cache: Use --force-refresh or -f flag

Parallel Transfer Implementation

01_transfer_repos.py uses concurrent.futures.ThreadPoolExecutor:

  • Processes repos in batches (default: 9 repos per batch)
  • Runs 3 concurrent transfers within each batch
  • ~3x faster than sequential (100 repos in ~3 minutes vs ~8 minutes)
  • Rate limiting: Small delays between concurrent operations
  • Error handling: Each thread handles failures independently

Transfer flow:

ThreadPoolExecutor(max_workers=3)
├── Thread 1: Transfer repo A
├── Thread 2: Transfer repo B
└── Thread 3: Transfer repo C
    ↓
Wait for batch completion
    ↓
Small delay (3 seconds)
    ↓
Next batch...
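
The batch/thread structure above can be sketched with `concurrent.futures`. This is a simplified illustration, not the script's exact code: `transfer_fn` stands in for the real per-repo transfer call, and the batch/concurrency numbers mirror the defaults described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

BATCH_SIZE = 9        # assumption: default repos per batch
MAX_WORKERS = 3       # assumption: default concurrent transfers
DELAY_BETWEEN_BATCHES = 3  # seconds

def transfer_batch(repos: list[str], transfer_fn) -> dict[str, str]:
    """Run one batch with up to MAX_WORKERS concurrent threads."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(transfer_fn, repo): repo for repo in repos}
        for future in as_completed(futures):
            repo = futures[future]
            try:
                results[repo] = future.result()
            except Exception as exc:  # each thread handles failure independently
                results[repo] = f"error: {exc}"
    return results

def transfer_all(repos: list[str], transfer_fn,
                 delay: float = DELAY_BETWEEN_BATCHES) -> dict[str, str]:
    """Process repos in batches, pausing between batches for rate limiting."""
    results: dict[str, str] = {}
    for i in range(0, len(repos), BATCH_SIZE):
        results.update(transfer_batch(repos[i:i + BATCH_SIZE], transfer_fn))
        if i + BATCH_SIZE < len(repos):
            time.sleep(delay)
    return results
```

One failed repo never aborts the batch; its error string is recorded alongside the successes.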

Error Handling Patterns

HTTP 422 "Already Transferred": Scripts detect repos already in target org and skip them:

  • Pre-transfer verification check
  • Post-error verification for 422 responses
  • Marked as "already_transferred" (not failures)

Status categories:

  • success - New transfer completed
  • already_transferred - Repo exists in target
  • error - Transfer failed
  • timeout - Request timed out
  • dry_run - Dry-run mode (no actual transfer)

Failed transfers: Results stored in results/transfer_results_YYYYMMDD_HHMMSS.csv

GraphQL API Usage

Scripts use GitHub GraphQL API (via gh api graphql) for efficient data fetching:

Query pattern:

query($owner: String!, $cursor: String) {
  user(login: $owner) {
    repositories(first: 100, after: $cursor, ownerAffiliations: OWNER) {
      totalCount
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        name
        isFork
        isArchived
        hasIssuesEnabled
        hasWikiEnabled
        ...
      }
    }
  }
}

Pagination: Automatic cursor-based pagination for 100+ repos
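
The cursor loop can be sketched as below. To keep the example self-contained it takes a `fetch_page` callable (a stand-in for the `gh api graphql` call) that returns one `repositories` payload shaped like the query above.

```python
def fetch_all_repos(fetch_page) -> list[dict]:
    """Accumulate all repo nodes by following endCursor until hasNextPage is false."""
    repos: list[dict] = []
    cursor = None
    while True:
        page = fetch_page(cursor)          # one GraphQL round trip (100 repos max)
        repos.extend(page["nodes"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return repos
        cursor = info["endCursor"]         # resume after the last node seen
```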

Transfer API

Repository transfers use REST API:

gh api --method POST /repos/{owner}/{repo}/transfer \
  -f new_owner={target_org}

Limitations:

  • Private forks cannot be transferred if parent repo has restrictions
  • Repos with pending transfers must wait 24 hours
  • Name collisions (already taken) require manual resolution

Key Implementation Details

PEP 723 Script Headers

All scripts include inline dependency specifications:

#!/usr/bin/env -S uv run --script

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "python-decouple>=3.8",
#     "tqdm>=4.66.0",
# ]
# [tool.uv]
# exclude-newer = "2025-12-31T00:00:00Z"
# ///

This enables direct execution without virtualenv:

./scripts/00_pre_migration_audit.py  # uv handles dependencies

Audit Script Fields

00_pre_migration_audit.py captures:

  • Repository metadata (name, owner, visibility)
  • Settings (issues, wiki, pages enabled)
  • Statistics (stars, forks, open issues)
  • Timestamps (created, updated, last pushed)
  • Topics and primary language
  • Fork and archive status

Output formats:

  • CSV: backup/repos_inventory_YYYYMMDD_HHMMSS.csv
  • JSON: backup/repos_full_YYYYMMDD_HHMMSS.json

Transfer Script Filtering

01_transfer_repos.py supports multiple filtering modes:

Pilot mode (test with specific repos):

PILOT_MODE=true
PILOT_REPOS=repo1,repo2,repo3

Filter by type:

EXCLUDE_FORKS=true      # Skip forked repos
EXCLUDE_ARCHIVED=true   # Skip archived repos

Both filters can be combined.

Validation Script Checks

02_post_migration_validation.py verifies:

  1. Repository count matches pre-migration audit
  2. Metadata preserved (stars, forks, issues)
  3. Access control intact
  4. Missing repositories identified

Uses cached audit data if available.
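
Checks 1, 2, and 4 reduce to comparing the audit inventory against the target org. A minimal sketch (the helper names and metadata field names are assumptions, not the script's actual identifiers):

```python
def find_missing(audit_names: list[str], target_names: list[str]) -> list[str]:
    """Return audited repos not present in the target org, in audit order."""
    present = set(target_names)
    return [name for name in audit_names if name not in present]

def compare_metadata(before: dict, after: dict,
                     fields=("stars", "forks", "open_issues")) -> dict:
    """Return fields whose values changed between audit and post-migration."""
    return {f: (before.get(f), after.get(f))
            for f in fields if before.get(f) != after.get(f)}
```

An empty result from both functions means counts match and metadata was preserved.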

Retry Logic

03_retry_failed_transfers.py implements exponential backoff:

Initial delay: 5 seconds
Max retries: 3
Delay multiplier: 2x per retry

Retry conditions:

  • Timeout errors
  • Rate limit errors (429)
  • Temporary network failures

Permanent failures (no retry):

  • Pending transfers (24-hour cooldown)
  • Not found (404)
  • Permission denied (403)

Fork Deletion

04_delete_forks.py deletes all forked repositories:

Features:

  • Parallel deletion (5 concurrent by default)
  • Batch processing (20 repos per batch)
  • GraphQL API for efficient fork discovery
  • Confirmation prompt (must type "DELETE")
  • Progress tracking with tqdm
  • Dry-run mode for safety

Use case: Some organizations have hundreds of forks that can block org renames or cause naming conflicts. This script bulk-deletes them.

Configuration:

MAX_CONCURRENT_DELETIONS=5  # Parallel deletions
DRY_RUN=false               # Live deletion mode

Performance: ~2-3 minutes for 384 forks (5 concurrent)
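
The fork-discovery and confirmation steps above can be sketched as below (illustrative helpers operating on the cached repo dicts, not the script's actual code):

```python
def forks_to_delete(repos: list[dict]) -> list[str]:
    """Select only forked repos from the cached inventory."""
    return [r["full_name"] for r in repos if r["is_fork"]]

def confirm_deletion(typed: str, dry_run: bool = False) -> bool:
    """Proceed only when the user typed DELETE exactly; dry-run never deletes."""
    return dry_run or typed.strip() == "DELETE"
```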

Critical Migration Steps

Phase 1: Pre-Migration

  1. Run audit script to generate backup
  2. Review CSV output for exclusions
  3. Test with pilot mode (5-10 repos)
  4. Wait 24 hours to verify pilot

Phase 2: Bulk Transfer

  1. Set PILOT_MODE=false and DRY_RUN=false
  2. Run transfer script and monitor progress
  3. Handle failures with retry script
  4. Verify transfer counts

Phase 3: Manual Rename (Web UI Only)

⚠️ Cannot be automated via API

  1. Rename personal account: pythoninthegrass → pythoninthegrass_og
  2. Rename organization: pythoninthegrass2 → pythoninthegrass
  3. Verify with gh auth status and API calls

See: docs/MANUAL_RENAME_PROCEDURE.md for detailed steps

Phase 4: Post-Migration

  1. Run validation script
  2. Update GitHub Actions secrets if needed
  3. Verify webhooks and integrations
  4. GitHub redirects handle URL changes for 90 days (no local clone updates needed)

Current State

✅ Migration completed successfully: 2026-01-26

Final results:

  • Total repos: 600
  • Successfully transferred: 594 (99% success rate)
    • 463 new transfers
    • 131 already existed
  • Failed: 6 repos (edge cases)
    • 3 actually succeeded despite 422 errors
    • 2 private forks (can't transfer due to GitHub restrictions)
    • 1 name collision
  • Forks deleted: 384 (cleared path for org rename)

Account/Org status:

  • Personal account renamed: pythoninthegrass → thepythoninthegrass
  • Organization renamed: pythoninthegrass2 → pythoninthegrass
  • All repos now under organization with original username

Performance:

  • Repository transfers: ~9 minutes (3 concurrent)
  • Fork deletion: ~2-3 minutes (5 concurrent)
  • Total active time: ~3 hours
  • Success rate: 99%

Troubleshooting

"gh: command not found"

brew install gh  # macOS
# or: https://cli.github.com/

"uv: command not found"

curl -LsSf https://astral.sh/uv/install.sh | sh

Permission Errors

gh auth login
gh auth status

Rate Limiting

Increase delays in .env:

DELAY_BETWEEN_TRANSFERS=5
DELAY_BETWEEN_BATCHES=30
MAX_CONCURRENT_TRANSFERS=2  # Reduce parallelism

HTTP 422 Errors

"Validation Failed" can mean:

  1. Repo already transferred (check target org)
  2. Private fork with parent restrictions
  3. Name collision in target org
  4. Pending transfer (24-hour cooldown)

Check repo status:

gh repo view pythoninthegrass2/repo-name  # Exists?
gh api /repos/pythoninthegrass/repo-name --jq '{fork: .fork, private: .private}'

Failed Transfers

  1. Check results/transfer_results_*.csv for error details
  2. Try manual transfer:
    gh api --method POST /repos/pythoninthegrass/repo-name/transfer -f new_owner=pythoninthegrass2
  3. Run retry script: ./scripts/03_retry_failed_transfers.py

Cache Issues

Clear cache and force fresh data:

rm -rf .cache/
./scripts/00_pre_migration_audit.py --force-refresh

Safety Features

  • Dry-run mode: Test without changes (DRY_RUN=true)
  • Pilot mode: Test small batch first (PILOT_MODE=true)
  • Confirmation prompts: Prevents accidental transfers
  • Rate limiting: Avoids API bans
  • Error recovery: Retry script for transient failures
  • Pre-migration backup: Complete audit before changes
  • Validation: Verify migration success

GitHub API Limits

  • Rate limit: 5,000 requests/hour (authenticated)
  • Transfer cooldown: 24 hours between transfer attempts per repo
  • Concurrent transfers: No official limit, but rate limiting applies
  • Rename redirects: 90-day redirect period

What's Preserved

✅ Repository metadata (stars, watchers, forks)
✅ Issues and pull requests
✅ Commits and history
✅ Branch protection rules
✅ Webhooks (URLs may need updates)
✅ Deploy keys
✅ Repository settings

What's NOT Preserved

❌ GitHub Actions secrets (must be re-added manually)
❌ Some third-party integrations (may need reconfiguration)

Documentation

  • README.md - User guide and setup instructions
  • docs/MANUAL_RENAME_PROCEDURE.md - Detailed rename steps
  • .env.example - Configuration template
  • This file (CLAUDE.md) - AI agent guidance

Future Improvements

If extending this toolkit:

  1. Add progress persistence: Resume interrupted transfers
  2. Webhook updates: Automatically update webhook URLs after rename
  3. Batch scheduling: Spread transfers over multiple days for very large migrations
  4. GitHub Actions integration: Automate secret re-creation
  5. Monitoring dashboard: Real-time transfer progress visualization
  6. Rollback capability: Automate transfer reversal if needed

Testing Notes

When modifying scripts:

  1. Always test with DRY_RUN=true first
  2. Use pilot mode with test repos
  3. Verify cache behavior with --force-refresh
  4. Test error handling with intentional failures
  5. Check parallel transfer behavior under load

Security Considerations

  • Never commit the .env file (it may contain sensitive configuration)
  • Review audit outputs before sharing (may expose private repo names)
  • Limit access to backup/ and results/ directories
  • Use .gitignore to prevent accidental commits
  • Personal access tokens continue working after rename