╔═══════════════════════════════════════════════════════╗
║ AAYUSHI GUPTA · Software Development Engineer ║
║ Backend Systems · Distributed Architecture · AWS ║
╚═══════════════════════════════════════════════════════╝
Backend engineer with 3+ years building production systems across Amazon, Pine Labs, and Niyo — distributed architectures, event-driven pipelines, cross-region migrations, payment infrastructure, and real-time notification systems at scale.
I design APIs, own systems end-to-end, mentor engineers, and ship things that don't break at 3AM.
Engineer engineer = Engineer.builder()
.experience(List.of("Amazon", "Pine Labs", "Niyo"))
.focus("Distributed Systems · Event-Driven Architecture · Cloud Infrastructure")
.languages(List.of("Java", "Go", "TypeScript", "Python"))
.cloud("AWS — ECS, Lambda, SQS/SNS, DynamoDB, OpenSearch, CDK")
.ai(List.of("LangChain", "RAG pipelines", "LLM APIs", "Vector DBs"))
.certifications("AWS Certified")
.dsaSolved(500)
.build();
Real work. Real scale. No toy projects.
Owned end-to-end migration of a stateful ECS service from eu-west-1 → eu-south-2 as part of Amazon's Regional Flex Planning initiative. No prior playbook existed for this in the org — I wrote it.
- Designed the migration architecture from scratch: phased approach — per-tenant S3 cross-region replication + OpenSearch snapshot-restore, followed by gradual WebLab traffic ramp (5% → 25% → 50% → 100%), followed by formal decommission via change management
- Solved the stateful data problem: validated per-tenant S3 object counts and OpenSearch document counts before touching a single traffic percentage — data parity was the go/no-go gate, not a timer
- Separated "zero traffic" from "delete resources" by a full 7-day monitoring window — decommission is irreversible; I treated it accordingly
- Owned the decommission sequence: endpoint removal → compute deletion → data deletion — order matters because running compute without data causes cascading failures
- Coordinated multiple client teams across Alpha and Prod accounts; wrote the runbook so the next engineer doesn't start from zero
Zero downtime · Zero data loss · 1M+ customers · Full org-level playbook created
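The data-parity gate above boils down to a per-tenant count comparison between source and target region. A minimal sketch (the `parityOk` helper and its maps are illustrative, not the internal tooling):

```java
import java.util.Map;

// Illustrative go/no-go gate: traffic ramp proceeds only when every
// tenant's S3 object count / OpenSearch document count matches exactly
// between source and target region.
class ParityGate {
    // Returns true only if both regions report the same tenant set and
    // identical counts per tenant; any mismatch or missing tenant fails.
    static boolean parityOk(Map<String, Long> source, Map<String, Long> target) {
        if (!source.keySet().equals(target.keySet())) return false;
        return source.entrySet().stream()
                .allMatch(e -> e.getValue().equals(target.get(e.getKey())));
    }
}
```

A timer-based gate would pass silently on partial replication; a count-based gate cannot.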
Designed and built a compliance-grade, event-driven customer data deletion system across 3+ microservices. The correctness bar here is non-negotiable — a missed deletion is a compliance violation.
SNS (deletion event) → SQS (buffered queue) → Lambda ─┬→ DynamoDB tables
                              ↓                       └→ External state API
                      DLQ (failure capture + alarm)
- Idempotency by design: deletion state tracked in a DynamoDB tracking table keyed by customerId + requestId. SQS at-least-once delivery means duplicate processing is guaranteed to happen; the system handles it safely
- Solved the distributed transaction gap: DynamoDB delete succeeds but external API call fails → message returns to queue → DynamoDB delete is a no-op on retry → external API gets called again. Chose data-deletion-first ordering deliberately: missing a state update is recoverable; data existing after a deletion request is a compliance violation
- DLQ + CloudWatch alarm ensures no deletion silently fails — every failure is captured, alerted, and replayable after root cause fix
- Chose Lambda over persistent ECS: deletion is bursty and infrequent — Lambda scales to zero and only costs on execution
GDPR right-to-erasure · 100K+ customers · At-least-once safe · Zero silent failures
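The idempotency guard above can be sketched with a concurrent set standing in for the DynamoDB tracking table (class and method names are illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the idempotency guard. The real system uses a DynamoDB
// tracking table; a concurrent set keyed on customerId + requestId
// stands in for it here.
class DeletionTracker {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Returns true only for the first delivery of a given
    // (customerId, requestId) pair; SQS redeliveries become no-ops.
    boolean markIfFirst(String customerId, String requestId) {
        return processed.add(customerId + "#" + requestId);
    }
}
```

In DynamoDB terms, `processed.add` corresponds to a `PutItem` with an `attribute_not_exists` condition on the key: the first delivery wins the conditional write, and every redelivery sees the condition fail and skips re-processing.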
Built a multi-region change data capture pipeline to make operational DynamoDB data queryable for analytics without impacting production read capacity.
- Designed schema transformation layer: DDB's typed JSON format ({S: "val"}, {BOOL: true}) → flat TSV rows compatible with the columnar store; column order and type mapping owned entirely by me
- Solved the backfill problem: DDB Streams only captures future writes. For 1.3M existing records (1.7GB), used a temp-table strategy: copy prod data to a temp DDB table with streams enabled, run a parallel DataCraft pipeline into the same Andes destination, then activate the prod stream pipeline. Live writes and backfill converge safely because stream events carry timestamps
- Debugged a production row count mismatch (expected 1255, found 1254) — ruled out data loss by querying Andes directly, identified root cause as a manifest generation bug in pipelines created before a certain DataCraft version flag became default. Fix: recreate pipeline with flag first, then recreate Datashare — order matters because recreating Datashare before fixing the pipeline reads from the same broken manifest
- Deployed across EU, NA, and FE regions with consistent schema
Multi-region · 1.3M records backfilled · Production bug debugged and fixed · Downstream Redshift unblocked
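The transformation layer above can be sketched as a small unwrap-and-flatten step: strip DynamoDB's type wrapper, then emit values in a fixed column order (the column list and helper names are illustrative, not the production pipeline):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the schema transformation: DynamoDB's typed attribute
// format ({"S": "val"}, {"N": "42"}, {"BOOL": true}) flattened into a
// TSV row whose column order matches the columnar store's schema.
class DdbToTsv {
    // Unwraps one typed attribute, e.g. {"S": "alice"} -> "alice".
    static String unwrap(Map<String, Object> typed) {
        return String.valueOf(typed.values().iterator().next());
    }

    // Fixed column order keeps every row aligned with the destination
    // schema; attributes absent from an item become empty cells.
    static String toTsvRow(Map<String, Map<String, Object>> item, List<String> columns) {
        return columns.stream()
                .map(c -> item.containsKey(c) ? unwrap(item.get(c)) : "")
                .collect(Collectors.joining("\t"));
    }
}
```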
Converted synchronous IT workflows that were timing out under peak load into an event-driven async model.
- Identified root cause: synchronous call chains where upstream service waited on downstream completion — under load, downstream slowness caused cascading timeouts across the entire chain
- Redesigned to publish-and-forget: services emit events to SQS, downstream consumers process at their own pace with idempotency guards for at-least-once delivery
- Added DLQ + visibility timeout tuning to prevent message loss under sustained load spikes
- Execution time: 50 minutes → 25 minutes. Production timeouts: eliminated.
Led API design and technical ownership for Pine Labs' multi-gateway payment integration layer, with junior engineers implementing individual gateway integrations under my design.
- Designed the unified gateway abstraction: a single internal API contract that normalized heterogeneous external gateway interfaces — each gateway had different auth schemes, retry semantics, error codes, and idempotency models. Abstraction layer hid all of this from callers
- Owned the resilience contract: defined how retries, timeouts, and idempotency keys worked at the abstraction layer — individual gateway implementations had to conform, not invent their own retry logic
- Led code reviews with a specific focus on failure modes: "what happens if this gateway returns a 200 but the transaction is actually pending?", "how does this handle a network timeout mid-request?" — taught juniors to think in failure paths, not happy paths
- Drove reconciliation flow design for failed or ambiguous transactions — financial systems need a recovery path, not just error logging
5 gateways integrated · Unified abstraction owned · Junior engineers mentored on production-grade error handling
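The abstraction described above reduces to one internal contract that every gateway adapter must conform to. A minimal sketch (interface, enum, and parameter names are illustrative, not Pine Labs' actual API):

```java
// Sketch of a unified gateway contract: one internal interface, one
// normalized status enum, and an idempotency key owned by the
// abstraction layer rather than by individual gateway adapters.
interface PaymentGateway {
    enum Status { SUCCESS, PENDING, FAILED }

    // The idempotency key is supplied by the abstraction layer so that
    // retry semantics stay uniform across all five gateways; adapters
    // must map ambiguous responses (HTTP 200 with a pending body) to
    // PENDING, never SUCCESS.
    Status charge(String idempotencyKey, long amountMinorUnits, String currency);
}
```

Normalizing to PENDING rather than SUCCESS is what makes the reconciliation flow possible: ambiguous transactions get a recovery path instead of a false positive.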
Built a Kafka-based real-time notification pipeline delivering push, SMS, and email alerts to users on transaction events — replacing a batch-based approach that introduced unacceptable delivery delays.
Transaction Event → Kafka Topic → Notification Consumer Service
                                          ├→ Push (FCM/APNs)
                                          ├→ SMS (provider)
                                          └→ Email (provider)
- Designed consumer group configuration for fault-tolerant, ordered processing — partition assignment ensured per-user event ordering was preserved across notification channels
- Handled the fan-out routing problem: a single Kafka message needed to trigger multiple notification channels based on user preferences and event type — built a routing layer inside the consumer that dispatched to the right provider without duplicating event consumption
- Implemented offset commit strategy carefully: committed offsets only after all notification dispatches succeeded — a failed SMS dispatch would not silently drop the message, it would retry from the last committed offset
- Reduced notification delivery latency from batch-cycle delays to near real-time
Kafka · Multi-channel fan-out · Ordered delivery · Fault-tolerant offset management
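The offset-commit discipline above can be modeled without the Kafka client: advance the committed offset only when every channel dispatch succeeds (a sketch; a plain counter stands in for Kafka's consumer offset API):

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of commit-after-dispatch: the committed offset advances only
// when push, SMS, and email dispatches all succeed for a message.
class NotificationConsumer {
    long committedOffset = -1;

    // Each dispatcher returns true on successful delivery. On any
    // failure the offset is NOT committed, so the message is
    // re-consumed from the last committed offset rather than dropped.
    void process(long offset, String event, List<Predicate<String>> dispatchers) {
        boolean allOk = dispatchers.stream().allMatch(d -> d.test(event));
        if (allOk) {
            committedOffset = offset;
        }
    }
}
```

One consequence of this strategy: a retry re-dispatches channels that already succeeded, so it pairs with per-channel idempotency (e.g. a dispatch-ID dedupe at the provider boundary).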
Built a scheduled reporting pipeline from scratch because it was toil that should not exist. Nobody asked me to — I identified it and eliminated it.
- EventBridge cron → Lambda → DynamoDB scan → in-memory CSV generation (not /tmp: Lambda's /tmp is scoped to the execution environment and can persist across warm invocations, so it needs explicit cleanup; in-memory is faster and has no cleanup cost) → S3 archive + SES email delivery
- Made the CDK stack generic: accepts a LambdaConfig interface so any future scheduled report reuses the same construct; no copy-paste infrastructure
- When the stack was accidentally deleted during the ZAZ migration decommission, I rebuilt it and encoded 4 production lessons directly into CDK: RemovalPolicy.RETAIN on S3 (survives stack deletion), in-memory CSV, generic stack, CloudWatch error alarm (original had zero observability; failures were invisible until someone noticed a missing email)
Toil eliminated · Infrastructure made deletion-proof · Observability added · CDK construct reusable
AI isn't a line on my resume — it's part of how I build and how I work.
Building with AI — flat/flatmates (side project, working prototype):
Stack: Next.js · Java Spring Boot · LLM API
Architecture:
├── Preference intake → structured user profile (Spring Boot)
├── Compatibility scoring via LLM API — prompt engineered for
│ deterministic structured output (JSON), not free-form text
├── RAG layer: user-generated descriptions embedded + retrieved
│ at match time to give LLM relevant context per query
├── Cold-start strategy: new users with no history get rule-based
│ scoring until enough signal exists to switch to LLM scoring
└── Cost control: LLM called only at match-time, not on every
profile update — cached embeddings, selective inference
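The cold-start branch in the tree above is a simple switch: rule-based scoring until a user has accumulated enough signal, then LLM-backed scoring. A minimal sketch (the threshold and both score inputs are illustrative, not the project's actual values):

```java
// Sketch of the cold-start strategy: new users with little history get
// a deterministic rule-based score; once enough signal exists, the
// LLM-derived score takes over.
class MatchScorer {
    // Illustrative threshold; the real cutoff would be tuned.
    static final int MIN_SIGNALS_FOR_LLM = 5;

    static double score(int userSignals, double ruleScore, double llmScore) {
        return userSignals >= MIN_SIGNALS_FOR_LLM ? llmScore : ruleScore;
    }
}
```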
Using AI in daily engineering:
Tools: Cursor · GitHub Copilot
What for: Boilerplate elimination · Test case generation
Debugging hypothesis generation · PR description drafting
What not: Architecture decisions · Production incident RCA
Anything where I need to own the reasoning
Actively studying: LangChain internals · Pinecone / Weaviate · RAG vs fine-tuning decision framework
┌─────────────────┬──────────────────────────────────────────────────┐
│ Languages │ Java · Go · TypeScript · Python · C++ │
│ Cloud (AWS) │ ECS · Lambda · SQS · SNS · DynamoDB · S3 │
│ │ OpenSearch · EventBridge · CDK · CloudWatch │
│ Messaging │ Kafka · SNS/SQS · Event-driven architecture │
│ Databases │ DynamoDB · MongoDB · MySQL · Redis │
│ Frameworks │ Spring Boot · Next.js · React │
│ Infrastructure │ Docker · AWS CDK · CloudFormation │
│ Observability │ CloudWatch · Log Insights · Alarms · DLQ │
│ AI/ML │ LangChain · LLM APIs · RAG · Vector embeddings │
│ Testing │ Cypress · Parallel sharding · Integration tests │
└─────────────────┴──────────────────────────────────────────────────┘
// Principles I've developed from shipping real systems
1. Separate data migration from traffic cutover — never do both simultaneously
2. Decommission is irreversible; treat it differently from deployment
3. Idempotency is not optional when your delivery guarantee is at-least-once
4. A metadata bug is not data corruption — identify which one before alerting anyone
5. Design for the failure path first; the happy path usually works
6. Payment systems fail in creative ways — build reconciliation in, not as an afterthought
7. When forced to rebuild, encode what production taught you directly into the infrastructure
8. Every technical decision should have a stated "what breaks and when"
9. Teach engineers to think in failure modes, not just correct behavior
10. If something is toil, eliminate it — don't document a workaround
$ curl -X GET https://linkedin.com/in/guptaaayushi09
$ curl -X GET https://leetcode.com/code_buddy21
$ echo "aayushi09023@gmail.com"