guptaaayushi09/README.md
╔═══════════════════════════════════════════════════════╗
║  AAYUSHI GUPTA  ·  Software Development Engineer      ║
║  Backend Systems  ·  Distributed Architecture  ·  AWS ║
╚═══════════════════════════════════════════════════════╝

[Badges: LinkedIn · LeetCode · AWS · Visitor count]


$ whoami

Backend engineer with 3+ years building production systems across Amazon, Pine Labs, and Niyo — distributed architectures, event-driven pipelines, cross-region migrations, payment infrastructure, and real-time notification systems at scale.

I design APIs, own systems end-to-end, mentor engineers, and ship things that don't break at 3AM.

Engineer engineer = Engineer.builder()
    .experience(List.of("Amazon", "Pine Labs", "Niyo"))
    .focus("Distributed Systems · Event-Driven Architecture · Cloud Infrastructure")
    .languages(List.of("Java", "Go", "TypeScript", "Python"))
    .cloud("AWS — ECS, Lambda, SQS/SNS, DynamoDB, OpenSearch, CDK")
    .ai(List.of("LangChain", "RAG pipelines", "LLM APIs", "Vector DBs"))
    .certifications("AWS Certified")
    .dsaSolved(500)
    .build();

$ cat production_impact.log

Real work. Real scale. No toy projects.


🌍 Cross-Region Migration — 1M+ Customers, Zero Downtime Amazon

Owned end-to-end migration of a stateful ECS service from eu-west-1 → eu-south-2 as part of Amazon's Regional Flex Planning initiative. No prior playbook existed for this in the org — I wrote it.

  • Designed the migration architecture from scratch: phased approach — per-tenant S3 cross-region replication + OpenSearch snapshot-restore, followed by gradual WebLab traffic ramp (5% → 25% → 50% → 100%), followed by formal decommission via change management
  • Solved the stateful data problem: validated per-tenant S3 object counts and OpenSearch document counts before touching a single traffic percentage — data parity was the go/no-go gate, not a timer
  • Separated "zero traffic" from "delete resources" by a full 7-day monitoring window — decommission is irreversible; I treated it accordingly
  • Owned the decommission sequence: endpoint removal → compute deletion → data deletion — order matters because running compute without data causes cascading failures
  • Coordinated multiple client teams across Alpha and Prod accounts; wrote the runbook so the next engineer doesn't start from zero

Zero downtime · Zero data loss · 1M+ customers · Full org-level playbook created
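A minimal sketch of the data-parity gate and traffic ramp described above. All names and the ramp steps are illustrative, not the actual internal tooling; the point is that parity, not a timer, unlocks the next step.

```java
import java.util.Map;

// Hypothetical go/no-go gate: the traffic ramp advances only when every
// tenant's object/document count matches between source and destination.
public class ParityGate {
    // True only if both regions report identical per-tenant counts.
    public static boolean parityReached(Map<String, Long> sourceCounts,
                                        Map<String, Long> destCounts) {
        if (!sourceCounts.keySet().equals(destCounts.keySet())) return false;
        return sourceCounts.entrySet().stream()
                .allMatch(e -> e.getValue().equals(destCounts.get(e.getKey())));
    }

    // Next step in the 5% -> 25% -> 50% -> 100% ramp; -1 when fully ramped.
    public static int nextRampStep(int currentPercent) {
        int[] ramp = {5, 25, 50, 100};
        for (int step : ramp) {
            if (step > currentPercent) return step;
        }
        return -1;
    }
}
```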


🔐 GDPR Deletion Pipeline — Compliance at Scale Amazon

Designed and built a compliance-grade, event-driven customer data deletion system across 3+ microservices. The correctness bar here is non-negotiable — a missed deletion is a compliance violation.

SNS (deletion event) → SQS (buffered queue) → Lambda → DynamoDB tables
                                  ↓                  → External state API
                                 DLQ (failure capture + alarm)
  • Idempotency by design: deletion state tracked in a DynamoDB tracking table keyed by customerId + requestId — SQS at-least-once delivery means duplicate deliveries will happen in practice; the system processes them safely as no-ops
  • Solved the distributed transaction gap: DynamoDB delete succeeds but external API call fails → message returns to queue → DynamoDB delete is a no-op on retry → external API gets called again. Chose data-deletion-first ordering deliberately: missing a state update is recoverable; data existing after a deletion request is a compliance violation
  • DLQ + CloudWatch alarm ensures no deletion silently fails — every failure is captured, alerted, and replayable after root cause fix
  • Chose Lambda over persistent ECS: deletion is bursty and infrequent — Lambda scales to zero and only costs on execution

GDPR right-to-erasure · 100K+ customers · At-least-once safe · Zero silent failures
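The idempotency guard and deletion-first ordering above can be sketched as follows. The tracking "table" is an in-memory set here purely for illustration; in the real system it is a DynamoDB table, and the delete targets real data stores.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: at-least-once delivery means the same deletion
// message can arrive twice, so replays must be safe no-ops. Data is
// deleted before the completion marker is written: a lost marker is
// recoverable, data surviving a deletion request is not.
public class DeletionHandler {
    private final Set<String> completedDeletes = new HashSet<>();
    public int dataDeleteCalls = 0;

    // Returns true if this invocation actually deleted data,
    // false if it was a duplicate delivery and was skipped.
    public boolean handle(String customerId, String requestId) {
        String key = customerId + "#" + requestId;
        if (completedDeletes.contains(key)) {
            return false; // duplicate delivery: no-op
        }
        deleteCustomerData(customerId); // data-deletion-first ordering
        completedDeletes.add(key);      // mark complete only after delete
        return true;
    }

    private void deleteCustomerData(String customerId) {
        dataDeleteCalls++; // stand-in for the real DynamoDB deletes
    }
}
```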


📊 Real-Time CDC Pipeline — DynamoDB → Columnar Analytics Amazon

Built a multi-region change data capture pipeline to make operational DynamoDB data queryable for analytics without impacting production read capacity.

  • Designed schema transformation layer: DDB's typed JSON format ({S: "val"}, {BOOL: true}) → flat TSV rows compatible with the columnar store — column order and type mapping owned entirely by me
  • Solved the backfill problem: DDB Streams only captures future writes. For 1.3M existing records (1.7GB), used a temp-table strategy — copy prod data to a temp DDB table with streams enabled, run a parallel DataCraft pipeline into the same Andes destination, then activate the prod stream pipeline. Live writes and backfill converge safely because stream events carry timestamps
  • Debugged a production row count mismatch (expected 1255, found 1254) — ruled out data loss by querying Andes directly, identified root cause as a manifest generation bug in pipelines created before a certain DataCraft version flag became default. Fix: recreate pipeline with flag first, then recreate Datashare — order matters because recreating Datashare before fixing the pipeline reads from the same broken manifest
  • Deployed across EU, NA, and FE regions with consistent schema

Multi-region · 1.3M records backfilled · Production bug debugged and fixed · Downstream Redshift unblocked
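The schema transformation step above, flattening DynamoDB's typed attribute format into a TSV row, can be sketched like this. The column list and unwrap logic are illustrative; real DynamoDB items carry more types (N, L, M, ...) and need per-type handling.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical transformation: DynamoDB's typed attribute format
// ({"S": "val"}, {"BOOL": true}) flattened to one TSV row, with
// column order fixed by the caller to match the columnar store schema.
public class DdbToTsv {
    public static String toTsvRow(List<String> columns,
                                  Map<String, Map<String, Object>> item) {
        return columns.stream()
                .map(col -> unwrap(item.get(col)))
                .collect(Collectors.joining("\t"));
    }

    // Pull the single value out of the typed wrapper; empty if absent.
    private static String unwrap(Map<String, Object> typed) {
        if (typed == null || typed.isEmpty()) return "";
        return String.valueOf(typed.values().iterator().next());
    }
}
```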


⚡ Sync → Async Re-architecture — 50% Latency Reduction Amazon

Converted synchronous IT workflows that were timing out under peak load into an event-driven async model.

  • Identified root cause: synchronous call chains where upstream service waited on downstream completion — under load, downstream slowness caused cascading timeouts across the entire chain
  • Redesigned to publish-and-forget: services emit events to SQS, downstream consumers process at their own pace with idempotency guards for at-least-once delivery
  • Added DLQ + visibility timeout tuning to prevent message loss under sustained load spikes
  • Execution time: 50 minutes → 25 minutes. Production timeouts: eliminated.
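The publish-and-forget shape above can be sketched with an in-memory queue standing in for SQS (names are illustrative): the upstream path enqueues and returns immediately, and the consumer drains at its own pace instead of the caller blocking on downstream completion.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: the queue decouples upstream latency from
// downstream processing speed. A real consumer would also carry an
// idempotency guard, since SQS delivery is at-least-once.
public class AsyncWorkflow {
    private final Queue<String> queue = new ArrayDeque<>();
    public int processed = 0;

    // Upstream path: constant time, never blocks on downstream latency.
    public void publish(String event) {
        queue.add(event);
    }

    // Downstream consumer drains the backlog at its own pace.
    public void drain() {
        while (queue.poll() != null) {
            processed++;
        }
    }
}
```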

💳 Payment Gateway API Design — 5 Gateways Pine Labs

Led API design and technical ownership for Pine Labs' multi-gateway payment integration layer, with junior engineers implementing individual gateway integrations under my design.

  • Designed the unified gateway abstraction: a single internal API contract that normalized heterogeneous external gateway interfaces — each gateway had different auth schemes, retry semantics, error codes, and idempotency models. Abstraction layer hid all of this from callers
  • Owned the resilience contract: defined how retries, timeouts, and idempotency keys worked at the abstraction layer — individual gateway implementations had to conform, not invent their own retry logic
  • Led code reviews with a specific focus on failure modes: "what happens if this gateway returns a 200 but the transaction is actually pending?", "how does this handle a network timeout mid-request?" — taught juniors to think in failure paths, not happy paths
  • Drove reconciliation flow design for failed or ambiguous transactions — financial systems need a recovery path, not just error logging

5 gateways integrated · Unified abstraction owned · Junior engineers mentored on production-grade error handling
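The shape of that abstraction can be sketched as below. The interface, statuses, and retry policy are illustrative, not Pine Labs' actual contract; the key design point survives: retries and idempotency keys live in the abstraction layer, and adapters must normalize ambiguous responses (a 200 with an unsettled transaction maps to PENDING, not SUCCESS).

```java
// Hypothetical unified gateway contract: each external gateway gets one
// adapter implementing this interface, and resilience policy lives here
// once instead of being reinvented per gateway.
public class GatewayLayer {
    public enum Status { SUCCESS, PENDING, FAILED }

    public interface PaymentGateway {
        // Adapters normalize heterogeneous external responses to this enum.
        Status charge(String idempotencyKey, long amountMinor);
    }

    // Retries use the same idempotency key on every attempt, so a retry
    // after an ambiguous failure cannot double-charge.
    public static Status chargeWithRetry(PaymentGateway gw, String key,
                                         long amountMinor, int maxAttempts) {
        Status last = Status.FAILED;
        for (int i = 0; i < maxAttempts; i++) {
            last = gw.charge(key, amountMinor);
            if (last != Status.FAILED) return last;
        }
        return last;
    }
}
```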


📡 Real-Time Notification Pipeline — Kafka Niyo

Built a Kafka-based real-time notification pipeline delivering push, SMS, and email alerts to users on transaction events — replacing a batch-based approach that introduced unacceptable delivery delays.

Transaction Event → Kafka Topic → Notification Consumer Service
                                       → Push (FCM/APNs)
                                       → SMS (provider)
                                       → Email (provider)
  • Designed consumer group configuration for fault-tolerant, ordered processing — partition assignment ensured per-user event ordering was preserved across notification channels
  • Handled the fan-out routing problem: a single Kafka message needed to trigger multiple notification channels based on user preferences and event type — built a routing layer inside the consumer that dispatched to the right provider without duplicating event consumption
  • Implemented offset commit strategy carefully: committed offsets only after all notification dispatches succeeded — a failed SMS dispatch would not silently drop the message, it would retry from the last committed offset
  • Reduced notification delivery latency from batch-cycle delays to near real-time

Kafka · Multi-channel fan-out · Ordered delivery · Fault-tolerant offset management
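The commit-after-dispatch rule above can be sketched like this (names are illustrative, and the real consumer uses the Kafka client's manual commit API): the offset advances only when every channel dispatch succeeds, so a failed SMS send leaves the offset in place and the record is re-read on the next poll.

```java
import java.util.List;

// Hypothetical sketch of fan-out with manual offset management: a
// partial failure never silently drops the message.
public class NotificationConsumer {
    public interface Channel {
        boolean dispatch(String message); // false = delivery failed
    }

    private long committedOffset = -1;

    public long committedOffset() { return committedOffset; }

    // Process one record; commit only on full fan-out success.
    public boolean process(long offset, String message, List<Channel> channels) {
        for (Channel ch : channels) {
            if (!ch.dispatch(message)) {
                return false; // offset not committed; record will be retried
            }
        }
        committedOffset = offset;
        return true;
    }
}
```

The trade-off is at-least-once delivery to each channel: a crash between a push dispatch and the commit re-sends the push, which is acceptable for notifications where a silent drop is worse than a duplicate.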


📧 Scheduled Report Automation — EventBridge → Lambda → SES Amazon

Built a scheduled reporting pipeline from scratch because it was toil that should not exist. Nobody asked me to — I identified it and eliminated it.

  • EventBridge cron → Lambda → DynamoDB scan → in-memory CSV generation (not /tmp — Lambda's /tmp can persist across warm invocations, so files left behind need explicit cleanup; building in memory is faster and leaves nothing behind) → S3 archive + SES email delivery
  • Made the CDK stack generic: accepts a LambdaConfig interface so any future scheduled report reuses the same construct — no copy-paste infrastructure
  • When the stack was accidentally deleted during the ZAZ migration decommission, I rebuilt it and encoded 4 production lessons directly into CDK: RemovalPolicy.RETAIN on S3 (survives stack deletion), in-memory CSV, generic stack, CloudWatch error alarm (original had zero observability — failures were invisible until someone noticed a missing email)

Toil eliminated · Infrastructure made deletion-proof · Observability added · CDK construct reusable
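The in-memory CSV step above amounts to something like this sketch (column layout is illustrative): rows are assembled into a single String and handed straight to the S3 and SES clients, with no filesystem involved.

```java
import java.util.List;

// Hypothetical sketch: build the whole report in memory. Fine for
// report-sized data; a multi-GB export would need streaming instead.
public class ReportCsv {
    public static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", header)).append('\n');
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }
}
```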


$ cat ai_in_practice.md

AI isn't a line on my resume — it's part of how I build and how I work.

Building with AI — flat/flatmates (side project, working prototype):

Stack: Next.js · Java Spring Boot · LLM API

Architecture:
├── Preference intake → structured user profile (Spring Boot)
├── Compatibility scoring via LLM API — prompt engineered for
│   deterministic structured output (JSON), not free-form text
├── RAG layer: user-generated descriptions embedded + retrieved
│   at match time to give LLM relevant context per query
├── Cold-start strategy: new users with no history get rule-based
│   scoring until enough signal exists to switch to LLM scoring
└── Cost control: LLM called only at match-time, not on every
    profile update — cached embeddings, selective inference
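The cold-start routing above can be sketched as a threshold check (the threshold, score weights, and names here are invented for illustration): thin-signal users get deterministic rule-based scoring, and only established users are routed to the LLM scorer.

```java
// Hypothetical sketch of the cold-start strategy for the matcher.
public class MatchScorer {
    static final int MIN_SIGNAL = 5; // illustrative signal threshold

    // Route by how much history the user has accumulated.
    public static String scoringPath(int interactionCount) {
        return interactionCount < MIN_SIGNAL ? "rules" : "llm";
    }

    // Rule-based fallback: a simple weighted overlap of preference flags.
    public static int ruleScore(boolean sameCity, boolean budgetMatch,
                                boolean scheduleMatch) {
        return (sameCity ? 50 : 0) + (budgetMatch ? 30 : 0)
                + (scheduleMatch ? 20 : 0);
    }
}
```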

Using AI in daily engineering:

Tools:      Cursor · GitHub Copilot
What for:   Boilerplate elimination · Test case generation
            Debugging hypothesis generation · PR description drafting
What not:   Architecture decisions · Production incident RCA
            Anything where I need to own the reasoning

Actively studying: LangChain internals · Pinecone / Weaviate · RAG vs fine-tuning decision framework


$ tech --stack

┌─────────────────┬──────────────────────────────────────────────────┐
│ Languages       │ Java · Go · TypeScript · Python · C++            │
│ Cloud (AWS)     │ ECS · Lambda · SQS · SNS · DynamoDB · S3         │
│                 │ OpenSearch · EventBridge · CDK · CloudWatch      │
│ Messaging       │ Kafka · SNS/SQS · Event-driven architecture      │
│ Databases       │ DynamoDB · MongoDB · MySQL · Redis               │
│ Frameworks      │ Spring Boot · Next.js · React                    │
│ Infrastructure  │ Docker · AWS CDK · CloudFormation                │
│ Observability   │ CloudWatch · Log Insights · Alarms · DLQ         │
│ AI/ML           │ LangChain · LLM APIs · RAG · Vector embeddings   │
│ Testing         │ Cypress · Parallel sharding · Integration tests  │
└─────────────────┴──────────────────────────────────────────────────┘

$ cat engineering_principles.txt

// Principles I've developed from shipping real systems

1.  Separate data migration from traffic cutover — never do both simultaneously
2.  Decommission is irreversible; treat it differently from deployment
3.  Idempotency is not optional when your delivery guarantee is at-least-once
4.  A metadata bug is not data corruption — identify which one before alerting anyone
5.  Design for the failure path first; the happy path usually works
6.  Payment systems fail in creative ways — build reconciliation in, not as an afterthought
7.  When forced to rebuild, encode what production taught you directly into the infrastructure
8.  Every technical decision should have a stated "what breaks and when"
9.  Teach engineers to think in failure modes, not just correct behavior
10. If something is toil, eliminate it — don't document a workaround

$ git log --stat

[GitHub Stats · Top Languages · Streak widgets]


$ reach --out

$ curl -X GET https://linkedin.com/in/guptaaayushi09
$ curl -X GET https://leetcode.com/code_buddy21
$ echo "aayushi09023@gmail.com"
Built systems for millions of users · Still writing code like it matters · Because it does
