Skip to content
View Shehrozkashif's full-sized avatar

Highlights

  • Pro

Block or report Shehrozkashif

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
Shehrozkashif/README.md

Shehroz Kashif

AI Infrastructure Engineer · LLM Systems · GPU-Accelerated ML · RISC-V

LinkedIn GitHub Email Location


About Me

I build production-grade AI systems — from GPU-accelerated LLM inference backends and RAG pipelines to privacy-first hallucination mitigation frameworks and agentic desktop automation.

What sets me apart is depth across the full hardware-to-software stack: I've designed pipelined RISC-V processors in Chisel, optimized CUDA inference for quantized LLMs, and shipped open-source contributions into Linux Foundation and Intel repos — all while pursuing my BSc in Software Engineering.

Currently:

  • AI Infrastructure Engineer (Intern) @ Skoop — GPU-accelerated RAG backends, agentic AI systems
  • Research — Publishing on GAN-based hallucination mitigation for private LLMs

Current Focus

Area What I'm Working On
LLM Infra GPU inference backends — quantization, batching, FAISS RAG
Research Hallucination mitigation paper (AI4Org, GAN + REINFORCE RL)
Open Source Intel NNCF/OpenVINO — LoRA correction, PyTorch frontend fixes

Experience

AI Infrastructure Engineer (Intern)Skoop, Walnut Creek CA (Remote) Apr 2026 – Present

  • Deploying GPU-accelerated LLM inference backends via REST APIs
  • Building production RAG systems: ingestion → chunking → FAISS embedding → retrieval
  • Developing multi-step agentic AI systems over containerized backend services

AI EngineerThe Oval Labs, Karachi (Remote) Jan 2026 – Apr 2026

  • Built Verimate — an automated UVM verification platform for chip designs
  • Deployed INT4/INT8-quantized LLMs on CUDA clusters via local GPU servers
  • Architected end-to-end RAG pipelines with FAISS vector retrieval + LangChain + Gemini API
  • Reduced per-query latency via quantization, batching, and memory optimization

Software Engineer — LFX MentorshipRISC-V International / Linux Foundation (Remote) Mar 2025 – Jun 2025

  • Contributed to riscv-unified-db — the canonical machine-readable RISC-V ISA specification (167 stars)
  • Landed 6 merged PRs: new ISA extensions (Zilsd, Zclsd, Zcmop), CI test coverage, schema fixes
  • Mentored by engineers from Qualcomm, Ventana, and Synopsys
  • Automated formal specification workflows (Python, Ruby, Bash, YAML, GitHub Actions)

Research Assistant — AI & HardwareMERL (Micro Electronics Research Lab), Karachi Jun 2023 – Dec 2025

  • Led AI4Org: GAN-based hallucination mitigation for private LLMs — multi-discriminator pipeline + REINFORCE RL, fine-tuned LLaMA2/Mistral 7B on dual AMD RX 7900 XTX
  • Led ArcheV: LLM benchmark suite for RISC-V RV32I code — functional, syntactic, and edge-case evaluation via llama.cpp
  • Built Vermithor: 5-stage pipelined RV32I processor in Chisel (Scala) with hazard detection, forwarding, and Verilator verification
  • Managed multi-GPU training infrastructure (RTX 2060 + dual AMD RX 7900 XTX)

Open Source — 9+ Merged PRs

Repository Organization Contributions
riscv-unified-db Linux Foundation / RISC-V 6 merged PRs — ISA extensions, CI coverage, schema fixes
openvinotoolkit/nncf Intel 1 merged PR — stateful OpenVINO models in LLM compression
jonescompneurolab/hnn-core Harvard / MIT 1 merged PR — docstring improvements
merledu/ai4org MERL Lab 1 merged PR — GPU hallucination reduction pipeline

Under review: OpenVINO PR #33803 (PyTorch frontend fix), NNCF PR #3864 (LoRA transpose_a support)


Featured Projects

AI4Org — GAN-based Hallucination Mitigation for Private LLMs

github.com/merledu/ai4org

  • Privacy-first GAN framework: generator fine-tunes LLM responses, discriminator detects hallucinations
  • Multi-discriminator pipeline with REINFORCE RL optimization loop
  • Fine-tuned TinyLlama, GPT-2, LLaMA2, Mistral 7B using PEFT/LoRA/SFT
  • FAISS retrieval integration, automated Q&A data generation, INT4 quantization
  • Trained on RTX 2060 + dual AMD RX 7900 XTX with gradient checkpointing
  • Production-grade: CI/CD, logging, automated testing, design reviews
  • Planning research paper publication

Stack: PyTorch · Transformers · PEFT · LoRA · FAISS · CUDA · GANs · REINFORCE RL

ArcheV — LLM Benchmark Suite for RISC-V RV32I

github.com/merledu/ArcheV

  • Standardized evaluation framework for LLM-generated RISC-V assembly code
  • Evaluates functional correctness, syntactic validity, and ISA edge-case handling
  • llama.cpp local inference with structured prompt engineering pipelines
  • Reproducible JSON I/O, Verilog simulation, published and presented at lab level

Stack: Python · Verilog · llama.cpp · JSON

Vermithor — 5-Stage Pipelined RISC-V Processor
  • Full RV32I single-cycle and 5-stage pipelined implementations in Chisel (Scala)
  • Hazard detection, data forwarding, stall generation
  • Verified via Verilator simulation, standards-compliant and fully documented

Stack: Chisel · Scala · Verilator · RISC-V RV32I


Tech Stack

AI / LLM Systems RAG FAISS LLM Fine-tuning PEFT / LoRA / SFT GANs Hallucination Mitigation Prompt Engineering Agentic AI MCP A2A Protocol LangChain llama.cpp OpenVINO NNCF Gemini API

GPU & ML Infrastructure PyTorch TensorFlow Hugging Face Transformers CUDA INT4/INT8 Quantization Mixed Precision Gradient Checkpointing VRAM Optimization Multi-GPU Training OVMS

Languages Python Scala (Chisel) C++ Ruby Java JavaScript Verilog Shell/Bash YAML

Hardware & EDA RISC-V ISA (RV32I) Chisel HDL Verilator UVM Verification HPC Architecture

DevOps & Infrastructure Docker Linux GitHub Actions CI/CD REST APIs pytest Git


Education & Certifications

BSc Software Engineering — UIT University, Karachi · Feb 2022 – Feb 2026

Certification Issuer
Machine Learning with Python IBM
Deep Learning & Neural Networks with Keras IBM
Scalable Java Microservices with Spring Boot & Spring Cloud Google

GitHub Stats


Connect


Open to collaborations in production AI systems, LLM infrastructure, GPU optimization, and open-source tooling.

Pinned Loading

  1. riscv/riscv-unified-db riscv/riscv-unified-db Public

    Monorepo containing a machine-readable database of the RISC-V specification and artifact generation tools

    Ruby 169 136

  2. Vermithor Vermithor Public

    RISCV RV-32I 5 Stage Pipelined Processor

    Scala

  3. merledu/ArcheV merledu/ArcheV Public

    RISC-V RV-32i RTL Benchmark for evaluating Large Language Models.

    Verilog 3

  4. merledu/ai4org merledu/ai4org Public

    Hallucination reduction framework for LLMs using RAG, multi-discriminator RL, and automated data pipelines.

    Python 1 5