CSE 585: Advanced Scalable Systems for Agentic AI (W'26)

Administrivia

  • Catalog Number: 35120
  • Lectures/Discussion: 1200 EECS, TTh: 10:30 AM – 12:00 PM
  • Projects/Makeup: 1014 DOW, F 1:30 PM – 2:30 PM
  • Counts as: Software Breadth and Depth (PhD); Technical Elective and 500-Level (MS/E)

Team

Member (uniqname) Role Office Hours
Mosharaf Chowdhury (mosharaf) Faculty 4156 LEIN. By appointment only.
Shiqi He (shiqihe) GSI 1637 BBB. Wed 4:00 PM – 6:00 PM.

Communication

ALL communication regarding this course must be via Ed. This includes questions, discussions, announcements, as well as private messages.

Presentation slides and paper summaries should be emailed to cse585-staff@umich.edu.

Course Description

This iteration of CSE585 will introduce you to the key concepts and the state of the art in practical, scalable, and fault-tolerant systems for Agentic and Generative AI, and encourage you to think about building new tools or applying existing ones.

Since datacenters and cloud computing form the backbone of modern computing, we will start with an overview of the two. We will then take a deep dive into systems for the Agentic and Generative AI landscape, focusing on different types of problems. Our topics will include: the basics of generative models and agentic AI from a systems perspective; systems for the AI lifecycle, including pre-training, post-training, and inference serving; systems for agentic AI; etc. We will primarily cover papers from top conferences that take a systems view of the relevant challenges.

Note that this course is NOT focused on AI methods. Instead, we will focus on how one can build systems so that existing AI methods can be used in practice and new AI methods can emerge.

Prerequisites

Students are expected to have strong programming skills and must have taken at least one undergraduate-level systems course (operating systems/EECS 482, databases/EECS 484, distributed systems/EECS 491, or networking/EECS 489). An undergraduate ML/AI course may be helpful but is not required.

Textbook

This course has no textbooks. We will read recent papers from top venues to understand trends in scalable GenAI and agentic systems, and their applications.

Tentative Schedule and Reading List

This is an evolving list and subject to changes due to the breakneck pace of agentic and generative AI innovations.

Date Readings Presenter Summary Reviewer
Jan 8 No Lecture: Find Project Groups
How to Read a Paper (Required)
How to Give a Bad Talk (Required)
Jan 13 Introduction Mosharaf
Hints and Principles for Computer System Design (Required)
The Datacenter as a Computer (Chapters 1 and 2)
Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput (Required)
Jan 15 Systems for AI Basics Shiqi
The Illustrated Transformer (Required)
Deep Dive into LLMs like ChatGPT
OpenHands: An Open Platform for AI Software Developers as Generalist Agents (Required)
Jan 20 Distributed Training Basics Shiqi
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (Required)
Jan 22 No Lecture: Work on Project Proposals
Writing Reviews for Systems Conferences (Required)
Worse is Better (Required)
Pre-Training
Jan 27 WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training (Required) Rohan, Archit, Divya, Maaz Joshua, Rishith, Olaf, Jimmy Anika, Joshua, Namita, Nandana
Zero Bubble (Almost) Pipeline Parallelism (Required)
Jan 29 Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (Required) Adam, Qilong, Yung-Hao, Zhe Yiqun, Yicheng, Yihang, Xiangchen Evan, Frank, Madeleine, Alan
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models (Required)
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Feb 3 TrainVerify: Equivalence-Based Verification for Distributed LLM Training (Required) Matthew, Madison, Minkyu, Kevin Ajay, Allison, Jamal, Tejas Shivam, Aman, Leonard, Dimash
SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates (Required)
Post-Training
Feb 5 HybridFlow: A Flexible and Efficient RLHF Framework Joshua, Rishith, Olaf, Jimmy Rohan, Archit, Divya, Maaz Yiqun, Yicheng, Yihang, Xiangchen
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning (Required)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Required)
Inference
Feb 10 Inference Basics Shiqi
Orca: A Distributed Serving System for Transformer-Based Generative Models (Required)
Efficient Memory Management for Large Language Model Serving with PagedAttention (Required)
On Evaluating Performance of LLM Inference Serving Systems
Feb 12 DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving (Required) Anika, Joshua, Namita, Nandana Jingjing, Yile, Zhengqing, Barry Shruti, Srikrishnan, Nikhil, Pranav
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve (Required)
Feb 17 No Lecture: Work on Projects
Feb 19 NanoFlow: Towards Optimal Large Language Model Serving Throughput (Required) Evan, Frank, Madeleine, Alan Matthew, Madison, Minkyu, Kevin Tea, Nidhil, Dillan
Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot (Required)
Feb 24 LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism (Required) Shivam, Aman, Leonard, Dimash Kidus, Blake, Torence, Ethan Ajay, Allison, Jamal, Tejas
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism (Required)
Feb 26 Cornserve: Efficiently Serving Any-to-Any Multimodal Models (Required) Jingjing, Yile, Zhengqing, Barry Anika, Joshua, Namita, Nandana Vansh, Pranav, Anshul, Shrey
TetriServe: Efficient DiT Serving for Heterogeneous Image Generation
Approximate Caching for Efficiently Serving Diffusion Models (Required)
Mar 10 No Lecture: Work on Presentations
Mar 12 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding (Required) Kidus, Blake, Torence, Ethan Shruti, Srikrishnan, Nikhil, Pranav Adam, Qilong, Yung-Hao, Zhe
ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation (Required)
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
Mar 17 Mid-Semester Presentations
Mar 19 Mid-Semester Presentations
Agentic AI Systems
Mar 24 Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (Required) Ajay, Allison, Jamal, Tejas Tea, Nidhil, Dillan Joshua, Rishith, Olaf, Jimmy
Pie: A Programmable Serving System for Emerging LLM Applications (Required)
Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
Mar 26 Towards End-to-End Optimization of LLM-based Applications with Ayo (Required) Yiqun, Yicheng, Yihang, Xiangchen Shivam, Aman, Leonard, Dimash Matthew, Madison, Minkyu, Kevin
AVA: Towards Agentic Video Analytics with Vision Language Models (Required)
Mar 31 METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation (Required) Tea, Nidhil, Dillan Evan, Frank, Madeleine, Alan Jingjing, Yile, Zhengqing, Barry
HedraRAG: Co-Optimizing Generation and Retrieval for Heterogeneous RAG Workflows (Required)
Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining
Hardware / Infrastructure
Apr 2 WaferLLM: Large Language Model Inference at Wafer Scale (Required) Vansh, Pranav, Anshul, Shrey Marie, Emily Kidus, Blake, Torence, Ethan
Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework (Required)
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Power and Energy Management
Apr 7 Reducing Energy Bloat in Large Model Training Marie, Emily Vansh, Pranav, Anshul, Shrey Rohan, Archit, Divya, Maaz
Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training (Required)
TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms (Required)
Apr 9 Power Stabilization for AI Training Datacenters (Required) Shruti, Srikrishnan, Nikhil, Pranav Adam, Qilong, Yung-Hao, Zhe Marie, Emily
AI Training Load Fluctuations at Gigawatt-scale – Risk of Power Grid Blackout? (Required)
Wrap Up
Apr 14 On the Dangers of Stochastic Parrots: Can Language Models be too Big?🦜 (Required) Mosharaf
We Need a New Ethics for a World of AI Agents (Required)
Apr 16 No Lecture: Work on Posters
Creating an Effective Poster (Required)
How to Write a Great Research Paper (Required)
Apr 21 Final Poster Presentations
(10 AM – 12 PM, Tishman Hall)
Template

Policies

Honor Code

The Engineering Honor Code applies to all activities related to this course.

Groups

All activities of this course will be performed in groups of 4 students.

Required Reading

Each lecture will have two required readings that everyone must read.
There will also be one or more related optional readings with which only the presenter(s) should be familiar. They are optional for the rest of the class.

Student Lectures

The course will be conducted as a seminar. Only one group will present in each class. Each group will be assigned at least one lecture over the course of the semester. Presentations should succinctly cover all required papers for that lecture. The presentation should last at most 35 minutes, including short clarifying questions and interruptions. The rest of the lecture time will be dedicated to discussion of the papers and the broader topic(s) they cover.

In the presentation, you should:

  • Provide necessary background and motivate the problem.
  • Present the high-level idea, approach, and/or insight (using examples whenever appropriate) in the required readings as well as the additional readings.
  • Discuss technical details so that one can understand the key points without carefully reading the papers.
  • Explain the differences between related works.
  • Identify strengths and weaknesses of the required reading and propose directions of future research.

The slides for a presentation must be emailed to the instructor team at least 24 hours before the corresponding class. Use Google Slides to enable in-line comments and suggestions.

Lecture Summaries

Each group will also be assigned to write summaries for at least one lecture. The lecture assigned to a group for a summary will not be the one they presented. The group will write a summary covering all presented papers (required readings) for that lecture.

A paper summary must address the following five questions in sufficient detail (2-3 pages):

  • What is the problem addressed in the lecture, and why is this problem important?
  • What is the state of related works in this topic?
  • What is the proposed solution, and what key insight guides their solution?
  • What is one (or more) drawback or limitation of the proposal?
  • What are potential directions for future research?

The summary for a lecture must be emailed to the instructor team within 24 hours after its presentation. Late summaries will not be counted. You should use this format for writing your summary. Use Google Docs to enable in-line comments and suggestions.

Allocate enough time for your reading, discuss as a group, write the summary carefully, and finally, include key observations from the class discussion.

Post-Presentation Panel Discussion

To foster a deeper understanding of the papers and encourage critical thinking, each lecture will be followed by a panel discussion. This discussion will involve three distinct roles played by different student groups, simulating an interactive and dynamic scholarly exchange.

Roles and Responsibilities

  1. The Authors
  • Group Assignment: The group that presents the paper and the group that writes the summary will play the role of the paper's authors.
  • Responsibility: As authors, you are expected to defend your paper against critiques, answer questions, and discuss how you might improve or extend your research in the future, akin to writing a rebuttal during the peer-review process.
  2. The Reviewers
  • Group Assignment: Each group will be assigned to one slot to play the role of reviewers for all presented papers (required readings) of that lecture.
  • Responsibility: Reviewers critically assess the paper, posing challenging questions and highlighting potential weaknesses or areas for further investigation. Your goal is to engage in a constructive critique of the paper, simulating a peer review scenario.
  3. Rest of the Class
  • Responsibility:
    • You are required to submit one insightful question for each presented paper before each class.
    • During the panel discussions, feel free to actively ask questions and engage in the dialogue.

Participation

Given the discussion-based nature of this course, participation is required both for your own understanding and to improve the overall quality of the course. You are expected to attend all lectures (you may skip up to 2 lectures for legitimate reasons) and, more importantly, participate in class discussions. There will be random events to gauge attendance.

A key part of participation will be in the form of discussions on Ed. The group in charge of the summary should initiate the discussion, and the rest should participate. Not everyone must add something every day, but everyone is expected to have something to say over the semester.

Project

You will have to complete substantive work on an instructor-approved problem and make an original contribution. Surveys are not permitted as projects; instead, each project must contain a survey of background and related work.

You must meet the following milestones (unless otherwise specified in future announcements) to ensure a high-quality project at the end of the semester:

  • Form a group and declare your group's membership and paper preferences by January 23. After this date, we will form groups from the remaining students.
  • Turn in a 2-page draft proposal (including references) by February 6. Remember to include the names and Michigan email addresses of the group members. You may submit the project proposal here.
  • Each group must present mid-semester progress during class hours on March 17 and March 19.
  • Each group must turn in an 8-page final report and your code via email on or before 1:00 PM EST on April 28. The report must be submitted as a PDF file, with formatting similar to that of the papers you've read in the class. It should point to a git repository with all the code, along with a README file with a step-by-step guide on how to compile and run the code.
  • You can find how to access GPU resources here.

Tentative Grading

Component Weight
Paper Presentation 15%
Paper Summary 15%
Participation 10%
Project Report 40%
Project Presentations 20%
