[v2] Branching simulation modes & execution-time entry points #382

@vaseline555

Description

Describe the Request

APPFL currently provides run_* entry points under /src/appfl (MPI/gRPC/Globus).
run_serial.py still exists but is slated for deprecation, although it seems useful for laptop-scale simulation experiments.

  1. branching simulation modes
  • I suggest reviving run_serial.py with four internal modes:
    • run_serial: running on CPU/GPU in serial manner (for-loop style)
    • run_gloo: running in parallel on CPUs (torch.distributed.init_process_group(backend='gloo'))
    • run_nccl: running in parallel on multiple GPUs (torch.distributed.init_process_group(backend='nccl'))
    • run_mpi: running in parallel on multiple Intel GPUs (i.e., XPU)
  • gloo can be regarded as an alternative to MPI, but it is a thread-safe, PyTorch-native backend and thus has no dependency on mpiexec.
  • (draft example):
    def run_distributed(config, backend: str) -> None:
        ...  # dispatch on backend: serial / gloo / nccl / mpi
  2. branching entry points
  • appfl
    • run
      • mode=real
        • comm=grpc
        • comm=globus
        • comm=ray
      • mode=sim
        • backend=serial
        • backend=gloo
        • backend=nccl
        • backend=mpi
    • commit
      • interface=chat // vibe-coding interface to auto-generate the essential files for an algorithm implementation
      • interface=manual // sanity-check interface to confirm the essential modules users coded manually (e.g., aggregator, scheduler, trainer)
  • This is currently a simple suggestion; it needs further discussion if the direction is acceptable.
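The four simulation modes could share one dispatcher shaped like the draft `run_distributed` signature. Below is a minimal sketch, assuming a hypothetical `config` dict with `num_clients` and `dim` keys and a toy local update standing in for APPFL's trainer/aggregator. The gloo/nccl paths use `torch.multiprocessing.spawn` and `torch.distributed.all_reduce` to average updates across simulated clients; mpi is deliberately left unimplemented because its design is still open. Unlike the draft (`-> None`), this sketch returns the averaged update in serial mode so the result can be inspected.

```python
import os
from typing import Any, Dict, List, Optional

# Proposed simulation backends; "mpi" (Intel XPU) is still a design question.
SIM_BACKENDS = ("serial", "gloo", "nccl", "mpi")


def _local_update(rank: int, dim: int) -> List[float]:
    # Hypothetical stand-in for one client's local training: client `rank`
    # returns a constant vector, so the averaged result is easy to check.
    return [float(rank)] * dim


def _run_serial(num_clients: int, dim: int) -> List[float]:
    # For-loop style: simulate each client in turn inside one process, then
    # average the updates element-wise (FedAvg with equal client weights).
    updates = [_local_update(rank, dim) for rank in range(num_clients)]
    return [sum(col) / num_clients for col in zip(*updates)]


def _worker(rank: int, world_size: int, backend: str, dim: int) -> None:
    # One process per simulated client; torch is imported lazily so that
    # serial mode does not require PyTorch at all.
    import torch
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    device = torch.device("cpu")
    if backend == "nccl":
        # NCCL expects one CUDA device per rank (num_clients <= visible GPUs).
        torch.cuda.set_device(rank)
        device = torch.device("cuda", rank)
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
    try:
        update = torch.tensor(_local_update(rank, dim), device=device)
        # Sum the updates across all ranks, then divide to get the mean.
        dist.all_reduce(update, op=dist.ReduceOp.SUM)
        update /= world_size
        if rank == 0:
            print("averaged update:", update.tolist())
    finally:
        dist.destroy_process_group()


def run_distributed(config: Dict[str, Any], backend: str) -> Optional[List[float]]:
    if backend not in SIM_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {SIM_BACKENDS}")
    num_clients = int(config.get("num_clients", 2))  # hypothetical config key
    dim = int(config.get("dim", 4))  # hypothetical config key
    if backend == "serial":
        return _run_serial(num_clients, dim)
    if backend == "mpi":
        raise NotImplementedError("run_mpi (Intel XPU) design is still open")
    import torch.multiprocessing as mp

    # spawn() passes the process index (used as the rank) as the first argument.
    mp.spawn(_worker, args=(num_clients, backend, dim), nprocs=num_clients, join=True)
    return None
```

When using gloo or nccl, call `run_distributed` under an `if __name__ == "__main__":` guard, since `spawn` re-imports the main module in each worker process.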

Sample Code

// simulation modes
appfl run mode=sim backend=serial config=...
appfl run mode=sim backend=gloo config=...
appfl run mode=sim backend=nccl config=...

// deployment modes
appfl run mode=real comm=grpc config=...
appfl run mode=real comm=globus config=...
...

// commit modes
appfl commit interface=chat config=...
appfl commit interface=manual config=...
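The `key=value` arguments in the sample commands could be parsed and validated against the proposed entry-point tree with a small stdlib-only function. This is a hypothetical sketch: `parse_command`, `RUN_MODES`, `COMMIT_INTERFACES`, and the rule that `config=` is always required are assumptions, not an existing APPFL CLI.

```python
from typing import Dict, List

# Hypothetical option tree transcribed from the proposed entry-point layout.
RUN_MODES = {
    "real": ("comm", {"grpc", "globus", "ray"}),
    "sim": ("backend", {"serial", "gloo", "nccl", "mpi"}),
}
COMMIT_INTERFACES = {"chat", "manual"}


def parse_command(argv: List[str]) -> Dict[str, str]:
    """Parse e.g. ["run", "mode=sim", "backend=gloo", "config=c.yaml"]."""
    if not argv or argv[0] not in ("run", "commit"):
        raise ValueError("expected subcommand 'run' or 'commit'")
    command = argv[0]
    opts: Dict[str, str] = {}
    for token in argv[1:]:
        key, sep, value = token.partition("=")
        if not (sep and key and value):
            raise ValueError(f"expected key=value, got {token!r}")
        opts[key] = value
    if "config" not in opts:
        raise ValueError("config=... is required")

    if command == "run":
        mode = opts.get("mode")
        if mode not in RUN_MODES:
            raise ValueError(f"mode must be one of {sorted(RUN_MODES)}")
        # Each mode has exactly one second-level key: comm (real) or backend (sim).
        sub_key, allowed = RUN_MODES[mode]
        if opts.get(sub_key) not in allowed:
            raise ValueError(f"mode={mode} needs {sub_key} in {sorted(allowed)}")
    elif opts.get("interface") not in COMMIT_INTERFACES:
        raise ValueError(f"interface must be one of {sorted(COMMIT_INTERFACES)}")
    return {"command": command, **opts}
```

For example, `appfl run mode=sim backend=gloo config=c.yaml` would arrive as `sys.argv[1:]` and parse to a flat dict that a dispatcher can branch on; invalid combinations such as `mode=real backend=gloo` are rejected before anything is launched.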

Additional Code or Information

To-Do

  • Check and test torch.distributed compatibility with Intel XPU
  • Determine run_mpi design
  • Devise and confirm entry points design pattern
  • Research existing Generative UI designs for the chat commit mode
  • Envision specific paths and plausible scenarios for sustainable maintenance of the commit mode (for both advanced and new users)

Metadata

Labels

enhancement (New feature or request)