Describe the Request
APPFL currently supports `run_*` entry points under `/src/appfl` (MPI/gRPC/Globus). While `run_serial.py` exists, it will be deprecated in the future, even though it seems useful for laptop-scale simulation experiments.
- Branching simulation modes
  - I suggest reviving `run_serial.py` with four internal modes:
    - `run_serial`: runs on CPU/GPU in a serial manner (for-loop style)
    - `run_gloo`: runs in parallel on CPUs (`torch.distributed.init_process_group(backend='gloo')`)
    - `run_nccl`: runs in parallel on multiple GPUs (`torch.distributed.init_process_group(backend='nccl')`)
    - `run_mpi`: runs in parallel on multiple Intel GPUs (i.e., XPU)
  - While `gloo` can be regarded as an alternative to `mpi`, it is a thread-safe, PyTorch-native backend and thus has no dependency on `mpiexec`.
  - Draft example:

    def run_distributed(config, backend: str) -> None:
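Building on the draft signature above, a minimal sketch of how the four modes could dispatch internally. Function and key names (`resolve_backend`, `config["clients"]`) are illustrative assumptions, not the actual APPFL API:

```python
# Sketch only: illustrative names, not the actual APPFL API.
from typing import Optional


def resolve_backend(mode: str) -> Optional[str]:
    """Map a simulation mode to a torch.distributed backend (None = serial)."""
    backends = {
        "serial": None,    # plain for-loop, no process group needed
        "gloo": "gloo",    # CPU-parallel, PyTorch-native, no mpiexec dependency
        "nccl": "nccl",    # multi-GPU (NVIDIA)
        "mpi": "mpi",      # MPI path, e.g., for Intel XPU
    }
    if mode not in backends:
        raise ValueError(f"unknown simulation mode: {mode}")
    return backends[mode]


def run_distributed(config: dict, backend: str) -> None:
    """Draft entry point: initialize a process group only for parallel modes."""
    resolved = resolve_backend(backend)
    if resolved is None:
        # run_serial: iterate over clients in a single process
        for client in config.get("clients", []):
            pass  # local training for one client goes here
    else:
        # run_gloo / run_nccl / run_mpi: one process per rank
        import torch.distributed as dist
        dist.init_process_group(backend=resolved)
        # per-rank client training goes here
```

Keeping the backend lookup in one table makes adding or deprecating a mode a one-line change.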
- Branching entry points

  appfl
    run
      mode=real
        comm=grpc
        comm=globus
        comm=ray
      mode=sim
        backend=serial
        backend=gloo
        backend=nccl
        backend=mpi
    commit
      interface=chat    // vibe-coding interface to auto-generate essential files for algorithm implementation
      interface=manual  // sanity-check interface to confirm manually coded essential modules by users (e.g., aggregator, scheduler, trainer)
- This is currently just a simple suggestion; it needs further discussion to decide whether it is acceptable.
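As one way to realize the key=value command style above, a hedged sketch of a tiny parser whose option tables mirror the entry-point tree. All names here (`VALID_OPTIONS`, `parse_cli`) are illustrative assumptions, not an existing appfl CLI:

```python
# Sketch of parsing the proposed `appfl <command> key=value ...` style.
# All names are illustrative assumptions, not the actual APPFL CLI.

VALID_OPTIONS = {
    "run": {
        "mode": {"real", "sim"},
        "comm": {"grpc", "globus", "ray"},             # deployment (mode=real)
        "backend": {"serial", "gloo", "nccl", "mpi"},  # simulation (mode=sim)
        "config": None,                                # free-form path
    },
    "commit": {
        "interface": {"chat", "manual"},
        "config": None,
    },
}


def parse_cli(argv):
    """Return (command, options) for e.g. ['run', 'mode=sim', 'backend=gloo']."""
    command, *pairs = argv
    if command not in VALID_OPTIONS:
        raise ValueError(f"unknown command: {command}")
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or key not in VALID_OPTIONS[command]:
            raise ValueError(f"unknown option for {command}: {pair}")
        allowed = VALID_OPTIONS[command][key]
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid value for {key}: {value}")
        options[key] = value
    return command, options
```

A declarative option table like this keeps the real/sim and chat/manual branches in one place, so the help text and validation cannot drift apart.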
Sample Code
// simulation modes
appfl run mode=sim backend=serial config=...
appfl run mode=sim backend=gloo config=...
appfl run mode=sim backend=nccl config=...
// deployment modes
appfl run mode=real comm=grpc config=...
appfl run mode=real comm=globus config=...
...
// commit modes
appfl commit interface=chat config=...
appfl commit interface=manual config=...
Additional Code or Information
Draft code: APPFL/src/appfl/sim/runner.py, line 580 in 4a26e03.
To-Do
- `torch.distributed` compatibility with Intel XPU (`run_mpi` design)