diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 6fa815b1920..6a5d31df5ab 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -650,6 +650,7 @@ peps/pep-0769.rst @facundobatista peps/pep-0770.rst @sethmlarson @brettcannon peps/pep-0771.rst @pradyunsg peps/pep-0773.rst @zooba +peps/pep-0774.rst @savannahostrowski # ... peps/pep-0777.rst @warsaw # ... diff --git a/peps/pep-0774.rst b/peps/pep-0774.rst new file mode 100644 index 00000000000..e6d171d1749 --- /dev/null +++ b/peps/pep-0774.rst @@ -0,0 +1,260 @@ +PEP: 774 +Title: Removing the LLVM requirement for JIT builds +Author: Savannah Ostrowski +Status: Draft +Type: Standards Track +Created: 27-Jan-2025 +Python-Version: 3.14 + +Abstract +======== + +Since Python 3.13, CPython has been able to be configured and built with an +experimental just-in-time (JIT) compiler via the ``--enable-experimental-jit`` +flag on Linux and Mac and ``--experimental-jit`` on Windows. To build CPython with +the JIT enabled, users are required to have LLVM installed on their machine +(initially, with LLVM 16 but more recently, with LLVM 19). LLVM is responsible +for generating stencils that are essential to our copy-and-patch JIT (see :pep:`744`). +These stencils are predefined, architecture-specific templates that are used +to generate machine code at runtime. + +This PEP proposes removing the LLVM build-time dependency for JIT-enabled builds +by hosting the generated stencils in the CPython repository. This approach +allows us to leverage the checked-in stencils for supported platforms at build +time, simplifying the contributor experience and address concerns raised at the +Python Core Developer Sprint in September 2024. That said, there is a clear +tradeoff to consider, as improved developer experience does come at the cost of +increased repository size. + +It is important to note that this PEP is not a proposal to accept or reject the +JIT itself but rather to determine whether the build-time dependency on LLVM is +acceptable for JIT builds moving forward. If this PEP is rejected, we will +proceed with the status quo, retaining the LLVM build-time requirement. While +this dependency has served the JIT development process effectively thus far, it +introduces setup complexity and additional challenges that this PEP seeks to +alleviate. + +Motivation +========== + +At the Python Core Developer Sprint that took place in September 2024, there was +discussion about the next steps for the JIT - a related discussion also took +place on `GitHub `__. As part +of that discussion, there was also a clear appetite for removing the LLVM +requirement for JIT builds in preparation for shipping the JIT off by default in +3.14. The consensus at the sprint was that it would be sufficient to provide +pre-generated stencils for non-debug builds for Tier 1 platforms and that +checking these files into the CPython repo would be adequate for the limited +number of platforms (though more options have been explored; see `Rejected +Ideas`_). + +Currently, building CPython with `the JIT requires LLVM +`__ as a +build-time dependency. Despite not being exposed to end users, this dependency +is suboptimal. Requiring LLVM adds a setup burden for developers and those who +wish to build CPython with the JIT enabled. Depending on the operating system, +the version of LLVM shipped with the OS may differ from that required by our JIT +builds, which introduces additional complexity to troubleshoot and resolve. With +few core developers currently contributing to and maintaining the JIT, we also +want to make sure that the friction to work on JIT-related code is minimized as +much as possible. + +With the proposed approach, hosting pre-compiled stencils for supported +architectures can be generated in advance, stored in a central location, and +automatically used during builds. This approach ensures reproducible builds, +making the JIT a more stable and sustainable part of CPython's future. + +Rationale +========= + +This PEP proposes checking JIT stencils directly into the CPython repo as the +best path forward for eliminating our build-time dependency on LLVM. + +This approach: + +* Provides the best end-to-end experience for those looking to build CPython + with the JIT +* Lessens the barrier to entry for those looking to contribute to the JIT +* Ensures builds remain reproducible and consistent across platforms without + relying on external infrastructure or download mechanisms +* Eliminates variability introduced by network conditions or potential + discrepancies between hosted files and the CPython repository state, and +* Subjects stencils to the same review processes we have for all other JIT-related + code + +However, this approach does result in a slight increase in overall +repository size. Comparing repo growth on commits over the past 90 days, the +difference between the actual commits and the same commits with stencils added +amounts to a difference of 0.03 MB per stencil file. This is a small increase in +the context of the overall repository size, which has grown by 2.55 MB in the +same time period. For six stencil files, this amounts to an upper bound of 0.18 MB. +The current total size of the stencil files for all six platforms is 7.2 MB. + +These stencils could become larger in the future with changes to register +allocation, which would introduce 5-6 variants per instruction in each stencil +file (5-6x larger). However, if we ended up going this route, there are +additional modifications we could make to stencil files that could help +counteract this size increase (e.g., stripping comments, minimizing the +stencils). + +Specification +============= + +This specification outlines the proposed changes to remove the build-time +dependency on LLVM and the contributor experience if this PEP is accepted. + +Repository changes +------------------ + +The CPython repository would now host the pre-compiled JIT stencils in a new +subdirectory in ``Tools/jit`` called ``stencils/``. At present, the JIT is tested +and built for six platforms, so to start, we'd check in six stencil files. In +the future, we may check in additional stencil files if support for additional +platforms is desired or relevant. + +.. code-block:: text + + cpython/ + Tools/ + jit/ + stencils/ + aarch64-apple-darwin.h + aarch64-unknown-linux-gnu.h + i686-pc-windows-msvc.h + x86_64-apple-darwin.h + x86_64-pc-windows-msvc.h + x86_64-pc-linux-gnu.h + +Workflow +-------- + +The workflow changes can be split into two parts, namely building CPython with +the JIT enabled and working on the JIT's implementation. + +Building CPython with the JIT +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Precompiled JIT stencil files will be stored in the ``Tools/jit/stencils`` +directory, with each file name corresponding to its target triple as outlined +above. At build time, we determine whether to use the checked in stencils or to +generate a new stencil for the user's platform. Specifically, for contributors +with LLVM installed, the ``build.py`` script in ``Tools/jit/stencils`` will allow +them to regenerate the stencil for their platform. Those without LLVM can rely +on the precompiled stencil files directly from the repository. + +Working on the JIT's implementation (or touching JIT files) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In continuous integration (CI), stencil files will be automatically validated and updated when changes +are made to JIT-related files. When a pull request is opened that touches these +files, the ``jit.yml`` workflow, which builds and tests our builds, will run as +usual. + +However, as part of this, we will introduce a new step that diffs the current +stencils in the repo against those generated in CI. If there is a diff for a +platform's stencil file, a patch file for the updated stencil is generated and +the step will fail. Each patch is uploaded to GitHub Actions. After CI is +finished running across all platforms, the patches are aggregated into a single +patch file for convenience. You can download this aggregated patch, apply it +locally, and commit the updated stencils back to your branch. Then, the +subsequent CI run will pass. + +Reference Implementation +======================== + +Key parts of the `reference implementation `__ include: + +- |CI|_: The CI workflow responsible for generating stencil patches. + +- |jit_stencils|_: The directory where stencils are stored. + +- |targets|_: The code to compile and parse the templates at build time. + +.. |CI| replace:: ``.github/workflows/jit.yml`` +.. _CI: https://github.com/python/cpython/blob/main/.github/workflows/jit.yml + +.. |jit_stencils| replace:: ``Tools/jit/stencils`` +.. _jit_stencils: https://github.com/python/cpython/blob/main/Tools/jit/stencils + +.. |targets| replace:: ``Tools/jit/_targets`` +.. _targets: https://github.com/python/cpython/blob/main/Tools/jit/_targets.py + +Ignoring the stencils themselves and any necessary JIT README changes, the +changes to the source code to support reproducible stencil generation and +hosting are minimal (around 150 lines of changes). + +Rejected Ideas +============== + +Several alternative approaches were considered as part of the research and +exploration for this PEP. However, the ideas below either involve +infrastructural cost, maintenance burden, or a worse overall developer +experience. + +Using Git submodules +-------------------- + +Git submodules are a poor developer experience for hosting stencils because they +create a different kind of undesirable friction. For instance, any +updates to the JIT would necessitate regenerating the stencils and committing +them to a separate repository. This introduces a convoluted process: you must +update the stencils in the submodule repository, commit those changes, and then +update the submodule reference in the main CPython repository. This disconnect +adds unnecessary complexity and overhead, making the process brittle and +error-prone for contributors and maintainers. + +Using Git subtrees +------------------ + +When using subtrees, the embedded repository becomes part of the main +repository, similar to what's being proposed in this PEP. However, subtrees +require additional tooling and steps for maintenance, which adds unnecessary +complexity to workflows. + +Hosting in a separate repository +-------------------------------- + +While splitting JIT stencils into a separate repository avoids the storage +overhead associated with hosting the stencils, it adds complexity to the build +process. Additional tooling would be required to fetch the stencils and +potentially create additional and unnecessary failure points in the workflow. +This separation also makes it harder to ensure consistency between the stencils +and the CPython source tree, as updates must be coordinated across the +repositories. Finally, this approach introduces an attack vector, as external +repositories are less integrated with our workflows, making provenance and +integrity harder to guarantee. + +Hosting in cloud storage +------------------------ + +Hosting stencils in cloud storage like S3 buckets or GitHub raw storage +introduces external dependencies, complicating offline development +workflows. Also, depending on the provider, this type of hosting comes with +additional cost, which we'd like to avoid. + +Using Git LFS +------------- + +Git Large File Storage (LFS) adds a tool dependency for contributors, +complicating the development workflow, especially for those who may not already +use Git LFS. Git LFS does not work well with offline workflows since files +managed by LFS require an internet connection to fetch when checking out +specific commits, which is disruptive for even basic Git workflows. Git LFS has +some free quota but there are `additional +costs `__. +for exceeding that quota which are also undesirable. + +Maintain the status quo with LLVM as a build-time dependency +------------------------------------------------------------ + +Retaining LLVM as a build-time dependency upholds the existing barriers to +adoption and contribution. Ultimately, this option fails to address the core +challenges of accessibility and simplicity, and fails to eliminate the +dependency which was deemed undesirable at the Python Core Developer Sprint in +the fall (the impetus for this PEP), making it a poor long-term solution. + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.