Commit 9de2962

emmatyping and emmatyping-nv authored and committed
PEP 819: Add information about duplicate keys and integer parsing
Update the PEP to clarify that integers and floats should be decoded as strings, and specify that when a JSON object contains duplicate keys, the second key wins.
1 parent 67384c4 commit 9de2962

1 file changed

Lines changed: 41 additions & 19 deletions

peps/pep-0819.rst

@@ -235,6 +235,40 @@ JSON schema for wheel metadata has been produced.
 This schema will be updated with each revision to the wheel metadata
 specification. The schema is available in :ref:`0819-wheel-json-schema`.
 
+Handling of Integer and Float Values in JSON Package Metadata
+-------------------------------------------------------------
+
+While no core metadata or wheel metadata values are currently encoded as
+integers or floats, when decoding a JSON file, integer and float values should
+be decoded as strings for both core metadata and wheel metadata. This is to
+avoid compatibility issues due to differences in precision and representation
+of integers and floats between languages and parsers. This also mitigates a
+security risk with integer parsing denial of service attacks based on
+`CVE-2020-10735 <https://github.com/advisories/GHSA-6jr7-xr67-mgxw>`__.
+
+If a future field of core metadata or wheel metadata needs to be encoded as an
+integer or float, the field MUST be decoded lazily after loading the JSON
+document. This minimizes the risks of denial of service attacks by minimizing
+the integer parsing allowed during the deserialization process.
+
+If using the Python :mod:`!json` module, parsing integers and floats as strings
+can be accomplished by setting the ``parse_int`` and ``parse_float``
+keyword arguments to :func:`json.load` or :func:`json.loads` to :class:`str`.
+
+Handling of Duplicate Keys in JSON Package Metadata
+---------------------------------------------------
+
+JSON does not define semantics for duplicate keys in a JSON document. However,
+different parsers treat duplicate keys differently. Tools SHOULD NOT generate
+duplicate keys in JSON package metadata. However, it is likely duplicate keys
+may be generated anyway, so tools consuming JSON package metadata should handle
+duplicate keys gracefully. In the interest of compatibility and matching the
+behavior of the Python :mod:`!json` module, if duplicate keys are encountered,
+the second duplicate key should be used as the data for that key. This matches
+the behavior of many JSON parsers such as those in Python, Rust, Go, and the
+ECMAScript Standard. Tools MAY warn about duplicate keys in JSON package
+metadata.
+
 Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files
 ------------------------------------------------------------------

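The two rules added in the hunk above (numbers decoded as strings, last duplicate key wins) can be sketched in Python as follows. This is a minimal illustration and not the PEP's reference implementation: ``load_package_metadata`` and ``_last_key_wins`` are hypothetical names, and emitting a warning is only one way to use the "MAY warn" allowance. The sketch relies on the ``parse_int``, ``parse_float``, and ``object_pairs_hook`` arguments of ``json.loads``.

```python
import json
import warnings


def _last_key_wins(pairs):
    """Build a dict from (key, value) pairs, warning on duplicate keys.

    Later pairs overwrite earlier ones, which matches the default
    behavior of the ``json`` module.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            warnings.warn(f"duplicate key in JSON package metadata: {key!r}")
        result[key] = value
    return result


def load_package_metadata(text):
    """Decode JSON metadata, keeping integer and float literals as strings."""
    return json.loads(
        text,
        parse_int=str,      # integer literals are returned as their source text
        parse_float=str,    # float literals are returned as their source text
        object_pairs_hook=_last_key_wins,
    )
```

Under these assumptions, ``load_package_metadata('{"name": "a", "name": "b", "n": 10}')`` would keep ``"b"`` for ``name``, return ``"10"`` as a string for ``n``, and emit one warning.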
@@ -272,25 +306,13 @@ or ``WHEEL`` files.
 Security Implications
 =====================
 
-One attack vector with JSON encoded core metadata is if the JSON payload is
-designed to consume excessive memory or CPU resources in a denial of service
-(DoS) attack. While this attack is not likely to affect users whom can cancel
-resource-intensive interactive operations, it may be an issue for package
-indexes.
-
-There are several mitigations that can be made to prevent this:
-
-#. The length of the JSON payload can be restricted to a reasonable size.
-#. The reader may use a :class:`~json.JSONDecoder` to omit parsing :class:`int`
-   and :class:`float` values to avoid quadratic number parsing time complexity
-   attacks.
-#. I plan to contribute a change to :class:`~json.JSONDecoder` in Python
-   3.15+ that will allow it to be configured to restrict the nesting of JSON
-   payloads to a reasonable depth. Core metadata currently has a maximum depth
-   of 2 to encode mapping and list fields.
-
-With these mitigations in place, concerns about denial of service attacks with
-JSON encoded core metadata are minimal.
+JSON encoded core metadata and wheel metadata have the potential for a denial
+of service attack due to the quadratic parsing time complexity of parsing of
+integers. This PEP mitigates this risk by requiring that integers and floats be
+parsed as strings, and only lazily parsed into integers or floats after the
+initial deserialization of the JSON document. With these mitigations in place,
+concerns about denial of service attacks with JSON encoded package metadata are
+considered minimal.
 
 
 Reference Implementation
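
The lazy parsing described in the Security Implications hunk could look like the sketch below. It is an illustration only and not the PEP's reference implementation: the field name ``build_number``, the helper ``lazy_int``, and the 32-digit cap ``MAX_DIGITS`` are all assumptions made for this example, since the PEP names no integer field and specifies no limit.

```python
import json

MAX_DIGITS = 32  # hypothetical cap; the PEP does not specify a limit


def lazy_int(raw, max_digits=MAX_DIGITS):
    """Convert a numeric string to int only when the caller asks for it.

    The length check runs before int(), so an oversized literal is
    rejected without paying the conversion cost.
    """
    if not isinstance(raw, str):
        raise TypeError(f"expected a string, got {type(raw).__name__}")
    digits = raw[1:] if raw.startswith("-") else raw
    if not digits.isascii() or not digits.isdigit():
        raise ValueError("not an integer literal")
    if len(digits) > max_digits:
        raise ValueError(f"integer literal longer than {max_digits} digits")
    return int(raw)


# Deserialization leaves numbers as strings; no integer parsing happens here.
document = json.loads('{"build_number": 42}', parse_int=str, parse_float=str)

# The conversion happens later, and only for the field that needs it.
build_number = lazy_int(document["build_number"])
```

Keeping the conversion in a separate helper means the cost of integer parsing is paid per field and on demand, rather than for every numeric literal in the payload.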