
@yheuhtozr

As far as I can tell from the current implementation, the software already processes hyphenated language tags and legitimately emits ISO 639-3 codes as well as ISO 639-1 codes. So even though Mastodon itself does not yet support most advanced language tags, it has no problem accepting or emitting those codes in the API. Thus we can:

  • remove some outdated descriptions restricting the value to an "ISO 639-1 two-letter code", which no longer applies
  • accept well-formed BCP 47 language tags, with a note that Mastodon will probably ignore any additional subtags

mastodon/mastodon#19302 corroborates that a language tag doesn't break the system.

The rationale of this change is discussed in mastodon/mastodon#23541.

Closes mastodon/mastodon#23541.

  • Note: "language subtag" in BCP 47 ≈ ISO 639-1 ∪ ISO 639-3
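To make "well-formed BCP 47 tag" and "language subtag" concrete, here is a minimal Python sketch that checks a simplified subset of the RFC 5646 `langtag` grammar and extracts the primary language subtag that Mastodon would presumably keep. Assumptions: only a 2- or 3-letter language, optional script, optional region and variants are handled; extlang, extensions, private-use and grandfathered tags are deliberately rejected. `split_language_tag` is a hypothetical helper for illustration, not part of Mastodon.

```python
import re
from typing import Optional

# Simplified subset of the RFC 5646 "langtag" production (assumption: no
# extlang, extensions, private use or grandfathered tags).
_LANGTAG = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})               # ISO 639-1 (2 letters) or 3-letter code
    (?:-(?P<script>[A-Za-z]{4}))?             # ISO 15924 script, e.g. Hant
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?    # ISO 3166-1 alpha-2 or UN M.49 region
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)  # variant subtags
    """,
    re.VERBOSE,
)


def split_language_tag(tag: str) -> Optional[dict]:
    """Split a tag into its subtags, or return None if it is not well-formed
    under the simplified grammar above."""
    m = _LANGTAG.fullmatch(tag)
    if m is None:
        return None
    script, region = m.group("script"), m.group("region")
    return {
        # The primary language subtag is what a server that ignores the
        # extra information would fall back to.
        "language": m.group("language").lower(),
        "script": script.title() if script else None,  # conventional casing: Hant
        "region": region.upper() if region else None,  # conventional casing: TW
        "variants": [v.lower() for v in m.group("variants").split("-") if v],
    }
```

For example, `split_language_tag("zh-Hant-TW")` yields language `zh`, script `Hant`, region `TW`, while `split_language_tag("yue")` yields a bare 3-letter language subtag. A tag such as `en_US` (underscore instead of hyphen) is rejected.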

@vercel

vercel bot commented Mar 29, 2023

@yheuhtozr is attempting to deploy a commit to the Mastodon Team on Vercel.

A member of the Team first needs to authorize it.

@nikclayton
Contributor

nikclayton commented May 31, 2023

+1 to the general idea that BCP47 language tags are the way to go.

-1 to this specific change, as I think it's backwards incompatible (Mastodon servers could now emit a string field longer than 2 characters, where previously they were documented as only emitting a 2-character string).

A backwards-compatible change would be to accept / emit a new language_code field, whose value is an object with type and code fields, i.e.:

"language_code": {
    "type": "...",
    "code": "..." 
},

Valid initial values for type would be iso639-1, iso639-2 (3-letter codes), and bcp47.

So:

"language_code": {
    "type": "iso639-1",
    "code": "en"
},

is equivalent to the current:

"language": "en",

If this field exists, the contents of the language field would be ignored.
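A minimal Python sketch of how a client might resolve the effective language under this proposal, assuming the field names and `type` values above (`language_code`, `iso639-1`, `iso639-2`, `bcp47` are from the example, not an existing Mastodon API). `effective_language` is a hypothetical helper: the new object takes precedence, and the legacy `language` string is read as an ISO 639-1 code only when `language_code` is absent.

```python
from typing import Any, Mapping, Optional, Tuple

# "type" values from the example proposal (not an existing Mastodon API).
KNOWN_CODE_TYPES = {"iso639-1", "iso639-2", "bcp47"}


def effective_language(status: Mapping[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (type, code) for a status, or None if no language is given."""
    code_obj = status.get("language_code")
    if code_obj is not None:
        # Per the proposal, the new field wins and "language" is ignored.
        if not isinstance(code_obj, Mapping):
            raise ValueError("language_code must be an object")
        code_type, code = code_obj.get("type"), code_obj.get("code")
        if code_type not in KNOWN_CODE_TYPES or not isinstance(code, str):
            raise ValueError(f"unsupported language_code: {code_obj!r}")
        return code_type, code
    legacy = status.get("language")
    if legacy is None:
        return None
    # The legacy field is documented as an ISO 639-1 two-letter code.
    return "iso639-1", legacy
```

With this rule, `{"language": "en"}` and `{"language_code": {"type": "iso639-1", "code": "en"}}` resolve to the same value, which is the equivalence described above. Raising on an unknown `type` is one possible choice; a real client might prefer to fall back to `language` instead.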


Edit to note: The above is an example, not something I've spent a serious amount of design thought on.

@yheuhtozr
Author

A backwards compatible change would be to accept / emit a new language_code field, where the value is an object with type and code fields.

@nikclayton Hi, that would be totally fine with me too. Just to be sure (since I'm an outsider): I think it entails additional logic in the Mastodon code, but do you think that would be more feasible?

@github-actions

This pull request has merge conflicts that must be resolved before it can be merged.


Linked issue (may be closed by merging this pull request): Accept BCP 47 language tags in the statuses API