This document describes the DSL-based files with an extension .iml or .imp.
An .imp file is a file containing a standalone transliteration map. For
instance, a map that can transliterate a Korean file to a Latin file.
An .iml file is a file that contains a library of aliases and stages to be
used by the .imp maps. It follows the same format, but does not require the
metadata and tests parts to exist and doesn’t allow the main stage to exist.
This document describes the map version of the format if it isn’t noted
otherwise.
A # character is a comment character. This means, that the part that follows
a # character till the end of the line is ignored by Interscript, but exist to
communicate to a human reader the intention behind the content. In this document
it is most often a hint to a person reading this document.
A String is a part of the document of a form either: "content" or 'content'.
It denotes a group of characters to be used. It can be joined together using a
+ character like so: "a" + "b" which is equal to as if someone wrote just
"ab".
Except for the strings of the form 'content', all those forms can contain
escape forms like \u0410, which means "An Unicode character 0410". The usage
of those forms is discouraged in new maps, but possible.
An array (or a list) is a part of the document of a form ["a", "b", "c"]. It
means a sequential group of Strings, or other types.
The root part of the .iml file is called a document. A map has a format as
follows:
metadata {
# Metadata part comes here
}
tests {
# Tests part comes here
}
# A dependency directive may happen zero or more times. It will be described in
# a subsection.
dependency "other-map-or-library", as: shortname
# This part is optional
aliases {
# Aliases part comes here
}
stage {
# A stage description comes here
}
# There may be more than 1 stage, the other stages need to have a name. The
# default stage name is `main`. A name can't happen more than once in a document.
stage(stage_name) {
# A stage description comes here
}Dependency is an instruction to be issued only in the document context. It means that we want to import some aliases or stages from another map or a library.
dependency "other-map-or-library", as: shortnameThis instruction will allow us to reference aliases and stages from other
libraries in this form: map.shortname.stage.stagename for stages and
map.shortname.aliasname for aliases.
There is a second syntax, mostly useful for loading libraries that will import the stages and aliases to a global context resulting in possibly more human readable maps:
dependency "other-map-or-library", as: shortname, import: trueThis form allows to reference other stages and aliases in the following form:
stage.stagename, aliasname
It is not possible to load maps using this form, only libraries, because we
can’t override the main stage.
The standard library is implicitly imported this way. There’s no way or need to import it explicitly.
All maps depend on a standard library implicitly. This standard library defines a few useful aliases that may or may not be expressed otherwise.
Below is a table that describes the aliases defined by the standard library:
|
An empty string |
|
A space character |
|
Any whitespace ascii character (space, tab, line-delimiter, …) |
|
A word boundary (see below for what institutes a word character) |
|
An ascii word character (a-z, A-Z, 0-9, _) |
|
Negation of the above |
|
Any ascii alphabetic character (a-z, A-Z) |
|
Negation of the above |
|
Any ascii digit |
|
Negation of the above |
|
Beginning of a line |
|
Ending of a line |
|
Beginning of a string |
|
Ending of a string |
Any standard library (or otherwise) aliases can be joined with anything else
using a + command, for example: line_start + "rest".
The metadata part describes our map. It follows a YAML syntax.
metadata {
# ID of the authority that provided the transliteration rules we are about to implement
authority_id: iso
# ID of the rules, most often the year they were defined
id: 1996-method1
# The language code of the map
language: iso-639-2:kor
# The source script of our map, in our example Hang for Hangul
source_script: Hang
# The destination script of our map
destination_script: Latn
# The longer name of our map
name: ISO/TR 11941:1996 Information and documentation — Transliteration of Korean script into Latin characters
# The URL where it was published
url: https://www.iso.org/standard/20564.html
# The creation date of our map
creation_date: 1996
# The adoption date of our map, or empty if not adopted
adoption_date: ""
# The description of our map
description: |
Establishes a system for the transliteration of the characters of Korean script into Latin characters.
Intended to provide a means for international communication of written documents.
# The notes that describe some parts of our map that we are about to implement
notes:
- A word-initial hard sign 'ъ' is not represented, but instead is left out of the transliteration.
- The romanization follows the dialect spoken in Chechnya rather than other local pronunciations.
}The tests part describes a group of the tests to be executed by the automated system to verify that the map is defined properly. An example tests part looks like this:
tests {
test "애기", "aeki"
test "방", "pang"
}This means, that we want to test our map to transliterate a string "애기" to "aeki" and "방" to "pang".
An aliases part describes a group of aliases to be used by the stages to simplify the code of our map.
Let’s suppose that our map refers to "Double consonant jamo" and "Aspirated consonant jamo" quite extensively. We can alias those
aliases {
def_alias double_cons_jamo, any("ᄁᄄᄈᄍᄊ")
def_alias aspirated_cons_jamo, any("ᄏᄐᄑᄎ")
}And later in the stage part refer to them by just double_cons_jamo, not
needing to repeat ourselves.
A stage part describes a stage, a sequential group of steps to transliterate a string from a source script code to a destination script code. An example stage looks like the following:
stage {
run map.hangjamo.stage.main
sub any("ᄀᆨ"), "k"
sub any("ᄏᆿ"), "kh"
parallel {
sub "ᅡ", "a"
sub "ᅥ", "eo"
}
}A stage can be named, as described in the Document section. The default name
of a stage is main.
A sub call does a substitution of an item (string, character, alias) with
another item.
stage {
sub "source", "destination"
}This call allows for some named parameters:
|
Execute this substitution only if the "source" is preceded by what is given as a parameter, but won’t replace it, it will only replace the "source". |
|
Same, but this parameter denotes what is used after. |
|
Negation of |
For example:
stage {
sub boundary + "Е", "Ye", not_before: "’"
sub boundary + "е", "ye", not_before: "’"
sub none, "'", not_before: hangul, after: aspirated_cons
}In various maps there was a need to document multiple replacements. Let’s suppose our character set has a character "a" that can be transliterated to any of the forms "X", "Y" or "Z". As of now, it means that "a" is always translated to "X", as it came first. In the future it will be possible to execute such a map in reverse as well.
stage {
sub "a", any("XYZ")
}A parallel block can be defined as a subsection of a stage part. It indicates
that the steps inside need to be executed in parallel. At the current time, only
sub calls can be executed in parallel. It also means, that those steps will try
to find the longest substrings first.
stage {
parallel {
sub "А", 'A'
sub "Б", 'B'
sub "В", 'V'
sub "Г", 'G'
}
}The run call runs a stage defined inside the document, or another map or
library. If this map isn’t local, a map or library dependency needs to be
declared using the dependency call.
For example:
stage {
# If dependency declared without import: true
run map.hangjamo.stage.main
# If dependency declated with import: true, or we reference a local stage
run stage.remove_spaces
}There are certain conversions that may be hard to be achieved using stages, those are implemented in respective standard libraries using programming languages.
For a function named title_case, it can be called with the following:
stage {
title_case
}A standard library function can take (named) arguments. Those are described in the table below and they may be omitted if a default value is specified.
| Function name | Arguments | Sample input | Sample output |
|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Consult: Usage with Secryst. |
|
Interscript doesn’t work purely on Strings, even though Strings are mostly
referenced to by this document. The items can be used in the alias and stage
context.
The most basic kind of item. For example "Г" means "match Г" or "replace
with Г" depending on usage context. Some contexts will only accept strings, or
aliases to strings.
Items can be concatenated (added together) to denote a complex item. For instance:
any("ab") + "e" means "either ae or be" and is equivalent to any(["ae", "be"]).
Any denotes some alternative variations of a string. It has 3 forms of call:
-
any("abcde")- any character: a, b, c, d or e -
any(["one", "two"])- any string: one or two -
any("a".."z")- any character from a to z
Any can be also used with other kinds of items than String, for instance:
stage {
sub any([line_start + "a", "a" + line_end]), none
}If you want a given item to be allowed to be repeated respectively: 0 to 1 times,
1 to Infinity times, 0 to Infinity times, you can surround it with respectively:
maybe(), some(), maybe_some().
stage {
sub "a"+maybe("-")+"b", "AB" # Equivalent to regexp: a-?b
sub "a"+some("-")+"b", "AB" # Equivalent to regexp: a-+b
sub "a"+maybe_some("-")+"b", "AB" # Equivalent to regexp: a-*b
}An alias item references an alias. For example map.other_map.alias_from_other_map
or simply a_local_alias_or_an_alias_from_imported_library.
Sometimes there may be a need to reference a group from input inside output (or
input too). People who know regular expressions are familiar with expressions of
some form of replace /(a)/, '[\1]'. Interscript supports this kind of syntax:
stage {
sub capture(any("abc")), "["+ref(1)+"]"
}When ran against a string "abcde", this stage will produce an output of
"[a][b][c]de".