Manage your text

You can use our MT API to translate both documents and text. To translate text, add it directly to the POST /v2/translate call; the translation is returned in the response.
However, there are some restrictions and recommendations to take into account to avoid error responses.

Size

  • The maximum size of the text is 10,000 characters.
  • If this limit is exceeded, the tool returns an error message.

Segments

A segment ideally contains a single sentence, but you can also send multiple sentences as a single segment. In either case, the cumulative length of the text in all segments must not exceed 10,000 characters.
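The cumulative limit can be checked on the client side before calling the API. The following is a minimal sketch, not part of the MT API: the helper names (`total_length`, `fits_limit`, `batch_segments`) are hypothetical, and it assumes that a "character" corresponds to one Unicode code point as counted by Python's `len()`, which may differ from how the service counts.

```python
MAX_CHARS = 10_000  # cumulative limit stated in the Size section


def total_length(segments):
    """Cumulative number of characters across all segments."""
    return sum(len(segment) for segment in segments)


def fits_limit(segments, limit=MAX_CHARS):
    """True if the segments can be sent together in one request."""
    return total_length(segments) <= limit


def batch_segments(segments, limit=MAX_CHARS):
    """Group segments, in order, into batches that each stay within the limit.

    Assumption: a segment is never split. A single segment longer than
    the limit cannot be sent at all, so it raises ValueError.
    """
    batches, current, size = [], [], 0
    for segment in segments:
        if len(segment) > limit:
            raise ValueError("a single segment exceeds the character limit")
        if current and size + len(segment) > limit:
            batches.append(current)
            current, size = [], 0
        current.append(segment)
        size += len(segment)
    if current:
        batches.append(current)
    return batches
```

Each batch returned by `batch_segments` could then be sent in its own request, so that no single call exceeds the limit.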

Tags

The text to be translated may include XML tags. Our system does not translate inline markup as part of the text, because markup is not part of the linguistic meaning. Instead, the tags are extracted and then reinserted based on the word alignment.

Word alignment

In addition to the translated text, the neural machine translation model used in our MT API produces a matrix called the word alignment, which expresses the most likely correspondence between words in the input and output segments.
This is the logic followed for the tags:

  • After extracting the tags from the source text, everything else is text to be translated.
  • Reinsertion of a tag in the output is based on the alignment between words in the input and the translation.

Example of an HTML snippet and the reinsertion of its tags based on the word alignment, for the en-de model:
Input: Source text (in English) with tags:

The notion of tags is explained here

After extracting the tags, this is the input to the neural model:

… The notion of tags is explained here

Output: A possible translation (in German) returned by the model:

… Der Begriff der Tags wird hier erklärt

After using the word alignment to reinsert the tags:

Der Begriff derTags wird hier erklärt
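
The extract-and-reinsert logic can be sketched in a few lines. This is an illustration under stated assumptions, not the actual implementation of the MT API: the function names are hypothetical, words are split on whitespace, the `<b>` markup in the sample sentence is an illustrative choice, and the alignment is a hand-written mapping from source word index to target word index standing in for the alignment matrix produced by the model.

```python
import re

# A token is either a whole tag or a run of non-space, non-"<" characters.
TOKEN = re.compile(r"<[^>]+>|[^\s<]+")


def extract_tags(source):
    """Split the source into plain words and a list of (tag, word_index, side).

    An opening (or standalone) tag is attached before the next word;
    a closing tag is attached after the previous word.
    """
    words, tags = [], []
    for token in TOKEN.findall(source):
        if token.startswith("</"):
            tags.append((token, len(words) - 1, "after"))
        elif token.startswith("<"):
            tags.append((token, len(words), "before"))
        else:
            words.append(token)
    return words, tags


def reinsert_tags(target_words, tags, alignment):
    """Place each tag next to the target word aligned with its source word."""
    before, after = {}, {}
    for tag, source_index, side in tags:
        target_index = alignment[source_index]
        bucket = before if side == "before" else after
        bucket.setdefault(target_index, []).append(tag)
    pieces = []
    for index, word in enumerate(target_words):
        pieces.append("".join(before.get(index, [])) + word + "".join(after.get(index, [])))
    return " ".join(pieces)


source = "The notion of <b>tags</b> is explained here"
words, tags = extract_tags(source)  # words is the tag-free input to the model
target_words = "Der Begriff der Tags wird hier erklärt".split()
# Hypothetical alignment: "explained" (5) maps to "erklärt" (6), "here" (6) to "hier" (5).
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 5}
print(reinsert_tags(target_words, tags, alignment))
# prints: Der Begriff der <b>Tags</b> wird hier erklärt
```

The sketch does not handle tags at the very start or end of a segment, or source words with no aligned target word; a real system has to decide where such tags go.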

Tags general support

Tags are handled based on the general patterns: <tag> for the opening, </tag> for the closing, and <tag/> for the standalone.
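
As an illustration of these three general patterns, the sketch below classifies a single tag as opening, closing, or standalone. The regular expression is an assumption that approximates the XML rules for tag names; it is not the pattern used by the MT API, and `classify` is a hypothetical helper.

```python
import re

# "<" or "</", a tag name, optional property strings, then ">" or "/>".
# The name pattern is an approximation of the XML Name rule (assumption).
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.\-]*)([^<>]*?)(/?)>")


def classify(tag):
    """Return (kind, name) for a tag string, or None if it does not match."""
    match = TAG.fullmatch(tag)
    if match is None:
        return None
    closing, name, _properties, standalone = match.groups()
    if closing and standalone:
        return None  # something like "</b/>" fits none of the three patterns
    if closing:
        return ("closing", name)
    if standalone:
        return ("standalone", name)
    return ("opening", name)


print(classify('<a href="x">'))  # ('opening', 'a')
print(classify("</a>"))          # ('closing', 'a')
print(classify("<br/>"))         # ('standalone', 'br')
```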

XML compliance

For the tags used, XML compliance is expected in the following respects:

  • Tag name: The sequence of allowed characters immediately following the initial ‘<’ or ‘</’, possibly followed by a space and property strings before the final ‘>’ or ‘/>’.
  • Tag correspondence: Opening and closing tags with the same tag name are considered a pair.
  • Tag ordering: Paired tags should have the opening tag first, followed by the closing tag. If one tag of a pair is not present in the input, the missing opening/closing tag is assumed to be situated in a preceding/following segment.
  • Tag embedding: The first of two opening tags should close:
    • either before the second opening tag,
    • or after the corresponding closing tag of the second one.
Note
  • Insertion of paired open and close tags in the translation preserves XML compliant order.
  • Non-compliant paired tags are still processed, but there is no guarantee that the output will be XML compliant.
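
The correspondence, ordering, and embedding rules above can be checked on a segment before sending it. The following is a minimal sketch with a hypothetical helper, `check_embedding`, and an assumed regular expression for tags. It follows the rule that a tag whose partner is missing from the segment is assumed to be paired in a neighboring segment, so only overlapping pairs are reported.

```python
import re

# Assumed tag pattern: "<" or "</", a name, optional properties, ">" or "/>".
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.\-]*)[^<>]*?(/?)>")


def check_embedding(segment):
    """Return a list of embedding problems; an empty list means none found."""
    stack, problems = [], []
    for match in TAG.finditer(segment):
        closing, name, standalone = match.groups()
        if standalone:
            continue  # standalone tags are not paired
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # closes the most recently opened tag
        elif not stack:
            continue  # opening tag assumed to be in a preceding segment
        else:
            problems.append(f"</{name}> closes while <{stack[-1]}> is still open")
    # Names left on the stack are assumed to close in a following segment.
    return problems


print(check_embedding("<a><b>text</b></a>"))  # []
print(check_embedding("<a><b>text</a></b>"))
# ['</a> closes while <b> is still open']
```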

Conversely, if special characters in the input are not encoded, they will be reinserted as-is (unencoded) in the final translation, if not translated.