v2 migration guide

Our Machine Translation API is continuously improving. This release introduces two major enhancements:

Enhanced Security: We are adopting OpenID Connect (OIDC) and deprecating API keys to provide a more secure authentication method.
Improved Standardization: We are adopting the BCP 47 standard for language codes to ensure cross-product compatibility with other LanguageWire products.

Additionally, this release includes several other improvements:

Callback URLs for document translations are now optional, with a new polling mechanism available to check for finished translations.
The maximum character length for callback URLs has been reduced.
The API naming convention now uses camelCase instead of snake_case, which is most noticeable in the document translation endpoints.
Language detection is now handled by a dedicated endpoint.
The HTTP status codes for some errors have been updated. The API will return an error message that describes the type of error.
- Unsupported language pairs now return 422 Unprocessable Entity instead of 400 Bad Request.
- Documents that are too large now return 422 Unprocessable Entity instead of 413 Payload Too Large.
- Attempts to download unfinished or failed document translations now return 422 Unprocessable Entity instead of 400 Bad Request.
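
If your v1 integration branched on these status codes, the change can be summarized in a small lookup. The sketch below is illustrative only: the function name and the condition labels are assumptions rather than API values, and it does not parse the error message body, whose exact format is not described here.

```python
# Hypothetical lookup: conditions whose status code changed between v1 and v2.
# The condition labels are illustrative names, not values returned by the API.
V1_TO_V2_STATUS = {
    "unsupported_language_pair": (400, 422),
    "document_too_large": (413, 422),
    "download_not_ready": (400, 422),
}


def expected_status(condition: str, api_version: int) -> int:
    """Return the HTTP status code to expect for a condition in v1 or v2."""
    v1_code, v2_code = V1_TO_V2_STATUS[condition]
    return v1_code if api_version == 1 else v2_code
```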

The following sections provide more details about these changes and include examples to help you migrate to the new version of the API.

Improving security by adopting OIDC

OpenID Connect (OIDC) is an identity and authentication protocol built on top of OAuth 2.0 that provides secure, standardized authentication for both user and machine-to-machine scenarios. For API integrations like ours, OIDC's Client Credentials flow enables secure application-to-application authentication without requiring user interaction. As an industry-standard protocol, OIDC is widely adopted by major technology companies and organizations worldwide due to its robust security features and proven reliability in enterprise environments.

OIDC enhances security through several key mechanisms:

Short-lived tokens: Access tokens have limited lifespans, reducing the risk of token misuse if compromised
Token-based authentication: Eliminates the need to store and transmit long-lived credentials
Standardized security practices: Built on proven OAuth 2.0 foundations with additional identity verification
JWT (JSON Web Tokens): Provides cryptographic signatures for token integrity and authenticity verification

Authentication

In order to comply with the highest security standards, we are improving our authentication layer and replacing API Keys with OpenID Connect (OIDC).

OIDC provides several security advantages over traditional API Keys:

Token expiration: Unlike API Keys that remain valid indefinitely until manually revoked, OIDC access tokens automatically expire after a short period, significantly reducing the window of opportunity if credentials are compromised.
Credential separation: Instead of a single long-lived API Key, OIDC uses a client ID and secret to obtain temporary tokens, allowing you to rotate secrets without disrupting active sessions.
Reduced exposure: API Keys are often stored in configuration files, logs, or transmitted in URLs where they can be accidentally exposed. OIDC tokens are only obtained when needed and have limited lifespans.

With OIDC, you no longer reuse a single long-lived API Key. Instead, you use a client ID and a client secret to obtain short-lived access tokens, and you send the current token in the Authorization header of every API request.

You can read more about OIDC and the new mechanism in our authentication guide. The Getting access page explains how to generate your credentials.

The following examples contrast the v1 API key authentication with the new v2 OIDC token flow.

Before (v1 - API Key):

import requests

# Direct API call with API key
headers = {
    "Authorization": "<your-api-key-here>"
}

response = requests.post(
    "https://mt.api.languagewire.com/v1/translate",
    headers=headers,
    json={
        "source": "en",
        "target": "es",
        "segments": ["Hello world!"]
    }
)

After (v2 - OIDC):

import requests

# Step 1: Obtain access token using client credentials
token_response = requests.post(
    "https://idp.languagewire.com/realms/languagewire/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "<your-client-id>",
        "client_secret": "<your-client-secret>",
    }
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Step 2: Use the access token for API calls
headers = {
    "Authorization": f"Bearer {access_token}"
}

response = requests.post(
    "https://mt.api.languagewire.com/v2/translate",
    headers=headers,
    json={
        "source": "en-GB",
        "target": "es-ES",
        "segments": [{"text": "Hello world!"}]
    }
)

Note

The request to acquire a token uses form-encoded data (application/x-www-form-urlencoded), not JSON.

After you acquire a token, store the token and reuse it for multiple API calls until it expires. There's no need to obtain a new token for each request.
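
Token reuse can be sketched as a small cache. This is a minimal sketch under assumptions: the class name, the `fetch_token` callable, and the 30-second safety margin are illustrative, and it assumes the token response includes the standard OAuth 2.0 `expires_in` field (token lifetime in seconds).

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        margin_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token is a hypothetical callable that performs the token
        # request and returns the parsed JSON response as a dict.
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, requesting a new one only when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Assumes the standard OAuth 2.0 "expires_in" field is present.
            self._expires_at = (
                self._clock() + float(payload["expires_in"]) - self._margin
            )
        return self._token
```

Here `fetch_token` would wrap the token request from the example above, and each API call would build its Authorization header from `cache.get()`.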

Callback verification

Previously, callback verification relied on using the API key as a symmetric key for generating a signature. Adopting OIDC for callback verification provides several security improvements:

Standardized JWT validation: Built-in token expiration, issuer verification, and cryptographic signature validation.
Payload integrity: SHA-256 hash of the request body included in the JWT prevents tampering.
No shared secrets: Eliminates the need to store and manage shared secrets for signature verification.
Industry standard: Uses well-established OIDC/JWT protocols with extensive tooling support.

Callbacks now use OIDC and carry a JWT in the Authorization header. You can validate the token against the LanguageWire Identity Provider to confirm its authenticity. The token contains a signature field with a SHA-256 hash of the request body, which you can use to verify integrity. See our callbacks guide for more information.

The following examples contrast the v1 callback signature with the new v2 OIDC token signature.

Before (v1 - Manual signature verification):

import hmac
import hashlib

def verify_callback_v1(request_body, signature_header, api_key):
    # Manual HMAC signature verification
    expected_signature = hmac.new(
        api_key.encode(),
        request_body.encode(),
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(signature_header, expected_signature)

# Usage in callback handler
if verify_callback_v1(request.body, request.headers['X-Signature'], API_KEY):
    # Process callback
    pass
else:
    # Reject callback
    pass

After (v2 - OIDC JWT verification):

import hashlib
from authlib.jose import JsonWebToken

def verify_callback_v2(request):
    auth_header = request.headers["Authorization"]
    if not auth_header.startswith("Bearer "):
        raise ValueError("Invalid authorization header")

    jwt_token = auth_header.removeprefix("Bearer ")

    # Retrieve public key from LanguageWire IdP
    public_key = (
        "-----BEGIN PUBLIC KEY-----\n"
        + get_languagewire_public_key()  # Fetch from IdP
        + "\n-----END PUBLIC KEY-----"
    )

    jwt = JsonWebToken(["RS256"])
    jwt_claims = jwt.decode(
        jwt_token,
        key=public_key,
        claims_options={
            "exp": {"essential": True},
            "iat": {"essential": True},
            "iss": {
                "essential": True,
                "value": "https://idp.languagewire.com/realms/languagewire",
            },
        },
    )
    jwt_claims.validate()

    # Verify payload integrity
    expected_signature = jwt_claims["signature"]
    actual_signature = hashlib.sha256(request.body).hexdigest()

    return actual_signature == expected_signature

# Usage in callback handler
if verify_callback_v2(request):
    # Process callback
    pass
else:
    # Reject callback
    pass

Improving standardization and cross-product compatibility by adopting BCP 47 language codes

IETF BCP 47 is the Internet Best Current Practice (BCP) that defines language tags.

Most commonly, tags are written with two subtags: language and region. For example, en-US consists of two subtags separated by the "-" character. The value "en" is the language subtag for English and the value "US" is the region subtag for the United States. Therefore, the language tag "en-US" represents US English.

However, language tags can also include additional subtags for scripts and variants. For example, ca-ES-valencia represents Valencian (a Catalan variant from Spain, as spoken in Valencia).
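
To make the subtag structure concrete, the following sketch splits a tag into its parts. It is a simplified illustration, not a full BCP 47 parser: the function name is an assumption, and it ignores extensions, private-use subtags, and grandfathered tags.

```python
def split_language_tag(tag: str) -> dict:
    """Split a simple BCP 47 tag into its language, script, region, and so on."""
    subtags = tag.split("-")
    parts = {
        "language": subtags[0].lower(),
        "extlang": None,
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = subtags[1:]
    # Extended language subtag: three letters directly after the language.
    if rest and len(rest[0]) == 3 and rest[0].isalpha():
        parts["extlang"] = rest.pop(0).lower()
    # Script subtag: four letters, conventionally title case (e.g. Hans).
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        parts["script"] = rest.pop(0).title()
    # Region subtag: two letters (e.g. US) or three digits (e.g. 419).
    if rest and (
        (len(rest[0]) == 2 and rest[0].isalpha())
        or (len(rest[0]) == 3 and rest[0].isdigit())
    ):
        parts["region"] = rest.pop(0).upper()
    # Anything left is treated as a variant subtag (e.g. valencia).
    parts["variants"] = [subtag.lower() for subtag in rest]
    return parts
```

For example, `split_language_tag("ca-ES-valencia")` yields the language `ca`, the region `ES`, and the variant `valencia`.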

The entire product suite of LanguageWire adheres to the BCP 47 standard, allowing you to store translation memory (TM) and termbase data specific to different language variants, and to have translations tailored for a particular region/country.

Until now, our Machine Translation API relied only on the language subtag (e.g., en) and ignored regional variants (with a few exceptions). This meant that when using features like TM enhancement or AI terminology, the API had to guess which variant to use, potentially leading to less accurate or contextually inappropriate translations.

By adopting BCP 47, you can now specify the exact language variant (e.g., en-US vs. en-GB), ensuring that the correct TM and terminology are applied. This change is crucial for:

TM enhancement: Selecting the right translation memory for your specific regional audience.
AI terminology: Applying the correct, variant-specific terms for your brand and industry.

For these reasons, we are moving all our endpoints to BCP 47 codes.

Languages endpoint

Currently, the endpoint returns a short list of languages without language variants. The new version will return a longer list including all the supported variants for each language.

Before:

[
  { "source": "en", "target": "de" },
  ...
]

After:

[
  { "source": "en-GB", "target": "de-DE" },
  { "source": "en-GB", "target": "de-CH" },
  { "source": "en-US", "target": "de-DE" },
  { "source": "en-US", "target": "de-CH" },
  ...
]

The full list of affected language codes and the supported variants:

Old code	New BCP 47 Codes
Arabic (ar)	ar-AE, ar-BH, ar-DZ, ar-EG, ar-JO, ar-KW, ar-LB, ar-LY, ar-MA, ar-OM, ar-QA, ar-SA, ar-SD, ar-SY, ar-TN, ar-YE
German (de)	de-AT, de-CH, de-DE, de-LI, de-LU
Greek (el)	el, el-CY
English (en)	en-029, en-AU, en-BZ, en-CA, en-GB, en-GY, en-HK, en-IE, en-IN, en-JM, en-NZ, en-TT, en-US, en-ZA
Spanish (es)	es-419, es-AR, es-BO, es-CL, es-CO, es-CR, es-CU, es-DO, es-EC, es-ES, es-GT, es-HN, es-MX, es-NI, es-PA, es-PE, es-PR, es-PY, es-SV, es-US, es-UY, es-VE
French (fr)	fr-BE, fr-CA, fr-CH, fr-FR, fr-LU
Italian (it)	it-CH, it-IT
Japanese (ja)	ja, ja-Kana
Dutch (nl)	nl-BE, nl-CW, nl-NL, nl-SR
Portuguese (pt)	pt-AO, pt-PT
Swedish (sv)	sv-FI, sv-SE
Chinese Simplified (zh-Hans)	zh-Hans-CN, zh-Hans-SG, zh-yue-Hans
Chinese Traditional (zh-Hant)	zh-Hant-HK, zh-Hant-TW

The following language codes are not affected:

Bulgarian (bg)
Czech (cs)
Danish (da)
Estonian (et)
Finnish (fi)
Croatian (hr)
Hungarian (hu)
Indonesian (id)
Korean (ko)
Lithuanian (lt)
Latvian (lv)
Norwegian Bokmål (nb)
Polish (pl)
Portuguese Brazil (pt-BR)
Romanian (ro)
Russian (ru)
Slovak (sk)
Slovenian (sl)
Serbian Cyrillic (sr-Cyrl)
Serbian Latin (sr-Latn)
Thai (th)
Turkish (tr)
Ukrainian (uk)
Vietnamese (vi)

Separate endpoints for text and document translation languages

We now provide two separate endpoints for retrieving supported language pairs:

/v2/languages/text-translation - Returns language pairs supported for text translation
/v2/languages/document-translation - Returns language pairs supported for document translation

This separation is necessary because text translation and document translation may support different language combinations. Document translation relies on additional systems to offer its features, which can restrict the language variants it supports.

The following examples contrast the single v1 languages endpoint with the two new v2 endpoints.

Before (v1 - Single languages endpoint):

curl -H "Authorization: <your-api-key>" \
  https://mt.api.languagewire.com/v1/languages

Response: Single list used for both text and document translation

[
  {"source": "en", "target": "de"},
  {"source": "en", "target": "fr"},
  ...
]

After (v2 - Separate endpoints):

Get language pairs for text translation

curl -H "Authorization: Bearer <access-token>" \
  https://mt.api.languagewire.com/v2/languages/text-translation

Response: Language pairs specifically supported for text translation

[
  {"source": "en-GB", "target": "de-DE"},
  {"source": "en-US", "target": "fr-FR"},
  {"source": "da", "target": "en-GB"},
  ...
]

Get language pairs for document translation

curl -H "Authorization: Bearer <access-token>" \
  https://mt.api.languagewire.com/v2/languages/document-translation

Response: Language pairs specifically supported for document translation (may be a subset of text translation pairs)

[
  {"source": "en-GB", "target": "de-DE"},
  {"source": "en-US", "target": "fr-FR"},
  ...
]

Always check the appropriate endpoint before starting a translation to ensure your desired language pair is supported for your specific use case.
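
Such a check can be sketched as a lookup over the returned pairs. The helper name is hypothetical, and the sample list stands in for a response from one of the endpoints above.

```python
def is_pair_supported(pairs: list, source: str, target: str) -> bool:
    """Return True if the exact (source, target) pair appears in the list."""
    supported = {(pair["source"], pair["target"]) for pair in pairs}
    return (source, target) in supported


# Sample data shaped like the responses shown above.
text_pairs = [
    {"source": "en-GB", "target": "de-DE"},
    {"source": "en-US", "target": "fr-FR"},
    {"source": "da", "target": "en-GB"},
]

print(is_pair_supported(text_pairs, "en-GB", "de-DE"))  # True
print(is_pair_supported(text_pairs, "en", "de"))        # False
```

The comparison is an exact string match, so a bare v1 code such as "en" does not match the variant "en-GB".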

Text translation endpoint

We removed support for sending requests in the format { "source": "...", "target": "...", "text": "Hello from LanguageWire!" }. Instead, use the following schema: { "source": "...", "target": "...", "segments": [{ "text": "Hello from LanguageWire!" }] }. Notice that the text field has been replaced by segments, which is a list of objects instead of a single string.

This change allows you to send multiple pieces of text (segments) in a single API call. Consequently, the response format has also been updated to return a list of translations, with each item in the list corresponding to a segment from your request.

Each segment can contain at most 10,000 characters. A maximum of 100 segments is allowed to be translated at once. The cumulative length of the text in all segments must not exceed 10,000 characters. A segment ideally contains a single sentence, but you can send multiple sentences as a single segment too.
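
When you have more text than fits in one request, you can group segments into batches that respect these limits. The sketch below is one possible approach with hypothetical names; it assumes length is counted with Python's `len`, which may not match exactly how the API counts characters.

```python
MAX_SEGMENTS_PER_REQUEST = 100
MAX_CHARS_PER_REQUEST = 10_000


def batch_segments(texts: list) -> list:
    """Group texts into request-sized batches of {"text": ...} objects."""
    batches = []
    current = []
    current_chars = 0
    for text in texts:
        if len(text) > MAX_CHARS_PER_REQUEST:
            raise ValueError("A single segment exceeds 10,000 characters")
        # Start a new batch when adding this text would break either limit.
        if current and (
            len(current) == MAX_SEGMENTS_PER_REQUEST
            or current_chars + len(text) > MAX_CHARS_PER_REQUEST
        ):
            batches.append(current)
            current, current_chars = [], 0
        current.append({"text": text})
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as the segments value of one translation request.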

Also, the language codes provided as parameters must match codes from the languages endpoint, i.e., include the language variant. If you previously sent requests with source set to "en" and target set to "de", you now have to pick a variant (as per the table above), e.g., "en-GB" and "de-DE".

Here is a complete example of the change using curl:

Before (v1 - single text field):

curl -X POST "https://mt.api.languagewire.com/v1/translate" \
  -H "Authorization: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "en",
    "target": "de",
    "text": "Hello from LanguageWire!"
  }'

Response:

{
  "translation": "Hallo von LanguageWire!"
}

After (v2 - segments list):

curl -X POST "https://mt.api.languagewire.com/v2/translate" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "en-GB",
    "target": "de-DE",
    "segments": [
      {"text": "Hello from LanguageWire!"},
      {"text": "This is a second segment."}
    ]
  }'

Response:

{
  "translations": [
    {
      "translation": "Hallo von LanguageWire!"
    },
    {
      "translation": "Dies ist ein zweites Segment."
    }
  ]
}

Finally, language detection is moving to a new endpoint. See "Language detection endpoint" for more details.

Document translation endpoints

The document translation workflow has been updated to align with the new BCP 47 language codes and to provide more flexibility in how you retrieve translation results. Here are the key changes:

BCP 47 Language Codes: You must now use BCP 47 language codes (e.g., en-GB) for the source and target parameters.
Optional Callbacks: Callback URLs are no longer mandatory. You can use the new status endpoint to poll for the result. The maximum length for callback URLs has also been reduced from 65,536 to 256 characters.
Polling for Status: A new endpoint, GET /v2/documents/{jobId}, is available to check the status of a translation job.
New Download URL: The endpoint for downloading the translated document has been moved to GET /v2/documents/{jobId}/download (instead of GET /v1/documents/download/{jobId}).
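
Because callbacks are now optional and the URL length limit is lower, it can help to build the form fields in one place. The following sketch uses hypothetical helper and constant names; the field names successCallbackUrl and errorCallbackUrl are the ones used elsewhere in this guide.

```python
from typing import Optional

MAX_CALLBACK_URL_LENGTH = 256  # v2 limit stated above


def build_form_fields(
    source: str,
    target: str,
    success_callback_url: Optional[str] = None,
    error_callback_url: Optional[str] = None,
) -> dict:
    """Build the non-file form fields, omitting callbacks that are not set."""
    fields = {"source": source, "target": target}
    callbacks = {
        "successCallbackUrl": success_callback_url,
        "errorCallbackUrl": error_callback_url,
    }
    for name, url in callbacks.items():
        if url is None:
            continue  # Callbacks are optional in v2.
        if len(url) > MAX_CALLBACK_URL_LENGTH:
            raise ValueError(f"{name} exceeds {MAX_CALLBACK_URL_LENGTH} characters")
        fields[name] = url
    return fields
```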

Here is an example of how to submit a document for translation:

Before (v1):

curl -X POST "https://mt.api.languagewire.com/v1/documents/translate" \
  -H "Authorization: <your-api-key>" \
  -H "Content-Type: multipart/form-data" \
  -F "source=en" \
  -F "target=de" \
  -F "file=@/path/to/your/document.docx" \
  -F "successCallbackUrl=https://example.com/success" \
  -F "errorCallbackUrl=https://example.com/error"

After (v2):

In v2, you must use BCP 47 language codes (e.g., en-GB instead of en). Callback URLs are now optional.

curl -X POST "https://mt.api.languagewire.com/v2/documents/translate" \
  -H "Authorization: Bearer <your-access-token>" \
  -H "Content-Type: multipart/form-data" \
  -F "source=en-GB" \
  -F "target=de-DE" \
  -F "file=@/path/to/your/document.docx"

Retrieving the translated document

Once you have submitted a document for translation, you will receive a jobId. You can use this ID to retrieve your translated document.

There are two ways to do this:

1. Using callbacks (Recommended)

The most efficient method is to provide a successCallbackUrl and/or an errorCallbackUrl when you submit the document. Our system will send a POST request to the appropriate URL when the job is complete, so you don't need to poll for status.

2. Using polling

If you cannot use callbacks, you can periodically check the translation status by polling the GET /v2/documents/{jobId} endpoint. Please note that this method will consume your rate limit quota. We recommend a polling interval of at least 30 seconds.

The status of the job will be one of the following:

CREATED: The translation job is in the queue.
IN_PROGRESS: The translation is being processed.
SUCCESS: The translation was successful.
FAILED: The translation failed.

Here is a Python example of how to poll for the result:

import requests
import time

# Assume 'access_token' is already obtained and valid.
headers = {"Authorization": f"Bearer {access_token}"}

# Step 1: Submit the document for translation (as shown in the curl example)
# and get the jobId from the response.
# For this example, let's assume we got a jobId.
job_id = "<your-job-id>"

while True:
    # Step 2: Poll the status endpoint
    status_response = requests.get(
        f"https://mt.api.languagewire.com/v2/documents/{job_id}",
        headers=headers
    )
    status_response.raise_for_status()
    status_data = status_response.json()
    status = status_data["status"]

    print(f"Job {job_id} status: {status}")

    if status == "SUCCESS":
        # Step 3: Download the translated document
        download_response = requests.get(
            f"https://mt.api.languagewire.com/v2/documents/{job_id}/download",
            headers=headers,
            stream=True
        )
        download_response.raise_for_status()

        # Save the translated file
        with open("translated_document.docx", "wb") as f:
            for chunk in download_response.iter_content(chunk_size=8192):
                f.write(chunk)

        print("Translated document downloaded successfully.")
        break
    elif status == "FAILED":
        print(f"Translation failed. Reason: {status_data['reason']}")
        break
    elif status in ("CREATED", "IN_PROGRESS"):
        # Wait before polling again
        time.sleep(30)
    else:
        print(f"Unknown status: {status}")
        break

Language detection endpoint

This new endpoint replaces the language detection functionality that was previously part of the text translation endpoint.

For example, sending the following data {"text": "Hello from LanguageWire!"} will request a language detection of the text "Hello from LanguageWire!". This request would return a response similar to {"code": "en-GB"}.

When the provided text contains multiple languages, the endpoint will identify and return the predominant one.

The returned language code will always be a specific BCP 47 variant. For languages with multiple supported variants, the API will return a default one (e.g., en-GB for English, de-DE for German). If this default variant does not match your needs (e.g., you require en-US), you will need to map it before using it in a translation request.

The following list shows the default variants for languages with multiple options:

Arabic (ar): ar-EG
German (de): de-DE
Greek (el): el
English (en): en-GB
Spanish (es): es-ES
French (fr): fr-FR
Italian (it): it-IT
Japanese (ja): ja
Dutch (nl): nl-NL
Portuguese (pt): pt-PT
Swedish (sv): sv-SE
Chinese Simplified (zh-Hans): zh-Hans-CN
Chinese Traditional (zh-Hant): zh-Hant-TW
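
The mapping step described above can be sketched as a lookup from the detected default variant to the variant you prefer. The function name and the preference table are assumptions for illustration; fill the table with the variants your integration needs.

```python
from typing import Optional

# Hypothetical preferences: detected default variant -> variant you want to use.
PREFERRED_VARIANTS = {
    "en-GB": "en-US",
    "es-ES": "es-MX",
}


def preferred_source(detected_code: Optional[str]) -> Optional[str]:
    """Map a detected code to the preferred variant; pass other codes through."""
    if detected_code is None:
        return None  # A null code means no supported language was detected.
    return PREFERRED_VARIANTS.get(detected_code, detected_code)
```

Codes without an entry in the table, such as de-DE here, are returned unchanged.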

If the code field in the response is null, it means that no supported language could be detected. In some cases, a language might be detected but is not supported for translation. You should always validate the detected language code against the list of supported languages for your use case.

You can find the new endpoint by following this link.

Before (v1 - Language detection via text translation endpoint):

import requests

# Language detection was done through the translate endpoint with source=null
response = requests.post(
    "https://mt.api.languagewire.com/v1/translate",
    headers={"Authorization": "<your-api-key>"},
    json={
        "source": None,  # or omit entirely
        "target": "de",
        "segments": [
            "Hello from LanguageWire!",
            "This is a second segment."
        ]
    }
)
response.raise_for_status()

# Response included both translation and detected language
# [
#   {
#     "translation": "Hallo von LanguageWire!",
#     "detected_source_language": "en"
#   },
#   {
#     "translation": "Dies ist ein zweites Segment.",
#     "detected_source_language": "en"
#   }
# ]
detected_language = response.json()[0]["detected_source_language"]

After (v2 - Dedicated language detection endpoint):

import requests

# Assume 'access_token' is already obtained and valid.
headers = {"Authorization": f"Bearer {access_token}"}

# Step 1: Use dedicated language detection endpoint
response = requests.post(
    "https://mt.api.languagewire.com/v2/languages/detect",
    headers=headers,
    json={"text": "Hello from LanguageWire! This is a second segment."}
)
response.raise_for_status()

# Response contains only the detected language code
# {"code": "en-GB"}
detected_language = response.json()["code"]

# Step 2: Use the detected language for translation if needed
if detected_language:
    translation_response = requests.post(
        "https://mt.api.languagewire.com/v2/translate",
        headers=headers,
        json={
            "source": detected_language,
            "target": "de-DE",
            "segments": [
                {"text": "Hello from LanguageWire!"},
                {"text": "This is a second segment."}
            ]
        }
    )
