Internationalization and Localization with Django (-model-translations) @Instawork

Published in Instawork Engineering · 5 min read · Jul 12, 2023

At Instawork, we want to create economic opportunity for all workers, not just those who speak English. To do so, language must not be a barrier; localizing the app to support the 14% of the United States population that speaks Spanish is a key first step. However, Spanish is but one of many languages spoken across the globe; to truly achieve our objective, we need not one-off efforts but a scalable process to localize the app for many regions and languages.

Here, we’ll go over our internationalization journey, from our first attempt with Django’s built-in internationalization (i18n) and localization (l10n) tools to how we overcame their limitations to get comprehensive i18n coverage across the app.

Django’s native i18n features got us through the first half: strings in the codebase, whether in Django templates or Python files, can be marked for translation. In templates, the {% trans %} tag marks the string passed to it for translation, while in Python, gettext does the same for the string it is given.

# demo.html
{% load i18n %}
...
{% trans "I need to be translated" %}

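# demo.py
from django.utils.translation import gettext
from django.views import View

class SomeView(View):
    def get_context(self, request):
        # gettext marks the string for extraction and translates it at runtime
        return {
            "message": gettext("I also need to be translated")
        }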

Django’s built-in makemessages command then collates them into a human-readable GNU gettext .po file for each language.

# /en/django.po
...
msgctxt "Go to admin home page"
msgid "Home"
msgstr ""

# /es/django.po
...
msgctxt "Go to admin home page"
msgid "Home"
msgstr "Inicio"
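
To make the role of the empty msgstr concrete, here is a minimal sketch of how a gettext-style lookup behaves. The CATALOGS dictionary and the translate helper are hypothetical stand-ins for the compiled catalogs, not Django's actual implementation: an entry is keyed by its context and msgid, and a missing or empty translation falls back to the source string.

```python
# Hypothetical in-memory stand-in for compiled catalogs, keyed by language,
# then by (msgctxt, msgid). This is not Django's real data structure.
CATALOGS = {
    "en": {("Go to admin home page", "Home"): ""},
    "es": {("Go to admin home page", "Home"): "Inicio"},
}


def translate(language, context, msgid):
    """Return the translation, falling back to msgid when none exists."""
    catalog = CATALOGS.get(language, {})
    # An empty msgstr counts as "not translated", as in gettext.
    return catalog.get((context, msgid)) or msgid


print(translate("es", "Go to admin home page", "Home"))
print(translate("en", "Go to admin home page", "Home"))
print(translate("fr", "Go to admin home page", "Home"))
```

This is why the English file can leave msgstr blank: the source string itself is served whenever no translation is available.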

Knowing what we need to translate is half the problem; actually translating the many strings is something else entirely. For this, we turned to Locize, a localization management platform that provides us with access to translation-as-a-service. A daily task on CI performs a bidirectional sync with Locize, uploading newly found strings for translation while updating the .po files with fresh translations.

Translations were originally done with contracted translators, though we’ve since migrated to a combination of machine translation using DeepL and internal proofreading by native speakers.

The resultant .po file is then automatically checked into Git, letting us version control the translation state of the app. All that’s left is to compile the translations into the binary .mo format with Django’s compilemessages command. As a bonus, thanks to Hyperview, our mobile app and web app share a backend, allowing our translation efforts to be shared across both.

Problem: Untranslated strings in the DB

At this point, we can mark strings for translation, get them translated, and have it all work on both web and mobile. All good, right? Unfortunately not. Django’s included l10n/i18n solution is static, translating only designated strings in Django templates and Python files. Strings within the database were left untranslated, resulting in an incongruent UI, with translated (static) strings presented alongside untranslated (dynamic) strings.

Solution

To handle these untranslated DB strings, we used the Django Modeltranslation library as a framework for Model field localization. We then integrated it with Locize to obtain and synchronize translations.

Django Modeltranslation

Setting up django-modeltranslation is straightforward - by dropping a configured translation.py file into an app’s root directory, models can be registered for translation. For example, the following configuration registers SomeModel's label and body fields for translation, and a follow-up migration modifies the database, creating a corresponding _$LANG column for each specified field and supported language. Field access will then be locale-sensitive, preferring the translated field, if present.

# translation.py
from modeltranslation.translator import TranslationOptions, register

from .models import SomeModel


@register(SomeModel)
class SomeModelTranslationOptions(TranslationOptions):
    fields = ("label", "body")
    required_languages = ["en"]
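
As a rough illustration of what "locale-sensitive, preferring the translated field" means, the sketch below mimics the lookup in plain Python. SomeModelRow, label_for and DEFAULT_LANGUAGE are hypothetical names made up for this example; the library itself handles this transparently on the model field, and its real fallback behavior depends on how it is configured.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LANGUAGE = "en"  # assumption: English is the source language


@dataclass
class SomeModelRow:
    """Hypothetical plain-Python stand-in for a row of SomeModel."""

    label_en: str
    label_es: Optional[str] = None

    def label_for(self, language: str) -> str:
        # Prefer the column for the active language; fall back to the
        # default-language column when the translation is missing or empty.
        translated = getattr(self, f"label_{language}", None)
        return translated or getattr(self, f"label_{DEFAULT_LANGUAGE}")


translated = SomeModelRow(label_en="Home", label_es="Inicio")
untranslated = SomeModelRow(label_en="Settings")

print(translated.label_for("es"))
print(untranslated.label_for("es"))
```

The second row shows the case that motivated the rest of this post: until a translation is written to the _es column, Spanish users still see the English value.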

Custom Locize Integration

Unfortunately, while django-modeltranslation handles the app localization, it currently does not come with tooling to help get the translations necessary for localization. To get translations into the app, we automated the following process:

1. Export every string to be translated
2. Upload the translation candidates to Locize
3. Synchronize the data back to the DB and update the corresponding columns with the new translations

We decided to reuse the .po file format to keep things consistent with the default Django flow and to allow reuse of the existing Locize integrations. We thus built a custom Django makedbmessages.py command to extract translation strings from the DB and dump them to a .po file. Strings were associated with their source models using the following key format: f"{app}_{model}_{field}_{pk}". Deduplication was performed on their raw values, with the aggregated model keys stored in the .po file’s #: reference field. Following this, the .po file could then be serialized and sent for translation using the same process described earlier.

# Reference field
#: app.SomeModel.field:1 app.SomeOtherModel.field:2
msgid "Foo"
msgstr "Translated Foo"
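
The deduplication step can be sketched as follows. This is an illustration under assumptions, not our actual makedbmessages command: the rows list stands in for values read from the DB, the build_po_entries and escape helpers are hypothetical, and references are rendered in the style of the example entry above.

```python
from collections import defaultdict

# Hypothetical rows pulled from the DB: (app, model, field, pk, value).
rows = [
    ("app", "SomeModel", "field", 1, "Foo"),
    ("app", "SomeOtherModel", "field", 2, "Foo"),
    ("app", "SomeModel", "field", 3, "Bar"),
]


def escape(text):
    """Escape backslashes, quotes and newlines for a .po string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_po_entries(rows):
    """Group rows by raw value and render one .po entry per distinct string."""
    references = defaultdict(list)
    for app, model, field, pk, value in rows:
        if value:  # skip empty strings; there is nothing to translate
            references[value].append(f"{app}.{model}.{field}:{pk}")

    entries = []
    for value, refs in references.items():
        entries.append(
            f"#: {' '.join(refs)}\n"
            f'msgid "{escape(value)}"\n'
            'msgstr ""\n'
        )
    return "\n".join(entries)


print(build_po_entries(rows))
```

Because "Foo" appears in two models, it is emitted once with both references, so it only needs to be translated once.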

Completed translations were later retrieved as a simple key-value JSON mapping. Keys were the raw English text, and values the translations for the respective language. To avoid losing existing translations to invalid/null entries, we performed an append-only update of the .po files instead of a full sync. Here, msgstr fields were updated only if the corresponding msgid was present in the existing file.
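
A minimal sketch of such an append-only merge is shown below. It assumes the existing .po file has already been parsed into a dictionary of msgid to msgstr, and merge_translations is a hypothetical helper; skipping null or blank incoming values is our reading of "invalid/null entries" rather than a documented rule.

```python
import json


def merge_translations(existing, incoming_json):
    """Append-only merge: return a copy of `existing` with valid updates applied."""
    incoming = json.loads(incoming_json)
    merged = dict(existing)
    for msgid, msgstr in incoming.items():
        if msgid not in merged:
            continue  # unknown key: not present in the existing .po file
        if not isinstance(msgstr, str) or not msgstr.strip():
            continue  # null or blank value: keep the existing translation
        merged[msgid] = msgstr
    return merged


existing = {"Foo": "Foo traducido", "Bar": "", "Baz": "Baz traducido"}
incoming = '{"Foo": null, "Bar": "Bar traducido", "Qux": "Qux traducido"}'
print(merge_translations(existing, incoming))
```

In this example the null value for "Foo" does not erase its translation, "Bar" is filled in, "Baz" is untouched, and "Qux" is ignored because it has no msgid in the existing file.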

Translated .po file in hand, we then persisted the change to DB using a custom compiledbmessages admin command. App by app, we retrieved the pool of models requiring translation. To minimize DB access, translations were first aggregated at a model level, and a bulk_update performed to replace the translations.
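
The aggregation step can be sketched in plain Python as follows. The group_updates helper and the entries list are hypothetical, references are assumed to follow the app.Model.field:pk style with integer primary keys, and the actual write is only indicated in a comment since it depends on the Django models involved.

```python
from collections import defaultdict


def group_updates(entries, language):
    """Map (app, model) -> {pk: {translated_field: text}} for one language."""
    updates = defaultdict(lambda: defaultdict(dict))
    for references, msgstr in entries:
        if not msgstr:
            continue  # nothing to persist for untranslated strings
        for reference in references.split():
            path, pk = reference.rsplit(":", 1)
            app, model, field = path.split(".")
            # Assumption: primary keys are integers.
            updates[(app, model)][int(pk)][f"{field}_{language}"] = msgstr
    return updates


# Hypothetical parsed .po entries: (content of the #: line, msgstr).
entries = [
    ("app.SomeModel.label:1 app.SomeOtherModel.label:2", "Translated Foo"),
    ("app.SomeModel.body:1", "Translated Bar"),
    ("app.SomeModel.label:3", ""),
]

for (app, model), rows in group_updates(entries, "es").items():
    # In Django, each group could become a single bulk_update call, e.g.
    # Model.objects.bulk_update(objs, fields), after setting these attributes.
    print(app, model, dict(rows))
```

Grouping first means each model is written once per run, regardless of how many strings it contributed to the .po file.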

The above was all orchestrated as a nightly ECS Task, ensuring that translations would always be updated with minimal further effort on our end. If anything went wrong, automated alerts on Slack would then point a dev at the problem till it went away.

Conclusion

At the time of writing, these changes have almost doubled the size of our translation corpus, adding almost 28,000 individual words across ~3,000 entries. This expansion of our l10n/i18n solution represents a sizable improvement in the user experience and reach of the Instawork application. As adding further languages now has minimal marginal development cost, we expect this to pay increasing dividends as we continue to grow.