Mike Edwards - Technical Director
14 Dec 2022
On a recent project, we were tasked with delivering a website's Kontent.ai content in 9 languages. English would be our base language and we needed to translate the content into 8 other languages. A few of the languages contain non-ASCII characters, e.g. diacritics and Cyrillic characters, which was an added challenge.
Translation was going to be automated using a script and a third-party service. The script itself should be straightforward. Just use the Kontent.ai management API and execute the following steps:
However, things are rarely this simple and we quickly ran into a few things that we needed to account for:
This allowed the code to create a deep copy of the elements and break the in-memory references.
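To illustrate why the deep copy matters, here is a minimal Python sketch. It is not the project's script: the element shape and the `fake_translate` helper are hypothetical stand-ins, and no Kontent.ai SDK is involved. The point is only that a deep copy leaves the base-language elements untouched when the copy is modified.

```python
import copy

# Hypothetical shape for the elements of the base-language variant
# (an assumption for illustration, not the Kontent.ai data model).
english_elements = [
    {"codename": "title", "value": "Shopping cart"},
    {"codename": "body", "value": "Your cart is empty"},
]


def fake_translate(text, language):
    """Stand-in for a real translation call; it only tags the text."""
    return f"[{language}] {text}"


def translate_elements(elements, language):
    # deepcopy creates new nested dicts, so edits below cannot leak
    # back into the source elements through shared references.
    translated = copy.deepcopy(elements)
    for element in translated:
        element["value"] = fake_translate(element["value"], language)
    return translated


french_elements = translate_elements(english_elements, "fr-FR")
```

With a shallow copy such as `list(elements)`, both lists would share the same dictionaries, and translating one language would overwrite the base-language values used for the next.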
Since the titles of content were being translated, we wanted the slugs to also be translated to match the titles. We use the slugs to generate the URLs to each page and this would mean we would have language-specific URLs.
We needed the script to generate the slugs and match the auto-generation rules that Kontent.ai uses.
According to Kontent.ai’s documentation, they use these two simple rules:
We wanted to be sure these rules were correct, so we went into Kontent.ai to check the real behaviour. For each test, we entered a title and then let the slug auto-generate.
For English, everything worked as documented: the rules were followed.
What happens if we try diacritic or Cyrillic characters?
Okay, this is interesting, and not really what we were expecting. The diacritic and Cyrillic characters have been removed, which wasn’t mentioned in the documentation.
After a quick conversation with Kontent.ai support, it turned out that slugs can only contain ASCII characters. This is a little disappointing, since browsers can now handle non-ASCII (Unicode) characters in URLs (this is both good and bad, think phishing scams).
So back to our translation scripts and time to update them to solve this problem.
We found a couple of great libraries online to handle this problem for us:
A quick function and now we have a clean slug:
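As an illustration only, a function of this kind can be sketched in Python with just the standard library. The sketch assumes slugs should be lowercase and hyphen-separated, and the small Cyrillic table is a hypothetical stand-in for a proper transliteration library.

```python
import re
import unicodedata

# Small illustrative Russian Cyrillic-to-Latin table (an assumption;
# a real project would rely on a full transliteration library).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "", "ы": "y", "ь": "",
    "э": "e", "ю": "yu", "я": "ya",
}


def to_ascii_slug(title):
    """Return a lowercase, hyphen-separated, ASCII-only slug."""
    lowered = title.lower()
    # Transliterate Cyrillic first, before accents are stripped.
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)
    # NFKD splits "é" into "e" plus a combining accent; encoding to
    # ASCII with errors ignored then drops the accent.
    decomposed = unicodedata.normalize("NFKD", latin)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")
```

For example, `to_ascii_slug("Crème brûlée")` gives `creme-brulee`. Characters that have no Unicode decomposition, such as "ø", are simply dropped by this approach, which is one reason a dedicated library is the better choice in practice.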
Initially, we planned to use Google Translate to perform the translations. This made the most sense because the third-party tool we had used previously relied on it.
A bit of experimentation with Google Translate gave us some very weird results, and the API gave us different results from the Google Translate website.
A simple example was the word “Cart” as in shopping cart. Google would translate this to “Chariot” in French rather than “Panier”.
We decided to switch to Azure Cognitive Services, which includes a translation service, and were pleasantly surprised by how easy it was to set up and by the quality of the translations.
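To give a feel for the setup, here is a minimal Python sketch of a call to the Translator service, based on its publicly documented v3 REST endpoint. It is not the script from the project: the function names are hypothetical, and the key and region are placeholders you would take from your own Azure resource.

```python
import json
import urllib.parse
import urllib.request

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, source, targets, key, region):
    """Build a POST request that translates texts into every target language."""
    query = urllib.parse.urlencode(
        [("api-version", "3.0"), ("from", source)]
        + [("to", target) for target in targets]
    )
    # The service expects a JSON array of objects with a "Text" property.
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder credential
        "Ocp-Apim-Subscription-Region": region,  # placeholder region
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{TRANSLATOR_ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


def parse_translations(response_body):
    """Return one {language: text} dict per input text."""
    payload = json.loads(response_body)
    return [
        {t["to"]: t["text"] for t in item["translations"]}
        for item in payload
    ]


def translate(texts, source, targets, key, region):
    """Send the request; needs network access and a valid key."""
    request = build_translate_request(texts, source, targets, key, region)
    with urllib.request.urlopen(request) as response:
        return parse_translations(response.read())
```

Because one request can carry several `to` parameters, a call such as `translate(["Cart"], "en", ["fr", "de"], key, region)` would return the French and German versions of each text together.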
Automation of translation is a quick way to translate a website and translation services are getting a lot better. The Kontent.ai management API made reading and writing data very simple, with the only caveat being the rate limit.
The translation services themselves are good but not always 100% accurate, and some translations may not make sense. You will have to decide if the small inaccuracies are worth the time and money saved by not using human translators.
If you are interested in automated translation of your website, and you have thousands of pages that need to be processed, please reach out to us at Konabos.
With 18 years of IT development experience, Mike has worked across government, not for profit, and commercial sectors. He has delivered large-scale multinational websites, desktop and mobile applications, and mission-critical health apps. He works closely with the client and delivery teams to ensure that projects deliver business benefits and not just a technical solution.
Mike is a nine-time Sitecore MVP and is the founder of the very popular Glass.Mapper.Sc ORM, which has over 1 million downloads.
Outside of work, Mike can be found exploring the British countryside, riding his motorbike, and learning the piano.