Internationalizing Strava

Hi Hi, my name is Julien and I’m an engineering manager at Strava, where I’ve been working for a little over 2 years. I was previously working at Turn, an ad tech company, and before that at Google for about 6 years. At Google, I was part of the team in charge of building the localization tooling and then moved on to work on APIs. At Strava, I’ve been working on internationalization and more recently the revamp of our e-commerce platform.
Strava In case you are not familiar with Strava, we’re a digital community of athletes, focused on runners and cyclists. We have a very confidential number of registered and active users, most of whom live outside of the U.S.
Strava's technical stack In terms of software stack, Strava is relatively typical of companies that were created back in 2009, with Rails as the go-to web frontend. We’ve progressively implemented a lot of our logic in JavaScript, to the point where it represents a sizable chunk of our codebase. Over the past two years, we’ve also started breaking down our Rails app into standalone services that are written in Scala or Go. And we also have Android and iOS apps that are using the vanilla frameworks and tooling on each platform.
  In my first year at Strava, I was tasked with tackling the internationalization and localization of the entire product; this deck describes the process we followed and the challenges we faced along the way.
Defining Goals Before anything else, you should define what you are after by internationalizing your product. It’s easy to have a gold rush mentality when it comes to international markets but it’s really important to know what you’re after: users, revenue – ideally both but that’s not necessarily obvious. Ask yourself whether the product will need to change and why.
  Which parts of the product constitute its identity? Is that identity suited for non-domestic audiences? It’s all about knowing where you’re going, learning what might work or not in terms of features, styles, names, etc. Being first to market can be incredibly important, but you also run the risk of being incredibly irrelevant. More importantly, conducting an engineering process in the absence of a concrete frame of reference is futile and hardly satisfying, at least to me.
Content Inventory The first thing you want to do in terms of process is to build an inventory of your content – I’m not just talking about strings here, but also images, styles, color palettes, etc. Who creates it, where it is stored, and whether it is structured are the three basic things you want to know. In Strava’s case, we have UI messages stored in three different file formats (YAML, XML, Apple Strings), static help content, FAQs and helpdesks maintained by our support team, emails and, for lack of a better word, curated content, e.g. Strava challenges or training videos, which are maintained in a database but controlled by the product and marketing team.
  The principal outcome of this process is a glossary: you want to identify the set of terms that are specific to the domain or industry in which you operate. Translating that will be the next step. The secondary outcome is commenting UI strings. Modern UIs can be remarkably terse and it proves challenging for translators to make sense of an isolated label on a button in the absence of context. Adding comments to UI messages is good practice and very helpful to translators – it will pay for itself in the end.
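To make the commenting habit enforceable, a small check can flag UI messages that have no translator comment. The following is a minimal sketch for Apple Strings files only, not Strava's actual tooling; the function name, the sample keys, and the rule that a comment must directly precede its entry are all assumptions made for illustration.

```python
import re

# An optional /* comment */ followed by a "key" = "value"; entry.
ENTRY = re.compile(
    r'(?:/\*(?P<comment>.*?)\*/\s*)?'
    r'"(?P<key>(?:[^"\\]|\\.)*)"\s*=\s*"(?P<value>(?:[^"\\]|\\.)*)"\s*;',
    re.DOTALL,
)


def uncommented_keys(strings_source: str) -> list:
    """Return the keys of entries that carry no translator comment."""
    missing = []
    for match in ENTRY.finditer(strings_source):
        comment = (match.group("comment") or "").strip()
        if not comment:
            missing.append(match.group("key"))
    return missing


# Hypothetical sample content, not taken from a real app.
SAMPLE = '''
/* Button that saves an edited activity title */
"activity.save" = "Save";

"activity.kudos" = "Kudos";
'''

print(uncommented_keys(SAMPLE))  # ['activity.kudos']
```

A check like this could run in continuous integration so that new strings without context are caught before they reach translators.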
Content pipeline The next thing on the map is the content pipeline. That is a purely engineering job where the goal is to get the content previously identified flowing to the translators and then back to wherever it should be stored. In Strava’s case, the last thing I was interested in doing was developing or deploying an internal solution, so we went fishing for a service that could be our translation memory and workbench. We settled on Transifex, mainly because it was a managed service and provided an easy, engineer-friendly integration path.
Mobile content flow Our mobile apps ship every few months – Android and iOS are on the same cadence and on feature parity. Most of the process here is still driven by manual actions: whoever is in charge of the release simply executes a script that pushes the content to Transifex. On a regular basis (typically multiple times a day), a machine pulls the entirety of our translated content from Transifex into a dedicated git repository. When a release build is cut, the content is copied from that repository. The upside of the system is that we have a versioned history of our translated content. It’s hardly fine-grained but it can come in handy to identify the source or the occurrences of a translated term.
Web content flow When I started at Strava, we were releasing our web frontend 4 times a week, once a day Monday through Thursday. Since then we’ve extended the schedule to include Friday and doubled the number of daily releases. The end goal is to release as often as possible, with a minimal amount of overhead for the engineering team. We push the source content to Transifex on a daily basis, every night at 9pm.
  From there, our contract with our translation company stipulates that they will bring the project to being 100% translated on a daily basis. Because translators are generally distributed around the planet, there are temporary discrepancies between languages but nothing too hard to manage. Translations get pulled back by the same previously mentioned cron job. Upon the creation of a release branch, translations are updated from that repository. All of that is entirely automated.
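The temporary discrepancies between languages are easy to measure once the translated content sits in a repository. Below is a minimal sketch of such a check, assuming each locale's messages have already been loaded into a plain dictionary; the function names and sample data are hypothetical, and nothing here talks to Transifex.

```python
def completeness(source: dict, translated: dict) -> float:
    """Fraction of source keys that have a non-empty translation."""
    if not source:
        return 1.0
    done = sum(1 for key in source if translated.get(key, "").strip())
    return done / len(source)


def incomplete_locales(source: dict, translations_by_locale: dict) -> dict:
    """Map each locale that is not fully translated to its completion ratio."""
    result = {}
    for locale, strings in translations_by_locale.items():
        ratio = completeness(source, strings)
        if ratio < 1.0:
            result[locale] = ratio
    return result


# Hypothetical data standing in for the content pulled into the repository.
SOURCE = {"save": "Save", "kudos": "Kudos"}
TRANSLATIONS = {
    "fr": {"save": "Enregistrer", "kudos": "Kudos"},
    "de": {"save": "Speichern", "kudos": ""},
}

print(incomplete_locales(SOURCE, TRANSLATIONS))  # {'de': 0.5}
```

Running a report like this when a release branch is created shows which languages would ship with untranslated strings.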
CLDR Tooling-wise, invest time in getting familiar with CLDR. It’s the ultimate resource for doing all sorts of locale-sensitive tasks: formatting numbers, dates, sorting, currencies, etc. CLDR is actually just a set of data files from which code is generated. Official bindings exist for C++ and Java. Apple and Microsoft ship it as part of their native frameworks and, thanks to Cameron’s work at Twitter, Ruby and JavaScript bindings are also available.
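Plural forms are a good illustration of the kind of knowledge CLDR encodes. The sketch below hand-copies simplified plural rules for three languages and only handles non-negative integers; it is meant to show why this data is worth taking from a CLDR binding rather than writing it yourself. The function names and the message table are hypothetical.

```python
def plural_category(locale: str, n: int) -> str:
    """Return the plural category for a non-negative integer n.

    Simplified, integer-only versions of the CLDR plural rules.
    """
    if locale == "ja":
        return "other"  # Japanese has a single plural form
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    raise ValueError("no plural rule for locale " + locale)


# Hypothetical message table keyed by locale, then by plural category.
KILOMETERS = {
    "en": {"one": "{n} kilometer", "other": "{n} kilometers"},
    "ru": {"one": "{n} километр", "few": "{n} километра", "many": "{n} километров"},
    "ja": {"other": "{n} キロメートル"},
}


def format_distance(locale: str, n: int) -> str:
    """Pick the message for the right plural category and fill in n."""
    return KILOMETERS[locale][plural_category(locale, n)].format(n=n)


print(format_distance("en", 1))   # 1 kilometer
print(format_distance("ru", 21))  # 21 километр
print(format_distance("ru", 3))   # 3 километра
print(format_distance("ru", 11))  # 11 километров
```

An English-only `if n == 1` test would get three of these four cases wrong in Russian, which is exactly the class of bug a CLDR-backed library removes.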
Choose boring technology As general advice when it comes to internationalization and technology in the larger sense, I like this article by Dan McKinley. It has nothing to do with internationalization in particular but sets a very relevant guideline: rolling your own routines or trying to go off the beaten path has a high chance of failure because researching i18n problems requires time and thoroughness, and you very likely don’t have that. As an example, it wouldn’t occur to any reasonable person to use anything else but Unicode – it’s universal and it works. CLDR is like that, but for the rest.
  Investigate your dependencies – get a sense for whether they are ready for the ride and switch if they aren’t. We make extensive use of jQuery UI and are quite pleased with the level of support it provides in terms of languages. Other libraries express no interest, which is fine but needs to be accounted for.
Iterating releases In terms of timeline, we just couldn’t afford to do a massive language roll-out like big companies do. Back when I was working at Google, products were mandated to launch in 40 languages, which is nuts but is what Google does. We aimed a little lower and decided to launch our first language on our mobile apps first. Because they inherit from their desktop brethren, native mobile platforms have a level of maturity that the web platform simply doesn’t provide. Also, a majority of our users are on the mobile apps. France was a target market from the very beginning and I’m French so I could QA the app myself.
  The thing to see here is that despite internationalization being a serious thing, you can allow yourself to be scrappy as long as it isn’t observable by your users. Iterating allowed us to ship one language after the absolute minimum amount of time required. Launching the second language took less time, and so did the third, until it asymptotically converged to the time required for the translation and QA. We only started supporting international payments much later, too.
Pseudolocalization One of my favorite tools to use is pseudolocalization because it’s a visual representation of your progress: set a different default locale, make your UI strings go through a variety of filters and observe your product misbehaving. It exposes antipatterns such as string concatenation, failing layouts, and strings that are simply hardcoded – it’s pretty gratifying to muscle through one issue at a time and have something to show for it.
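For readers who have never seen one, here is a minimal sketch of a pseudolocalization filter. It is not the Google tool or Cub; the accent table, the placeholder syntaxes it protects, and the 30% expansion figure are all assumptions chosen for illustration.

```python
import re

# A small accent table; unmapped characters are left unchanged.
ACCENTS = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "c": "ç", "n": "ñ", "y": "ý",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
    "C": "Ç", "N": "Ñ", "Y": "Ý",
})

# Placeholders that must survive untouched: %{name}, {0}, %s, %d, %1$s, %@.
PLACEHOLDER = re.compile(r"%\{\w+\}|\{\w*\}|%(?:\d+\$)?[sd@]")


def pseudolocalize(message: str, expansion_percent: int = 30) -> str:
    """Accent the text, pad it to simulate longer translations, and bracket it."""
    parts = []
    last = 0
    for match in PLACEHOLDER.finditer(message):
        parts.append(message[last:match.start()].translate(ACCENTS))
        parts.append(match.group(0))  # keep the placeholder as-is
        last = match.end()
    parts.append(message[last:].translate(ACCENTS))
    # Round the padding length up so short labels still grow.
    padding = "~" * ((len(message) * expansion_percent + 99) // 100)
    return "[" + "".join(parts) + padding + "]"


print(pseudolocalize("Hello %{name}"))  # [Hélló %{name}~~~~]
```

Each filter targets one failure mode: accents reveal hardcoded strings and encoding problems, padding stresses layouts, and the brackets make truncation and concatenation visible because a concatenated sentence shows up as several bracketed fragments.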
Cub To that effect, I’d like to advertise Cub, a tool based on Google’s pseudolocalization tool, which was originally developed by my team and later open-sourced. Cub is a Maven-ized version of that tool that adds support for common file formats such as Android XML, Mac Strings, and YAML.
Success looks like… So what defines success? First of all, your domestic users probably shouldn’t have seen a thing: to them, the product has remained the same even though you swapped the wheels on a moving train. Extending the railroad metaphor, the train (your product) should not have slowed down – features should have kept on shipping, modulo the engineering resources you put into it. On occasion, we’ve actually had Spanish users complaining that the product they were used to was all of a sudden not in English anymore – be mindful that you will be switching defaults for your users and implementing a way to override that is probably a good idea.
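One way to picture the override is a resolution order in which an explicit user choice always wins over what the browser announces. The sketch below assumes a web request carrying an `Accept-Language` header and a stored per-user preference; the supported locale list and the function names are hypothetical.

```python
SUPPORTED = ("en", "fr", "es", "de")  # hypothetical list of shipped languages
DEFAULT = "en"


def parse_accept_language(header: str) -> list:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep their header order.
    return [tag for _, tag in sorted(weighted, key=lambda pair: -pair[0])]


def resolve_locale(user_override, accept_language) -> str:
    """An explicit user setting wins, then the browser, then the default."""
    if user_override in SUPPORTED:
        return user_override
    for tag in parse_accept_language(accept_language or ""):
        language = tag.split("-")[0].lower()
        if language in SUPPORTED:
            return language
    return DEFAULT


print(resolve_locale(None, "es-ES,es;q=0.9,en;q=0.8"))  # es
print(resolve_locale("en", "es-ES,es;q=0.9,en;q=0.8"))  # en
```

The second call is the case described above: a user whose browser asks for Spanish but who has chosen to keep the product in English.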
  Finally, your users will be happy to be able to use characters outside of the BMP. 🎉
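Characters outside the Basic Multilingual Plane tend to break code or storage that assumes one 16-bit unit, or at most three UTF-8 bytes, per character. The short sketch below shows the sizes involved for the emoji above; the helper name is hypothetical.

```python
party = "\U0001F389"  # PARTY POPPER, U+1F389, outside the BMP

code_points = len(party)                           # 1
utf16_units = len(party.encode("utf-16-le")) // 2  # 2, a surrogate pair
utf8_bytes = len(party.encode("utf-8"))            # 4


def is_outside_bmp(text: str) -> bool:
    """True if any character needs more than 16 bits."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(code_points, utf16_units, utf8_bytes)  # 1 2 4
print(is_outside_bmp("Morning ride " + party))  # True
```

A test fixture containing such a character, pushed through every layer from form input to database and back, is a cheap way to confirm the whole stack handles it.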
Defining internationalization It happens on a regular basis that I try explaining what internationalization is and it’s always something I spend way too much time doing. Let’s start with what internationalization isn’t: a feature. Regardless of how proud you are of that fancy refactoring, you probably wouldn’t include it in your upcoming release notes. The same goes for i18n. Releasing for the German locale is, however, a feature that wouldn’t be possible without internationalization. It’s groundwork, grunt work, foundational. Evangelize and let others own their part of the pie: it’s in their interest and yours that there isn’t just a single person in charge.
  From the very first day a product or a company is started, it begins accruing debt. A debt most engineers are used to is the technical debt – you take shortcuts for short term gains but you ultimately have to repay that debt. Internationalization is one of the easiest things to ignore from the very beginning. In a perfect world, everything and everyone would be ready for localization from the get-go. But if there’s no process in place to enforce it, it becomes an inevitable oversight.
  Tackling internationalization forces you to repay that debt, on multiple levels. How much of your content is unlocalizable? How does a pixel-perfect layout scale when faced with different content? But the most interesting debt to repay is the product debt: how much do you really know about your own product and can you foresee how it will succeed or fail when placed in the hands of a completely different user? Answering that question requires work – as most interesting things do. But it’s also a wide-open opportunity to make the product better, for everyone.