[HN Gopher] Show HN: Localization and translations should be cod...
       ___________________________________________________________________
        
       Show HN: Localization and translations should be code, not data
        
       Author : LeviticusMB
       Score  : 12 points
       Date   : 2022-07-05 20:15 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | verdverm wrote:
       | The problem I see with this is that every language would need to
       | replicate the code & logic.
       | 
       | With data / config, the translations are recorded in one place
       | and all consumers can get the update without code changes.
       | 
       | The big thing I've been wondering / looking for is a shared, open
       | source translation database. Anyone have links?
        
         | capableweb wrote:
         | > The big thing I've been wondering / looking for is a shared,
         | open source translation database. Anyone have links?
         | 
         | That's a neat idea. It'll be super useful for 80% of the cases,
         | where context isn't that important. But for the remaining 20%,
         | the context in which the translation will be used is as
         | important as the word itself. So you cannot always reuse the
         | same translation in different contexts, as it'll sound
         | unnatural.
         | 
         | Still, if there were an easy way to switch between different
         | options for a translation, a shared open source translation
         | database for projects to use would be very valuable.
        
           | verdverm wrote:
           | The (surmountable) problem is tree-shaking so you only
           | include the translations you use
        
             | capableweb wrote:
             | If I can manage to store all the data from HN comments and
             | submissions in 99 GB (31993925 "items", in a very naive
             | way), we should be able to have a DB with most common
             | translations for most web apps way below that, closer to
             | 1GB, if some clever people do it :)
        
         | lwouis wrote:
         | Context-less translation can be done quite successfully these
         | days with online services. You could simply make a few hundred
         | calls to something like Google Translate and get good quality
         | translations in multiple languages.
         | 
         | This is built into some of the top software translation
         | platforms to "seed" the initial translation: a bulk kickstart
         | that can optionally be refined later by human translators.
        
           | antaviana wrote:
           | As someone in the localization business, let me assure you
           | that, with the current state of the art, using machine
           | translation without any kind of human post-editing for UI is
           | a terrible idea.
           | 
           | That the UI is not in English does not mean that a non-
           | English person will be able to understand it and use it
           | successfully.
           | 
           | You can only do it if you do not have any kind of support for
           | those international users and if those users are not your
           | real customers but merely statistics in the usage dashboard
           | of a free product.
        
         | samuelstros wrote:
         | Since I am working on an open source localization solution
         | (that makes localization of software effortless), having an
         | open source "translation memory" database makes sense. I will
         | keep this idea in my mind! :)
        
       | msbarnett wrote:
       | It's a neat idea but by intermixing code, presentation, and data
       | you're going to run into a bunch of issues that the "traditional"
       | approach avoids.
       | 
       | For one thing, we get our translations by handing a yaml file to
       | external contractors. They don't need to squint at a file full of
       | code to distinguish the bits of english that need translating
       | from the bits that don't - they just have to translate the right
       | side of every key, and there's specialized tooling to help them
       | with this.
       | 
       | And for another, even in your toy example in the readme you've
       | now lost a Single Source of Truth for certain presentation
       | decisions. So now when some stakeholder comes to you and says
       | they hate the italicization in the intro paragraph and to lose it
       | ASAP, instead of taking the markup out of a common template that
       | different data gets inserted into, you have to edit each
       | language's version of the code to remove the markup (with all of
       | the attendant ease of making errors that comes along when you
       | lack a SPOT - easy to miss one language, etc). I'd expect these
       | kinds of multiplication-of-edit problems to grow increasingly
       | complex when you scale this approach beyond toy examples.
       | 
       | Basically this seems really hard to scale to large products, and
       | doesn't play well with division of labour.
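
The "common template that different data gets inserted into" described above can be sketched roughly as follows. This is a minimal illustration, not msbarnett's actual setup: the `IntroStrings` shape, the `introCatalogs` object, and the sample strings are all hypothetical, and in practice the per-language data would be loaded from YAML files rather than written inline.

```typescript
// Hypothetical per-language data: what a translator would edit.
// Only plain strings live here; no markup, no logic.
type IntroLocale = "en" | "fr";

interface IntroStrings {
  title: string;
  body: string;
}

const introCatalogs: Record<IntroLocale, IntroStrings> = {
  en: { title: "Welcome", body: "Thanks for signing up." },
  fr: { title: "Bienvenue", body: "Merci de votre inscription." },
};

// The single shared template. The <em> markup exists in exactly one
// place, so removing the italics is one edit for every language.
// (HTML escaping is omitted to keep the sketch short.)
function renderIntro(locale: IntroLocale): string {
  const s = introCatalogs[locale];
  return `<h1>${s.title}</h1><p><em>${s.body}</em></p>`;
}
```

The trade-off raised in the replies still applies: a shared template only works while every supported language fits the same layout.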
        
         | bananarchist wrote:
         | > Single Source of Truth for certain presentation decisions.
         | 
         | You can't have a single source of truth for presentation
         | decisions in a multilingual product. Different languages have
         | different typographic traditions, will demand different minimum
         | container sizes based on word lengths and maybe this is
         | shocking but they sometimes run in different directions. If you
         | are not integrating the dev, design and localized copy editing
         | roles on your team, your product is going to look like trash
         | except where the primary language of the team is concerned.
         | 
         | Translation can scale for large products, but localization
         | cannot: until further notice, you can only do it the hard way,
         | or the wrong way.
        
           | msbarnett wrote:
           | > You can't have a single source of truth for presentation
           | decisions in a multilingual product. Different languages have
           | different typographic traditions, will demand different
           | minimum container sizes based on word lengths and maybe this
           | is shocking but they sometimes run in different directions.
           | 
           | Maybe this is shocking but I'm fluent in a language that is
           | sometimes written vertically.
           | 
           | "You can't have one single common presentation for every
           | translation" is true in an absolute sense but often not true
           | in practice - eg) we hit most of Europe and North, Central,
           | and South America with ~10 static translations rendered into
           | one common presentational template, none of which run into
           | any of the truly complex layout differences that right-to-
           | left or vertical presentations would bring. We extensively QA
           | all of the languages we _do_ support, and presentation issues
           | are truly pretty damn rare. It's your classic "80% of the
           | result for 20% of the effort" tradeoff.
           | 
           | Now, if you truly do need to localize in every language under
           | the sun then yeah, something like this can make sense, as it
           | gives you maximum flexibility wrt varying your layout
           | alongside the translation.
           | 
           | But if you have _any_ simpler use-case (eg. supporting just
           | English, Spanish, French and Portuguese will give you an
           | enormous chunk of the planet with minimal overhead, as they
           | have very similar word lengths and presentation requirements)
           | then the approach here is just taking on all of the effort
           | and maintenance overhead of the maximally-complex case when
           | you have absolutely no need to.
        
       | olodus wrote:
       | "You tasked me with translating this scene, so since you gave me
       | a general programming language I used a buffer overflow to break
       | out into the animation engine and animate your characters to use
       | sign language."
       | 
       | Jokes aside, I don't hate the idea and am actually quite positive
       | about writing translations in code. I do question why you would
       | need a new language for it, though. Why not use an existing
       | programming language?
       | 
       | As others pointed out here the biggest downside I can see is that
       | it would be harder to outsource.
        
       | [deleted]
        
       | LeviticusMB wrote:
       | Making localized web apps is such a pain and too often an
       | afterthought. But what if it took almost no extra effort to make
       | the app localized from the start?
       | 
       | What if you could get static type checking, key documentation and
       | code completion right in VS Code?
       | 
       | And what if the translations could be generated using an actual
       | programming language, and even represent HTML markup and not just
       | plain strings?
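
To make the "static type checking" point concrete, here is a rough sketch in plain TypeScript. It is not the API of the linked project; the `Messages` interface and both locale objects are hypothetical names chosen for illustration. The idea is that each message is a function, so the compiler can flag a missing key or a wrong argument in any language.

```typescript
// Hypothetical message contract shared by all locales.
interface Messages {
  greeting(name: string): string;
  cartItems(count: number): string;
}

// Each locale implements the contract. Omitting a key or changing a
// parameter type is a compile-time error rather than a runtime surprise.
const enMessages: Messages = {
  greeting: (name) => `Hello, ${name}!`,
  cartItems: (count) =>
    count === 1 ? "1 item in your cart" : `${count} items in your cart`,
};

const svMessages: Messages = {
  greeting: (name) => `Hej, ${name}!`,
  cartItems: (count) =>
    count === 1 ? "1 vara i varukorgen" : `${count} varor i varukorgen`,
};
```

A call such as `svMessages.cartItems(2)` gets code completion and argument checking from the editor, which is the benefit claimed above. The cost, as the replies note, is that whoever edits the Swedish text is now editing code.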
        
         | capableweb wrote:
         | Sounds like a great idea for translators who are also
         | programmers, or who at least know HTML (and the syntax for
         | logic, judging by your examples). But I haven't worked at any
         | company where the translators/the people doing localization
         | were programmers; they were just translators. This will be
         | more or less impossible for them to use efficiently, if at
         | all.
        
         | withinboredom wrote:
         | One solution is to use your native language as the key. Bam,
         | you have context in the code and when testing. No need for
         | shenanigans. This is how it was done until someone decided to
         | popularize opaque keys in the last decade or so; in fact, most
         | old, battle-hardened libraries expect it to be done that way.
         | You can translate English to English (or whatever) if you want
         | to be able to change the wording without having to retranslate
         | everything... but then if you are changing the wording for the
         | native language, don't you have to retranslate everything
         | anyway?
        
           | duskwuff wrote:
           | > One solution is to use your native language as the key.
           | 
           | That fails pretty badly in two cases:
           | 
           | 1) If significant changes to the English (or whatever)
           | version need to be made, keeping the original text may be
           | more confusing than useful.
           | 
           | 2) When the native-language version is ambiguous in a way
           | that doesn't apply to other languages, e.g. when translating
           | to languages with grammatical gender, or when a single
           | English word can be used in multiple unrelated ways.
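
The second failure case above (one English word used in unrelated ways) is commonly handled by attaching a context to the source string, in the spirit of gettext's message contexts. The sketch below is a simplified illustration only: the `translate` helper, the `|` separator, and the German sample entries are assumptions made for this example, not any library's real API.

```typescript
// Hypothetical German catalog keyed by "context|source text".
// "Open" as a button label (verb) and as a ticket status (adjective)
// need different translations.
const deCatalog = new Map<string, string>([
  ["button|Open", "Öffnen"],
  ["status|Open", "Offen"],
]);

// Look up a source string, optionally qualified by a context.
// Falls back to the source text when no translation exists.
function translate(
  catalog: Map<string, string>,
  source: string,
  context?: string,
): string {
  const key = context === undefined ? source : `${context}|${source}`;
  return catalog.get(key) ?? source;
}
```

With this, `translate(deCatalog, "Open", "button")` and `translate(deCatalog, "Open", "status")` resolve to different entries even though the English key text is identical. It does not address the first failure case, where the source wording itself changes.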
        
         | layer8 wrote:
         | ...then translators need to be programmers, or vice versa. That
         | may not scale to many languages/large products.
         | 
         | What would be useful is the ability to interactively see a
         | systematic set of examples of what the templates one is editing
         | evaluate to.
        
       | azeirah wrote:
       | The localization library I use supports most of this. Not all,
       | it's not a general purpose programming language of course, but it
       | supports variables and conditionals, which is basically enough to
       | do almost anything.
       | 
       | https://formatjs.io/docs/react-intl/api#message-syntax
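
As a rough illustration of "variables and conditionals" in declarative messages, the sketch below uses only the standard `Intl.PluralRules` API rather than FormatJS itself. The `PluralForms` shape, the `formatCount` helper, and the `{count}` placeholder convention are hypothetical simplifications of what ICU-style message syntax provides.

```typescript
// Hypothetical declarative message: one template per plural category.
// Only "other" is required; languages use the categories they need.
interface PluralForms {
  zero?: string;
  one?: string;
  two?: string;
  few?: string;
  many?: string;
  other: string;
}

const filesEn: PluralForms = {
  one: "{count} file",
  other: "{count} files",
};

const filesPl: PluralForms = {
  one: "{count} plik",
  few: "{count} pliki",
  many: "{count} plików",
  other: "{count} pliku",
};

// The conditional lives in the runtime, not in the translation:
// Intl.PluralRules picks the category for the locale and number,
// then the variable is substituted into the chosen template.
function formatCount(locale: string, forms: PluralForms, count: number): string {
  const category = new Intl.PluralRules(locale).select(count) as keyof PluralForms;
  const template = forms[category] ?? forms.other;
  return template.replace("{count}", String(count));
}
```

The translator still edits plain data, while the language-specific branching comes from the locale data shipped with the JavaScript runtime.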
        
       | samuelstros wrote:
       | For months I have been working on an open source localization
       | solution that tackles both developer- and translator-facing
       | problems. Treating translations as code completely leaves out
       | translators, who in most cases cannot code.
       | 
       | I am working on making localization effortless via dev tools and
       | a dedicated editor for translators. Both pillars have one common
       | denominator: translations as data in source code. Treating
       | translations as code would break that denominator and prevent a
       | coherent end-to-end solution.
       | 
       | Take a look at the repository https://github.com/inlang/inlang.
       | The IDE extension already solves type safety, inline annotations,
       | and (partially) extraction of hardcoded strings.
        
       | rakshithbellare wrote:
       | What would be the process for handoff from translators to
       | programmers?
        
       | eternityforest wrote:
       | I'm not quite sure I agree with the title. Having access to code
       | when you need it is probably a good thing.
       | 
       | But I think code is, in general, something to be avoided when
       | declarative approaches are available.
       | 
       | Declarative is easier for a computer to understand: it restricts
       | the inputs to one domain the computer can deal with.
       | 
       | You don't get the same classes of bugs with declarative. You
       | could even do things like double checking with machine
       | translation and flagging anything that doesn't match for human
       | review.
       | 
       | Plus, you don't need a programmer to do it. Security issues go
       | away. You often achieve very good reuse with code only existing
       | in one place without language variants.
       | 
       | I'm sure there are great uses for this, but I have trouble
       | thinking of even a single case where I'd prefer code to data in
       | general.
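
One way to picture the claim that declarative translations are easier for a computer to check: because the catalogs are plain data, a small script can compare them mechanically. The sketch below is an assumption-laden example (the `FlatCatalog` type, the `{name}` placeholder style, and the `findProblems` helper are all hypothetical) and covers only missing keys, stray keys, and placeholder mismatches.

```typescript
// Hypothetical flat catalog: message key -> template with {placeholders}.
type FlatCatalog = Record<string, string>;

// Collect the placeholders used in a message, sorted for comparison.
function placeholdersOf(message: string): string[] {
  const found: string[] = message.match(/\{\w+\}/g) ?? [];
  return found.slice().sort();
}

// Compare a translated catalog against the source catalog and report
// anything a human should look at before shipping.
function findProblems(source: FlatCatalog, target: FlatCatalog): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(source)) {
    const translated = target[key];
    if (translated === undefined) {
      problems.push(`missing key: ${key}`);
      continue;
    }
    const expected = placeholdersOf(source[key] ?? "").join(",");
    const actual = placeholdersOf(translated).join(",");
    if (expected !== actual) {
      problems.push(`placeholder mismatch: ${key}`);
    }
  }
  for (const key of Object.keys(target)) {
    if (!(key in source)) {
      problems.push(`unused key: ${key}`);
    }
  }
  return problems;
}
```

A check like this could run in CI on every translation update. Doing the same for translations written as arbitrary code would require executing or analyzing that code, which is the asymmetry the comment points at.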
        
       ___________________________________________________________________
       (page generated 2022-07-05 23:01 UTC)