[HN Gopher] Show HN: Localization and translations should be cod...
___________________________________________________________________
Show HN: Localization and translations should be code, not data

Author : LeviticusMB
Score  : 12 points
Date   : 2022-07-05 20:15 UTC (2 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| verdverm wrote:
| The problem I see with this is that every language would need to replicate the code & logic.
|
| With data / config, the translations are recorded in one place and all consumers can get the update without code changes.
|
| The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?
| capableweb wrote:
| > The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?
|
| That's a neat idea. It'll be super useful for 80% of the cases, where context isn't that important. But for the remaining 20%, the context in which the translation will be used is as important as the word itself. So you cannot always reuse the same translation in different contexts, as it'll sound unnatural.
|
| Still, if there were an easy way to switch between different options for a translation, a shared open source translation database for projects to use would be very valuable and useful.
| verdverm wrote:
| The (surmountable) problem is tree-shaking so you only include the translations you use.
| capableweb wrote:
| If I can manage to store all the data from HN comments and submissions in 99 GB (31993925 "items", in a very naive way), we should be able to have a DB with the most common translations for most web apps way below that, closer to 1 GB, if some clever people do it :)
| lwouis wrote:
| Context-less translation can be done quite successfully these days with online services.
| You could simply make a few hundred calls to something like Google Translate and get good quality translations in multiple languages.
|
| This is built into some of the top software translation platforms to "seed" the initial translation: a bulk kickstart that can optionally be refined later by human translators.
| antaviana wrote:
| As someone in the localization business, let me assure you that, with the current state of the art, using machine translation without any kind of human post-editing for UI is a terrible idea.
|
| That the UI is not in English does not mean that a non-English person will be able to understand it and use it successfully.
|
| You can only do it if you do not have any kind of support for those international users and if those users are not your real customers but merely statistics in the usage dashboard of a free product.
| samuelstros wrote:
| Since I am working on an open source localization solution (that makes localization of software effortless), having an open source "translation memory" database makes sense. I will keep this idea in mind! :)
| msbarnett wrote:
| It's a neat idea but by intermixing code, presentation, and data you're going to run into a bunch of issues that the "traditional" approach avoids.
|
| For one thing, we get our translations by handing a YAML file to external contractors. They don't need to squint at a file full of code to distinguish the bits of English that need translating from the bits that don't - they just have to translate the right side of every key, and there's specialized tooling to help them with this.
|
| And for another, even in your toy example in the readme you've now lost a Single Source of Truth for certain presentation decisions.
| So now when some stakeholder comes to you and says they hate the italicization in the intro paragraph and to lose it ASAP, instead of taking the markup out of a common template that different data gets inserted into, you have to edit each language's version of the code to remove the markup (with all of the attendant ease of making errors that comes along when you lack a SPOT, a single point of truth - easy to miss one language, etc). I'd expect these kinds of multiplication-of-edit problems to grow increasingly complex when you scale this approach beyond toy examples.
|
| Basically this seems really hard to scale to large products, and doesn't play well with division of labour.
| bananarchist wrote:
| > Single Source of Truth for certain presentation decisions.
|
| You can't have a single source of truth for presentation decisions in a multilingual product. Different languages have different typographic traditions, will demand different minimum container sizes based on word lengths and maybe this is shocking but they sometimes run in different directions. If you are not integrating the dev, design and localized copy editing roles on your team, your product is going to look like trash except where the primary language of the team is concerned.
|
| Translation can scale for large products, but localization cannot: until further notice, you can only do it the hard way, or the wrong way.
| msbarnett wrote:
| > You can't have a single source of truth for presentation decisions in a multilingual product. Different languages have different typographic traditions, will demand different minimum container sizes based on word lengths and maybe this is shocking but they sometimes run in different directions.
|
| Maybe this is shocking but I'm fluent in a language that is sometimes written vertically.
|
| "You can't have one single common presentation for every translation" is true in an absolute sense but often not true in practice - e.g. we hit most of Europe and North, Central, and South America with ~10 static translations rendered into one common presentational template, none of which run into any of the truly complex layout differences that right-to-left or vertical presentations would bring. We extensively QA all of the languages we _do_ support, and presentation issues are truly pretty damn rare. It's your classic "80% of the result for 20% of the effort" tradeoff.
|
| Now, if you truly do need to localize in every language under the sun then yeah, something like this can make sense, as it gives you maximum flexibility wrt varying your layout alongside the translation.
|
| But if you have _any_ simpler use-case (e.g. supporting just English, Spanish, French and Portuguese will give you an enormous chunk of the planet with minimal overhead, as they have very similar word lengths and presentation requirements) then the approach here is just taking on all of the effort and maintenance overhead of the maximally-complex case when you have absolutely no need to.
| olodus wrote:
| "You tasked me with translating this scene, so since you gave me a general programming language I used a buffer overflow to break out into the animation engine and animate your characters to use sign language."
|
| Jokes aside, I don't hate the idea and am actually quite positive about writing translations in code. I do question why you would need a new language for it, though: why not use an existing programming language?
|
| As others pointed out here, the biggest downside I can see is that it would be harder to outsource.
| [deleted]
| LeviticusMB wrote:
| Making localized web apps is such a pain and too often an afterthought. But what if it took almost no extra effort to make the app localized from the start?
|
| What if you could get static type checking, key documentation and code completion right in VS Code?
|
| And what if the translations could be generated using an actual programming language, and even represent HTML markup and not just plain strings?
| capableweb wrote:
| Sounds like a great idea for translators who are also programmers, or at least know HTML (and syntax for logic, judging by your examples). But I haven't worked at any company where the translators/the people doing localization have been programmers; they have just been translators. This will be more or less impossible for them to use efficiently, if at all.
| withinboredom wrote:
| One solution is to use your native language as the key. Bam, you have context in the code and when testing. No need for shenanigans (and this is how it was done until someone decided to popularize opaque keys in the last decade or so; in fact, most battle-hardened and old libraries expect it to be done that way). You can translate English to English (or whatever) if you want to be able to change the wording without having to retranslate everything... but then if you are changing the wording for the native language, don't you have to retranslate everything anyway?
| duskwuff wrote:
| > One solution is to use your native language as the key.
|
| That fails pretty badly in two cases:
|
| 1) If significant changes to the English (or whatever) version need to be made, keeping the original text may be more confusing than useful.
|
| 2) When the native-language version is ambiguous in a way that doesn't apply to other languages, e.g. when translating to languages with grammatical gender, or when a single English word can be used in multiple unrelated ways.
| layer8 wrote:
| ...then translators need to be programmers, or vice versa. That may not scale to many languages/large products.
|
| What would be useful is the ability to interactively see a systematic set of examples of what the templates one is editing evaluate to.
| azeirah wrote:
| The localization library I use supports most of this. Not all, it's not a general purpose programming language of course, but it supports variables and conditionals, which is basically enough to do almost anything.
|
| https://formatjs.io/docs/react-intl/api#message-syntax
| samuelstros wrote:
| For months I have been working on an open source localization solution that tackles both developer- and translator-facing problems. Treating translations as code completely leaves out translators, who in most cases cannot code.
|
| I am working on making localization effortless via dev tools and a dedicated editor for translators. Both pillars have one common denominator: translations as data in source code. Treating translations as code would break that denominator and prevent a coherent end-to-end solution.
|
| Take a look at the repository https://github.com/inlang/inlang. The IDE extension already solves type safety, inline annotations, and (partially) extraction of hardcoded strings.
| rakshithbellare wrote:
| What would be the process for handoff from translators to programmers?
| eternityforest wrote:
| I'm not quite sure I agree with the title. Having access to code when you need it is probably a good thing.
|
| But I think code is, in general, something to be avoided when declarative approaches are available.
|
| Declarative is easier for a computer to understand, it restricts the inputs to one domain the computer can deal with.
|
| You don't get the same classes of bugs with declarative. You could even do things like double checking with machine translation and flagging anything that doesn't match for human review.
|
| Plus, you don't need a programmer to do it. Security issues go away.
| You often achieve very good reuse with code only existing in one place without language variants.
|
| I'm sure there are great uses for this, but I have trouble thinking of even a single case where I'd prefer code to data in general.
___________________________________________________________________
(page generated 2022-07-05 23:01 UTC)