[HN Gopher] The second largest version of Wikipedia is written m...
       ___________________________________________________________________
        
       The second largest version of Wikipedia is written mostly by one
       bot
        
       Author : jxub
       Score  : 86 points
       Date   : 2020-02-24 12:59 UTC (10 hours ago)
        
 (HTM) web link (www.vice.com)
 (TXT) w3m dump (www.vice.com)
        
       | brokensegue wrote:
       | Slightly pedantic but the largest "Wikipedia" (depending on how
       | you define it) is http://wikidata.org/ and it's also primarily
       | written by bots.
        
         | playpause wrote:
         | That's a wiki, not a Wikipedia.
        
           | brokensegue wrote:
           | But both are Wikimedia projects.
        
         | suprfsat wrote:
         | And it's a data model that's actually suited to being written
         | by bots. Instead of ... whatever this is.
        
       | tangoalpha wrote:
       | Clicking on a random article at
       | https://ceb.m.wikipedia.org/wiki/Espesyal:Random#/random , it
       | looks like every article is about either a tree, an animal, an
       | insect, or a place...
        
       | peterburkimsher wrote:
       | I discovered this in 2018, when comparing lists of languages
       | supported by different software and the number of speakers.
       | 
       | https://peterburk.github.io/i2018n/#wikipedia
       | 
       | Having machine-translated content is powerful for SEO, but I
       | don't know how practical that is for Cebuano. It would be nice
       | for English to no longer be practically required for people to
       | become computer literate.
        
         | BiteCode_dev wrote:
         | > It would be nice for English to no longer be practically
         | required for people to become computer literate.
         | 
         | French here. We are terrible at English in my country.
         | 
         | Still, the fact most information in computing is shared in
         | English is a godsend. Sure, you have to learn it, but then:
         | 
         | - no need to search for it in so many languages
         | 
         | - no need to produce translations of tutorials/docs/comments in
         | so many languages
         | 
         | - the community to share and communicate with is huge and
         | diverse
         | 
         | - English is way more efficient than French, Spanish, German or
         | Chinese to talk about technical stuff
        
           | seventh-chord wrote:
           | Genuinely curious about your last point (don't know much
           | about the topic). Is English intrinsically better at this, or
           | is it because of the presence of jargon? Is it a studied
           | phenomenon, or is it something most people feel?
        
             | FalconSensei wrote:
             | In my opinion, jargon helps, but English is just easier to
             | learn and shorter to communicate.
             | 
             | My first language is Portuguese, and English was not so
             | hard for me to learn. I studied a bit of French and it was
             | ok, but not as easy as English. And this considering it
             | should be a bit easier since Portuguese is my main
             | language.
             | 
             | Now I'm studying German and... it's way harder than English
             | and French. At least, it's way more verbose, with more
             | complex grammar rules and conjugations.
        
             | reaperducer wrote:
             | I can see that being true. Many of the web sites I build
             | are multi-lingual, and when designing for many languages,
             | you have to take into consideration that certain languages
             | take many more characters or words to express an idea than
             | in English.
             | 
             | Off the top of my head, I believe we factor in 15% more
             | text space for Spanish. German is something like 60% more.
        
               | kevingadd wrote:
               | When localizing software, it's also a general rule of
               | thumb that your on-screen UI spaces need to be something
               | like 40% bigger than the English text that goes in them
               | since the German equivalent is always going to be way
               | bigger. It's common for it to turn out half-way through
               | localization that some of your text doesn't fit anymore.
               | 
               | (FWIW, this also happens when swapping from ideographic
               | languages to English - western localizations of Japanese
               | video games often end up with very small text as a
               | result.)
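The rule of thumb in the two comments above can be sketched as a small helper. This is only an illustration: the expansion factors are the rough figures the commenters quote from memory (15% for Spanish, 60% for German), and `EXPANSION` and `reserved_chars` are hypothetical names, not part of any localization library.

```python
import math

# Expansion factors relative to English. These are the rough figures quoted
# from memory in the comments above, not measured values.
EXPANSION = {
    "en": 1.00,
    "es": 1.15,  # "15% more text space for Spanish"
    "de": 1.60,  # "German is something like 60% more"
}


def reserved_chars(english_text: str, languages=("en", "es", "de")) -> int:
    """Return the character budget to reserve for a UI label.

    The slot is sized for the worst-case language in `languages`, so a
    translation added later is less likely to overflow it.
    """
    worst = max(EXPANSION[lang] for lang in languages)
    return math.ceil(len(english_text) * worst)


if __name__ == "__main__":
    label = "Save changes"
    print(label, "->", reserved_chars(label), "characters reserved")
```

Character counts are only a proxy for rendered width, so a real layout would probably measure text in the target font instead.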
        
             | BiteCode_dev wrote:
             | English is usually shorter than other Latin-based
             | languages. It's longer than ideogram based ones but you
             | don't have to learn 100000 symbols to express yourself in
             | it.
             | 
             | It also has a very simple grammar compared to most
             | languages. Take this sentence:
             | 
             | "I would like not to go to school today"
             | 
             | The French equivalent would be:
             | 
             | "Je voudrais ne pas aller à l'école aujourd'hui."
             | 
             | "would like" is a simple combination of two words, but in
             | French you need to know the precise conjugation of it.
             | 
             | "not" is actually expressed as 2 words with "ne pas", which
             | can be positioned in several ways.
             | 
             | Infinitive, like with "to go", is simple in English: just
             | add "to". In French, each verb is different, like "aller".
             | 
             | Then you get "the" in any circumstances in English, but the
             | "l'" could also be "le", "la", or "les" depending on the word
             | after it. Also remember that each word is either feminine
             | or masculine in French, even a stone or the sun.
             | 
             | Then "à" and "école" have an accent. French has many of
             | them, you need to know the right one, where to place it,
             | how to pronounce it and type it on the keyboard.
             | 
             | Finally, "today" vs "aujourd'hui". I know which one is
             | easier to type in a bug report.
             | 
             | Not to say English doesn't have weird traps, but it's very,
             | very relaxing compared to the rest. And much more
             | efficient.
             | 
             | Also describing a view of the countryside with it feels a
             | bit limiting. But I'm not Shakespeare :)
        
             | airstrike wrote:
             | English has an incredibly simple grammar, comparatively
             | speaking. There are only three verb conjugations (if you
             | don't count auxiliary verbs) and one gender. Even naming
             | variables is easier!
        
         | FanaHOVA wrote:
         | > It would be nice for English to no longer be practically
         | required for people to become computer literate.
         | 
         | That's already the case in other mission-critical industries,
         | like aviation. Hard to build businesses with cross-border
         | collaboration without using English. (This is also how I
         | learned English in the first place, it was a good motivator!)
        
         | airstrike wrote:
         | > It would be nice for English to no longer be practically
         | required for people to become computer literate.
         | 
         | In turn, you'll get "every other language becomes practically
         | required for people to be able to communicate about computers"
        
           | FalconSensei wrote:
           | Yes. In the end, we need one common language so that we can
           | communicate with people around the globe, and that's not only
           | about computers, but everything.
        
       | [deleted]
        
       | qwerty456127 wrote:
       | So they mean to tell us "insignificant" facts and articles must
       | be deleted?
        
         | brodo wrote:
         | The German Wikipedia would be twice as big if mods weren't
         | obsessed with some made up criteria of relevance.
        
           | linksnapzz wrote:
           | Wait, the Germans are being _picky_ and _pedantic_? Way to
           | travel in unfortunate national stereotypes...:-)
        
             | markdown wrote:
             | All articles that fail to meet criteria are to be purged.
        
           | FalconSensei wrote:
           | That's sad. In the end, all this would (if not already) make
           | them just go for the English version. I already do this (I'm
           | Brazilian) as the Portuguese version is nowhere near the
           | international (English) version in terms of completeness and
           | being up-to-date.
           | 
           | BTW, do you have a link for their terms on "relevance"?
        
             | Polylactic_acid wrote:
             | The English version isn't particularly free. I attempted to
             | add a page about a file format that is fairly well used but
             | doesn't have a huge amount of information online about it.
             | The only real source is a zip file from a company's website
             | which contains a pdf with the file spec and some example
             | programs. Unfortunately the editors decided that due to the
             | lack of referenceable sources, they would rather no article
             | exist at all.
        
               | qwerty456127 wrote:
               | This bullshit policy drives me mad. I will start donating
               | regularly once it's cancelled. Not sooner, nor later.
        
               | Polylactic_acid wrote:
               | I understand it for some cases where the mods just need
               | to stop people making up random crap on topics that don't
               | exist or can't be verified. But in this case a single
               | reference is more than enough to write the whole page
               | because the spec is literally the only source of truth on
               | the topic.
               | 
               | Unfortunately I think the mods may be so passionate
               | about "protecting the integrity of Wikipedia" that they
               | let legitimate content be deleted. It also doesn't help
               | that the Wikipedia UI for disputes and edits is really
               | confusing and I had a hard time trying to work out what
               | was going on or how to communicate with this moderator. The
               | whole system is designed for power users only.
        
         | [deleted]
        
       | sings wrote:
       | I always thought it was a bit bizarre that different language
       | editions of Wikipedia contain different information. It seems the
       | focus should be more on translation than content creation. Maybe
       | that isn't practical with the current structure, but surely the
       | aim should be a definitive knowledge graph rather than a
       | disparate and unevenly duplicated set of articles. Just my two
       | cents - I am sure many have put a lot of thought into how to best
       | tackle this.
        
         | telesilla wrote:
         | I'd rather see Wikipedia find a way to link these different
         | sites in more interesting ways, for example if I go to the
         | entry for Carnival (https://en.wikipedia.org/wiki/Carnival),
         | why doesn't it link me to the Brazilian
         | (https://pt.wikipedia.org/wiki/Carnaval), Spanish
         | (https://es.wikipedia.org/wiki/Carnaval) or Italian
         | (https://it.wikipedia.org/wiki/Carnevale) entries for which I
         | might learn more, using auto-translate?
        
           | carlinmack wrote:
           | All of those languages and more are linked in the sidebar,
           | what would you prefer to see?
        
             | telesilla wrote:
             | You know, I never knew that that sidebar was a link to the
             | same entry in different languages - thanks! Still, it makes
             | me wonder if there is still a way to open up more content
             | in other languages, so that those who contribute more in-
             | depth can somehow have that content be shared on other
             | language pages more transparently. But, I never studied
             | library science and I'm sure finer minds than mine have
             | considered this problem.
        
               | jcranmer wrote:
               | The tricky thing is that any text content has to have
               | translation. You might be able to get away with not
               | translating maps, since place names tend to be more
               | stable (or at least generally pretty easy to work out)
               | across different languages. For example, "Pologne-
               | Lituanie" is going to be within the capability of most
               | English speakers to work out, even if they've never heard
               | of "Poland-Lithuania".
               | 
               | It is possible to link images and other things via
               | Wikimedia, and my understanding is that Wikipedia does
               | push for people to do this.
        
           | wodenokoto wrote:
           | It links to articles in over 80 languages. So on one hand it
           | does a really good job at cross linking. On the other hand,
           | missing out on linking to the languages you mention seems
           | like a huge error.
        
             | samatman wrote:
             | It does link to all those languages. I think what the
             | grandparent was referring to is that there's no way to
             | indicate that an article in another language might be
             | interesting in some way.
             | 
             | Such as, for Carnival, being written in a language spoken
             | by a people who celebrate it, or for Alexander the Great,
             | highlighting the languages spoken in territories he
             | conquered.
             | 
             | It's an interesting proposal, but I get a headache just
             | thinking about the politics of implementing it.
        
         | Polylactic_acid wrote:
         | I'm guessing the task is just too hard so this is the next best
         | option. For all of the versions to contain the same content you
         | would have to have every edit made to at least the English
         | version and optionally another version. What happens when
         | someone who only knows a non English language wants to make an
         | edit? Does the site ping a user who knows both languages to
         | translate it? It's just easier to let the versions be split.
        
         | StavrosK wrote:
         | How do you mean? I'm fine with the fact that the Greek
         | Wikipedia doesn't contain an article about the Boston Tea
         | Party, but I like that it contains an article about the 1821
         | rebellion. Requiring the information to be the same across
         | languages would mean that either both should be translated, or,
         | if no translator can be found, one should be deleted.
         | 
         | EDIT: Or do you mean contain the same information between
         | different languages of a specific article?
        
       | 4cao wrote:
       | This endeavor looks largely orthogonal to what the objectives of
       | an online encyclopedia should be. Creating as many stub articles
       | as possible and filling them with "formulaic, generic, and
       | reusable templated sentences with spots for specific information"
       | seems more like a recipe for an automated content farm than for
       | "disseminating the sum of _human_ knowledge."
       | 
       | It would be most interesting to know what the 148 active Cebuano
       | Wikipedia users think of the 5,331,028 articles the bot created,
       | ostensibly for them. Too bad nobody apparently cared to ask.
       | 
       | In particular, since Cebuano speakers are likely to be fluent in
       | Tagalog and/or English as well, they can easily use one of the
       | other Wikipedia editions too. Without the hyperactive bot, the
       | much smaller Cebuano Wikipedia would arguably be more relevant,
       | reflecting topics truly of interest to the community.
       | 
       | While the number of articles is a convenient way of comparing
       | Wikipedia language editions, it only works as such to the extent
       | that the articles are kept to a certain standard. It seems to me
       | that what we are observing here is yet another example of the
       | situation that when a measure becomes a target it ceases to be a
       | good measure.
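To make the "templated sentences with spots for specific information" quoted above concrete, here is a minimal sketch of that general technique. It is not the bot's actual code: the sentence template, the field names, and the sample record are all invented for illustration, and the example is in English rather than Cebuano.

```python
from string import Template

# Hypothetical sentence template. The real bot's templates and data sources
# are not shown in this thread.
STUB = Template(
    "$name is a species of $kind in the family $family. "
    "It was described by $author in $year."
)


def make_stub(record: dict) -> str:
    """Fill the fixed sentence with the fields of one record.

    Template.substitute raises KeyError when a field is missing, so an
    incomplete record fails loudly instead of yielding a broken article.
    """
    return STUB.substitute(record)


# Invented placeholder record, not real taxonomic data.
record = {
    "name": "Examplea fictiva",
    "kind": "beetle",
    "family": "Exemplidae",
    "author": "A. Author",
    "year": 1900,
}

if __name__ == "__main__":
    print(make_stub(record))
```

Run over a large database of such records, a loop like this produces one near-identical stub per row, which is what the article count ends up measuring.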
        
       ___________________________________________________________________
       (page generated 2020-02-24 23:00 UTC)