[HN Gopher] The second largest version of Wikipedia is written m...
___________________________________________________________________
The second largest version of Wikipedia is written mostly by one bot
Author : jxub
Score : 86 points
Date : 2020-02-24 12:59 UTC (10 hours ago)
(HTM) web link (www.vice.com)
(TXT) w3m dump (www.vice.com)

| brokensegue wrote:
| Slightly pedantic but the largest "Wikipedia" (depending on how
| you define it) is http://wikidata.org/ and it's also primarily
| written by bots.
| playpause wrote:
| That's a wiki, not a Wikipedia.
| brokensegue wrote:
| but both are Wikimedia projects
| suprfsat wrote:
| And it's a data model that's actually suited to being written
| by bots. Instead of ... whatever this is.
| tangoalpha wrote:
| Clicking on a random article at
| https://ceb.m.wikipedia.org/wiki/Espesyal:Random#/random , it looks
| like every article is about either a tree, or an animal, or an
| insect, or a place...
| peterburkimsher wrote:
| I discovered this in 2018, when comparing lists of languages
| supported by different software and the number of speakers.
|
| https://peterburk.github.io/i2018n/#wikipedia
|
| Having machine-translated content is powerful for SEO, but I
| don't know how practical that is for Cebuano. It would be nice
| for English to no longer be practically required for people to
| become computer literate.
| BiteCode_dev wrote:
| > It would be nice for English to no longer be practically
| required for people to become computer literate.
|
| French here. We are terrible at English in my country.
|
| Still, the fact that most information in computing is shared in
| English is a godsend.
| Sure, you have to learn it, but then:
|
| - no need to search for it in so many languages
|
| - no need to produce translations of tutorials/docs/comments in
| so many languages
|
| - the community to share and communicate with is huge and
| diverse
|
| - English is way more efficient than French, Spanish, German or
| Chinese to talk about technical stuff
| seventh-chord wrote:
| Genuinely curious about your last point (don't know much
| about the topic). Is English intrinsically better at this, or
| is it because of the presence of jargon? Is it a studied
| phenomenon, or is it something most people feel?
| FalconSensei wrote:
| In my opinion, jargon helps, but English is just easier to
| learn and shorter to communicate.
|
| My first language is Portuguese, and English was not so
| hard for me to learn. I studied a bit of French and it was
| ok, but not as easy as English. And this considering it
| should be a bit easier since Portuguese is my main
| language.
|
| Now I'm studying German and... it's way harder than English
| and French. At least, way more verbose and with more
| complex grammar rules and conjugations.
| reaperducer wrote:
| I can see that being true. Many of the web sites I build
| are multi-lingual, and when designing for many languages,
| you have to take into consideration that certain languages
| take many more characters or words to express an idea than
| in English.
|
| Off the top of my head, I believe we factor in 15% more
| text space for Spanish. German is something like 60% more.
| kevingadd wrote:
| When localizing software, it's also a general rule of
| thumb that your on-screen UI spaces need to be something
| like 40% bigger than the English text that goes in them
| since the German equivalent is always going to be way
| bigger. It's common for it to turn out half-way through
| localization that some of your text doesn't fit anymore.
|
| (FWIW, this also happens when swapping from ideographic
| languages to English - western localizations of Japanese
| video games often end up with very small text as a
| result.)
| BiteCode_dev wrote:
| English is usually shorter than other Latin-based
| languages. It's longer than ideogram-based ones but you
| don't have to learn 100000 symbols to express yourself in
| it.
|
| It also has a very simple grammar compared to most
| languages. Take this sentence:
|
| "I would like not to go to school today"
|
| The French equivalent would be:
|
| "Je voudrais ne pas aller à l'école aujourd'hui."
|
| "would like" is a simple combination of two words, but in
| French you need to know the precise conjugation of it.
|
| "not" is actually expressed as 2 words with "ne pas", which
| can be positioned in several ways.
|
| Infinitive, like with "to go", is simple in English: just
| add "to". In French, each verb is different, like "aller".
|
| Then you got "the" in any circumstances in English, but the
| "l'" could also be "le, la, or les" depending on the word
| after it. Also remember that each word is either feminine
| or masculine in French, even a stone or the sun.
|
| Then "à" and "école" got an accent. French has many of
| them, you need to know the right one, where to place it,
| how to pronounce it and type it on the keyboard.
|
| Finally, "today" vs "aujourd'hui". I know which one is
| easier to type in a bug report.
|
| Not to say English doesn't have weird traps, but it's very,
| very relaxing compared to the rest. And much more
| efficient.
|
| Also describing a view of the countryside with it feels a
| bit limiting. But I'm not Shakespeare :)
| airstrike wrote:
| English has an incredibly simple grammar, comparatively
| speaking. There are only three verb conjugations (if you
| don't count auxiliary verbs) and one gender. Even naming
| variables is easier!
| FanaHOVA wrote:
| > It would be nice for English to no longer be practically
| required for people to become computer literate.
|
| That's already the case in other mission-critical industries, like
| aviation. Hard to build businesses with cross-border
| collaboration without using English. (This is also how I
| learned English in the first place, it was a good motivator!)
| airstrike wrote:
| > It would be nice for English to no longer be practically
| required for people to become computer literate.
|
| In turn, you'll get "every other language becomes practically
| required for people to be able to communicate about computers"
| FalconSensei wrote:
| Yes. In the end, we need one common language so that we can
| communicate with people around the globe, and that's not only
| about computers, but everything.
| [deleted]
| qwerty456127 wrote:
| So they mean to tell us "insignificant" facts and articles must
| be deleted?
| brodo wrote:
| The German Wikipedia would be twice as big if mods weren't
| obsessed with some made-up criteria of relevance.
| linksnapzz wrote:
| Wait, the Germans are being _picky_ and _pedantic_? Way to
| travel in unfortunate national stereotypes...:-)
| markdown wrote:
| All articles that fail to meet criteria are to be purged.
| FalconSensei wrote:
| That's sad. In the end, all this would (if not already) make
| them just go for the English version. I already do this (I'm
| Brazilian) as the Portuguese version is nowhere near the
| international (English) version in terms of completeness and
| being up-to-date.
|
| BTW, do you have a link for their terms on "relevance"?
| Polylactic_acid wrote:
| The English version isn't particularly free. I attempted to
| add a page about a file format that is fairly well used but
| doesn't have a huge amount of information online about it.
| The only real source is a zip file from a company's website
| which contains a PDF with the file spec and some example
| programs.
| Unfortunately the editors decided that due to the
| lack of referenceable sources, they would rather no article
| exist at all.
| qwerty456127 wrote:
| This bullshit policy drives me mad. I will start donating
| regularly once it's cancelled. Not sooner, nor later.
| Polylactic_acid wrote:
| I understand it for some cases where the mods just need
| to stop people making up random crap on topics that don't
| exist or can't be verified. But in this case a single
| reference is more than enough to write the whole page
| because the spec is literally the only source of truth on
| the topic.
|
| Unfortunately I think the mods may be so passionate
| about "protecting the integrity of Wikipedia" that they
| let legitimate content be deleted. It also doesn't help
| that the Wikipedia UI for disputes and edits is really
| confusing and I had a hard time trying to work out what
| was going on or how to communicate with this moderator. The
| whole system is designed for power users only.
| [deleted]
| sings wrote:
| I always thought it was a bit bizarre that different language
| editions of Wikipedia contain different information. It seems the
| focus should be more on translation than content creation. Maybe
| that isn't practical with the current structure, but surely the
| aim should be a definitive knowledge graph rather than a
| disparate and unevenly duplicated set of articles. Just my two
| cents - I am sure many have put a lot of thought into how to best
| tackle this.
| telesilla wrote:
| I'd rather see Wikipedia find a way to link these different
| sites in more interesting ways, for example if I go to the
| entry for Carnival (https://en.wikipedia.org/wiki/Carnival),
| why doesn't it link me to the Brazilian
| (https://pt.wikipedia.org/wiki/Carnaval), Spanish
| (https://es.wikipedia.org/wiki/Carnaval) or Italian
| (https://it.wikipedia.org/wiki/Carnevale) entries from which I
| might learn more, using auto-translate?
| carlinmack wrote:
| All of those languages and more are linked in the sidebar,
| what would you prefer to see?
| telesilla wrote:
| You know, I never knew that that sidebar was a link to the
| same entry in different languages - thanks! Still, it makes
| me wonder if there is a way to open up more content
| in other languages, so that those who contribute more in-
| depth can somehow have that content be shared on other
| language pages more transparently. But, I never studied
| library science and I'm sure finer minds than mine have
| considered this problem.
| jcranmer wrote:
| The tricky thing is that any text content has to have
| translation. You might be able to get away with not
| translating maps, since place names tend to be more
| stable (or at least generally pretty easy to work out)
| across different languages. For example, "Pologne-
| Lituanie" is going to be within the capability of most
| English speakers to work out, even if they've never heard
| of "Poland-Lithuania".
|
| It is possible to link images and other things via
| Wikimedia, and my understanding is that Wikipedia does
| push for people to do this.
| wodenokoto wrote:
| It links to articles in over 80 languages. So on one hand it
| does a really good job at cross linking. On the other hand,
| missing out on linking to the languages you mention seems
| like a huge error.
| samatman wrote:
| It does link to all those languages. I think what the
| grandparent was referring to is that there's no way to
| indicate that an article in another language might be
| interesting in some way.
|
| Such as, for Carnival, being written in a language spoken
| by a people who celebrate it, or for Alexander the Great,
| highlighting the languages spoken in territories he
| conquered.
|
| It's an interesting proposal, but I get a headache just
| thinking about the politics of implementing it.
| Polylactic_acid wrote:
| I'm guessing the task is just too hard so this is the next best
| option.
| For all of the versions to contain the same content you
| would have to have every edit made to at least the English
| version and optionally another version. What happens when
| someone who only knows a non-English language wants to make an
| edit? Does the site ping a user who knows both languages to
| translate it? It's just easier to let the versions be split.
| StavrosK wrote:
| How do you mean? I'm fine with the fact that the Greek
| Wikipedia doesn't contain an article about the Boston Tea
| Party, but I like that it contains an article about the 1821
| rebellion. Requiring the information to be the same across
| languages would mean that either both should be translated, or,
| if no translator can be found, one should be deleted.
|
| EDIT: Or do you mean contain the same information between
| different languages of a specific article?
| 4cao wrote:
| This endeavor looks largely orthogonal to what the objectives of
| an online encyclopedia should be. Creating as many stub articles
| as possible and filling them with "formulaic, generic, and
| reusable templated sentences with spots for specific information"
| seems more like a recipe for an automated content farm than for
| "disseminating the sum of _human_ knowledge."
|
| It would be most interesting to know what the 148 active Cebuano
| Wikipedia users think of the 5,331,028 articles the bot created,
| ostensibly for them. Too bad nobody apparently cared to ask.
|
| In particular, since Cebuano speakers are likely to be fluent in
| Tagalog and/or English as well, they can easily use one of the
| other Wikipedia editions too. Without the hyperactive bot, the
| much smaller Cebuano Wikipedia would arguably be more relevant,
| reflecting topics truly of interest to the community.
|
| While the number of articles is a convenient way of comparing
| Wikipedia language editions, it only works as such to the extent
| that the articles are kept to a certain standard.
| It seems to me
| that what we are observing here is yet another example of the
| situation that when a measure becomes a target it ceases to be a
| good measure.
___________________________________________________________________
(page generated 2020-02-24 23:00 UTC)