[HN Gopher] Who lusts for certainty lusts for lies
       ___________________________________________________________________
        
       Who lusts for certainty lusts for lies
        
       Author : hprotagonist
       Score  : 349 points
       Date   : 2023-09-26 10:50 UTC (12 hours ago)
        
 (HTM) web link (www.etymonline.com)
 (TXT) w3m dump (www.etymonline.com)
        
       | laura_g wrote:
       | What is it specifically about the 1970/80s that causes this dip?
       | Was there an explosion of this academic writing around that era
       | or something else to have this effect?
        
       | thfuran wrote:
       | That or maths. Though I seem to recall a quote about
       | statistics...
        
         | hprotagonist wrote:
         | in the case of ngrams, both!
        
           | thfuran wrote:
           | Yes, I think (as the article says) using ngrams can easily
           | land you in the camp of telling lies with statistics.
        
       | tensor wrote:
       | The authors assert that the ngram statistics for "said" are
       | wrong, and imply that they have evidence of the contrary, but
       | they don't provide the evidence. Looking at their own website,
       | all they provide is google ngram statistics:
       | https://www.etymonline.com/word/said#etymonline_v_25922.
       | 
       | This coupled with the huge failing of not displaying zero on the
       | y-axis of their graph, and even _interpreting_ the bad graph
       | wrong, makes me not believe them at all. A very low quality
       | article.
        
         | coldtea wrote:
         | A low effort comment. That "said" hasn't declined and risen
         | the way shown isn't what needs evidence.
         | 
         | It's the extraordinary claim that it has that does.
         | 
         | That claim is Google's, and before accusing the author of the
         | blog, maybe ask how representative their unseen dataset is.
         | Should we take statistics with no knowledge of their input set
         | at face value because "trust Google"?
        
           | tensor wrote:
           | Google isn't making any such claim. It's merely
           | providing fun statistics based on their data set. With that
           | context, when I read a headline claiming that the statistics
           | are "wrong," it would imply that the counts are somehow off.
           | Maybe due to a bug in the algorithm or the like.
           | 
           | Instead, we get a strawman put up where they misrepresent
           | what the data set is, make up things that its "claiming,"
           | fail to investigate the underlying data sources and look into
           | "why" they see the trend they see, and also fail to provide
           | any alternative data.
           | 
           | It's cheap and snobby grandstanding, ironically complete with
           | faulty interpretations of the little data they DO present.
        
             | mattigames wrote:
             | But Google is claiming such a thing by calling it "trends",
             | which the dictionary defines as "a general direction in
             | which something is developing or changing." If they didn't
             | want to create such misunderstandings, they would just call
             | it "word frequency on Google books" so the biases of the
             | data would be a lot more clear.
        
         | prepend wrote:
         | It's hard to present evidence because there's only one source.
         | So the article basically calls out flaws in the methodology of
         | Google Books/Ngram.
         | 
         | I think this is reasonable. As otherwise we end up accepting
         | things that exist solely, but are flawed. Just because
         | something exists and is easy to use doesn't mean it's right.
         | 
         | Just like the answer to "the most tweeted thing is X therefore
         | it is most popular and important" does not require a separate
         | study to find the truth. It's acceptable just to say "this is a
         | stupid methodology, don't accept it just because that's what
         | twitter says."
        
         | lolc wrote:
         | A decline to half the usage of "said" within 6 decades,
         | followed by a recovery to the previous level within two
         | decades? Show me evidence that the English language changed so
         | fast in that way. It's extraordinary and you'd have to bring
         | something convincing. Otherwise I believe their hypothesis and
         | their conclusion that ngrams are bunk.
         | 
         | Yeah they interpreted the "toast" graph wrong. They should be
         | more careful to read shitty graphs that cut off at the low
         | point.
        
           | pixelesque wrote:
           | It's possible (but I think unlikely) it could be somewhat due
           | to different usage of words rather than the English language
           | changing completely (which clearly didn't happen).
           | 
           | i.e. maybe instead of lots of books having direct text like
           | "David said" or "Dora said", over time there was a trend to
           | use a different more varied/descriptive way of describing
           | that, i.e. "David replied" or "Dora retorted"?
        
             | lolc wrote:
             | Yea there may be a shift in usage hidden in those numbers.
             | As this article laments, we can't use ngrams to measure the
             | development of usage between said, replied, and retorted.
        
           | tensor wrote:
           | It depends entirely on what the data set is, and to conclude
           | that it's "wrong" you'd have to consider the underlying data
           | too. Google ngrams makes no claim to be a consistent
           | benchmark type data set. Over time the content it's based on
           | shifts, which can cause effects like this.
           | 
           | To make any sort of claim like "this word's usage changes
           | over time" in an academic sense you'd need to include a
           | discussion of the data sources you used and why those are
           | representative of word usage over time. The fact that they'd
           | even try to use google ngrams in this way shows how little
           | they actually researched the topic.
           | 
           | Google ngrams is a cute data set that can sometimes show
           | rough trends, but it's not some "authoritative source on
           | usage over time" and it doesn't claim to be.
           | 
           | The authors, on the other hand, are claiming to be
           | authoritative and thus the burden of evidence on their claims
           | is far far far higher. I didn't even get into their
           | completely unobjective and vague accusations of "AI" somehow
           | doing something bad. Ngrams don't involve AI, it's simple
           | word counting.
        
             | lolc wrote:
             | The way I read it, the article was a rant about how people
             | shouldn't be using ngrams to prove things.
        
         | lolinder wrote:
         | EtymOnline isn't in the business of tracking shifts in the
         | popularity of words over time, they set out to track shifts in
         | _meaning_. So it's understandable that they don't have any
         | specific contrary evidence in their listing for "said".
         | 
         | As for why they don't include the evidence in TFA, as others
         | have noted, it's the extraordinary claim that "said" dropped to
         | nearly 1/3 of its peak usage that needs extraordinary evidence
         | backing it up. It's plenty sufficient for them to say "this
         | doesn't make any sense at all on its face, and is most likely
         | due to a major shift in the genre makeup of Google's dataset".
        
         | wrsh07 wrote:
         | I think what you want is for someone (yourself, me, the author)
         | to review newspapers or some similar source and determine how
         | the frequency percent changes over time for the word "said".
         | 
         | This is a reasonable request, but I also think it's fine for
         | the author to state it _as an expert_ that newspapers continued
         | using said at a similar frequency. The story they tell is
         | plausible, and I don't really think the burden of proof is on
         | them.
        
       | vlz wrote:
       | While the point made by the authors is certainly a valid one,
       | it's a bit sneaky and not very fitting to their overall message
       | that they have the Y-axes on the ngram graphs not 0-indexed. This
       | makes the google results seem more extreme than they in fact are
       | and is a bit of misdirection in itself.
       | 
       | Compare e.g. to the actual ngram viewer which seems to index by 0
       | per default:
       | 
       | https://books.google.com/ngrams/graph?content=said&year_star...
       | 
       | https://books.google.com/ngrams/graph?content=said&year_star...
        
         | boxed wrote:
         | Such a shame too as the point would be equally valid without
         | the graph-lies.
        
           | chefandy wrote:
           | Kind of. The author could fix a lot of their problems with
           | the very prominent dropdown above the graph letting them
           | select the collection-- English fiction for example. The long
           | s character can be tricky for OCR, but is not likely relevant
           | to most people's casual use of the tool. I worked on a team
           | that overcame it in a high volume scanning project so they
           | should be able to correct that with software and their
           | existing page images. The plurals criticism is just wrong--
           | you can even do case sensitive searches.
           | 
           | It's not perfect, but it's not useless, and it's not a
           | "lie"-- it's just a blunt instrument. Even if the criticism
           | was factually correct, 'proving' that you can't do fine work
           | with a blunt instrument is of dubious value.
           | 
           | I think a lot of folks around here are super thirsty to see
           | big tech companies get zinged and when it happens, their fact
           | checking skills suffer.
        
       | stefantalpalaru wrote:
       | [dead]
        
       | nerdponx wrote:
       | This is the fundamental problem of data analysis: your analysis
       | is only as good as your data.
       | 
       | This is not an easy problem.
       | 
       | It's hard in general to evaluate data quality: How do we know
       | when our data is good? Are we sure? How do we measure that and
       | report on it?
       | 
       | If we do have some qualitative or quantitative assessment of data
       | quality, how do we present it in a way that is integrated with
       | the results of our analysis?
       | 
       | And if we want to quantitatively adjust our results for data
       | quality, how do we do that?
       | 
       | There are answers to the above, but they lie beyond the realm of
       | a simple line chart, and they tend to require a fair amount of
       | custom effort for each project.
       | 
       | For example in the Google Ngrams case, one could present the data
       | quality information on a chart showing the composition of data
       | sources over time, broken out into broad categories like
       | "academic" and "news". But then you have to assign categories to
       | all those documents, which might be easy or hard depending on how
       | they were obtained. And then you also have to post a link to that
       | chart somewhere very prominently, so that people actually look at
       | it, and maybe include some explanatory disclaimer text. That
       | would help, but it's not going to prevent the intuitive reaction
       | when a human looks at a time series of word usage declining.
       | 
       | Maybe a better option is to try to quantify the uncertainty in
       | the word usage time series and overlay that on the chart. There
       | are well-established visualization techniques for doing this. But
       | how do we quantify uncertainty in word usage? In this case, our
       | count of usages is exact: the only uncertainty is uncertainty
       | related to sampling. In order to quantify uncertainty, we must
       | estimate how much our sample of documents deviates from all
       | documents written at that time. It might be doable, but it
       | doesn't sound easy. And once we have that done, will people
       | actually interpret that uncertainty overlay correctly? Or will
       | they just look at the line going down and ignore the rest?
       | 
       | Your analysis is only as good as your data. This has been a
       | fundamental problem for as long as we have been trying to analyze
       | data, and it's never going to go away. We would do well to
       | remember this as we move into the "AI age".
       | 
       | It also says something about us as well: throughout our lives, we
       | learn from data. We observe and consider and form opinions. How
       | good is the data that we have observed? Are our conclusions
       | valid?
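
A rough illustration of the "quantify the uncertainty" idea above: a minimal sketch of a percentile bootstrap that resamples whole documents and recomputes a word's relative frequency each time. The toy corpus, function names, and parameters are hypothetical, not anything Google Ngrams exposes. It also only captures variability under the assumption that the documents are a random sample; it cannot detect or correct a skewed selection of sources, which is the harder problem the comment describes.

```python
import random

def relative_frequency(docs, word):
    """Share of all tokens across docs that equal `word`."""
    hits = sum(doc.count(word) for doc in docs)
    total = sum(len(doc) for doc in docs)
    return hits / total if total else 0.0

def bootstrap_interval(docs, word, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole documents with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        relative_frequency(rng.choices(docs, k=len(docs)), word)
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical toy corpus for a single year: each document is a list of tokens.
docs = [
    ["she", "said", "he", "said", "yes"],
    ["the", "results", "indicate", "that"],
    ["david", "said", "no"],
    ["we", "measure", "the", "effect"],
]

point = relative_frequency(docs, "said")
low, high = bootstrap_interval(docs, "said")
print(f"point estimate {point:.4f}, interval [{low:.4f}, {high:.4f}]")
```

Resampling documents rather than individual tokens is a deliberate choice here: word occurrences cluster within a document, so treating tokens as independent would likely understate the spread.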
        
       | gcanyon wrote:
       | From the comments on that page: "Do publishers still order many
       | carloads of "is" each year during spring thaw..."
       | 
       | In Dictionopolis they do! Any Phantom Tollbooth peeps here?
       | 
       | https://en.wikipedia.org/wiki/The_Phantom_Tollbooth
        
       | gitgud wrote:
       | Reminds me of a feeling I had when solving a jigsaw puzzle:
       | 
       |  _Everything must fit together to reveal the big picture!_ ...
       | 
       | In reality things almost _never_ fit together to reveal some big
       | picture... so trying to make them fit like puzzle pieces often
       | leads to false conclusions
        
       | digitalsushi wrote:
       | When a measure (certainty) becomes a target, it ceases to be a
       | good measure (lies)
        
       | gniv wrote:
       | BTW, that glyph should have a small bar on the left, but I don't
       | see it in the article (in Chrome on Mac).
       | 
       | https://www.compart.com/en/unicode/U+017F (that looks more like
       | an s)
       | 
       | Edit: But I see it in fixed-width font:                   s
        
         | bradrn wrote:
         | > that glyph should have a small bar on the left
         | 
         | It depends on the typeface. My browser's fixed-width font, for
         | instance, doesn't display a bar.
        
       | brightball wrote:
       | "Only a fool is sure of anything, the wise man is always
       | guessing." - MacGyver
        
       | dotsam wrote:
       | > It doesn't look like an indicator of the diachronic change in
       | the popularity...
       | 
       | I thought all change is diachronic.*
       | 
       | I looked it up and found out that 'diachrony' is a term of art in
       | linguistic analysis, contrasting with synchronic analysis.
       | 
       | https://en.wikipedia.org/wiki/Diachrony_and_synchrony
       | 
       | *Edit: I initially thought that saying 'diachronic change' was
       | like saying 'three-sided triangle'. But thinking about it, I
       | suppose things do change in space but not time, e.g. 'the pattern
       | changes abruptly'
        
       | robertlagrant wrote:
       | > Who Lusts for Certainty Lusts for Lies
       | 
       | Well, maybe[0].
       | 
       | [0] with thanks to https://xkcd.com/552
        
       | diogenes4 wrote:
       | At this point I'm waiting for data to show up validating that
       | google ngrams has use.
        
       | taeric wrote:
       | Is this that the n-grams are wrong, or that they are limited in
       | what you can do/say with them? I find the data fun, but I'm not
       | entirely sure what to make of it. You will be doing a query on
       | past books on today's lexicon. Which just feels wrong.
       | 
       | As an easy example that I know, if you search for "the", you will
       | not find a lot of hits. Which, is mostly fair, as historically we
       | know that "th" dropped off around the 1400s. That said, add in
       | "ye" and you see a ton of its use.
       | 
       | Is that an intentional feature of n-grams? Feels more like an
       | encoding mistake passed down through the ages. Would be like
       | getting upset at the great vowel shift and not realizing that our
       | phonetic symbols are not static universal truths.
        
       | bluetomcat wrote:
       | You can never construct a representative image of the past. You
       | are operating with a limited amount of sources which have
       | survived in one form or another. They are not evenly distributed
       | across time and space. There is an inherent "data loss" problem
       | when a person dies - gone are all the impressions, unwritten
       | experiences, familiar smells. Even a living person's memory may
       | not be reliable at one point.
        
         | psychoslave wrote:
         | That's why I always found it so strange that only those with
         | fame/wealth distorted social representations end up with a
         | Wikipedia biography.
        
           | not_knuth wrote:
           | Wikipedia is not meant to be an archive of _all_ information.
           | It's meant to be an encyclopedia of things that are
           | _notable_ [1], which is probably where the confusion comes
           | from.
           | 
           | As you can imagine, the topic of what notability is, has been
           | discussed at length since Wikipedia's inception [2].
           | 
           | [1] Notability according to Wikipedia
           | https://en.wikipedia.org/wiki/Wikipedia:Notability
           | 
           | [2] Oldest Wikipedia talk comments I could find on Notability
           | https://en.m.wikipedia.org/w/index.php?title=Special:History.
           | ..
        
         | pintxo wrote:
         | At one point? Human memory is surprisingly unreliable.
         | 
         | One example to test for yourself:
         | https://youtu.be/vJG698U2Mvo?si=16fwk8wG8Yyhim5t
        
           | psychoslave wrote:
           | That is not even memory bias here.
           | 
           | Sure, what you pay attention to will impact what you
           | remember, but this experiment goes further and shows how your
           | attention can be manipulated to be blind to plotted events.
        
             | Miraltar wrote:
             | Exactly, but the point is still valid. The Mandela Effect is a
             | great example of it.
        
           | ongy wrote:
           | Serious question
           | 
           | Are you supposed to not see the gorilla? I assumed it's the
           | trap and there's some slightly less obvious catch in there.
        
       | djha-skin wrote:
       | The best part of this article is perhaps the following critique
       | of ngrams and by extension their popular use in modern
       | algorithms:
       | 
       | > The text of Etymonline is built entirely from print sources,
       | and is done entirely by human beings. Ngrams are not. They are
       | unreliable, a sloppy product of an ignorant technology, one made
       | to sell and distract, _one never taught the difference between
       | "influence" and "inform."_
       | 
       | > Why are they on the site at all? Because now, online, _pictures
       | win and words lose_. The war is over; they won.
       | 
       |  _One never taught the difference between "influence" and
       | "inform"._ What a scathing rebuke of our modern world and the
       | social media that is part of it. Algorithms that attempt to
       | quantify human speech and interaction and get it wrong most of
       | the time in their quest to maximize their owner's profits.
       | 
       | This somber warning is especially poignant in an age more and
       | more ruled by generative AI, which I'm told is essentially an
       | ngram predictor.
        
         | acyou wrote:
         | Influence and inform are two sides of the same moral coin,
         | where we claim others' ideas aren't their own, whereas we are
         | the virtuous informed ones who draw our own conclusions.
         | 
         | The low-pass filter of the mind only allows in what fits
         | somewhere inside the existing framework. If you don't reject
         | something, then being informed by it and being influenced by it
         | are the same thing. In that framework, people who claim to be
         | informed come off as high and mighty and a little lacking in
         | self consciousness.
        
           | gpderetta wrote:
           | I inform, you influence, he propagandizes.
        
         | thrdbndndn wrote:
         | > The text of Etymonline is built entirely from print sources,
         | and is done entirely by human beings. Ngrams are not.
         | 
         | I'm confused about this part actually. I assume by "entirely
         | from print sources" it means it does not include digital
         | sources? That doesn't sound very relevant to the issues
         | mentioned in the article though: unless it uses the "complete"
         | set of _all_ print sources, it totally could have the same
         | skewed-dataset issues too; and humans can make the same mistakes
         | as OCR does.
        
           | sudobash1 wrote:
           | Etymonline compiles the information on etymology and
           | historical usage from printed books (eg the Oxford English
           | Dictionary). That is what is being referred to here. They are
           | not having humans tally up different words from books. That
           | data is entirely from ngrams.
        
       | crazygringo wrote:
       | The n-grams aren't _wrong_, but it is a real problem that the
       | underlying corpus distribution changes massively over time (in
       | this case, proportion of academic vs. non-academic works).
       | 
       | This is a really devilish problem with no easy answer.
       | 
       | Because on the one hand, it's certainly easy enough to normalize
       | by genre -- e.g. fix academic works at 20%, popular magazines at
       | 20%, fiction books at 40%, and so forth.
       | 
       | But the problem is that the popularity of genres changes over
       | time separately in terms of supply and demand, as well as
       | consumption of printed material overall. Fiction written might
       | increase while fiction consumed might decrease. Or the
       | consumption of books might decrease as television consumption
       | increases.
       | 
       | So there isn't any objectively "right" answer at all.
       | 
       | But it would be nice if Google allowed you to plot popularity _by
       | genre_ -- I think that would help a lot in terms of determining
       | where and how words become more or less common.
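
To make the "normalize by genre" idea above concrete, here is a minimal sketch that holds genre weights fixed across years and compares the result with the pooled frequency. The two genres, the weights, and every count are hypothetical numbers chosen for illustration; as the comment notes, there is no objectively right choice of weights.

```python
def pooled_frequency(counts):
    """Relative frequency over the whole corpus, ignoring genre."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits / total

def genre_normalized_frequency(counts, weights):
    """Weighted average of per-genre relative frequencies.

    counts:  {genre: (word_hits, total_tokens)} for one year
    weights: {genre: fixed share}, which must sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * counts[g][0] / counts[g][1] for g, w in weights.items())

# Hypothetical fixed weights and counts. Within each genre the word's rate
# is identical in both years; only the genre mix of the corpus changes.
weights = {"fiction": 0.5, "academic": 0.5}
year_1900 = {"fiction": (800, 100_000), "academic": (20, 20_000)}
year_1975 = {"fiction": (400, 50_000), "academic": (150, 150_000)}

pooled_1900 = pooled_frequency(year_1900)
pooled_1975 = pooled_frequency(year_1975)
norm_1900 = genre_normalized_frequency(year_1900, weights)
norm_1975 = genre_normalized_frequency(year_1975, weights)

print(f"pooled:     {pooled_1900:.5f} -> {pooled_1975:.5f}")
print(f"normalized: {norm_1900:.5f} -> {norm_1975:.5f}")
```

In this toy setup the pooled figure falls sharply while the fixed-weight figure does not move, because nothing changed except how much academic text is in the corpus.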
        
       | hyperific wrote:
       | It seems to me that Google Ngram isn't _wrong_. It's reporting
       | statistics on the words it correctly identified in the corpus.
       | The problem is the context of the statistics. You may somewhat
       | confidently say the word "said" dips in usage at such and such
       | time _in the Google Books corpus_. You can more confidently say
       | it dips at such and such time for the subset of the corpus for
       | which OCR correctly identified every instance of the word. But
       | you can't make claims in a broader context like "this word
       | dipped in usage at such and such time" without having sufficient
       | data.
        
         | dredmorbius wrote:
         | And this is why _sampling methodology_ is so much more vastly
         | important in drawing inferential population statistics than
         | _sample size_.
         | 
         | Sample 1 million books from an academic corpus, and you'll turn
         | up a very different linguistic corpus than selecting the ten
         | best-selling books for each decade of the 20th century.
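
A small simulation of the point above about sampling methodology versus sample size, as a sketch under made-up assumptions: a population of 100,000 books, 80% fiction and 20% academic, with hypothetical per-genre rates of "said" per 1,000 words. It compares a large sample drawn mostly from academic books with a small simple random sample.

```python
import random

rng = random.Random(42)

# Hypothetical population: (genre, occurrences of "said" per 1,000 words).
FICTION_RATE, ACADEMIC_RATE = 8.0, 1.0
fiction = [("fiction", FICTION_RATE)] * 80_000
academic = [("academic", ACADEMIC_RATE)] * 20_000
population = fiction + academic

def mean_rate(sample):
    """Average per-book rate in a sample."""
    return sum(rate for _, rate in sample) / len(sample)

true_mean = mean_rate(population)

# Large but skewed sample: 20,000 books, 90% of them academic.
big_biased = rng.sample(academic, 18_000) + rng.sample(fiction, 2_000)

# Small simple random sample: 100 books drawn from the whole population.
small_random = rng.sample(population, 100)

biased_error = abs(mean_rate(big_biased) - true_mean)
random_error = abs(mean_rate(small_random) - true_mean)

print(f"true mean:           {true_mean:.2f}")
print(f"20,000 skewed books: {mean_rate(big_biased):.2f}")
print(f"100 random books:    {mean_rate(small_random):.2f}")
```

Adding more books to the skewed sample would not move its estimate toward the population value; only changing how the books are selected does.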
        
         | gmd63 wrote:
         | Just as "it depends" is a meme for economists, "need more data"
         | is the galaxy-brain statistician meme.
         | 
         | Until you've solved the grand unified theory, you can never be
         | fully confident in the completeness of your data or statistical
         | inferences.
         | 
         | What's wrong is misleading the public away from this
         | understanding.
        
       | thomasfromcdnjs wrote:
       | Does this criticism of ngrams also translate to keyword trends
       | when considering SEO/SEM?
        
       | andrewflnr wrote:
       | The title is true for a lot more areas of life than linguistics.
       | There are no shortcuts to truth, DVD anyone who tries to offer
       | you one is probably trying to sell you something.
        
         | madsbuch wrote:
         | The title is about certainty and not truth.
         | 
         | > Who Lusts for Certainty Lusts for Lies
         | 
         | I think this is one of those one-liners that sounds good, but is
         | bogus on closer inspection.
         | 
         | The article talks about history. In that context it might
         | make sense as it is hard to say something with certainty.
         | 
         | But in every speech I can say things with certainty without
         | lying.
         | 
         | If we furthermore drag the word certainty out of a philosopher's
         | grip and apply a layman meaning to it, then many things are
         | certain as the word can also mean commitment.
        
           | RockyMcNuts wrote:
           | Who demands certainty demands bullshit would be more
           | accurate.
        
           | Delk wrote:
           | I don't think it's bogus.
           | 
           | I've seen people who strongly crave for (a feeling of)
           | certainty prefer simplified categorizations and false
           | absolutes to complexity that doesn't offer absolute certainty
           | and discrete clarity.
           | 
           | Similarly, some things aren't readily quantifiable, and in
           | some cases any quantification might be a great
           | oversimplification at best. In those cases wanting a
           | quantified and measurable answer instead of a more complex
           | answer with less (of a feeling of) certainty can amount to
           | wanting a lie. Or at least to wanting an answer that feels a
           | lot more certain and true than it actually is.
           | 
           | I think that's what the post is about.
           | 
           | Of course the title isn't absolutely true either. Of course
           | you can say and find things that are true and (to a good
           | approximation) certain. But that's not really what the post
           | or its title are trying to say.
        
           | speak_plainly wrote:
           | There's an entire field of study dedicated to these puzzles:
           | epistemology.
           | 
           | https://plato.stanford.edu/entries/certainty/
        
           | AnimalMuppet wrote:
           | In every speech you can say _some_ things with certainty
           | without lying.
           | 
           | But I think the point of the saying is in the other
           | direction. If you are _listening_ to a speech, the things
           | that the speaker can say with certainty may not be the ones
           | where you want certainty. And if you demand certainty on
           | those things, you will find those who will give it to you.
           | But the certainty itself is a lie - that's why the speaker
           | can't (honestly) say those things with certainty.
           | 
           | What is the optimum political program for the United States?
           | There are plenty of people willing to tell you with (apparent)
           | certainty what the answer is. The truth is that nobody knows
           | with certainty, and so the answers that sound certain are
           | lies. The actual program may be correct - _may_ be - but the
           | certainty itself is a lie.
           | 
           | This is often true in linguistics, and history, and politics,
           | and economics. Don't demand certainty where there is none.
        
         | ta8645 wrote:
         | This hits close to home with all the appeals to authority over
         | the last few years. With absolute confidence they were holders
         | of the truth, "trust the science!".
        
           | andrewflnr wrote:
           | Kinda, but most of the anti-scientific bullshit out there is
           | a symptom of precisely this phenomenon. _Actual_ science
           | cannot offer absolute certainty, so people reach for whatever
           | alternate theory offers the feeling of certainty. Blind faith
           | in "the science" kind of works, and even gets pretty decent
           | practical results, but you know what's structurally really
           | hard to disprove and thus amenable to feeling certain?
           | Conspiracy theories!
        
             | ta8645 wrote:
             | > Conspiracy theories!
             | 
             | I hear what you're saying. In the end, we have to believe
             | _something_ -- on less than perfect information.
             | 
             | But understanding human nature, isn't a conspiracy theory.
             | And accepting obviously overreaching statements of "fact",
             | that literally nobody had the data to state unequivocally,
             | is not following the science.
             | 
             | It wasn't so long ago, that most people understood big
             | pharma was a profit seeking machine, that wasn't primarily
             | motivated by what is best for humanity. Overstating the
             | risks of Covid, and pretending that we faced an existential
             | threat, made everyone forget that truth, and
             | unquestioningly believe that only the purest of intentions
             | motivated the industrial/media response.
        
         | gilleain wrote:
         | What does "DVD anyone" mean?
         | 
         | (Perhaps a roundabout way to say "Make obsolete", as a way to
         | say "Get rid of"?)
        
           | mancerayder wrote:
           | I just can't CD what that means either.
        
             | Tactician_mark wrote:
             | It's a Blu-ray mystery to me.
        
               | psychoslave wrote:
               | It fades away vinyl from my ens.
               | 
               | https://en.wiktionary.org/wiki/ens
        
               | compiler-devel wrote:
               | The redditification of HN is sad. With reddit de facto
               | purging third-party apps with increased API prices, we
               | now see reddit-tier conversations spamming message boards
               | like HN.
        
               | sk0g wrote:
               | https://news.ycombinator.com/newsguidelines.html
               | 
               | > Please don't post comments saying that HN is turning
               | into Reddit. It's a semi-noob illusion, as old as the
               | hills.
        
               | decremental wrote:
               | [dead]
        
           | thechao wrote:
           | Typo insertion where the autocorrect hallucinates a word?
           | Happens to me sometimes...
        
             | andrewflnr wrote:
             | This. Sorry everyone.
        
             | adrianmonk wrote:
             | It's probably supposed to be "and" instead of "DVD". Both
             | words have a similar shape on the keyboard, especially if
             | you're doing swipe-style smartphone keyboard input.
        
       | cainxinth wrote:
       | Agnostics have been saying this for years (jk... sorta).
        
         | guardian5x wrote:
         | You are not wrong there. This title could also be an article
         | about atheism and religion.
        
         | lvass wrote:
         | Surely you meant to write agnostics.
        
           | cainxinth wrote:
           | Corrected it
        
       | ttoinou wrote:
       | The y-axis does not start at zero. So basically the author doesn't
       | know how to read a graph... what am I missing?
        
       | dahart wrote:
       | > Ngram says toast almost vanishes from the English language by
       | 1980, and then it pops back up.
       | 
       | The Ngram plot does not say that. It shows usage dropping ~40%
       | (since 1800). It's indeed a problem that the graph Y axis doesn't
       | go to zero, as others have pointed out. But did the etymonline
       | authors really not notice this before declaring incorrectly what
       | it says? I would find that hard to believe (especially
       | considering the subsequent "see, no dip" example that has a zero
       | Y and a small but visible plateau around 1980), and it's ironic
       | considering the hyperbolic and accusatory title and opening
       | sentence.
        
         | lolinder wrote:
         | The graph axis isn't the only problem. The word "toast" did not
         | drop in usage by 40%, Google's dataset shifted dramatically
         | towards a different genre than it was composed of previously.
         | I've been in conversations with people trying to explain those
         | drops in the 70s, and no one (myself included) realized that it
         | was such a dramatic flaw in the data.
        
           | bee_rider wrote:
           | Is there no way to filter out particular data sets? This
           | seems like a pretty huge limitation.
        
           | dahart wrote:
           | That's fair, the article has a very valid point, which would
           | be made even stronger without the misreading of the plots
           | they're critiquing, whether it was accidental or intentional.
           | I always thought Ngrams were weird too, I remember in the
           | past thinking some of the dramatic shifts it shows were
           | unlikely.
        
       | tantalor wrote:
       | Why the title change?
       | 
       | Title on the site is "Who Lusts for Certainty Lusts for Lies"
       | 
       | Title here is "Google Ngram Viewer n-grams are wrong"
        
         | 0xfae wrote:
         | HN in general doesn't like "editorialized" titles. HN titles
         | are meant to be a factual representation of what you are going
         | to read without the attention grabbing (albeit clever) title.
        
           | tantalor wrote:
           | Er no.
           | 
           | > Otherwise please use the original title, unless it is
           | misleading or linkbait; don't editorialize.
           | 
           | The "don't editorialize" guideline is meant for the
           | _submitter_ to not change the title to make some point.
           | 
           | The site can & should use whatever title it wants. So be it
           | if they want to editorialize. That's their prerogative.
        
             | dredmorbius wrote:
             | Both your and GP comment are inaccurate and/or unclear.
             | 
             | HN _prefers_ but does not _require_ the original title.
             | 
             | HN _does not permit_ submitter editorialising.
             | 
             | Where the original title is clickbait, _which may include
             | editorialising_ , HN requests that submitters change the
             | title, if at all possible to some phrase within the
             | article.
             | 
             | Another de facto rule concerns "title fever", which is when
             | a title is so distracting that it overwhelms the content of
             | the article in discussion.
             | 
             | From the guidelines:
             | 
             |  _If the title includes the name of the site, please take
             | it out, because the site name will be displayed after the
             | link._
             | 
             |  _If the title contains a gratuitous number or number +
             | adjective, we 'd appreciate it if you'd crop it. E.g.
             | translate "10 Ways To Do X" to "How To Do X," and "14
             | Amazing Ys" to "Ys." Exception: when the number is
             | meaningful, e.g. "The 5 Platonic Solids."_
             | 
             |  _Otherwise please use the original title,_ unless it is
             | misleading or linkbait; _don 't editorialize._
             | 
             | <https://news.ycombinator.com/newsguidelines.html>
             | 
             | Some of dang's comments on the issue:
             | 
             | - On changing original title (from yesterday, and NPR to
             | boot): <https://news.ycombinator.com/item?id=37625424>.
             | Also: <https://news.ycombinator.com/item?id=36655892>
             | 
             | - On substituting a phrase from the article:
             | <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>
             | 
             | - On submitter editorialising:
             | <https://news.ycombinator.com/item?id=8357252>
             | <https://news.ycombinator.com/item?id=35163133>
             | 
             | - Distracting titles:
             | <https://news.ycombinator.com/item?id=37137478>.
             | Particularly cases where "the thread will lose its mind":
             | <https://news.ycombinator.com/item?id=22176686>
             | 
             | - "Title fever": (Beginning 4 'graphs in)
             | <https://news.ycombinator.com/item?id=20429573>
        
       | AugustoCAS wrote:
       | I'm going to use that title on the next conversations I have
       | about estimates, in particular in the context of 'we need to know
       | that this piece of work will be started in 4 months and finished
       | in 8'. Those conversations definitely suck for me.
        
         | js8 wrote:
         | Though you should also remember "who lusts for promotion lusts
         | for telling lies".
        
         | CapitalistCartr wrote:
         | Only one goal can be first. If you want to set absolute dates,
         | all other requirements must be subordinate to that. In which
         | case, sure, we can absolutely meet it.
        
           | ChrisMarshallNY wrote:
           | There's that classic poster that you see in almost every auto
           | mechanic's shop.
           | 
           |     Good    Fast    Cheap
           | 
           |          Pick 2
        
             | nuancebydefault wrote:
             | Not so rarely, you even need to settle for picking 1
        
         | jklinger410 wrote:
         | This title is an absolute banger
        
         | [deleted]
        
         | d-lisp wrote:
         | [flagged]
        
         | gascoigne wrote:
         | Surely if you have story pointed and T-shirt sized your epics
         | correctly that shouldn't be difficult? /s
        
         | dumbfounder wrote:
         | This guy sucks.
        
           | [deleted]
        
         | fenomas wrote:
         | And boo, incidentally, to whomever changed the HN title - from
         | the most memorably evocative title this site has ever seen to
         | one of the blandest.
        
           | etrevino wrote:
           | What was it? I arrived too late.
        
             | fenomas wrote:
             | Sorry, HN previously had TFA's actual title - "Who Lusts
             | for Certainty Lusts for Lies".
        
               | scubbo wrote:
               | I, uhhhh.....I would like to know what TFA is meant to
               | stand for, because I assume it is not "the sucking
               | article", but that was my first thought. Maybe
               | "featured"? Google is only giving me "Teach For America"
               | or "Trade Facilitation Agreement".
        
               | klyrs wrote:
               | Does "fornicating" sound more polite to you?
        
               | iudqnolq wrote:
               | it is the fucking article. or "featured" if you're
               | feeling classy.
        
               | mjochim wrote:
               | I like to read it as The Fine Article.
        
               | idrios wrote:
               | This is the kind of question that doesn't need to be
               | answered with certainty. "The fucking article" is
               | definitely the most fun interpretation of "TFA".
        
               | etrevino wrote:
               | lol, that's pretty good, I agree with you.
        
               | djsavvy wrote:
               | Looks like it's been changed back! What was the "bland"
               | title in the middle?
        
               | Intralexical wrote:
               | "Google Ngram Viewer n-grams are wrong".
        
           | [deleted]
        
           | dahart wrote:
           | The article title is certainly provocative, yes, and that's
           | the problem. Do you want clickbait titles? The article's
           | title is a combination of a platitude, an inaccurate and/or
           | irrelevant statement, and an implied inflammatory accusation.
           | Swapping the title for the more accurate more informational
           | less provocative first line is much better for me, but maybe
           | true that not flinging around the word "lies" could result in
           | fewer clicks.
        
             | fenomas wrote:
             | I don't think "Ngrams are wrong" is what TFA is about. The
             | author isn't an expert on Ngrams and he's not sharing any
             | new information about them; what he's really talking about
             | is how data about language is unreliable, and why Ngram
             | images are on his site even though he knows they're flawed.
             | Personally, I found the original title truer to the article
             | than the current one.
        
             | zem wrote:
             | the word "clickbait" is flung around way too readily these
             | days. a good title is _supposed_ to make you want to read
             | the article, and at its best it is an artistic flourish
             | that enhances the overall piece. and personally, i love
             | that. i enjoy seeing how writers (or editors) come up with
             | good titles, and the fun and interesting ways they relate
             | to the text of the piece. i enjoy when the title is clearly
             | an allusion or reference to something, and chasing it down
             | leads me to learn something new. and i even enjoy when the
             | title is just a pun or play on words, because writers live
             | for moments like that :)
             | 
             | in this case i definitely felt "wow, that's an interesting
             | quote, and i can see what they are getting at. let's read
             | the article to see how it's substantiated or used as a
             | springboard".
             | 
             | clickbait is more "we have some amazing!!!!! information to
             | tell you but to find out what you will have to read the
             | article", e.g. the classic listicle format "10 things we
             | imagined a beowulf cluster of - number 4 will shock you!",
             | the spammy "one weird trick doctors don't want you to know"
             | or the tabloid "john brown's shocking affair!". and yes,
             | that sort of thing is a plague on the internet and i would
             | not like to see more of it, but also that is not what is
             | going on here.
        
           | ComputerGuru wrote:
           | I personally feel like more people will click with this new
           | title. The old one was far too vague and ambiguous for a news
           | aggregation site. I thought the old title would be about
           | scientific papers and trying too hard to get definitive
           | answers out of them.
        
             | dredmorbius wrote:
             | The title and site reward those who'd click through on the
             | original rather than the bland substitute.
        
             | fenomas wrote:
             | Horses for courses, but to me the original title was the
             | forest and the stuff about Ngrams was the trees. As such I
             | found TFA interesting, even though I have no interest in
             | Ngrams or whether they're correct (which is why I
             | definitely would not have clicked on the current title).
        
               | setgree wrote:
               | adding "horses for courses" to my lexicon, TY :)
        
         | 1970-01-01 wrote:
         | At first glance, I thought it was a translated Latin phrase.
         | 
         | desiderat certum, desiderat falsitates
        
       | PaulHoule wrote:
       | Don't like the title, at least for this article.
       | 
       | When it comes to results like this it is more "lusting for
       | clickbait" or the scientific equivalent thereof. (e.g. papers in
       | _Science_ and _Nature_ aren't really particularly likely to be
       | right, but they are particularly likely to be outrageous,
       | particularly in fields like physics that aren't their center)
       | 
       | On the other hand, "Real Clear Politics" always had a toxic
       | sounding name to me since there is nothing "Real" or "Clear"
       | about politics: I think the best book about politics is Hunter S.
       | Thompson's _Fear and Loathing on the Campaign Trail '72_ which is
       | a druggie's personal experience following the candidates around
       | and picking up hitchhikers on the road at 3am and getting strung
       | out on the train and having moments of jarring sobriety like the
       | time when he understood the parliamentary maneuvering that won
       | McGovern the nomination while more conventional journalists were
       | at a loss.
       | 
       | What I do know is 20 years from now an impeccably researched book
       | will come out that makes a strong case that what we believed
       | about political events today was all wrong and really it was
       | something different. In the meantime different people are going
       | to have radically different perspectives and... that's the way it
       | is. Adjectives like "real" and "clear" are an attempt to shut
       | down most of those perspectives and pretend one of those
       | viewpoints is privileged. Makes me think of Baudrillard's
       | thorough shitting on the word "real" in _Simulacra and
       | Simulation_ which ought to completely convince you that people
       | peddling the fake will be heralded by the word "real".
       | 
       | (Or for that matter, that Scientology calls itself the "science
       | of certainty.")
        
         | paulsutter wrote:
         | And it will also be wrong.
         | 
         | > 20 years from now an impeccably researched book will come out
         | that makes a strong case that what we believed about political
         | events today was all wrong and really it was something
         | different
         | 
         | The one good thing about politics is that the motives are
         | crystal clear, politicians want to stay in power first, and
         | only secondarily want to improve things.
         | 
         | Once you know this, everything makes sense. Even if we never
         | find out what "really" happened
        
           | Karellen wrote:
           | > politicians want to stay in power first, and only
           | secondarily want to improve things.
           | 
           | The politicians who want to be in power first, and only
           | secondarily want to improve things, tend to be the
           | politicians in power.
           | 
           | Politicians who want to improve things first do exist, but
           | they tend not to achieve power, because power is not their
           | goal, and they are out-maneuvered by the first type.
           | 
           | Notably, politicians who want to improve things are easily
           | side-tracked by suggesting that their proposed policy is not
           | the best way to improve things, and that some other way would
           | be better. This explains to some degree a lot of infighting
           | on the left, because many do want to genuinely help, but it's
           | never 100% clear what the best way to help is. It also
           | explains why the right can put aside major differences of
           | opinion (2A is important to fight the government who can't be
           | trusted, but support the troops and arm the police!) to
           | achieve power, because acquiring and maintaining power is
           | more important than exactly what you plan to do with it.
        
             | Vt71fcAqt7 wrote:
             | >2A is important to fight the government who can't be
             | trusted, but support the troops and arm the police!
             | 
             | I fail to see the contradiction here. 2A proponents would
             | say that 2A is there for when the government goes wrong, or
             | "when in the Course of human events, it becomes necessary
             | for one people to dissolve the political bands which have
             | connected them with another." At all other times, however,
             | it would be up to the government to enforce the law and
             | protect the people. Destroying the state is a different
             | ideology.
             | 
             | (To be clear, the last few wars may not have been about
             | protecting the people. But that the US has not been
             | attacked since Pearl Harbor may be a result of the
             | investment made in "defence" since then, as well as
             | favourable borders etc.)
             | 
             | In any case 'both sides' have people who actually
             | care about society. And there are people on the left who
             | may simply want power, and complex people who seem to be a
             | bit of both (for example perhaps Lyndon Johnson depending
             | on how you see him).
        
           | bilbo0s wrote:
           | _politicians want to stay in power first, and only
           | secondarily want to improve things._
           | 
           | In all honesty, many don't even want to improve things. Most
           | people with power, love power. It's contrary to their nature
           | to change a system that confers power to themselves. That's
           | true not just in your own nation but in any: the people in
           | power will be resistant to change.
        
           | PaulHoule wrote:
           | That's as close as you will get to a master narrative but it
           | isn't all of it.
           | 
           | Politicians aren't always sure what will win for them, often
           | face a menu of unappetizing choices and have other
           | motivations too. (Quite a few of the better Republicans have
           | quit in disgust in the last decade: I watched the pope speak
           | in front of congress flanked by Joe Biden, then VP and John
           | Boehner, then House Speaker when the pope obliquely said they
           | should start behaving like adults and then Boehner quit a few
           | days later and got into the cannabis business.)
           | 
           | I was an elected member of the state committee of the Green
           | Party of New York and found myself arguing against a course
           | of action that I emotionally agreed with, thought was a
           | tactical mistake, and that my constituents were (it turns out
           | fatally) divided about. It was a strategic disaster in the
           | end.
        
             | paulsutter wrote:
             | You're right, I should have added that politics is also
             | extremely difficult and filled with unpalatable choices.
             | Each of the politicians I have met are intelligent, caring
             | people with a clear grasp of the issues.
             | 
             | And then you see what they do, and you wonder, what the...
        
       | phkahler wrote:
       | Classic mistake of not including zero on the vertical axis of a
       | graph. If you're thinking "but then there won't be so much
       | variation" you're right. Leaving zero off allows small variations
       | to look large.
        
         | mattkrause wrote:
         | Am I alone in thinking that the graph was okay and the text was
         | just indulging in a bit of hyperbole?
         | 
         | It's a sudden ~50% dip, following nearly a century of apparent
         | stability.
        
         | PaulHoule wrote:
         | On the other hand there are the cases where you do want to
         | emphasize small variations. In a control chart showing the fill
         | weight of cereal boxes you certainly don't want zero on the
         | chart. Neither do you want to plot daily temperatures in a city
         | on a chart that includes 0 Kelvin.
        
           | hef19898 wrote:
           | Sure you do, why not? If you don't, show the deviation values
           | (plus and minus) centered around zero again.
        
             | PaulHoule wrote:
             | Not if it means the line looks flat.
        
               | slenk wrote:
               | Sometimes the data is flat...
        
               | thfuran wrote:
               | And many times small variations matter.
        
               | slenk wrote:
               | Yes, the CMB for instance.
        
               | PaulHoule wrote:
               | It sure feels like the temperature in Upstate NY varies
               | by more than 10%!
        
           | Scubabear68 wrote:
           | Exactly. A lot of investment market charts are zoomed in like
           | that because small deviations can matter a lot, and you don't
           | want the base price (or whatever measure you're looking at)
           | to swamp the signal.
        
         | lolinder wrote:
         | Including zero would have helped the "said" graph but not
         | solved it--it just would still look like "said" dropped to
         | almost 1/3 of its prior popularity, when what actually happened
         | is the makeup of the sample changed dramatically.
        
       | jgalt212 wrote:
       | The words of Colonel Nathan R. Jessep come to mind.
        
       ___________________________________________________________________
       (page generated 2023-09-26 23:00 UTC)