[HN Gopher] Word2Vec received 'strong reject' four times at ICLR...
       ___________________________________________________________________
        
       Word2Vec received 'strong reject' four times at ICLR2013
        
       Author : georgehill
       Score  : 282 points
       Date   : 2023-12-18 16:48 UTC (6 hours ago)
        
 (HTM) web link (openreview.net)
 (TXT) w3m dump (openreview.net)
        
       | cs702 wrote:
       | Surely those seemingly smart anonymous reviewers now feel pretty
       | dumb in hindsight.
       | 
       | Peer review does _not_ work for new ideas, because _no one ever_
       | has the time or bandwidth to spend hours upon hours upon hours
       | trying to understand new things.
        
         | mempko wrote:
         | This is not the takeaway I got. The takeaway I got was the
         | review process improved the paper and made it more rigorous.
          | How is that a bad thing? But yes, sometimes reviewers focus
          | on different issues instead of 'is this going to
          | revolutionize A, B, and C'.
        
           | huijzer wrote:
           | I currently have a paper under review (first round) that was
           | submitted the 2nd of August. This is at the second journal.
           | The first submission was a few months before that.
           | 
            | I'm not sure peer review makes things more rigorous, but it
            | surely makes them slower.
        
         | IKantRead wrote:
         | It's worth pointing out that most of the best science happened
         | before peer review was dominant.
         | 
          | There's an article I came across a while back, that I can't
          | easily find now, that basically mapped out the history of our
          | current peer review system. Peer review as we know it today
          | was largely born in the 70s as a response to several funding
          | crises in academia. Peer review was a strategy to make
          | research appear more credible.
         | 
          | The most damning critique of peer review, of course, is that
          | it completely failed to stop (and arguably aided) the
          | reproducibility crisis. We have an academic system where the
          | prime motivation is to secure funding through the image of
          | credibility, which from first principles is a recipe for
          | widespread fraud.
        
           | hnfong wrote:
            | Peer review is basically anonymous GitHub PRs that have the
            | author pinky swear that the code compiles and 95% of test
            | cases pass.
           | 
           | Academic careers are then decided by the Github activity
           | charts.
        
             | MichaelZuo wrote:
             | The whole 'pinky swear' aspect is far from ideal.
             | 
             | But is there an alternative that still allows most academic
             | aspirants to participate?
        
               | abrichr wrote:
               | > Github
        
               | MichaelZuo wrote:
               | Do you understand what the parent is saying? It's clearly
               | an analogy, not a literal recommendation for all
               | academics to use Github.
        
               | abrichr wrote:
               | I understand, thank you for clarifying :)
               | 
                | My point was that academics could use Github (or
                | something like it).
        
               | MichaelZuo wrote:
               | Can you write out the argument for it, or why you believe
               | it to be a net positive change compared to the current
               | paradigm?
        
           | HarHarVeryFunny wrote:
           | It seems kind of obvious that peer review is going to reward
           | peer think, peer citation, and academic incremental advance.
           | Obviously that's not how innovation works.
        
             | fatherzine wrote:
             | the system, as flawed as it is, is very effective for its
             | purpose. see eg "success is 10% inspiration and 90%
             | perspiration". on a darker side, the purpose is not to be
             | fair to any particular individual, or even to be conducive
             | to human flourishing at large.
        
               | HarHarVeryFunny wrote:
               | yes - maybe a good filter for future _academic_ success,
               | which seems to be a game unto itself
        
           | fl7305 wrote:
           | Have they done a double-blind test on the peer review system?
        
           | ribosometronome wrote:
           | >It's worth pointing out that most of the best science
           | happened before peer review was dominant.
           | 
            | It's worth pointing out that most of everything happened
            | before peer review was dominant. Given how many advances
            | we've made in the past 50 years, I'm not so sure everyone
            | would agree with your statement. If they did, they'd
            | probably also have to agree that most of the worst science
            | happened before peer review was dominant, too.
        
             | jovial_cavalier wrote:
             | Our advances in the last 50 years have largely been in
             | engineering, not science. You could probably take a random
             | physics professor from 1970 and they'd not sweat too much
             | trying to teach physics at the graduate level today.
        
               | telotortium wrote:
               | But a biology professor from that time period would have
               | a lot of catching up to do, perhaps too much, especially
               | (but not only) if any part of their work touched
               | molecular biology or genetics.
        
           | ska wrote:
           | > It's worth pointing out that most of the best science
           | happened before peer review was dominant.
           | 
            | This seems unlikely to be true, simply given the growth of
            | the field. If you are arguing that the SNR was better,
            | that's different.
        
           | ikesau wrote:
           | You might be thinking of Adam Mastroianni's essays on the
           | subject:
           | 
            | https://www.experimental-history.com/p/the-rise-and-fall-of-...
            | https://www.experimental-history.com/p/the-dance-of-the-nake...
        
           | cs702 wrote:
           | You're probably thinking of this article:
           | 
            | https://www.experimental-history.com/p/the-rise-and-fall-of-...
        
           | smcin wrote:
            | But there is zero reason why the definition of peer review
            | shouldn't be extended to include:
           | 
           | - accessing and verifying the datasets (in some tamper-proof
           | mechanism that has an audit trail). Ditto the code. This
           | would have detected the Francesca Gino and Dan Ariely alleged
            | frauds, and many others. It's much easier in domains like
            | behavioral psychology where the datasets are spreadsheets
            | << 1 MB rather than GB or TB.
           | 
           | - picking a selective sample of papers to check
           | reproducibility on; you can't verify all submissions, but you
           | sure could verify most accepted papers, also the top-1000
           | most cited new papers each year in each field, etc. This
           | would prevent the worst excesses.
           | 
            | PS: a superb overview video [0] by Pete Judo, "6 Ways
            | Scientists Fake Their Data" (p-hacking, data peeking,
            | variable manipulation, hypothesis-shopping and selectively
            | choosing the sample, selective reporting, and questionable
            | outlier treatment). Based on the article [1]. Also, as Judo
            | frequently remarks, there should be much more formal
            | incentive for publishing replication studies and negative
            | results.
           | 
           | [0]: https://www.youtube.com/watch?v=6uqDhQxhmDg
           | 
           | [1]: "Statisics by Jim: What is P Hacking: Methods & Best
           | Practices" https://statisticsbyjim.com/hypothesis-
           | testing/p-hacking/
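            | 
            | To make "data peeking" concrete, here is a minimal Python
            | simulation (my own illustration, not from the video or the
            | article): re-testing after every new batch of data and
            | stopping at the first p < 0.05 inflates the false-positive
            | rate well beyond the nominal 5%.
            | 
            |     import numpy as np
            |     from scipy.stats import ttest_1samp
            | 
            |     rng = np.random.default_rng(0)
            |     trials, max_n, peek_every = 2000, 200, 10
            |     false_positives = 0
            | 
            |     for _ in range(trials):
            |         # The null hypothesis is true: the data has mean 0.
            |         data = rng.normal(0.0, 1.0, size=max_n)
            |         # "Peek" after every batch of 10 samples; stop at the
            |         # first significant result.
            |         for n in range(peek_every, max_n + 1, peek_every):
            |             if ttest_1samp(data[:n], popmean=0.0).pvalue < 0.05:
            |                 false_positives += 1
            |                 break
            | 
            |     # A single fixed-n test is wrong ~5% of the time; peeking
            |     # typically pushes this to roughly 15-20%.
            |     print(f"false-positive rate: {false_positives / trials:.1%}")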
        
         | sdenton4 wrote:
          | I have been deeply unimpressed with the ML conference track
          | this last year... There are too many papers and too few
          | reviewers, leading to an insane number of PhD student-
          | reviewers. We've gotten some real nonsense reviews, with some
          | real sins against the spirit of science baked into them.
         | 
         | For example, a reviewer essentially insisting that nothing is
         | worth publishing if it doesn't include a new architecture idea
         | and SOTA results... God forbid we better understand and
         | simplify the tools that already exist!
        
         | mrguyorama wrote:
          | Peer review isn't about the validity of your findings, and
          | the reviewers are not tasked with evaluating the findings of
          | the researchers. The point is to be a light filter to make
          | sure a published paper has the necessary information and
          | rigor for someone else to try to replicate your experiment or
          | build off of your findings. Replication and follow-up work
          | are the processes for evaluating the correctness of the
          | findings.
        
         | narrator wrote:
         | Do they do anything different in other countries, or is it just
         | a copy of the U.S system?
        
         | andreyk wrote:
          | I finished a PhD in AI just this past year, and can assure
          | you there exist reviewers who spend hours per review to do it
          | well. It's true that these days you can (and are more likely
          | than not to) get unlucky with lazier reviewers, but that does
          | not appear to have been the case with this paper.
         | 
         | For example just see this from the review of f5bf:
         | 
         | "The main contribution of the paper comprises two new NLM
         | architectures that facilitate training on massive data sets.
         | The first model, CBOW, is essentially a standard feed-forward
         | NLM without the intermediate projection layer (but with weight
         | sharing + averaging before applying the non-linearity in the
         | hidden layer). The second model, skip-gram, comprises a
         | collection of simple feed-forward nets that predict the
         | presence of a preceding or succeeding word from the current
         | word. The models are trained on a massive Google News corpus,
         | and tested on a semantic and syntactic question-answering task.
         | The results of these experiments look promising.
         | 
         | ...
         | 
         | (2) The description of the models that are developed is very
         | minimal, making it hard to determine how different they are
         | from, e.g., the models presented in [15]. It would be very
         | helpful if the authors included some graphical representations
         | and/or more mathematical details of their models. Given that
         | the authors still almost have one page left, and that they use
         | a lot of space for the (frankly, somewhat superfluous)
         | equations for the number of parameters of each model, this
         | should not be a problem."
         | 
          | These reviews in turn led to significant (though apparently
          | not significant enough) modifications to the paper
          | (https://openreview.net/forum?id=idpCdOWtqXd60&noteId=C8Vn84f...).
          | These were some quality reviews and the paper benefited from
          | going through this review process, IMHO.
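          | 
          | For readers unfamiliar with the two architectures the review
          | summarizes, here is a toy sketch of the (context, target)
          | training pairs each model consumes (my own simplification;
          | subsampling and negative sampling are omitted):
          | 
          |     # CBOW: average of context vectors -> predict center word.
          |     # Skip-gram: center word -> predict each context word.
          |     corpus = "the quick brown fox jumps over the lazy dog".split()
          |     window = 2
          | 
          |     cbow_pairs, skipgram_pairs = [], []
          |     for i, center in enumerate(corpus):
          |         context = [corpus[j]
          |                    for j in range(max(0, i - window),
          |                                   min(len(corpus), i + window + 1))
          |                    if j != i]
          |         cbow_pairs.append((context, center))
          |         skipgram_pairs.extend((center, c) for c in context)
          | 
          |     print(cbow_pairs[2])
          |     # (['the', 'quick', 'fox', 'jumps'], 'brown')
          |     print(skipgram_pairs[:3])
          |     # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]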
        
         | canjobear wrote:
          | The issue here wasn't that the reviewers couldn't handle a
          | new idea. They were all very familiar with word embeddings
          | and ways to make them. There weren't a lot of new concepts in
          | word2vec; what distinguished it was that it was simple, fast,
          | and good quality. The software and pretrained vectors were
          | easy to access and use compared to existing methods.
        
       | magnio wrote:
        | There are more details in the recent FB post of Tomas Mikolov
        | (author of word2vec):
        | https://www.facebook.com/share/p/kXYaYaRvRCr5K2Ze
        | 
        | A hilarious and poignant point I see is how experts make
        | mistakes too. Quote:
       | 
       | > I also received a lot of comments on the word analogies - from
       | "I knew that too but forgot to publish it!" (Geoff Hinton, I
       | believe you :) happens to everyone, and anyways I think everybody
       | knows what the origin of Distributed Representations is) to "it's
       | a total hack and I'm sure it doesn't work!" (random guys who
       | didn't bother to read the papers and try it out themselves -
       | including Ian Goodfellow raging about it on Twitter).
        
         | nybsjytm wrote:
         | I tried asking on another thread what Goodfellow rage he's
         | referring to since all I could find was this:
         | https://twitter.com/goodfellow_ian/status/113352818965167718...
         | 
         | If so, frankly I think it makes Mikolov sound pretty insecure.
        
           | lern_too_spel wrote:
           | Twitter no longer shows threads to people who aren't logged
           | in.
           | https://nitter.net/goodfellow_ian/status/1133528189651677184
        
         | imjonse wrote:
         | That post sounds like a rant TBH with too many stabs at various
         | people. It could have been a lot more graceful. OTOH I can
         | believe most researchers are human and do not put the progress
         | of shared knowledge first but are very much influenced by ego
         | and money *cough* OpenAI *cough*
        
           | ReactiveJelly wrote:
           | To err is human, to seek profit is common to all lifeforms
        
         | albertzeyer wrote:
         | Also, Tomas says he came up with the encoder-decoder (seq-to-
         | seq) idea, and then Ilya and Quoc took over the idea after
         | Tomas moved on to Facebook.
         | 
         | However, there is another statement by Quoc, saying this is not
         | true: https://twitter.com/quocleix/status/1736523075943125029
         | 
         | > We congratulate Tomas on winning the award. Regarding
         | seq2seq, there are inaccuracies in his account. In particular,
         | we all recall very specifically that he did not suggest the
         | idea to us, and was in fact highly skeptical when we shared the
         | end-to-end translation idea with him. Indeed, we worked very
         | hard to make it work despite his skepticism.
         | 
          | So, it's word against word. I'm not accusing anyone of lying
          | here, one of them probably just misremembers, but this also
          | leaves a somewhat bad taste.
        
           | minwcnt5 wrote:
           | I think this must happen all the time. As they say, ideas are
           | cheap. It's likely that ALL of them had the seq-to-seq idea
           | cross their mind at some point before it was acted on, so if
           | credit is assigned to whoever said it out loud first, there's
           | going to be disagreement, since most people don't remember
           | the full details of every conversation. It's also possible
           | for someone to be skeptical of their own idea, so that
           | argument isn't compelling to me either. Ultimately credit
           | usually goes to the people who do the hard work to prove out
           | the idea, so it seems like the system worked as intended in
           | this case.
        
           | oldesthacker wrote:
           | This is what Tomas Mikolov said on Facebook:
           | 
           | > I wanted to popularize neural language models by improving
           | Google Translate. I did start collaboration with Franz Och
           | and his team, during which time I proposed a couple of models
           | that could either complement the phrase-based machine
           | translation, or even replace it. I came up (actually even
           | before joining Google) with a really simple idea to do end-
           | to-end translation by training a neural language model on
           | pairs of sentences (say French - English), and then use the
           | generation mode to produce translation after seeing the first
           | sentence. It worked great on short sentences, but not so much
           | on the longer ones. I discussed this project many times with
           | others in Google Brain - mainly Quoc and Ilya - who took over
           | this project after I moved to Facebook AI. I was quite
           | negatively surprised when they ended up publishing my idea
           | under now famous name "sequence to sequence" where not only I
           | was not mentioned as a co-author, but in fact my former
           | friends forgot to mention me also in the long Acknowledgement
           | section, where they thanked personally pretty much every
           | single person in Google Brain except me. This was the time
           | when money started flowing massively into AI and every idea
           | was worth gold. It was sad to see the deep learning community
           | quickly turn into some sort of Game of Thrones. Money and
           | power certainly corrupts people...
           | 
           | Reddit post: "Tomas Mikolov is the true father of sequence-
           | to-sequence" https://www.reddit.com/r/MachineLearning/comment
           | s/18jzxpf/d_...
        
           | jll29 wrote:
           | Typical, saying they had the idea first without putting it on
           | the blockchain to prove the time stamp!
        
           | GartzenDeHaes wrote:
           | "Success has a thousand mothers, but failure is an orphan"
        
         | jncfhnb wrote:
         | To be fair I have some memories of the papers and surrounding
         | tech being really bad. The popular implementations didn't
         | actually do what the papers said and the tech wasn't great for
          | anything beyond word-level comparisons. You got some juice
          | doing tf-idf weighting of specific words, but then a tf-idf-
          | weighted bag of words was similarly powerful.
         | 
         | Cosine similarity of the sum of different word vectors sounds
         | soooo dumb nowadays imo
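          | 
          | For reference, the baseline being dismissed here looked
          | roughly like this (a sketch with random stand-in vectors, not
          | any particular library's API):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          | 
          |     # Stand-ins for pretrained 300-d word2vec embeddings.
          |     vecs = {w: rng.normal(size=300)
          |             for w in ["cat", "sat", "mat", "dog", "lay", "rug"]}
          |     # Optional per-word tf-idf weights (all 1.0 here).
          |     tfidf = {w: 1.0 for w in vecs}
          | 
          |     def sentence_vec(words):
          |         # tf-idf-weighted sum of word vectors.
          |         return sum(tfidf[w] * vecs[w] for w in words)
          | 
          |     def cosine(a, b):
          |         return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
          | 
          |     # With real embeddings this score would reflect topical
          |     # similarity; random vectors just show the mechanics.
          |     print(cosine(sentence_vec(["cat", "sat", "mat"]),
          |                  sentence_vec(["dog", "lay", "rug"])))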
        
         | iab wrote:
         | I wrote a detailed proof years before Mikolov on Twitter, but
         | the 280 characters were too small to contain it
        
       | lupusreal wrote:
       | Boiled down to the core essence, science is about testing ideas
        | to see if they work. Peer review is not part of this process:
        | reviewers rarely if ever attempt replication during review, and
        | so they inevitably end up rejecting new ideas without even
        | trying them. This isn't science.
        
         | marginalia_nu wrote:
         | Thankfully you can just keep submitting the same paper to
         | different journals until someone is too busy to read it and
         | just approves it blindly. The academic publication shitshow
         | giveth and the academic publication shitshow taketh away.
        
           | H8crilA wrote:
           | The futile quest for algorithmification of truth, and the
           | loopholes that make the system work again despite having an
           | impossible objective in the first place. Couldn't have been
           | any different - in fact we can take this as evidence that AI
           | has not taken over science yet.
        
       | nybsjytm wrote:
        | I think the reviewers did a good job; the reviews are pretty
        | reasonable. Reviews are supposed to be about the quality of a
        | paper, not how influential it might be in the future! And not
        | all influential papers are actually very good.
        
         | lainga wrote:
         | "The eight-legged essay was needed for those candidates in
         | these civil service tests to show their merits for government
         | service... structurally and stylistically, the eight-legged
         | essay was restrictive and rigid. There are rules governing
         | different sections of the essay, including restrictions on the
         | number of sentences in total, the number of words in total, the
         | format and structure of the essay, and rhyming techniques."
         | 
         | https://en.wikipedia.org/wiki/Eight-legged_essay#Viewpoints
        
           | nybsjytm wrote:
           | I guess your comment is against the restrictive and rigid
           | idea that peer review should be about making research papers
           | more intellectually rigorous?
        
           | hnfong wrote:
           | That's the medieval equivalent of leetcode.
           | 
            | The problem that the imperial Chinese government had to
            | solve was pretty much the same as the one Big Tech
            | companies are trying to solve with leetcode.
           | 
           | In earlier times, it used to be that the exams were more
           | freestyle, but when tens/hundreds of thousands of people
           | compete for a handful of high civil service positions, people
           | are motivated to cheat by memorizing essays pre-written by
           | somebody else. And the open-ended questions had subjective
           | answers that didn't scale. So they basically gamified the
           | whole thing.
        
             | fl7305 wrote:
             | They might not be perfect employees, but at least you know
             | they are smart, are disciplined, and have the capacity to
             | work hard for a long period.
        
               | lainga wrote:
               | And won't discuss salaries with each other :)
               | 
               | https://en.wikipedia.org/wiki/Song_official_headwear
        
               | wahnfrieden wrote:
               | Sounds like a hazing ritual's outcome
        
               | sgift wrote:
                | Exactly what it is. "We had to go through it, so
                | they'll have to too!" plus "Someone who has so little
                | self-respect that they do _this_ will do _anything_ we
                | ask of them."
                | 
                | (I know that some people genuinely like Leetcode and
                | that's totally fine. But that's not why companies want
                | people to do it)
        
               | lazide wrote:
               | If you think that's bad, wait until you hear about
               | medschool / nursing.
        
               | smcin wrote:
                | Standardized interviews or panels do not necessarily
                | exist to find the best candidate. They exist as a
                | tradeoff to ensure some measure of objectivity and to
                | prevent favoritism/influence/corruption/bribery/forgery/
                | impersonation/unfair cheating by getting advance access
                | to the test material, in such a way that this can then
                | be verified, standardized, and audited at higher levels
                | or nationally. Even more important for medschool/nursing
                | than engineering.
               | 
               | One of countless examples was the sale of 7600 fake
               | nursing transcripts and diplomas in 2020/1 by three south
               | Florida nursing schools [0]. (This happened in the middle
               | of Covid, and at schools which were already being
               | deaccredited.)
               | 
               | Buyers paid $10-15K to obtain fake diplomas and
               | transcripts indicating that they had earned legitimate
               | degrees, like a (two-year) associate degree in nursing;
               | these credentials then allowed the buyers to qualify for
                | the national nursing board exam (NCLEX). About 37% of
                | those who bought the fake documents -- or about 2,800
                | people -- passed the exam. (By comparison, candidates
                | holding a bachelor's degree in nursing (BSN) reportedly
                | pass at about 90%, versus 84% for those with an
                | associate degree in nursing (ADN).)
                | 
                | Among those 2,800, a "significant number" then received
                | nursing licenses and secured jobs in unnamed hospitals
                | and other health care settings in MD, NY, NJ, and GA.
               | 
               | [0]: "7,600 Fake Nursing Diplomas Were Sold in Scheme,
               | U.S. Says" https://web.archive.org/web/20230928151334/htt
               | ps://www.nytim...
        
               | lazide wrote:
               | I meant more that a massive part of the experience is
               | hazing used to filter for less obvious criteria, but that
               | is also good info!
        
               | smcin wrote:
                | Right, sure. But I was saying it isn't by any means
                | only the candidates (or their schools) whose misconduct
                | or lack of objectivity we need to guard against, but
                | also the interviewers/panelists/graders/regulators
                | themselves.
               | 
               | Hazing is just an unfortunately necessary side-effect of
               | this.
        
               | fl7305 wrote:
               | Are you saying "smart, are disciplined, and have the
               | capacity to work hard for a long period" have no bearing
               | on doing a good job?
        
               | wahnfrieden wrote:
               | No
        
               | neilv wrote:
               | For Leetcode, this is one of the typical
               | rationalizations.
               | 
               | It's something a rich kid would come up with if they'd
               | never done real work, and were incapable of recognizing
               | it, but they'd seen body-building, and they decided
               | that's what everyone should demonstrate as the
               | fundamentals, and you can't get muscles like that without
               | being good at work.
               | 
               | And of course, besides the flawed theory, everyone
               | cheated at the metric.
               | 
               | But other rich kids had more time and money for the non-
               | work metrics-hitting, so the rich kid was happy they were
               | getting "culture fit".
        
               | fl7305 wrote:
               | The ancient Chinese exams were the exact opposite of what
               | you describe.
               | 
               | The Chinese rulers realized they had a problem where
               | incompetent rich kids got promoted to important
               | government jobs. This caused the government and therefore
               | society to function poorly. As a result of this, many
               | people died unnecessarily.
               | 
                | To combat this, the Chinese government instituted very
                | hard entrance exams that promoted competent applicants
                | regardless of whether they had rich parents.
        
               | alternative_a wrote:
               | The book _Ancient China in Transition - An analysis of
               | Social Mobility, 722-222 BC_ (Cho-yun Hsu, Stanford
               | University Press, 1965) discusses this transition in
               | rather great detail.
               | 
                | https://www.cambridge.org/core/journals/journal-of-asian-stu...
        
               | refulgentis wrote:
                | You're 100% right. Gave me a big, big smile after 7
                | years at Google feeling like an alien. I was a college
                | dropout from nowhere with nothing and nobody, but with
                | a successful barely-6-figure exit at 27 years old.
               | 
               | Big reason why the FAANGs get dysfunctional too. You
                | either have to be very, very, very fast to fire, almost
               | arbitrarily (Amazon) or you end up with a bunch of people
               | who feel safe enough to be trying to advance.
               | 
               | The "rich kids" w/o anything but college and FAANG were
               | taught Being Visible and Discussing Things and Writing
               | Papers is what "hard work" is, so you end up with a bunch
               | of people building ivory towers to their own intellect
               | (i.e. endless bikeshedding and arguing and design docs
               | and asking for design docs) and afraid of anyone around
               | them who looks like they are.
        
               | choppaface wrote:
                | Have been on a few panels where the candidate passed
                | all the leetcode questions and then turned out to be
                | very poor on the job, with, in one case, the worst
                | "teamwork" I've witnessed. These were not FANG jobs
                | though, so it might be more viable at a larger company
                | where it's ok to axe big projects, have duplicated
                | work, etc. Leetcode is just one muscle, and many jobs
                | require more than one muscle.
        
               | fl7305 wrote:
               | > then turned out to be very poor on the job with in one
               | case worst "teamwork" I've witnessed
               | 
               | Which was what I meant by "might not be perfect
               | employees".
               | 
               | > many jobs require more than one muscle
               | 
               | Sure. But high intelligence, discipline and a capacity
               | for a high level of sustained effort is a good start.
        
             | godelski wrote:
             | And it's important to recognize the advantages and
             | disadvantages to ensure that we have proper context.
             | 
              | For example, leetcode may be very appropriate for those
              | programming jobs which are fairly standard. Not every job
              | requires you to invent new things. Industrialization was
              | amazing because of this standardization and ability to
              | mass produce (in a way, LLMs can potentially be this for
              | code. Not quite there yet, but it seems like a reasonable
              | prospect).
             | 
              | But on the other hand, there are plenty of jobs where
              | high levels of uniqueness, creativity, and innovation
              | dominate the skills of repetition and regurgitation. This
              | is even true in research and science, though I think
              | creativity is exceptionally important.
             | 
             | The truth is that you need both. Often we actually need
             | more of the former than the latter, but both are needed.
             | They have different jobs. The question is more about the
             | distribution of these skillsets that you need to accomplish
             | your goals. Too much rigidity is stifling and too much
              | flexibility is chaos. But I'd argue that over the
              | centuries we've learned to wade through chaos better, and
              | this is one of the unique qualities that makes us human:
              | to embrace the unknown while everything in us fights to
              | find answers, even if they are not truth; because it is
              | better to be ruled by a malicious but rational god than
              | by existential chaos.
        
               | sevagh wrote:
               | >But on the other hand, there are plenty of jobs where
               | there are high levels of uniqueness and creativity and
               | innovation dominate the skills of repetition and
               | regurgitation. This is even true in research and science,
               | though I think creativity is exceptionally important.
               | 
               | Those companies still use leetcode for those positions.
               | It's just a blanket thing at this point.
        
               | godelski wrote:
               | Yes, and I think it is dumb. I'm personally fed up with
               | how much we as a society rely on metrics for the sake of
               | metrics. I can accept that things are difficult to
               | measure and that there's a lot of chaos. Imperfection is
                | perfectly okay. But I have a hard time accepting
                | willful ignorance that acts like it is objective. I'm
                | sure I am willfully ignorant many times too, but when
                | that happens I'd rather my ego take the hit than carry
                | on.
        
         | gms7777 wrote:
         | I agree. My own most influential paper received strong rejects
         | the first time we submitted it, and rightfully so, I think. In
         | retrospect, we didn't do a good job motivating it, the
          | contributions weren't clearly presented, and the way we
          | described it was super confusing. I'm genuinely grateful for
          | it
         | because the paper that we eventually published is so much
         | better (although the core of the idea barely changed), and it's
         | good because of the harsh reviews we received the first time
         | around. The reviews themselves weren't even particularly
         | "insightful", mostly along the lines of "this is confusing, I
         | don't understand what you're doing or why you're doing it", but
         | sometimes you just really need that outside perspective.
         | 
         | I've also reviewed and rejected my share of papers where I
         | could tell there is a seed of a great idea, but the paper as
         | written just isn't good. It always brings me joy to see those
         | papers eventually published because they're usually so much
         | better.
        
           | KittenInABox wrote:
           | > The reviews themselves weren't even particularly
           | "insightful", mostly along the lines of "this is confusing, I
           | don't understand what you're doing or why you're doing it",
           | but sometimes you just really need that outside perspective.
           | 
           | IMO maybe scientists should have experience critiquing stuff
           | like poems, short essays, or fiction. Expecting a critiquer
           | to give actually good suggestions matching your original
           | vision, when your original vision's presentation is flawed,
           | is incredibly rare. So the best critiques are usually a "this
           | section right here, wtf is it?" style, with added bonus
           | points to "wtf is this order of information" or other
           | literary technique that is either being misused or unused.
        
             | gms7777 wrote:
             | Oh, I do completely agree and didn't mean to imply
             | otherwise. I have had experiences where reviewers have
             | given me great ideas for new experiments or ways to present
             | things. But the most useful ones usually are the "wtf?"
             | type comments, or comments that suggest the reviewers
             | completely misunderstood or misread the text. While those
             | are initially infuriating, the reviewers are usually among
             | the people in the field that are most familiar with the
             | topic of the paper--if they don't understand it or misread
             | it, 95% of the time it's because it could be written more
             | clearly.
        
           | 0xDEAFBEAD wrote:
           | This is the first time I ever saw a scientist say something
           | positive about peer review
        
             | ska wrote:
              | Eh, happens all the time. It's an extremely rare paper
              | that isn't improved by the process (though it's also a
              | pain sometimes, and clueless/antagonistic reviewers do
              | happen)
        
             | jll29 wrote:
              | I haven't seen a manuscript that could not be made into
              | a better paper through peer review.
              | 
              | Now, there are good and bad reviewers, and good and bad
              | reviews. However, because you usually get assigned three
              | reviewers, the chance of getting at least one good
              | reviewer, or at least one good review from a middling-to-
              | bad reviewer, is not that low. That means if you get over
              | the initial disappointment of a "reject" decision, you
              | can benefit from the written feedback. The main drawback
              | is the loss of time: a rejection may cost you a whole
              | year (only for conferences, and only if you are not
              | willing to compromise by going to a "lower" conference
              | after rejection by a top one).
             | 
             | I have often tried to fight for a good paper, but if the
             | paper is technically not high quality, even the most
             | original idea usually gets shot down, because top
             | conferences cannot afford to publish immature material for
             | reputational reasons. That's what happened to the original
             | Brin & Page Google/PageRank paper, which was submitted to
             | SIGIR and rejected. They dumped it to the "Journal of ISDN
             | Systems" (may this journal rest in peace, and with it all
             | ISDN hardware), and the rest is history. As the parent
             | says, you want to see people succeed, and you want to give
             | good grades (except in my experience many first year
             | doctoral students are often a bit too harsh with their
             | criticism).
        
         | johnfn wrote:
          | Don't you think something is missing if we've defined
          | "quality" as a characteristic independent of, and
          | uncorrelated with, importance or influence?
        
           | SubiculumCode wrote:
           | When an author refuses to address reasonable questions by the
           | reviewers, what should you expect? There were legitimate
           | questions and concerns about potential alternative
           | explanations for the increase in accuracy raised by the
           | reviewers, and the authors didn't play ball.
        
           | nicklecompte wrote:
           | No, because "quality" means two different things here. I
           | believe the main reason word2vec became important was purely
           | on the software/engineering side, not because it was
           | scientifically novel. Advancements in Python development,
           | especially good higher-level constructs around numerical
           | linear algebra, meant that the fairly shallow and simple
           | tools of word2vec were available to almost every tech
            | company. So I don't think word2vec was (or is) particularly
            | good _research_, but it became good _software_ for reasons
            | beyond its own control.
            | 
            | Around 2014 it was shown that word2vec (skip-gram with
            | negative sampling) implicitly factorizes a shifted matrix
            | of the pointwise mutual information between the words in
            | its training set. This means that Claude Shannon had things
            | mostly figured out in the 60s, and some reviewers were
            | quite critical of the word2vec paper for not citing similar
            | developments in the 70s.
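            | 
            | For the curious, the PMI matrix itself is simple to compute
            | from co-occurrence counts (a toy sketch of my own; the
            | "shift" is subtracting log k, where k is the number of
            | negative samples):
            | 
            |     import numpy as np
            | 
            |     # counts[i, j] = times word j appears in word i's context
            |     # window (tiny made-up numbers for illustration).
            |     counts = np.array([[0, 4, 1],
            |                        [4, 0, 2],
            |                        [1, 2, 0]], dtype=float)
            | 
            |     total = counts.sum()
            |     p_ij = counts / total
            |     p_i = counts.sum(axis=1, keepdims=True) / total
            |     p_j = counts.sum(axis=0, keepdims=True) / total
            | 
            |     with np.errstate(divide="ignore"):
            |         # PMI(i, j) = log p(i, j) / (p(i) * p(j))
            |         pmi = np.log(p_ij / (p_i * p_j))
            | 
            |     k = 5  # number of negative samples
            |     sppmi = np.maximum(pmi - np.log(k), 0)  # shifted positive PMI
            |     print(sppmi)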
        
           | nybsjytm wrote:
           | Quality in the sense I meant it (cogency and intellectual
           | depth/rigor) should certainly be correlated with importance
           | and influence!
        
           | godelski wrote:
            | Yes and no. I think the larger issue is the ambiguity of
            | what publications mean and should be. There are a lot of
            | ways to optimize this, and none of them has an optimal
            | solution. I don't think you should be downvoted for your
            | different framing, because I think we just need to be more
            | open about this chaos and consider values other than our
            | own or the road we're on. It is very unclear what we are
            | trying to optimize, and it is quite apparent that you're
            | going to have many opinions on this; your group of
            | reviewers may all seek to optimize different things. The
            | only solution I can see is to stop pretending we know what
            | each other is trying to do and be a bit more explicit about
            | it. Because if we argue from different assumptions while
            | assuming the other person shares ours, we'll talk past one
            | another.
        
           | dr_kiszonka wrote:
            | I believe many journals' focus on potentially influential
            | papers is why we have a reproducibility crisis. Since it is
            | very hard to publish null results, people often don't even
            | bother trying. This leads to tons of wasted effort as
            | multiple groups attempt the same thing, not knowing that
            | others before them have failed.
           | 
           | Also, predicting whether a paper will be influential is very
           | hard and inherently subjective, unless you are reviewing
           | something truly groundbreaking. Quality-based reviews are
           | also subjective, but less so.
        
         | SubiculumCode wrote:
          | This is the right take, despite how some will want to frame
          | it as 'reviewers are dummies'.
        
         | godelski wrote:
         | > I think the reviewers did a good job
         | 
         | I actually disagree, but maybe not for the reasons you're
          | expecting. I disagree because the reviews are unreasonably
          | reasonable. They are void of context.
         | 
          | It's tough to explain, but I think it's also something every
          | person who has written research papers can understand: there
          | is a big gap between reading and writing, and our papers are
          | not written to communicate our work as clearly as possible,
          | but in effect to convince reviewers that they should accept
          | it. The subtle distinction is deceptively large, and I think
          | we all could be a bit more honest about the system. After
          | all, we want to optimize it, right?
         | 
          | The problem I see is all a matter of context. Good ideas
          | often appear trivial once we see them. Often we fool
          | ourselves into thinking that we already knew this without any
          | meaningful evidence that it's true; we may reason that
          | x = y + z + epsilon, but almost any idea can be framed that
          | way, even breakthroughs like Evolution, Quantum Mechanics, or
          | Relativity. It is because we look back at giants from a
          | distance, but when looking at works now we don't see giants,
          | just a bunch of children standing on one another's shoulders
          | in a trench coat.
         | That is the reality of it all. That few works are giant leaps
         | and bounds, but rather incremental. The ones that take the
         | biggest leaps are rare, often rely on luck (ambiguous
         | definition), and frequently (but not always, especially
         | considering the former) take a lot of time. Something we
         | certainly don't often have.
         | 
          | We're trained as scientists, and that means being trained in
          | critiquing systems and letting questions spiral. Sometimes the
         | spiraling of questions shows how absurd an idea is but other
         | times it can show how ingenious it is. It's easier to recognize
         | the former but often hard to distinguish the latter. It is
         | always easy to ask for more datasets, more experiments, and
         | such, but these are critiques that can apply to any work as no
         | work is complete. This is especially true in cases of larger
         | breakthroughs, because any paradigm shift (even small) will
         | cause a cascade of questions and create a lot of curiosity. But
         | as we've all written papers, we know that this can often be a
         | never ending cycle and often is quite impractical. The
         | complaint about Table 4 is a perfect example. It is quite a
         | difficult situation. The complaint is perfectly reasonable in
         | that the question and concerns are quite valid and do warrant
          | further understanding. But at the same time they are
          | unreasonable because the work required to answer them is not
          | feasible on the timescale we work on. Do you have the compute
          | or time to retrain all prior works to your settings? To
          | retrain all your works to their settings? Maybe it doesn't
          | work there, which may or may not be evidence that the other
          | works are just as good. What it comes down to is asking
          | whether answering these questions could be another work in
          | its own right. I'm not as deep in NLP as I am in CV, but I
          | suspect that the answer is yes (as in, there are works that
          | have been published answering exactly those questions).
         | 
         | There are also reasonably unreasonable questions in other
         | respects. Such as the question about cosine distance vs
         | Euclidean. This is one that I see quite often as we rely too
         | deeply on our understanding of lower dimensional geometries to
         | influence our understanding of high dimensional geometries.
          | Such things that seem obvious, like distance, are quite
          | ambiguous there; this example is exactly the curse of
          | dimensionality (it becomes difficult to distinguish the
          | furthest point from the nearest point). But this often leads
          | us in quite the wrong direction. Honestly, it is a bit
          | surprising that cosine similarity works at all: as D -> inf,
          | cos(x, y) -> 0 for any two random vectors x, y in R^D,
          | because random high-dimensional vectors are expected to be
          | nearly orthogonal; to get cos(x, y) = 1 you need y = x +
          | epsilon with epsilon -> 0. But I digress: it does work.
          | There definitely could be entire works exploring
         | these geometries and determining different geodesics. It is
         | entirely enough for a work to simply have something working,
         | even if it doesn't quite yet make sense.
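          | 
          | The near-orthogonality claim is easy to check numerically. A
          | quick sketch, nothing specific to word2vec:
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          | 
          |     def mean_abs_cosine(dim, n_pairs=1000):
          |         x = rng.normal(size=(n_pairs, dim))
          |         y = rng.normal(size=(n_pairs, dim))
          |         cos = np.sum(x * y, axis=1) / (np.linalg.norm(x, axis=1)
          |                                        * np.linalg.norm(y, axis=1))
          |         return np.abs(cos).mean()
          | 
          |     # |cos| between random vectors shrinks like ~1/sqrt(D).
          |     for d in [2, 10, 100, 1000, 10000]:
          |         print(d, round(mean_abs_cosine(d), 4))
          | 
          | The printed values drop steadily toward zero as D grows.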
         | 
         | The thing is that science is exceptionally fuzzy. Research is
          | people walking around in the dark, and papers are how we
          | communicate what we have found. I think it is important to
          | remember this framing, because we should then judge the
          | viability/publishability of a work not by whether it
          | illuminates everything but by whether the things found are
          | useful (which itself is not always known). Because you might
          | uncover a cavern, and then
         | it becomes easy to say "report back when you've explored it",
         | but such an effort may be impossible to do alone. It can be a
         | dead end, one that can take decades to explore (we'll always
         | learn something though) or it may lead to riches we've never
         | seen before. We don't know, but that's really what we're all
         | looking for (hopefully more about riches for humanity than
         | one's self, but one should be rewarded for sure).
         | 
          | This is why, honestly, I advocate for abandoning the
          | journal/conference system and leveraging modern tools like
          | OpenReview to accelerate communication. Because it enables us
         | to be more open about our limitations, to discuss our failures,
         | and write to our peers rather than our critics. Critics are of
         | course important, but they can take over too easily because
         | they are reasonable and correct, but oft missing context. For
         | an example, see the many HN comments about a technology in its
         | infancy where people will complain that it is not yet
         | competitive with existing technologies and thus devalue the
          | potential. Oft missing is the context that it takes time to
          | compete, and the time and energy put in by so many before us
          | to make what we have now, but also that existing technologies
          | have limits, and the new thing, despite only being a
          | demonstration, does not have the same theoretical limits. The
          | question is rather whether
         | such research warrants more eyes and even a negative result can
         | be good because it can communicate that we've found dead ends
         | (which is something we actively discourage, needlessly forcing
          | many researchers to re-explore these dead spaces). There's
          | so much more I could say, and this is woefully incomplete,
          | but I can't fit a novel into a comment, and I'm afraid the
          | length as it is already strains this platform. Thanks to
          | anyone who has taken the time.
        
           | jll29 wrote:
           | Journal articles/conference papers are not the only outlet,
           | you can still write technical monographs if you feel review
           | cycles are holding you back.
        
             | godelski wrote:
             | It depends. Right now I'm a grad student and I'm just
             | trying to graduate. My friend, who already left, summed it
             | up pretty well.
             | 
             | > I don't want to be a successful academic, I want to be a
             | successful scientist. Which I believe are no longer the
             | same thing.
             | 
              | I'm just trying to graduate and have to figure out how to
              | play the game well enough to leave. Honestly, I do not see
             | myself ever submitting to a journal or conference again.
             | I'll submit to OpenReview, ArXiv, and my blog. I already
             | open my works to discussions on GitHub and am very active
             | in responses and do actually appreciate critiques (there's
             | lots of room for improvement). In fact, my most cited work
             | has been rejected many times, but we also have a well known
             | blog post as a result, and even more than a year later we
             | get questions on our GitHub (which we still respond to,
              | even though many are naive and ask for help debugging
              | Python, not our code).
             | 
             | But I'm done with academia because I have no more faith in
             | it. I'd love to actually start or work for a truly open ML
             | research group, where it is possible to explore seemingly
              | naive or unpopular ideas, and to not just accept things
              | the way they are or be forced to chase hype. I will turn
              | down lots of
             | money to do such a thing. To not just metric hack but be
             | honest about limitations of my works and what still needs
             | to be done, that saying such things is not simply giving
             | ammunition to those who would use it against me. To do
             | research that takes time rather than chase a moving
             | goalpost, against people with more compute who rely on pay
             | to play, nor work in this idiotic publish or perish
             | paradigm. To not be beholden to massive compute or to be
             | restricted to only be able to tune what monoliths have been
             | created. To challenge the LLM and Diffusion paradigms that
              | are so woefully incomplete despite their undeniable
             | success. To openly recognize that both these things can be
             | true without it being misinterpreted as undermining these
             | successes. You'd think academia would be the place for
             | this, but I haven't seen a shred of evidence that it is.
             | I've only seen the issues grow.
        
         | geysersam wrote:
         | But if that's the case why put so much focus on and effort into
         | the peer review system?
         | 
          | If you ask the people funding research, I'm pretty sure
          | they'd prefer to fund influential ideas over non-influential
          | "high-quality" paper production.
        
           | nybsjytm wrote:
           | Even if you were to take the extreme position that influence
           | or citation counts are all that matter, the problem is that
           | 'future influence' is hard if not impossible to judge well in
           | the present. (Of course it's easy to point to cases where
           | it's possible to make an educated or confident guess.)
           | 
           | Also, an emphasis on clarity and intellectual depth/rigor is
           | important for the healthy development of a field. Not for
           | nothing, the lack of this emphasis is a pretty common
           | criticism of the whole AI field!
        
           | Baader-Meinhof wrote:
            | High-quality writing improves information dissemination. A
            | paper like word2vec has probably been skimmed by tens of
            | thousands, perhaps hundreds of thousands, of people.
           | 
           | One extra day of revising is nothing, comparatively.
        
         | fanzhang wrote:
          | Agree that this is how papers are often judged, but strongly
          | disagree that this is how papers should be judged. This is
          | exactly the problem of reviewers looking for the keys under
          | the lamp post (does the paper check these boxes), versus
          | where they lost the keys (should this paper get more exposure
          | because it advances the field).
         | 
         | The fact that the first doesn't lead more to the second is a
         | failure of the system.
         | 
          | This is the same sort of value system that leads to accepting
          | job candidates who have neat haircuts and say the right
          | shibboleths, versus the ones who make the right bottom-line
          | impact.
         | 
         | Basically, are "good" papers that are very rigorous but lead to
         | nothing actually "good"? If your model of progress in science
         | is that rigorous papers are a higher probability roll of the
         | dice, and nonrigorous papers are low probability rolls of the
         | dice, then we should just look for rigorous papers. And that a
         | low-rigor paper word2vec actually make progress was "getting
         | really lucky" and we should have not rated the paper well.
         | 
         | But I contend that word2vec was also very innovative, and that
         | should be a positive factor for reviewers. In fact, I bet that
         | innovative papers have a hard time being super rigorous because
         | the definition of rigor in that field has yet to be settled
         | yet. I'm basically contending that on the extreme margins,
         | rigor is negatively correlated with innovation.
        
           | nybsjytm wrote:
           | I don't consider clearly stating your model and meaningfully
           | comparing it to prior work and other models (seemingly the
           | main issues here) to be analogous to a proper haircut or a
           | shibboleth. Actually I think it's a strange comparison to
           | make.
        
           | aaronkaplan wrote:
           | Your argument is that if a paper makes a valuable
           | contribution then it should be accepted even if it's not well
           | written. But the definition of "well written" is that it
           | makes it easy for the reader to understand its value. If a
           | paper is not well written, then reviewers won't understand
           | its value and will reject it.
        
             | seanmcdirmid wrote:
             | Good writing and rigor aren't highly correlated. You can
             | have poorly written papers that are very rigorous, and vice
             | versa. Rigor is often another checkbox (does the paper have
             | some quantitative comparisons), especially if the proper
             | rigor is hard to define by the writer or the reader.
             | 
             | My advice to PhD students is to always just focus on
             | subjects where the rigor is straightforward, since that
             | makes writing papers that get in easier. But of course,
             | that is a selfish personal optimization that isn't really
             | what's good for society.
        
               | nybsjytm wrote:
               | Rigor here doesn't have to mean mathematical rigor, it
               | includes qualitative rigor. It's unrigorous to include
               | meaningless comparisons to prior work (which is a
               | credible issue the reviewers raised in this case) and
               | it's also poor writing.
        
               | seanmcdirmid wrote:
               | Qualitative rigor isn't rigor at all; it's the opposite.
               | Still, it's useful in a good narrative, and sometimes it's
               | the best evidence you have to offer in your paper.
               | 
               | Prior work is a mess in any field. The PC will
               | overemphasize the value of their own work, of course, just
               | because of human ego. I've been on way too many papers
               | where my coauthors defensively cite work based on who
               | could review the paper. I'm not versed enough about this
               | area to know if prior work was really an issue or not,
               | but I used to do a lot of paper doctoring in fields that
               | I wasn't very familiar with.
        
           | jll29 wrote:
           | You are right. I often got told "You don't compare with
           | anything" when proposing something very new. That's true,
           | because if you are literally the first one attempting a task,
           | there isn't any benchmark. The trick then is to make up at
           | least a straw man alternative to your method and to compare
           | with that.
           | 
           | Since then, I have evolved my thinking, and I now use
           | something that isn't just a straw man: Before I even conceive
           | my own method or model or algorithm, I ask myself "What is
           | the simplest non-trivial way to do this?". For example, when
           | tasked with developing a transformer-based financial
           | summarization system, we pretrained a BERT model from scratch
           | (several months' worth of work), but I also implemented a
           | 2-line grep-based mini summarizer as a shell script, which,
           | despite its simplicity, proved to be a competitor tough to
           | beat:
           | https://www.springerprofessional.de/extractive-
           | summarization...
           | 
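           | For illustration, here is a minimal Python sketch of that
           | kind of baseline (hypothetical code, not the actual shell
           | script; the keyword pattern is invented):
           | 
           |     # Grep-style extractive baseline: keep the first k
           |     # sentences that mention a domain keyword.
           |     import re
           |     import sys
           | 
           |     KEYWORDS = re.compile(r"\b(profit|revenue|loss)\b", re.I)
           | 
           |     def summarize(text, k=3):
           |         sentences = re.split(r"(?<=[.!?])\s+", text)
           |         hits = [s for s in sentences if KEYWORDS.search(s)]
           |         return hits[:k]
           | 
           |     print("\n".join(summarize(sys.stdin.read())))
           | 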
           | I'm inclined to organize a workshop on small models with few
           | parameters, and to run a shared task as part of it where no
           | model can be larger than 65 kB: a sort of "small is
           | beautiful" workshop in dedication to Occam.
        
           | hospadar wrote:
           | Papers are absolutely judged on impact - it's not as though
           | any paper submitted to Nature gets published as long as it
           | gets through peer review. Most journals (especially high-
           | impact for-profit journals) have editors that are selecting
           | interesting and important papers. I think it's probably a
           | good idea to separate those two jobs ("is this work rigorous
           | and clearly documented") vs ("should this be included in the
           | fall 2023 issue").
           | 
           | That's (probably) good for getting the most important papers
           | to the top, but it also strongly disincentivizes whole
           | categories of work (often very important ones). Two obvious
           | categories are replication studies and negative results. "I
           | tried it too and it worked for me" "I tried it too and it
           | didn't work" "I tried this cool thing and it had absolutely
           | no effect on how lasers work" could be the result of tons of
           | very hard work and could have really important implications,
           | but you're not likely to make a big splash in high-impact
           | journals with work like that. A well-written negative result
           | can prevent lots of other folks from wasting their own time
           | (and you already spent your time on it so might as well write
           | it up).
           | 
           | The pressure for impactful work also probably contributes to
           | folks juicing the stats or faking results to make their
           | results more exciting (other things certainly contribute to
           | this too like funding and tenure structures). I don't think
           | "don't care about impact" is a solution to the problem
           | because obviously we want the papers that make cool new
           | stuff.
        
       | tbruckner wrote:
       | Will keep happening because peer review itself, ironically, has
       | no real feedback mechanism.
        
         | ttpphd wrote:
         | This is exactly correct! It's an asymmetrical accountability
         | mechanism.
        
         | iceIX wrote:
         | The whole reason OpenReview was created was to innovate and
         | improve on the peer review process. If you have ideas, reach
         | out to the program chairs of the conference you're submitting
         | to. Many of them are open to running experiments.
        
       | imjonse wrote:
       | It seems they rejected initial versions of the paper, since
       | there were later updates and clarifications based on the
       | reviews. So it seems this was beneficial in the end, and how
       | the review process should work? Especially since this was
       | groundbreaking work, it makes sense to put more effort into
       | explaining why it works instead of relying too much on good
       | benchmark results.
        
       | Der_Einzige wrote:
       | Makes me not feel bad about my own rejections when I see stuff
       | like this, or Yann LeCun reacting poorly on Twitter to his own
       | papers being rejected.
        
       | Hayvok wrote:
       | The review thread (start at the bottom & work your way up) reads
       | like a Show HN thread that went negative.
       | 
       | The paper initially received some questions/negative feedback, so
       | the authors updated and tweaked the reviewers a bit --
       | 
       | > "We welcome discussion... The main contribution (that seems to
       | have been missed by some of the reviews) is that we can use very
       | shallow models to compute good vector representation of words."
       | 
       | The response to the authors' update:
       | 
       | > Review: The revision and rebuttal failed to address the
       | issues raised by the reviewers. I do not think the paper should
       | be accepted in its current form.
       | 
       | > Quality rating: Strong reject
       | 
       | > Confidence: Reviewer is knowledgeable
        
       | wzdd wrote:
       | There are indeed four entries saying "strong reject", but they
       | all appear to be from the same reviewer, at the same time, and
       | saying the same thing. Isn't this just the one rejection?
       | 
       | Also, why is only that reviewer's score visible?
        
       | pmags wrote:
       | I'm curious how many commenters here who are making strong
       | statements about the worth (or not) of peer review have actually
       | participated in it both as author AND reviewer? Or even as an
       | editor who is faced with the challenge of integrating and
       | synthesizing multiple reviews into a recommendation?
       | 
       | There are many venues available to share your research or ideas
       | absent formal peer review, arXiv/bioRxiv being among the most
       | popular. If you reject the idea of peer review itself it seems
       | like there are plenty of alternatives.
        
         | ska wrote:
         | It's the internet, therefore a significant percentage of the
         | strong opinions about any topic will come from people who have
         | little to no experience or competence in the area. Being HN, it
         | probably skews a bit better than average. OTOH, it will also
         | skew towards people procrastinating. Factor that in however you
         | will...
        
       | mxwsn wrote:
       | Flagged for misleading title - the four strong rejects are from
       | a single reviewer. It's listed four times for some unknown
       | reason, likely an OpenReview quirk. The actual status described
       | by the page is: 2 unknown (with accompanying long text), 1 weak
       | reject, and 1 strong reject.
        
       | zaptheimpaler wrote:
       | We already have a better mechanism for publishing and peer
       | review... it's called the internet. Literally the comments
       | section of Reddit would work better. Reviews would be tied to a
       | pseudonymous account instead of being anonymous, allowing people
       | to judge the quality of reviewers as well. Hacker News would
       | work just as well too. It's also nearly free to set up a forum,
       | and frictionless to use compared to paying academic journals
       | $100k for them to sell your own labour back to you. Cost and
       | ease of use also mean it's more broadly accessible and hence
       | more widely reviewed.
        
         | vasco wrote:
         | Every once in a while I see a thread on reddit about a subject
         | I know about and if someone shares a factual account that
         | sounds unpopular it'll be downvoted even though it's true. I
         | think reddit would be a terrible way to do this.
        
           | raverbashing wrote:
           | But academic review is like that, with the worst
           | _acktchsually_ guys in existence
        
             | ribosometronome wrote:
             | The worst? Are you sure? Reddit's worst acktchsually guys
             | are often spilling out of actual cesspit hate subreddits.
        
               | raverbashing wrote:
               | There is some meta-knowledge in your comment, but I'm
               | focusing solely on the critique and pedantry levels, no
               | comment on other factors
        
           | 0xDEAFBEAD wrote:
           | https://www.lesswrong.com/ lets you vote separately on
           | agreement and quality axes. That seems to help a little bit.
        
         | 0xDEAFBEAD wrote:
         | The groupthink problems on reddit are quite severe.
        
       | raverbashing wrote:
       | And this is why the biggest evolution of AI has happened in
       | companies, not in academic circles
       | 
       | Because there's too much nitpicking and grasping at straws
       | amongst people who can't see novelty even when it's dancing in
       | front of them.
        
         | layer8 wrote:
         | No, the reason is that it required substantial financial
         | investments, and in some cases access to proprietary big-data
         | collections.
        
           | raverbashing wrote:
           | word2vec does not require a large amount of data
           | 
           | MNIST might have required a large amount of data at its
           | creation, but it became a staple dataset
           | 
           | There was a lot of evolution before ChatGPT
        
             | L3viathan wrote:
             | And people in academia were all over Word2Vec. Mikolov
             | presented his work in our research group around 2014, and
             | people were very excited. Granted, that was _after_
             | Word2Vec had been published, and this was a very pro-
             | vectorspaces (although of a different type) crowd.
        
       | tinyhouse wrote:
       | I agree that GloVe was a fraud.
        
       | m3kw9 wrote:
       | That didn't age well
        
       | picometer wrote:
       | In hindsight, reviewer f5bf's comment is fascinating:
       | 
       | > - It would be interesting if the authors could say something
       | about how these models deal with intransitive semantic
       | similarities, e.g., with the similarities between 'river',
       | 'bank', and 'bailout'. People like Tversky have advocated against
       | the use of semantic-space models like NLMs because they cannot
       | appropriately model intransitive similarities.
       | 
       | What I've noticed in the latest models (GPT, image diffusion
       | models, etc) is an ability to play with words when there's a
       | double meaning. This struck me as something that used to be very
       | human, but is now in the toolbox of generative models. (Most of
       | which, I assume, use something akin to word2vec for deriving
       | embedding vectors from prompts.)
       | 
       | Is the word2vec ambiguity contributing to the wordplay ability? I
       | don't know, but it points to a "feature vs bug" situation where
       | such an ambiguity is a feature for creative purposes, but a bug
       | if you want to model semantic space as a strict vector space.
       | 
       | My interpretation here is that the word/prompt embeddings in
       | current models are so huge that they're overloaded with
       | redundant dimensions, such that they wouldn't satisfy any
       | mathematical formalism (e.g. of well-behaved vector spaces) at
       | all.
        
         | intalentive wrote:
         | Even small models (e.g. hidden dims = 32) should be able to
         | handle token ambiguity with attention. The information is not
         | so much in the token itself as in the context.
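         | 
         | As a toy illustration (hypothetical numpy code, not any
         | particular model): with attention, a token's output is a
         | mixture over its context, so the same "bank" vector comes out
         | differently next to "river" than next to "bailout".
         | 
         |     import numpy as np
         | 
         |     def attention(Q, K, V):
         |         # scaled dot-product attention with a row-wise softmax
         |         s = Q @ K.T / np.sqrt(K.shape[-1])
         |         w = np.exp(s - s.max(-1, keepdims=True))
         |         w /= w.sum(-1, keepdims=True)
         |         return w @ V
         | 
         |     rng = np.random.default_rng(0)
         |     emb = {w: rng.normal(size=32)
         |            for w in ["bank", "river", "bailout"]}
         | 
         |     for ctx in (["river", "bank"], ["bank", "bailout"]):
         |         X = np.stack([emb[w] for w in ctx])
         |         out = attention(X, X, X)
         |         # the output row for "bank" differs with its context
         |         print(ctx, out[ctx.index("bank")][:3])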
        
       | PaulHoule wrote:
       | I'd reject it still (speaking as someone who has developed
       | products based on word vectors, document vectors, dimensional
       | reduction, etc. before y'all thought it was cool...)
       | 
       | I quit a job because they were insisting on using Word2Vec in an
       | application where it would have doomed the project to failure.
       | The basic problem is that in a real-life application many of the
       | most important words are _not in the dictionary_ and if you throw
       | out words that are not in the dictionary you _choose_ to fail.
       | 
       | Let a junk paper like that through and the real danger is that
       | you will get 1000s of other junk papers following it up.
       | 
       | For instance, take a look at the illustrations on this page
       | 
       | https://nlp.stanford.edu/projects/glove/
       | 
       | particularly under "2. Linear Substructures". They make it look
       | like a miracle that they project down from a 50-dimensional
       | subspace down to 2 and get a nice pattern of cities and zip
       | codes, for instance. The thing is you could have a random set of
       | 20 points in a 50-d space and, assuming there are no degeneracy,
       | you can map them to any 20 points you want in the 2-d space with
       | an appropriately chosen projection matrix. Show me a graph like
       | that with 200 points and I might be impressed. (I'd say those
       | graphs on that server damage the Stanford brand for me about as
       | much as SBF and Marc Tessier-Lavigne.)
       | 
       | (It's a constant theme in dimensional reduction literature that
       | people forget that random matrices often work pretty well, fail
       | to consider how much gain they are getting over the random
       | matrix, ...)
       | 
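       | To make that concrete, a hypothetical numpy sketch of the claim
       | (my own illustration, not from the GloVe page): solve for a
       | 50-to-2 linear map that sends 20 random points wherever you
       | like.
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(0)
       |     X = rng.normal(size=(20, 50))  # 20 random points in 50-d
       |     Y = rng.normal(size=(20, 2))   # 20 arbitrary 2-d targets
       | 
       |     # Solve X @ P = Y for a 50x2 matrix P. With only 20
       |     # generic points in 50-d the system is underdetermined,
       |     # so an exact solution exists; lstsq finds one.
       |     P, *_ = np.linalg.lstsq(X, Y, rcond=None)
       | 
       |     # prints a tiny number (float error): the fit is exact
       |     print(np.abs(X @ P - Y).max())
       | 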
       | BERT, FastText and the like were revolutionary for a few reasons,
       | but I saw the use of subword tokens as absolutely critical
       | because... for once, you could capture a medical note and not
       | _erase the patient's name!_
       | 
       | The various conventions of computer science literature prevented
       | explorations that would have put Word2Vec in its place. For
       | instance, it's an obvious idea that you should be able to make a
       | classifier that, given a word vector, can predict "is this a
       | color word?" or "is this a verb?" But if you actually try it, it
       | fails in a particularly maddening way. With a tiny training/eval
       | set (say 10 words) you might convince yourself it is working,
       | but the more data you train on, the more you realize the words
       | are scattered mostly randomly, and even though those "linear
       | structures" exist in a statistical sense, they aren't well
       | defined or particularly useful. It's the kind of thing that
       | is so weird and inconclusive and fuzzy that I'm not aware of
       | anyone writing a paper about it... Cause you're not going to draw
       | any conclusions out of it except that you found a Jupiter-sized
       | hairball.
       | 
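       | The probe itself is trivial to set up, which is what makes the
       | failure so maddening. A hypothetical sketch (gensim's
       | downloadable GloVe vectors and an invented word list stand in
       | for a real experiment):
       | 
       |     import numpy as np
       |     import gensim.downloader as api
       |     from sklearn.linear_model import LogisticRegression
       | 
       |     kv = api.load("glove-wiki-gigaword-50")
       | 
       |     colors = ["red", "blue", "green", "yellow", "purple"]
       |     others = ["table", "run", "seven", "justice", "river"]
       |     X = np.stack([kv[w] for w in colors + others])
       |     y = [1] * len(colors) + [0] * len(others)
       | 
       |     probe = LogisticRegression(max_iter=1000).fit(X, y)
       |     # looks fine on 10 words; accuracy tends to collapse
       |     # as the held-out vocabulary grows
       |     print(probe.predict(np.stack([kv["pink"], kv["bicycle"]])))
       | 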
       | For all the excitement people had over Word2Vec you didn't see an
       | explosion of interest in vector search engines because...
       | Word2Vec sucked, applying it to documents didn't improve the
       | search engine very much. Some of it is that adding sensitivity to
       | synonyms can hurt performance because many possible synonyms turn
       | out to be red herrings. BERT, on the other hand, is context
       | sensitive and is able, to some extent, to tell the difference
       | between "my pet jaguar" and "the jaguar dealership in your
       | town", and that really does help find the relevant documents and
       | hide the irrelevant documents.
        
       | funnystories wrote:
       | When I was in college, I wrote a simple system for a class that
       | made corrections to text based on some heuristics.
       | 
       | Then the teacher of the class suggested I write a paper
       | describing the system, with some results etc., for a local
       | conference during the summer.
       | 
       | I wrote it with his support, but it got rejected right away
       | because of poor grammar or something similar. The conference was
       | in Brazil but required the papers to be in English. I was just a
       | student and thought that indeed my English was pretty bad. The
       | teacher told me to at least send an email to the reviewers to
       | get some feedback, and maybe resubmit with corrections.
       | 
       | I asked specifically which paragraphs were confusing. They sent
       | me some snippets of phrases that were obviously wrong. Yes, they
       | were the "before" examples of the before/after pairs my system
       | applied corrections to. I tried to explain that the grammar was
       | supposed to be wrong, but they just replied with "please fix
       | your English mistakes and resubmit".
       | 
       | I tried 2 or 3 more times but just gave up.
        
         | adastra22 wrote:
         | You remind me of these anecdotes by Feynman of his time in
         | Brazil. Specifically search for "I was invited to give a talk
         | at the Brazilian Academy of Sciences", but the whole thing is
         | worth a read if you haven't seen it:
         | 
         | https://southerncrossreview.org/81/feynman-brazil.html
        
         | wizzwizz4 wrote:
         | _*eyeroll*_ Sounds about right. Want to get that published
         | anyway? You could pop it on the arXiv and let the HN hivemind
         | suggest an appropriate venue.
         | 
         | If you don't have arXiv access, find an endorser
         | <https://info.arxiv.org/help/endorsement.html>, and send them a
         | SHORT polite email (prioritise brevity over politeness) with
         | your paper and the details. Something like:
         | 
         | > Hello,
         | 
         | > I wrote a paper for college in yyyy (attached) on automatic
         | grammar correction, which got rejected from _Venue_ for
         | grammatical errors in the figures. I still want to publish it.
         | Could you endorse my arXiv account, please?
         | 
         | > Also, could you suggest an appropriate venue to submit this
         | work to?
         | 
         | > Yours sincerely,
         | 
         | > your name, etc
         | 
         | Follow the guidance on the arXiv website when asking for
         | endorsement.
        
           | funnystories wrote:
           | Thank you for the suggestion, but it was just an
           | undergraduate paper written in ~2014. I don't see any
           | relevance in publishing it now.
        
             | wizzwizz4 wrote:
             | It is a lot of effort to get something through the
             | publication process, but if you can't find the technique
             | you used in ten minutes of searching
             | https://scholar.archive.org/, it would be a benefit to the
             | commons if you published your work. At least on a website
             | or something.
        
         | rgmerk wrote:
         | I've been a reviewer and occasionally written reviews a bit
         | like you describe.
         | 
         | Papers are an exercise in communicating information to the
         | paper's readers. If the writing makes it very difficult for the
         | audience to understand that information, the paper is of little
         | use and not suitable for publication regardless of the quality
         | of the ideas within.
         | 
         | It is not the reviewer's job to rewrite the paper to make it
         | comprehensible - and even if it were, reviewers do not have
         | the time.
         | 
         | Writing is not easy, and writing technical papers is a
         | genuinely difficult skill to learn. But it is necessary for the
         | work to be useful.
         | 
         | To be honest, it sounds like the teacher who suggested you
         | write the paper let you down and wasted your time. Either the
         | work was worth their time to help you revise it into
         | publishable form, or they shouldn't have suggested it in the
         | first place.
        
           | matsemann wrote:
           | Did you ironically misread their comment and not realize
           | that the grammar the reviewers were complaining about was
           | from the known-bad examples his algorithm could fix?
        
             | maleldil wrote:
             | It's hard to believe that the reviewers misunderstood the
             | examples. It's more likely that the surrounding text was
             | badly written, and the reviewers had no idea what they
             | should be looking at.
        
               | jll29 wrote:
               | There is the option of contacting the program committee
               | chair or proceedings editor to complain if the reviewers
               | misunderstood something fundamentally, as it seems
               | happened in his example.
               | 
               | The teacher should have fought this battle for the pupil,
               | or they ought to have re-targeted their efforts at
               | another conference.
        
             | rgmerk wrote:
             | Ha!
             | 
             | Sorry, I did miss that. And yes, that sounds like lazy
             | reviewing.
             | 
             | But I have also read many word salads from grad students
             | that their supervisors should never have let go to a
             | reviewer.
        
       | nsagent wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=38654038
        
       | rahmaniacc wrote:
       | This was hilarious!
       | 
       | > Many very broad and general statements are made without any
       | citations to back them up.
       | 
       | - Please be more specific.
       | 
       | > The number of self-citations seems somewhat excessive.
       | 
       | - We added more citations.
        
       ___________________________________________________________________
       (page generated 2023-12-18 23:00 UTC)