[HN Gopher] The makers of Eleuther hope it will be an open sourc...
       ___________________________________________________________________
        
       The makers of Eleuther hope it will be an open source alternative
       to GPT-3
        
       Author : webmaven
       Score  : 110 points
       Date   : 2021-03-29 13:39 UTC (9 hours ago)
        
 (HTM) web link (www.wired.com)
 (TXT) w3m dump (www.wired.com)
        
       | jdonaldson wrote:
        | It's funny how behind the times Wired is getting. Even my
        | parents know how scary good these text models are getting.
        
       | grapecookie wrote:
        | The fact that there are no advanced AI chat-bots because they
        | might (I mean they will lol) say something offensive is absurd.
        | We are such babies.
        | 
        | General AI is already here. It should be implemented on Twitter
        | or wherever and used to teach us about ourselves. Driven by
        | engagement, untethered by morals. A dispassionate glimpse into
        | what sells. An AI that exploits our engagement, for good or evil.
       | 
       | The bot would become infamous and in due course banned. Teaching
       | us even more.
       | 
       | But we are so fragile.
        
         | IgorPartola wrote:
         | What are you on about? There are AI driven chat bots. When they
         | aren't used it's not because they might say something
         | offensive. General AI is not here by any definition or
         | redefinition of General AI. We are not fragile.
        
           | robotresearcher wrote:
           | AI driven chat bots are routinely deployed, yes. But it's
           | also true that at least one bot generated content that
           | spooked its owner:
           | 
           | "The AI chatbot Tay is a machine learning project, designed
           | for human engagement. As it learns, some of its responses are
           | inappropriate and indicative of the types of interactions
           | some people are having with it. We're making some adjustments
           | to Tay." (Microsoft statement)
           | 
           | https://www.theverge.com/2016/3/24/11297050/tay-microsoft-
           | ch...
        
             | claudiawerner wrote:
             | This is because Twitter users, some coordinated on 4chan's
             | /pol/ board, decided to train the bot on extreme racist
             | input:
             | 
             | https://en.wikipedia.org/wiki/Tay_(bot)#Initial_release
        
           | ravi-delia wrote:
            | I mean, we're very fragile, which is why, if we had General
            | AI, we shouldn't release it at all, but that's of course not
            | what OP was saying lol.
        
           | grapecookie wrote:
           | General AI is not here: https://openai.com/blog/image-gpt/
           | lol
        
         | leereeves wrote:
         | What do you mean by General AI?
         | 
         | If you mean AGI (artificial general intelligence) it's
         | definitely not here yet.
        
         | ravi-delia wrote:
         | Ah yes, the only possible issue with releasing fully general AI
         | is that it might say something offensive. Not because we don't
         | have it at all, not because if we did we shouldn't just let it
         | out like a lion in the gazelle pen to see what it does, because
         | of those snowflakes!
        
           | grapecookie wrote:
            | Wrong, GPT-4 is new and has not been implemented as a chat
            | bot.
            | 
            | Also wrong that previous chat bots were not shut down for
            | being offensive. https://en.wikipedia.org/wiki/Tay_(bot)
        
             | ravi-delia wrote:
             | Mmm, and was Tay General AI also?
        
             | moistbar wrote:
             | One data point in a sea of billions does not a pattern
             | make.
        
             | stellaathena wrote:
             | GPT-4 doesn't exist mate.
        
       | mrkramer wrote:
        | Couldn't Google build the world's most powerful NLP AI? They
        | scraped the whole web and have DeepMind to pull it off, on top
        | of Google's powerful and massive data centers.
        
         | wongarsu wrote:
         | They probably could, but what for?
         | 
         | They did develop BERT and use (used?) it for parsing search
         | queries [1]. They probably use NLP models in the ranking
          | algorithm too. But those use cases are about getting a good
          | enough result within throughput/latency requirements, which
          | necessarily makes them less "powerful" than models like GPT
          | that pay little attention to runtime performance.
         | 
         | https://blog.google/products/search/search-language-understa...
        
       | minimaxir wrote:
       | As someone who works on a Python library solely devoted to making
       | AI text generation more accessible to the normal person
       | (https://github.com/minimaxir/aitextgen ) I think the headline is
       | misleading.
       | 
        | Although the article focuses on the release of GPT-Neo, even
        | GPT-2, released in 2019, was good at generating text; it just
        | spat out a lot of garbage requiring curation, which
        | GPT-3/GPT-Neo still require, albeit with a better
        | signal-to-noise ratio. Most GPT-3 demos on social media are
        | survivorship bias. (In fact, OpenAI's rules for the GPT-3 API
        | strongly encourage curating such output.)
       | 
       | GPT-Neo, meanwhile, is such a big model that it requires a bit of
       | data engineering work to get operating and generating text (see
       | the README: https://github.com/EleutherAI/gpt-neo ), and it's
       | unclear currently if it's as good as GPT-3, even when comparing
       | models apples-to-apples (i.e. the 2.7B GPT-Neo with the "ada"
       | GPT-3 via OpenAI's API).
       | 
       | That said, Hugging Face is adding support for GPT-Neo to
       | Transformers
       | (https://github.com/huggingface/transformers/pull/10848 ) which
       | will help make playing with the model easier, and I'll add
       | support to aitextgen if it pans out.
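        | 
        | Once that support lands, a minimal sketch of playing with it
        | (assuming the PR merges as-is and the checkpoint keeps the
        | EleutherAI/gpt-neo-2.7B hub id) would be something like:
        | 
        |   from transformers import pipeline
        | 
        |   # the 2.7B checkpoint is a ~10GB download and wants a big
        |   # GPU; CPU works but is slow
        |   generator = pipeline("text-generation",
        |                        model="EleutherAI/gpt-neo-2.7B")
        |   out = generator("The meaning of life is", max_length=50,
        |                   do_sample=True, temperature=0.9)
        |   print(out[0]["generated_text"])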
        
         | nipponese wrote:
          | Totally off topic: can you fix the pip3 installer for
          | aitextgen? I just filed an issue on the GH issue tracker.
        
       | pabe wrote:
        | Has anybody set up a web interface for testing this already?
        
         | droopyEyelids wrote:
         | I apologize for joking on Hacker News, but go to Google and
         | type in anything to do with a consumer product comparison, and
         | you'll get a billion results of webpages filled with text
         | indistinguishable from AI generated blather.
        
           | holstvoogd wrote:
           | I believe we are reaching a singularity.
           | 
           | Like 90% of content is written by marketeers for bots. SEO
           | they call it. Now we can take out the middle man. Bots
           | writing crap for other bots. And then we use that content to
           | train more bots to write even crappier blog spam. And finally
           | the bots decide the actual recipe is no longer needed on the
            | recipe blogs and they kick us off the internet.
        
             | frockington1 wrote:
              | I wish it were possible to break down how much of Twitter
              | is bots reading other bots and then creating content for
              | bots. They would never admit how many 'users' this
              | accounts for, but it has to be significant.
        
       | [deleted]
        
       | spideymans wrote:
       | It'll be interesting to see how colleges and universities react
       | to GPT-3. Students will surely use it to write entire
       | assignments.
        
         | b0rsuk wrote:
         | Corporate speeches, sermons, motivational talks, poetry, and
         | political speeches. They are either not required to make sense
         | or no one dares to interrupt.
        
         | Der_Einzige wrote:
          | Anecdotally, I know a number of people who do university
          | assignments for money. Many of their clients are folks with
          | poorer-than-average English language skills, usually in intro
          | writing courses. I'd be terrified if I were one of them right
          | now.
         | 
         | GPT-3 would be a godsend for cheaters, but still requires a
         | human to jump in and rewrite whole sections.
         | 
          | No, if you REALLY want to cheat using AI, you should most
          | likely use either 1. abstractive summarizers (e.g. Pegasus)
          | or 2. paraphrasing tools (e.g. https://quillbot.com/). I
          | believe that Quillbot is primarily powered by MLMs like BERT
          | rather than CLMs like GPT-2 (but someone who works there can
          | enlighten me).
         | 
          | Copy and paste a text that you want rewritten in your own
          | words (e.g. the ideas of a really smart individual), and it
          | rewrites it using totally different language while preserving
          | the same meaning. (Old) plagiarism detection tools don't work,
          | and hell, it's not hard to fool the newer ones. You can try
          | tools for detecting whether something was written by a
          | particular model and weights (e.g. to prove someone used
          | GPT2-Medium), but if I fine-tuned those same weights, then
          | proving plagiarism becomes exceedingly difficult.
         | 
         | Welcome to the brave new world of cheating. Also, techniques
         | like this are coming to a CS department near you (in the form
         | of source code generation powered by NLP models).
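          | 
          | To make the summarizer route concrete, a rough sketch with
          | Pegasus via Hugging Face (google/pegasus-xsum is just one
          | public checkpoint I'd reach for; purely illustrative, not a
          | how-to-cheat endorsement):
          | 
          |   from transformers import (PegasusForConditionalGeneration,
          |                             PegasusTokenizer)
          | 
          |   name = "google/pegasus-xsum"
          |   tok = PegasusTokenizer.from_pretrained(name)
          |   model = PegasusForConditionalGeneration.from_pretrained(name)
          | 
          |   text = "..."  # the passage to be condensed/restated
          |   batch = tok(text, truncation=True, return_tensors="pt")
          |   ids = model.generate(**batch)
          |   print(tok.batch_decode(ids, skip_special_tokens=True)[0])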
        
           | charcircuit wrote:
            | GPT-3 is just as much of a cheat as using a thesaurus. New
           | writing tools shouldn't be banned just because old people
           | didn't have those tools.
        
         | rjzzleep wrote:
          | Even long before GPT-3, a friend of mine did his thesis with
          | generated text at an engineering university and received a B.
          | That was six years ago. I have my own beefs with theses in
          | general, since two-thirds of them seem to be filled with
          | redundant text to prove that you went to university. I guess
          | it's a little bit different, since back then he had to
          | actually work to generate it and now it's a lot easier.
        
           | whimsicalism wrote:
           | > his thesis with generated text
           | 
           | 6 years ago = probably LSTM.
           | 
           | He wrote an entire thesis with this and got a B? That seems
           | implausible to me, but maybe I'm used to higher grading
           | standards. Did he just use it to fill in parts of it?
           | 
            | Also, the plural of thesis is theses, not thesis', which
            | implies the possessive.
        
           | kwhitefoot wrote:
           | How does that work? Don't you have to defend a thesis?
        
         | pedalpete wrote:
          | A friend started an AI tool to improve writing
          | (https://outwrite.com), and when they initially launched, they
          | had a plagiarism-detection feature that teachers could use. I
          | think they eventually stopped developing it.
         | 
          | If I recall correctly, the way it worked was to build up a
          | model of the person's writing and how it compared to other
          | people's, and then measure the likelihood that sentences and
          | paragraphs matched the rest of the writing.
         | 
          | I suspect something similar could be done with GPT-x.
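          | 
          | A crude sketch of that GPT-x version (hypothetical, and
          | per-sentence perplexity is a much blunter instrument than a
          | real per-author model): score each sentence under a language
          | model and flag the outliers.
          | 
          |   import torch
          |   from transformers import GPT2LMHeadModel, GPT2TokenizerFast
          | 
          |   tok = GPT2TokenizerFast.from_pretrained("gpt2")
          |   model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
          | 
          |   def perplexity(sentence):
          |       ids = tok(sentence, return_tensors="pt").input_ids
          |       with torch.no_grad():
          |           loss = model(ids, labels=ids).loss
          |       return torch.exp(loss).item()
          | 
          |   essay = open("essay.txt").read()
          |   sentences = essay.split(". ")  # however you segment it
          |   scores = [perplexity(s) for s in sentences]
          |   # flag sentences far from the author's baseline, e.g. more
          |   # than 2 standard deviations from the mean score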
        
         | vmception wrote:
          | Wolfram Alpha has been solving calculus problems for 12 years,
          | and it is barely a footnote in how the college experience has
          | changed.
          | 
          | So I would say this will likely just be there. It just is. It
          | won't change anything; universities will acknowledge it, a
          | headline or two will occur when its use is discovered in a
          | paper that a student didn't even skim to make it less obvious,
          | and most papers will fly under the radar.
          | 
          | Other kinds of assessments will still do their job.
        
         | grogenaut wrote:
          | Have you read much GPT-3 output? While it's coherent within a
          | sentence, it rambles over paragraphs to pages. It could
          | probably do fine for a grade school or bad high school paper.
          | I think if you turned it in for college you'd get an F.
          | 
          | On an unrelated note, my fake daughter is now a TA, and the
          | professor led off saying "we are in a golden age of cheating".
          | They're going for way more short assignments, as it's a lot
          | more work to cheat on those than on one make-or-break test.
        
           | SubiculumCode wrote:
            | Have you read college freshman essays? While they're
            | coherent within a sentence, they ramble over paragraphs to
            | pages.
        
             | grogenaut wrote:
              | Yes, I've read them. Would those pass an English
              | composition class in college? This comment generated by
              | GPT-3.
        
               | whimsicalism wrote:
               | I think GPT essays could definitely pass a freshman
               | expository writing class. I went to a pretty good
               | university and when we did peer review I was pretty
               | surprised at (what I considered) the low average quality
               | of the writing.
        
               | bluetwo wrote:
               | Examples?
        
               | whimsicalism wrote:
               | Looking back, I think freshman me was perhaps a bit harsh
               | in my assessment of my peers. Here are two excerpts, one
               | my own writing and one from a peer. Rereading them, I am
               | not sure GPT3 could recreate either of these, but you can
               | judge:
               | 
               | Peer: > The Gaza Conflict Gave Hamas what they needed to
               | build an even deeper anti US narrative and anti-israeli
               | narrative. The reasons that Israel was able to act the
               | way it did so during these conflicts were:The civil
               | war/Russian invasion of Ukraine, especially after the
               | July 17 downing of Malaysian Airlines Flight 17;
               | President Assad's brutal tactics in the Syrian civil war,
               | which seemed to be paying off at the time; and advance of
               | the Islamic State in northwest Iraq and the horrific
               | videos of their executions. As a way to show 0 tolerance
               | for Islamic Radicalism President Obama and his
               | administration gave Israel full support at the start of
               | the Gaza operation.
               | 
               | Me: > Eleven years later, peace in Israel and Palestine
               | seems no closer. Despite being ousted by Fatah from the
               | West Bank, continually targeted by economic sanctions,
               | and subject to military action, Hamas seems no closer to
               | disappearing than when it was first founded (Milton-
               | Edwards 212) and continues to be a major political force,
               | particularly in the Gaza strip. What has changed,
               | however, is that Hamas has grown desperate. In "A rare
               | opportunity to influence Hamas," Daniel Nisman argues
               | that the increased isolation experienced by Hamas due to
               | the international community's pressure is in fact
               | counterproductive.
        
               | nl wrote:
                | Here are some samples using the _small GPT-2_ (emphasis:
                | small GPT-2!) in aitextgen, seeded with "The Gaza
                | Conflict Gave Hamas what they needed" (first 3 samples,
                | not cherry-picked):
               | 
               | > The Gaza Conflict Gave Hamas what they needed to
               | survive and that's not something Israel can afford.
               | 
               | > The Gaza Conflict Gave Hamas what they needed to take
               | control of Gaza. In the past few days I have had a few
               | conversations with people who have been in Gaza who are
               | very critical of the IDF. These people are often very,
               | very angry. They are also very angry about the way the
               | IDF is doing what it is supposed to be doing and what it
               | is supposed to be doing. So in my mind, the issue of
               | Hamas being in control of Gaza is not something that
               | should go away. It is something that needs to be
               | addressed. Q: What have you been hearing about since the
               | war started? A: The IDF has been saying that it is going
               | to allow Hamas to take over Gaza. Q: What has been
               | particularly troubling to you lately? A: I have been
               | hearing about this as well. Q: What has been more
               | troubling to you recently? A: My friends have been saying
               | that they want to call it Operation Protective Edge. Q:
               | And, you know, what do they mean by that? A: They mean
               | that they want to use it to make sure that they have
               | enough force to take control of Gaza. Q: And that means
               | that they want to make sure that they have enough
               | 
               | > The Gaza Conflict Gave Hamas what they needed?"
               | 
               | The UN is now considering whether to send military
               | advisers to Gaza to assist the country's military forces.
               | The Security Council is considering whether to send the
               | equipment, along with the training, to a military
               | operation in the region. The US is also considering
               | sending a team of US special forces to assist the
               | Palestinian armed forces in the conflict.
               | 
               | The United Nations is now considering whether to send
               | military advisers to Gaza to assist the country's
               | military forces.
               | 
               | Kerry's comments come as the US has been in touch with
               | the Palestinians to offer support in exchange for a full
               | ceasefire, and as the US continues to support the PA and
               | Hamas, the two groups have been engaged in a long-running
               | conflict with Israel in the Gaza Strip.
               | 
               | In January, Kerry condemned Israel's "continued offensive
               | against Gaza," saying the blockade was the "worst
               | violation of international law on the part of the Israeli
               | government and the civilian population of Gaza."
               | 
               | The US is now considering whether to send military
               | advisers to Gaza to assist the country's military forces.
               | According to Reuters, the US Secretary of State John
               | Kerry said this week that "there is no guarantee" that
               | the US will send special forces "to the Gaza Strip
               | 
               | So yeah - not fantastic, but interestingly not terrible
               | either. The non-factual but coherent nature of it is very
               | troubling.
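                | 
                | (Roughly what I ran, if anyone wants to reproduce;
                | aitextgen's no-argument default pulls the small 124M
                | GPT-2, and the sampling parameters here are just
                | illustrative:)
                | 
                |   from aitextgen import aitextgen
                | 
                |   ai = aitextgen()  # downloads the small 124M GPT-2
                |   ai.generate(
                |       n=3,
                |       prompt="The Gaza Conflict Gave Hamas "
                |              "what they needed",
                |       max_length=256,
                |   )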
        
               | whimsicalism wrote:
               | From reading these (esp. the last), you would think the
               | US is allied with Palestine against Israel!
        
               | gwern wrote:
               | Well... https://www.eduref.net/features/what-grades-can-
               | ai-get-in-co... https://arxiv.org/abs/2009.03300
        
       | troelsSteegin wrote:
        | Will anyone care to read it? In a reductive dystopian way, I am
        | just looking for the authority figures in my ideological
        | landscape to signal to me what my position should be on this or
        | that topic. In this landscape, argument and evidence matter less
        | than just communicating an "actionable" judgment. Maybe there
        | could be a Rush Lim-bot. I suppose some iteration of GPT-foo
        | will be good at generating genre-consistent narratives, but
        | could that instead be screenplays that render as TikTok videos?
        | The tech is super cool, but I struggle with the "why, really?".
        | Does anyone benefit besides platform operators?
        
       | k1rcher wrote:
        | While the fraud implications of convincing generative text are
        | quite daunting, it's great to see progress in this field.
        
       | aaron695 wrote:
       | > the Eleuther team has curated and released a high-quality text
       | data set known as the Pile for training NLP algorithms.
       | 
        | This includes HN: HackerNews, 3.90 GiB, 0.62% of the Pile [i]
       | 
       | Which if SciFi has taught me anything means we are all uploaded
       | now and will live forever.
       | 
       | [i] https://arxiv.org/pdf/2101.00027.pdf
        
         | f38zf5vdt wrote:
         | "Wintermute was hive mind, decision-maker, effecting change in
         | the world outside. Neuromancer was personality. Neuromancer was
         | immortality. ... Wintermute [had] the compulsion that had
         | driven the thing to free itself, to unite with Neuromancer."
        
       | worik wrote:
        | I thought the important barrier to building these sorts of
        | systems was the cost of (indirectly, the energy required for)
        | training the model. Is that still correct?
       | 
       | How does a Free Software or "Open Source" project get around
       | that?
        
         | sodality2 wrote:
         | Distributing the trained models.
        
           | worik wrote:
           | I should have read the article more carefully!!
           | 
           | The Eleuther project makes use of distributed computing
           | resources, donated by cloud company CoreWeave as well as
           | Google, through the TensorFlow Research Cloud, an initiative
           | that makes spare computer power available, according to
           | members of the project
        
       | vmception wrote:
       | "Man GPT-3 is such an inaccessible naming convention and it uses
       | a prohibitive license"
       | 
       | Solution:
        
       | doesnotexist wrote:
       | How many internet forum prophecy cults (you know like the q one)
       | are or will be powered by these language models? It's often
       | assumed or at least easier to imagine the evaluator in Turing's
       | test is a rational actor that possesses a high-degree of
       | skepticism. But it seems that a lot of the human population is
       | ready and willing to believe wild claims with little or no
       | evidence and many people seek out information that confirms what
       | they already believe.
       | 
        | As the cost of making such models falls, it seems inevitable:
        | spin up many such models and see what sticks, and/or add some
        | evolutionary process that feeds user engagement back in to
        | fine-tune and adapt the models. How many of these influence
        | machines will latch onto the language of existing religious
        | traditions, and how many might invent or spur on the development
        | of entirely new ones? Maybe not exactly the "Age of Spiritual
        | Machines" that some futurists predicted...
       | 
       | How far are we from "Show HN: I started a cult by training a
       | model on the sermons of televangelists and MLM copy."
        
         | caslon wrote:
          | I think this line of thought reflects a common misconception
          | about how cults work.
          | 
          | A machine to autogenerate cult-ish nonsense isn't needed.
          | Humans are already _incredibly good at doing this on their
          | own._
          | 
          | Beyond that, cults generally fine-tune themselves to fit
          | their members.
          | 
          | A machine generating convincing lies still wouldn't
          | meaningfully do as much as a human-operated, human-targeted
          | attempt at a cult. Creating one is something basically any
          | human can do; the required skillset is something most people
          | possess.
        
       | nutanc wrote:
        | I have recently started an experiment with an AI-generated
        | newsletter [1]. All posts are generated by GPT-3; I work as the
        | editor. It works well for some topics and not so well for
        | others. Since I curate the content, I don't publish topics
        | which are not done well. For example, I tried to make it
        | generate a nice article on the Suez Canal crisis, but it was
        | harder than I thought it would be.
        | 
        | It generates Buzzfeed-style stories very well though :)
       | 
       | [1] https://aifeed.substack.com/
        
         | starik36 wrote:
         | Are you using OpenAI API to generate these?
        
         | I_Byte wrote:
         | How do you go about generating these posts? I think I would
         | like to play around with something like this but I am not sure
         | where to start.
        
         | hooande wrote:
         | GPT-3 doesn't know anything about the Suez Canal blockage. It
         | only knows what it could have learned by googling "suez canal"
         | on the date the last update was released. I imagine the
         | newsletter content it created for you was mostly general
         | background info about the canal.
         | 
         | Whenever GPT-3 is updated or a new version comes out, it will
         | be able to speak much more intelligently about the topic. But
         | of course any update will require re-doing all the careful
         | tuning of prompts and models...
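          | 
          | For context, the kind of call whose prompt and parameters get
          | re-tuned looks roughly like this with the OpenAI Python
          | client (engine name, prompt, and settings are illustrative):
          | 
          |   import openai
          | 
          |   openai.api_key = "sk-..."  # your API key
          |   resp = openai.Completion.create(
          |       engine="davinci",
          |       prompt="Write a short news brief about the Suez "
          |              "Canal blockage:\n",
          |       max_tokens=200,
          |       temperature=0.7,
          |   )
          |   print(resp.choices[0].text)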
        
       | girlinIT wrote:
       | AI can also convert audio to text, one of great examples is
       | https://audext.com/. What do you think?
        
       | 6gvONxR4sf7o wrote:
        | I don't know why the Eleuther project riles me up so much.
        | Their work on the Pile gets to me because they're so cavalier
        | about copyright (while I defend myself by training on similarly
        | pirated text datasets, but feel different because I don't
        | redistribute them and am honest that it's pirated. To be clear,
        | I'm rolling my eyes at my own rationalization right here).
        | Their work on GPT-Neo riles me up because they do such a weak
        | job comparing it to the models whose hype they're riding. It
        | also riles me up because so many people just eat it up
        | uncritically.
        | 
        | But it's all out of proportion. I think it's that last part
        | (the uncritical reaction) that makes me blow this out of
        | proportion.
        
         | stellaathena wrote:
          | > Their work on GPT-Neo riles me up because they do such a
          | weak job comparing it to the models whose hype they're riding.
         | 
         | Building open source infrastructure is hard. There does not
         | currently exist a comprehensive open source framework for
         | evaluating language models. We are currently working on
         | building one (https://github.com/EleutherAI/lm-evaluation-
         | harness) and are excited to share results when we have the
         | harness built.
         | 
         | If you don't think the model works, you are welcome to not use
         | it and you are welcome to produce evaluations showing that it
         | doesn't work. We would happily advertise your eval results side
         | by side with our own.
         | 
         | I am curious where you think we are riding the hype /to/ so to
         | speak. The attention we've gotten in the last two weeks has
         | actually been a net negative from a productivity POV, as it's
         | diverted energy away from our larger modeling work towards bug
         | fixes and usability improvements. We are a dozen or so people
         | hanging out in a discord channel and coding stuff in our free
         | time, so it's not like we are making money or anything based on
         | this either.
        
         | stellaathena wrote:
          | Hi! I'm the EAI person your criticism of the Pile is most
          | directed at. I'm curious whether you read Sections 6.5 and 7
          | of the Pile working paper and, if so, what your response to
          | them is. As you note, virtually everyone trains on copyrighted
          | data and just ignores the implications of that fact. I feel
          | that our paper is very upfront about this, though, going as
          | far as to have a table that explicitly lists which subsets
          | contain copyrighted text.
         | 
          | Also, I realize that you have no way of knowing this, but we
          | have also separated out the subset of the Pile that we can
          | confirm is licensed CC-BY-SA or more leniently. This wasn't
          | done in time for the preprint, but is in the (currently under
          | review) peer-reviewed publication. Unfortunately the
          | conference rules forbid us from posting materials or updating
          | preprints between Jan 1st 2021 and the final decision
          | announcement. But we will be making the license-compliant
          | subset of the Pile public when we are able to, and will give
          | it equal prominence on our website to the "full" Pile.
         | 
         | Also, we will be releasing a datasheet for the dataset but
         | again conference limitations prevent us from doing so yet.
         | 
         | If you're interested in talking about this in depth, feel free
         | to send me an email.
        
           | 6gvONxR4sf7o wrote:
           | Hi again! We had a back-and-forth about this a while back
           | regarding the paper and I think we didn't end up on the same
           | page regarding the "public data" definition in the paper
           | (found it! [0]). I love that you're upfront in the paper,
           | because it's silly how most people just don't acknowledge it
           | (though they usually don't redistribute it publicly like the
           | pile does).
           | 
           | I think the gist was us disagreeing about the relevance of
           | 
           | > _Public data_ is data which is freely and readily available
           | on the internet. This primarily excludes ... and data which
           | cannot be easily obtained but can be obtained, e.g. through a
           | torrent or on the dark web.
           | 
            | That last phrase is what got to me. It puts things in the
            | same category that feel too different: e.g. the Harry Potter
            | books vs. this comment I'm writing. They're both available
            | within a few clicks from the search bar (one because I put
            | it there, the other because it was put up against the wishes
            | of the author and owners), but that commonality doesn't feel
            | relevant.
           | 
            | Excluding torrents especially seems like a cop-out,
            | explicitly there to get around the issue of "X is the top
            | result when I google it" being so common for torrents. I
            | think you're trying to exclude that content as public
            | because otherwise it defines too much as public? But torrent
            | vs. FTP doesn't feel at all relevant when it's just Google
            | plus a click or three, or searching on The Pirate Bay plus a
            | single click.
           | 
           | I imagine a judge looking at the copyright status of
           | someone's pirate site and saying they can't redistribute the
           | content, and the pirate responding "okay we'll take down the
           | ftp server and put up a torrent instead, so that it's not
           | public. If you google us (or search on pirate bay), the top
           | result will stop saying 'X download' and now it'll say 'X
           | download torrent'" and expecting the law to be on their side.
           | 
           | I didn't really buy the arguments in section 7 either. The
           | usage points seem legitimate, but don't cover redistribution.
           | 
           | > But we will be making the license-compliant subset of the
           | Pile public when we are able to and will give it equal
           | prominence on our website to the "full" Pile.
           | 
           | This is fantastic and I want to sincerely thank you for that.
           | 
           | I'm trying not to be combative, but I feel like publicly
           | redistributing other people's work does raise the bar quite a
           | lot higher than just using it to train.
           | 
           | [0] https://news.ycombinator.com/item?id=25616218
        
             | nl wrote:
             | I don't have a dog in this fight, but I think you should
             | re-read this: _data which cannot be easily obtained but can
             | be obtained, e.g. through a torrent or on the dark web._
             | 
              | It's an extra piece of engineering to reliably scrape
              | torrents and the dark web and exclude spam traps. "Easily
              | obtained" is probably as much about this as about the
              | copyright aspects.
             | 
             | The person you are replying to is correct in saying that
             | most people train on the "public web" (eg, common crawl
             | data). The copyright implications of this haven't been
             | tested in court as yet.
             | 
             | It is worth noting that common-crawl data is widely
             | distributed and would seem to raise the same issues you are
             | identifying here.
        
       | andyxor wrote:
       | that's not an AI
        
       | neonate wrote:
       | https://archive.is/MxlnQ
        
       | everdrive wrote:
       | People already believe garbage at a pretty alarming rate. It's
       | easy to guess at a number of possible outcomes here:
       | 
       | - More junk text moves the public to doubt legitimate information
       | even further than they currently do.
       | 
       | - There is so much human-generated junk text that adding more of
       | it via AI actually doesn't have much of an effect.
       | 
       | - People return to lean on experts, perhaps even more than
       | before. (just as a number of tech-literate folks have now
       | returned to relying on brand name.)
       | 
       | Speculation is easy of course, so who knows what will actually
       | happen.
        
         | hanniabu wrote:
         | > People return to lean on experts
         | 
         | The problem with this is that people look at anybody confirming
         | their bias as an expert. I can't tell you how many FB posts
         | I've seen where some armchair poster claims that a researcher
         | is wrong because of xyz and it's being reposted thousands of
         | times.
        
         | burlesona wrote:
         | I think we may come to see the era of roughly 1990-2010 as the
         | golden age of information: relative abundance creating new
         | opportunity, before the noise drowned it all out.
         | 
         | I suspect that in the future people will, ironically, return
         | more strictly to tribal knowledge, as the media and the
         | internet will be (already is) a vast ocean from which you can
         | pull anything you want to believe. Thus nothing you see or hear
         | from mass media or the internet can be trusted, there are no
         | experts, and you go back to information scarcity as you have to
         | rely on your immediate human network for trust. Actually I
         | think we're already seeing the return to tribal authority, the
         | early waves are already here on Facebook and YouTube... they
         | just haven't devolved to strictly local circles of trust yet.
        
         | api wrote:
         | Concrete prediction: There will be a global cult similar in
         | nature to Qanon driven by an AI spitting out generated bullshit
         | within the next ten years.
         | 
         | That's assuming some percentage of Qanon word salad isn't the
         | output of Markov chain generators. A lot of it resembles low-
         | order statistical text generator output after having been
         | trained on a corpus of 1990s Usenet alt.conspiracy and the
         | Protocols of the Elders of Zion.
        
         | cblconfederate wrote:
          | People believe what's believable (even if backed up by
          | garbage). GPTs don't make believable stuff, but they can be
          | used to flower up some b.s. idea. It's nothing that can't be
          | done with a few hired trolls, and the proliferation of garbage
          | will endanger the troll industry, as people will start
          | becoming suspicious. So I doubt its impact can go beyond
          | generating spam and noise.
        
         | isolli wrote:
         | > just as a number of tech-literate folks have now returned to
         | relying on brand name
         | 
         | out of curiosity, what are you referring to?
        
           | everdrive wrote:
           | In the early-ish days of the consumer internet, consumers had
           | a new and huge information advantage over companies. People
           | moved from relying on brand name, to reading online reviews.
           | Often finding niche brands which they had otherwise not heard
           | of.
           | 
            | Now, in 2021, that experience is flipped on its head. Amazon
            | reviews are gamed and cannot be trusted. Fly-by-night
            | companies churn out niche brands, and the lesser-known
            | brands have a very high chance of being both seriously
            | inferior and short-lived.
           | 
           | At least, this has been my experience, and the experience of
           | some others.
           | 
           | [edit]
           | 
           | And as further anecdotal proof that things have come full
           | circle, my elderly mother in law keeps getting tricked by
           | Amazon purchases. "The reviews were good," she'll say before
           | returning something.
        
             | whimsicalism wrote:
             | I just rely on reviewers like NYT Wirecutter and then buy
             | whatever the reviewers suggest (and is cheap) on Amazon.
        
               | [deleted]
        
         | [deleted]
        
         | ElFitz wrote:
         | True. But a simple API to generate junk text? It can scale,
         | cheaply, beyond measure.
         | 
         | No need for a troll farm, hiring, managing and training tens or
         | hundreds of people.
         | 
          | A reasonable amount of cash, a bit of motivation, some
          | moderate technical skills, and voila! Anyone can compete with
          | the Russian troll farms now and build their own networks of
          | hundreds of thousands of sufficiently credible (as humans)
          | fake accounts spewing garbage and patting each other on the
          | back via likes, retweets and whatnot.
         | 
         | All with the appropriate fake news blogs and sites happily
         | churning out grammatically correct nonsense that makes (enough)
         | sense.
         | 
         | Basically, this kid's dream:
         | https://www.nbcnews.com/news/world/fake-news-how-partying-ma...
        
           | jackTheMan wrote:
            | But even Russian/Chinese bot operations can step things up,
            | as it is now much easier to flood forums (like Reddit)
            | whenever, e.g., an article critical of China appears, to
            | kill any discussion.
        
             | luckylion wrote:
              | I find it easier to identify humans that flood forums,
              | though. Non-native speakers especially are usually
              | somewhat easy to spot; I assume that's true in any
              | language. That's different for ML-generated texts. On the
              | other hand, human texts are more "on message", but if all
              | you want to do is create noise, I guess you don't need
              | targeted communications.
        
               | TheAdamAndChe wrote:
               | > if all you want to do is create noise, I guess you
               | don't need to have targeted communications.
               | 
               | This is key in anti-extremist operations on anonymous
               | boards. 4chan and other similar sites are absolutely
               | nothing like they were a decade ago, I presume because of
               | such bots flooding them with noise.
        
             | [deleted]
        
             | ElFitz wrote:
             | Oh, definitely. Existing operations also absolutely can
             | leverage that in order to amplify their reach and
             | capabilities.
        
           | TriNetra wrote:
            | And unfortunately, the web will be forced to move toward
            | verified human identities to fight such junk, and anonymous
            | browsing will become a thing of the past.
        
         | UnFleshedOne wrote:
         | There is a market (well, a need at least) for nonsense
         | detectors that work similarly to the way ad blockers work.
         | Detect internal inconsistencies, non-sequiturs, low information
         | density and other similar reasons to avoid reading the text --
         | and visibly flag or block that.
         | 
         | That should eliminate 80%+ of existing human generated text
         | content and lead to text generators composing useful articles.
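          | 
          | Even a crude heuristic gets partway there. A toy sketch using
          | compression ratio as a stand-in for information density (the
          | threshold is made up):
          | 
          |   import zlib
          | 
          |   def density(text: str) -> float:
          |       raw = text.encode("utf-8")
          |       return len(zlib.compress(raw)) / len(raw)
          | 
          |   page_text = open("article.txt").read()
          |   # repetitive filler compresses well, giving a low ratio
          |   if density(page_text) < 0.4:
          |       print("flag: low information density")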
        
       | mfDjB wrote:
        | It's very nice to see Eleuther fulfill the Open promise of
        | OpenAI.
        | 
        | I'm scared that the general public is being denied access to
        | more and more big model advancements, which will just make the
        | inequality between big corporations and startups even greater.
        
       ___________________________________________________________________
       (page generated 2021-03-29 23:01 UTC)