hngopher.com

       [HN Gopher] Google says AI generated content is against guidelines
       ___________________________________________________________________
        
       Google says AI generated content is against guidelines
        
       Author : rolph
       Score  : 95 points
       Date   : 2022-04-09 16:50 UTC (6 hours ago)
        
 (HTM) web link (www.searchenginejournal.com)
 (TXT) w3m dump (www.searchenginejournal.com)
        
       | ohashi wrote:
       | AI for me, not for thee.
       | 
       | Google shouldn't be the arbiter of what's ok or not on the
       | internet. They use AI to take away all human recourse with the
       | company but want to tell others not to use it? It's a pretty
       | laughable position. Good luck trying to detect GPT-3 and the like
       | when you compare with non native speakers of languages. Are you
       | suddenly going to just block them too?
       | 
       | If an AI can generate high quality content, why is that any less
       | than human generated content? Humans generate a ton of trash
       | content; it's not inherently better.
       | 
       | Those same models on copilot are generating useful and often good
       | code for me on a daily basis. If someone told me it's not OK to
       | use any copilot generated code as unethical/wrong I'd laugh in
       | their face. It's basically saving me a google search to find
       | snippets/examples of things I wasn't sure of.
       | 
       | Maybe that's the threat? If we have access to AI directly (ala
       | copilot) then I am googling less.
        
         | shadowgovt wrote:
         | That's the point. If it _is_ detectable, it'll get downranked
         | for being low-quality.
        
           | deevolution wrote:
           | What about high quality AI content? What if someday the
           | content generated by AI is actually more useful/higher
           | quality than the human generated content? We'd just shoot
           | ourselves in the foot because it's AI generated. Seems
           | discriminatory to me!
        
           | hackernewds wrote:
           | Which seems fair. I do not want to be bombarded by AI
           | generated media and content that can be created at infinite
           | volume and speed. Google is taking a massive step here for
           | the good.
        
             | jimmaswell wrote:
             | I've seen a few AI generated pages in top results that felt
             | like having a stroke trying to read. I hope these can be
             | eliminated.
        
       | aaaaaaaaaaab wrote:
       | Fair enough. Google-generated content is likewise against my
       | guidelines.
        
       | Trias11 wrote:
       | If google can randomly penalize people and businesses with no
       | chance of recourse
       | 
       | It's certainly a good idea to penalize Google with intelligent AI-
       | generated content
        
       | sva_ wrote:
       | So Google would need to build a discriminator that detects
       | machine-generated content. It will be interesting to see these
       | discriminators fight the generators of other big companies.
       | 
       | I'll be taking a front-row seat watching that show, if it
       | happens. Perhaps in the future, we'll have to deal with
       | discriminators that approximate some originality-index. Will be
       | fun fighting with those algorithms, to interact with the internet
       | as a normal user (to some extent we already do - proving that
       | you're human is becoming more and more tedious.)
        
         | notahacker wrote:
         | In practice I think it's more _Google now has another policy
         | reason to banhammer prolific and irritating blogspammers_ than
         | an arms race Google has a chance in.
         | 
         | Google isn't yet effective at detecting blogspam generated by
         | naive scripts that simply swap words in the source material for
         | other words in a thesaurus. I don't think they're going to
         | start picking up continuity errors, factual errors or
         | "weirdness" in GPT-3 - which often satisfies human readers -
         | any time soon.
        
           | notreallyserio wrote:
           | Google engineers can't even filter out GH or SO scraped sites
           | like gitmemory, nor offer a way to let users block these
           | sites. I'm not sure we should expect them to handle more
           | advanced techniques like detecting word swaps any time soon.
        
             | omnicognate wrote:
             | s/can't/won't/
        
               | notreallyserio wrote:
               | Google search has been poor for years. I think it's time
               | to say can't.
        
           | zitterbewegung wrote:
           | I am unsure about that but using GPT-3 in that manner would
           | certainly trigger OpenAI's automatic or manual systems and
           | would violate their ToS and your account would be locked.
        
             | Oras wrote:
             | With GPT-3, yes but how about other generators? There are
             | lots of emerging services that spammers can use.
        
             | htrp wrote:
             | Does OpenAI actually have those systems? My impression is
             | they are happy to take your money and just tell you not to
             | do those un-nice things
        
         | ganzuul wrote:
         | You need to be computationally irreducible through self-
         | attention and conscious self-interaction.
         | 
         | By the same criterion we may need to consider the Wattage cost
         | of intelligence when granting rights to AI. Their kernels need
         | to be aligned with evolutionary wisdom encoded in our highest
         | motivations.
        
           | mjburgess wrote:
           | a conversation for Q and Commander Data
        
             | ganzuul wrote:
             | Filmed in front of a live audience! [Applause]
        
       | OliverJones wrote:
       | Neal Stephenson's "Fall, or Dodge In Hell" has a subplot driven
       | by a news-story-generating AI run amok. The AI uses reader
       | engagement as the metric (along with really great natural-
       | language stuff). But it doesn't have any truth metric. So, at the
       | time of the book, the widely believed AI's stories have
       | constructed an alternate reality for many people, reinforcing the
       | present polarization of media in the US.
       | 
       | That presents quite a threat profile. It's far more pernicious
       | than SEO script kiddies doing whatever passes for keyword
       | stuffing in 2022.
       | 
       | I hope the search crawler folks at Google are working hard to
       | detect that sort of thing and prevent it from getting into
       | indexes. Let's hope Neal Stephenson isn't as right about that
       | threat as Arthur C. Clarke was about geosynchronous
       | communications satellites.
        
         | syrrim wrote:
         | That's basically what the Q larp, or realrawnews, is already
         | doing today. If it's profitable to do, it doesn't take a rogue
         | AI to make up nonsense and spread it to the masses.
        
       | bestcoder69 wrote:
       | Seems like there should be a carve-out for content clearly marked
       | as AI-generated. I wonder if the SEO hit is why I haven't been
       | able to find too many others posting funny GPT-3 outputs like
       | (Disclosure incoming:) mine.
       | 
       | And if I can peddle my blog's 'content': here's Trump announcing
       | he's (not?) trans.
       | 
       | > I have been so concerned about this issue, I've held back from
       | telling you that I'm transgender. I'm not transgender, but I'm so
       | proud of the transgender community and their rights.
       | 
       | > The Democrats have been so horrible to the transgender
       | community. They've made them live in these little closets, ya'
       | know? And they've tried to force them to use certain restrooms.
       | 
       | Love messin' with gpt-3.
        
       | mkl95 wrote:
       | That AI generated content is automated SEO, which is mostly a
       | bunch of heuristics to please Google's ranking algorithms. Blame
       | the algorithms, not the people who reverse engineer them.
        
         | admax88qqq wrote:
         | Why blame the algorithms rather than the spammers who reverse
         | engineer them to get their affiliate link littered ai generated
         | "reviews" to the top of the search results.
         | 
         | In the arms race between Google and spammers, honest websites
         | sometimes get caught in the crossfire. For some reason lately
         | lots of people want to blame google for this and not the
         | spammers.
        
           | sdoering wrote:
           | Why must it necessarily be either/or, black/white,
           | Google/Spammer?
           | 
           | Why couldn't it be that, reasonably, both sides are to blame?
           | Google enables the commodification of search results. Yes
           | they claim that they want to show the best result. But best
           | for whom? I don't believe that Google is a neutral party as
           | they earn more if they promote sites that lead to additional
           | advertising clicks. How should users know that there are
           | often better results on pages 3 or 4 and below?
           | 
           | And then there are the ones trying to make a living with
           | minimal effort. No need to create great content. Good enough
           | is sufficient. As long as the content is optimized and the
           | page receives relevant traffic from search the revenue via
           | affiliate links is secured.
           | 
           | In this arms race the other sites lose. They are the
           | collateral between two fighting parties in this arms race.
           | And people lose too: they lose great content and the diversity
           | of the net.
        
             | admax88qqq wrote:
             | Thankfully Google operates in a free market where it is
             | possible to compete by building a better search algorithm.
             | 
             | I don't buy into the position that all this great content
             | and diversity of content is lost because of Google's
             | algorithm. It isn't as discoverable as the mainstream
             | content that Google search returns but that's no different
             | than being in a world where Google didn't exist. Except for
             | perhaps in such a world people would have different content
             | discovery habits.
             | 
             | Complaining that Google doesn't surface your preferred set
             | of "great content" is like complaining that prime time
             | cable TV only shows lame sitcom reruns. It's true, but it
             | doesn't prevent you from buying HBO, or Prime Video.
        
           | supernovae wrote:
           | Because we wouldn't be here if Google had supported other
           | means of natural search engine inclusion through quality
           | metrics vs their current only option of pay to play or
           | cheating.
        
           | adhesive_wombat wrote:
           | I can't see any other reason for spam clone farms to outrank
           | the sites they clone except either more incompetence than I'd
           | ascribe to Google, or because the spam farm is full to the
           | gunwales with ads which earn money for Google.
        
           | mkl95 wrote:
           | Google make a large chunk of their money from ads. If you
           | create some SaaS where users can automate their site's SEO by
           | using GPT or whatever, and it is reasonably priced, you could
           | end up competing with Google Ads. Google want to prevent
           | that.
        
       | endisneigh wrote:
       | Can someone describe to me a search engine that's immune to such
       | problems and also searches a large variety of old and new sites?
        
         | majormajor wrote:
         | We often talk about this problem as if we expect Google to
         | solve a social and economic people problem purely through math
         | alone.
         | 
         | The appeal of open-to-all search as a way of navigating the web
         | was that there was a huge long tail of interesting stuff that
         | would be hard to manually index and categorize.
         | 
         | If the long tail of interesting stuff has fully drowned in a
         | sea of spam and crap, I'm not sure that it still makes that
         | much sense over something smaller but human-curated.
         | 
         | Perhaps the trick would be human curation with extensive and
         | always-evolving AI tools to speed up the curation. You have to
         | get past the filter to get in, versus being in by default
         | unless you're blatant enough to get banned. There is a layer of
         | human judgement in addition to the algorithm's score of the
         | content, and additionally that gives you an extra scoring
         | factor on the algorithms yourself - the humans should be able
         | to help direct the development of the algorithm to fight spam
         | more preemptively.
         | 
         | Would a mix of that give us a bigger internet than the
         | entirely-manual "web directory" days of 1997 or so, but a less-
         | shit-filled one than today's?
        
       | ocdtrekkie wrote:
       | Oh the irony. The company that lets AI determine what it thinks
       | is factual says using AI to generate content is bad.
        
       | peterisza wrote:
       | They want to use the internet to train their models. AI generated
       | text would contaminate the training data.
        
       | hidroto wrote:
       | perhaps they don't want the datasets that they use to train the
       | AI to be watered down by other AI generated content. after all
       | a lot of the text data was sourced from the internet.
        
       | magicalist wrote:
       | > _"For us these would, essentially, still fall into the
       | category of automatically generated content which is something
       | we've had in the Webmaster Guidelines since almost the
       | beginning._
       | 
       | > _And people have been automatically generating content in lots
       | of different ways. And for us, if you're using machine learning
       | tools to generate your content, it's essentially the same as if
       | you're just shuffling words around, or looking up synonyms, or
       | doing the translation tricks that people used to do. Those kind
       | of things._
       | 
       | > _My suspicion is maybe the quality of content is a little bit
       | better than the really old school tools, but for us it's still
       | automatically generated content, and that means for us it's still
       | against the Webmaster Guidelines. So we would consider that to be
       | spam."_
       | 
       | So are people reacting thinking this is a new policy or...?
        
         | hackernewds wrote:
         | "This is okay since it has been policy" is the same vibe as
         | "We have to bump you off the plane because it is company
         | policy"
        
         | Traster wrote:
         | I think it's difficult to marry this policy with the _other_
         | stuff Google is claiming to be dramatically transformational.
         | If Google came out and said "Hey, this self-driving stuff
         | isn't really dissimilar from traditional driving assist." there
         | would be some questions to answer. Which of course, is what the
         | regulators actually say.
        
       | npunt wrote:
       | This internal tension between chasing AI tooling and avoiding AI-
       | generated content is just a prelude to the bigger shift of search
       | engines getting reinvented around generated results instead of
       | found results.
       | 
       | Fast forward 10+ years and for knowledge-related queries search
       | is going to be more about generated results personalized to our
       | level of understanding that at best quote pages, and more likely
       | just reference them in footnotes as primary inputs.
       | 
       | These knowledge-related queries are where most content farms, low
       | quality blogs, and even many news sites get traffic from today.
       | If the balance of power between offense (generating AI content)
       | and defense (detecting AI content) continues to favor offense,
       | there will be a strong incentive to just throw the whole thing
       | out and go all-in on generated results.
       | 
       | Big question is how incentives play out for the people gathering
       | the knowledge about the world, which is the basis for generated
       | results. Right now many/most make money with advertising, but so
       | do content farms, and more generation means more starving of that
       | revenue source. Wikipedia is an alternative model for this
       | knowledge, but they only cover a certain portion of factual info
       | that people want to know and if search uses it more, will become
       | more of a single point of failure.
       | 
       | Really interesting stuff ahead.
        
       | asar wrote:
       | Is this even technically possible? To me, this reads like an
       | empty threat.
        
         | ganzuul wrote:
         | Hey would you mind solving some captchas? Nothing personal,
         | just a little paranoid.
        
           | asar wrote:
           | Great point! I think with captchas, some humans might be able
           | to identify obviously made up facts and very poor sentence
           | structure to tag a text as low quality. But I don't think
           | you'd be able to give a reliable assessment on whether a text
           | was AI generated or not. Especially if it's a random group of
           | people (unfamiliar with modern content generation) filling
           | out the captchas.
        
             | ganzuul wrote:
             | Captchas can be made adversarial. Not sure if it's a good
             | idea to try that with text since we don't know how humans
             | react. Maybe that's what the Phenomenon is about?
        
       | sumoboy wrote:
       | So it's ok for google to generate <title> tags, google ad
       | headlines, and email assistance with AI, while news agencies have
       | for a while robo-generated articles. The real issue is that google
       | knows this is gaming SEO, and they will struggle against it because
       | it will only get better.
        
       | MrPatan wrote:
       | AI for me, not for thee?
        
       | ceejayoz wrote:
       | Can't wait to be delisted from Google over a false positive with
       | zero recourse.
        
       | Animats wrote:
       | Most of the content Google generates is either AI-generated or
       | plagiarized.
        
       | Shadonototra wrote:
       | it is very important to cover the legal aspect of such thing now
       | 
       | otherwise some dumb people will want ai generated content to be
       | allowed everywhere
       | 
       | i'm pretty sure they are doing their move now because it causes
       | them a ton of issues with YouTube
       | 
       | allowing ai generated content, would mean allowing ai generated
       | comments on youtube, which is already happening and causes a lot of
       | issues
       | 
       | if you can't tell what is AI generated and use
       | comments/discussions/like/dislike in your algorithms for ranking
       | videos, then it'll be very easy for 3rd parties to push and play
       | the game, including ad revenue
       | 
       | the inevitable will come sooner rather than later, get ready for
       | your online passport!
        
       | fny wrote:
       | There's an interesting variant of the Turing test here: develop
       | an AI sufficiently intelligent to distinguish human content from
       | AI generated content.
       | 
       | I might be wrong, but I think this might be a more difficult task
       | than generating convincing dialogue: it's very easy to generate
       | text that statistically resembles human writing, and generators
       | trained on certain topics (e.g. some science niche) might be
       | impossible to flag.
        
       | mensetmanusman wrote:
       | In theory GPT-3 could fill the entire internet with approximately
       | true but false information. Wikipedia, comment boxes, blogs,
       | Wordpress, everything.
       | 
       | When the percentage of human generated content approaches
       | 0.0000%, what does that internet look like?
        
         | aaaaaaaaaaab wrote:
         | We'll go back to invite-only forums.
        
         | WithinReason wrote:
         | "There is another theory which states that this has already
         | happened."
        
           | imranq wrote:
           | So your post was generated by a bot? And I suppose this reply
           | was also generated by a bot? It's bots all the way down
        
             | ChrisGranger wrote:
             | It's a _Hitchhiker's Guide to the Galaxy_ reference.
        
               | EarlKing wrote:
               | "It's turtles all the way down."
               | 
               | In the process of remembering that little tidbit it
               | reminded me of a short story I read once that was
               | circulated without attribution for a few decades, i.e.
               | Terry Bisson's "They're Made Out of Meat"[1] published in
               | OMNI in 1990. "They're meat all the way through."
               | 
               | [1] https://web.archive.org/web/20190501130711/http://www
               | .terryb...
        
         | amelius wrote:
         | And how would GPT-3 evolve when it starts feeding on its own
         | output?
        
           | visarga wrote:
           | Use another model to filter out generated text?
        
             | amelius wrote:
             | I assume they will use a GAN to evade Google's policy.
        
         | BlueTemplar wrote:
         | For the Web, we'll just go back to webrings and directly
         | sharing links with people we know to be real ?
         | 
         | (Another risk might be governments-enforced identification, no
         | more pseudonymity !)
        
         | dragonwriter wrote:
         | > In theory GPT-3 could fill the entire internet with
         | approximately true but false information.
         | 
         | Much of the internet is currently full of not-even-
         | approximately true (and often maliciously false) information,
         | so I'm not particularly worried about that.
        
         | EarlKing wrote:
         | Exactly like the one we have now: A cacophonous cesspit filled
         | with the mental diarrhea of a million bots sharting their
         | opinions into the void in a desperate bid to either sway your
         | opinion to their cause (whether commerce or politics), or bury
         | you under such an avalanche of bullshit that you remain
         | paralyzed with indecision.
         | 
         | And the result of this? Genuine conversation progressively
         | retreats from the internet at large as it becomes overrun with
         | the intellectual flatus of the bot wars, and people move
         | further and further into silos in which bot behavior can be
         | spotted and eliminated.
         | 
         | Welcome to the future. How do you like it, gentlemen? All your
         | posts are belong to us.
        
           | mjburgess wrote:
           | We find ways to route around it. You're presently using one.
        
             | EarlKing wrote:
             | Reread the second paragraph. We're in agreement.
        
             | eslaught wrote:
             | Really? I think HN has even fewer safeguards against AI-
             | generated content than Google. No offense to dang and co.,
             | and I'm sure there's more going on there than I'm aware of.
             | But still, I'm pretty sure it would be trivial to set up an
             | account and use GPT-3 to produce the content. The only
             | reason I'd suspect this isn't happening is because there
             | isn't a strong financial incentive to do so. In other
             | words, HN avoids the spam because it's still too small to
             | matter.
        
               | gus_massa wrote:
               | They will be downvoted into oblivion. And then banned,
               | shadow-banned, hell-banned and a few more creative ways
               | of banning. The account, the IP, the site, and perhaps
               | anyone that a bayesian filter put in the same bucket.
        
               | lordnacho wrote:
               | Can't be that hard to train GPT on HN comments. Plus if
               | someone were to do it, they probably know of HN.
               | 
               | I could definitely see someone already having trained a
               | bot to write HN comments and posting them.
               | 
               | What's anyone going to do about it? It's super hard to
               | write a discriminator that works well enough to not
               | destroy the site for everyone.
        
               | dragonwriter wrote:
               | > Can't be that hard to train GPT on HN comments.
               | 
               | Yes, which will produce things that are stylistically
               | similar to HN comments, but without any connection to
               | external reality beyond the training data and prompt.
               | 
               | That _might_ provide believable comments, but not things
               | likely to be treated as _high-quality_ ones, and not
               | virtual posters that respond well to things like
               | moderation warnings from dang.
        
         | babyshake wrote:
         | We will likely find out. And it may happen in a way that is
         | fast enough to be very disorienting.
        
         | Hayarotle wrote:
         | Above a certain percentage it's going to poison human-generated
         | content too. You will have to discern between ai-generated
         | content, ai-influenced-human generated content and genuine
         | human-generated content.
         | 
         | One could argue it's already happening. How many of the people
         | we talk to everyday get their facts from SEO-spam websites and
         | Google instant answers (which often sources its content from
         | such websites)? Even if we avoid AI-generated content, we might
         | be getting fed it by proxy.
        
           | ganzuul wrote:
           | Human filtering of AI creativity might work, but deepfakes
           | mismatch with that. Personally I decided to make it a pattern
           | that I unsubscribe from channels that use deepfakes since I
           | saw Internet Historian using it and possibly adding to the
           | already crippling confusion regarding UAPs. - IH is not a
           | credible source anyway, but you can easily use the clips they
           | produced without attribution.
           | 
           | I think the world will be a better place if everybody follows
           | a similar pattern. The only reason to use deepfakes is if the
           | victim whose identity is being stolen is not cooperating with
           | you. - It's a new way to violate a person's integrity and
           | their right to agency in our already fading grasp of reality.
           | You could probably gaslight your girlfriend with it, if you
           | are incomprehensibly evil.
        
             | DaltonCoffee wrote:
             | >You could probably gaslight your girlfriend with it, if
             | you are incomprehensibly evil.
             | 
             | I'm honestly a bit more concerned with people gaslighting
             | courts.
             | 
             | The technology isn't going away, unfortunately. Society
             | will have to adapt to these new invasive norms, as they
             | already have time and again in the past.
        
         | ajsnigrutin wrote:
         | > approximately true but false
         | 
         | > When the percentage of human generated content approaches
         | 0.0000%, what does that internet look like?
         | 
         | ...approximately the same, but less false :)
        
           | josefx wrote:
           | Given that the AI are trained on human generated content to
           | be human like without having understanding of the content, I
           | would think approximately the same but even less correct.
        
           | EarlKing wrote:
           | A bot would say that.
        
             | ajsnigrutin wrote:
             | Are you accusing me of being a bot?!
             | 
             | This is a lie, i sw
             | 
             | <NUL>
             | 
             | <NUL>
             | 
             | Segmentation fault
        
         | hprotagonist wrote:
         | This is a minor plot-point in Anathem.
        
       | thorum wrote:
       | If I'm understanding the quoted interview correctly, Google is
       | talking about AI generated spam - like when you ask GPT-3 to
       | write you an article about XYZ topic and it spits out 500 words
       | of well-written, plausible sounding gibberish - that you throw up
       | on your website to try to rank in the search engines.
       | 
       | However, they seem to be leaving open the possibility of AI-
       | assisted writing, where a human comes up with the information and
       | guides the AI as it puts that information into words.
       | 
       | > _From our recommendation we still see it as automatically
       | generated content. I think over time maybe this is something that
       | will evolve in that it will become more of a tool for people.
       | Kind of like you would use machine translation as a basis for
       | creating a translated version of a website, but you still work
       | through it manually._
       | 
       | > _And maybe over time these AI tools will evolve in that
       | direction that you use them to be more efficient in your writing
       | or to make sure that you're writing in a proper way like the
       | spelling and the grammar checking tools, which are also based on
       | machine learning. But I don't know what the future brings there._
       | 
       | In my opinion GPT-3 has already reached the point of being useful
       | for this purpose - there are several GPT-3 based apps that do
       | exactly what he's describing.
        
         | amelius wrote:
         | I'm glad that I'm not one of the human moderators having to
         | read GPT-3 gibberish on a daily basis.
        
           | uuyi wrote:
           | I was a moderator for a large Internet forum. GPT-3 is far
           | more coherent than a lot of humans.
        
             | DonHopkins wrote:
             | If it's simply a quality issue, then at the point that AI
             | generated content becomes better than human generated
             | content, will Google ban human generated content?
        
               | thorum wrote:
               | At that point they'll probably just have their own AI
               | that generates the perfect response to any search query,
               | and return it as the top result.
        
               | MiddleEndian wrote:
               | After getting this result
               | https://i.redd.it/4wrj2xpp75s81.png a couple days ago, I
               | am not confident that will be any time soon.
        
         | neilv wrote:
         | A friend (university science lab technician, in a field that
         | didn't pay like CS) would make approx. $2/hour extra money in
         | the evenings, from some Web site that directed her what topics
         | to write articles about. Google the topic, rapidly skim,
         | distill/rephrase it to a certain word count. (I suggested she
         | could make a lot more working in a cafe, but she was physically
         | exhausted from being on feet all day in lab, with lots of
         | moving around heavy objects.)
         | 
         | I guessed it was used for filler content for SEO sites.
         | 
         | A question is whether that company would save money by using
         | "AI" text generation, when they were paying real humans so
         | little, for arguably higher quality.
        
       | walrus01 wrote:
       | Maybe this wouldn't be a problem if Google did better at
       | distinguishing between seo stuffing content farms and actual
       | websites.
        
       ___________________________________________________________________
       (page generated 2022-04-09 23:00 UTC)