[HN Gopher] Google says AI generated content is against guidelines ___________________________________________________________________ Google says AI generated content is against guidelines Author : rolph Score : 95 points Date : 2022-04-09 16:50 UTC (6 hours ago) (HTM) web link (www.searchenginejournal.com) (TXT) w3m dump (www.searchenginejournal.com) | ohashi wrote: | AI for me, not for thee. | | Google shouldn't be the arbiter of what's ok or not on the | internet. They use AI to take away all human recourse with the | company but want to tell others not to use it? It's a pretty | laughable position. Good luck trying to detect GPT-3 and the like | when you compare with non-native speakers of languages. Are you | suddenly going to just block them too? | | If an AI can generate high quality content, why is that worth any less | than human generated content? Humans generate a ton of trash | content, it's not inherently better. | | Those same models on copilot are generating useful and often good | code for me on a daily basis. If someone told me it's not OK to | use any copilot generated code because it's unethical/wrong I'd laugh in | their face. It's basically saving me a google search to find | snippets/examples of things I wasn't sure of. | | Maybe that's the threat? If we have access to AI directly (ala | copilot) then I am googling less. | shadowgovt wrote: | That's the point. If it _is_ detectable, it'll get downranked | for being low-quality. | deevolution wrote: | What about high quality ai content? What if someday the | content generated by ai is actually more useful/higher | quality than the human generated content, and we just shoot | ourselves in the foot because it's ai generated? Seems | discriminatory to me! | hackernewds wrote: | Which seems fair. I do not want to be bombarded by AI | generated media and content that can be created at infinite | volume and speed. Google is taking a massive step here for | the good. 
| jimmaswell wrote: | I've seen a few AI generated pages in top results that felt | like having a stroke trying to read. I hope these can be | eliminated. | aaaaaaaaaaab wrote: | Fair enough. Google-generated content is likewise against my | guidelines. | Trias11 wrote: | If google can randomly penalize people and businesses with no | chance of recourse | | It's certainly good idea to penalize google with intelligent AI- | generating content | sva_ wrote: | So Google would need to build a discriminator that detects | machine-generated content. It will be interesting to see these | discriminators fight the generators of other big companies. | | I'll be taking a front-seat row watching that show, if it | happens. Perhaps in the future, we'll have to deal with | discriminators that approximate some originality-index. Will be | fun fighting with those algorithms, to interact with the internet | as a normal user (to some extent we already do - proving that | you're human is becoming more and more tedious.) | notahacker wrote: | In practice I think it's more _Google now has another policy | reason to banhammer prolific and irritating blogspammers_ than | an arms race Google has a chance in. | | Google isn't yet effective at detecting blogspam generated by | naive scripts that simply swap words in the source material for | other words in a thesaurus. I don't think they're going to | start picking up continuity errors, factual errors or | "weirdness" in GPT-3 - which often satisfies human readers - | any time soon. | notreallyserio wrote: | Google engineers can't even filter out GH or SO scraped sites | like gitmemory, nor offer a way to let users block these | sites. I'm not sure we should expect them to handle more | advanced techniques like detecting word swaps any time soon. | omnicognate wrote: | s/can't/won't/ | notreallyserio wrote: | Google search has been poor for years. I think it's time | to say can't. 
| zitterbewegung wrote: | I am unsure about that but using GPT-3 in that manner would | certainly trigger OpenAI's automatic or manual systems and | would violate their ToS and your account would be locked. | Oras wrote: | With GPT-3, yes but how about other generators? There are | lots of emerging services that spammers can use. | htrp wrote: | Does OpenAI actually have those systems? My impression is | they are happy to take your money and just tell you not to | do those un-nice things | ganzuul wrote: | You need to be computationally irreducible through self-attention | and conscious self-interaction. | | By the same criterion we may need to consider the wattage cost | of intelligence when granting rights to AI. Their kernels need | to be aligned with evolutionary wisdom encoded in our highest | motivations. | mjburgess wrote: | a conversation for Q and Commander Data | ganzuul wrote: | Filmed in front of a live audience! [Applause] | OliverJones wrote: | Neal Stephenson's "Fall; or, Dodge in Hell" has a subplot driven | by a news-story-generating AI run amok. The AI uses reader | engagement as the metric (along with really great natural-language | stuff). But it doesn't have any truth metric. So, at the | time of the book, the AI's widely believed stories have | constructed an alternate reality for many people, reinforcing the | present polarization of media in the US. | | That presents quite a threat profile. It's far more pernicious | than SEO script kiddies doing whatever passes for keyword | stuffing in 2022. | | I hope the search crawler folks at Google are working hard to | detect that sort of thing and prevent it from getting into | indexes. Let's hope Neal Stephenson isn't as right about that | threat as Arthur C. Clarke was about geosynchronous | communications satellites. | syrrim wrote: | That's basically what the Q larp, or realrawnews, is already | doing today. 
If it's profitable to do, it doesn't take a rogue | AI to make up nonsense and spread it to the masses. | bestcoder69 wrote: | Seems like there should be a carve-out for content clearly marked | as AI-generated. I wonder if the SEO hit is why I haven't been | able to find too many others posting funny GPT-3 outputs like | (Disclosure incoming:) mine. | | And if I can peddle my blog's 'content': here's Trump announcing | he's (not?) trans. | | > I have been so concerned about this issue, I've held back from | telling you that I'm transgender. I'm not transgender, but I'm so | proud of the transgender community and their rights. | | > The Democrats have been so horrible to the transgender | community. They've made them live in these little closets, ya' | know? And they've tried to force them to use certain restrooms. | | Love messin' with gpt-3. | mkl95 wrote: | That AI generated content is automated SEO, which is mostly a | bunch of heuristics to please Google's ranking algorithms. Blame | the algorithms, not the people who reverse engineer them. | admax88qqq wrote: | Why blame the algorithms rather than the spammers who reverse | engineer them to get their affiliate-link-littered, AI-generated | "reviews" to the top of the search results? | | In the arms race between Google and spammers, honest websites | sometimes get caught in the crossfire. For some reason lately | lots of people want to blame google for this and not the | spammers. | sdoering wrote: | Why must it necessarily be either/or, black/white, | Google/Spammer? | | Why couldn't it reasonably be that both sides are to blame? | Google enables the commodification of search results. Yes | they claim that they want to show the best result. But best | for whom? I don't believe that Google is a neutral party as | they earn more if they promote sites that lead to additional | advertising clicks. How should users know that there are | often better results on pages 3 or 4 and below? 
| | And then there are the ones trying to make a living with | minimal effort. No need to create great content. Good enough | is sufficient. As long as the content is optimized and the | page receives relevant traffic from search the revenue via | affiliate links is secured. | | In this arms race the other sites lose. They are the | collateral between two fighting parties in this arms race. | And also people lose. Lose great content and the diversity | of the net. | admax88qqq wrote: | Thankfully Google operates in a free market where it is | possible to compete by building a better search algorithm. | | I don't buy into the position that all this great content | and diversity of content is lost because of Google's | algorithm. It isn't as discoverable as the mainstream | content that Google search returns but that's no different | than being in a world where Google didn't exist. Except | perhaps that in such a world people would have different content | discovery habits. | | Complaining that Google doesn't surface your preferred set | of "great content" is like complaining that prime time | cable TV only shows lame sitcom reruns. It's true, but it | doesn't prevent you from buying HBO, or Prime Video. | supernovae wrote: | Because we wouldn't be here if Google had supported other | means of natural search engine inclusion through quality | metrics vs their current only option of pay to play or | cheating. | adhesive_wombat wrote: | I can't see any other reason for spam clone farms to outrank | the sites they clone except either more incompetence than I'd | ascribe to Google, or because the spam farm is full to the | gunwales with ads which earn money for Google. | mkl95 wrote: | Google make a large chunk of their money from ads. If you | create some SaaS where users can automate their site's SEO by | using GPT or whatever, and it is reasonably priced, you could | end up competing with Google Ads. Google want to prevent | that. 
| endisneigh wrote: | Can someone describe to me a search engine that's immune to such | problems and also searches a large variety of old and new sites? | majormajor wrote: | We often talk about this problem as if we expect Google to | solve a social and economic people problem through math | alone. | | The appeal of open-to-all search as a way of navigating the web | was that there was a huge long tail of interesting stuff that | would be hard to manually index and categorize. | | If the long tail of interesting stuff has fully drowned in a | sea of spam and crap, I'm not sure that it still makes that | much sense over something smaller but human-curated. | | Perhaps the trick would be human curation with extensive and | always-evolving AI tools to speed up the curation. You have to | get past the filter to get in, versus being in by default | unless you're blatant enough to get banned. There is a layer of | human judgement in addition to the algorithm's score of the | content, and additionally that gives you an extra scoring | factor on the algorithms themselves - the humans should be able | to help direct the development of the algorithm to fight spam | more preemptively. | | Would a mix of that give us a bigger internet than the | entirely-manual "web directory" days of 1997 or so, but a | less-shit-filled one than today's? | ocdtrekkie wrote: | Oh the irony. The company that lets AI determine what it thinks | is factual says using AI to generate content is bad. | peterisza wrote: | They want to use the internet to train their models. AI generated | text would contaminate the training data. | hidroto wrote: | perhaps they don't want the datasets that they use to train the | AI to be watered down by other AI generated content. after all | a lot of the text data was sourced from the internet. 
| magicalist wrote: | > _" For us these would, essentially, still fall into the | category of automatically generated content which is something | we've had in the Webmaster Guidelines since almost the | beginning._ | | > _And people have been automatically generating content in lots | of different ways. And for us, if you're using machine learning | tools to generate your content, it's essentially the same as if | you're just shuffling words around, or looking up synonyms, or | doing the translation tricks that people used to do. Those kind | of things._ | | > _My suspicion is maybe the quality of content is a little bit | better than the really old school tools, but for us it's still | automatically generated content, and that means for us it's still | against the Webmaster Guidelines. So we would consider that to be | spam. "_ | | So are people reacting thinking this is a new policy or...? | hackernewds wrote: | "This is okay since it has been policy" is that same vibe as | "We have to bump you off the plane because it is company | policy" | Traster wrote: | I think it's difficult to marry this policy with the _other_ | stuff Google is claiming to be dramatically transformational. | If Google came out and said "Hey, this self-driving stuff | isn't really dissimilar from traditional driving assist." there | would be some questions to answer. Which of course, is what the | regulators actually say. | npunt wrote: | This internal tension between chasing AI tooling and avoiding AI- | generated content is just a prelude to the bigger shift of search | engines getting reinvented around generated results instead of | found results. | | Fast forward 10+ years and for knowledge-related queries search | is going to be more about generated results personalized to our | level of understanding that at best quote pages, and more likely | just reference them in footnotes as primary inputs. 
| | These knowledge-related queries are where most content farms, | low-quality blogs, and even many news sites get traffic from today. | If the balance of power between offense (generating AI content) | and defense (detecting AI content) continues to favor offense, | there will be a strong incentive to just throw the whole thing | out and go all-in on generated results. | | Big question is how incentives play out for the people gathering | the knowledge about the world, which is the basis for generated | results. Right now many/most make money with advertising, but so | do content farms, and more generation means more starving of that | revenue source. Wikipedia is an alternative model for this | knowledge, but they only cover a certain portion of factual info | that people want to know and if search uses it more, it will become | more of a single point of failure. | | Really interesting stuff ahead. | asar wrote: | Is this even technically possible? To me, this reads like an | empty threat. | ganzuul wrote: | Hey would you mind solving some captchas? Nothing personal, | just a little paranoid. | asar wrote: | Great point! I think with captchas, some humans might be able | to identify obviously made up facts and very poor sentence | structure to tag a text as low quality. But I don't think | you'd be able to give a reliable assessment on whether a text | was AI generated or not. Especially if it's a random group of | people (unfamiliar with modern content generation) filling | out the captchas. | ganzuul wrote: | Captchas can be made adversarial. Not sure if it's a good | idea to try that with text since we don't know how humans | react. Maybe that's what the Phenomenon is about? | sumoboy wrote: | So it's ok for google to generate <title> tags, google ad | headlines, and email assistance with AI while news agencies have | been robo-generating articles for a while. The real issue is that google knows | this is gaming seo, and they will struggle against it, since it will | only get better. 
| [deleted] | MrPatan wrote: | AI for me, not for thee? | ceejayoz wrote: | Can't wait to be delisted from Google over a false positive with | zero recourse. | [deleted] | Animats wrote: | Most of the content Google generates is either AI-generated or | plagiarized. | Shadonototra wrote: | it is very important to cover the legal aspect of such a thing now | | otherwise some dumb people will want ai generated content to be | allowed everywhere | | i'm pretty sure they are doing their move now because it causes | them a ton of issues with YouTube | | allowing ai generated content would mean allowing ai generated | comments on youtube, which is already happening and causes a lot of | issues | | if you can't tell what is AI generated and use | comments/discussions/like/dislike in your algorithms for ranking | videos, then it'll be very easy for 3rd parties to push and play | the game, including ad revenue | | the inevitable will come sooner rather than later, get ready for | your online passport! | fny wrote: | There's an interesting variant of the Turing test here: develop | an AI sufficiently intelligent to distinguish human content from | AI generated content. | | I might be wrong, but I think this might be a more difficult task | than generating convincing dialogue: it's very easy to generate | text that statistically resembles human writing, and generators | trained on certain topics (e.g. some science niche) might be | impossible to flag. | mensetmanusman wrote: | In theory GPT-3 could fill the entire internet with approximately | true but false information. Wikipedia, comment boxes, blogs, | Wordpress, everything. | | When the percentage of human generated content approaches | 0.0000%, what does that internet look like? | aaaaaaaaaaab wrote: | We'll go back to invite-only forums. | WithinReason wrote: | "There is another theory which states that this has already | happened." | imranq wrote: | So your post was generated by a bot? 
And I suppose this reply | was also generated by a bot? It's bots all the way down | ChrisGranger wrote: | It's a _Hitchhiker's Guide to the Galaxy_ reference. | EarlKing wrote: | "It's turtles all the way down." | | In the process of remembering that little tidbit it | reminded me of a short story I read once that was | circulated without attribution for a few decades, i.e. | Terry Bisson's "They're Made Out of Meat"[1] published in | OMNI in 1990. "They're meat all the way through." | | [1] https://web.archive.org/web/20190501130711/http://www.terryb... | amelius wrote: | And how would GPT-3 evolve when it starts feeding on its own | output? | visarga wrote: | Use another model to filter out generated text? | amelius wrote: | I assume they will use a GAN to evade Google's policy. | BlueTemplar wrote: | For the Web, we'll just go back to webrings and directly | sharing links with people we know to be real? | | (Another risk might be government-enforced identification, no | more pseudonymity!) | dragonwriter wrote: | > In theory GPT-3 could fill the entire internet with | approximately true but false information. | | Much of the internet is currently full of not-even-approximately | true (and often maliciously false) information, | so I'm not particularly worried about that. | EarlKing wrote: | Exactly like the one we have now: A cacophonous cesspit filled | with the mental diarrhea of a million bots sharting their | opinions into the void in a desperate bid to either sway your | opinion to their cause (whether commerce or politics), or bury | you under such an avalanche of bullshit that you remain | paralyzed with indecision. | | And the result of this? Genuine conversation progressively | retreats from the internet at large as it becomes overrun with | the intellectual flatus of the bot wars, and people move | further and further into silos in which bot behavior can be | spotted and eliminated. | | Welcome to the future. How do you like it, gentlemen? 
All your | posts are belong to us. | mjburgess wrote: | We find ways to route around it. You're presently using one. | EarlKing wrote: | Reread the second paragraph. We're in agreement. | eslaught wrote: | Really? I think HN has even fewer safeguards against | AI-generated content than Google. No offense to dang and co., | and I'm sure there's more going on there than I'm aware of. | But still, I'm pretty sure it would be trivial to set up an | account and use GPT-3 to produce the content. The only | reason I'd suspect this isn't happening is because there | isn't a strong financial incentive to do so. In other | words, HN avoids the spam because it's still too small to | matter. | gus_massa wrote: | They will be downvoted into oblivion. And then banned, | shadow-banned, hell-banned and a few more creative ways | of banning. The account, the IP, the site, and perhaps | anyone that a Bayesian filter puts in the same bucket. | lordnacho wrote: | Can't be that hard to train GPT on HN comments. Plus if | someone were to do it, they probably know of HN. | | I could definitely see someone already having trained a | bot to write HN comments and posting them. | | What's anyone going to do about it? It's super hard to | write a discriminator that works well enough to not | destroy the site for everyone. | dragonwriter wrote: | > Can't be that hard to train GPT on HN comments. | | Yes, which will produce things that are stylistically | similar to HN comments, but without any connection to | external reality beyond the training data and prompt. | | That _might_ provide believable comments, but not things | likely to be treated as _high-quality_ ones, and not | virtual posters that respond well to things like | moderation warnings from dang. | babyshake wrote: | We will likely find out. And it may happen in a way that is | fast enough to be very disorienting. | Hayarotle wrote: | Above a certain percentage it's going to poison human-generated | content too. 
You will have to discern between ai-generated | content, ai-influenced human-generated content and genuine | human-generated content. | | One could argue it's already happening. How many of the people | we talk to every day get their facts from SEO-spam websites and | Google instant answers (which often source their content from | such websites)? Even if we avoid AI-generated content, we might | be getting fed it by proxy. | ganzuul wrote: | Human filtering of AI creativity might work, but deepfakes | mismatch with that. Personally I decided to make it a pattern | that I unsubscribe from channels that use deepfakes since I | saw Internet Historian using it and possibly adding to the | already crippling confusion regarding UAPs. - IH is not a | credible source anyway, but you can easily use the clips they | produced without attribution. | | I think the world will be a better place if everybody follows | a similar pattern. The only reason to use deepfakes is if the | victim whose identity is being stolen is not cooperating with | you. - It's a new way to violate a person's integrity and | their right to agency in our already fading grasp of reality. | You could probably gaslight your girlfriend with it, if you | are incomprehensibly evil. | DaltonCoffee wrote: | >You could probably gaslight your girlfriend with it, if | you are incomprehensibly evil. | | I'm honestly a bit more concerned with people gaslighting | courts. | | The technology isn't going away, unfortunately. Society | will have to adapt to these new invasive norms, as it | already has time and again in the past. | ajsnigrutin wrote: | > approximately true but false | | > When the percentage of human generated content approaches | 0.0000%, what does that internet look like? 
| | ...approximately the same, but less false :) | josefx wrote: | Given that the AI are trained on human generated content to | be human like without having understanding of the content, I | would think approximately the same but even less correct. | EarlKing wrote: | A bot would say that. | ajsnigrutin wrote: | Are you accusing me of being a bot?! | | This is a lie, i sw | | <NUL> | | <NUL> | | Segmentation fault | hprotagonist wrote: | This is a minor plot-point in Anathem. | thorum wrote: | If I'm understanding the quoted interview correctly, Google is | talking about AI generated spam - like when you ask GPT-3 to | write you an article about XYZ topic and it spits out 500 words | of well-written, plausible sounding gibberish - that you throw up | on your website to try to rank in the search engines. | | However, they seem to be leaving open the possibility of AI- | assisted writing, where a human comes up with the information and | guides the AI as it puts that information into words. | | > _From our recommendation we still see it as automatically | generated content. I think over time maybe this is something that | will evolve in that it will become more of a tool for people. | Kind of like you would use machine translation as a basis for | creating a translated version of a website, but you still work | through it manually._ | | > _And maybe over time these AI tools will evolve in that | direction that you use them to be more efficient in your writing | or to make sure that you're writing in a proper way like the | spelling and the grammar checking tools, which are also based on | machine learning. But I don't know what the future brings there._ | | In my opinion GPT-3 has already reached the point of being useful | for this purpose - there are several GPT-3 based apps that do | exactly what he's describing. | amelius wrote: | I'm glad that I'm not one of the human moderators having to | read GPT-3 gibberish on a daily basis. 
| uuyi wrote: | I was a moderator for a large Internet forum. GPT-3 is far | more coherent than a lot of humans. | DonHopkins wrote: | If it's simply a quality issue, then at the point that AI | generated content becomes better than human generated | content, will Google ban human generated content? | thorum wrote: | At that point they'll probably just have their own AI | that generates the perfect response to any search query, | and return it as the top result. | MiddleEndian wrote: | After getting this result | https://i.redd.it/4wrj2xpp75s81.png a couple days ago, I | am not confident that will be any time soon. | [deleted] | [deleted] | neilv wrote: | A friend (university science lab technician, in a field that | didn't pay like CS) would make approx. $2/hour extra money in | the evenings, from some Web site that directed her what topics | to write articles about. Google the topic, rapidly skim, | distill/rephrase it to a certain word count. (I suggested she | could make a lot more working in a cafe, but she was physically | exhausted from being on feet all day in lab, with lots of | moving around heavy objects.) | | I guessed it was used for filler content for SEO sites. | | A question is whether that company would save money by using | "AI" text generation, when they were paying real humans so | little, for arguably higher quality. | walrus01 wrote: | Maybe this wouldn't be a problem if Google did better at | distinguishing between seo stuffing content farms and actual | websites. | [deleted] ___________________________________________________________________ (page generated 2022-04-09 23:00 UTC)