[HN Gopher] New AI classifier for indicating AI-written text
       ___________________________________________________________________
        
       New AI classifier for indicating AI-written text
        
       Author : davidbarker
       Score  : 236 points
       Date   : 2023-01-31 18:11 UTC (4 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | peter303 wrote:
       | OpenAI archives every request and output text. Why not compare
       | suspected A.I. text against this?
        
       | nothrowaways wrote:
        | Is this the new antivirus?
        
       | minimaxir wrote:
       | > Our classifier is not fully reliable. In our evaluations on a
       | "challenge set" of English texts, our classifier correctly
       | identifies 26% of AI-written text (true positives) as "likely AI-
       | written," while incorrectly labeling human-written text as AI-
       | written 9% of the time (false positives).
       | 
       | That is an interesting mathematical description of "not fully
       | reliable".
        
       | thatcherthorn wrote:
        | As with deepfakes, it seems that creating tools to distinguish
        | between human- and AI-generated data will cause more harm than
        | good: models built to distinguish them will never be perfect,
        | and an actor who can fool such a model will be very effective.
        
       | LanceJones wrote:
       | What about introducing a new code of ethics that students sign?
       | They agree to disclose the level of help (1 - 10) provided by GPT
       | and the teacher/instructor/prof grades accordingly. Silly?
        
       | Magi604 wrote:
       | Just filter your text through Quillbot to get around "AI
       | Detection".
       | 
       | https://quillbot.com/
       | 
       | Demonstration: https://youtu.be/gp64fukhBaU?t=197
       | 
       | The arms race continues...
        
       | GOONIMMUNE wrote:
        | This seems like a sort of unwinnable arms race. Can't the
        | people who work on generative text models use this classifier
        | as a feedback mechanism so that their output doesn't get
        | flagged? I'm not an AI expert, but I believe this is even the
        | core mechanism behind Generative Adversarial Networks.
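        | 
        | As a rough sketch of that feedback idea (not OpenAI's actual
        | setup; detector_score and sample_completions are toy,
        | hypothetical stand-ins), a generator could sample several
        | outputs and keep whichever one the detector likes least:
        | 
        |     import random
        | 
        |     def detector_score(text):
        |         # Hypothetical stand-in for a real classifier;
        |         # returns a fake "probability of AI authorship"
        |         # from a toy heuristic.
        |         words = text.split()
        |         avg = sum(len(w) for w in words) / max(len(words), 1)
        |         return min(1.0, avg / 10)
        | 
        |     def sample_completions(prompt, n=8):
        |         # Stand-in for drawing n completions from a model.
        |         tails = ["clearly so", "arguably so", "in a sense"]
        |         return [prompt + " " + random.choice(tails)
        |                 for _ in range(n)]
        | 
        |     def least_detectable(prompt):
        |         # The feedback step: keep the completion the
        |         # detector is least suspicious of.
        |         return min(sample_completions(prompt),
        |                    key=detector_score)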
        
         | londons_explore wrote:
         | Detectors can be a black box "pay $5 per detection" type
         | service.
         | 
         | That way, you can't fire thousands of texts at it to retrain
         | your generative net.
         | 
         | Plagiarism detectors in schools and universities work the same.
         | In fact, some plagiarism detection companies now offer the same
         | software to students to allow them to pay some money to pre-
         | scan their classwork to see if it will be detected...
        
           | Buttons840 wrote:
           | Make a model to detect cheating. Market it as "a custom built
           | and unique model to detect cheating; able to catch cheating
           | that other models miss!" It's all 100% true. Market and
           | profit.
        
           | telotortium wrote:
          | $5 is way too high a price to use regularly. In any case,
          | if it's only available to educational institutions, teachers
          | and grad students are poor enough that they'd sell access to
          | it on the dark web for the right price.
        
         | mritchie712 wrote:
         | There's also always going to be more capital going towards
         | building better generators than better detectors.
        
       | standardly wrote:
       | IMO we need a mass-adopted digital signature solution using
       | biometric identifiers such that publishing an article, or even a
       | comment, can be signed by and only by a biological human.
        
         | khyryk wrote:
         | Any ideas on how the "only by" would work? I'm not seeing a way
         | around pasting generated text and signing it as one's own work.
         | Proof of work solutions would have to have a high cost for
         | anyone to care, otherwise there would be bots "proving" the
         | work of writing an essay by generating it in phases like a
         | human would edit drafts.
        
           | standardly wrote:
            | This is really the first time I've thought about this, so
           | uh... No, no ideas.
           | 
            | One's identity would need to be verified concurrently with
            | the creation of a text. I am not really satisfied with the
            | idea of a specialized word processor or input device that
            | does biometric validation; I'd rather have a specific,
            | standardized protocol. I wonder if this is already deemed
            | impossible, or if someone is working on the problem.
        
         | Someone1234 wrote:
         | You want to ban privacy?
        
           | standardly wrote:
           | Proving you are a human and not a computer doesn't have to
           | publicly reveal a single thing about you (other than the
           | crypto signature). Think of it like an SSL cert for a person
            | rather than a server. I'm purely spitballing, man. It's a
           | problem someone will eventually have to come up with a
           | solution for and I think we already have the tools.
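            | 
            | To make the spitballing concrete - a minimal sketch of
            | the "SSL cert for a person" half, using Ed25519 from the
            | pyca/cryptography library. It only proves possession of a
            | key; proving a human authored the text is the open part:
            | 
            |     from cryptography.hazmat.primitives.asymmetric \
            |         import ed25519
            | 
            |     # In the imagined scheme, some authority certifies
            |     # that this public key belongs to a verified human.
            |     key = ed25519.Ed25519PrivateKey.generate()
            |     pub = key.public_key()
            | 
            |     comment = b"I wrote this myself, honest."
            |     sig = key.sign(comment)
            | 
            |     # Anyone can verify against the certified key;
            |     # raises InvalidSignature on a mismatch.
            |     pub.verify(sig, comment)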
        
             | GMoromisato wrote:
             | The problem is that one can't be trusted to sign their own
             | work--otherwise they could sign AI-generated text. This
             | only works if a trusted human signs your work after
             | watching you generate it.
        
               | standardly wrote:
               | Easy. We just need an AI that serves as a public notary.
               | Wait a minute..
               | 
               | It really is an interesting problem to think about. The
               | other commenter pointed out you could just sign an AI
               | text - I see all the issues, but my gut feeling tells me
               | there is an elegant solution somewhere.
        
       | GMoromisato wrote:
       | 1. This is an arms race. You can build a generative AI that
       | avoids generating text caught by the classifier.
       | 
       | 2. Maybe teachers will assign rare or even fictional topics that
       | cannot be found in the AI training corpus. Maybe a teacher could
       | use an AI to generate essay prompts that are hard for other AIs
       | to write essays for.
       | 
       | 3. Is this a problem long term? If an AI can generate an essay
       | that's indistinguishable from a human-generated one, then why do
       | we need to learn how to write essays? Maybe we should just learn
       | how to write good prompts.
       | 
       | See also: "Should calculators be banned in school?", "Do students
       | need to learn cursive?", "Why should I learn Greek instead of
       | just reading a translation of Homer?"
        
       | w_for_wumbo wrote:
        | So for me, I like to write my own ideas but use AI to reword
        | them to be succinct and readable. I'm worried that usage like
        | that would be flagged as AI text.
        
       | BulgarianIdiot wrote:
       | I wrote some text about the subjectivity of communication and the
       | nature of natural language, and I kept it very neutral, formal
       | and verbose. And it said "this text is likely AI".
       | 
        | So, as was honestly predictable, people who rely on this tool
        | being accurate will inflict a lot of pain on unsuspecting
        | individuals who simply write the way GPT writes.
        
       | gzer0 wrote:
       | How good is this really?
       | 
        | I input an article written directly by ChatGPT, and it came
        | back as "The classifier considers the text to be unclear if it
        | is AI-generated." The article was not edited, not put through
        | any paraphraser, nothing. Interesting.
        | 
        | Furthermore, these efforts are quite futile. One can just go
        | to one of the numerous paraphrasers such as quillbot.com, run
        | the text through it, and then, for added obfuscation, run it
        | through an entirely different paraphraser (Microsoft Word now
        | has this capability natively, at least in the beta channels,
        | btw).
        | 
        | Yeah, for someone who intends to bypass this, there will
        | always be a way. It's a good effort, for sure. But I don't see
        | this doing much in terms of truly distinguishing AI- from non-
        | AI-generated outputs.
        
         | moneywoes wrote:
         | 26% good
        
           | [deleted]
        
       | cjrd wrote:
       | This is all predicated on existing conditions, where AI-written
       | text hasn't influenced the way that humans write. As the years
       | pass and these tools become a common way to at least "spot check"
       | your own writing, I imagine that we will all begin to write in
       | styles that are increasingly similar to AI-written text.
        
       | felipelalli wrote:
        | The irony here is that this tool can be used by the AI in the
        | future for self-training, to become more and more like a
        | human.
        
         | antiterra wrote:
         | Heck, you can use it as a manual adversarial output filter as
         | it is right now.
        
       | O__________O wrote:
        | Related option that has benchmarks and a research paper; it
        | appears they intend to release code & datasets too.
       | 
       | DetectGPT: Zero-Shot Machine-Generated Text Detection
       | 
       | - https://news.ycombinator.com/item?id=34557189
        
       | ilaksh wrote:
       | It can't possibly work reliably. It's going to be very
       | challenging for honest kids because almost everyone is going to
       | be cheating.
       | 
       | The reality is that learning to think and write will be harder
       | because of the ubiquity of text generation AI. This may be the
       | last generation of kids where most are good at doing it on their
       | own.
       | 
       | On the other hand, at least a few will be able to use this as an
       | instant feedback mechanism or personal tutor, so the potential
       | for some carefully supervised students to learn faster is there.
       | 
       | And it should increase the quality of writing overall if people
       | start taking advantage of these tools. It's going to fairly
       | quickly become somewhat like using a calculator.
       | 
       | Actually it probably means that informal text will really stand
       | out more.
       | 
       | I am giving it the ability to do simple tasks given commands like
       | !!create filename file content etc.
       | 
        | It's now very important for kids to adapt quickly and learn
        | how to take advantage of these tools if they are going to be
        | able to find jobs, or just to adapt in general even if they
        | don't have jobs. It is starting to look like everyone will be
        | either an entrepreneur or just unemployed.
       | 
       | Learning about all the ways to use these tools and the ones
       | coming up in the next few years could be quite critical for
       | children's education.
       | 
       | There are always going to be luddites of course. But it's looking
       | like ChatGPT etc. are going to be the least of our problems. It
       | is not hard to imagine that within twenty years or so, anyone
       | without a high bandwidth connection to an advanced AI will be
       | essentially irrelevant because their effective IQ will be less
       | than half of those who are plugged in.
        
         | logifail wrote:
         | > it should increase the quality of writing overall if people
         | start taking advantage of these tools
         | 
         | Perversely, it might also dramatically decrease reading, if
         | there's no incentive for anyone to need to properly understand
         | anything.
         | 
         | A pretty dire scenario :(
        
       | LanceJones wrote:
       | Feels like the battle between computer virus creators and anti-
       | virus software all over again.
        
       | ipnon wrote:
       | ChatGPT is already quite effective at deceiving these models with
       | simple prompts like, "write the output in a way that seems human
       | and not AI generated, so as to bypass AI-written text detectors."
        
       | gunshai wrote:
        | Or educators could be forced to evolve around a new tool that
        | _gasp_ requires a different measurement of skill, one that is
        | much harder to fake.
       | 
       | The obvious one that already exists ... ORAL EXAMS.
        
       | botplaysdice wrote:
        | Is this the new Turing test? Who can verify the classifier
        | itself?
        
       | siliconc0w wrote:
        | Horrible idea: you can't eliminate the false positives, and
        | these are going to impact innocent students or be used to
        | reinforce teacher biases.
        
       | lumost wrote:
       | I don't see why teachers don't use this as an opportunity to
       | accelerate curriculum. Every student now has a cheap personal
       | instructor. Why not raise the bar on difficulty and quality
       | expectations for assignments?
        
       | odipar wrote:
       | As always it is the journey that matters (writing), not the
       | outcome (the essay).
       | 
       | For example, students could record their writing of an essay with
       | a keylogger or something.
       | 
       | Additionally - with the use of some advanced zero-knowledge algos
       | or crypto timestamp provenance - it should be possible to prove
       | that they have written the essay, without revealing their
       | recording.
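        | 
        | Not zero-knowledge proper, but a simple hash-chain commitment
        | gets part of the way there - a sketch, with every detail of
        | the scheme made up:
        | 
        |     import hashlib, json, time
        | 
        |     def commit(prev_digest, draft_text):
        |         # Chain each draft snapshot to the previous one so
        |         # the sequence can't be fabricated after the fact.
        |         record = json.dumps({
        |             "prev": prev_digest,
        |             "time": time.time(),
        |             "draft": hashlib.sha256(
        |                 draft_text.encode()).hexdigest(),
        |         }, sort_keys=True)
        |         return hashlib.sha256(record.encode()).hexdigest()
        | 
        |     # Publish only the rolling digest (e.g. to a public
        |     # timestamping service); the drafts themselves stay
        |     # private unless authorship is challenged.
        |     digest = "0" * 64
        |     for draft in ["outline", "first draft", "final essay"]:
        |         digest = commit(digest, draft)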
        
         | ekanes wrote:
         | Yes, sort of, though if the economic incentive was high enough,
         | someone could connect the AI to input through the keyboard and
         | "type" out the essay. At scale it would be cheap. You could
         | record video of you typing, which would work for some time
         | until at some point video fakes get advanced enough... sigh.
        
       | barbazoo wrote:
       | > Our classifier is not fully reliable. In our evaluations on a
       | "challenge set" of English texts, our classifier correctly
       | identifies 26% of AI-written text (true positives) as "likely AI-
       | written," while incorrectly labeling human-written text as AI-
       | written 9% of the time (false positives).
       | 
       | > The classifier is very unreliable on short texts (below 1,000
       | characters). Even longer texts are sometimes incorrectly labeled
       | by the classifier.
        
       | beefman wrote:
       | We've lined up a fabulous type of gorilla...
        
       | animanoir wrote:
        | What's the point of launching this when they admit it doesn't
        | work most of the time and adds to the confusion? We should
        | just embrace the AI Chaos.
        
       | dukeofdoom wrote:
        | I've used ChatGPT to generate some code for me, and almost
        | every time it was a learning experience. I saved a lot of time
        | searching, and it just gave me what I was after. Observing how
        | someone, or something like an AI, solves a problem is a fast
        | way to learn. I don't see a problem with this. Teachers can
        | always just use in-person tests to check whether a student has
        | mastered the concepts. Math teachers got over students using
        | calculators for homework, and can check understanding just
        | fine on tests. It used to be that students would solve
        | homework problems by candlelight, with an abacus and lookup
        | tables. Yet no one wants to mandate going back to that just
        | because it made homework harder.
        
       | gibsonf1 wrote:
       | Wow, that is truly not a good classifier with success that low.
        
       | gzer0 wrote:
       | On a side note:
       | 
       | My online MBA has switched from TurnItIn to this website:
       | https://unicheck.com
       | 
        | And the benefit of this is... incredible. It allows students
        | to purchase X pages' worth of plagiarism checks: full reports
        | on your work beforehand.
       | 
       | Not sure why this move was made, but it will be interesting to
       | see once they integrate "possible AI detection" into UniCheck.
        
         | e_i_pi_2 wrote:
          | TurnItIn also lets students submit beforehand, last I heard
          | - if people are going to use tools like this, then students
          | should also get full access, to make sure they won't get
          | flagged ahead of time.
          | 
          | I had some professors who let you fully grade your own
          | assignments before submitting, and those were the best
          | courses I've ever taken - you're given everything you need
          | to figure out what you know and what you don't.
        
       | nerdponx wrote:
       | Do we have GANs for text yet?
        
       | supernova87a wrote:
       | I was thinking that there have been swings of what is valued (or
       | trusted) in education and testing (or voting, promoting) to prove
       | that someone has the goods.
       | 
       | At one time it was live oration skill, and then people thought,
       | "maybe that disfavors people who are introverted or whose talent
       | comes from thinking and writing".
       | 
       | Then, at another time, it was thought, "well you have to test
       | because sometimes time pressure and not being able to go away to
       | think about something for as long as you have time to work on it
       | produces something valuable".
       | 
       | Yet another time, "let people do homework to prove their value
       | through effort who don't test well" but now who knows whether
       | they actually were the ones doing the work?
       | 
        | I wonder what this development will produce.
        
       | Logans_Run wrote:
        | In my {semi-tongue-in-cheek} opinion - thus begins the origin
        | of Arnie's Skynet.
       | 
       | The {semi-cynical} part of my corporate soul screams 'oooh, what
       | a great way to boot-strap your own ML/AI and have marketing
       | trumpet it as 'So good that it was trained on OpenAi data and
       | Human(tm) Error Labelling!'.
       | 
       | The Futurist (Luddite???) in me shudders at the thought of two
       | very powerful computer systems (models) working to out compete
       | each other in a way that turns out to be 'rather unfortunate'
       | a.k.a 'Oh shit! We should have thought about how we (the human
       | race) can somehow be able to tell machine output vs. human
       | output'. But that is a discussion I will leave to the lawyers and
       | ethicists to thrash out a solution/definition that outputs a
       | simple binary Y/N with a Five-Nines certainty.
       | 
        | But meh - A) the above is a rather random comment, and B) time
        | will tell, and hopefully this and other similar efforts remain
        | 100% Libre, as in "free to all individuals forever and non-
        | revocable".
        
       | omalleyt wrote:
       | I bet you can utterly defeat this by adding one or two typos into
        | the text.
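        | 
        | Something as dumb as this might do it (untested against the
        | actual classifier, obviously):
        | 
        |     import random
        | 
        |     def add_typos(text, n=2):
        |         # Swap n pairs of adjacent characters to mimic
        |         # human typos; small perturbations can shift the
        |         # token statistics a classifier leans on.
        |         chars = list(text)
        |         if len(chars) < 2:
        |             return text
        |         for _ in range(n):
        |             i = random.randrange(len(chars) - 1)
        |             chars[i], chars[i + 1] = chars[i + 1], chars[i]
        |         return "".join(chars)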
        
       | Sol- wrote:
       | Given the weak accuracy - which is of course understandable given
       | the difficulty of the task - this mostly seems like a fig leaf
       | that lets them pretend to do something about the potential
       | problems of AI generated text becoming more and more pervasive.
       | 
       | Probably one shouldn't fault them for trying, but the cat is out
       | of the bag I think.
        
       | [deleted]
        
       | kiru_io wrote:
        | It would be interesting to know how this compares against
        | GPTZero [0].
       | 
       | [0] https://gptzero.me/
        
       | ulizzle wrote:
        | Does anyone actually even believe that this A.I.-generated
        | writing is any good? The standards seem extremely low.
       | 
       | Can it beat Tolkien or Asimov? No. Then what is even the point of
       | all this propaganda?
        
         | swatcoder wrote:
         | It's not good by any means, but neither is most assigned,
         | casual, or rote writing.
         | 
         | 10,000 students are writing some crappy essay on the Great
         | Depression every day, and ChatGPT has probably trained on a
         | zillion of these. It's optimized to produce those mediocre
         | essays really efficiently, and that's very disruptive to how
         | teachers have been working with students for the last century
         | or so. The internet (and fraternity filing cabinets) were
         | already straining this kind of pedagogy, but ChatGPT breaks it
         | wholesale.
        
           | gunshai wrote:
            | What I find interesting about your comment is that while
            | it can produce that mediocre essay, it can also produce a
            | much better one.
            | 
            | How? Well, it's all about how you interact with it. But,
            | as you said, the majority of use will be taking the first
            | output for a given input. What's amazing to me is learning
            | to reject the output in favor of our own vision or
            | conflicting ideas.
            | 
            | If ChatGPT helps people get past blank-page syndrome and
            | interact with their own ideas better, seeing the limits of
            | what is returned in contrast to what they think, it would
            | be an incredibly useful tool for anyone trying to learn.
        
         | ropintus wrote:
          | It writes better than I do (I'm an ESL speaker), and that's
          | reason enough for me to use it.
          | 
          | It might not be better than Tolkien, but so what? 99.99% of
          | people are also not better than Tolkien, and ChatGPT can add
          | value to the lives of these people.
        
         | sebzim4500 wrote:
         | No offence, but I am confident that you also can't write better
         | than Tolkien or Asimov.
         | 
         | Does that mean you should delete all your comments and stop
         | posting?
        
         | dqpb wrote:
         | This is such a comical point of view I can't tell if it's
         | sarcasm or a genuine question.
         | 
         | Yes, I would wager that ChatGPT can write better than at least
         | 90% of living human beings.
        
           | jfk13 wrote:
           | And yet I'd rather hear what the living human beings have to
           | say. They may write poorly, but at least they have actual
           | thoughts and ideas -- no matter how misguided or bizarre --
           | that they're trying to communicate.
        
       | urbandw311er wrote:
       | This all feels a little like OpenAI trying to get a head start on
       | plausible deniability.
       | 
        | A bit like Apple ensuring its consumer devices can't be
        | hacked, to sidestep entirely any argument about whether it
        | should or shouldn't aid the state in providing a back door.
        
       | mindcrime wrote:
       | This is just going to lead to an arms-race like with CAPTCHA.
       | Next project announcement: an AI text generator that can evade
       | the AI-text-detector... and so on.
        
       | WestCoastJustin wrote:
       | Great growth hacking idea. Feed ChatGPT into this and test if it
       | is getting detected. You'll increase usage of both products.
        
       | keepquestioning wrote:
       | [dead]
        
       | dakiol wrote:
        | Isn't this a poor business move from OpenAI? I mean, if they
        | make it possible to distinguish (100%, in the future) between
        | AI-written text and human-written text... then a big chunk of
        | OpenAI's potential customers will not use ChatGPT and similar
        | tools because "they are gonna be caught" (e.g., students,
        | writers, social media writers, etc.)
        
       | anhner wrote:
       | 1. Create AI capable of writing almost human-level text and make
       | it generally available.
       | 
       | 2. Make said AI generate text in a way that makes it possible to
       | detect that it was written by a machine.
       | 
       | 3. Create another AI that detects text written by above AI
       | 
       | <--- You are here
       | 
       | 4. Put your detector service behind a paywall
       | 
       | 5. Every time a competitor appears for your generator, change its
       | steganography so that only your detector correctly classifies it
       | 
       | 6. Profit
        
         | gunshai wrote:
          | Talk about a local maximum, yuck.
        
       | michaericalribo wrote:
       | I foresee a dystopian education outcome:
       | 
       | 1. Classifiers like this are used to flag _possible_ AI-generated
       | text
       | 
       | 2. Non-technical users (teachers) treat this like a 100%
       | certainty
       | 
       | 3. Students pay the price.
       | 
       | Especially with a true positive rate of only 26% and a false
       | positive rate of 9%, this seems next to useless.
        
         | saltysnowball wrote:
         | This is already an issue, I'm a student in college right now
         | and even technical professors are operating with full
         | confidence in systems like turnitin which try their hand at
         | plagiarism detection (with often much higher false
         | negative/false positive rates). The problem was even more
         | prevalent in high school where teachers would treat it as a
          | 100% certainty. Thus, I think that OpenAI making an at least
          | slightly better classification algorithm won't make the
          | state of affairs any worse.
        
         | Kiro wrote:
          | Funny how everyone praised GPTZero, which has even worse
          | rates, but starts being skeptical when it's OpenAI, the new
          | bad guy.
        
           | [deleted]
        
           | dns_snek wrote:
           | "Everyone" didn't. In fact, the 5 top comments in that
           | thread[1] all called it useless or pointed out serious flaws.
           | 
           | [1] https://news.ycombinator.com/item?id=34556681
        
         | janalsncm wrote:
         | I urge anyone with time to write to tech journalists explaining
         | why this is so bad. Given previous coverage of GPTZero they
         | don't seem to be asking the right questions.
        
         | tremon wrote:
         | I dare hope for a less dystopian outcome:
         | 
          | - teachers will assign fewer mind-numbing essay homework
          | assignments and focus more on oral interviews.
        
         | screye wrote:
         | Hilariously, this has already happened with music composition.
         | Especially drumming.
         | 
         | Since the advent of drum machines, a lot of younger players
         | have started playing with the sort of precision that drum
         | machines enable. eg: The complete absence of swing, and clean
         | high-tempo blasts/rides.
         | 
         | So you'd get accusations of drummers not being able to play
         | their own songs, because traditional drummers think such
         | technically complex and 'soulless' performances couldn't
         | possibly be human. Only to then be proven wrong, when it turns
         | out that younger players can in fact do it.
         | 
         | The machine conditions man.
        
         | TheRealPomax wrote:
         | So, status quo then? This is already the case for educational
         | software that's used to detect plagiarism. People get wrongly
         | flagged, and then you'll have to plead your case.
         | 
          | But the times software like this finds actual problems
          | vastly outnumber the times it doesn't, and when your choice
          | is between "passing kids/undergrads who cheat the system"
          | and "the occasional arbitration", you go with the latter.
          | Schools don't pay teachers anywhere _near_ enough to not use
          | these tools.
        
           | michaericalribo wrote:
           | Given the published true and false positive rates, it's clear
           | that the true positives do not "vastly outnumber" false
           | positives.
        
           | PeterisP wrote:
            | Currently the false positive rate is _far_ lower. E.g. if
            | I get 500-ish submissions over a school year, then a 1%
            | false positive rate would mean I'd falsely accuse 5
            | innocent students annually, which isn't acceptable at all
            | - and a 9% FP rate is _so_ high that it's not even worth
            | investigating; do you know of any grader who has the spare
            | time to begin formal proceedings/extra
            | reviews/investigation for 9% of their homework?
            | 
            | For plagiarism suspicions, at least, the verification is
            | simple and _quick_ (just take a look at the identified
            | likely source; you can get a reasonable impression in
            | minutes) - I can't even imagine what work would be
            | required to properly verify ones flagged by this
            | classifier..
        
             | TheRealPomax wrote:
             | > I can't even imagine what work would be required to
             | properly verify ones flagged by this classifier.
             | 
             | Yet.
        
               | flatline wrote:
               | At the same time the classifier is improving, the
               | generative models are improving. It's a classic arms race
               | and this equilibrium is not likely to shift much either
               | way. We are talking about models that approximate human
               | behavior with a high degree of accuracy, I think the goal
               | would be to make them indistinguishable in any meaningful
               | way.
        
           | notahacker wrote:
           | > This is already the case for educational software that's
           | used to detect plagiarism. People get wrongly flagged, and
           | then you'll have to plead your case.
           | 
            | How often is that the case though? It's been a while since
            | I've had to worry about it, but I thought plagiarism
            | detection generally worked on the principle of looking for
            | the majority of the content being literal matches with
            | existing material out there, with only a few small edits -
            | which, unlike using some "AI-ish" turns of phrase that a
            | bot wrongly flags as AI-written 9% of the time (and
            | correctly flags with a not much better success rate), is
            | pretty hard to do accidentally.
        
             | i_have_an_idea wrote:
             | A long time ago when I was a student, I would run my papers
             | through Turnitin before submitting. The tool would
              | sometimes mark my (completely original) work as high as
              | mid-20s percent similarity.
             | 
             | As a result, I have taken out quotes and citations to
             | appease it and not have to deal with the hassle.
             | 
             | I expect modern day students will resort to similar
             | measures.
        
               | notahacker wrote:
               | IIRC the marker got the same visualization that you used
               | to take out quotes and citations that highlighted that
               | the similar bits were in fact quotes and citations!
               | 
               | Maybe high school is a different matter, but I'm pretty
               | sure even the most technophobic academic knows that
               | jargon, terse definitions and the odd citation
               | overlapping with stuff other people have written is going
               | to make a similarity of at least 10% pretty much
               | inevitable, especially when the purpose of the exercise
               | is to show you understand the core material well enough
               | to cite and paraphrase and compare it, not to generate
               | novel academic insight or show you understood the field
               | so well you didn't need to refer back to the source
                | material. The people they were actually after were the
                | ones that downloaded something off essaybank, removed
                | a couple of paragraphs, rewrote the intro to match the
                | given title, and ended up with 80%+ similarity.
        
         | ren_engineer wrote:
         | >false positive rate of 9%
         | 
         | bringing the Roman decimation to the classroom based on AI,
         | this is the future
        
         | kmkemp wrote:
          | Any solution here is just an arms race. The better AIs get
          | at generating text, the more impossible the job of
          | identifying whether an AI was responsible for writing a
          | given text sample becomes.
        
           | e_i_pi_2 wrote:
            | You could even just set up a GAN to make the AI better at
            | not being detected as something written by an AI. I don't
            | see a good general solution to this, but I also see it as
            | a non-issue - if students have better tools, they should
            | be able to use them, just like a calculator on a test.
            | That's allowed on tests because you still need to
            | understand the concepts to put it to use.
        
         | tshaddox wrote:
         | It's almost as if you need to give exams in person and watch
         | the students if you don't want them to cheat. This is
         | fundamentally no different than cheating by writing notes on
         | your hand in an exam or paying someone to write a take-home
         | essay for you. It's cheaper than the latter, but that just
         | means the lazy curriculum finally needs to be updated.
        
         | dougmwne wrote:
         | The cheating students who know how to use the classifier will
         | be the big winners.
        
         | cjbgkagh wrote:
         | > false positive rate of 9%
         | 
         | Yeah, that is useless. You couldn't punish based on that alone
         | and students will quickly figure out to never confess.
        
         | sometimeshuman wrote:
          | Sorry for the tangent, but a surprising number of the
          | general public don't know the meaning of percent [1]. So
          | even if a teacher is told those percentages, many wouldn't
          | know what to conclude.
          | 
          | [1] Me, giving young adults who worked for me a commission
          | rate, then asking: if your commission rate is 15% and you
          | sell $100 of goods, what is your payment (0.15 x $100 =
          | $15)? Many failed to provide an answer.
        
         | LarryMullins wrote:
         | > _2. Non-technical users (teachers) treat this like a 100%
         | certainty_
         | 
         | This is the part that needs to be addressed the most. Teachers
         | can't offload their critical reasoning to the computer. They
         | should ask their students to write things in class and get a
         | feeling for what those individual students are capable of. Then
         | those that turn in essays written at 10x their normal writing
         | level will be obvious, without the use of any automated cheat
         | detectors.
         | 
         | I was once accused of cheating by a computer; my friend and I
         | both turned in assignments that used do-while loops, which the
         | computer thought was so statistically unlikely that we surely
         | must have worked together on the assignment. But the
          | explanation was straightforward; I had been evangelizing the
         | aesthetic virtue of do-while loops to anybody that would listen
         | to me, and my friend had been persuaded. Thankfully the
         | professor understood this once he compared the two submissions
         | himself and realized we didn't even use the do-while loop in
         | the same part of the program. There was almost no similarity
         | between the two submissions besides the statistically unlikely
         | but completely innocuous use of do-while loops. It's a good
         | thing my professor used common sense instead of blindly
         | trusting the computer.
        
           | londons_explore wrote:
           | > blindly trusting the computer.
           | 
           | Professors blindly trust the computer not out of laziness,
           | but to protect themselves from accusations of unfairness...
           | 
           | "The work was detected as plagiarism, but the professor
           | overrode it for the pretty girl in class, but not for me"
        
             | mitchdoogle wrote:
             | Seems like something like this should only be used as a
             | first-level filter. If the writing doesn't pass, it
              | warrants more investigation. If no proof of plagiarism
              | is found, then there's nothing else to do, and the
              | professor must pass the student.
        
               | TchoBeer wrote:
               | with a 26% true positive rate that seems flawed.
        
           | asah wrote:
           | seems like this is the future... 1. first day of class, write
           | a N word essay and sign a release permitting this to be used
           | to detect cheating. The essay topic is chosen at random.
           | 
           | 2. digitize & feed to learning model, which detects that YOU
           | are cheating.
           | 
           | upside: this also helps detect students who are getting help
           | (e.g. parents)
           | 
           | downside: arms race as students feed their cheat-essays
           | (memorize their essays?) into AI-detection models that are
           | similarly trained.
        
             | feanaro wrote:
             | There are also some countries that don't fetishize cheating
             | this much so perhaps they will just continue not caring.
        
             | kaibee wrote:
             | The funniest implication here is that the student's writing
             | skill isn't expected to improve.
        
               | eh9 wrote:
                | I was just asking my partner, who's a writer, if it
                | would even be fair to train a model based on a student
                | at _N_th grade if the whole point is to measure
                | growth. Would there be enough "stylistic tokens"
                | developed in a young person's writing style?
        
               | AlexAndScripts wrote:
               | Surely you could continuously add data about their latest
               | essays to the model, meaning any gradual improvements
               | would be factored in?
        
               | ask_b123 wrote:
               | Personally, I feel mildly embarrassed when reading my
               | essays from years prior. And I probably still count as a
               | 'young person'.
               | 
               | That said, there's no need to consider changes in years
               | when stylistic choices can change from one day to another
               | depending on one's mood, recent thoughts, relationship
               | with the teacher, etc.
               | 
               | That's why I've always been a little confused about how
               | some (philologists?) treat certain ancient texts as not
               | being written by some authors due to the text's style, as
               | if ancient people could not significantly deviate from
               | their usual style.
        
             | Aransentin wrote:
             | > first day of class, write a N word essay
             | 
             | Initially I thought you meant having the student write an
             | essay about slurs, as the AI will refuse to output anything
             | like that. Then I realized you meant "N" as in "Number of
             | words".
             | 
             | Still, that first idea might actually work; make the
             | students write about hotwiring cars or something that's
             | controversial enough for the AI to ban it but not
             | controversial enough that anybody will actually care.
        
             | JumpCrisscross wrote:
             | > _first day of class, write a N word essay and sign a
             | release permitting this to be used to detect cheating_
             | 
              | Why once? Most students need writing skills more than
              | they need half the high-school curriculum.
        
           | TheDudeMan wrote:
           | You are asking teachers to be good at their job. But is
           | teaching a merit-based profession?
        
           | busyant wrote:
           | I asked chatgpt to write an essay as if it were written by a
           | mediocre 10th grader. It did a reasonably good job. It threw
           | in a little bit of slang and wasn't particularly formal.
           | 
           | Edit. I sometimes tell my students "if you're going to cheat,
           | don't give yourself a perfect score, especially if you've
           | failed the first exam. It fires off alarm bells."
           | 
           | But the students who struggle usually can't calibrate a non-
           | suspicious performance.
           | 
           | I guess the same applies here.
        
             | Baeocystin wrote:
             | You've touched upon a central issue that is not often
             | addressed in these conversations. People who have
             | difficulty comprehending and composing essays also struggle
             | to work with repeated prompts in AI systems like ChatGPT to
             | reach a solution. I've found in practice that when showing
             | someone how prompting works, their understanding either
             | clicks instantly, or they fail to grasp it at all. There
             | appears to be very little in between.
        
           | geph2021 wrote:
            | > ask their students to write things in class and get a
            | > feeling for what those individual students are capable
            | > of. Then those that turn in essays written at 10x their
            | > normal writing level will be obvious
           | 
           | I think that's a flawed approach. Plenty of people simply
           | don't perform or think well under imposed time-limited
           | situations. I believe I can write close to 10x better with
           | 10x the time. To be clear, I don't mean writing more, or a
           | longer essay, given more time. Personally, the hardest part
           | of writing is distilling your thoughts down to the most
           | succinct, cogent and engaging text.
        
             | deepspace wrote:
             | > Plenty of people simply don't perform or think well under
             | imposed time-limited situations
             | 
             | From first-hand experience, the difference between poor
             | stress-related performance and a total lack of knowledge is
             | night and day.
             | 
             | I have personally witnessed students who could not speak or
             | understand the simplest English, and were unable to come up
             | with two coherent sentences in a classroom situation, but
             | turned in graduate level essays. The difference is
             | blindingly obvious.
        
               | giovannibonetti wrote:
               | > I have personally witnessed students who could not
               | speak or understand the simplest English, and were unable
               | to come up with two coherent sentences in a classroom
               | situation, but turned in graduate level essays. The
               | difference is blindingly obvious.
               | 
               | Maybe someone helped them with their homework?
        
               | remexre wrote:
               | Unless their in-class performance increases as well,
               | isn't that help "probably cheating"? (That's the "moral
               | benchmark" I'd use, at least; if your collaboration
               | resulted in you genuinely learning the material, it's
               | probably not cheating.)
        
             | runarberg wrote:
              | The point is for the teacher to get a sense of the
              | student's style and capabilities. Even if your home
              | essay is 10x better and 10x more concise than your in-
              | class work, a good teacher who knows you--unlike an
              | inference model--will be able to extrapolate and spot
              | commonalities. Also, a good teacher (who isn't
              | overworked) will talk to students and get a sense of
              | their style and capabilities that way; this allows them
              | to extrapolate even better than a computer could ever
              | hope to.
        
               | zopa wrote:
               | Sure, but what about all the students with mediocre
               | and/or overworked teachers? If our plan assumes the best-
               | case scenario, we're going to have problems.
        
               | runarberg wrote:
                | Honestly, if we can't have nice things and we keep
                | skimping on education, I'd rather we just accept the
                | fact that some students will cheat than introduce
                | another subpar technical solution to a societal
                | problem.
        
           | runarberg wrote:
           | So the computer's evaluation model assumed that each
           | student's learning is independent? That seems like a
           | ludicrous assumption to put in a model like this, unless the
           | model authors have never been in a class setting (which I
           | doubt).
        
           | munificent wrote:
           | I think you're misunderstanding the primary purpose of
           | essays.
           | 
           | Teachers don't have the time to do deep critical reasoning
           | about each student's essay. An essay is only partially an
           | evaluation tool.
           | 
           | The primary purpose of an essay is that the act of writing an
           | essay teaches the student critical reasoning and structured
           | thought. Essays would be an effective tool even if they
           | weren't graded at all. Just writing them is most of the
           | value. A big part of the reason they're graded at all is just
           | to force students to actually write them.
           | 
           | The main problem with AI generated essays isn't that teachers
           | will lose out on the ability to evaluate their students. It's
           | that students won't do the work and learn the skills they get
           | from doing the work itself.
           | 
           | It's like building a robot to do push ups for you. Not only
           | does the teacher no longer know how many push ups you can do,
           | you're no longer exercising your muscles.
        
             | YeGoblynQueenne wrote:
             | >> The primary purpose of an essay is that the act of
             | writing an essay teaches the student critical reasoning and
             | structured thought. Essays would be an effective tool even
             | if they weren't graded at all. Just writing them is most of
             | the value. A big part of the reason they're graded at all
             | is just to force students to actually write them.
             | 
             | That's our problem, I think. Education keeps failing to
             | convince students of the need to be educated.
        
               | [deleted]
        
             | thelock85 wrote:
              | For this exact reason, I feel like education systems
              | and curriculum providers (teachers are just the point
              | of contact, from a requirements perspective) should
              | develop much more complex essay prompts and invite
              | students to use AI tools in crafting their responses.
             | 
             | Then it's less about the predetermined structure (5
             | paragraphs) and limited set of acceptable reasoning
             | (whatever is on the rubric), and more about using creative
             | and critical thinking to form novel and interesting
             | perspectives.
             | 
             | I feel like this is what a lot of universities and
             | companies currently claim they want from HS and college
             | grads.
        
               | desro wrote:
               | This is what I'm doing as an instructor at some local
               | colleges. A lot of the students are completely unaware of
               | these tools, and I really want to make sure they have
               | some sense of how things are changing (inasmuch as any of
               | us can tell...)
               | 
               | So I invite them to use chatGPT or whatever they like to
               | help generate ideas, think things out, or learn more. The
               | caveat is that they have to submit their chat transcript
               | along with the final product; they have to show their
               | work.
               | 
               | I don't teach any high-stakes courses, so this won't work
               | for everyone. But educators are deluded if they think
               | anyone is served by pretending that (A) this
               | doesn't/shouldn't exist, and that (B) this and its
               | successors are going away.
               | 
               | All of this stuff is going to change so much. It _might_
               | be a bigger deal than the Internet. Time will tell.
        
         | nonrandomstring wrote:
         | A more likely outcome is that teachers will pay the price [1].
         | 
         | [1] https://www.timeshighereducation.com/opinion/ai-will-
         | replace...
         | 
          | (turn off JS to bypass the signup-wall)
        
         | ibejoeb wrote:
         | I think there is a more dystopian near future:
         | 
         | 1. There will be commercial products to tune per-student
         | writing models.
         | 
         | 2. Those models will be used to evaluate progress and
         | contribute directly to scores, grades, and rankings. They may
         | also serve to detect collaboration.
         | 
         | 3. The models persist indefinitely and will be sold to industry
         | for all sorts of purposes, like hiring.
         | 
          | 4. They will certainly be sold to the state for law
          | enforcement and identity cataloging.
        
         | e_i_pi_2 wrote:
         | I can't remember the keyword to look it up, but there's a
         | problem of statistics you run into with stuff like terrorism
         | detection algorithms
         | 
          | If we have 300M people in the US and only 1k terrorists,
          | then you need 99.9999% accuracy before you start getting
          | more true positives than false positives. If you use this in
          | a classroom where no one is actually using AI, you'll get
          | only false positives, and in a class where usage is average
          | you'll still get more false positives than true ones. That
          | makes the test do more harm than good unless it's just a
          | reason to look into things further - and the teacher is
          | presumably already reading the text, so if that doesn't help
          | then this surely won't.
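          | 
          | Rough numbers, with the base rates made up for
          | illustration:
          | 
          |     def flagged_precision(tpr, fpr, base_rate):
          |         # P(actually positive | flagged), via Bayes' rule.
          |         tp = tpr * base_rate
          |         fp = fpr * (1 - base_rate)
          |         return tp / (tp + fp)
          | 
          |     # Terrorist screening: 1k targets among 300M people,
          |     # with a detector that is 99% sensitive and has a 1%
          |     # false positive rate.
          |     flagged_precision(0.99, 0.01, 1_000 / 300_000_000)
          |     # -> ~0.0003, i.e. 99.97% of flags hit innocents
          | 
          |     # A class where, say, 10% of essays are AI-written,
          |     # using the classifier's published rates.
          |     flagged_precision(0.26, 0.09, 0.10)  # -> ~0.24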
        
           | xmddmx wrote:
            | It's the False Positive Paradox:
            | https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positi...
        
         | mitchdoogle wrote:
          | 4. Parents sue schools
          | 
          | 5. Admins eliminate all writing requirements
        
         | kilgnad wrote:
          | This isn't that dystopian. The dystopian outcome is when a
          | classifier that rates the quality of text becomes
          | indistinguishable from the AI-detection classifier, because
          | AI-generated text is beginning to be superior to human-
          | generated text.
        
         | thewataccount wrote:
         | Hopefully they just flag relevant sections. Essay/Plagiarism
         | checkers already exist, although in my experience professors
         | were reasonable.
         | 
         | For example I had a paragraph or two get flagged as being very
         | similar to another paper - but both papers were about a fairly
         | niche topic (involving therapy animals) and we had both used
         | the relevant quotes from the study conclusions from one of only
         | a few decent sources at the time - so of course they were going
         | to be very similar.
         | 
          | Given that most essays are about roughly the same set of
          | topics, and there are literally hundreds of thousands of
          | students writing these, I wonder how many variations it is
          | even possible for humans to write, as I would expect us to
          | converge on similar essays.
        
           | michaericalribo wrote:
           | Plagiarism is easier to verify, because you can directly
           | compare with the plagiarized source material
        
             | thewataccount wrote:
              | Absolutely. I think it may have to end up more as a
              | statistics thing based on behaviour. For example:
              | 
              | "Tom had a single paragraph flagged as possibly
              | generated" vs. "Every single paper Tom writes has
              | paragraphs flagged"
              | 
              | Basically we might have to move to treating statistical
              | outliers as cheating. Now whether the tools/teachers
              | will understand/actually do that - we can only hope....
        
         | amelius wrote:
         | Solution: just write your texts with a bit less confidence than
         | gpt3 would.
        
         | Verdex wrote:
         | I wonder if I should help my kids setup a server + webcam +
         | screen capture tool so they can document 100% of their essay
         | writing experience. That way if they ever get hit with a false
         | positive they can just respond with hundreds of hours of video
         | evidence that shows them as the unique author of every essay
         | they've ever written.
        
           | anotherjesse wrote:
            | You will certainly have a lot of training video to create
            | an "essay writing video generator" ML product
        
           | causalmodels wrote:
           | You could always teach them how to use git and have them
           | commit frequently. Seems like it would be less intrusive than
           | a webcam.
        
             | Verdex wrote:
              | Source control would certainly help establish a history
              | of incrementally performing school work by _someone_,
              | when viewed by a highly technical examiner, and when
              | periodically stored someplace where a trusted 3rd party
              | can confirm it wasn't all generated the night after a
              | supposed false positive.
             | 
             | However, hundreds of hours of video is compelling to non-
             | technical audiences and even more importantly is a
             | preponderance of evidence that's going to be particularly
             | damning if played in front of a PTA meeting.
             | 
             | With a git history it's going to come down to who can spin
             | the better story. The video is the story and everyone
             | recognizes it, so I expect fewer people would bother even
             | challenging its authenticity.
        
               | causalmodels wrote:
               | I guess that's fair. I just personally don't think the
               | additional gain is worth taking away your child's
               | privacy.
        
               | Verdex wrote:
               | It's only taking away their privacy if they're falsely
               | accused.
               | 
               | And properly used you might not even have to relinquish
               | privacy if falsely accused. A quick montage video demo
               | and a promise to show the full hundreds of hours of video
               | of "irrefutable" proof to embarrass the school district
               | at the next PTA meeting might be sufficient to get the
               | appropriate response.
        
           | tshaddox wrote:
           | You could still cheat quite easily and inexpensively with an
           | earpiece, as long as you know how to write down what you
           | hear.
        
             | Verdex wrote:
             | It's about building a narrative. Yeah, you could still
             | cheat, but who would go through the effort of
             | generating hundreds of hours of fake video to prove
             | themselves innocent? For that amount of effort you
             | might as well have done the work yourself.
             | 
             | Of course there are some people who put insane amounts of
             | effort into not doing "real" work. However, anyone trying
             | to prove that your child is in that position is going to
             | find themselves in an uphill battle.
             | 
             | Which is the ultimate goal here. Make people realize that
             | falsely accusing my children using dubious technology is
             | going to be a lot more work than just giving up and leaving
             | them alone.
        
         | claytonjy wrote:
         | Is there a longer-form paper on this yet? TPR (P(T|AI)) and
         | FPR (P(T|H)) are useful, but what I really want is the
         | probability that a piece flagged as AI-generated is indeed
         | AI-generated, i.e. P(AI|T). Per Bayes' rule I'm missing
         | P(AI), the portion of the challenge set that was produced
         | by AI.
         | 
         | If we assume the challenge set is evenly split 50-50, the
         | priors cancel and
         | 
         |   P(AI|T) = P(T|AI)P(AI)/P(T)
         |           = (0.26)(0.5) / ((0.26)(0.5) + (0.09)(0.5))
         |           = 0.26/0.35 ~ 74%
         | 
         | So under that prior, roughly a 3-in-4 chance that flagged
         | text is actually AI-generated - i.e. about 1 in 4 flags
         | would be a false accusation.
         | 
         | They say the web app uses a confidence threshold to keep the
         | FPR low, so maybe these numbers get a bit better, but with a
         | lower prior the posterior drops fast - very far from being
         | usable as a detector anywhere it matters.
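         | 
         | A minimal sketch of the same arithmetic (the 26%/9% rates
         | are the quoted ones; the priors are assumptions):
         | 
         |   # Posterior P(AI | flagged) via Bayes' rule.
         |   def p_ai_given_flag(p_ai, tpr=0.26, fpr=0.09):
         |       p_flag = tpr * p_ai + fpr * (1 - p_ai)
         |       return tpr * p_ai / p_flag
         | 
         |   for prior in (0.5, 0.1, 0.01):
         |       print(f"P(AI)={prior} -> "
         |             f"P(AI|flag)={p_ai_given_flag(prior):.2%}")
         | 
         |   # P(AI)=0.5  -> P(AI|flag)=74.29%
         |   # P(AI)=0.1  -> P(AI|flag)=24.30%
         |   # P(AI)=0.01 -> P(AI|flag)=2.84%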
        
           | TchoBeer wrote:
           | >Per Bayes' rule I'm missing P(AI), the portion of the
           | challenge set that was produced by AI
           | 
           | This will obviously depend on your circumstances.
        
         | adamsmith143 wrote:
         | Or we realize that essays aren't that important and technical
         | skills will become more highly valued. Either way, ChatGPT
         | can't do your exams for you so the truth will come out anyway.
        
           | mitchdoogle wrote:
           | Writing is very important for understanding a topic and long-
           | term recall. I still remember topics from papers I wrote
           | 15 years ago because I spent tens of hours researching,
           | writing, and forming ideas about each topic.
           | 
           | Instead of being overzealous about catching cheaters,
           | teachers should learn to express the importance of writing
           | and why it is done. Convince the students that they should do
           | it to be a smarter person, not just to get a grade, and they
           | will care more about doing it honestly.
        
         | flandish wrote:
         | In the same way deepfakes mean video should not be allowed
         | as evidence, thereby ensuring _no_ video is allowed... we
         | can apply that to text as well.
         | 
         | We're entering an uncanny valley before a period of
         | "reset", with self-taught (to stay on subject here) people
         | re-learning for the sake of learning.
         | 
         | In 30 years we will be in an educational renaissance of people
         | learning "like the old masters did in the 1900's."
        
           | EGreg wrote:
           | Nah. In 30 years it will be as useless to learn most
           | subjects as it is right now to learn crocheting and
           | knitting, or times tables, or using an abacus.
           | 
           | People are wayyyy too optimistic, just like in the 1900s they
           | thought people would have flying cars but not the Internet,
           | or how Star Trek's android Data is so limited and lame.
           | 
           | Bots will be doing most of the work AND have the best lines
           | to say, AND make the best arguments in court etc.
           | 
           | You don't even need to look to AI for that. The best
           | algorithms are simply uploaded to all the bots and they are
           | able to do 800 things, in superhuman ways, and have access to
           | the internet for whatever extra info they need.
           | 
           | When they swarm, they'll easily outcompete any group of
           | humans. For example they can enter this HN thread and
           | overwhelm it with arguments.
           | 
           | No, the old masters were _needed_. Studying will not be.
           | The Eloi and the Morlocks are closer to what we can
           | expect.
        
             | flandish wrote:
             | As someone who's known how to crochet and knit since he
             | was 6... I disagree.
        
             | tokai wrote:
             | Apparently knitwear is forecast to have a CAGR of 12%
             | for the rest of the decade, with hand-knitted garments
             | commanding the highest prices. It's definitely not the
             | worst cottage industry one can choose.
        
         | la64710 wrote:
         | Exactly. IMHO it is irresponsible to release such a
         | classifier under a title that touts the desired feature and
         | does not spell out its limitations. At least precede the
         | title with "experimental" or something.
        
         | anonobviously wrote:
         | This is extremely concerning.
         | 
         | The co-authors on this include Professor Scott Aaronson.
         | Reading his blog Shtetl-Optimized and reading his
         | [sad/unfortunate/debate-able/correct?/factual?/biased?] views
         | on adverse/collateral harm to Palestinians civilians makes me
         | question whether this model would fully consider collateral
         | damage and harm to innocent civilians, whomever that subgroup
         | might be. What if his model works well, except for some
         | minority groups' languages which might reflect OpenAI speak?
         | Does it matter if the model is 99.9% accurate if the 0.1% is
         | always one particular minority group that has a specific
         | dialect or phrasing style? Who monitors it? Who guards these
         | guards?
        
         | jameshart wrote:
         | We can't release the essay writing language model. Lazy
         | children will use it to write their essays for them!
         | 
         | We can't release the ai-generated text detection model. Lazy
         | teachers will use it to falsely accuse children of cheating!
         | 
         | The problem here appears to be _lazy people_.
         | 
         | Can we train an AI to detect lazy people? I promise not to
         | lazily rely on it without thinking.
        
         | jupp0r wrote:
         | This is worse than useless once you take the base rate
         | fallacy into account.
        
       | optimalsolver wrote:
       | https://en.wikipedia.org/wiki/Red_Queen's_race
        
       | dxbydt wrote:
       | There was a merchant who said - Buy my sword! It will pierce
       | through any shield !!
       | 
       | So the gullible people bought the swords and soon the merchant
       | ran out of swords to sell.
       | 
       | So the merchant said - Buy my shield! It can defend against
       | any sword !!
       | 
       | Once again the gullible people rushed to buy the shields.
       | 
       | But one curious onlooker asked - what happens when your sword
       | meets your shield?
        
       | causalmodels wrote:
       | My younger brother and I both have fairly severe dyslexia. He's
       | been applying to school and has been using ChatGPT to help him
       | correct spelling and grammar mistakes rather than going to a
       | person for help. It has been fairly incredible for him.
       | 
       | I wonder if this tool would start flagging his work even though
       | he is only using it as a fancy spell checker.
        
       | meetingthrower wrote:
       | Lol, just tried it against several 500 word chunks of text I had
       | the old GPT3 write for me and it classified them as "unlikely AI
       | written." Maybe because I had very specific prompts which could
       | include a lot of actual facts...?
        
         | barbazoo wrote:
         | > The classifier is very unreliable on short texts (below 1,000
         | characters). Even longer texts are sometimes incorrectly
         | labeled by the classifier.
        
         | neonate wrote:
         | How does https://openai-openai-detector.hf.space/ do on them?
        
           | meetingthrower wrote:
           | 99.4% real!!!
        
         | groestl wrote:
         | My new hobby, based on the responses I read from ChatGPT, is to
         | get a "likely written by AI" rating from these classifiers.
         | 
         | "However, this is just one example of a humorous summary and it
         | is important to note that..." and so on and so on
        
       | andrewmutz wrote:
       | OpenAI should release a classifier that detects _their own_
       | AI-generated text. They could do this easily by using
       | steganography to hide some information in all the text they
       | generate, and then building the classifier to look for it.
       | 
       | Sure, it's less useful than a classifier that can detect any AI
       | generated text, but it would be a nice tool for contexts where AI
       | generated text can be abused (like the classroom) in the short
       | term.
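       | 
       | As a toy illustration of the idea (purely hypothetical, not
       | how OpenAI would actually do it): a payload hidden in
       | zero-width Unicode characters survives copy-paste but not
       | re-typing.
       | 
       |   # Toy steganography: hide bits as zero-width characters.
       |   ZERO, ONE = "\u200b", "\u200c"  # zero-width space/non-joiner
       | 
       |   def embed(text: str, payload: bytes) -> str:
       |       bits = "".join(f"{b:08b}" for b in payload)
       |       return text + "".join(ONE if bit == "1" else ZERO
       |                             for bit in bits)
       | 
       |   def extract(text: str) -> bytes:
       |       bits = "".join("1" if c == ONE else "0"
       |                      for c in text if c in (ZERO, ONE))
       |       return bytes(int(bits[i:i + 8], 2)
       |                    for i in range(0, len(bits), 8))
       | 
       |   stamped = embed("An essay on the French Revolution.", b"ai")
       |   assert extract(stamped) == b"ai"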
        
         | ineedtocall wrote:
         | Or they could just save/hash results and get rid of the
         | classifier altogether.
        
           | rcme wrote:
           | Yea, they could provide a fingerprinting algorithm and a
           | database of every fingerprint they've generated. However, it
           | wouldn't help you identify false positives.
        
         | sebzim4500 wrote:
         | Scott Aaronson talks about something like that being done at
         | OpenAI in this post
         | 
         | https://scottaaronson.blog/?p=6823
        
         | m3affan wrote:
         | There is work on hidden signatures in generated text,
         | invisible to humans. It's the only way to move forward.
        
           | bnug wrote:
           | I'd think people would migrate to just re-typing whatever
           | was generated, changing some wording along the way to
           | prevent detection.
        
           | thewataccount wrote:
           | The problem with this is that the method for detecting
           | the signature would also reveal how to hide (or strip)
           | the signature, right?
           | 
           | Obviously not an issue if everyone uses a single API for
           | it - but if this ends up like Stable Diffusion, where
           | anyone can run it locally, then I don't think it's
           | possible, no?
        
       | brink wrote:
       | I miss the 90's and the early 00's. Take me away from this AI
       | hell.
        
         | [deleted]
        
         | shagie wrote:
         | Musicians Wage War Against Evil Robots -
         | https://www.smithsonianmag.com/history/musicians-wage-war-ag...
         | 
         | From the March, 1931 issue of Modern Mechanix magazine:
         | 
         | > The time is coming fast when the only living thing around a
         | motion picture house will be the person who sells you your
         | ticket. Everything else will be mechanical. Canned drama,
         | canned music, canned vaudeville. We think the public will tire
         | of mechanical music and will want the real thing. We are not
         | against scientific development of any kind, but it must not
         | come at the expense of art. We are not opposing industrial
         | progress. We are not even opposing mechanical music except
         | where it is used as a profiteering instrument for artistic
         | debasement.
        
         | sekai wrote:
         | > Take me away from this AI hell
         | 
         | People used to say that about electricity too, and cars, and
         | planes, and computers. This is just the next step in the chain.
        
           | tgv wrote:
           | So your message is: bend over?
        
             | GMoromisato wrote:
             | There are only two choices:
             | 
             | 1. Try to stop the world from changing.
             | 
             | 2. Adapt to the changes (which requires changing the
             | world). E.g., the dangers of electricity led to electrical
             | codes and licensing for electricians.
        
           | [deleted]
        
       | anshumankmr wrote:
       | What I would love to see in GPT 3 is some sort of a confidence
       | score that they could return, as in how sure their model is that
       | what it returned is accurate and not gibberish. Could this
       | classifier help with that? I am working on a requirement where we
       | are using ElasticSearch to map a query to an article in a
       | knowledge base and then the plan is to send it to GPT 3 to help
       | summarize the article.
       | 
       | Since the ElasticSearch integration is still WIP, I made a
       | POC to scrape the knowledge base (with mixed results - lots
       | of the content is poorly organized, so the scraped content
       | that would act as the prompt for the GPT 3 model wasn't all
       | that good either) and then fed it to GPT 3, but it couldn't
       | always give the most accurate answers. The answers were
       | sometimes spot on or quite good, but other times not so much;
       | I would say it made sense about 30% of the time. So I want a
       | way to tell whether an answer was sensible, so we could give
       | an error response when GPT 3's response did not make sense.
       | 
       | The reason we are doing this is that the client has a huge
       | knowledge base, and manually mapping each question to an
       | answer would be difficult for them.
        
         | ilaksh wrote:
         | OpenAI's text completion has an option to return the "log
         | probability" of each token. That might apply. You can also
         | turn down the temperature parameter, which reduces
         | hallucinations to some degree.
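         | 
         | A rough sketch with the current completions API (the
         | averaging heuristic is my own crude idea - low confidence
         | and wrongness are not the same thing, so treat it as a weak
         | signal, not a truth check):
         | 
         |   import math
         |   import openai
         | 
         |   openai.api_key = "sk-..."
         | 
         |   resp = openai.Completion.create(
         |       model="text-davinci-003",
         |       prompt="Summarize this article: ...",
         |       max_tokens=200,
         |       temperature=0,  # greedy decoding; fewer hallucinations
         |       logprobs=1,     # return per-token log probabilities
         |   )
         | 
         |   lps = resp["choices"][0]["logprobs"]["token_logprobs"]
         |   mean_p = math.exp(sum(lps) / len(lps))
         |   print(f"mean per-token probability: {mean_p:.2f}")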
        
       | antirez wrote:
       | Totally useless given how inaccurate it is, and actively
       | dangerous: with false positives, people will be accused of
       | not having produced work they actually produced:
       | 
       | https://twitter.com/antirez/status/1620494358947717120
        
       | kypro wrote:
       | The existence of this tool might actually do more damage if
       | people use it with any level of confidence to check text
       | content as important as exams. I understand why they felt the
       | need to release something, but I think it would be better if
       | this didn't exist.
       | 
       | My guess is that it's very easily gamed. Something ChatGPT is
       | very good at is producing text content in different styles,
       | so if you're a student and you run your text through an AI
       | detector, you can always ask ChatGPT to rewrite it in a style
       | which is more likely to pass detection.
       | 
       | Finally, I wouldn't be surprised if this detector is mostly
       | just detecting grammatical and spelling mistakes. It's
       | obvious I'm a human given how awful I am at writing, but I
       | wouldn't be surprised if a good writer who uses very good
       | grammar, has good sentence structure, and whose writing looks
       | a little bit too "perfect" ends up triggering the detector
       | more often.
        
       | tinglymintyfrsh wrote:
       | Meta blocked access to staff from using DALLE2 or ChatGPT from
       | their work Google accounts.
       | 
       | Another possibility is generative AI "steganography", where
       | rules exist to insert hidden meaning or hidden data.
        
         | shagie wrote:
         | > Meta blocked access to staff from using DALLE2 or ChatGPT
         | from their work Google accounts.
         | 
         | I'm trying to reason this one out.
         | 
         | Does Meta have work Google accounts? How would Meta block
         | someone from using a work account to auth to some other
         | service?
         | 
         | Are people working at Meta signing into OpenAI with a Google
         | account?
         | 
         | (seriously, if it isn't work related - don't use a work
         | account)
         | 
         | Is Meta concerned about people uploading their code (or
         | downloading code) from ChatGPT? What is their policy on
         | Copilot?
         | 
         | Why are Meta people using DALLE while at work?
        
       | rafaelero wrote:
       | A 26% true positive rate and a 9% false positive rate is just
       | terrible. I don't see how this can be usable.
        
         | yboris wrote:
         | Quote:
         | 
         | > In our evaluations on a "challenge set" of English texts
         | 
         | I wonder if they mean "challenge" in the sense that these are
         | some of the hardest-to-discern passages. Meaning that with
         | average human writing / average type of text, the % is better.
         | I'm unsure.
        
       | [deleted]
        
       | PUSH_AX wrote:
       | Might be better to store outputs and implement a way to
       | detect them within a larger piece of text. Think a reverse
       | Shazam, but for text.
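       | 
       | A minimal sketch of that idea (hypothetical - it assumes the
       | provider keeps a store of hashed word n-grams from everything
       | it has ever generated):
       | 
       |   import hashlib
       | 
       |   N = 8  # shingle length in words; longer = fewer matches
       | 
       |   def shingles(text):
       |       words = text.lower().split()
       |       for i in range(len(words) - N + 1):
       |           gram = " ".join(words[i:i + N]).encode()
       |           yield hashlib.sha256(gram).hexdigest()
       | 
       |   index = set()  # provider side: index every output
       | 
       |   def record_output(generated):
       |       index.update(shingles(generated))
       | 
       |   def overlap(suspect):
       |       # fraction of the suspect text's shingles seen before
       |       hashes = list(shingles(suspect))
       |       return (sum(h in index for h in hashes) / len(hashes)
       |               if hashes else 0.0)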
        
       | blueberrychpstx wrote:
       | Doesn't this get us into a sort of perpetual motion machine with
       | the back and forth being
       | 
       | 1) generate a paragraph of my essay
       | 2) feed it into this classifier
       | 3a) if AI -> make it sound more human
       | 3b) if human -> $$$ profit?
       | 
       | Obviously it could be more fine-tuned than this, and it's
       | good to know in general, but I just love watching this game
       | play out of... err, how do we manage the fact that humans are
       | relatively less and less creative compared to their
       | counterparts?
        
         | dakiol wrote:
         | The thing is, step 1 costs money (I imagine at some point
         | ChatGPT will cost money), but step 2 will also cost money.
         | So OpenAI would charge you twice: once to generate AI-
         | written text and again to check that it's undetectable.
         | Poor move. I would happily pay a lot for ChatGPT, but if
         | they also commercialize a (more accurate) classifier then I
         | won't use ChatGPT at all.
        
       | bluefone wrote:
       | What's the objective of tests in education? To test human
       | biological abilities of memorizing and analyzing? In real
       | life, these abilities are always augmented by tech. It is
       | like testing your prospective employees on their running
       | skills, though they don't really need running to commute.
        
       | karaterobot wrote:
       | Given a set of determinations about whether a given source text
       | was written by an AI or not, how do we know whether those
       | classifications were made by an AI or a human? We need to train
       | an AI classifier classifier, pronto!
        
       | happytiger wrote:
       | 9% false positives? That's a troubling level of falsies.
       | 
       | The implications of using this tool are fun to think about
       | though.
       | 
       | If it had a very low level of false positives, even if it
       | wasn't very good at identifying AI text, it would be very
       | useful.
       | 
       | But false positive rates above very, very low levels will
       | undermine any tool in this category.
        
         | klabb3 wrote:
         | Yeah, it's useless currently and will quickly become more
         | useless, because people will scramble AI-generated text,
         | mix in human edits, and people who use AI generators a lot
         | will mimic their writing style. In short, the SNR will be
         | abysmal outside of controlled environments.
         | 
         | I'm pretty sure the smart people at OpenAI know this. I
         | think this is a PR move signaling that they are "doing
         | something", looking concerned, yet insisting that
         | everything is under control. In reality, nobody can predict
         | the societal rift that this will cause, so this corporate-
         | playbook messaging is dishonest in spirit and muddies the
         | waters. That is bad, both long term for OpenAI's trust and
         | because muddy waters make it harder to have fruitful
         | discussions about safeguards in commercial deployments of
         | this tech.
         | 
         | That said, they're incorrectly getting blamed for
         | controlling the _use_ of this tech; they're no more than a
         | prolific and representative champion of it. The cat is out
         | of the bag, and they absolutely cannot stop this train, so
         | they shouldn't be blamed for not trying.
        
       | IncRnd wrote:
       | I think the main takeaway from this is that both the AI
       | classifier and the AI output come from the same company,
       | openai.com.
       | 
       | That likely explains the extremely low accuracy of the AI
       | classifier. This is user training for ChatGPT output being
       | accepted as human-authored text.
        
       | maxehmookau wrote:
       | 26% seems awfully low for a tool of this importance. Granted,
       | they are upfront about it, but it still doesn't seem
       | immediately useful to release it to the public.
        
       | nothrowaways wrote:
       | SHA is all you need, lol.
       | 
       | SHA every generated text and put it in a Bloom filter... I
       | guarantee much better than 26%.
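       | 
       | A toy version of that (SHA-256 of a normalized output into a
       | Bloom filter; as the replies point out, any small edit
       | changes the hash and defeats it):
       | 
       |   import hashlib
       | 
       |   SIZE, HASHES = 1 << 24, 4  # ~16M bits, 4 probes per entry
       |   bits = bytearray(SIZE // 8)
       | 
       |   def _probes(text):
       |       norm = " ".join(text.lower().split())  # normalize
       |       for i in range(HASHES):
       |           h = hashlib.sha256(f"{i}:{norm}".encode()).digest()
       |           yield int.from_bytes(h[:8], "big") % SIZE
       | 
       |   def add(text):
       |       for p in _probes(text):
       |           bits[p // 8] |= 1 << (p % 8)
       | 
       |   def maybe_generated(text):
       |       return all(bits[p // 8] & (1 << (p % 8))
       |                  for p in _probes(text))
       | 
       |   add("Some exact model output.")
       |   assert maybe_generated("some exact  MODEL output.")
       |   assert not maybe_generated("Some edited model output.")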
        
         | klabb3 wrote:
         | A cryptographic hash function on stochastically generated
         | variable-length natural language? That sounds... not very
         | effective.
        
         | TDiblik wrote:
         | I mean, it depends: is the number of possible answers per
         | prompt known? If so, can we even realistically calculate
         | the number of possible prompts? AFAIK ChatGPT answers even
         | though there is a grammatical mistake in your sentence -
         | does that affect the answers / is that considered a new
         | prompt? OK, let's say you feed all N possible answers in
         | and make a rainbow table of hashes. SHA output is basically
         | random (not really, but let's not go there), so after I
         | generate my text using AI (which would get flagged by your
         | detection system) and change a few letters/words here and
         | there, your whole SHA rainbow table becomes useless -
         | right? I could be totally wrong, but I don't see SHA as a
         | way to solve this problem, because of these complications
         | :/
        
       | oldstrangers wrote:
       | Funny - I helped ChatGPT write a fake scientific article the
       | other day for a project I made (https://solipsismwow.com/).
       | Its result: The classifier considers the text to be unlikely
       | AI-generated.
        
       | knaik94 wrote:
       | I fed it some of my old HN comments, comfortably longer than
       | the 1,000-character minimum, and found that 9/10 times the
       | classifier marked them as "unclear". One was a comment from
       | 2020.
       | 
       | I found that just repeating a sentence a few times causes it
       | to classify something as "likely". Not only is this an
       | unwinnable race, I know for a fact some teachers will take
       | anything above "unlikely" to mean the student used AI. At
       | some point in the future, compute will be cheap enough that a
       | lot of online content will be put through a similar
       | classifier. I am curious how conservative the estimates are.
       | These were non-technical comments; I wonder if a more
       | technical comment would be even more likely to be
       | misclassified.
        
       | bhouston wrote:
       | I used ChatGPT to rewrite a number of paragraphs of my own
       | writing earlier today. It rewrote them completely. I just
       | pasted those into this detection tool and for both it
       | responded "The classifier considers the text to be unlikely
       | AI-generated."
       | 
       | So it seems it cannot detect AI-rewritten/augmented text,
       | even text that ChatGPT itself generates.
        
         | mitchdoogle wrote:
         | Well OpenAI admits it is wrong most of the time, so your
         | results are consistent with what is expected
        
       | m3kw9 wrote:
       | I mean, the way they could do it is to save all model
       | outputs, then let users paste in text and match against them;
       | that would guarantee detection. You could make changes, but
       | it would still match a high percentage. Of course, a student
       | can test it himself and keep changing the text until it falls
       | below a certain threshold.
       | 
       | Also, ignore prompts that purposefully output pre-written
       | text, in case people want to mess with the system.
        
       | netsec_burn wrote:
       | Couldn't it just use conversation history, where it already
       | stores the responses, and search within that?
        
       | hkalbasi wrote:
       | OpenAI knows every text that is generated by ChatGPT, so
       | couldn't it run a simple search algorithm instead of an AI
       | model and achieve a way higher true positive rate?
        
       | cloudking wrote:
       | Let's assume this tool works better in the future and is used
       | in education: what are the next steps for a teacher after
       | identifying AI-written homework?
        
         | macksd wrote:
         | _Allegedly_ identifying AI written homework.
        
         | antihero wrote:
         | Use AI to write a letter to their parents
        
         | amelius wrote:
         | Next step is to stop worrying about it, just as they did with
         | automated spelling correction.
        
       | jupp0r wrote:
       | I can now go and incorporate this detector into the retraining
       | pipeline for my evil language model or put it at the end of my
       | architecture to emit only human-like results (as labeled by the
       | detector). I don't see how detectors can win this cat and mouse
       | game.
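       | 
       | The crudest version of the second idea, as a sketch (the
       | generate/detect functions are stand-ins for a real model and
       | a real detector):
       | 
       |   import random
       | 
       |   def generate(prompt):      # stand-in for an LLM call
       |       return prompt + " ... " + str(random.random())
       | 
       |   def detector_score(text):  # stand-in: returns P(AI-written)
       |       return random.random()
       | 
       |   def humanlike(prompt, threshold=0.5, tries=20):
       |       # rejection sampling: regenerate until the detector
       |       # stops flagging the output (or we give up)
       |       for _ in range(tries):
       |           text = generate(prompt)
       |           if detector_score(text) < threshold:
       |               return text
       |       return text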
        
       | bioemerl wrote:
       | Now they get to monetize ChatGPT and this new classifier.
       | Starting fires and providing the extinguishers, charging for both
       | of them.
       | 
       | All while pretending to be morally responsible in order to do it.
        
         | sharemywin wrote:
         | They did say big tech was starting to take over the role of
         | government.
        
           | titaniumtown wrote:
           | What does this have to do with the government?
        
             | urbandw311er wrote:
             | I can see the point the parent comment is trying to make.
             | The applications of this classifier include potentially
             | arbitrating in decisions relating to things like education
             | (ie assessment of grades) which is a matter traditionally
             | associated with the public sector.
        
         | [deleted]
        
         | dakiol wrote:
          | No way. If I were a student trying to use ChatGPT to
          | improve my writing, I would definitely not pay for it if
          | I knew my teachers were using the AI classifier. I mean,
          | what's the point? I don't think OpenAI will be able to
          | reach that (big) chunk of potential customers who want to
          | use ChatGPT to write essays, social media comments, etc.
          | if OpenAI at the same time sells their classifier. It's
          | just nuts.
        
           | Balgair wrote:
            | If you're in a STEM-y major, now is _the_ time to pick
            | up an essay-heavy humanities degree. If you're in an
            | essay-heavy humanities degree, now is the time to pick
            | up a few more.
           | 
           | Think of it like this: How much is your degree costing
           | you/your family?
           | 
           | On average, it's ~$150k.
           | 
            | How much would an extra degree cost you? How about 80%
            | of an extra degree? How about 20%? How about all those
            | books and course materials? Those are in the $1000s
            | already, per degree. (And yes, we've all heard of
            | torrents.)
           | 
            | What I'm saying is that chatGPT can easily be seen as
            | 'just another college cost'. And when it's 'for
            | education', the justification for those costs gets a lot
            | more flexible. I can see students shelling out ~$10,000
            | for something like chatGPT that is specific to their
            | major, will pass these classifiers, and gets them just
            | ~25% of the way to their major (however that is
            | defined). The cost 'for the masses' could easily be in
            | the ~$1000s for a per-class subscription.
           | 
           | With ~20M college students in the US, assuming even a 10%
           | uptake rate, you're in the _billions_ of dollars of nearly
           | pure profit (the overhead would be negligible).
           | 
            | The money potential of something like chatGPT is just
            | too damn high - too high for essays to ever go out of
            | style, as the lobbying effect of companies like this
            | will push colleges to keep the essays they are making
            | money off of. Oh, and they'll sell the classifier to the
            | colleges too. Arming both sides!
        
           | eric-hu wrote:
           | You make a well reasoned argument here. At the same time,
           | respectfully, you may be too intelligent to be the target
           | audience for the student service. Can you see a college
           | version of yourself paying $500 to write a college essay for
           | yourself today?
        
       | caxco93 wrote:
       | Could a newer language model use this to penalize output that
       | fails the classifier during training?
        
       | twayt wrote:
       | It's a great way to swindle overzealous educators - the kind
       | that do hugely unproductive things to students because they
       | think it's in their best interests.
        
       | gillesjacobs wrote:
       | I found a great way to fool these detectors: piping output
       | between generative models.
       | 
       | 1. Generate text by prompting ChatGPT.
       | 
       | 2. Rewrite / copyedit with Wordtune [1], InstaText [2] or
       | Jasper [3].
       | 
       | This fools GPTZero [4] consistently.
       | 
       | Of course, soon these emotive, genre or communication-style
       | specialisations will be promptable by a single model too.
       | Detectors will be integrated as adversarial agents in
       | training. There is no stopping generative text tooling;
       | better to adopt it and integrate it fully into education and
       | work.
       | 
       | 1. https://www.wordtune.com/
       | 
       | 2. https://instatext.io/
       | 
       | 3. https://www.jasper.ai/
       | 
       | 4. https://gptzero.me/
        
       | kriro wrote:
       | I'd rather try to empower students to use ChatGPT as a tool or
       | incorporate it into class work than worry about cheating. This is
       | a pretty unique time for teachers to step up and give their
       | students a nice edge in life by teaching them how to become early
       | adopters for these kinds of things.
        
         | discreteevent wrote:
         | The purpose of writing an essay is to teach students how to
         | think. Being able to prompt is a subset of being able to think.
          | If you only teach them to prompt, you have taken away any
          | edge they might have had. It's like those schools that
          | think getting more iPads will make the kids smarter.
        
       | janalsncm wrote:
       | I posted yesterday about how GPTZero was such a horrible idea,
       | and now this nightmare. Detecting AI generated text is
       | _impossible_ without knowing what model was used. It could be
       | more feasible for them to detect text written by their own models
       | given that they know the logits. But OpenAI doesn't have a
       | monopoly on large language models.
       | 
       | However, the consequences for false positives are so dire that I
       | would never want to create such a tool. It will be misused, and
       | hiding behind "it's just information" is no excuse. You don't
       | admit testimony unless it's reliable.
       | 
       | At one time I thought that OpenAI was out to make the world a
       | better place. But now it's clear to me that ethics is the last
       | thing on their mind.
        
       | brap wrote:
       | Wouldn't better classifiers (discriminators) necessarily lead to
       | better generators that can trick them?
        
       | alphabetting wrote:
       | Tools like this are promising and needed, but this one still
       | needs work. I gave it two sets of 100% AI-generated text. It
       | said "possibly" for one and "very unlikely" for the other.
       | Very unlikely example here: https://i.imgur.com/XoFQuYE.png
       | https://i.imgur.com/PwGzTBM.png
        
       | dqpb wrote:
       | The LLM watermark seems like a better approach.
       | 
       | https://arxiv.org/abs/2301.10226
        
         | mlsu wrote:
         | This one is very cool. The steps are:
         | 
         | - Seed a PRNG with the previous output token t0
         | 
         | - Use the seed to partition the vocabulary into "red" and
         | "green" lists
         | 
         | - For the next token t1, sample only from the "green" list
         | 
         | Repeat.
         | 
         | Now, let's say you read a comment online and you want to
         | see if it was written by a robot or not. It's 20 tokens
         | long. For each token, you reconstruct the red/green
         | partition. If the text uses "red" words about 50% of the
         | time, you can safely assume a human wrote it. But if it
         | uses only green words, you can very quickly begin to assume
         | it's a bot.
         | 
         | For simplicity's sake, if you mark half of the vocabulary
         | as "red" at each step, correctly writing 20 tokens in a row
         | that are on the "green" list is like flipping a coin and
         | getting heads 20 times in a row -- vanishingly unlikely.
         | This allows you to very robustly watermark even short
         | passages. And if a human makes adversarial edits, they
         | still have to fight that probability distribution; 19 heads
         | and 1 tails is still vanishingly unlikely.
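         | 
         | A rough sketch of the detection side (following the paper's
         | idea, not its code; the hash-based partition here is my own
         | stand-in):
         | 
         |   import hashlib
         |   from math import comb
         | 
         |   def is_green(prev_token, token):
         |       # seed the red/green split with the previous token;
         |       # ~half the vocabulary lands in each list
         |       h = hashlib.sha256(
         |           f"{prev_token}|{token}".encode()).digest()
         |       return h[0] % 2 == 0
         | 
         |   def p_value(tokens):
         |       # chance a human (fair-coin null) hits at least this
         |       # many green tokens: a binomial tail
         |       n = len(tokens) - 1
         |       greens = sum(is_green(a, b)
         |                    for a, b in zip(tokens, tokens[1:]))
         |       return sum(comb(n, k)
         |                  for k in range(greens, n + 1)) / 2 ** n
         | 
         |   # 20 greens out of 20 pairs -> p ~ (1/2)**20: a bot.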
        
       ___________________________________________________________________
       (page generated 2023-01-31 23:00 UTC)