[HN Gopher] New AI classifier for indicating AI-written text ___________________________________________________________________ New AI classifier for indicating AI-written text Author : davidbarker Score : 236 points Date : 2023-01-31 18:11 UTC (4 hours ago) (HTM) web link (openai.com) (TXT) w3m dump (openai.com) | peter303 wrote: | OpenAI archives every request and output text. Why not compare | suspected A.I. text against this? | nothrowaways wrote: | Is this the new antivirus? | minimaxir wrote: | > Our classifier is not fully reliable. In our evaluations on a | "challenge set" of English texts, our classifier correctly | identifies 26% of AI-written text (true positives) as "likely AI- | written," while incorrectly labeling human-written text as AI- | written 9% of the time (false positives). | | That is an interesting mathematical description of "not fully | reliable". | thatcherthorn wrote: | Similar to deep fakes, it seems that creating tools to | distinguish between human and AI-generated data will cause more | harm than good. Models to distinguish will never be perfect, | and an actor that can fool such a model will be very effective. | LanceJones wrote: | What about introducing a new code of ethics that students sign? | They agree to disclose the level of help (1 - 10) provided by GPT | and the teacher/instructor/prof grades accordingly. Silly? | Magi604 wrote: | Just filter your text through Quillbot to get around "AI | Detection". | | https://quillbot.com/ | | Demonstration: https://youtu.be/gp64fukhBaU?t=197 | | The arms race continues... | GOONIMMUNE wrote: | This seems like a sort of unwinnable arms race. Can't the people | who work on generative text models use this classifier as a | feedback mechanism so that their output doesn't get flagged by | it? I'm not an AI expert, but I believe this is essentially the | core mechanism behind Generative Adversarial Networks. | londons_explore wrote: | Detectors can be a black box "pay $5 per detection" type | service. 
| | That way, you can't fire thousands of texts at it to retrain | your generative net. | | Plagiarism detectors in schools and universities work the same | way. In fact, some plagiarism detection companies now offer the | same software to students, allowing them to pay some money to | pre-scan their classwork to see if it will be detected... | Buttons840 wrote: | Make a model to detect cheating. Market it as "a custom-built | and unique model to detect cheating; able to catch cheating | that other models miss!" It's all 100% true. Market and | profit. | telotortium wrote: | $5 is way too high a price to use regularly. In any case, | if it's only available to educational institutions, teachers | and grad students are poor enough to sell access to it to | people on the dark web for the right price. | mritchie712 wrote: | There's also always going to be more capital going towards | building better generators than better detectors. | standardly wrote: | IMO we need a mass-adopted digital signature solution using | biometric identifiers such that publishing an article, or even a | comment, can be signed by and only by a biological human. | khyryk wrote: | Any ideas on how the "only by" would work? I'm not seeing a way | around pasting generated text and signing it as one's own work. | Proof-of-work solutions would have to have a high cost for | anyone to care; otherwise there would be bots "proving" the | work of writing an essay by generating it in phases, like a | human would edit drafts. | standardly wrote: | This is really the first time I've thought about this, so | uh... No, no ideas. | | One's identity would need to be verified concurrent with the | creation of a text. I am not really satisfied with the idea of | a specialized word processor or input device that does | biometric validation; I'd rather have a specific, | standardized protocol. I wonder if this is already deemed | impossible, or if someone is working on the problem. 
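The "signed by and only by a biological human" idea above splits into two parts: binding a signature to a text (the easy, solved half) and binding the key to a live human (the hard, open half). The easy half can be sketched in a few lines; HMAC-SHA256 here is a stand-in for a real asymmetric scheme (e.g. Ed25519, so verifiers never hold the secret), and the key is an invented placeholder for whatever an identity provider would issue:

```python
# Sketch of the "sign a text" half of the idea. HMAC-SHA256 is a
# stand-in: a deployed protocol would use an asymmetric scheme so
# that verifiers never hold the secret. Note this proves possession
# of a key, not that a human composed the text.
import hmac
import hashlib

def sign_text(secret_key: bytes, text: str) -> str:
    """Return a hex MAC binding `text` to the holder of `secret_key`."""
    return hmac.new(secret_key, text.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_text(secret_key: bytes, text: str, signature: str) -> bool:
    """Constant-time check that `signature` matches `text`."""
    return hmac.compare_digest(sign_text(secret_key, text), signature)

# Hypothetical per-person key issued by some identity provider.
key = b"per-person-secret-from-identity-provider"
sig = sign_text(key, "An essay I wrote myself.")
print(verify_text(key, "An essay I wrote myself.", sig))  # True
print(verify_text(key, "A tampered essay.", sig))         # False
```

As the replies below note, the signature alone can't stop someone from signing pasted AI output; that gap is exactly where the unsolved "only by" part lives.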
| Someone1234 wrote: | You want to ban privacy? | standardly wrote: | Proving you are a human and not a computer doesn't have to | publicly reveal a single thing about you (other than the | crypto signature). Think of it like an SSL cert for a person | rather than a server. I'm purely spitballing man. It's a | problem someone will eventually have to come up with a | solution for and I think we already have the tools. | GMoromisato wrote: | The problem is that one can't be trusted to sign their own | work--otherwise they could sign AI-generated text. This | only works if a trusted human signs your work after | watching you generate it. | standardly wrote: | Easy. We just need an AI that serves as a public notary. | Wait a minute.. | | It really is an interesting problem to think about. The | other commenter pointed out you could just sign an AI | text - I see all the issues, but my gut feeling tells me | there is an elegant solution somewhere. | GMoromisato wrote: | 1. This is an arms race. You can build a generative AI that | avoids generating text caught by the classifier. | | 2. Maybe teachers will assign rare or even fictional topics that | cannot be found in the AI training corpus. Maybe a teacher could | use an AI to generate essay prompts that are hard for other AIs | to write essays for. | | 3. Is this a problem long term? If an AI can generate an essay | that's indistinguishable from a human-generated one, then why do | we need to learn how to write essays? Maybe we should just learn | how to write good prompts. | | See also: "Should calculators be banned in school?", "Do students | need to learn cursive?", "Why should I learn Greek instead of | just reading a translation of Homer?" | w_for_wumbo wrote: | So for me, I like to write my own ideas but use AI to reword them | to be succinct and readable. I'm worried that usage would flag as | AI text. 
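Point 1 above (a generator tuned to avoid the classifier) doesn't even require retraining: plain rejection sampling against the detector would do. Both functions below are hypothetical stand-ins, not real APIs; a real attempt would call an LLM and the classifier's endpoint:

```python
# Sketch of "tune output against the classifier" via rejection
# sampling: keep regenerating until the detector scores the text as
# likely human. Both functions are invented stand-ins.
import random

def generate_candidate(prompt: str) -> str:
    """Stand-in generator: pretend each call yields a fresh essay."""
    return f"essay #{random.randrange(10**6)} about {prompt}"

def detector_score(text: str) -> float:
    """Stand-in detector: pretend P(AI-written), uniform in [0, 1)."""
    return random.random()

def evade(prompt: str, threshold: float = 0.2, max_tries: int = 1000) -> str:
    """Return the first candidate the detector scores below `threshold`."""
    for _ in range(max_tries):
        text = generate_candidate(prompt)
        if detector_score(text) < threshold:
            return text
    raise RuntimeError("no candidate fooled the detector")

random.seed(0)
print(evade("the Great Depression"))
```

A per-query fee or rate limit on the detector is the obvious countermeasure to this loop, which is what the black-box pricing suggested elsewhere in the thread amounts to.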
| BulgarianIdiot wrote: | I wrote some text about the subjectivity of communication and the | nature of natural language, and I kept it very neutral, formal | and verbose. And it said "this text is likely AI". | | So, as was honestly predictable, people who rely on this tool | being accurate will inflict a lot of pain on unsuspecting | individuals who simply write like GPT writes. | gzer0 wrote: | How good is this really? | | I input an article that was written directly by ChatGPT, and it | came back as "The classifier considers the text to be unclear if | it is AI-generated." This article was not edited, not put through | any paraphrasers, or anything. Interesting. | | Furthermore, these efforts are quite futile. One can just go to | one of numerous paraphrasers such as quillbot.com, run it through | there, and then, for added obfuscation, run it through an | entirely different paraphraser (Microsoft Word now has this | capability natively, in the beta channels at least, btw). | | Yeah, for someone who has intentions of bypassing this, there | will always be a way. It's a good effort, for sure. But I don't | see this doing much in terms of truly distinguishing AI vs. | non-AI generated outputs. | moneywoes wrote: | 26% good | [deleted] | cjrd wrote: | This is all predicated on existing conditions, where AI-written | text hasn't influenced the way that humans write. As the years | pass and these tools become a common way to at least "spot check" | your own writing, I imagine that we will all begin to write in | styles that are increasingly similar to AI-written text. | felipelalli wrote: | The irony here is that this tool can be used by the AI in the | future for self-training, to become more and more like a human. | antiterra wrote: | Heck, you can use it as a manual adversarial output filter as | it is right now. | O__________O wrote: | Related option that has benchmarks and a research paper; it | appears they intend to release code & datasets too. 
| | DetectGPT: Zero-Shot Machine-Generated Text Detection | | - https://news.ycombinator.com/item?id=34557189 | ilaksh wrote: | It can't possibly work reliably. It's going to be very | challenging for honest kids because almost everyone is going to | be cheating. | | The reality is that learning to think and write will be harder | because of the ubiquity of text generation AI. This may be the | last generation of kids where most are good at doing it on their | own. | | On the other hand, at least a few will be able to use this as an | instant feedback mechanism or personal tutor, so the potential | for some carefully supervised students to learn faster is there. | | And it should increase the quality of writing overall if people | start taking advantage of these tools. It's going to fairly | quickly become somewhat like using a calculator. | | Actually it probably means that informal text will really stand | out more. | | I am giving it the ability to do simple tasks given commands like | !!create filename file content etc. | | It's actually now very important for kids to adapt quickly and | learn how to take advantage of these tools if they are going to | be able to find jobs or just adapt in general even if they don't | have jobs. It actually is starting to look like everyone is | either an entrepreneur or just unemployed. | | Learning about all the ways to use these tools and the ones | coming up in the next few years could be quite critical for | children's education. | | There are always going to be luddites of course. But it's looking | like ChatGPT etc. are going to be the least of our problems. It | is not hard to imagine that within twenty years or so, anyone | without a high bandwidth connection to an advanced AI will be | essentially irrelevant because their effective IQ will be less | than half of those who are plugged in. 
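The `!!create filename file content` command format mentioned above implies a small parser on the receiving end. A minimal sketch (only the `!!create ...` shape comes from the comment; the parsing rules here are invented for illustration):

```python
# Toy dispatcher for commands embedded in model output, e.g.
# "!!create notes.txt hello world". The parsing rules are invented.
def parse_command(line: str):
    """Split '!!verb arg payload...' into (verb, arg, payload)."""
    if not line.startswith("!!"):
        return None  # ordinary text, not a command
    parts = line[2:].split(maxsplit=2)
    if not parts:
        return None  # bare "!!" with no verb
    verb = parts[0]
    arg = parts[1] if len(parts) > 1 else ""
    payload = parts[2] if len(parts) > 2 else ""
    return verb, arg, payload

print(parse_command("!!create notes.txt hello world"))
# ('create', 'notes.txt', 'hello world')
```

`maxsplit=2` keeps everything after the filename as a single payload string, so file content containing spaces survives intact.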
| logifail wrote: | > it should increase the quality of writing overall if people | start taking advantage of these tools | | Perversely, it might also dramatically decrease reading, if | there's no incentive for anyone to need to properly understand | anything. | | A pretty dire scenario :( | LanceJones wrote: | Feels like the battle between computer virus creators and anti- | virus software all over again. | ipnon wrote: | ChatGPT is already quite effective at deceiving these models with | simple prompts like, "write the output in a way that seems human | and not AI generated, so as to bypass AI-written text detectors." | gunshai wrote: | Or educators could be forced to evolve around a new tool that | _gasp_ requires a different measurement of skill, one that is | much harder to fake. | | The obvious one that already exists ... ORAL EXAMS. | botplaysdice wrote: | Is this the new Turing test? Who can verify the classifier | itself? | siliconc0w wrote: | Horrible idea; you can't eliminate the false positives, and these | are going to impact innocent students or be used to reinforce | teacher biases. | lumost wrote: | I don't see why teachers don't use this as an opportunity to | accelerate the curriculum. Every student now has a cheap personal | instructor. Why not raise the bar on difficulty and quality | expectations for assignments? | odipar wrote: | As always, it is the journey that matters (writing), not the | outcome (the essay). | | For example, students could record their writing of an essay with | a keylogger or something. | | Additionally - with the use of some advanced zero-knowledge algos | or crypto timestamp provenance - it should be possible to prove | that they have written the essay, without revealing their | recording. | ekanes wrote: | Yes, sort of, though if the economic incentive was high enough, | someone could connect the AI to input through the keyboard and | "type" out the essay. At scale it would be cheap. 
You could | record video of yourself typing, which would work for some time, | until at some point video fakes get advanced enough... sigh. | barbazoo wrote: | > Our classifier is not fully reliable. In our evaluations on a | "challenge set" of English texts, our classifier correctly | identifies 26% of AI-written text (true positives) as "likely AI- | written," while incorrectly labeling human-written text as AI- | written 9% of the time (false positives). | | > The classifier is very unreliable on short texts (below 1,000 | characters). Even longer texts are sometimes incorrectly labeled | by the classifier. | beefman wrote: | We've lined up a fabulous type of gorilla... | animanoir wrote: | What's the point of launching this when they admit it doesn't | work most of the time and adds to the confusion? We should just | embrace the AI Chaos. | dukeofdoom wrote: | I've used ChatGPT to generate some code for me, and almost every | time it was a learning experience. I saved a lot of time | searching, and it just gave me what I was after. Observing how | someone or something like AI can solve a problem is a fast way to | learn. I don't see a problem with this. Teachers can always just | use in-person tests to check if a student mastered the concepts. | Math teachers got over students using calculators for homework, | and can check understanding just fine on tests. It used to be | that students would solve homework problems by candlelight, with | an abacus and lookup tables. Yet no one wants to mandate a return | to that, just because it made homework harder. | gibsonf1 wrote: | Wow, that is truly not a good classifier with success that low. | gzer0 wrote: | On a side note: | | My online MBA has switched from TurnItIn to this website: | https://unicheck.com | | And the benefit of this is... incredible. It allows students to | purchase X amount of pages to check for plagiarism. Full reports | on your work beforehand. 
| | Not sure why this move was made, but it will be interesting to | see once they integrate "possible AI detection" into UniCheck. | e_i_pi_2 wrote: | TurnItIn also lets students submit beforehand, last I heard - | if people are going to use tools like this, then students should | also get full access to make sure they won't get flagged ahead | of time. | | I had some professors where you could fully grade your own | assignments before submitting, and those were the best courses | I've ever taken - you're fully given all the knowledge to figure | out what you know and what you don't. | nerdponx wrote: | Do we have GANs for text yet? | supernova87a wrote: | I was thinking that there have been swings in what is valued (or | trusted) in education and testing (or voting, promoting) to prove | that someone has the goods. | | At one time it was live oration skill, and then people thought, | "maybe that disfavors people who are introverted or whose talent | comes from thinking and writing". | | Then, at another time, it was thought, "well you have to test, | because sometimes time pressure and not being able to go away and | think about something for as long as you have time to work on it | produces something valuable". | | Yet another time: "let people who don't test well do homework to | prove their value through effort" - but now who knows whether | they actually were the ones doing the work? | | I wonder what this development will produce? | Logans_Run wrote: | In my {semi-tongue-in-cheek} opinion - Thus begins the origin of | Arnie's Skynet. | | The {semi-cynical} part of my corporate soul screams 'oooh, what | a great way to bootstrap your own ML/AI and have marketing | trumpet it as 'So good that it was trained on OpenAI data and | Human(tm) Error Labelling!'. | | The Futurist (Luddite???) in me shudders at the thought of two | very powerful computer systems (models) working to out-compete | each other in a way that turns out to be 'rather unfortunate', | a.k.a. 'Oh shit! 
We should have thought about how we (the human | race) can somehow tell machine output from human output'. But | that is a discussion I will leave to the lawyers and ethicists to | thrash out a solution/definition that outputs a simple binary Y/N | with Five-Nines certainty. | | But Meh - A) The above is a rather random comment and B) time | will tell, and hopefully this and other similar efforts remain | 100% Libre, as in 'free to all individuals forever and non- | revocable'. | omalleyt wrote: | I bet you can utterly defeat this by adding one or two typos into | the text | Sol- wrote: | Given the weak accuracy - which is of course understandable given | the difficulty of the task - this mostly seems like a fig leaf | that lets them pretend to do something about the potential | problems of AI-generated text becoming more and more pervasive. | | Probably one shouldn't fault them for trying, but the cat is out | of the bag, I think. | [deleted] | kiru_io wrote: | It would be interesting to know how this compares against | GPTZero [0]. | | [0] https://gptzero.me/ | ulizzle wrote: | Does anyone actually even believe that this A.I.-generated | writing is any good? The standards seem extremely low. | | Can it beat Tolkien or Asimov? No. Then what is even the point of | all this propaganda? | swatcoder wrote: | It's not good by any means, but neither is most assigned, | casual, or rote writing. | | 10,000 students are writing some crappy essay on the Great | Depression every day, and ChatGPT has probably trained on a | zillion of these. It's optimized to produce those mediocre | essays really efficiently, and that's very disruptive to how | teachers have been working with students for the last century | or so. The internet (and fraternity filing cabinets) were | already straining this kind of pedagogy, but ChatGPT breaks it | wholesale. | gunshai wrote: | What I find interesting about your comment is that it doesn't | only produce that mediocre essay. 
It can also produce a much | better one. | | How? Well, it's all about how you interact with it. But the | majority of use, as you said, will be taking the first output | given the input. What's amazing to me is learning to reject | the output in favor of our own vision or conflicting ideas. | | If ChatGPT helps people get past blank-page syndrome and | interact with their own ideas, seeing the limits of what is | returned in contrast to what they think, that would be an | incredibly useful tool for anyone trying to learn. | ropintus wrote: | It writes better than I do (I'm an ESL speaker), and that's | reason enough for me to use it. | | It might not be better than Tolkien, but so what? 99.99% of | people are also not better than Tolkien, and ChatGPT can add | value to the lives of these people. | sebzim4500 wrote: | No offence, but I am confident that you also can't write better | than Tolkien or Asimov. | | Does that mean you should delete all your comments and stop | posting? | dqpb wrote: | This is such a comical point of view I can't tell if it's | sarcasm or a genuine question. | | Yes, I would wager that ChatGPT can write better than at least | 90% of living human beings. | jfk13 wrote: | And yet I'd rather hear what the living human beings have to | say. They may write poorly, but at least they have actual | thoughts and ideas -- no matter how misguided or bizarre -- | that they're trying to communicate. | urbandw311er wrote: | This all feels a little like OpenAI trying to get a head start on | plausible deniability. | | A bit like Apple ensuring its consumer devices can't be hacked, | to bypass completely any arguments about whether they | should/shouldn't aid the state in providing a back door. | mindcrime wrote: | This is just going to lead to an arms race like with CAPTCHA. | Next project announcement: an AI text generator that can evade | the AI-text detector... and so on. | WestCoastJustin wrote: | Great growth-hacking idea. 
Feed ChatGPT into this and test if it | is getting detected. You'll increase usage of both products. | keepquestioning wrote: | [dead] | dakiol wrote: | Isn't this a poor business move by OpenAI? I mean, if they make | it possible to distinguish (100% reliably, in the future) between | AI-written text and human-written text... then a big chunk of | OpenAI's potential customers will not use ChatGPT and similar | tools because "they are gonna be caught" (e.g., students, | writers, social media writers, etc.) | anhner wrote: | 1. Create AI capable of writing almost human-level text and make | it generally available. | | 2. Make said AI generate text in a way that makes it possible to | detect that it was written by a machine. | | 3. Create another AI that detects text written by above AI | | <--- You are here | | 4. Put your detector service behind a paywall | | 5. Every time a competitor appears for your generator, change its | steganography so that only your detector correctly classifies it | | 6. Profit | gunshai wrote: | Talk about a local maximum, yuck. | michaericalribo wrote: | I foresee a dystopian education outcome: | | 1. Classifiers like this are used to flag _possible_ AI-generated | text | | 2. Non-technical users (teachers) treat this like a 100% | certainty | | 3. Students pay the price. | | Especially with a true positive rate of only 26% and a false | positive rate of 9%, this seems next to useless. | saltysnowball wrote: | This is already an issue. I'm a student in college right now, | and even technical professors are operating with full | confidence in systems like TurnItIn, which try their hand at | plagiarism detection (often with much higher false | negative/false positive rates). The problem was even more | prevalent in high school, where teachers would treat it as a | 100% certainty. Thus, I think that OpenAI making at least a | slightly better classification algorithm won't make the state | of affairs any worse. 
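The 26% / 9% figures discussed above can be turned into a concrete number with Bayes' rule. The base rate (what share of submissions are AI-written) is an assumed input here, since OpenAI publishes none:

```python
# What does a positive flag actually mean? Plug the published rates
# (26% true positive, 9% false positive) into Bayes' rule, for a
# few assumed base rates of AI-written submissions.
def p_ai_given_flag(base_rate: float, tpr: float = 0.26, fpr: float = 0.09) -> float:
    """P(AI-written | flagged) via Bayes' rule."""
    p_flag = tpr * base_rate + fpr * (1 - base_rate)
    return tpr * base_rate / p_flag

for base in (0.05, 0.20, 0.50):
    print(f"base rate {base:.0%}: P(AI | flagged) = {p_ai_given_flag(base):.1%}")
# base rate 5%: P(AI | flagged) = 13.2%
# base rate 20%: P(AI | flagged) = 41.9%
# base rate 50%: P(AI | flagged) = 74.3%
```

Even if a fifth of all submissions were AI-written, fewer than half of flagged essays would actually be AI-written, which supports the "next to useless" reading above.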
| Kiro wrote: | Funny how everyone praised GPTZero, which has even worse rates, | but starts being skeptical when it's OpenAI, the new bad guy. | [deleted] | dns_snek wrote: | "Everyone" didn't. In fact, the 5 top comments in that | thread[1] all called it useless or pointed out serious flaws. | | [1] https://news.ycombinator.com/item?id=34556681 | janalsncm wrote: | I urge anyone with time to write to tech journalists explaining | why this is so bad. Given previous coverage of GPTZero, they | don't seem to be asking the right questions. | tremon wrote: | I dare hope for a less dystopian outcome: | | - teachers will assign less mind-numbing essay homework and focus | more on oral interviews. | screye wrote: | Hilariously, this has already happened with music composition. | Especially drumming. | | Since the advent of drum machines, a lot of younger players | have started playing with the sort of precision that drum | machines enable, e.g. the complete absence of swing, and clean | high-tempo blasts/rides. | | So you'd get accusations of drummers not being able to play | their own songs, because traditional drummers think such | technically complex and 'soulless' performances couldn't | possibly be human. Only to then be proven wrong, when it turns | out that younger players can in fact do it. | | The machine conditions man. | TheRealPomax wrote: | So, status quo then? This is already the case for educational | software that's used to detect plagiarism. People get wrongly | flagged, and then you'll have to plead your case. | | But the times software like this finds actual problems vastly | outnumber the times it doesn't, and when your choice is between | "passing kids/undergrads who cheat the system" and "the | occasional arbitration", you go with the latter. Schools don't | pay teachers anywhere _near_ enough to not use these tools. 
| michaericalribo wrote: | Given the published true and false positive rates, it's clear | that the true positives do not "vastly outnumber" false | positives. | PeterisP wrote: | Currently the false positive rate is _far_ lower. E.g. I get | 500-ish submissions over a school year, so even a 1% false | positive rate would mean I'd falsely accuse 5 innocent | students annually, which isn't acceptable at all - and a 9% | FP rate is _so_ high that it's not even worth investigating; do | you know of any grader who has the spare time to begin formal | proceedings/extra reviews/investigation for 9% of their | homework? | | For plagiarism suspicions at least the verification is simple | and _quick_ (just take a look at the identified likely | source; you can get a reasonable impression in minutes) - I | can't even imagine what work would be required to properly | verify ones flagged by this classifier.. | TheRealPomax wrote: | > I can't even imagine what work would be required to | properly verify ones flagged by this classifier. | | Yet. | flatline wrote: | At the same time the classifier is improving, the | generative models are improving. It's a classic arms race, | and this equilibrium is not likely to shift much either | way. We are talking about models that approximate human | behavior with a high degree of accuracy; I think the goal | would be to make them indistinguishable in any meaningful | way. | notahacker wrote: | > This is already the case for educational software that's | used to detect plagiarism. People get wrongly flagged, and | then you'll have to plead your case. | | How often is that the case though? 
A while since I've had to | worry about it, but I thought plagiarism detection generally | worked on the principle of looking for the majority of the | content being literal matches with existing material out | there with only a few small edits, which - unlike using some | "AIish" turns of phrase a bot wrongly attributes to humans 9% | of the time and correctly attributes to AI with a not much | better success rate - is pretty hard to do accidentally. | i_have_an_idea wrote: | A long time ago when I was a student, I would run my papers | through Turnitin before submitting. The tool would | sometimes mark my (completely original) work as high as mid | 20% similarity. | | As a result, I have taken out quotes and citations to | appease it and not have to deal with the hassle. | | I expect modern day students will resort to similar | measures. | notahacker wrote: | IIRC the marker got the same visualization that you used | to take out quotes and citations that highlighted that | the similar bits were in fact quotes and citations! | | Maybe high school is a different matter, but I'm pretty | sure even the most technophobic academic knows that | jargon, terse definitions and the odd citation | overlapping with stuff other people have written is going | to make a similarity of at least 10% pretty much | inevitable, especially when the purpose of the exercise | is to show you understand the core material well enough | to cite and paraphrase and compare it, not to generate | novel academic insight or show you understood the field | so well you didn't need to refer back to the source | material. 
The people they were actually after were the | ones that downloaded something off essaybank, removed a | couple of paragraphs, rewrote the intro to match the given | title, and ended up with 80%+ similarity. | ren_engineer wrote: | >false positive rate of 9% | | bringing the Roman decimation to the classroom based on AI, | this is the future | kmkemp wrote: | Any solution here is just an arms race. The better AIs get at | generating text, the more impossible the job of identifying if | an AI was responsible for writing a given text sample. | e_i_pi_2 wrote: | You could even just set up a GAN to make the AI better at not | being detected as something written by an AI. I don't see a | good general solution to this, but I also see it as a non- | issue - if students have better tools, they should be able to | use them, just like a calculator on a test - that's allowed | on tests because you still need to understand the concepts to | put it to use. | tshaddox wrote: | It's almost as if you need to give exams in person and watch | the students if you don't want them to cheat. This is | fundamentally no different from cheating by writing notes on | your hand in an exam or paying someone to write a take-home | essay for you. It's cheaper than the latter, but that just | means the lazy curriculum finally needs to be updated. | dougmwne wrote: | The cheating students who know how to use the classifier will | be the big winners. | cjbgkagh wrote: | > false positive rate of 9% | | Yeah, that is useless. You couldn't punish based on that alone, | and students will quickly figure out to never confess. | sometimeshuman wrote: | Sorry for the tangent, but a surprising number of the general | public don't know the meaning of percent[1]. So even if a | teacher is told those percentages, many wouldn't know what to | conclude. | | [1] Me, giving young adults that worked for me a commission | rate. 
Then asking: if their commission rate is 15% and they sell | $100 of goods, what is their payment? Many failed to provide an | answer. | LarryMullins wrote: | > _2. Non-technical users (teachers) treat this like a 100% | certainty_ | | This is the part that needs to be addressed the most. Teachers | can't offload their critical reasoning to the computer. They | should ask their students to write things in class and get a | feeling for what those individual students are capable of. Then | those that turn in essays written at 10x their normal writing | level will be obvious, without the use of any automated cheat | detectors. | | I was once accused of cheating by a computer; my friend and I | both turned in assignments that used do-while loops, which the | computer thought was so statistically unlikely that we surely | must have worked together on the assignment. But the | explanation was straightforward; I had been evangelizing the | aesthetic virtue of do-while loops to anybody who would listen | to me, and my friend had been persuaded. Thankfully the | professor understood this once he compared the two submissions | himself and realized we didn't even use the do-while loop in | the same part of the program. There was almost no similarity | between the two submissions besides the statistically unlikely | but completely innocuous use of do-while loops. It's a good | thing my professor used common sense instead of blindly | trusting the computer. | londons_explore wrote: | > blindly trusting the computer. | | Professors blindly trust the computer not out of laziness, | but to protect themselves from accusations of unfairness... | | "The work was detected as plagiarism, but the professor | overrode it for the pretty girl in class, but not for me" | mitchdoogle wrote: | Seems like something like this should only be used as a | first-level filter. If the writing doesn't pass, it | warrants more investigation. 
If no proof of plagiarism is | found, then there's nothing else to do and the professor must | pass the student. | TchoBeer wrote: | with a 26% true positive rate that seems flawed. | asah wrote: | seems like this is the future... 1. first day of class, write | a N word essay and sign a release permitting this to be used | to detect cheating. The essay topic is chosen at random. | | 2. digitize & feed to a learning model, which detects whether | YOU are cheating. | | upside: this also helps detect students who are getting help | (e.g. parents) | | downside: arms race as students feed their cheat-essays | (memorize their essays?) into AI-detection models that are | similarly trained. | feanaro wrote: | There are also some countries that don't fetishize cheating | this much, so perhaps they will just continue not caring. | kaibee wrote: | The funniest implication here is that the student's writing | skill isn't expected to improve. | eh9 wrote: | I was just asking my partner, who's a writer, if it would | even be fair to train a model based on a student at _N_th | grade if the whole point is to measure growth. Would | there be enough "stylistic tokens" developed in a young | person's writing style? | AlexAndScripts wrote: | Surely you could continuously add data about their latest | essays to the model, meaning any gradual improvements | would be factored in? | ask_b123 wrote: | Personally, I feel mildly embarrassed when reading my | essays from years prior. And I probably still count as a | 'young person'. | | That said, there's no need to consider changes over years | when stylistic choices can change from one day to another | depending on one's mood, recent thoughts, relationship | with the teacher, etc. | | That's why I've always been a little confused about how | some (philologists?) treat certain ancient texts as not | being written by some authors due to the text's style, as | if ancient people could not significantly deviate from | their usual style. 
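The per-student model idea in this subthread (a profile updated with each new essay, so gradual improvement is factored in) can be sketched with a running character-trigram profile compared by cosine similarity. This is a toy stylometric baseline, and all names here are invented; real authorship-attribution systems are far more involved:

```python
# Toy sketch of the "per-student model, updated with each essay"
# idea: a running character-trigram profile plus cosine similarity.
from collections import Counter
import math

def trigrams(text: str) -> Counter:
    """Count overlapping 3-character substrings of lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

class StudentProfile:
    def __init__(self):
        self.counts = Counter()

    def add_essay(self, text: str) -> None:
        # Fold each new essay into the profile, so drift is absorbed.
        self.counts.update(trigrams(text))

    def similarity(self, text: str) -> float:
        # Cosine similarity between the running profile and new text.
        a, b = self.counts, trigrams(text)
        dot = sum(a[g] * b[g] for g in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

p = StudentProfile()
p.add_essay("i liked the book because the hero was brave and funny")
p.add_essay("the book was fun and i liked the ending a lot")
print(p.similarity("i liked the story and the brave hero"))
print(p.similarity("heretofore, the protagonist's valor was manifest"))
```

Given the false-positive concerns elsewhere in the thread, a submission scoring far outside a student's usual range should prompt a human look, not an accusation.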
| Aransentin wrote: | > first day of class, write a N word essay | | Initially I thought you meant having the student write an | essay about slurs, as the AI will refuse to output anything | like that. Then I realized you meant "N" as in "Number of | words". | | Still, that first idea might actually work; make the | students write about hotwiring cars or something that's | controversial enough for the AI to ban it but not | controversial enough that anybody will actually care. | JumpCrisscross wrote: | > _first day of class, write a N word essay and sign a | release permitting this to be used to detect cheating_ | | Why once? Most students need writing skills more than half | the high-school curriculum. | TheDudeMan wrote: | You are asking teachers to be good at their job. But is | teaching a merit-based profession? | busyant wrote: | I asked chatgpt to write an essay as if it were written by a | mediocre 10th grader. It did a reasonably good job. It threw | in a little bit of slang and wasn't particularly formal. | | Edit. I sometimes tell my students "if you're going to cheat, | don't give yourself a perfect score, especially if you've | failed the first exam. It fires off alarm bells." | | But the students who struggle usually can't calibrate a non- | suspicious performance. | | I guess the same applies here. | Baeocystin wrote: | You've touched upon a central issue that is not often | addressed in these conversations. People who have | difficulty comprehending and composing essays also struggle | to work with repeated prompts in AI systems like ChatGPT to | reach a solution. I've found in practice that when showing | someone how prompting works, their understanding either | clicks instantly, or they fail to grasp it at all. There | appears to be very little in between. | geph2021 wrote: | ask their students to write things in class and get a feeling | for what those individual students are capable of. 
Then those | that turn in essays written at 10x their normal writing level | will be obvious | | I think that's a flawed approach. Plenty of people simply | don't perform or think well under imposed time-limited | situations. I believe I can write close to 10x better with | 10x the time. To be clear, I don't mean writing more, or a | longer essay, given more time. Personally, the hardest part | of writing is distilling your thoughts down to the most | succinct, cogent and engaging text. | deepspace wrote: | > Plenty of people simply don't perform or think well under | imposed time-limited situations | | From first-hand experience, the difference between poor | stress-related performance and a total lack of knowledge is | night and day. | | I have personally witnessed students who could not speak or | understand the simplest English, and were unable to come up | with two coherent sentences in a classroom situation, but | turned in graduate level essays. The difference is | blindingly obvious. | giovannibonetti wrote: | > I have personally witnessed students who could not | speak or understand the simplest English, and were unable | to come up with two coherent sentences in a classroom | situation, but turned in graduate level essays. The | difference is blindingly obvious. | | Maybe someone helped them with their homework? | remexre wrote: | Unless their in-class performance increases as well, | isn't that help "probably cheating"? (That's the "moral | benchmark" I'd use, at least; if your collaboration | resulted in you genuinely learning the material, it's | probably not cheating.) | runarberg wrote: | The point is for the teacher to get a sense of the student's | style and capabilities. Even if your home essay is 10x | better and 10x more concise than your in-class work, a good | teacher that knows you--unlike an inference model--will | be able to extrapolate and spot commonalities.
Also a good | teacher (that isn't overworked) will also talk to students | and get a sense of their style and capabilities that way; | this allows them to extrapolate even better than a computer | could ever hope to. | zopa wrote: | Sure, but what about all the students with mediocre | and/or overworked teachers? If our plan assumes the best- | case scenario, we're going to have problems. | runarberg wrote: | Honestly if we can't have nice things and we keep | skimping on education, I'd rather we just accept the | fact that some students will cheat than introduce | another subpar technical solution to a societal problem. | runarberg wrote: | So the computer's evaluation model assumed that each | student's learning is independent? That seems like a | ludicrous assumption to put in a model like this, unless the | model authors have never been in a class setting (which I | doubt). | munificent wrote: | I think you're misunderstanding the primary purpose of | essays. | | Teachers don't have the time to do deep critical reasoning | about each student's essay. An essay is only partially an | evaluation tool. | | The primary purpose of an essay is that the act of writing an | essay teaches the student critical reasoning and structured | thought. Essays would be an effective tool even if they | weren't graded at all. Just writing them is most of the | value. A big part of the reason they're graded at all is just | to force students to actually write them. | | The main problem with AI generated essays isn't that teachers | will lose out on the ability to evaluate their students. It's | that students won't do the work and learn the skills they get | from doing the work itself. | | It's like building a robot to do push ups for you. Not only | does the teacher no longer know how many push ups you can do, | you're no longer exercising your muscles.
| YeGoblynQueenne wrote: | >> The primary purpose of an essay is that the act of | writing an essay teaches the student critical reasoning and | structured thought. Essays would be an effective tool even | if they weren't graded at all. Just writing them is most of | the value. A big part of the reason they're graded at all | is just to force students to actually write them. | | That's our problem, I think. Education keeps failing to | convince students of the need to be educated. | [deleted] | thelock85 wrote: | For this exact reason, I feel like education systems and | curriculum providers (teachers are just point of contact | from a requirements perspective) should develop much more | complex essay prompts and invite students to use AI tools | in crafting their responses. | | Then it's less about the predetermined structure (5 | paragraphs) and limited set of acceptable reasoning | (whatever is on the rubric), and more about using creative | and critical thinking to form novel and interesting | perspectives. | | I feel like this is what a lot of universities and | companies currently claim they want from HS and college | grads. | desro wrote: | This is what I'm doing as an instructor at some local | colleges. A lot of the students are completely unaware of | these tools, and I really want to make sure they have | some sense of how things are changing (inasmuch as any of | us can tell...) | | So I invite them to use chatGPT or whatever they like to | help generate ideas, think things out, or learn more. The | caveat is that they have to submit their chat transcript | along with the final product; they have to show their | work. | | I don't teach any high-stakes courses, so this won't work | for everyone. But educators are deluded if they think | anyone is served by pretending that (A) this | doesn't/shouldn't exist, and that (B) this and its | successors are going away. | | All of this stuff is going to change so much. 
It _might_ | be a bigger deal than the Internet. Time will tell. | nonrandomstring wrote: | A more likely outcome is that teachers will pay the price [1]. | | [1] https://www.timeshighereducation.com/opinion/ai-will- | replace... | | (turn off js to jump the signup-wall) | ibejoeb wrote: | I think there is a more dystopian near future: | | 1. There will be commercial products to tune per-student | writing models. | | 2. Those models will be used to evaluate progress and | contribute directly to scores, grades, and rankings. They may | also serve to detect collaboration. | | 3. The models persist indefinitely and will be sold to industry | for all sorts of purposes, like hiring. | | 4. They will certainly be sold to the state for law enforcement | and identity cataloging. | e_i_pi_2 wrote: | I can't remember the keyword to look it up, but there's a | problem of statistics you run into with stuff like terrorism | detection algorithms. | | If we have 300M people in the US and only 1k terrorists, then | you need 99.9999% accuracy before you start getting more true | positives than false positives. If you use this in a classroom | where no one is actually using AI you'll get false positives, | and in a class where the usage is average you'll still get more | false positives than true ones, which makes the test do more | harm than good unless it's just a reason to look into it more - | and the teacher is presumably already reading the text so if | that doesn't help then this surely won't | xmddmx wrote: | It's the False Positive Paradox: https://en.wikipedia.org/wik | i/Base_rate_fallacy#False_positi... | | mitchdoogle wrote: | 4. Parents sue schools 5. Admins eliminate all writing | requirements | kilgnad wrote: | This isn't that dystopian.
The dystopian outcome is when | there's a classifier that rates the quality of the text and | that this classifier becomes indistinguishable from the AI- | generated classifier because AI generated text is beginning to | be superior to human generated text. | thewataccount wrote: | Hopefully they just flag relevant sections. Essay/Plagiarism | checkers already exist, although in my experience professors | were reasonable. | | For example I had a paragraph or two get flagged as being very | similar to another paper - but both papers were about a fairly | niche topic (involving therapy animals) and we had both used | the relevant quotes from the study conclusions from one of only | a few decent sources at the time - so of course they were going | to be very similar. | | Given that most essays are about roughly the same set of | topics, and there are literally hundreds of thousands of | students writing these - I wonder how many variations are even | possible for humans to write as I would expect us to converge | on similar essays? | michaericalribo wrote: | Plagiarism is easier to verify, because you can directly | compare with the plagiarized source material | thewataccount wrote: | Absolutely. I think it may have to end up more as a | statistics thing with behaviour. For example: | | "Tom had a single paragraph flag as possibly generated" vs | "Every single paper Tom writes has paragraphs flag" | | Basically we might have to move to detecting statistical | outliers as cheating. Now whether the tools/teachers will | understand/actually do that - we can only hope.... | amelius wrote: | Solution: just write your texts with a bit less confidence than | gpt3 would. | Verdex wrote: | I wonder if I should help my kids setup a server + webcam + | screen capture tool so they can document 100% of their essay | writing experience. 
That way if they ever get hit with a false | positive they can just respond with hundreds of hours of video | evidence that shows them as the unique author of every essay | they've ever written. | anotherjesse wrote: | You will certainly have a lot of training video to create an | "essay writing video generator" ML product | causalmodels wrote: | You could always teach them how to use git and have them | commit frequently. Seems like it would be less intrusive than | a webcam. | Verdex wrote: | Source control would certainly help establish a history of | incrementally performing school work by _someone_ when | viewed by a highly technical examiner and when periodically | stored someplace where a trusted 3rd party can confirm it | wasn't all generated the night after a supposed false | positive. | | However, hundreds of hours of video is compelling to non- | technical audiences and even more importantly is a | preponderance of evidence that's going to be particularly | damning if played in front of a PTA meeting. | | With a git history it's going to come down to who can spin | the better story. The video is the story and everyone | recognizes it, so I expect fewer people would bother even | challenging its authenticity. | causalmodels wrote: | I guess that's fair. I just personally don't think the | additional gain is worth taking away your child's | privacy. | Verdex wrote: | It's only taking away their privacy if they're falsely | accused. | | And properly used you might not even have to relinquish | privacy if falsely accused. A quick montage video demo | and a promise to show the full hundreds of hours of video | of "irrefutable" proof to embarrass the school district | at the next PTA meeting might be sufficient to get the | appropriate response. | tshaddox wrote: | You could still cheat quite easily and inexpensively with an | earpiece, as long as you know how to write down what you | hear. | Verdex wrote: | It's about building a narrative.
Yeah, you could still | cheat, but who would go through the effort of generating | hundreds of hours of fake videos proving yourself innocent? | For that amount of effort you might as well have done the | work yourself. | | Of course there are some people who put insane amounts of | effort into not doing "real" work. However, anyone trying | to prove that your child is in that position is going to | find themselves in an uphill battle. | | Which is the ultimate goal here. Make people realize that | falsely accusing my children using dubious technology is | going to be a lot more work than just giving up and leaving | them alone. | claytonjy wrote: | Is there a longer-form paper on this yet? TPR (P(T|AI)) and FPR | (P(T|H)) are useful, but what I really want is the probability | that a piece flagged as AI-generated is indeed AI-generated, | i.e. P(AI|T). Per Bayes' rule I'm missing P(AI), the portion of | the challenge set that was produced by AI. | | If we assume the challenge set is evenly split 50-50, that | means P(AI|T) = P(T|AI)P(AI)/P(T) = | (0.26)(0.5)/((0.26)(0.5)+(0.09)(0.5)) = 0.13/0.175 ~ 74% | | So roughly a 3-in-4 chance of the flagged text | actually being AI-generated. | | They say the web-app uses a confidence threshold to keep the | FPR low, so maybe these numbers get a bit better, but very far | from being used as a detector anywhere it matters. | TchoBeer wrote: | >Per Bayes' rule I'm missing P(AI), the portion of the | challenge set that was produced by AI | | This will obviously depend on your circumstances. | adamsmith143 wrote: | Or we realize that essays aren't that important and technical | skills will become more highly valued. Either way, ChatGPT | can't do your exams for you so the truth will come out anyway. | mitchdoogle wrote: | Writing is very important for understanding a topic and long- | term recall. I still remember topics from papers I did 15 | years ago because I spent 10s of hours researching and
| | Instead of being overzealous about catching cheaters, | teachers should learn to express the importance of writing | and why it is done. Convince the students that they should do | it to be a smarter person, not just to get a grade, and they | will care more about doing it honestly. | flandish wrote: | In the same way deepfake video should not be allowed as | evidence, thereby ensuring _no_ video is allowed... we can | apply that to text as well. | | We're entering an uncanny valley before a period of "reset" | with self taught (to stay on subject here) people re-learning | for the sake of learning. | | In 30 years we will be in an educational renaissance of people | learning "like the old masters did in the 1900's." | EGreg wrote: | Nah. In 30 years it will be as useless to learn most subjects | as it is right now to learn crocheing and knitting, or | learning times tables or using an abacus. | | People are wayyyy too optimistic, just like in the 1900s they | thought people would have flying cars but not the Internet, | or how Star Trek's android Data is so limited and lame. | | Bots will be doing most of the work AND have the best lines | to say, AND make the best arguments in court etc. | | You don't even need to look to AI for that. The best | algorithms are simply uploaded to all the bots and they are | able to do 800 things, in superhuman ways, and have access to | the internet for whatever extra info they need. | | When they swarm, they'll easily outcompete any group of | humans. For example they can enter this HN thread and | overwhelm it with arguments. | | No, the old masters were _needed_. Studying will not be. The | Eloi and Morlocks is closer to what we can expect. | flandish wrote: | As someone who's known how to crochet and knit since he as | 6... I disagree. | tokai wrote: | Apparently knitwear is forecasted to have a CAGR of 12% the | rest of the decade. With hand knitted garments commanding | the high prices. 
It's definitely not the worst cottage | industry one can choose. | la64710 wrote: | Exactly. IMHO it is irresponsible to release such a classifier | with a title that touts the desired feature and does not | spell out its limitations. At least precede the title with | "experimental" or something. | anonobviously wrote: | This is extremely concerning. | | The co-author list on this includes Professor Scott Aaronson. | Reading his blog Shtetl-Optimized and reading his | [sad/unfortunate/debate-able/correct?/factual?/biased?] views | on adverse/collateral harm to Palestinian civilians makes me | question whether this model would fully consider collateral | damage and harm to innocent civilians, whomever that subgroup | might be. What if his model works well, except for some | minority groups' languages which might reflect OpenAI speak? | Does it matter if the model is 99.9% accurate if the 0.1% is | always one particular minority group that has a specific | dialect or phrasing style? Who monitors it? Who guards these | guards? | jameshart wrote: | We can't release the essay writing language model. Lazy | children will use it to write their essays for them! | | We can't release the ai-generated text detection model. Lazy | teachers will use it to falsely accuse children of cheating! | | The problem here appears to be _lazy people_. | | Can we train an AI to detect lazy people? I promise not to | lazily rely on it without thinking. | jupp0r wrote: | This is worse than useless, if you take the base rate fallacy into | account. | optimalsolver wrote: | https://en.wikipedia.org/wiki/Red_Queen's_race | dxbydt wrote: | There was a merchant who said - Buy my sword! It will pierce | through any shield !! | | So the gullible people bought the swords and soon the merchant | ran out of swords to sell. | | So the merchant said - Buy my shield! It can defend against any | sword !! | | Once again the gullible people rushed to buy the shields.
| | But one curious onlooker asked - what happens when your sword | meets your shield? | causalmodels wrote: | My younger brother and I both have fairly severe dyslexia. He's | been applying to school and has been using ChatGPT to help him | correct spelling and grammar mistakes rather than going to a | person for help. It has been fairly incredible for him. | | I wonder if this tool would start flagging his work even though | he is only using it as a fancy spell checker. | meetingthrower wrote: | Lol, just tried it against several 500-word chunks of text I had | the old GPT3 write for me and it classified them as "unlikely AI | written." Maybe because I had very specific prompts which could | include a lot of actual facts...? | barbazoo wrote: | > The classifier is very unreliable on short texts (below 1,000 | characters). Even longer texts are sometimes incorrectly | labeled by the classifier. | neonate wrote: | How does https://openai-openai-detector.hf.space/ do on them? | meetingthrower wrote: | 99.4% real!!! | groestl wrote: | My new hobby, based on the responses I read from ChatGPT, is to | get a "likely written by AI" rating from these classifiers. | | "However, this is just one example of a humorous summary and it | is important to note that..." and so on and so on | andrewmutz wrote: | OpenAI should release a classifier that detects _their own_ AI- | generated text. They could do this easily by just using | steganography to hide some information in all text that they | generate, and then build the classifier to look for it. | | Sure, it's less useful than a classifier that can detect any AI | generated text, but it would be a nice tool for contexts where AI | generated text can be abused (like the classroom) in the short | term. | ineedtocall wrote: | Or they could just save/hash results and get rid of the | classifier altogether. | rcme wrote: | Yea, they could provide a fingerprinting algorithm and a | database of every fingerprint they've generated.
However, it | wouldn't help you identify false positives. | sebzim4500 wrote: | Scott Aaronson talks about something like that being done at | OpenAI in this post | | https://scottaaronson.blog/?p=6823 | m3affan wrote: | There is work on hidden signatures in generated text, invisible | to humans. Only way to move forward. | bnug wrote: | I'd think people would migrate to just re-typing whatever was | generated and change some wording along the way to prevent | detection. | thewataccount wrote: | The problem with this will be the method to detect the | signature would reveal how to hide the signature though | right? | | Obviously not an issue if everyone uses a single API for it - | but if this ends up like Stable Diffusion where anyone can run | it locally then I don't think it's possible no? | brink wrote: | I miss the 90's and the early 00's. Take me away from this AI | hell. | [deleted] | shagie wrote: | Musicians Wage War Against Evil Robots - | https://www.smithsonianmag.com/history/musicians-wage-war-ag... | | From the March, 1931 issue of Modern Mechanix magazine: | | > The time is coming fast when the only living thing around a | motion picture house will be the person who sells you your | ticket. Everything else will be mechanical. Canned drama, | canned music, canned vaudeville. We think the public will tire | of mechanical music and will want the real thing. We are not | against scientific development of any kind, but it must not | come at the expense of art. We are not opposing industrial | progress. We are not even opposing mechanical music except | where it is used as a profiteering instrument for artistic | debasement. | sekai wrote: | > Take me away from this AI hell | | People used to say that about electricity too, and cars, and | planes, and computers. This is just the next step in the chain. | tgv wrote: | So your message is: bend over? | GMoromisato wrote: | There are only two choices: | | 1. Try to stop the world from changing. | | 2.
Adapt to the changes (which requires changing the | world). E.g., the dangers of electricity led to electrical | codes and licensing for electricians. | [deleted] | anshumankmr wrote: | What I would love to see in GPT 3 is some sort of a confidence | score that they could return, as in how sure their model is that | what it returned is accurate and not gibberish. Could this | classifier help with that? I am working on a requirement where we | are using ElasticSearch to map a query to an article in a | knowledge base and then the plan is to send it to GPT 3 to help | summarize the article. | | Since the ElasticSearch integration is still WIP, I had made a | POC to scrape the knowledge base (with mixed results, lots of the | content is poorly organized, so the scraped content that would | act as prompt to the GPT 3 model wasn't all that good either) and | then feed it to GPT 3, but it couldn't always give the most | accurate answers on that. The answers sometimes were spot on, or | quite good, but other times, not so much. I would say about 30% of | the time, it made sense. So I'd like a way to tell whether an | answer was sensible or not, so that we could give an error response | when GPT 3's response did not make sense. | | The reason we are doing it is because the client has a huge | knowledge base and mapping each question to an answer would be | difficult for them. | ilaksh wrote: | OpenAI's text completion has an option to return "log | probability" or something for each token. That might apply. You | can also turn down the temperature parameter which reduces | hallucinations to some degree.
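ilaksh's log-probability idea can be sketched as a crude confidence score: average the per-token log probabilities the completion API can return, and exponentiate. A minimal sketch; the commented-out API call reflects my assumption about the pre-1.0 `openai` Python client's shape, and note that high token confidence measures predictability, not factual accuracy:

```python
import math

def mean_token_confidence(token_logprobs):
    """Geometric-mean probability of the sampled tokens.
    Values near 1.0 mean the model found its own output highly predictable."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical usage with the completion API (call shape is an assumption):
#   resp = openai.Completion.create(model="text-davinci-003",
#                                   prompt=prompt, logprobs=1)
#   lps = resp["choices"][0]["logprobs"]["token_logprobs"]
#   score = mean_token_confidence(lps)

confident = mean_token_confidence([-0.1, -0.2, -0.05])  # tokens the model found likely
hedged = mean_token_confidence([-2.5, -3.0, -1.8])      # tokens it found surprising
```

A low score could be used to trigger the "sorry, I couldn't find a good answer" fallback the parent comment asks for, though it won't catch confident hallucinations.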
| antirez wrote: | Totally useless given it's really inaccurate, and actively | dangerous: in the case of false positives, people will be suspected | of not having produced work they actually produced: | | https://twitter.com/antirez/status/1620494358947717120 | kypro wrote: | The existence of this tool might actually do more damage if | people are using it with any level of confidence to check text | content as important as exams. I understand why they felt the | need to release something, but I think it would be better if this | didn't exist. | | My guess is that it's very easily gamed. Something ChatGPT is | very good at is producing text content in different styles so if | you're a student and you run your text through an AI detector you | can always ask ChatGPT to write it in a style which is more | likely to pass detection. | | Finally, I wouldn't be surprised if this detector is mostly just | detecting grammatical and spelling mistakes. It's obvious I'm a | human given how awful I am at writing, but I wouldn't be | surprised if a good writer who uses very good grammar, has good | sentence structure and whose writing looks a little bit too | "perfect" might end up triggering the detector more often. | tinglymintyfrsh wrote: | Meta blocked access to staff from using DALLE2 or ChatGPT from | their work Google accounts. | | Another possibility is AI generative "steganography" where rules | exist to insert hidden meaning or hidden data. | shagie wrote: | > Meta blocked access to staff from using DALLE2 or ChatGPT | from their work Google accounts. | | I'm trying to reason this one out. | | Does Meta have work Google accounts? How would Meta block | someone from using a work account to auth to some other | service? | | Are people working at Meta signing into OpenAI with a Google | account? | | (seriously, if it isn't work related - don't use a work | account) | | Is Meta concerned about people uploading their code (or | downloading code) from ChatGPT?
What is their policy on | Copilot? | | Why are Meta people using DALLE while at work? | rafaelero wrote: | A 26% true positive rate and a 9% false positive rate is just terrible. I | don't see how this can be usable. | yboris wrote: | Quote: | | > In our evaluations on a "challenge set" of English texts | | I wonder if they mean "challenge" in the sense that these are | some of the hardest-to-discern passages. Meaning that with | average human writing / average type of text, the % is better. | I'm unsure. | [deleted] | PUSH_AX wrote: | Might be better to store outputs and implement a way to detect it | within a larger piece of text. Think like a reverse Shazam but | for text. | blueberrychpstx wrote: | Doesn't this get us into a sort of perpetual motion machine with | the back and forth being | | 1) generate paragraph of my essay 2) feed it into this classifier | 3a) if AI -> make it sound more human 3b) if human -> $$$ Profit? | | Obviously it could be more fine tuned than this and is in general | good to know, but I just love watching this game play out of ... | errr how do we manage the fact that humans are relatively less | and less creative compared to their counterparts. | dakiol wrote: | The thing is point 1 costs money (I imagine at some point, | ChatGPT will cost money), but point 2 also will cost money. So | OpenAI will charge you double for generating AI-written text that | is undetectable. Poor move. I could happily pay a lot for | ChatGPT, but if they also commercialize a (more accurate) | classifier then I won't use ChatGPT at all. | bluefone wrote: | What's the objective of tests in education? To test human | biological abilities of memorizing and analyzing? But in real | life, these abilities are always augmented by tech. It is like | testing your prospective employees for their running skills, | though they don't really need running to commute.
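PUSH_AX's "reverse Shazam" idea upthread, storing every model output and later matching fragments of it inside larger texts, can be sketched with hashed word n-grams. This is a toy sketch with hypothetical names; a production system would need a stable hash and a scalable scheme such as winnowing or MinHash:

```python
def ngram_fingerprints(text, n=5):
    """Set of hashes of every n-word shingle in the text.
    Built-in hash() is only stable within one process; a real index
    would use a stable digest such as SHA-1."""
    words = text.lower().split()
    return {hash(" ".join(words[i:i + n])) for i in range(len(words) - n + 1)}

class OutputIndex:
    """Toy index of everything a model has generated."""
    def __init__(self):
        self.seen = set()

    def record(self, generated_text):
        # Store fingerprints of a freshly generated output.
        self.seen |= ngram_fingerprints(generated_text)

    def overlap(self, suspect_text):
        """Fraction of the suspect text's shingles previously generated."""
        fps = ngram_fingerprints(suspect_text)
        return len(fps & self.seen) / len(fps) if fps else 0.0
```

Unlike a whole-document hash, shingle overlap degrades gracefully: lightly edited output still shares most of its n-grams with the recorded original.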
| karaterobot wrote: | Given a set of determinations about whether a given source text | was written by an AI or not, how do we know whether those | classifications were made by an AI or a human? We need to train | an AI classifier classifier, pronto! | happytiger wrote: | 9% false positives? That's a troubling level of falsies. | | The implications of using this tool are fun to think about | though. | | If it had a very low level of false positives, but wasn't very | good at identifying ai text, it would be very useful. | | But false positive rates above very, very low levels will | undermine any tool in this category. | klabb3 wrote: | Yeah it's useless currently and will become more useless | quickly, because people will scramble AI generated text, mix in | human edits and people who use AI generators a lot will mimic | their writing style. In short, the SNR will be abysmal outside | of controlled environments. | | I'm pretty sure the smart people at OpenAI know this. I think | this is a PR move signaling that they are "doing something", | looking concerned, yet insisting that everything is under | control. In reality, nobody can predict the societal rift that | this will cause, so this corporate playbook messaging is | dishonest in spirit and muddies the waters. This is bad, both | long term for OpenAI's trust, but also because muddy waters | make it harder to have fruitful discussions about safeguards | in commercial deployments of this tech. | | That said, they're incorrectly getting blamed for controlling | the _use_ of this tech; they're no more than a prolific and | representative champion of it. But the cat is out of the bag, | and they absolutely cannot stop this train, and so they | shouldn't be blamed for not trying. | IncRnd wrote: | I think the main takeaway from this is that both the AI | classifier and the AI output come from the same company, | openai.com. | | That likely explains the extremely low accuracy of the AI | classifier.
This is user training for chatgpt output being | accepted as human authored text. | maxehmookau wrote: | 26% seems awfully low for a tool of this importance. Granted they | are upfront about it, but still, it doesn't seem immediately | useful to release it to the public. | nothrowaways wrote: | Sha is all you need LoL. | | Sha every generated text and do a bloom filter... I guarantee | much better than 27%. | klabb3 wrote: | A cryptographic hash function on stochastically generated | variable-length natural language? That sounds... not very | effective. | TDiblik wrote: | I mean it depends, is the number of possible answers per prompt | known? If so, can we even realistically calculate the number of | possible prompts? AFAIK ChatGPT answers even though there is a | grammatical mistake in your sentence, does that affect | answers/is that considered a new prompt? Ok, let's say you | feed all N possible answers in and make a rainbow table of | hashes. Sha is basically random (not really, but let's not go | there); after I generate my text using AI (which would get | flagged by your detection system) and change a few letters/words | here and there, your whole Sha rainbow table becomes useless - | right? I could be totally wrong, but I don't see Sha as a way | to solve this problem, because of these complications :/ | oldstrangers wrote: | Funny, I helped ChatGPT write a fake scientific article the other | day for a project I made (https://solipsismwow.com/). Its | result: The classifier considers the text to be unlikely AI- | generated. | knaik94 wrote: | I fed it some of my old HN comments, comfortably longer than the | 1000 character minimum and found 9/10 times the classifier marked | it as "unclear". This was a comment from 2020. | | I found out that just repeating a sentence a few times causes it | to classify something as "likely". This is not only an unwinnable | race, I know for a fact some teachers will use anything above | unlikely to mean used AI.
At some point in the future, compute will be cheap enough that a lot of online content will be put through a similar classifier. I am curious to know how conservative the estimates are. This was a non-technical comment; I wonder if a more technical comment would be even more likely to be misclassified.
| bhouston wrote:
| I used ChatGPT to rewrite a number of paragraphs of my own writing earlier today. It rewrote them completely. I just pasted those into this detection tool, and for both it responded "The classifier considers the text to be unlikely AI-generated."
|
| So it cannot detect AI-rewritten/augmented text, it seems, even things that ChatGPT itself generates.
| mitchdoogle wrote:
| Well, OpenAI admits it is wrong most of the time, so your results are consistent with what is expected.
| m3kw9 wrote:
| I mean, the way they could do it is to save all model outputs and then match user input against them; that would guarantee it. You could make changes, but it'd still match a high percentage. Of course, a student can test it himself to make sure to change the text so it falls below a certain threshold.
|
| Also, ignore prompts that purposefully output pre-written text, in case people want to mess with the system.
| netsec_burn wrote:
| Couldn't it just use conversation history, where it already stores the responses, and search within that?
| hkalbasi wrote:
| OpenAI knows every text that is generated by ChatGPT, so it could run a simple search algorithm instead of an AI model and achieve a way higher true-positive rate.
| cloudking wrote:
| Let's assume this tool works better in the future and is used in education. What are the next steps for a teacher after identifying AI-written homework?
| macksd wrote:
| _Allegedly_ identifying AI-written homework.
| antihero wrote:
| Use AI to write a letter to their parents.
| amelius wrote:
| Next step is to stop worrying about it, just as they did with automated spelling correction.
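| [Editor's note: happytiger's worry above about the 9% false-positive rate can be made concrete with Bayes' rule and the numbers OpenAI published (26% true-positive rate, 9% false-positive rate). The 10% base rate of AI-written submissions below is an illustrative assumption, not a figure from OpenAI.]

```python
# Posterior probability that a flagged text is actually AI-written,
# given the classifier's published rates and an ASSUMED base rate.
tpr = 0.26   # P(flagged | AI-written), from OpenAI's announcement
fpr = 0.09   # P(flagged | human-written), from OpenAI's announcement
base = 0.10  # assumed share of texts that are AI-written (illustrative)

p_flag = tpr * base + fpr * (1 - base)  # total probability of a flag
posterior = tpr * base / p_flag         # Bayes' rule

print(f"P(AI | flagged) = {posterior:.3f}")  # ≈ 0.243
```

Under that assumption, roughly three out of four "likely AI-written" flags would point at human-written text, which is why even a single-digit false-positive rate can dominate in practice.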
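| [Editor's note: klabb3's and TDiblik's objection to the SHA idea above is the avalanche effect: changing one character yields a completely different digest, so exact-hash lookup breaks on the smallest edit. m3kw9's "match a high percentage" variant needs a similarity measure instead, e.g. Jaccard overlap of word n-grams, a common near-duplicate-detection trick. A minimal sketch; the helper names are the editor's own:]

```python
import hashlib

def ngrams(text: str, n: int = 3) -> set:
    """Set of word n-grams (shingles) used for near-duplicate matching."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Shingle-set overlap: 1.0 = identical, 0.0 = disjoint."""
    sa, sb = ngrams(a), ngrams(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = "the quick brown fox jumps over the lazy dog near the river"
edited   = "the quick brown fox leaps over the lazy dog near the river"

# Avalanche effect: one changed word, a totally different SHA-256 digest,
# so an exact-hash (or Bloom filter) lookup finds nothing.
h1 = hashlib.sha256(original.encode()).hexdigest()
h2 = hashlib.sha256(edited.encode()).hexdigest()
print(h1 == h2)  # False

# Shingle overlap survives the same one-word edit (7 of 13 trigrams shared).
print(jaccard(original, edited))
```

This is why a provider-side lookup over stored outputs would need fuzzy matching (shingles, MinHash, embeddings) rather than cryptographic hashes.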
| jupp0r wrote:
| I can now go and incorporate this detector into the retraining pipeline for my evil language model, or put it at the end of my architecture to emit only human-like results (as labeled by the detector). I don't see how detectors can win this cat-and-mouse game.
| bioemerl wrote:
| Now they get to monetize ChatGPT and this new classifier. Starting fires and providing the extinguishers, charging for both of them.
|
| All while pretending to be morally responsible in order to do it.
| sharemywin wrote:
| They did say big tech was starting to take over the role of government.
| titaniumtown wrote:
| What does this have to do with the government?
| urbandw311er wrote:
| I can see the point the parent comment is trying to make. The applications of this classifier include potentially arbitrating decisions relating to things like education (i.e. assessment of grades), which is a matter traditionally associated with the public sector.
| [deleted]
| dakiol wrote:
| No way. If I were a student trying to use ChatGPT in order to improve my writing, I would definitely not pay for it if I knew my teachers were using their AI classifier. I mean, what's the point? I don't think OpenAI will be able to reach that (big) chunk of potential customers who want to use ChatGPT to write essays, social media comments, etc. if OpenAI at the same time sells their classifier. It's just nuts.
| Balgair wrote:
| If you're in a STEM-y major, now is _the_ time to pick up an essay-heavy humanities degree. If you're in an essay-heavy humanities degree, now is the time to pick up a few more.
|
| Think of it like this: How much is your degree costing you/your family?
|
| On average, it's ~$150k.
|
| How much would an extra degree cost you? How about 80% of an extra degree? How about 20%? How about all those books and course materials? Those are in the $1000s already, per degree. (And yes, we've all heard of torrents.)
|
| What I'm saying is that ChatGPT can easily be seen as 'just another college cost'. And when it's 'for education', the justification for those costs gets a lot more flexible. I can see students spitting out ~$10,000 for something like ChatGPT that is specific to their major, will pass these classifiers, and gets you just ~25% of the way to your major (however that is defined). The cost 'for the masses' could easily be in the ~$1000s for a per-class subscription.
|
| With ~20M college students in the US, assuming even a 10% uptake rate, you're in the _billions_ of dollars of nearly pure profit (the overhead would be negligible).
|
| The money potential of something like ChatGPT is just too damn high. Too high for essays to ever go out of style, as the lobbying effect of companies like this will force colleges to keep the essays that they are making the money off of. Oh, and they'll sell the classifier to the colleges too. Arming both sides!
| eric-hu wrote:
| You make a well-reasoned argument here. At the same time, respectfully, you may be too intelligent to be the target audience for the student service. Can you see a college version of yourself paying $500 to write a college essay for yourself today?
| caxco93 wrote:
| Could a newer language model use this to penalize output that fails the classifier during training?
| twayt wrote:
| It's a great way to swindle overzealous educators. The kind that do hugely unproductive things to students because they think it's in their best interests.
| gillesjacobs wrote:
| I found a great way to fool these detectors: piping output through generative models.
|
| 1. Generate text by prompting ChatGPT.
|
| 2. Rewrite / copyedit with Wordtune [1], InstaText [2], or Jasper [3].
|
| This fools GPTZero [4] consistently.
|
| Of course, soon these emotive, genre, or communication-style specialisations will be promptable by a single model too.
| Detectors will be integrated as adversarial agents in training. There is no stopping generative text tooling; better to adopt and integrate it fully into education and work.
|
| 1. https://www.wordtune.com/
|
| 2. https://instatext.io/
|
| 3. https://www.jasper.ai/
|
| 4. https://gptzero.me/
| kriro wrote:
| I'd rather try to empower students to use ChatGPT as a tool or incorporate it into class work than worry about cheating. This is a pretty unique time for teachers to step up and give their students a nice edge in life by teaching them how to become early adopters of these kinds of things.
| discreteevent wrote:
| The purpose of writing an essay is to teach students how to think. Being able to prompt is a subset of being able to think. If you only teach them to prompt, you have taken away any edge they might have had. It's like those schools that think that getting more iPads will make the kids smarter.
| janalsncm wrote:
| I posted yesterday about how GPTZero was such a horrible idea, and now this nightmare. Detecting AI-generated text is _impossible_ without knowing what model was used. It could be more feasible for them to detect text written by their own models, given that they know the logits. But OpenAI doesn't have a monopoly on large language models.
|
| However, the consequences of false positives are so dire that I would never want to create such a tool. It will be misused, and hiding behind "it's just information" is no excuse. You don't admit testimony unless it's reliable.
|
| At one time I thought that OpenAI was out to make the world a better place. But now it's clear to me that ethics is the last thing on their mind.
| brap wrote:
| Wouldn't better classifiers (discriminators) necessarily lead to better generators that can trick them?
| alphabetting wrote:
| Tools like this are promising and needed, but this one still needs work. I gave it two sets of 100% AI-generated text.
It said possibly for one and very unlikely for the other. Very unlikely example here: https://i.imgur.com/XoFQuYE.png https://i.imgur.com/PwGzTBM.png
| dqpb wrote:
| The LLM watermark seems like a better approach.
|
| https://arxiv.org/abs/2301.10226
| mlsu wrote:
| This one is very cool. The steps are:
|
| - Generate a seed from LLM output token t0
| - Use the seed to partition the vocabulary into "red" and "green" lists
| - For token t1, sample only from the "green" list when producing the next token
|
| Repeat.
|
| Now, let's say you read a comment online and you want to see if it was written by a robot or not. It's 20 tokens long. For each token, you reconstruct the blacklist. If the text uses "red" words with 50% probability, you can safely assume it was written by a human. But if it uses only green words, you can begin to assume it's a bot very quickly.
|
| For simplicity's sake, if you mark half of the tokens as "red" for each new token, correctly writing 20 tokens in a row that are on the "green" list is like flipping a coin and getting heads 20 times in a row -- vanishingly unlikely. This allows you to very robustly watermark even short passages. And if a human makes adversarial edits, they still have to fight that probability distribution; 19 heads and 1 tails is still vanishingly unlikely.
___________________________________________________________________ (page generated 2023-01-31 23:00 UTC)
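| [Editor's note: mlsu's red/green watermark description above can be sketched end to end. Below, a hash of the previous token stands in for the paper's seeded PRNG; the function names and the 50/50 split are illustrative assumptions, not the paper's exact construction. A toy "watermarked" generator only ever emits green tokens, so a detector counting the green fraction separates it from ordinary text with roughly 2^-n odds for n tokens.]

```python
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    # Stand-in for the paper's seeded PRNG: hash the (context, candidate)
    # pair and keep the low bit -> ~50/50 red/green split per context.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def generate_watermarked(start: str, vocab: list, length: int) -> list:
    """Toy generator: at each step, emit the first green candidate."""
    out = [start]
    for _ in range(length):
        out.append(next(w for w in vocab if is_green(out[-1], w)))
    return out

def green_fraction(tokens: list) -> float:
    """Detector: share of bigram steps that landed on the green list."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

vocab = [f"word{i}" for i in range(64)]
marked = generate_watermarked("seed", vocab, 20)
print(green_fraction(marked))  # 1.0 by construction

# A human writer lands on green ~50% of the time by chance; 20/20 green
# happens with probability 2**-20 -- mlsu's "20 heads in a row".
print(2 ** -20)
```

A real detector would use the model's tokenizer and a z-test on the green count rather than an exact threshold, but the probability argument is the same.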