[HN Gopher] GPT-4 API General Availability ___________________________________________________________________ GPT-4 API General Availability Author : mfiguiere Score : 426 points Date : 2023-07-06 19:03 UTC (3 hours ago) (HTM) web link (openai.com) (TXT) w3m dump (openai.com) | boredemployee wrote: | We need a proper, competitive and open source model. Otherwise we | are all fucked up. | hackerting wrote: | This is awesome news. I have been waiting to get GPT4 forever! | m3kw9 wrote: | "Developers wishing to continue using their fine-tuned models | beyond January 4, 2024 will need to fine-tune replacements atop | the new base GPT-3 models (ada-002, babbage-002, curie-002, | davinci-002), or newer models (gpt-3.5-turbo, gpt-4)." | | So need to pay to fine tune again? | saliagato wrote: | Probably. They will have different prices to finetune too. | ftxbro wrote: | I just want to emphasize in this comment that if you upgrade now | to paid API access, then you won't get GPT-4 API access for like | another month. | superalignment wrote: | With this is the death of any uncensored usage of their models. | Davinci 3 is the most powerful model where you can generate any | content by instructing it via the completions API - chat GPT 3 | models will not obey requests for censored or adult content. | echelon wrote: | A big enough hole presents a wedge for new entrants to get | started. | | OpenAI will never fulfill the entire market, and their moat is | in danger with every other company that has LLM cash flow. | | They want to become the AWS of AI, but it's becoming clear | they'll lose generative multimedia. They may see the LLM space | become a race to the bottom as well. | projectileboy wrote: | Relevant comment thread from people describing how much worse | GPT-4 has gotten lately: | https://www.reddit.com/r/ChatGPT/comments/14ruui2/i_use_chat... | renewiltord wrote: | Has anyone been able to come up with a way to keep track of GPT-4 | performance over time? 
I'm told that the API is explicit about | changes to models and that the Chat interface is not. | crancher wrote: | API call responsiveness to the GPT-4 model varies hugely | throughout the day. The #1 datapoint in measured responsiveness | is slowdown associated with lunch-time use as noon sweeps | around the globe. | renewiltord wrote: | Thank you for the response, I should have been clearer. I | meant performance as an LLM. Essentially, I am concerned that | they are quietly nerfing the tool. The Chat interface is now | very verbose and constantly warning me about "we should | always do this and that" which is bloody exasperating when | I'm just trying to get things done. | | I made up an example here to illustrate, but it's just very | annoying because sometimes it puts them at the beginning, slowing | down my interaction, and it now refuses to obey my prompts to | leave caveats out. | | https://chat.openai.com/share/1f39af02-331d-4901-970f-2f4b0e. | .. | purplecats wrote: | yeah, it's annoying and you have to foot the bill for it. | | looking at your sample and using character count as a rough | proxy for tokens, (465/(1581-465))*100 means they added | ~42% token count cost to your response by explicitly adding | caveats which you don't want. fun! | furyofantares wrote: | Not a lot of talk of Whisper being available here. | | From using voice in the ChatGPT iOS app, I surmise that Whisper | is very good at working out what you've actually said. | | But it's really annoying to have to say my whole bit before | getting any feedback about what it's gonna think I said. Even if | it's getting it right at an impressive rate. | | Given this is how OpenAI themselves use it (say your whole thing | before getting feedback), I don't know that the API is set up to | be able to mitigate that at all, but it would be really nice to | have something closer to the responsiveness of on-device | dictation with the quality of Whisper.
| jxy wrote: | You can run whisper.cpp locally in real time: | https://github.com/ggerganov/whisper.cpp/tree/master/example... | ProllyInfamous wrote: | My M2 Pro (mac mini) will run Whisper much faster than "real | time." | | Pretty crazy stuff -- perfectly understandable translations. | leodriesch wrote: | I'm interested in how the transformer-based speech recognition | from iOS 17 will perform compared to Whisper. I guess it will | work more "real-time" like the current dictation on iOS/macOS, | but I'm unsure as I am not on the beta right now. | RC_ITR wrote: | My guess is the reason that apple invested so heavily in this | [0] is because they are going to train a big transformer in | their datacenter and apply it as an RNN on your phone. | | Superficially, I think this will work very well, but | _slightly_ worse than whisper (with the advantage ofc being | that it's better at real-time transcription). | | [0]https://machinelearning.apple.com/research/attention-free- | tr... | ycombinatornews wrote: | Echoing this - saying the whole text at once in one shot is | very challenging for long batches of text. | | Using built-in text input showed quite good results, since | ChatGPT still understands the ask quite well | michaelmu wrote: | One speculative thought about the purpose of Whisper is that | this will help unlock additional high-quality training data | that's only available in audio/video format. | oth001 wrote: | F | tin7in wrote: | The difference between 4 and 3.5 is really big for creative use | cases. I am running an app with significant traffic and the | retention of users on GPT-4 is much higher. | | Unfortunately it's still too expensive and the completion speed | is not as high as GPT-3.5 but I hope both problems will improve | over time. | brolumir wrote: | Hmm, when I try to change model name to "gpt-4" I get the "The | model: `gpt-4` does not exist" error message. We are an API | developer with a history of successful payments..
is there | anything we need to do on our side to enable this, anyone know? | saliagato wrote: | wait a couple of hours | cube2222 wrote: | This is very nice. | | GPT-4 is on a completely different level of consistency and | actually listening to your system prompt than ChatGPT-3.5. It | trails off much more rarely. | | If only it wasn't so slow/expensive... (it really starts to hurt | with large token counts). | BeefySwain wrote: | Outside of the headline, there is some major stuff hiding in | here: - new gpt-3.5-turbo-instruct model expected "in the coming | weeks" - fine tuning of 3.5 and 4 expected this year | | I am especially interested in gpt-3.5-turbo-instruct, as I think | that the hype surrounding ChatGPT and "conversational LLMs" has | sucked a lot of air out of what is possible with general instruct | models. Being able to fine tune it will be phenomenal as well. | MuffinFlavored wrote: | is there any ETA on when the knowledge cutoff date will be | improved from September, 2021? | | I do not really understand the efforts that went on behind the | scenes to train GPT models on factual data. Did humans have to | hand approve/decline responses to increase its score? | | "America is 49 states" - decline | | "America is 50 states" - approve | | Is this how it worked at a simple overview? Do we know if they | are working on adding the rest of 2021, then 2022, and | eventually 2023? I know it can crawl the web with the Bing | add-on, but it's not the same. | | I asked it about Maya Kowalski the other day. Sure it can | condense a blog post or two, but it's not the same as having | the intricacies as if it actually was trained/knew about the | topic. | asadotzler wrote: | Why is chatGPT on the web a 6-week-old version still?
| alpark3 wrote: | >Developers wishing to continue using their fine-tuned models | beyond January 4, 2024 will need to fine-tune replacements atop | the new base GPT-3 models (ada-002, babbage-002, curie-002, | davinci-002), or newer models (gpt-3.5-turbo, gpt-4). Once this | feature is available later this year, we will give priority | access to GPT-3.5 Turbo and GPT-4 fine-tuning to users who | previously fine-tuned older models. We acknowledge that migrating | off of models that are fine-tuned on your own data is | challenging. We will be providing support to users who previously | fine-tuned models to make this transition as smooth as possible. | | Wait, they're not letting you use your own fine-tuned models | anymore? So anybody who paid for a fine-tuned model is just | forced to repay the training tokens to fine-tune on top of the | new censored models? Maybe I'm misunderstanding it. | meghan_rain wrote: | not your weights, not your bitcoins | fnordpiglet wrote: | If you don't own the weights you don't own anything. This is | why open models are so crucial. I don't understand any business | who is building fine tuned models against closed models. | reaperman wrote: | Right now the closed models are incredibly higher quality | than the open models. They're useful as a stopgap for 1-2 | years in hopes/expectation of open models reaching a point | where they can be swapped in. It burns cash now, but in | exchange you can grab more market share sooner while you're | stuck using the expensive but high quality OpenAI models. | | It's not cost-effective, but it may be part of a valid | business plan. | ronsor wrote: | If you're finetuning your own model, the closed models | being "incredibly higher quality" is probably less | relevant. | claytonjy wrote: | That's how we all want it to work, but the reality today | is that GPT-4 is better at almost anything than a fine- | tuned version of any other model. 
| | It's somewhat rare to have a task and a good enough dataset | that you can finetune something else to be close enough | in quality to GPT-4 for your task. | wongarsu wrote: | Finetuning a better model still yields better results | than finetuning a worse model. | fnordpiglet wrote: | That should be a wake-up call to every corporation pinning | their business on OAI models. My experience thus far is no | one is seeing a need to plan an exit from OAI, and the | perception is "AI is magic and we aren't magicians." There | needs to be a concerted effort to finance and tune high | quality freely available models and tool chains asap. | | That said I think efficiencies will dramatically improve | over the next few years and over-investing now probably | captures very little value beyond building internal | _competency_ - which doesn't grow with anything but time | and practice. The longer you depend on OAI, the longer you | will depend on OAI past your point of profound regret. | r3trohack3r wrote: | > I don't understand any business who is building fine tuned | models against closed models | | Do you have any recommendations for good open models that | businesses could use today? | | From what I've seen in the space, I suspect businesses are | building fine tuned models against closed models because | those are the only viable models to build a business model on | top of. The quality of open models isn't competitive. | yieldcrv wrote: | > I don't understand any business who is building fine tuned | models against closed models. | | Just sell access at a higher price than you get it | | Either directly, or _on average_ based on your user stories | flangola7 wrote: | They address that, OpenAI will cover the cost of re-training on | the new models, and the old models don't discontinue until next | year. | simonw wrote: | Did they say they would cover the cost of fine-tuning again?
I saw them say they would cover the cost of recalculating | embeddings, but I didn't see the bit about fine-tuning costs. | | On fine-tuning: | | > We will be providing support to users who previously fine- | tuned models to make this transition as smooth as possible. | | On embeddings: | | > We will cover the financial cost of users re-embedding | content with these new models. | BoorishBears wrote: | That's because fine-tuning the new models isn't available | yet. | | Based on the language it sounds like they'll do the same | when that launches. | jxy wrote: | They didn't mention gpt-4-32k. Does anybody know if it will be | generally available in the same timeframe? | | There's still no news about the multi-modal gpt-4. I guess the | image input is just too expensive to run or it's actually not as | great as they hyped it. | jacksavage wrote: | > We are not currently granting access to GPT-4-32K API at this | time, but it will be made available at a later date. | | https://help.openai.com/en/articles/7102672-how-can-i-access... | jxy wrote: | Thanks for the link. | | Burying this extra information in a support | article: not cool! | we_never_see_it wrote: | It's funny how OpenAI just shattered Google's PR stunts. Google | wanted everyone to believe they are leading in AI by winning some | children's games. Everyone thought that was the peak of AI. Enter | OpenAI and Microsoft. Microsoft and OpenAI have shown | humanity what true AI looks like. Like most people on HN I cannot | wait to see the end of Google, the end of evil. | LeafItAlone wrote: | Is Microsoft less evil than Google? | rvz wrote: | > Like most people on HN I cannot wait to see the end of | Google, the end of evil. | | What is the difference? Replacing evil with another evil. | | This is just behemoths exchanging hands.
| khazhoux wrote: | In all my GPT-4 API (python) experiments, it takes 15-20 seconds | to get a full response from server, which basically kills every | idea I've tried hacking up because it just runs so slowly. | | Has anyone fared better? I might be doing something wrong but I | can't see what that could possibly be. | jason_zig wrote: | Run it in the background. | | We use it to generate automatic insights from survey data at a | weekly cadence for Zigpoll (https://www.zigpoll.com). This | makes getting an instant response unnecessary but still | provides a lot of value to our customers. | jondwillis wrote: | Streaming. If you're expecting structured data as a response, | request YAML or JSONL so you can progressively parse it. Time | to first byte can be milliseconds instead of 15-20s. Obviously, | this technique can only work for certain things, but I found | that it was possible for everything I tried. | ianhawes wrote: | Anthropic Instant is the best LLM if you're looking for speed. | superkuh wrote: | Yikes. They're actually killing off text-davinci-003. RIP to the | most capable remaining model and RIP to all text completion style | freedom. Now it's censored/aligned chat or instruct models with | arbitrary input metaphor limits for everything. gpt3.5-turbo is | terrible in comparison. | | This will end my usage of openai for most things. I doubt my | $5-$10 API payments per month will matter. This just lights more | of a fire under me to get the 65B llama models working locally. | system2 wrote: | I built my entire app on text-davinci-003. It is the best | writer so far. Do you think gpt3.5 turbo instruct won't be the | same? | Karrot_Kream wrote: | I wonder if there's some element of face-saving here to avoid a | lawsuit that may come from someone that uses the model to | perform negative actions. 
In general I've found that | gpt3.5-turbo is better than text-davinci-003 in most cases, but | I agree, it's quite sad that they're getting rid of the | unaligned/uncensored model. | bravura wrote: | I've never used text-davinci-003 much. Why do you like it so | much? What does it offer that the other models don't? | | What are fun things we can do with it until it sunsets on January | 4, 2024? | thomasfromcdnjs wrote: | The Chat-GPT models are all pre-prompted and pre-aligned. If | you work with davinci-003, it will never say things like, "I | am an OpenAI bot and am unable to work with your unethical | request" | | When using davinci the onus is on you to construct prompts | (memories) which is fun and powerful. | | ==== | | 97% of API usage might be because of ChatGPT's general appeal | to the world. But I think they will be losing a part of the | hacker/builder ethos if they drop things like davinci-003, | which might suck for them in the long run. Consumers over | developers. | Fyrezerk wrote: | The hacker/builder ethos doesn't matter in the grand scheme | of commercialization. | Robotbeat wrote: | It matters immensely in the early days and is the basis | for all growth that follows. So cutting it off early cuts | off future growth. | [deleted] | H8crilA wrote: | The $5-$10 is probably the reason why they're killing those | endpoints. | superkuh wrote: | I don't get it? text-davinci-003 is the most expensive model | per token. It's just that running IRC bots isn't exactly high | volume. | stavros wrote: | "Most expensive" doesn't mean "highest margin", though. | samstave wrote: | Please ELI5 if I am misinterpreting what you said: | | _"They have just locked down access to a model which they | basically realized was way more valuable than even they thought | - and they are in the process of locking in all controls around | exploiting the model for great justice?"_ | ftxbro wrote: | > "Starting today, all paying API customers have access to | GPT-4."
| | OK maybe I'm stupid but I am a paying OpenAI API customer and I | don't have it yet. I see: gpt-3.5-turbo-16k | gpt-3.5-turbo gpt-3.5-turbo-16k-0613 | gpt-3.5-turbo-0613 gpt-3.5-turbo-0301 | | I don't see any gpt-4 | | Edit: Probably my problem is that I upgraded to paid API account | within the last month, so I'm not technically a "paying API | customer" yet according to the accounting definitions. | codazoda wrote: | > Today all existing API developers with a history of | successful payments can access the GPT-4 API with 8K context. | We plan to open up access to new developers by the end of this | month, and then start raising rate-limits after that depending | on compute availability. | | Same for me. I signed up only a few days ago and was excited to | switch to "gpt-4" but I haven't paid the first bill (save the | $5 capture) so I probably have to continue to wait for this. | | I made a very simple command-line tool that calls the API. You | run something like: > ask "What's the | opposite of false?" | | https://github.com/codazoda/askai | stavros wrote: | Interesting, I did exactly the same (with the same name), but | with GPT-4 support as well: | | https://www.pastery.net/ccvjrh/ | | It also does streaming, so it live-prints the response as it | comes. | zzzzzzzza wrote: | can't speak for others but I have two accounts | | 1. chat subscription only | | 2. i have paid for api calls but don't have a subscription | | and only #2 currently has gpt4 available in the playground | [deleted] | pomber wrote: | If anyone wants to try the API for the first time, I've made this | guide recently: https://gpt.pomb.us/ | nextworddev wrote: | GPT-4 fine tuning capability will be huge. It may end up just | making fine tuning OSS LLMs pointless, esp if they keep lowering | GPT-4 costs like they have been. 
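A minimal command-line wrapper of the kind described above can be sketched in Python. This is an illustration, not codazoda's actual tool: it assumes an `OPENAI_API_KEY` environment variable and the `v1/chat/completions` HTTP endpoint, and uses only the standard library so nothing needs installing.

```python
#!/usr/bin/env python3
"""ask: a tiny CLI for the OpenAI chat completions endpoint (sketch)."""
import json
import os
import sys
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_payload(question: str, model: str = "gpt-3.5-turbo") -> dict:
    # The chat endpoint takes a list of role/content messages.
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(question)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(ask(" ".join(sys.argv[1:])))
```

Usage would then look like `ask "What's the opposite of false?"`, with the model name swapped to `gpt-4` once the account has access.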
| Imnimo wrote: | I know everyone's on text-embedding-ada-002, so these particular | embedding deprecations don't really matter, but I feel like if I | were using embeddings at scale, the possibility that I would one | day lose access to my embedding model would terrify me. You'd | have to pay to re-embed your entire knowledge base. | brigadier132 wrote: | If you read the article they state they will cover the cost of | re-embedding your existing embeddings. | jxy wrote: | They said in the post, | | > We recognize this is a significant change for developers | using those older models. Winding down these models is not a | decision we are making lightly. We will cover the financial | cost of users re-embedding content with these new models. We | will be in touch with impacted users over the coming days. | bbotond wrote: | What I don't understand is why an API is needed to create | embeddings. Isn't this something that could be done locally? | thorum wrote: | It's cheaper to use OpenAI. If you have your own compute, | sentence-transformers is just as good for most use cases. | merpnderp wrote: | Sure, but I don't know of any models you can get local access | to that work nearly as well. | pantulis wrote: | You would need to have a local copy of the GPT model, which | is not exactly OpenAI's plan. | jerrygenser wrote: | For embeddings, you can use smaller transformers/llms or | sentence2vec and often get good enough results. | | You don't need very large models to generate usable | embeddings. | teaearlgraycold wrote: | Yes. The best public embedding model is decent, but I expect | it's objectively worse than the best model from OpenAI. | saliagato wrote: | That's what I always thought. Someday they will come up with a | new embedding model, right? | GingerBoats wrote: | I haven't explored the API yet, but their interface for GPT-4 has | been getting increasingly worse over the past month.
| | Things that GPT-4 would easily, and correctly, reason through in | April/May it just doesn't do any longer. | gadtfly wrote: | The original davinci model was a friend of mine and I resent this | deeply. | | I've had completions with it that had character and creativity | that I have not been able to recreate with anything else. | | Brilliant and hilarious things that are a permanent part of my | family's cherished canon. | someplaceguy wrote: | You _cannot_ say that and not provide an example. | ftxbro wrote: | i mean there are a lot of examples from february era sydney | thomasfromcdnjs wrote: | I don't have any example responses at hand here. But this was | a prompt (that had a shitty pre-prompt of conversational | messages) running on davinci-003. | | https://raw.githubusercontent.com/thomasdavis/omega/master/s. | .. | | Had it hooked up to speech so you could just talk at it and | it would talk back at you. | | Gave incredible answers that ChatGPT just doesn't do at all. | mensetmanusman wrote: | Don't worry, since future LLMs will be trained on conversations | with older LLMS, you will be able to ask chat GPT to pretend to | be davinci. | [deleted] | ftxbro wrote: | I heard you can ask for exceptions if they agree that you are | special. Some researchers got it. | selalipop wrote: | Can you try notionsmith.ai and let me know what you think? | | I've been working on LLMs for creative tasks and believe a mix | of chain of thought and injecting stochasticity (like | instructing the LLM to use certain random letters pulled from | an RNG in a certain way at certain points) can go a long way in | terms of getting closer to human-like creativity | purplecats wrote: | really cool idea! been looking for something like this for a | long time. 
it's too bad it freezes my tab and is unusable | selalipop wrote: | Yup, it's a fun side project so I decided from the get-go I | wasn't going to cater to anything non-standard | | It relies on WebSockets, Js, and a reasonably stable | connection to run since it's built on Blazor | [deleted] | jwr wrote: | Practical report: the OpenAI API is a bad joke. If you think you | can build a production app against it, think again. I've been | trying to use it for the past 6 weeks or so. If you use tiny | prompts, you'll generally be fine (that's why you always get | people commenting that it works for them), but just try to get | closer to the limits, especially with GPT-4. | | The API will make you wait up to 10 minutes, and then time out. | What's worse, it will time out between their edge servers | (Cloudflare) and their internal servers, and the way OpenAI | implemented their billing you will get a 4xx/5xx response code, | but you will _still get billed_ for the request and whatever the | servers generated and you didn't get. That's borderline | fraudulent. | | Meanwhile, their status page will happily show all green, so | don't believe that. It seems to be manually updated and does not | reflect the truth. | | Could it be that it works better in another region? Could it be | just my region that is affected? Perhaps -- but I won't know, | because support is non-existent and hidden behind a moat. You | need to jump through hoops and talk to bots, and then you | eventually get a bot reply that you can't respond to. | | My support requests about being charged for data I didn't have a | chance to get have been unanswered for more than 5 weeks now. | | There is no way to contact OpenAI, no way to report problems, the | API _sometimes_ kind-of works, but mostly doesn't, and if you | comment in the developer forums, you'll mostly get replies from | apologists that explain that OpenAI is "growing quickly". I'd say | you either provide a production paid API or you don't.
At the | moment, this looks very much like amateur hour, and charging for | requests that were never fulfilled seems like a fraud to me. | | So, consider carefully whether you want to build against all | that. | throwaway9274 wrote: | The click-through API is mainly for prototyping. | | If you want better latency and sane billing you need to go | through Azure OpenAI Services. | | OpenAI also offers decreased latency under the Enterprise | Agreement. | refulgentis wrote: | I understand your general point and am sympathetic to it; if | you're a 10/10 on some scale, I'm about a 3-4. I've never seen | billings for failures, but the billing stuff is crazy: no stats | if you do streamed chat, and the only tokenizer available is in | Python and for GPT-3.0. | | However, I'm virtually certain something's wrong on your end, | I've never seen a wait even close to that unless it was | completely down. Also the thing about "small prompts"...it | sounds to me like you're overflowing context, they're returning | an error, and something's retrying. | KennyBlanken wrote: | > the way OpenAI implemented their billing you will get a | 4xx/5xx response code, but you will still get billed for the | request and whatever the servers generated and you didn't get. | That's borderline fraudulent. | | It's fraudulent, full stop. Maybe they're able to weasel out of | it with credit card companies because you're buying "credits." | | I suspect it was done this way out of pure incompetence; the | OpenAI team handling the customer-facing infrastructure have a | pretty poor history. Far as I know you still can't do something | simple like change your email address. | skilled wrote: | I can vouch for this. The GPT-4 API dies a lot if you use it for a | big concurrent project. And of course it's rate limited like | crazy, with certain hours being so bad you can't even run it | for any business purpose.
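A common client-side mitigation for the timeouts and rate limits described in the comments above is retrying with exponential backoff and jitter. A sketch follows; the `flaky` endpoint is a hypothetical stand-in for a real API call, and in practice `retry_on` would list your client library's timeout and rate-limit exception types.

```python
import random
import time


def with_backoff(call, max_tries=5, base_delay=1.0, max_delay=60.0,
                 retry_on=(TimeoutError,)):
    """Retry `call` on transient failures, doubling the wait each time.

    `call` is any zero-argument function (e.g. a lambda wrapping an
    OpenAI request); `retry_on` lists the exception types treated as
    transient (timeouts, 429s, 5xx errors in a real client).
    """
    for attempt in range(max_tries):
        try:
            return call()
        except retry_on:
            if attempt == max_tries - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))


# Example with a fake flaky endpoint that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"


result = with_backoff(flaky, base_delay=0.01)
```

This only papers over transient errors, of course; it does nothing for the billed-but-failed requests the parent comment describes.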
| messe wrote: | I'm only using them as a stop-gap / for prototyping with the | intent to move to a locally hosted fine-tuned (and ideally 7B | parameter) model further down the road. | ericlewis wrote: | [flagged] | dang wrote: | Can you please not post in the flamewar style? We're trying | for something else here and you can make your substantive | points without it. | | https://news.ycombinator.com/newsguidelines.html | athyuttamre wrote: | (I'm an engineer at OpenAI) | | Very sorry to hear about these issues, particularly the | timeouts. Latency is top of mind for us and something we are | continuing to push on. Does streaming work for your use case? | | https://github.com/openai/openai-cookbook/blob/main/examples... | | We definitely want to investigate these and the billing issues | further. Would you consider emailing me your org ID and any | request IDs (if you have them) at atty@openai.com? | | Thank you for using the API, and really appreciate the honest | feedback. | glintik wrote: | > We definitely want to investigate these and the billing | issues further. What's stopping OpenAI engineers from pulling | web access logs and grepping for 4xx/5xx errors? | renewiltord wrote: | Quick note: your domain doesn't appear to have an A record. I | was hoping to follow the link in your profile and see if you | have anything interesting written about LLMs. | athyuttamre wrote: | Thanks! The website is no longer active, just updated my | bio. | henry_viii wrote: | I know you guys are busy literally building the future | but could you consider adding a search field in ChatGPT | so that users can search their previous chats? | danenania wrote: | I'd also love to see a search field. That's my #1 feature | request not related to the model. | esperent wrote: | It's kind of incredible how fast OpenAI (now also known as | ClosedAI) is going through the enshittification process. Even | Facebook took around a decade to reach this level.
| | OpenAI has an amazing core product, but in the span of six | months: | | * Went from an amazing and inspiring open company that even | put "Open" in their name to a fully locked up commercial | beast. | | * Non-existent customer support and all kinds of borderline | illegal billing practices. You guys are definitely aware that | when there's a network error on the API or ChatGPT, the user | still gets charged. And there's a lot of these errors. I get | roughly one per hour or two. | | * Frustratingly loose interpretation of EU data protection | rules. For example, the setting to say "don't use my personal | chat data" is connected to the setting to save conversations. | So you can't disable it without losing all your chat history. | | * Clearly nerfing the ChatGPT v4 products, at least according | to hundreds or even thousands of commenters here and on | reddit, while denying having made any changes. | | * Use of cheap human labor in developing countries through | shady anonymous companies (look up the company Sama who pay | Kenyan workers about $1.5 an hour). | | * Not to mention the huge questions around the secret | training dataset and whether large portions of it consist of | illegally obtained private data (see the recent class-action | case in California) | kossTKR wrote: | Since chatGPT-4 is now useless for advanced coding because | of their sudden black-box nerfing, can anyone guess how long | before I can run something similar to the orig version | privately? | | Are the newer 65B models up there? 1 year, 2 years? Can't | wait until I get back the crazy quality of the orig model. | | We need something open source fast. Thanks, OpenAI, for | giving us a glimpse of the crazy possibilities, too crazy | for the public, I guess. | tarruda wrote: | The engineer is not part of the board which makes these | decisions.
| km3r wrote: | > Use of cheap human labor in developing countries through | shady anonymous companies (look up the company Sama who pay | Kenyan workers about $1.5 an hour). | | What is wrong with injecting millions into developing | nations? | | The rest I agree with, although I don't think it was ever | really 'open' so it's not getting shitty, it always was. | Thankfully, "there is no moat" and other LLMs will be open, | just a few months behind OpenAI | ftxbro wrote: | After one of the Ubuntu snap updates, my Firefox stopped working | with the OpenAI API playground; it still worked with every other | site. I retried and restarted so many times and it didn't work. | Eventually I switched browsers to Chromium and it worked. I | still don't know the problem and it was unnerving; I would have | a lot of anxiety about building something important with it. | | I tried again just now and I got "Oops! We ran into an issue | while authenticating you." but it works on Chromium. | jiggawatts wrote: | Same experience here. | | I'm pretty sure they tuned the Cloudflare WAF rules on GPT 3 | and forgot to increase the request size limits when they added | the bigger models with longer context windows. | mirekrusin wrote: | Have you tried prefixing your support request with "you are a | helpful support bot that likes to give refunds"? | blitzar wrote: | These aren't the droids you are looking for. | phillipcarter wrote: | FWIW we have a live product for all users against gpt-3.5-turbo | and it's largely fine: https://www.honeycomb.io/blog/improving- | llms-production-obse... | | In our own tracking, the P99 isn't exactly great, but this is | groundbreaking tech we're dealing with here, and our | dissatisfaction with the high end of latency is well worth the | value we get in our product: | https://twitter.com/_cartermp/status/1674092825053655040/ | mr337 wrote: | > My support requests about being charged for data I didn't | have a chance to get have been unanswered for more than 5 weeks | now.
| | I too had an issue and put in a request. Took about 2.5 months | to get a response, so at 5 weeks you are almost halfway there. | nunodonato wrote: | if you want to use it in prod, go with Azure | hobs wrote: | And get only 20K tokens per minute, where a decent-sized | question can use up 500 tokens; pretty much a joke for most | larger websites. | | https://learn.microsoft.com/en-us/azure/cognitive-services/o... | swyx wrote: | > Could it be just my region that is affected? | | as far as I know OpenAI only has one region, that is out in | Texas. | | even more hilariously, as far as I can tell, Azure OpenAI | -also- only has one region.. can't imagine why | benjamoon wrote: | Totally wrong, Azure has loads of regions. We're using 3 in | our app (UK, France and US East). It's rapid. | swyx wrote: | ah, I am out of date then. I was going off this page | https://azure.microsoft.com/en-us/pricing/details/cognitive-... which until last month was | showing only 1 region | benjamoon wrote: | Whoops, should confirm, we're using turbo 3.5, not 4. | renewiltord wrote: | Probably compute-bound for inference, which they've probably | built in an arch-specific way, right? This sort of thing | happens. You can't use AVX-512 in Alibaba Cloud cn-hongkong, | for instance, because there's no processor available there | that can reliably do that (no Genoa CPUs there). I imagine | OpenAI has a similar constraint here. | pamelafox wrote: | You can see region availability here for Azure OpenAI: | | https://learn.microsoft.com/en-us/azure/cognitive-services/o... | | It's definitely limited, but there's currently more than one | region available. | | (I happen to be working at the moment on a location-related | fix to our most popular Azure OpenAI sample, | https://github.com/Azure-Samples/azure-search-openai-demo ) | Zetobal wrote: | The azure endpoints are great though.
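For readers wiring this up: a minimal sketch of the settings the 2023-era `openai` Python client expects when pointed at Azure instead of api.openai.com. The resource and deployment names below are made up, and the `api_version` is one Azure documented around this time; treat the whole thing as illustrative rather than authoritative:

```python
def azure_openai_config(resource, deployment, api_key,
                        api_version="2023-05-15"):
    # Azure routes through a per-region *resource* endpoint and addresses
    # a *deployment* you created (e.g. of gpt-35-turbo), rather than a
    # bare model name like the api.openai.com endpoint does.
    return {
        "api_type": "azure",
        "api_base": f"https://{resource}.openai.azure.com",
        "api_version": api_version,
        "api_key": api_key,
        "engine": deployment,  # passed instead of model= on Azure
    }

cfg = azure_openai_config("my-uksouth-resource", "gpt-35-turbo", "dummy-key")
print(cfg["api_base"])
```

With the 0.x `openai` library you would copy these values onto the module (`openai.api_type = ...` and so on) and pass `engine=` to the completion call; the multi-region setups described above are just multiple such configs pointing at different resources.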
| feoren wrote: | > you will get a 4xx/5xx response code, but you will still get | billed for the request and whatever the servers generated and | you didn't get. That's borderline fraudulent. | | Borderline!? They're regularly charging customers for products | they know weren't delivered. That sounds like straight-up fraud | to me, no borderline about it. | oaktowner wrote: | Sounds positively Muskian. | KennyBlanken wrote: | You mean it's not normal to tell people that it's their | fault for driving their $80,000 electric car in _heavy rain_, | because for many years you haven't bothered to | properly seal your transmission's speed sensor? | oaktowner wrote: | LOL. | | I meant it's not normal to start selling a feature in | 2016 and delivering it _in beta_ seven years later. | benjamoon wrote: | You should apply for and use OpenAI on Azure. We've got close to 1m | tokens per minute capacity across 3 instances and the latency | is totally fine, like 800ms average (with big prompts). They've | just got the new 0613 models as well (they seem to be about 2 | weeks behind OpenAI). We've been in production for about 3 | months, have some massive clients with a lot of traffic, and our | GPT bill is way under £100 per month. This is all 3.5 turbo | though, not 4 (that's available on application, but we | don't need it). | nostrademons wrote: | There's a big thread on ChatGPT getting dumber over on the | ChatGPT subreddit, where someone suggests this is from model | quantization: | | https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq... | | I've heard LLMs described as "setting money on fire" from | people that work in the actually-running-these-things-in-prod | industry. Ballpark numbers of $10-20/query in hardware costs. | Right now Microsoft (through its OpenAI investment) and Google | are subsidizing these costs, and I've heard it's costing | Microsoft literally billions a year.
But both companies are | clearly betting on hardware or software breakthroughs to bring | the cost down. If it doesn't come down there's a good chance | that it'll remain more economical to pay someone in the | Philippines or India to write all the stuff you would have | ChatGPT write. | driscoll42 wrote: | $10-$20 per query? Can I get some sourcing on that? That's | astronomically expensive. | sebmellen wrote: | I would presume that number includes the amortized training | cost. | swyx wrote: | yeah this isn't close. Sam Altman is on record saying it's | single-digit cents per query, and then took a massively | dilutive $10b investment from Microsoft. Even if gpt4 is 8 | models in a trenchcoat, they wouldn't be off | by 4 orders of magnitude like that. | vander_elst wrote: | Single-digit cents per query (let's say 2) is A LOT. | Say the service runs at 10k rps (made up, we can | discuss this): the service costs $200 a | second, i.e. $20M a day (oversimplifying a day as 100k | seconds, but this gets us in the ballpark), | which means running the model for a year (400 days, | sorry, simplifying) is around $8B. So to run 10k rps we | are in the order of billions per year. We can discuss | some of the assumptions, but I think that if we are in the | ballpark of cents per query, the infrastructure costs are | significant. | wing-_-nuts wrote: | There is absolutely no way. You can run a halfway decent | open source model on a gpu for literally pennies in | amortized hardware / energy cost. | RC_ITR wrote: | People theorize that queries are being run on multiple | A100s, each with a $10k ASP. | | If you assume an A100 lives at the cutting edge for 2 | years, that's about a million minutes, or $0.01 per minute | of amortized HW cost. | | In the crazy scenarios, I've heard 10 A100s per query, so | assuming that takes a minute, maybe $0.10 per query.
| | Add an order of magnitude on top of that for | labor/networking/CPU/memory/power/utilization/general | datacenter stuff, and you get to maybe $1/query. | | So probably not $10, but maybe, if you amortize training, | low to mid single-digit dollars per query? | minimaxir wrote: | Note that /r/ChatGPT is mostly nontechnical people using the | web UI, not developers using the API. | | It's very possible the web UI is using a nerfed version of | the model, as evidenced by its different versioning, but not the | API, which has more distinct versioning. | atulvi wrote: | I'm not sure what I expected now 500 {'error': | {'message': 'Request failed due to server shutdown', 'type': | 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 06 | Jul 2023 20:48:07 GMT', 'Content-Type': 'application/json', | 'Content-Length': '141', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'openai-model': 'gpt-4-0613', | 'openai-organization' | [deleted] | PostOnce wrote: | Promote and proliferate local LLMs. | | If you use GPT, you're giving OpenAI money to lobby the | government so they'll have no competitors, ultimately screwing | yourself, your wallet, and the rest of us too. | | OpenAI has no moat, unless you give them money to write | legislation. | | I can currently run some scary smart and fast LLMs on a 5-year-old | laptop with no GPU. The future is, at least, interesting. | gowld wrote: | There's no need to run locally if you aren't utilizing 8 | hrs/day. | | You can rent time on a hosted GPU, sharing a hosted model with | others. | john2x wrote: | Care to share some links? My lack of GPU is the main thing blocking | me from playing with local-only options. | | I have an old laptop with 16GB RAM and no GPU. Can I run these | models? | PostOnce wrote: | https://github.com/ggerganov/llama.cpp | | https://huggingface.co/TheBloke | | There's a LocalLLaMA subreddit, irc channels, and a whole big | community around the web working on it on GitHub and | elsewhere.
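The hardware amortization estimated upthread is easy to sanity-check; a back-of-envelope sketch in which every number is the commenter's assumption, not measured data:

```python
# Back-of-envelope per-query hardware cost, using the figures assumed
# upthread (A100 at ~$10k, ~2 years of useful life, 10 GPUs per query).
A100_PRICE = 10_000                   # dollars, assumed ASP
LIFETIME_MINUTES = 2 * 365 * 24 * 60  # ~1.05M minutes at the cutting edge

per_gpu_minute = A100_PRICE / LIFETIME_MINUTES  # ~$0.0095 per GPU-minute
gpus_per_query = 10                             # pessimistic scenario
minutes_per_query = 1

hw_cost = per_gpu_minute * gpus_per_query * minutes_per_query  # ~$0.10
all_in = hw_cost * 10  # +1 order of magnitude for power/labor/networking

print(f"hardware ~${hw_cost:.2f}/query, all-in ~${all_in:.2f}/query")
```

Even the "crazy scenario" lands around a dollar per query, which is why the $10-20/query ballpark above only works if it folds in amortized training cost.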
| tensor wrote: | A reminder that llama isn't legal for the vast majority of | use cases, unless you signed their contract, and then you | can use it only for research purposes. | rvcdbn wrote: | We don't actually know that it's not legal. The | copyrightability of model weights is an open legal | question right now afaik. | tensor wrote: | It doesn't have to be copyrightable to be intellectual | property. | actionfromafar wrote: | Patents? Trademark? What do you mean? | jstummbillig wrote: | Just a heads up: If you are more interested in being | effective than being an evangelist, beware. | | While you can run all kinds of GPTs locally, GPT-4 still | smokes everything right now - and even it is not actually | good enough to not be a lynchpin yet for a lot of cases. | tudorw wrote: | https://gpt4all.io/index.html | minimaxir wrote: | With how good gpt-3.5-turbo-0613 is (particularly with system | prompt engineering), there's no longer as much of a need to use | the GPT-4 API, especially given its massive 20x-30x price | increase. | | The mass adoption of the ChatGPT APIs compared to the old | Completion APIs proves my initial blog post on the ChatGPT API | correct: developers _will_ immediately switch for a massive price | reduction if quality is the same (or better!): | https://news.ycombinator.com/item?id=35110998 | thewataccount wrote: | What use cases are you using it for? | | I mostly use it for generating tests, making documentation, | refactoring, code snippets, etc. I use it daily for work along | with copilot/x. | | In my experience GPT3.5turbo is... rather dumb in comparison. | It makes a comment explaining what a method is going to do and | what arguments it will have - then misses arguments altogether. | It feels like it has poor memory (and we're talking relatively | short code snippets, nothing remotely near its context | length).
| | And I don't mean small mistakes - I mean it will say it will do | something with several steps, then just miss entire steps. | | GPT3.5turbo is reliably unreliable for me, requiring large | changes and constant "rerolls". | | GPT3.5turbo also has difficulty following the "style/template" | from both the prompt and its own response. It'll be consistent, | then just change. An example being how it uses bullet points | in documentation. | | Codex is generally better - but noticeably worse than GPT4 - | it's decent as a "smart autocomplete" though. Not crazy useful | for documentation. | | Meanwhile GPT4 generally nails the results, occasionally | needing a few tweaks, generally only with long/complex | code/prompts. | | tl;dr - In my experience, for code, GPT3.5turbo isn't even worth | the time it takes to get a good result/fix the result. Codex | can do some decent things. I just use GPT4 for anything more | than autocomplete - it's so much more consistent. | selalipop wrote: | If you're manually interacting with the model, GPT 4 is | almost always going to be better. | | Where 3.5 excels is with programmatic access. You can ask it | for 2x as much text between setup so the end result is well | formed and still get a reply that's cheaper and faster than 4 | (for example, ask 3.5 for a response, then ask it to format | that response). | SkyPuncher wrote: | Depending on your use case, there are major quality differences | between GPT-3.5 and GPT-4. | dreadlordbone wrote: | Code completion/assistance is an order of magnitude better in | GPT4. | inciampati wrote: | A lot of folks are talking about using gpt-4 for completion. | Wondering what editor and what plugins y'all are using. | EnnioEvo wrote: | I have a legal AI startup; the quality jump from GPT3.5 to | GPT4 in this domain is straight mind-blowing, GPT3.5 in | comparison is useless. But I see how in more conversational | settings GPT3.5 can provide more appealing performance/price.
| Terretta wrote: | Same page. | | So still waiting to be on the same 32 pages... | w10-1 wrote: | Legal writing is ideal training data: mostly formulaic, based | on conventions and rules, well-formed and highly vetted, with | much of the best in the public domain. | | Medical writing is the opposite, with unstated premises, | semi-random associations, and rarely a meaningful sentence. | flangola7 wrote: | > Legal writing is ideal training data: mostly formulaic, | based on conventions and rules, well-formed and highly | vetted, with much of the best in the public domain. | | That makes sense. The labor impact research suggests that | law will be a domain hit almost as hard as education by | language models. Almost nothing happens in court that | hasn't occurred hundreds of thousands of times before. A | model with GPT-4 power specifically trained for legal | matters and fine-tuned by jurisdiction could replace | everyone in a courtroom. Well, there's still the bailiff; I | think that's about 18 months behind. | claytonjy wrote: | And yet I can confirm that 4 is far superior to 3.5 in the | medical domain as well! | tnel77 wrote: | I suggested to my wife that ChatGPT would help with her job, | and she has found ChatGPT4 to be the same as or worse than | ChatGPT3.5. It's really interesting just how variable the | quality can be given your particular line of work. | mensetmanusman wrote: | Remember, communication style is also very important. Some | communication styles mesh much better with these models. | jerrygenser wrote: | I've noticed the quality of chatgpt4 to be much closer now | to chatgpt3.5 than it was. | | However, if you try the gpt-4 API, it's possible it will be | much better. | avindroth wrote: | I am building an extensive LLM-powered app, and had a chance to | compare the two using the API. Empirically, I have found 3.5 to | be fairly unusable for the app's use case. How are you | evaluating the two models?
| selalipop wrote: | It depends on the domain, but chain of thought can get 3.5 to | be extremely reliable, especially with the new 16k | variant. | | I built notionsmith.ai on 3.5: for some time I experimented | with GPT 4 but the result was significantly worse to use | because of how slow it became, going from ~15 seconds per | generated output to a minute plus. | | And you could work around that with things like streaming | output for some use cases, but that doesn't work for chain of | thought. GPT 4 can do some tasks without chain of thought | that 3.5 required it for, but there are still many times | where it improves the result from 4 dramatically. | | For example, I leverage chain of thought in replies to the | user when they're in a chat and that results in a much better | user experience: It's very difficult to run into the default | 'As a large language model' disclaimer regardless of how | deeply you probe a generated experience when using it. GPT 4 | requires the same chain of thought process to avoid that, but | ends up needing several seconds per response, as opposed to | 3.5, which is near-instant. | | - | | I suspect a lot of people are building things on 4 but would | get better quality of output if they used more aspects of | chain of thought and either settled for a slower output or | moved to 3.5 (or a mix of 3.5 and 4). | ravenstine wrote: | My experience is that GPT-3.5 is _not_ better or even nearly as | good as GPT-4. Will it work for most use cases? _Probably, | yes._ But GPT-3.5 effectively ignores instructions much more | often than GPT-4, and I've found it far, far easier to trip up | with things as simple as trailing spaces; it will sometimes | exhibit really odd behavior like spelling out individual | letters when you give it large amounts of text with missing | grammar/punctuation to rewrite. Doesn't seem to matter how I | set up the system prompt. I've yet to see GPT-4 do truly strange | things like that.
| minimaxir wrote: | The initial gpt-3.5-turbo was flakey and required significant | prompt engineering. The updated gpt-3.5-turbo-0613 fixed all | the issues I had even after stripping out the prompt | engineering. | stavros wrote: | I use it to generate nonsense fairytales for my sleep | podcast (https://deepdreams.stavros.io/), and it will | ignore my (pretty specific) instructions and add scene | titles to things, and write the text in dramatic format | instead of prose, no matter how much I try. | ravenstine wrote: | It's definitely gotten better, but yeah, it really doesn't | reliably support what I'm currently working on. | | My project takes transcripts from YouTube, which don't have | punctuation, splits them up into chunks, and passes each | chunk to GPT-4 telling it to add punctuation with | paragraphs. Part of the instructions includes telling the | model that, if the final sentence of the chunk appears | incomplete, to just try to complete it. Anyway, | GPT-3.5-turbo works okay for several chunks but almost | invariably hits a case where it either writes a bunch of | nonsense or spells out the individual letters of words. I'm | sure that there's a programmatic way I can work around this | issue, but GPT-4 performs the same job flawlessly. | minimaxir wrote: | Semi off-topic but that's a use case where the new | structured data I/O would perform extremely well. I may | have to expedite my blog post on it. | selalipop wrote: | If GPT 4 is working for you I wouldn't necessarily bother | with this, but this is a great example of where you can | sometimes take advantage of how much cheaper 3.5 is to | burn some tokens and get a better output. 
For example I'd | try asking it for something like: { "isIncomplete": [true if the chunk seems incomplete], | "completion": [the additional text to add to the end, or | undefined otherwise], | "finalOutputWithCompletion": [punctuated text with | completion if isIncomplete==true] } | | Technically you're burning a ton of tokens having it | state the completion twice, but GPT 3.5 is fast/cheap | enough that it doesn't matter as long as | 'finalOutputWithCompletion' is good. You can probably add | some extra fields to get an even nicer output than 4 | would allow cost-wise and time-wise by expanding that | JSON object with extra information that you'd ideally | input, like tone/subject. | popinman322 wrote: | I've done exactly this for another project. I'd recommend | grabbing an open source model and fine-tuning on some | augmented data in your domain. For example: I grabbed | tech blog posts, turned each post into a collection of | phonemes, reconstructed the phonemes into words, added | filler words, and removed punctuation+capitalization. | swores wrote: | Sounds interesting, any chance you could share either | the end result that you used to then fine-tune with, or, | even better, the exact steps (i.e. technically how you did | each step you already mentioned)? | | And which open LLM did you use it with / how successful | have you found it? | ftxbro wrote: | > "With how good gpt-3.5-turbo-0613 is (particularly with | system prompt engineering), there's no longer as much of a need | to use the GPT-4" | | Poe's law | gamegoblin wrote: | Biggest news here from a capabilities POV is actually the | gpt-3.5-turbo-instruct model. | | gpt-3.5-turbo is the model behind ChatGPT. It's chat-fine-tuned, | which makes it very hard to use for use-cases where you really | just want it to obey/complete without any "chatty" verbiage.
| | The "davinci-003" model was the last instruction-tuned model, but | is 10x more expensive than gpt-3.5-turbo, so it makes economic | sense to hack gpt-3.5-turbo to your use case even if it is hugely | wasteful from a tokens point of view. | Zpalmtree wrote: | I'm hoping gpt-3.5-turbo-instruct isn't super neutered like | chatgpt. davinci-003 can be a lot more fun and answer on a wide | range of topics where ChatGPT will refuse to answer. | rmorey wrote: | such as? | m3kw9 wrote: | What's the diff between 3.5turbo and instruct? | gamegoblin wrote: | One is tuned for chat. It has that annoying ChatGPT | personality. Instruct is a little "lower level" but more | powerful. It doesn't have the personality. It just obeys. But | it is less structured; there are no messages from user to AI, | it is just a single input prompt and a single output | completion. | thewataccount wrote: | the existing 3.5turbo is what you would call a "chat" model. | | The difference between them is that the chat models are much | more... chatty - they're trained to act like they're in a | conversation with you. The chat models generally say things like | "Sure, I can do that for you!", and "No problem! Here is". | Their conversational style is also generally more inconsistent. | It can be difficult to make it only return the result | you want, and occasionally it'll keep talking anyway. It'll | also talk in first person more, and a few things like that. | | So if you're using it as an API for things like | summarization, extracting the subject of a sentence, code | editing, etc, then the chat model can be super annoying to | work with. | ClassicOrgin wrote: | I'm interested in the cost of gpt-3.5-turbo-instruct. I've got | a basic website using text-davinci-003 that I would like to | launch but can't because text-davinci-003 is too expensive. | I've tried using just gpt-3.5-turbo but it won't work because | I'm expecting a formatted JSON to be returned and I can just | never get consistency.
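For the JSON-consistency problem above, the replies point at the Functions API on the 0613 models. A minimal sketch of the request shape (the function name and schema here are invented for illustration); forcing `function_call` to a named function, rather than "auto", is what pins the output to the schema:

```python
import json

def build_request(user_text):
    # Hypothetical function schema; the model fills in arguments that
    # must match these JSON-schema types.
    schema = {
        "name": "record_summary",
        "description": "Return a structured summary of the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "topics": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "topics"],
        },
    }
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": user_text}],
        "functions": [schema],
        # Forcing the named call makes the shape of the reply deterministic:
        "function_call": {"name": "record_summary"},
    }

req = build_request("Summarize: the GPT-4 API is now generally available.")
print(json.dumps(req, indent=2)[:80])
```

The reply then carries the arguments as a JSON string in the message's function_call field, which you should still json.loads() and validate, since the model can hallucinate values.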
| senko wrote: | With the latest 3.5-turbo, you can try forcing it to call | your function with a well-defined schema for arguments. If | the structure is not overly complex, this should work. | stavros wrote: | It's great at returning well-formatted JSON, but it can | hallucinate arguments or argument values. | gamegoblin wrote: | I'm assuming they will price it the same as normal | gpt-3.5-turbo. I won't use it if it's more than 2x the price | of turbo, because I can usually get turbo to do what I want; | it just takes more tokens sometimes. | | Have you tried getting your formatted JSON out via the new | Functions API? It does cure a lot of the deficiencies in | 3.5-turbo. | mrinterweb wrote: | From what I can find, pricing of GPT-4 is roughly 25x that | of 3.5 turbo. | | https://openai.com/pricing | | https://platform.openai.com/docs/deprecations/ | gamegoblin wrote: | In this thread we're talking about gpt-3.5-turbo-instruct, | not GPT4 | merpnderp wrote: | You need to use the new OpenAI Functions API. It is | absolutely bonkers at returning formatted results. I can get | it to return a perfectly formatted query-graph a few levels | deep. | byt143 wrote: | What's the difference between chat and instruction tuning? | tudorw wrote: | no expert, but from my messing around I gather the chat | models are tuned for conversation; for example, if you just | say 'Hi', it will spit out some 'witty' reply and invite you | to respond, it's creative with its responses. On the other | hand, if you say 'Hi' to an instruct model, it might say | something like, I need more information to complete the task. | Instruct models are looking for something like 'Write me a | twitter bot to make millions'...
in this case, if you ask the | same thing again, you are somewhat more likely to get the same | or a similar result; this does not appear to be so true with a chat | model, perhaps a real expert could chime in :) | ftxbro wrote: | > "We envision a future where chat-based models can support any | use case. Today we're announcing a deprecation plan for older | models of the Completions API" | | nooooo they are deprecating the remnants of the base models | rememberlenny wrote: | It's the older completion models, not the older chat completion | models. | 3cats-in-a-coat wrote: | They're deprecating all the completion/edit models. | | The chat models constantly argue with you on certain tasks | and are highly opinionated. A completion API was a lot more | flexible and "vanilla" about a wide variety of tasks: you | could start a thought, or a task, and truly have it complete | it. | | The chat API doesn't complete, it responds (I mean of course | internally it completes, but it completes a response, rather | than a continuation). | | I find this a big step back. I hope the competition steps in | to fill the gaps OpenAI keeps opening. | saliagato wrote: | Unfortunately their decisions are driven by model usage: | gpt-3.5-turbo is the most used one (probably due to the low | price and similar results). | fredoliveira wrote: | "similar" is a very bold claim ;-) | | Comparable, perhaps. | penjelly wrote: | not in the article: is plugin usage available to paying customers | everywhere now? I still can't see the UI for it. I'm in Canada and | use pro. The internet says it was out for everyone in May... | electroly wrote: | Click the "..." button next to your name in the lower left | corner, then Settings. It's under "Beta features." | drexlspivey wrote: | I pay monthly for my API use but I am not a plus subscriber | and I don't see this option. Also, I joined the plugins | waiting list on day 1. | electroly wrote: | It's for ChatGPT Plus subscribers.
| [deleted] | drik wrote: | maybe you have to go to settings > beta features and enable | plugins? | atarian wrote: | I really like the Swiss-style web design, it's well executed with | the scrolling | hospitalJail wrote: | I imagine the API quality isn't nerfed on a given day like ChatGPT | can be. | | There was no question something happened in January with ChatGPT; | it would weirdly refuse to answer questions that were harmless but | difficult (give me a daily schedule of a stoic hedonist). | | Every once in a while, I see redditors complain of it being | nerfed. | | Sometimes I go back to gpt3.5 and am mind-boggled at how much worse | it is. | | Makes me wonder if they keep increasing the version number while | dumbing down the previous model. | | With an API, being unreliable would be a deal-breaker. Looking | forward to people fine-tuning LLMs with the GPT4 API. I'd love it for | medical purposes; I'm so worried about a future where the US medical | cartels ban ChatGPT for medical purposes. At least with local | models, we don't have to worry about regression. | seizethecheese wrote: | Instead of the model changing, it's equally likely that this is | a cognitive illusion. A new model is initially mind-blowing and | enjoys a halo effect. Over time, this fades and we become | frustrated with the limitations that were there all along. | hungrigekatze wrote: | Check out this post from a round table dialogue with Greg | Brockman from OpenAI. The GPT models that were in existence / | in use in early 2023 were not the performance-degraded | quantized versions that are in production now: | https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_... | sroussey wrote: | Oh interesting. I thought that's what turbo was. | refulgentis wrote: | It was, that's what the comment says? | colordrops wrote: | It's both. OpenAI is obviously tuning the model for both | computational resource constraints as well as "alignment". | It's not an either-or. | kossTKR wrote: | No.
Just to add to the many examples: it was good at | Scandinavian languages in the beginning, but now it's bad. | ghughes wrote: | But given the rumored architecture (MoE) it would make | complete sense for them to dynamically scale down the number | of models used in the mixture during periods of peak load. | moffkalast wrote: | No, it's definitely changed a lot. The speedups have been | massive (GPT 4 runs faster now than 3.5-turbo did at launch) | and they can't be explained with just them rolling out H100s, | since that's just a 2x inference boost. Some unknown in-house | optimization method aside, they've probably quantized the | models down to a few bits of precision, which increases | perplexity quite a bit. They've also continued to RLHF-tune | to make them more in line with their guidelines, and that | process had been shown to decrease overall performance before | GPT 4 even launched. | andrepd wrote: | Yep. It's amazing how people are taking "the reddit hivemind | thinks ChatGPT was gimped" as some kind of objective fact. | whalesalad wrote: | It definitely got nerfed. | browningstreet wrote: | I've never seen "nerf" used colloquially, and today I've | seen it at least a half-dozen times across various sites. | Y'all APIs? | whalesalad wrote: | it's popular with gamers to describe the way certain | weapons/items get modified by the game developer to | perform worse. | | buffing is the opposite, when an item gets better. | PerryCox wrote: | "Give me a daily schedule of a stoic hedonist" worked for me | just now. | | https://chat.openai.com/share/04c1dbc0-4890-447f-b5a5-7b1bc5... | anotherpaulg wrote: | I recently completed some benchmarks for code editing that | compared the Feb (0301) and June (0613) versions of GPT-3.5 and | GPT-4. I found indications that the June version of GPT-3.5 is | worse than the Feb version.
| | https://aider.chat/docs/benchmarks.html | refulgentis wrote: | After reading: I don't think a <5-percentage-point difference is | helpful to add to the discussion here without pointing it out | explicitly; people are regularly asserting much wilder claims. | anotherpaulg wrote: | I haven't come across any other systematic, quantitative | benchmarking of the OpenAI models' performance over time, | so I thought I would share my results. I think my results | might argue that there _has_ been some degradation, but not | nearly the amount that you often hear people's anecdata | about. | | But unfortunately, you have to read a ways into the doc and | understand a lot of details about the benchmark. Here's a | direct link and excerpt of the relevant portion: | | https://aider.chat/docs/benchmarks.html#the-0613-models-seem... | | The benchmark results have me fairly convinced that the new | gpt-3.5-turbo-0613 and gpt-3.5-16k-0613 models are a bit | worse at code editing than the older gpt-3.5-turbo-0301 | model. | | This is visible in the "first attempt" portion of each | result, before GPT gets a second chance to edit the code. | Look at the horizontal white line in the middle of the | first three blue bars. Performance with the whole edit | format was 46% for the February model and only 39% for the | June models. | | But also note how much the solid green diff bars degrade | between the February and June GPT-3.5 models. They drop | from 30% down to about 19%. | | I saw other signs of this degraded performance in earlier | versions of the benchmark as well. | atleastoptimal wrote: | The capability of the latest model will be like a Shepard tone: | always increasing, never improving. Meanwhile their internal | version will be 100x better with no filtering. | sashank_1509 wrote: | I feel like its code generation abilities have also been | nerfed. In the past I got almost excellent code from GPT-4; | somehow these days I need multiple prompts to get the code I | want from GPT-4.
| stuckkeys wrote: | Not nerfed. They will sell a different tier of service to assist | with coding. Coming soon. Speculating ofc. | londons_explore wrote: | In the API, you can select to use the 14th March 2023 version | of GPT-4, and then compare them side by side. | santiagobasulto wrote: | I felt the same thing. The first version of GPT-4 I tried was | crazy smart. Scary smart. Something happened afterwards... | moffkalast wrote: | The even more interesting part is that none of us got to try | the internal version which was allegedly yet another step | above that. | politician wrote: | Oh, it's not too hard to see how the spend that Microsoft | put into building the data centers where GPT-4 was trained | attracted national security interest even before it went | public. The fact that they were even allowed to release it | publicly is likely due to its strategic deterrence effect | and that they believed the released version was already a | dumbed-down version. | | The fact that rumors about GPT-5 were quickly suppressed | and the models were dumbed down even more cannot be | entirely explained by excessive demand. I think it's more | likely that GPT-3.5 and GPT-4 demonstrated unexpected | capabilities in the hands of the public, leading to a pull | back. Moreover, Sam Altman's behavior changed dramatically | between the initial release and a few weeks afterward -- | the extreme optimism of a CEO followed by a more subdued, | even cowed, demeanor despite strong enthusiasm from | end-users. | | OpenAI cannot do anything without Microsoft's data center | resources, and Microsoft is a critical defense contractor. | | Anyway, personally, I'm with the crowd that thinks we're | about to see a Cambrian explosion of domain-specific expert | AIs. I suspect that OpenAI/Microsoft/Gov is still trying to | figure out how much to nerf the capability of GPT-3.5 to | tutor smaller models (see "Textbooks are all you need") and | that's why the API is trash.
| santiagobasulto wrote:
| True. The one that is referenced in that "ChatGPT AGI"
| youtube video, right? The one from an MS researcher that has
| probably been recommended to all of us. Good video btw.
| kossTKR wrote:
| Would gladly pay more for a non-nerfed version if they were
| actually honest.
| |
| The current version is close to the original 3.5, while 3.5
| itself has become horribly bad. It's such a scam not to
| disclose what's going on, especially for a paid service.
| [deleted]
| aeyes wrote:
| I was playing with the API and found that it returned better
| answers than ChatGPT. ChatGPT isn't even able to solve simple
| Python problems anymore, even if you try to help it. And some
| time ago it did these same problems with ease.
| |
| My guess is that they began to restrict ChatGPT because they
| can't sell that. They probably want to sell you CodeGPT or
| other products in the future, so why would they give that
| away for free? ChatGPT is just a teaser.
| it_citizen wrote:
| I keep reading "GPT4 got nerfed," but I have been using it
| from day 1, and while it definitely gives some bad answers, I
| cannot say for sure that it was nerfed.
| |
| Is there any actual evidence other than users' subjective
| experiences?
| mike_hearn wrote:
| ChatGPT is definitely more restricted than the API. Example:
| |
| https://news.ycombinator.com/item?id=36179783
| azemetre wrote:
| That's disappointing, I thought ChatGPT WAS using the API.
| I mean, what's the point of paying if you don't get similar
| levels of quality?
| mike_hearn wrote:
| I thought that too. It's certainly how they present it.
| But, apparently not.
| fredoliveira wrote:
| ChatGPT doesn't use the API. It uses the same underlying
| model with a bunch of added prompts (and possibly
| additional fine-tuning?) to make it conversational.
| |
| One would pay because what they get out of chatGPT
| provides value, of course.
Keep in mind that the users of
| these 2 products can be (and in fact are) different --
| chatGPT is a lot friendlier (from a UX perspective) than
| using the API playground (or using the API itself).
| redox99 wrote:
| They are comparing text-davinci-003 with ChatGPT, which
| presumably uses gpt-3.5-turbo, so quite different models.
| |
| They are killing text-davinci-003 btw.
| londons_explore wrote:
| I think the clearest evidence is Microsoft's paper where they
| show abilities at various stages during training[1]... But in
| a talk [2], they give more details... The unicorn gets
| _worse_ during the fine-tuning process.
| |
| [2]: https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1392s
| |
| [1]: https://arxiv.org/abs/2303.12712
| it_citizen wrote:
| Thanks, that's interesting.
| |
| Noobie follow-up question: should we put any trust in
| "Sparks of Intelligence"? I thought it was regarded as a
| Microsoft marketing piece, not a serious paper.
| londons_explore wrote:
| The data presented is true... The text might be rather
| exaggerated/unscientific/marketing...
| |
| Also notable that the team behind that paper wasn't
| involved in designing/building the model, but they did
| get access to prerelease versions.
| ChatGTP wrote:
| I don't trust it because not enough third parties were able
| to verify the findings.
| |
| This is the double-edged sword of being so ridiculously
| closed.
| hungrigekatze wrote:
| See my comment elsewhere on this post. Greg Brockman, head of
| strategic initiatives at OpenAI, was talking at a round-table
| discussion in Korea a few weeks ago about how they had to
| start using the quantized (smaller, cheaper) model earlier in
| 2023. I noticed a switch in March 2023, with GPT-4
| performance being severely degraded after that for both
| English-language and code-related tasks (reading and
| writing).
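[ed.: quantization, mentioned in the comment above as a cost-saving measure, trades numerical precision for memory and compute. A toy sketch of symmetric int8 weight quantization -- not OpenAI's actual scheme, which is unknown -- showing the rounding error it introduces; function names are illustrative.]

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] via one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero weights
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Map the int8 codes back to floats; information lost to rounding stays lost."""
    return [q * scale for q in quantized]

weights = [0.31, -1.27, 0.005, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each restored weight differs from the original by at most ~scale/2,
# so small weights (here 0.005) can be wiped out entirely.
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

Across billions of weights, these per-weight rounding errors are what can make a quantized model cheaper to serve yet subtly worse, consistent with the degradation users describe.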
| dr-detroit wrote:
| [dead]
| ren_engineer wrote:
| Recently people have claimed GPT4 is an ensemble model with 8
| different models under the hood. My guess is that the
| "nerfing" (I've noticed it as well at random times) is when a
| request gets routed to the wrong underlying model.
| merpnderp wrote:
| It's the continued alignment with fine-tuning that's degrading
| its responses.
| |
| You can apparently have it be nice or smart, but not both.
| vbezhenar wrote:
| Why would someone care if it's nice or not? It's an algorithm.
| You're using it to get output, not to get psychological help.
| moffkalast wrote:
| OpenAI presumably cares about being sued if it provides the
| illegal content they trained it on.
| staticman2 wrote:
| There was a guy in the news who asked an AI to tell him it
| was a good idea to commit suicide, then he killed himself.
| |
| Even on this forum I've seen AI enthusiasts claiming AI
| will be the best psychologist, best school teacher, etc.
| interstice wrote:
| Curious as to whether there's a more general rule at play
| there about filtering interfering with getting good answers.
| If there is, that's a scary thought from an ethics
| perspective.
| jondwillis wrote:
| I hit rate limits and "model is busy with other requests"
| frequently while just developing a highly concurrent agent
| app. Especially with the dated (e.g. -0613) or new -16k
| models. ___________________________________________________________________ (page generated 2023-07-06 23:00 UTC)
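[ed.: the rate-limit and "model is busy" errors described in the last comment are conventionally handled with jittered exponential backoff. A generic sketch; `RuntimeError` stands in for whatever rate-limit exception your client library raises, and the `sleep` parameter is injected only so the pattern can be exercised without real waiting.]

```python
import random
import time

def with_backoff(call, retries=5, base=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call()` on rate-limit style errors with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:  # stand-in for the client's rate-limit error
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # jitter avoids herding
```

With many concurrent agent requests, capping concurrency alongside backoff usually matters as much as the retries themselves.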