[HN Gopher] GPT-4 API General Availability ___________________________________________________________________ GPT-4 API General Availability Author : mfiguiere Score : 426 points Date : 2023-07-06 19:03 UTC (3 hours ago) (HTM) web link (openai.com) (TXT) w3m dump (openai.com) | boredemployee wrote: | We need a proper, competitive and open source model. Otherwise we | are all fucked up. | hackerting wrote: | This is awesome news. I have been waiting to get GPT4 forever! | m3kw9 wrote: | "Developers wishing to continue using their fine-tuned models | beyond January 4, 2024 will need to fine-tune replacements atop | the new base GPT-3 models (ada-002, babbage-002, curie-002, | davinci-002), or newer models (gpt-3.5-turbo, gpt-4)." | | So need to pay to fine tune again? | saliagato wrote: | Probably. They will have different prices to finetune too. | ftxbro wrote: | I just want to emphasize in this comment that if you upgrade now | to paid API access, then you won't get GPT-4 API access for like | another month. | superalignment wrote: | With this is the death of any uncensored usage of their models. | Davinci 3 is the most powerful model where you can generate any | content by instructing it via the completions API - chat GPT 3 | models will not obey requests for censored or adult content. | echelon wrote: | A big enough hole presents a wedge for new entrants to get | started. | | OpenAI will never fulfill the entire market, and their moat is | in danger with every other company that has LLM cash flow. | | They want to become the AWS of AI, but it's becoming clear | they'll lose generative multimedia. They may see the LLM space | become a race to the bottom as well. | projectileboy wrote: | Relevant comment thread from people describing how much worse | GPT-4 has gotten lately: | https://www.reddit.com/r/ChatGPT/comments/14ruui2/i_use_chat... | renewiltord wrote: | Has anyone been able to come up with a way to keep track of GPT-4 | performance over time? 
I'm told that the API is explicit about | changes to models and that the Chat interface is not. | crancher wrote: | API call responsiveness to the GPT-4 model varies hugely | throughout the day. The #1 datapoint in measured responsiveness | is slowdown associated with lunch-time use as noon sweeps | around the globe. | renewiltord wrote: | Thank you for the response, I should have been clearer. I | meant performance as an LLM. Essentially, I am concerned that | they are quietly nerfing the tool. The Chat interface is now | very verbose and constantly warning me about "we should | always do this and that" which is bloody exasperating when | I'm just trying to get things done. | | I made up an example here to illustrate, but it's just very | annoying because sometimes it puts them at the beginning, slowing | down my interaction, and it now refuses to obey my prompts to | leave caveats out. | | https://chat.openai.com/share/1f39af02-331d-4901-970f-2f4b0e. | .. | purplecats wrote: | yeah, it's annoying and you have to foot the bill for it. | | looking at your sample and using character count as a rough | proxy for tokens, (465/(1581-465))*100 means they added | ~42% token count cost to your response by explicitly adding | caveats which you don't want. fun! | furyofantares wrote: | Not a lot of talk of Whisper being available here. | | From using voice in the ChatGPT iOS app, I surmise that Whisper | is very good at working out what you've actually said. | | But it's really annoying to have to say my whole bit before | getting any feedback about what it's gonna think I said. Even if | it's getting it right at an impressive rate. | | Given this is how OpenAI themselves use it (say your whole thing | before getting feedback), I don't know that the API is set up to | be able to mitigate that at all, but it would be really nice to | have something closer to the responsiveness of on-device | dictation with the quality of Whisper.
| jxy wrote: | You can run whisper.cpp locally in real time: | https://github.com/ggerganov/whisper.cpp/tree/master/example... | ProllyInfamous wrote: | My M2 Pro (mac mini) will run Whisper much faster than "real | time." | | Pretty crazy stuff -- perfectly understandable translations. | leodriesch wrote: | I'm interested in how the transformer-based speech recognition | from iOS 17 will perform compared to Whisper. I guess it will | work more "real-time" like the current dictation on iOS/macOS, | but I'm unsure as I am not on the beta right now. | RC_ITR wrote: | My guess is the reason that apple invested so heavily in this | [0] is because they are going to train a big transformer in | their datacenter and apply it as an RNN on your phone. | | Superficially, I think this will work very well, but | _slightly_ worse than whisper (with the advantage ofc being | that it's better at real-time transcription). | | [0]https://machinelearning.apple.com/research/attention-free- | tr... | ycombinatornews wrote: | Echoing this - saying the whole text at once in one shot is | very challenging for long batches of text. | | Using built-in text input showed quite good results, since | ChatGPT still understands the ask quite well | michaelmu wrote: | One speculative thought about the purpose of Whisper is that | this will help unlock additional high-quality training data | that's only available in audio/video format. | oth001 wrote: | F | tin7in wrote: | The difference between 4 and 3.5 is really big for creative use | cases. I am running an app with significant traffic and the | retention of users on GPT-4 is much higher. | | Unfortunately it's still too expensive and the completion speed | is not as high as GPT-3.5 but I hope both problems will improve | over time. | brolumir wrote: | Hmm, when I try to change model name to "gpt-4" I get the "The | model: `gpt-4` does not exist" error message. We are an API | developer with a history of successful payments..
is there | anything we need to do on our side to enable this, anyone know? | saliagato wrote: | wait a couple of hours | cube2222 wrote: | This is very nice. | | GPT-4 is on a completely different level of consistency and | actually listening to your system prompt than ChatGPT-3.5. It | trails off much more rarely. | | If only it wasn't so slow/expensive... (it really starts to hurt | with large token counts). | BeefySwain wrote: | Outside of the headline, there is some major stuff hiding in | here: - new gpt-3.5-turbo-instruct model expected "in the coming | weeks" - fine tuning of 3.5 and 4 expected this year | | I am especially interested in gpt-3.5-turbo-instruct, as I think | that the hype surrounding ChatGPT and "conversational LLMs" has | sucked a lot of air out of what is possible with general instruct | models. Being able to fine tune it will be phenomenal as well. | MuffinFlavored wrote: | is there any ETA on when the knowledge cutoff date will be | improved from September, 2021? | | I do not really understand the efforts that went on behind the | scenes to train GPT models on factual data. Did humans have to | hand approve/decline responses to increase its score? | | "America is 49 states" - decline | | "America is 50 states" - approve | | Is this how it worked at a simple overview? Do we know if they | are working on adding the rest of 2021, then 2022, and | eventually 2023? I know it can crawl the web with the Bing | add-on, but it's not the same. | | I asked it about Maya Kowalski the other day. Sure it can | condense a blog post or two, but it's not the same as having | the intricacies as if it actually was trained/knew about the | topic. | asadotzler wrote: | Why is chatGPT on the web a 6-week-old version still?
| alpark3 wrote: | >Developers wishing to continue using their fine-tuned models | beyond January 4, 2024 will need to fine-tune replacements atop | the new base GPT-3 models (ada-002, babbage-002, curie-002, | davinci-002), or newer models (gpt-3.5-turbo, gpt-4). Once this | feature is available later this year, we will give priority | access to GPT-3.5 Turbo and GPT-4 fine-tuning to users who | previously fine-tuned older models. We acknowledge that migrating | off of models that are fine-tuned on your own data is | challenging. We will be providing support to users who previously | fine-tuned models to make this transition as smooth as possible. | | Wait, they're not letting you use your own fine-tuned models | anymore? So anybody who paid for a fine-tuned model is just | forced to repay the training tokens to fine-tune on top of the | new censored models? Maybe I'm misunderstanding it. | meghan_rain wrote: | not your weights, not your bitcoins | fnordpiglet wrote: | If you don't own the weights you don't own anything. This is | why open models are so crucial. I don't understand any business | who is building fine tuned models against closed models. | reaperman wrote: | Right now the closed models are incredibly higher quality | than the open models. They're useful as a stopgap for 1-2 | years in hopes/expectation of open models reaching a point | where they can be swapped in. It burns cash now, but in | exchange you can grab more market share sooner while you're | stuck using the expensive but high quality OpenAI models. | | It's not cost-effective, but it may be part of a valid | business plan. | ronsor wrote: | If you're finetuning your own model, the closed models | being "incredibly higher quality" is probably less | relevant. | claytonjy wrote: | That's how we all want it to work, but the reality today | is that GPT-4 is better at almost anything than a fine- | tuned version of any other model. 
| | It's somewhat rare to have a task and a good enough dataset | that you can finetune something else to be close enough | in quality to GPT-4 for your task. | wongarsu wrote: | Finetuning a better model still yields better results | than finetuning a worse model. | fnordpiglet wrote: | That should be a wake-up call to every corporation pinning | their business on OAI models. My experience thus far is no | one is seeing a need to plan an exit from OAI, and the | perception is "AI is magic and we aren't magicians." There | needs to be a concerted effort to finance and tune high | quality freely available models and tool chains asap. | | That said I think efficiencies will dramatically improve | over the next few years and over-investing now probably | captures very little value beyond building internal | _competency_ - which doesn't grow with anything but time | and practice. The longer you depend on OAI, the longer you | will depend on OAI past your point of profound regret. | r3trohack3r wrote: | > I don't understand any business who is building fine tuned | models against closed models | | Do you have any recommendations for good open models that | businesses could use today? | | From what I've seen in the space, I suspect businesses are | building fine tuned models against closed models because | those are the only viable models to build a business model on | top of. The quality of open models isn't competitive. | yieldcrv wrote: | > I don't understand any business who is building fine tuned | models against closed models. | | Just sell access at a higher price than you get it | | Either directly, or _on average_ based on your user stories | flangola7 wrote: | They address that, OpenAI will cover the cost of re-training on | the new models, and the old models don't discontinue until next | year. | simonw wrote: | Did they say they would cover the cost of fine-tuning again?
I saw them say they would cover the cost of recalculating | embeddings, but I didn't see the bit about fine-tuning costs. | | On fine-tuning: | | > We will be providing support to users who previously fine- | tuned models to make this transition as smooth as possible. | | On embeddings: | | > We will cover the financial cost of users re-embedding | content with these new models. | BoorishBears wrote: | That's because fine-tuning the new models isn't available | yet. | | Based on the language it sounds like they'll do the same | when that launches. | jxy wrote: | They didn't mention gpt-4-32k. Does anybody know if it will be | generally available in the same timeframe? | | There's still no news about the multi-modal gpt-4. I guess the | image input is just too expensive to run or it's actually not as | great as they hyped it. | jacksavage wrote: | > We are not currently granting access to GPT-4-32K API at this | time, but it will be made available at a later date. | | https://help.openai.com/en/articles/7102672-how-can-i-access... | jxy wrote: | Thanks for the link. | | Burying this extra information in a support | article: not cool! | we_never_see_it wrote: | It's funny how OpenAI just shattered Google's PR stunts. Google | wanted everyone to believe they are leading in AI by winning some | children's games. Everyone thought that was the peak of AI. Enter | OpenAI and Microsoft. Microsoft and OpenAI have shown | humanity what true AI looks like. Like most people on HN I cannot | wait to see the end of Google, the end of evil. | LeafItAlone wrote: | Is Microsoft less evil than Google? | rvz wrote: | > Like most people on HN I cannot wait to see the end of | Google, the end of evil. | | What is the difference? Replacing evil with another evil. | | This is just behemoths exchanging hands.
| khazhoux wrote: | In all my GPT-4 API (python) experiments, it takes 15-20 seconds | to get a full response from server, which basically kills every | idea I've tried hacking up because it just runs so slowly. | | Has anyone fared better? I might be doing something wrong but I | can't see what that could possibly be. | jason_zig wrote: | Run it in the background. | | We use it to generate automatic insights from survey data at a | weekly cadence for Zigpoll (https://www.zigpoll.com). This | makes getting an instant response unnecessary but still | provides a lot of value to our customers. | jondwillis wrote: | Streaming. If you're expecting structured data as a response, | request YAML or JSONL so you can progressively parse it. Time | to first byte can be milliseconds instead of 15-20s. Obviously, | this technique can only work for certain things, but I found | that it was possible for everything I tried. | ianhawes wrote: | Anthropic Instant is the best LLM if you're looking for speed. | superkuh wrote: | Yikes. They're actually killing off text-davinci-003. RIP to the | most capable remaining model and RIP to all text completion style | freedom. Now it's censored/aligned chat or instruct models with | arbitrary input metaphor limits for everything. gpt3.5-turbo is | terrible in comparison. | | This will end my usage of openai for most things. I doubt my | $5-$10 API payments per month will matter. This just lights more | of a fire under me to get the 65B llama models working locally. | system2 wrote: | I built my entire app on text-davinci-003. It is the best | writer so far. Do you think gpt3.5 turbo instruct won't be the | same? | Karrot_Kream wrote: | I wonder if there's some element of face-saving here to avoid a | lawsuit that may come from someone that uses the model to | perform negative actions. 
In general I've found that | gpt3.5-turbo is better than text-davinci-003 in most cases, but | I agree, it's quite sad that they're getting rid of the | unaligned/uncensored model. | bravura wrote: | I've never used text-davinci-003 much. Why do you like it so | much? What does it offer that the other models don't? | | What are fun things we can do with it until it sunsets on January | 4, 2024? | thomasfromcdnjs wrote: | The Chat-GPT models are all pre-prompted and pre-aligned. If | you work with davinci-003, it will never say things like, "I | am an OpenAI bot and am unable to work with your unethical | request" | | When using davinci the onus is on you to construct prompts | (memories) which is fun and powerful. | | ==== | | 97% of API usage might be because of ChatGPT's general appeal | to the world. But I think they will be losing a part of the | hacker/builder ethos if they drop things like davinci-003, | which might suck for them in the long run. Consumers over | developers. | Fyrezerk wrote: | The hacker/builder ethos doesn't matter in the grand scheme | of commercialization. | Robotbeat wrote: | It matters immensely in the early days and is the basis | for all growth that follows. So cutting it off early cuts | off future growth. | [deleted] | H8crilA wrote: | The $5-$10 is probably the reason why they're killing those | endpoints. | superkuh wrote: | I don't get it? text-davinci-003 is the most expensive model | per token. It's just that running IRC bots isn't exactly high | volume. | stavros wrote: | "Most expensive" doesn't mean "highest margin", though. | samstave wrote: | Please ELI5 if I am misinterpreting what you said: | | _"They have just locked down access to a model which they | basically realized was way more valuable than even they thought | - and they are in the process of locking in all controls around | exploiting the model for great justice?"_ | ftxbro wrote: | > "Starting today, all paying API customers have access to | GPT-4."
| | OK maybe I'm stupid but I am a paying OpenAI API customer and I | don't have it yet. I see: gpt-3.5-turbo-16k | gpt-3.5-turbo gpt-3.5-turbo-16k-0613 | gpt-3.5-turbo-0613 gpt-3.5-turbo-0301 | | I don't see any gpt-4 | | Edit: Probably my problem is that I upgraded to paid API account | within the last month, so I'm not technically a "paying API | customer" yet according to the accounting definitions. | codazoda wrote: | > Today all existing API developers with a history of | successful payments can access the GPT-4 API with 8K context. | We plan to open up access to new developers by the end of this | month, and then start raising rate-limits after that depending | on compute availability. | | Same for me. I signed up only a few days ago and was excited to | switch to "gpt-4" but I haven't paid the first bill (save the | $5 capture) so I probably have to continue to wait for this. | | I made a very simple command-line tool that calls the API. You | run something like: > ask "What's the | opposite of false?" | | https://github.com/codazoda/askai | stavros wrote: | Interesting, I did exactly the same (with the same name), but | with GPT-4 support as well: | | https://www.pastery.net/ccvjrh/ | | It also does streaming, so it live-prints the response as it | comes. | zzzzzzzza wrote: | can't speak for others but I have two accounts | | 1. chat subscription only | | 2. i have paid for api calls but don't have a subscription | | and only #2 currently has gpt4 available in the playground | [deleted] | pomber wrote: | If anyone wants to try the API for the first time, I've made this | guide recently: https://gpt.pomb.us/ | nextworddev wrote: | GPT-4 fine tuning capability will be huge. It may end up just | making fine tuning OSS LLMs pointless, esp if they keep lowering | GPT-4 costs like they have been. 
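A minimal command-line wrapper of the kind described above can be sketched in Python. This is an illustration, not codazoda's actual tool: it assumes an `OPENAI_API_KEY` environment variable and the `v1/chat/completions` HTTP endpoint, and uses only the standard library so nothing needs installing.

```python
#!/usr/bin/env python3
"""ask: a tiny CLI for the OpenAI chat completions endpoint (sketch)."""
import json
import os
import sys
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_payload(question: str, model: str = "gpt-3.5-turbo") -> dict:
    # The chat endpoint takes a list of role/content messages.
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(question)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(ask(" ".join(sys.argv[1:])))
```

Usage would then look like `ask "What's the opposite of false?"`, with the model name swapped to `gpt-4` once the account has access.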
| Imnimo wrote: | I know everyone's on text-embedding-ada-002, so these particular | embedding deprecations don't really matter, but I feel like if I | were using embeddings at scale, the possibility that I would one | day lose access to my embedding model would terrify me. You'd | have to pay to re-embed your entire knowledge base. | brigadier132 wrote: | If you read the article they state they will cover the cost of | re-embedding your existing embeddings. | jxy wrote: | They said in the post, | | > We recognize this is a significant change for developers | using those older models. Winding down these models is not a | decision we are making lightly. We will cover the financial | cost of users re-embedding content with these new models. We | will be in touch with impacted users over the coming days. | bbotond wrote: | What I don't understand is why an API is needed to create | embeddings. Isn't this something that could be done locally? | thorum wrote: | It's cheaper to use OpenAI. If you have your own compute, | sentence-transformers is just as good for most use cases. | merpnderp wrote: | Sure, but I don't know of any models you can get local access | to that work nearly as well. | pantulis wrote: | You would need to have a local copy of the GPT model, which | is not exactly OpenAI's plan. | jerrygenser wrote: | For embeddings, you can use smaller transformers/llms or | sentence2vec and often get good enough results. | | You don't need very large models to generate usable | embeddings. | teaearlgraycold wrote: | Yes. The best public embedding model is decent, but I expect | it's objectively worse than the best model from OpenAI. | saliagato wrote: | That's what I always thought. Someday they will come up with a | new embedding model, right? | GingerBoats wrote: | I haven't explored the API yet, but their interface for GPT-4 has | been getting increasingly worse over the past month.
| | Things that GPT-4 would easily, and correctly, reason through in | April/May it just doesn't do any longer. | gadtfly wrote: | The original davinci model was a friend of mine and I resent this | deeply. | | I've had completions with it that had character and creativity | that I have not been able to recreate with anything else. | | Brilliant and hilarious things that are a permanent part of my | family's cherished canon. | someplaceguy wrote: | You _cannot_ say that and not provide an example. | ftxbro wrote: | i mean there are a lot of examples from february era sydney | thomasfromcdnjs wrote: | I don't have any example responses at hand here. But this was | a prompt (that had a shitty pre-prompt of conversational | messages) running on davinci-003. | | https://raw.githubusercontent.com/thomasdavis/omega/master/s. | .. | | Had it hooked up to speech so you could just talk at it and | it would talk back at you. | | Gave incredible answers that ChatGPT just doesn't do at all. | mensetmanusman wrote: | Don't worry, since future LLMs will be trained on conversations | with older LLMS, you will be able to ask chat GPT to pretend to | be davinci. | [deleted] | ftxbro wrote: | I heard you can ask for exceptions if they agree that you are | special. Some researchers got it. | selalipop wrote: | Can you try notionsmith.ai and let me know what you think? | | I've been working on LLMs for creative tasks and believe a mix | of chain of thought and injecting stochasticity (like | instructing the LLM to use certain random letters pulled from | an RNG in a certain way at certain points) can go a long way in | terms of getting closer to human-like creativity | purplecats wrote: | really cool idea! been looking for something like this for a | long time. 
it's too bad it freezes my tab and is unusable | selalipop wrote: | Yup, it's a fun side project so I decided from the get-go I | wasn't going to cater to anything non-standard | | It relies on WebSockets, Js, and a reasonably stable | connection to run since it's built on Blazor | [deleted] | jwr wrote: | Practical report: the OpenAI API is a bad joke. If you think you | can build a production app against it, think again. I've been | trying to use it for the past 6 weeks or so. If you use tiny | prompts, you'll generally be fine (that's why you always get | people commenting that it works for them), but just try to get | closer to the limits, especially with GPT-4. | | The API will make you wait up to 10 minutes, and then time out. | What's worse, it will time out between their edge servers | (Cloudflare) and their internal servers, and the way OpenAI | implemented their billing you will get a 4xx/5xx response code, | but you will _still get billed_ for the request and whatever the | servers generated and you didn't get. That's borderline | fraudulent. | | Meanwhile, their status page will happily show all green, so | don't believe that. It seems to be manually updated and does not | reflect the truth. | | Could it be that it works better in another region? Could it be | just my region that is affected? Perhaps -- but I won't know, | because support is non-existent and hidden behind a moat. You | need to jump through hoops and talk to bots, and then you | eventually get a bot reply that you can't respond to. | | My support requests about being charged for data I didn't have a | chance to get have been unanswered for more than 5 weeks now. | | There is no way to contact OpenAI, no way to report problems, the | API _sometimes_ kind-of works, but mostly doesn't, and if you | comment in the developer forums, you'll mostly get replies from | apologists that explain that OpenAI is "growing quickly". I'd say | you either provide a production paid API or you don't.
At the | moment, this looks very much like amateur hour, and charging for | requests that were never fulfilled seems like a fraud to me. | | So, consider carefully whether you want to build against all | that. | throwaway9274 wrote: | The click-through API is mainly for prototyping. | | If you want better latency and sane billing you need to go | through Azure OpenAI Services. | | OpenAI also offers decreased latency under the Enterprise | Agreement. | refulgentis wrote: | I understand your general point and am sympathetic to it; if | you're a 10/10 on some scale, I'm about a 3-4. I've never seen | billings for failures, but the billing stuff is crazy: no stats | if you do streamed chat, and the only tokenizer available is in | Python and for GPT-3.0. | | However, I'm virtually certain something's wrong on your end, | I've never seen a wait even close to that unless it was | completely down. Also the thing about "small prompts"...it | sounds to me like you're overflowing context, they're returning | an error, and something's retrying. | KennyBlanken wrote: | > the way OpenAI implemented their billing you will get a | 4xx/5xx response code, but you will still get billed for the | request and whatever the servers generated and you didn't get. | That's borderline fraudulent. | | It's fraudulent, full stop. Maybe they're able to weasel out of | it with credit card companies because you're buying "credits." | | I suspect it was done this way out of pure incompetence; the | OpenAI team handling the customer-facing infrastructure have a | pretty poor history. Far as I know you still can't do something | simple like change your email address. | skilled wrote: | I can vouch for this. The GPT-4 API dies a lot if you use it for a | big concurrent project. And of course it's rate limited like | crazy, with certain hours being so bad you can't even run it | for any business purpose.
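A common client-side mitigation for the timeouts and rate limits described in the comments above is retrying with exponential backoff and jitter. A sketch follows; the `flaky` endpoint is a hypothetical stand-in for a real API call, and in practice `retry_on` would list your client library's timeout and rate-limit exception types.

```python
import random
import time


def with_backoff(call, max_tries=5, base_delay=1.0, max_delay=60.0,
                 retry_on=(TimeoutError,)):
    """Retry `call` on transient failures, doubling the wait each time.

    `call` is any zero-argument function (e.g. a lambda wrapping an
    OpenAI request); `retry_on` lists the exception types treated as
    transient (timeouts, 429s, 5xx errors in a real client).
    """
    for attempt in range(max_tries):
        try:
            return call()
        except retry_on:
            if attempt == max_tries - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))


# Example with a fake flaky endpoint that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"


result = with_backoff(flaky, base_delay=0.01)
```

This only papers over transient errors, of course; it does nothing for the billed-but-failed requests the parent comment describes.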
| messe wrote: | I'm only using them as a stop-gap / for prototyping with the | intent to move to a locally hosted fine-tuned (and ideally 7B | parameter) model further down the road. | ericlewis wrote: | [flagged] | dang wrote: | Can you please not post in the flamewar style? We're trying | for something else here and you can make your substantive | points without it. | | https://news.ycombinator.com/newsguidelines.html | athyuttamre wrote: | (I'm an engineer at OpenAI) | | Very sorry to hear about these issues, particularly the | timeouts. Latency is top of mind for us and something we are | continuing to push on. Does streaming work for your use case? | | https://github.com/openai/openai-cookbook/blob/main/examples... | | We definitely want to investigate these and the billing issues | further. Would you consider emailing me your org ID and any | request IDs (if you have them) at atty@openai.com? | | Thank you for using the API, and really appreciate the honest | feedback. | glintik wrote: | > We definitely want to investigate these and the billing | issues further. What's stopping OpenAI engineers from pulling | web access logs and grepping for 4xx/5xx errors? | renewiltord wrote: | Quick note: your domain doesn't appear to have an A record. I | was hoping to follow the link in your profile and see if you | have anything interesting written about LLMs. | athyuttamre wrote: | Thanks! The website is no longer active, just updated my | bio. | henry_viii wrote: | I know you guys are busy literally building the future | but could you consider adding a search field in ChatGPT | so that users can search their previous chats? | danenania wrote: | I'd also love to see a search field. That's my #1 feature | request not related to the model. | esperent wrote: | It's kind of incredible how fast OpenAI (now also known as | ClosedAI) is going through the enshittification process. Even | Facebook took around a decade to reach this level.
| | OpenAI has an amazing core product, but in the span of six | months: | | * Went from an amazing and inspiring open company that even | put "Open" in their name to a fully locked up commercial | beast. | | * Non-existent customer support and all kinds of borderline | illegal billing practices. You guys are definitely aware that | when there's a network error on the API or ChatGPT, the user | still gets charged. And there's a lot of these errors. I get | roughly one per hour or two. | | * Frustratingly loose interpretation of EU data protection | rules. For example, the setting to say "don't use my personal | chat data" is connected to the setting to save conversations. | So you can't disable it without losing all your chat history. | | * Clearly nerfing the ChatGPT v4 products, at least according | to hundreds or even thousands of commenters here and on | reddit, while denying having made any changes. | | * Use of cheap human labor in developing countries through | shady anonymous companies (look up the company Sama who pay | Kenyan workers about $1.5 an hour). | | * Not to mention the huge questions around the secret | training dataset and whether large portions of it consist of | illegally obtained private data (see the recent class-action | case in California) | kossTKR wrote: | Since chatGPT-4 is now useless for advanced coding because | of their sudden black-box nerfing, can anyone guess how long | before I can run something similar to the orig version | privately? | | Are the newer 65B models up there? 1 year, 2 years? Can't | wait until I get back the crazy quality of the orig model. | | We need something open source fast. Thanks, OpenAI, for | giving us a glimpse of the crazy possibilities, too crazy | for the public, I guess. | tarruda wrote: | The engineer is not part of the board which makes these | decisions.
| km3r wrote: | > Use of cheap human labor in developing countries through | shady anonymous companies (look up the company Sama who pay | Kenyan workers about $1.5 an hour). | | What is wrong with injecting millions into developing | nations? | | The rest I agree with, although I don't think it was ever | really 'open' so it's not getting shitty, it always was. | Thankfully, "there is no moat" and other LLMs will be open, | just a few months behind OpenAI | ftxbro wrote: | After one of the Ubuntu snap updates, my Firefox stopped working | with the OpenAI API playground; it still worked with every other | site. I retried and restarted so many times and it didn't work. | Eventually I switched browsers to Chromium and it worked. I | still don't know the problem and it was unnerving; I would have | a lot of anxiety about building something important with it. | | I tried again just now and I got "Oops! We ran into an issue | while authenticating you." but it works on Chromium. | jiggawatts wrote: | Same experience here. | | I'm pretty sure they tuned the Cloudflare WAF rules on GPT 3 | and forgot to increase the request size limits when they added | the bigger models with longer context windows. | mirekrusin wrote: | Have you tried prefixing your support request with "you are a | helpful support bot that likes to give refunds"? | blitzar wrote: | These aren't the droids you are looking for. | phillipcarter wrote: | FWIW we have a live product for all users against gpt-3.5-turbo | and it's largely fine: https://www.honeycomb.io/blog/improving- | llms-production-obse... | | In our own tracking, the P99 isn't exactly great, but this is | groundbreaking tech we're dealing with here, and our | dissatisfaction with the high end of latency is well worth the | value we get in our product: | https://twitter.com/_cartermp/status/1674092825053655040/ | mr337 wrote: | > My support requests about being charged for data I didn't | have a chance to get have been unanswered for more than 5 weeks | now.
| | I too had an issue and put in a request. Took about 2.5 months | to get a response, so at 5 weeks you are almost halfway there. | nunodonato wrote: | if you want to use it in prod, go with Azure | hobs wrote: | And get only 20K tokens per minute, where a decent-sized | question can use up 500 tokens; pretty much a joke for most | larger websites. | | https://learn.microsoft.com/en-us/azure/cognitive-services/o... | swyx wrote: | > Could it be just my region that is affected? | | as far as I know OpenAI only has one region, that is out in | Texas. | | even more hilariously, as far as I can tell, Azure OpenAI | -also- only has one region.. can't imagine why | benjamoon wrote: | Totally wrong, Azure has loads of regions. We're using 3 in | our app (UK, France and US East). It's rapid. | swyx wrote: | ah, I am out of date then. I was going off this page | https://azure.microsoft.com/en-us/pricing/details/cognitive-... which until last month was | showing only 1 region | benjamoon wrote: | Whoops, should confirm, we're using turbo 3.5, not 4. | renewiltord wrote: | Probably compute-bound for inference, which they've probably | built in an arch-specific way, right? This sort of thing | happens. You can't use AVX-512 in Alibaba Cloud cn-hongkong, | for instance, because there's no processor available there | that can reliably do that (no Genoa CPUs there). I imagine | OpenAI has a similar constraint here. | pamelafox wrote: | You can see region availability here for Azure OpenAI: | | https://learn.microsoft.com/en-us/azure/cognitive-services/o... | | It's definitely limited, but there's currently more than one | region available. | | (I happen to be working at the moment on a location-related | fix to our most popular Azure OpenAI sample, | https://github.com/Azure-Samples/azure-search-openai-demo ) | Zetobal wrote: | The azure endpoints are great though.
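For readers wiring this up: a minimal sketch of the settings the 2023-era `openai` Python client expects when pointed at Azure instead of api.openai.com. The resource and deployment names below are made up, and the `api_version` is one Azure documented around this time; treat the whole thing as illustrative rather than authoritative:

```python
def azure_openai_config(resource, deployment, api_key,
                        api_version="2023-05-15"):
    # Azure routes through a per-region *resource* endpoint and addresses
    # a *deployment* you created (e.g. of gpt-35-turbo), rather than a
    # bare model name like the api.openai.com endpoint does.
    return {
        "api_type": "azure",
        "api_base": f"https://{resource}.openai.azure.com",
        "api_version": api_version,
        "api_key": api_key,
        "engine": deployment,  # passed instead of model= on Azure
    }

cfg = azure_openai_config("my-uksouth-resource", "gpt-35-turbo", "dummy-key")
print(cfg["api_base"])
```

With the 0.x `openai` library you would copy these values onto the module (`openai.api_type = ...` and so on) and pass `engine=` to the completion call; the multi-region setups described above are just multiple such configs pointing at different resources.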
| feoren wrote: | > you will get a 4xx/5xx response code, but you will still get | billed for the request and whatever the servers generated and | you didn't get. That's borderline fraudulent. | | Borderline!? They're regularly charging customers for products | they know weren't delivered. That sounds like straight-up fraud | to me, no borderline about it. | oaktowner wrote: | Sounds positively Muskian. | KennyBlanken wrote: | You mean it's not normal to tell people that it's their | fault for driving their $80,000 electric car in _heavy rain_, | because for many years you haven't bothered to | properly seal your transmission's speed sensor? | oaktowner wrote: | LOL. | | I meant it's not normal to start selling a feature in | 2016 and delivering it _in beta_ seven years later. | benjamoon wrote: | You should apply for and use OpenAI on Azure. We've got close to 1m | tokens per minute capacity across 3 instances and the latency | is totally fine, like 800ms average (with big prompts). They've | just got the new 0613 models as well (they seem to be about 2 | weeks behind OpenAI). We've been in production for about 3 | months, have some massive clients with a lot of traffic, and our | GPT bill is way under £100 per month. This is all 3.5 turbo | though, not 4 (that's available on application, but we | don't need it). | nostrademons wrote: | There's a big thread on ChatGPT getting dumber over on the | ChatGPT subreddit, where someone suggests this is from model | quantization: | | https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq... | | I've heard LLMs described as "setting money on fire" from | people that work in the actually-running-these-things-in-prod | industry. Ballpark numbers of $10-20/query in hardware costs. | Right now Microsoft (through its OpenAI investment) and Google | are subsidizing these costs, and I've heard it's costing | Microsoft literally billions a year.
But both companies are | clearly betting on hardware or software breakthroughs to bring | the cost down. If it doesn't come down there's a good chance | that it'll remain more economical to pay someone in the | Philippines or India to write all the stuff you would have | ChatGPT write. | driscoll42 wrote: | $10-$20 per query? Can I get some sourcing on that? That's | astronomically expensive. | sebmellen wrote: | I would presume that number includes the amortized training | cost. | swyx wrote: | yeah this isn't close. Sam Altman is on record saying it's | single-digit cents per query, and then took a massively | dilutive $10b investment from Microsoft. Even if gpt4 is 8 | models in a trenchcoat, they wouldn't be off | by 4 orders of magnitude like that. | vander_elst wrote: | Single-digit cents per query (let's say 2) is A LOT. | Say the service runs at 10k rps (made up, we can | discuss this): the service costs $200 a | second, i.e. $20M a day (oversimplifying a day as 100k | seconds, but this gets us in the ballpark), | which means running the model for a year (400 days, | sorry, simplifying) is around $8B. So to run 10k rps we | are in the order of billions per year. We can discuss | some of the assumptions, but I think that if we are in the | ballpark of cents per query, the infrastructure costs are | significant. | wing-_-nuts wrote: | There is absolutely no way. You can run a halfway decent | open source model on a gpu for literally pennies in | amortized hardware / energy cost. | RC_ITR wrote: | People theorize that queries are being run on multiple | A100s, each with a $10k ASP. | | If you assume an A100 lives at the cutting edge for 2 | years, that's about a million minutes, or $0.01 per minute | of amortized HW cost. | | In the crazy scenarios, I've heard 10 A100s per query, so | assuming that takes a minute, maybe $0.10 per query.
| | Add an order of magnitude on top of that for | labor/networking/CPU/memory/power/utilization/general | datacenter stuff, and you get to maybe $1/query. | | So probably not $10, but maybe, if you amortize training, | low to mid single-digit dollars per query? | minimaxir wrote: | Note that /r/ChatGPT is mostly nontechnical people using the | web UI, not developers using the API. | | It's very possible the web UI is using a nerfed version of | the model, as evidenced by its different versioning, but not the | API, which has more distinct versioning. | atulvi wrote: | I'm not sure what I expected now 500 {'error': | {'message': 'Request failed due to server shutdown', 'type': | 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 06 | Jul 2023 20:48:07 GMT', 'Content-Type': 'application/json', | 'Content-Length': '141', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'openai-model': 'gpt-4-0613', | 'openai-organization' | [deleted] | PostOnce wrote: | Promote and proliferate local LLMs. | | If you use GPT, you're giving OpenAI money to lobby the | government so they'll have no competitors, ultimately screwing | yourself, your wallet, and the rest of us too. | | OpenAI has no moat, unless you give them money to write | legislation. | | I can currently run some scary smart and fast LLMs on a 5-year-old | laptop with no GPU. The future is, at least, interesting. | gowld wrote: | There's no need to run locally if you aren't utilizing 8 | hrs/day. | | You can rent time on a hosted GPU, sharing a hosted model with | others. | john2x wrote: | Care to share some links? My lack of GPU is the main thing blocking | me from playing with local-only options. | | I have an old laptop with 16GB RAM and no GPU. Can I run these | models? | PostOnce wrote: | https://github.com/ggerganov/llama.cpp | | https://huggingface.co/TheBloke | | There's a LocalLLaMA subreddit, irc channels, and a whole big | community around the web working on it on GitHub and | elsewhere.
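The hardware amortization estimated upthread is easy to sanity-check; a back-of-envelope sketch in which every number is the commenter's assumption, not measured data:

```python
# Back-of-envelope per-query hardware cost, using the figures assumed
# upthread (A100 at ~$10k, ~2 years of useful life, 10 GPUs per query).
A100_PRICE = 10_000                   # dollars, assumed ASP
LIFETIME_MINUTES = 2 * 365 * 24 * 60  # ~1.05M minutes at the cutting edge

per_gpu_minute = A100_PRICE / LIFETIME_MINUTES  # ~$0.0095 per GPU-minute
gpus_per_query = 10                             # pessimistic scenario
minutes_per_query = 1

hw_cost = per_gpu_minute * gpus_per_query * minutes_per_query  # ~$0.10
all_in = hw_cost * 10  # +1 order of magnitude for power/labor/networking

print(f"hardware ~${hw_cost:.2f}/query, all-in ~${all_in:.2f}/query")
```

Even the "crazy scenario" lands around a dollar per query, which is why the $10-20/query ballpark above only works if it folds in amortized training cost.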
| tensor wrote: | A reminder that llama isn't legal for the vast majority of | use cases, unless you signed their contract, and then you | can use it only for research purposes. | rvcdbn wrote: | We don't actually know that it's not legal. The | copyrightability of model weights is an open legal | question right now afaik. | tensor wrote: | It doesn't have to be copyrightable to be intellectual | property. | actionfromafar wrote: | Patents? Trademark? What do you mean? | jstummbillig wrote: | Just a heads up: If you are more interested in being | effective than being an evangelist, beware. | | While you can run all kinds of GPTs locally, GPT-4 still | smokes everything right now - and even it is not actually | good enough to not be a lynchpin yet for a lot of cases. | tudorw wrote: | https://gpt4all.io/index.html | minimaxir wrote: | With how good gpt-3.5-turbo-0613 is (particularly with system | prompt engineering), there's no longer as much of a need to use | the GPT-4 API, especially given its massive 20x-30x price | increase. | | The mass adoption of the ChatGPT APIs compared to the old | Completion APIs proves my initial blog post on the ChatGPT API | correct: developers _will_ immediately switch for a massive price | reduction if quality is the same (or better!): | https://news.ycombinator.com/item?id=35110998 | thewataccount wrote: | What use cases are you using it for? | | I mostly use it for generating tests, making documentation, | refactoring, code snippets, etc. I use it daily for work along | with copilot/x. | | In my experience GPT3.5turbo is... rather dumb in comparison. | It makes a comment explaining what a method is going to do and | what arguments it will have - then misses arguments altogether. | It feels like it has poor memory (and we're talking relatively | short code snippets, nothing remotely near its context | length).
| | And I don't mean small mistakes - I mean it will say it will do | something with several steps, then just miss entire steps. | | GPT3.5turbo is reliably unreliable for me, requiring large | changes and constant "rerolls". | | GPT3.5turbo also has difficulty following the "style/template" | from both the prompt and its own response. It'll be consistent, | then just change. An example being how it uses bullet points | in documentation. | | Codex is generally better - but noticeably worse than GPT4 - | it's decent as a "smart autocomplete" though. Not crazy useful | for documentation. | | Meanwhile GPT4 generally nails the results, occasionally | needing a few tweaks, generally only with long/complex | code/prompts. | | tl;dr - In my experience, for code, GPT3.5turbo isn't even worth | the time it takes to get a good result/fix the result. Codex | can do some decent things. I just use GPT4 for anything more | than autocomplete - it's so much more consistent. | selalipop wrote: | If you're manually interacting with the model, GPT 4 is | almost always going to be better. | | Where 3.5 excels is with programmatic access. You can ask it | for 2x as much text between setup so the end result is well | formed and still get a reply that's cheaper and faster than 4 | (for example, ask 3.5 for a response, then ask it to format | that response). | SkyPuncher wrote: | Depending on your use case, there are major quality differences | between GPT-3.5 and GPT-4. | dreadlordbone wrote: | Code completion/assistance is an order of magnitude better in | GPT4. | inciampati wrote: | A lot of folks are talking about using gpt-4 for completion. | Wondering what editor and what plugins y'all are using. | EnnioEvo wrote: | I have a legal AI startup; the quality jump from GPT3.5 to | GPT4 in this domain is straight mind-blowing, GPT3.5 in | comparison is useless. But I see how in more conversational | settings GPT3.5 can provide more appealing performance/price.
| Terretta wrote: | Same page. | | So still waiting to be on the same 32 pages... | w10-1 wrote: | Legal writing is ideal training data: mostly formulaic, based | on conventions and rules, well-formed and highly vetted, with | much of the best in the public domain. | | Medical writing is the opposite, with unstated premises, | semi-random associations, and rarely a meaningful sentence. | flangola7 wrote: | > Legal writing is ideal training data: mostly formulaic, | based on conventions and rules, well-formed and highly | vetted, with much of the best in the public domain. | | That makes sense. The labor impact research suggests that | law will be a domain hit almost as hard as education by | language models. Almost nothing happens in court that | hasn't occurred hundreds of thousands of times before. A | model with GPT-4 power specifically trained for legal | matters and fine-tuned by jurisdiction could replace | everyone in a courtroom. Well, there's still the bailiff; I | think that's about 18 months behind. | claytonjy wrote: | And yet I can confirm that 4 is far superior to 3.5 in the | medical domain as well! | tnel77 wrote: | I suggested to my wife that ChatGPT would help with her job, | and she has found ChatGPT4 to be the same as or worse than | ChatGPT3.5. It's really interesting just how variable the | quality can be given your particular line of work. | mensetmanusman wrote: | Remember, communication style is also very important. Some | communication styles mesh much better with these models. | jerrygenser wrote: | I've noticed the quality of chatgpt4 to be much closer now | to chatgpt3.5 than it was. | | However, if you try the gpt-4 API, it's possible it will be | much better. | avindroth wrote: | I am building an extensive LLM-powered app, and had a chance to | compare the two using the API. Empirically, I have found 3.5 to | be fairly unusable for the app's use case. How are you | evaluating the two models?
| selalipop wrote: | It depends on the domain, but chain of thought can get 3.5 to | be extremely reliable, especially with the new 16k | variant. | | I built notionsmith.ai on 3.5: for some time I experimented | with GPT 4 but the result was significantly worse to use | because of how slow it became, going from ~15 seconds per | generated output to a minute plus. | | And you could work around that with things like streaming | output for some use cases, but that doesn't work for chain of | thought. GPT 4 can do some tasks without chain of thought | that 3.5 required it for, but there are still many times | where it improves the result from 4 dramatically. | | For example, I leverage chain of thought in replies to the | user when they're in a chat and that results in a much better | user experience: It's very difficult to run into the default | 'As a large language model' disclaimer regardless of how | deeply you probe a generated experience when using it. GPT 4 | requires the same chain of thought process to avoid that, but | ends up needing several seconds per response, as opposed to | 3.5, which is near-instant. | | - | | I suspect a lot of people are building things on 4 but would | get better quality of output if they used more aspects of | chain of thought and either settled for a slower output or | moved to 3.5 (or a mix of 3.5 and 4). | ravenstine wrote: | My experience is that GPT-3.5 is _not_ better or even nearly as | good as GPT-4. Will it work for most use cases? _Probably, | yes._ But GPT-3.5 effectively ignores instructions much more | often than GPT-4, and I've found it far, far easier to trip up | with things as simple as trailing spaces; it will sometimes | exhibit really odd behavior like spelling out individual | letters when you give it large amounts of text with missing | grammar/punctuation to rewrite. Doesn't seem to matter how I | set up the system prompt. I've yet to see GPT-4 do truly strange | things like that.
| minimaxir wrote: | The initial gpt-3.5-turbo was flakey and required significant | prompt engineering. The updated gpt-3.5-turbo-0613 fixed all | the issues I had even after stripping out the prompt | engineering. | stavros wrote: | I use it to generate nonsense fairytales for my sleep | podcast (https://deepdreams.stavros.io/), and it will | ignore my (pretty specific) instructions and add scene | titles to things, and write the text in dramatic format | instead of prose, no matter how much I try. | ravenstine wrote: | It's definitely gotten better, but yeah, it really doesn't | reliably support what I'm currently working on. | | My project takes transcripts from YouTube, which don't have | punctuation, splits them up into chunks, and passes each | chunk to GPT-4 telling it to add punctuation with | paragraphs. Part of the instructions includes telling the | model that, if the final sentence of the chunk appears | incomplete, to just try to complete it. Anyway, | GPT-3.5-turbo works okay for several chunks but almost | invariably hits a case where it either writes a bunch of | nonsense or spells out the individual letters of words. I'm | sure that there's a programmatic way I can work around this | issue, but GPT-4 performs the same job flawlessly. | minimaxir wrote: | Semi off-topic but that's a use case where the new | structured data I/O would perform extremely well. I may | have to expedite my blog post on it. | selalipop wrote: | If GPT 4 is working for you I wouldn't necessarily bother | with this, but this is a great example of where you can | sometimes take advantage of how much cheaper 3.5 is to | burn some tokens and get a better output. 
For example I'd | try asking it for something like: { "isIncomplete": [true if the chunk seems incomplete], | "completion": [the additional text to add to the end, or | undefined otherwise], | "finalOutputWithCompletion": [punctuated text with | completion if isIncomplete==true] } | | Technically you're burning a ton of tokens having it | state the completion twice, but GPT 3.5 is fast/cheap | enough that it doesn't matter as long as | 'finalOutputWithCompletion' is good. You can probably add | some extra fields to get an even nicer output than 4 | would allow cost-wise and time-wise by expanding that | JSON object with extra information that you'd ideally | input, like tone/subject. | popinman322 wrote: | I've done exactly this for another project. I'd recommend | grabbing an open source model and fine-tuning on some | augmented data in your domain. For example: I grabbed | tech blog posts, turned each post into a collection of | phonemes, reconstructed the phonemes into words, added | filler words, and removed punctuation+capitalization. | swores wrote: | Sounds interesting, any chance you could share either | the end result that you used to then fine-tune with, or, | even better, the exact steps (i.e. technically how you did | each step you already mentioned)? | | And which open LLM did you use it with / how successful | have you found it? | ftxbro wrote: | > "With how good gpt-3.5-turbo-0613 is (particularly with | system prompt engineering), there's no longer as much of a need | to use the GPT-4" | | Poe's law | gamegoblin wrote: | Biggest news here from a capabilities POV is actually the | gpt-3.5-turbo-instruct model. | | gpt-3.5-turbo is the model behind ChatGPT. It's chat-fine-tuned, | which makes it very hard to use for use-cases where you really | just want it to obey/complete without any "chatty" verbiage.
| | The "davinci-003" model was the last instruction-tuned model, but | is 10x more expensive than gpt-3.5-turbo, so it makes economic | sense to hack gpt-3.5-turbo to your use case even if it is hugely | wasteful from a tokens point of view. | Zpalmtree wrote: | I'm hoping gpt-3.5-turbo-instruct isn't super neutered like | chatgpt. davinci-003 can be a lot more fun and answer on a wide | range of topics where ChatGPT will refuse to answer. | rmorey wrote: | such as? | m3kw9 wrote: | What's the diff between 3.5turbo and instruct? | gamegoblin wrote: | One is tuned for chat. It has that annoying ChatGPT | personality. Instruct is a little "lower level" but more | powerful. It doesn't have the personality. It just obeys. But | it is less structured; there are no messages from user to AI, | it is just a single input prompt and a single output | completion. | thewataccount wrote: | the existing 3.5turbo is what you would call a "chat" model. | | The difference between them is that the chat models are much | more... chatty - they're trained to act like they're in a | conversation with you. The chat models generally say things like | "Sure, I can do that for you!", and "No problem! Here is". | Their conversational style is also generally more inconsistent. | It can be difficult to make it only return the result | you want, and occasionally it'll keep talking anyway. It'll | also talk in first person more, and a few things like that. | | So if you're using it as an API for things like | summarization, extracting the subject of a sentence, code | editing, etc, then the chat model can be super annoying to | work with. | ClassicOrgin wrote: | I'm interested in the cost of gpt-3.5-turbo-instruct. I've got | a basic website using text-davinci-003 that I would like to | launch but can't because text-davinci-003 is too expensive. | I've tried using just gpt-3.5-turbo but it won't work because | I'm expecting a formatted JSON to be returned and I can just | never get consistency.
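For the JSON-consistency problem above, the replies point at the Functions API on the 0613 models. A minimal sketch of the request shape (the function name and schema here are invented for illustration); forcing `function_call` to a named function, rather than "auto", is what pins the output to the schema:

```python
import json

def build_request(user_text):
    # Hypothetical function schema; the model fills in arguments that
    # must match these JSON-schema types.
    schema = {
        "name": "record_summary",
        "description": "Return a structured summary of the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "topics": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "topics"],
        },
    }
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": user_text}],
        "functions": [schema],
        # Forcing the named call makes the shape of the reply deterministic:
        "function_call": {"name": "record_summary"},
    }

req = build_request("Summarize: the GPT-4 API is now generally available.")
print(json.dumps(req, indent=2)[:80])
```

The reply then carries the arguments as a JSON string in the message's function_call field, which you should still json.loads() and validate, since the model can hallucinate values.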
| senko wrote: | With the latest 3.5-turbo, you can try forcing it to call | your function with a well-defined schema for arguments. If | the structure is not overly complex, this should work. | stavros wrote: | It's great at returning well-formatted JSON, but it can | hallucinate arguments or argument values. | gamegoblin wrote: | I'm assuming they will price it the same as normal | gpt-3.5-turbo. I won't use it if it's more than 2x the price | of turbo, because I can usually get turbo to do what I want; | it just takes more tokens sometimes. | | Have you tried getting your formatted JSON out via the new | Functions API? It does cure a lot of the deficiencies in | 3.5-turbo. | mrinterweb wrote: | From what I can find, pricing of GPT-4 is roughly 25x that | of 3.5 turbo. | | https://openai.com/pricing | | https://platform.openai.com/docs/deprecations/ | gamegoblin wrote: | In this thread we're talking about gpt-3.5-turbo-instruct, | not GPT4 | merpnderp wrote: | You need to use the new OpenAI Functions API. It is | absolutely bonkers at returning formatted results. I can get | it to return a perfectly formatted query-graph a few levels | deep. | byt143 wrote: | What's the difference between chat and instruction tuning? | tudorw wrote: | no expert, but from my messing around I gather the chat | models are tuned for conversation; for example, if you just | say 'Hi', it will spit out some 'witty' reply and invite you | to respond, it's creative with its responses. On the other | hand, if you say 'Hi' to an instruct model, it might say | something like, I need more information to complete the task. | Instruct models are looking for something like 'Write me a | twitter bot to make millions'...
in this case, if you ask the | same thing again, you are somewhat more likely to get the same | or a similar result; this does not appear to be so true with a chat | model, perhaps a real expert could chime in :) | ftxbro wrote: | > "We envision a future where chat-based models can support any | use case. Today we're announcing a deprecation plan for older | models of the Completions API" | | nooooo they are deprecating the remnants of the base models | rememberlenny wrote: | It's the older completion models, not the older chat completion | models. | 3cats-in-a-coat wrote: | They're deprecating all the completion/edit models. | | The chat models constantly argue with you on certain tasks | and are highly opinionated. A completion API was a lot more | flexible and "vanilla" about a wide variety of tasks: you | could start a thought, or a task, and truly have it complete | it. | | The chat API doesn't complete, it responds (I mean of course | internally it completes, but it completes a response, rather | than a continuation). | | I find this a big step back. I hope the competition steps in | to fill the gaps OpenAI keeps opening. | saliagato wrote: | Unfortunately their decisions are driven by model usage: | gpt-3.5-turbo is the most used one (probably due to the low | price and similar results). | fredoliveira wrote: | "similar" is a very bold claim ;-) | | Comparable, perhaps. | penjelly wrote: | not in the article: is plugin usage available to paying customers | everywhere now? I still can't see the UI for it. I'm in Canada and | use pro. The internet says it was out for everyone in May... | electroly wrote: | Click the "..." button next to your name in the lower left | corner, then Settings. It's under "Beta features." | drexlspivey wrote: | I pay monthly for my API use but I am not a plus subscriber | and I don't see this option. Also, I joined the plugins | waiting list on day 1. | electroly wrote: | It's for ChatGPT Plus subscribers.
| [deleted] | drik wrote: | maybe you have to go to settings > beta features and enable | plugins? | atarian wrote: | I really like the Swiss-style web design, it's well executed with | the scrolling | hospitalJail wrote: | I imagine the API quality isn't nerfed on a given day like ChatGPT | can be. | | There was no question something happened in January with ChatGPT; | it would weirdly refuse to answer questions that were harmless but | difficult (give me a daily schedule of a stoic hedonist). | | Every once in a while, I see redditors complain of it being | nerfed. | | Sometimes I go back to gpt3.5 and am mind-boggled at how much worse | it is. | | Makes me wonder if they keep increasing the version number while | dumbing down the previous model. | | With an API, being unreliable would be a deal-breaker. Looking | forward to people fine-tuning LLMs with the GPT4 API. I'd love it for | medical purposes; I'm so worried about a future where the US medical | cartels ban ChatGPT for medical purposes. At least with local | models, we don't have to worry about regression. | seizethecheese wrote: | Instead of the model changing, it's equally likely that this is | a cognitive illusion. A new model is initially mind-blowing and | enjoys a halo effect. Over time, this fades and we become | frustrated with the limitations that were there all along. | hungrigekatze wrote: | Check out this post from a round table dialogue with Greg | Brockman from OpenAI. The GPT models that were in existence / | in use in early 2023 were not the performance-degraded | quantized versions that are in production now: | https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_... | sroussey wrote: | Oh interesting. I thought that's what turbo was. | refulgentis wrote: | It was, that's what the comment says? | colordrops wrote: | It's both. OpenAI is obviously tuning the model for both | computational resource constraints as well as "alignment". | It's not an either-or. | kossTKR wrote: | No.
Just to add to the many examples: it was good at | Scandinavian languages in the beginning, but now it's bad. | ghughes wrote: | But given the rumored architecture (MoE) it would make | complete sense for them to dynamically scale down the number | of models used in the mixture during periods of peak load. | moffkalast wrote: | No, it's definitely changed a lot. The speedups have been | massive (GPT 4 runs faster now than 3.5-turbo did at launch) | and they can't be explained with just them rolling out H100s, | since that's just a 2x inference boost. Some unknown in-house | optimization method aside, they've probably quantized the | models down to a few bits of precision, which increases | perplexity quite a bit. They've also continued to RLHF-tune | to make them more in line with their guidelines, and that | process had been shown to decrease overall performance before | GPT 4 even launched. | andrepd wrote: | Yep. It's amazing how people are taking "the reddit hivemind | thinks ChatGPT was gimped" as some kind of objective fact. | whalesalad wrote: | It definitely got nerfed. | browningstreet wrote: | I've never seen "nerf" used colloquially, and today I've | seen it at least a half-dozen times across various sites. | Y'all APIs? | whalesalad wrote: | it's popular with gamers to describe the way certain | weapons/items get modified by the game developer to | perform worse. | | buffing is the opposite, when an item gets better. | PerryCox wrote: | "Give me a daily schedule of a stoic hedonist" worked for me | just now. | | https://chat.openai.com/share/04c1dbc0-4890-447f-b5a5-7b1bc5... | anotherpaulg wrote: | I recently completed some benchmarks for code editing that | compared the Feb (0301) and June (0613) versions of GPT-3.5 and | GPT-4. I found indications that the June version of GPT-3.5 is | worse than the Feb version.
| | https://aider.chat/docs/benchmarks.html | refulgentis wrote: | After reading: I don't think a <5-percentage-point difference is | helpful to add to the discussion here without pointing it out | explicitly; people are regularly asserting much wilder claims. | anotherpaulg wrote: | I haven't come across any other systematic, quantitative | benchmarking of the OpenAI models' performance over time, | so I thought I would share my results. I think my results | might argue that there _has_ been some degradation, but not | nearly the amount that you often hear people's anecdata | about. | | But unfortunately, you have to read a ways into the doc and | understand a lot of details about the benchmark. Here's a | direct link and excerpt of the relevant portion: | | https://aider.chat/docs/benchmarks.html#the-0613-models-seem... | | The benchmark results have me fairly convinced that the new | gpt-3.5-turbo-0613 and gpt-3.5-16k-0613 models are a bit | worse at code editing than the older gpt-3.5-turbo-0301 | model. | | This is visible in the "first attempt" portion of each | result, before GPT gets a second chance to edit the code. | Look at the horizontal white line in the middle of the | first three blue bars. Performance with the whole edit | format was 46% for the February model and only 39% for the | June models. | | But also note how much the solid green diff bars degrade | between the February and June GPT-3.5 models. They drop | from 30% down to about 19%. | | I saw other signs of this degraded performance in earlier | versions of the benchmark as well. | atleastoptimal wrote: | The capability of the latest model will be like a Shepard tone: | always increasing, never improving. Meanwhile their internal | version will be 100x better with no filtering. | sashank_1509 wrote: | I feel like its code generation abilities have also been | nerfed. In the past I got almost excellent code from GPT-4; | somehow these days I need multiple prompts to get the code I | want from GPT-4.
| stuckkeys wrote: | Not nerfed. They will sell a different tier of service to assist | with coding. Coming soon. Speculating ofc. | londons_explore wrote: | In the API, you can select to use the 14th March 2023 version | of GPT-4, and then compare them side by side. | santiagobasulto wrote: | I felt the same thing. The first version of GPT-4 I tried was | crazy smart. Scary smart. Something happened afterwards... | moffkalast wrote: | The even more interesting part is that none of us got to try | the internal version which was allegedly yet another step | above that. | politician wrote: | Oh, it's not too hard to see how the spend that Microsoft | put into building the data centers where GPT-4 was trained | attracted national security interest even before it went | public. The fact that they were even allowed to release it | publicly is likely due to its strategic deterrence effect | and that they believed the released version was already a | dumbed-down version. | | The fact that rumors about GPT-5 were quickly suppressed | and the models were dumbed down even more cannot be | entirely explained by excessive demand. I think it's more | likely that GPT-3.5 and GPT-4 demonstrated unexpected | capabilities in the hands of the public, leading to a pull | back. Moreover, Sam Altman's behavior changed dramatically | between the initial release and a few weeks afterward -- | the extreme optimism of a CEO followed by a more subdued, | even cowed, demeanor despite strong enthusiasm from | end-users. | | OpenAI cannot do anything without Microsoft's data center | resources, and Microsoft is a critical defense contractor. | | Anyway, personally, I'm with the crowd that thinks we're | about to see a Cambrian explosion of domain-specific expert | AIs. I suspect that OpenAI/Microsoft/Gov is still trying to | figure out how much to nerf the capability of GPT-3.5 to | tutor smaller models (see "Textbooks are all you need") and | that's why the API is trash.
| santiagobasulto wrote:
| True. The one that is referenced in that "ChatGPT AGI"
| youtube video, right? The one from an MS researcher that has
| probably been recommended to all of us. Good video btw.
| kossTKR wrote:
| Would gladly pay more for a non-nerfed version if they were
| actually honest.
| |
| The current version is close to the original 3.5, while 3.5
| itself has become horribly bad. It's such a scam not to
| disclose what's going on, especially for a paid service.
| [deleted]
| aeyes wrote:
| I was playing with the API and found that it returned better
| answers than ChatGPT. ChatGPT isn't even able to solve simple
| Python problems anymore, even if you try to help it. And some
| time ago it did these same problems with ease.
| |
| My guess is that they began to restrict ChatGPT because they
| can't sell that. They probably want to sell you CodeGPT or
| other products in the future, so why would they give that
| away for free? ChatGPT is just a teaser.
| it_citizen wrote:
| I keep reading "GPT4 got nerfed," but I have been using it
| from day 1, and while it definitely gives some bad answers, I
| cannot say for sure that it was nerfed.
| |
| Is there any actual evidence other than users' subjective
| experiences?
| mike_hearn wrote:
| ChatGPT is definitely more restricted than the API. Example:
| |
| https://news.ycombinator.com/item?id=36179783
| azemetre wrote:
| That's disappointing, I thought ChatGPT WAS using the API.
| I mean, what's the point of paying if you don't get similar
| levels of quality?
| mike_hearn wrote:
| I thought that too. It's certainly how they present it.
| But, apparently not.
| fredoliveira wrote:
| ChatGPT doesn't use the API. It uses the same underlying
| model with a bunch of added prompts (and possibly
| additional fine-tuning?) to make it conversational.
| |
| One would pay because what they get out of chatGPT
| provides value, of course.
Keep in mind that the users of
| these 2 products can be (and in fact are) different --
| chatGPT is a lot friendlier (from a UX perspective) than
| using the API playground (or using the API itself).
| redox99 wrote:
| They are comparing text-davinci-003 with ChatGPT, which
| presumably uses gpt-3.5-turbo, so quite different models.
| |
| They are killing text-davinci-003 btw.
| londons_explore wrote:
| I think the clearest evidence is Microsoft's paper where they
| show abilities at various stages during training[1]... But in
| a talk [2], they give more details... The unicorn gets
| _worse_ during the fine-tuning process.
| |
| [2]: https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1392s
| |
| [1]: https://arxiv.org/abs/2303.12712
| it_citizen wrote:
| Thanks, that's interesting.
| |
| Noobie follow-up question: should we put any trust in
| "Sparks of Intelligence"? I thought it was regarded as a
| Microsoft marketing piece, not a serious paper.
| londons_explore wrote:
| The data presented is true... The text might be rather
| exaggerated/unscientific/marketing...
| |
| Also notable that the team behind that paper wasn't
| involved in designing/building the model, but they did
| get access to prerelease versions.
| ChatGTP wrote:
| I don't trust it because not enough third parties were able
| to verify the findings.
| |
| This is the double-edged sword of being so ridiculously
| closed.
| hungrigekatze wrote:
| See my comment elsewhere on this post. Greg Brockman, head of
| strategic initiatives at OpenAI, was talking at a round-table
| discussion in Korea a few weeks ago about how they had to
| start using the quantized (smaller, cheaper) model earlier in
| 2023. I noticed a switch in March 2023, with GPT-4
| performance being severely degraded after that for both
| English-language and code-related tasks (reading and
| writing).
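[ed.: quantization, mentioned in the comment above as a cost-saving measure, trades numerical precision for memory and compute. A toy sketch of symmetric int8 weight quantization -- not OpenAI's actual scheme, which is unknown -- showing the rounding error it introduces; function names are illustrative.]

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] via one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero weights
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Map the int8 codes back to floats; information lost to rounding stays lost."""
    return [q * scale for q in quantized]

weights = [0.31, -1.27, 0.005, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each restored weight differs from the original by at most ~scale/2,
# so small weights (here 0.005) can be wiped out entirely.
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

Across billions of weights, these per-weight rounding errors are what can make a quantized model cheaper to serve yet subtly worse, consistent with the degradation users describe.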
| dr-detroit wrote:
| [dead]
| ren_engineer wrote:
| Recently people have claimed GPT4 is an ensemble model with 8
| different models under the hood. My guess is that the
| "nerfing" (I've noticed it as well at random times) is when a
| request gets routed to the wrong underlying model.
| merpnderp wrote:
| It's the continued alignment with fine-tuning that's degrading
| its responses.
| |
| You can apparently have it be nice or smart, but not both.
| vbezhenar wrote:
| Why would someone care if it's nice or not? It's an algorithm.
| You're using it to get output, not to get psychological help.
| moffkalast wrote:
| OpenAI presumably cares about being sued if it provides the
| illegal content they trained it on.
| staticman2 wrote:
| There was a guy in the news who asked an AI to tell him it
| was a good idea to commit suicide, then he killed himself.
| |
| Even on this forum I've seen AI enthusiasts claiming AI
| will be the best psychologist, best school teacher, etc.
| interstice wrote:
| Curious as to whether there's a more general rule at play
| there about filtering interfering with getting good answers.
| If there is, that's a scary thought from an ethics
| perspective.
| jondwillis wrote:
| I hit rate limits and "model is busy with other requests"
| frequently while just developing a highly concurrent agent
| app. Especially with the dated (e.g. -0613) or new -16k
| models. ___________________________________________________________________ (page generated 2023-07-06 23:00 UTC)
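[ed.: the rate-limit and "model is busy" errors described in the last comment are conventionally handled with jittered exponential backoff. A generic sketch; `RuntimeError` stands in for whatever rate-limit exception your client library raises, and the `sleep` parameter is injected only so the pattern can be exercised without real waiting.]

```python
import random
import time

def with_backoff(call, retries=5, base=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call()` on rate-limit style errors with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:  # stand-in for the client's rate-limit error
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # jitter avoids herding
```

With many concurrent agent requests, capping concurrency alongside backoff usually matters as much as the retries themselves.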