[HN Gopher] New models and developer products ___________________________________________________________________ New models and developer products Author : kevin_hu Score : 312 points Date : 2023-11-06 18:17 UTC (2 hours ago) (HTM) web link (openai.com) (TXT) w3m dump (openai.com) | alach11 wrote: | There are a lot of huge announcements here. But in particular, | I'm excited by the Assistants API. It abstracts away so many of | the routine boilerplate parts of developing applications on the | platform. | gregorym wrote: | how so? | simonw wrote: | The new assistants API looks both super-cool and (unfortunately) | a recipe for all kinds of new applications that are vulnerable to | prompt injection. | burcs wrote: | Do you see a way around prompt injection? It feels like any | feature they release is going to be susceptible to it. | minimaxir wrote: | I suspect OpenAI's black box workflow has some safeguards for | it. | sillysaurusx wrote: | Still, safeguards are quite a lot less safe than if | statements. We live in interesting times. | | I don't think there's any way to guarantee safety from | prompt injection. The most you can do is make a | probabilistic argument. Which is fine; there are plenty of | those, and we rely on them in the sciences. But it'll be | difficult to quantify. | | CS majors will find it pretty alien. The blockchain was one | of the few probabilistic arguments we use, and it's | precisely quantifiable. This one will probably be empirical | rather than theoretical. | bluecrab wrote: | Use an llm to evaluate the input and categorise it. | alexander2002 wrote: | With great power comes great responsibility! | minimaxir wrote: | Most of the products announced (and the price cuts) appear to be | more about increasing lock-in to the OpenAI API platform, which | is not surprising given increased competition in the space. 
The | GPTs/GPT Agents and Assistants demos in particular showed that | they are a black box within a black box within a black box that | you can't port anywhere else. | | I'm mixed on the presentation and will need to read the fine | print on the API docs on all of these things, which have been | updated just now: https://platform.openai.com/docs/api-reference | | The pricing page has now updated as well: | https://openai.com/pricing | | Notably, the DALL-E 3 API is $0.04 _per image_ which is an order | of magnitude above everyone else in the space. | | EDIT: One interesting observation with the new OpenAI pricing | structure not mentioned during the keynote: finetuned ChatGPT 3.5 | is now 3x of the cost of the base ChatGPT 3.5, down from 8x the | cost. That makes finetuning a more compelling option. | visarga wrote: | Mistral + 2 weeks of work from the community. Not as good, but | private and free. It will trail OpenAI by 6-12 months in | capabilities. | coder543 wrote: | OpenAI offering 128k context is very appealing, however. | | I tried some Mistral variants with larger context windows, | and had very poor results... the model would often offer | either an empty completion or a nonsensical completion, even | though the content fit comfortably within the context window, | and I was placing a direct question either at the beginning | or end, and either with or without an explanation of the task | and the content. Large contexts just felt broken. There are | so many ways that we are more than "two weeks" from the open | source solutions matching what OpenAI offers. | | And that's to say nothing of how far behind these smaller | models are in terms of accuracy or instruction following. | | For now, 6-12 months behind also isn't good enough. In the | uncertain case that this stays true, then a year from now the | open models could be perfectly adequate for many use cases... | but it's very hard to predict the progression of these | technologies. 
| pclmulqdq wrote: | Comparing a 7B parameter model to a 1.8T parameter model is | kind of silly. Of course it's behind on accuracy, but it | also takes 1% of the resources. | coder543 wrote: | The person I replied to had decided to compare Mistral to | what was launched, so I went along with their comparison | and showed how I have been unsatisfied with it. But, | these open models can certainly be fun to play with. | | Regardless, where did you find 1.8T for GPT-4 Turbo? The | Turbo model is the one with the 128K context size, and | the Turbo models tend to have a much lower parameter | count from what people can tell. Nobody outside of OpenAI | even knows how many parameters regular GPT-4 has. 1.8T is | one of several guesses I have seen people make, but the | guesses vary significantly. | | I'm also not convinced that parameter counts are | everything, as your comment clearly implies, or that | chinchilla scaling is fully understood. More research | seems required to find the right balance: | https://espadrine.github.io/blog/posts/chinchilla-s- | death.ht... | danielmarkbruce wrote: | It's an order of magnitude comparison. | | Let's just agree it's 100x-300x more parameters, and | let's assume the open ai folks are pretty smart and have | a sense for the optimal number of tokens to train on. | razodactyl wrote: | This definitely. Andrej Karpathy himself mentions tuned | weight initialisation in one of his lectures. The TinyGPT | code he wrote goes through it. | | Additionally explanations for the raw mathematics of log | likelihoods and their loss ballparks. | | Interesting low-level stuff. These researchers are the | best of the best working for the company that can afford | them working on the best models available. | razodactyl wrote: | Nah, it's training quality and context saturation. 
| | Grab an 8K context model, tweak some internals and try to | pass 32K context into it - it's still an 8K model and | will go glitchy beyond 8K unless it's trained at higher | context lengths. | | Anthropic, for example, talks about the model's ability to | spot words in the entire Great Gatsby novel loaded into | context. It's a hint to how the model is trained. | | Parameter counts are a unified metric; what seems to be | important is embedding dimensionality to transfer | information through the layers - and the layers | themselves to both store and process the nuance of | information. | spankalee wrote: | A friend of mine is building Zep (https://www.getzep.com/), | which seems to offer a lot of the Assistant + Retrieval | functionality in a self-hostable and model-agnostic way. That | type of project may be the way around lock-in. | davidbarker wrote: | Also, DALL·E 3 "HD" is double the price at $0.08. I'm curious | to play around with it once the API changes go live later | today. | | The docs say: | | > By default, images are generated at standard quality, but | when using DALL·E 3 you can set quality: "hd" for enhanced | detail. Square, standard quality images are the fastest to | generate. | | https://platform.openai.com/docs/guides/images/usage | faeriechangling wrote: | It's a good strategy. For me, avoiding the moat means either a | big drop in quality and just ending up in somebody else's moat, | or a big drop in quality and a lot more money spent. I've | looked into it and maybe the most practical end-to-end system | for owning my own LLM is to run a couple of 3090s on a consumer | motherboard at substantial running cost to keep them up 24/7, | and that's not powerful enough to cut it and rather expensive | simultaneously. For a bit more expense, you can get more | quality and lower running costs and much slower processing from | buying a 128GB/192GB Apple silicon setup, and that's much much | much slower than the "Turbo" services that OpenAI offers. 
| | I think the biggest thing pushing me away from OpenAI was they | were subsidizing the chat experience much more than the API and | this seems to reconcile that quite a bit. Quite simply, OpenAI | is sweetening the pot here too much for me to really ignore; | this is a massively subsidized service. I honestly don't feel | the switching costs in the future will outweigh the benefits | I'm getting now. | ebiester wrote: | I don't understand the lock-in argument here. Yes, if a | competitor comes in there will be switching cost as everything | is re-learned. However, from a code perspective, it is a | function of the key and a relatively small API. New regulations | outstanding, what is stopping someone from moving from OpenAI to | Anthropic (for example) other than the cost of learning how to | effectively utilize Anthropic for your use case? | | OpenAI doesn't have some sort of egress feed for your database. | pclmulqdq wrote: | I sometimes wonder how much OpenAI pays for people to post | arguments about how great they are on HN, because it looks | like you are pretty much right. There isn't a ton about | OpenAI that is actually sticky. | minimaxir wrote: | I most definitely am not paid by OpenAI and am very | confused how my original (critical) comment could be seen | as astroturfing. | airstrike wrote: | _> Please don't post insinuations about astroturfing, | shilling, brigading, foreign agents, and the like. It | degrades discussion and is usually mistaken. If you're | worried about abuse, email hn@ycombinator.com and we'll | look at the data._ | | https://news.ycombinator.com/newsguidelines.html | minimaxir wrote: | > OpenAI doesn't have some sort of egress feed for your | database. | | That's what they're trying to incentivize, especially with | being able to upload files for their own implementation of | RAG. 
You're not getting the vector representation of those | files back, and switching to another provider will require | rebuilding and testing that infrastructure. | vsareto wrote: | >The GPTs/GPT Agents and Assistants demos in particular showed | that they are a black box within a black box within a black box | that you can't port anywhere else. | | This just rings hollow to me. We lost the fights for database | portability, cloud portability, payments/billing portability, | and other individual SaaS lock-in. I don't see why it'll be | different this time around. | activescott wrote: | I think it's more about finding places to add value than "lock | in" per se. It seems they're adding value with improved | developer experience and cost/performance rather than on the | models themselves. Not necessarily nefarious attempts to lock | in customers, but it may have the same outcome :) | crakenzak wrote: | The 128k context window GPT-4 Turbo model looks unreal. Seems | like Anthropic's day of reckoning is here? | infecto wrote: | Anthropic never even had a day. I said this before in another | Anthropic thread but I signed up 6 months ago for API access | and they never responded. An employee in that thread apologized | and said to try again, did it, week later still nothing. As far | as commercial viability, they never had it. | QkPrsMizkYvt wrote: | same here. I wonder why they are not opening it up to more | devs. Seems strange. | freedomben wrote: | Purely a guess, but having tried to scale services to new | customers, it can be a lot harder than it seems, especially | if you have to customize anything. Early on, doing a | generic one-size-fits-all can be really, really hard, and | acquiring those early big customers is important to | survival and often requires customizations. | og_kalu wrote: | Yeah i know this wasn't the case for everyone but i got gpt-4 | access back in march the next day. Tried Claude and still | waiting. Oh well lol. 
| taf2 wrote: | I got access to Claude 2 - it's really good, and I have been | chatting with their sales team. Seems they were reasonably | responsive, but overall with OpenAI 128k context and price | Anthropic has no edge | bluecrab wrote: | They can't even compete with open source since multiple | platforms have APIs available. | a_wild_dandan wrote: | Anthropic's $20 billion valuation is buck wild, especially to | those who've used their "flagship" model. The thing is | insufferable. David Shapiro sums it up nicely.[1] Fighting | tools is horrendous enough. Those tools also deceiving and | lecturing you regarding benign topics is inexcusable. I suspect | that this behavior is a side-effect of Anthropic's fetishistic | AI safety obsession. I further suspect that the more one | brainwashes their agent into behaving "acceptably", the more it'll | backfire with erratic and useless behavior. Just like with | humans, the antidote to harmful action is _more_ free thought | and education, not less. Punishment methods rooted in fear and | insecurity will result in fearful and insecure AI (i.e., | ironically creating the _worst_ outcome we're all trying to | avoid). | | [1] https://www.youtube.com/watch?v=PgwpqjiKkoY | machdiamonds wrote: | Anthropic doesn't care about consumer products. Their CEO | believes that the company with the best LLM by 2026 will be too | far ahead for anyone else to catch up. | topicseed wrote: | 128,000 token context, Assistants API, JSON mode, April 2023 | knowledge cutoff, GPT-4 Turbo, lower pricing, custom GPTs, a good | bunch of announcements all-round! | | https://openai.com/pricing | TIPSIO wrote: | That map/travel demo was insane. Trying to find the demo again. | topicseed wrote: | It was, but most of that functionality was within the "function | calling", not really within the assistant as a top 10 of Paris | sights isn't really that crazy. Plotting these on a map is the | key part which is still your own code, not GPT-based. 
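A rough sketch of the split topicseed describes, where the model only produces a function name plus JSON arguments and the application's own code does the plotting. Everything here is illustrative: `plot_markers`, the `HANDLERS` registry, and the shape of `example_call` are assumptions made for this example, not OpenAI's actual schema or the code used in the keynote demo.

```python
import json

# Hypothetical local handler: a real app would drop pins on a map widget here.
def plot_markers(places):
    return [f"pin:{p['name']}@{p['lat']},{p['lon']}" for p in places]

# Registry of functions the application exposes to the model (names are illustrative).
HANDLERS = {"plot_markers": plot_markers}

def dispatch(tool_call):
    """Run the local function named in a model-produced call.

    `tool_call` carries a function name and a JSON-encoded argument string.
    The exact wire format is an assumption, not a copy of any real API.
    """
    name = tool_call["name"]
    if name not in HANDLERS:
        raise ValueError(f"unknown function: {name}")
    args = json.loads(tool_call["arguments"])
    return HANDLERS[name](**args)

# Stand-in for what a model might emit after being asked for sights in Paris.
example_call = {
    "name": "plot_markers",
    "arguments": json.dumps({
        "places": [
            {"name": "Eiffel Tower", "lat": 48.8584, "lon": 2.2945},
            {"name": "Louvre", "lat": 48.8606, "lon": 2.3376},
        ]
    }),
}

pins = dispatch(example_call)
print(pins)
```

The point of the registry is that the model never executes anything itself; unknown names are rejected, and the map logic stays in code you control.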
| rictic wrote: | Turning an airline receipt pdf into a well structured | function call is very nice. | dnadler wrote: | This might also be a bit easier than it seems. I've done | similar (though not nearly as nice of a UI) with | `unstructured`. | davidbarker wrote: | https://www.youtube.com/live/U9mJuUkhUzk?t=2006 | | (Timestamp 33:26) | | Edit: updated the timestamp | brunoqc wrote: | ~~wat? the video is 45:35 long.~~ | davidbarker wrote: | Oh! When I replied it was a lot longer -- it still had the | countdown from before the stream went live. I guess they | replaced it with the trimmed version. | brunoqc wrote: | Thanks! | WanderPanda wrote: | Yep I feel like they solved the problem that Apple never | managed to solve with Siri: How to interface it with apps. | Seems like this was an LLM-hard problem | freedomben wrote: | My guess is an LLM-based Siri is right around the corner. | Apple commonly waits for tech to be proved by others before | adopting it, so this would be in-line with standard operating | procedures. | singularity2001 wrote: | My guess is that LLM-Siri will be crippled by internal | processes and lawyers | glass-z13 wrote: | One step closer to augmenting day to day internet browsing with | the announcement of the GPT's | vineet wrote: | The Assistants API is really cool. Together with the retrieval | feature, it makes me wonder how many companies OpenAI killed by | creating it. | modeless wrote: | Whisper V3 is released! | https://github.com/openai/whisper/commit/c5d42560760a05584c1... | | Looks like it's just a new checkpoint for the large model. It | would be nice to have updates for the smaller models too. But | it'll be easy to integrate with anything using Whisper V2. 
I'm | excited to add it to my local voice AI | (https://www.microsoft.com/store/apps/9NC624PBFGB7) | | I assume ChatGPT voice has been using Whisper V3 and I've noticed | that it still has the classic Whisper hallucinations ("Thank you | for watching!"), so I guess it's an incremental improvement but | not revolutionary. | ianbicking wrote: | Do you also get those hallucinations just on silence? | | I kind of wonder if they had a bunch of training data of video | with transcripts, but some of the video/audio was truncated and | the transcript still said the last speech, and so now it thinks | silence is just another way of signing off from a TV program. | | IMHO the bottleneck on voice now is all the infrastructure | around it. How do you detect speech starting and stopping? How | do you play sound/speech while also being ready for the user to | speak? This stuff is necessary, but everything kind of works | poorly, and you really need hardware/software integration. | modeless wrote: | You're right, I think that's exactly what happened. | | Silence is when you get the most hallucinations. But there is | a trick supported by some implementations that helps a lot. | Whisper does have a special <|nospeech|> token that it | predicts for silence. You can look at the probability of that | token even when it's not picked during sampling. | Hallucinations often have a relatively high probability for | the nospeech token compared to actual speech, so that can | help filter them out. | | As for all the surrounding stuff like detecting speech | starting and stopping and listening for interruptions while | talking, give my voice AI a try. It has a rough first pass at | all that stuff, and it needs a lot of work but it's a start | and it's fun to play with. Ultimately the answer is end-to- | end speech-to-speech models, but you can get pretty far with | what we have now in open source! | Void_ wrote: | Too bad they didn't upgrade Whisper API yet. 
Can't wait to make | it available in https://whispermemos.com | dang wrote: | Related: | | _OpenAI releases Whisper v3, new generation open source ASR | model_ - https://news.ycombinator.com/item?id=38166965 | zavertnik wrote: | And here I was in bliss with the 32k context increase 3 days ago. | 128k context? Absolutely insane. It feels like now the | bottleneck in GPT workflows is no longer GPT, but instead it's the | wallet! | | Such an amazing time to be alive. | naiv wrote: | now with the prices reduced so much even the wallet might not | be the bottleneck anymore | in3d wrote: | For GPT-4 Turbo, not GPT-4. | dragonwriter wrote: | GPT-4-Turbo seems to be replacing GPT-4 (non-turbo); the | GPT-4 (non-turbo) model is marked as "Legacy" in the model | list. | | EDIT: the above is corrected, it previously erroneously said | the non-turbo model was marked as "deprecated", which is a | different thing. | kridsdale3 wrote: | Yes, nowhere in the text today was there any assertion that | Turbo produces (e.g.) source code at the same level of | coherence and consistently high quality as GPT4. | marban wrote: | Comment will not age well. | Swizec wrote: | > 128k context? Absolutely insane | | 128k context is great and all, but how effective are the middle | 100,000 tokens? LLMs are known to struggle with remembering | stuff that isn't at the start or end of the input. This is known as the | "lost in the middle" problem. | | https://arxiv.org/abs/2307.03172 | saliagato wrote: | sama said they improved it | robertkoss wrote: | Does anyone know when this will be coming to Azure OpenAI? | kasetty wrote: | I would also be interested in knowing when these show up in | Azure OpenAI offerings. | Onawa wrote: | If Azure's history when rolling out GPT-4 is any indication, | probably a couple months and/or a staged rollout. | robertkoss wrote: | Is Azure adoption really that slow? Ugh. | Zaheer wrote: | The playbook OpenAI is following is similar to AWS. 
Start with | the primitives (Text generation, Image generation, etc / EC2, S3, | RDS, etc) and build value add services on top of it (Assistants | API / all other AWS services). They're miles ahead of AWS and | other competitors in this regard. | gumballindie wrote: | And just like amazon they will compete with their own | customers. They are miles ahead in this regard as well since | they basically take everyone's digital property and resell it. | sharemywin wrote: | don't hate the player hate the game. | chipgap98 wrote: | The Assistants API and OpenAI Store are really interesting. Those | are the types of things that could build a moat for OpenAI | visarga wrote: | You think it is hard to export an agent? It's a master prompt, | a collection of documents and a few generic plugins like | function calling and code execution. This will be implemented | in open source soon. You can even fine-tune on your bot logs. | WanderPanda wrote: | Agreed, the moat are the models (as an extension of the | instruction tuning data) | chipgap98 wrote: | The Assistants playground doesn't seem to be available yet | singularity2001 wrote: | https://chat.openai.com/gpts/editor | | you currently do not have access to this feature :( | cryptoz wrote: | For DALL-E 3, I'm getting "openai.error.InvalidRequestError: The | model `dall-e-3` does not exist." is this for everyone right now? | Maybe it's gonna be out any minute. | | I see the python library has an upgrade available with breaking | changes, is there any guide for the changes I'll need to make? | And will the DALL-E 3 endpoint require the upgrade? So many | questions. | | Edit: Oh I see, | | > We'll begin rolling out new features to OpenAI customers | starting at 1pm PT today. 
| minimaxir wrote: | The documentation/READMEs in the GitHub repo was updated to | play nice with the new v1.0.0 of the package: | https://github.com/openai/openai-python/ | cryptoz wrote: | Aha, makes sense, thanks :) | davio wrote: | Stream of keynote: https://youtu.be/U9mJuUkhUzk?t=1806 | WanderPanda wrote: | Does anyone have an idea why they are so open about Whisper? Is | it the poster child project for OAI people scratching their open | source itch? Is there just no commercial value in speech to text? | htrp wrote: | speech to text is a relatively crowded area with a lot of other | companies in the space. Also really hard to get "wow" | performance as it's either correct (like most other people's | models) or it's wrong | teaearlgraycold wrote: | Everyone's got a loss leader | freedomben wrote: | I've been wondering this as well. I'm super glad, but it seems | so different than every _other_ thing they do. There 's | _definitely_ commercial value, so I find it surprising. | StanAngeloff wrote: | I personally use Whisper to transcribe painfully long meetings | (2+ hours). The transcripts are then segmented and, you guessed | it, entered right into GPT-4 for clean up, summarisation, | minutes, etc. So in a sense it's a great way to get more people | to use their other products? | htrp wrote: | We need some independent benchmarks (LLM elo via chatbot arena | etc) about how gpt4 Turbo compares to gpt4. | freedomben wrote: | Text to Speech is exciting to me, though it's of course not | particularly novel. I've been creating "audiobooks" for personal | use for books that don't have a professional version, and despite | high costs and meh quality have been using AWS. | | Has anybody tried this new TTS speech for longer works and/or | things like books? 
Would love to hear what people think about | quality | dang wrote: | Related ongoing threads: | | _GPTs: Custom versions of ChatGPT_ - | https://news.ycombinator.com/item?id=38166431 | | _OpenAI releases Whisper v3, new generation open source ASR | model_ - https://news.ycombinator.com/item?id=38166965 | | _OpenAI DevDay, Opening Keynote Livestream [video]_ - | https://news.ycombinator.com/item?id=38165090 | QkPrsMizkYvt wrote: | Most of the API docs were updated, but none of the new APIs work | for me. Are other people experiencing the same? | davidbarker wrote: | They will start rolling out at 1pm PST today. | QkPrsMizkYvt wrote: | got it - thanks | QkPrsMizkYvt wrote: | nice it is live now! | willsmith72 wrote: | If they could roll back the extreme rate-limiting on dalle 3 in | gpt4, that would be great. | kelseyfrog wrote: | JSON mode is a great step in the right direction, but the holy | grail is either JSON-schema support or (E)BNF grammar | specification. | minimaxir wrote: | The function calling is JSON Schema support but extremely | poorly marketed. I am planning on writing a blog post about it. | danenania wrote: | Yeah I'm not sure I see the point of "JSON mode", in its | current iteration at least, considering function calling | already does this more effectively. | | I suppose it could help to make simpler API calls and save | some prompt tokens, but it would definitely need schema | support to really be useful. | minimaxir wrote: | It makes it a bit easier to parse returned tabular data, | anyways. | | I'll be curious to see if it can handle outputting nested | data without prompting. | Wherecombinator wrote: | Is this just for the API for now? | | I just got premium the other day for ChatGPT 4 and have been | blown away. I'm wondering if I'll automatically get turbo when | it's released? 
| tornato7 wrote: | GPT-4 Turbo is already available by default in ChatGPT | kvn8888 wrote: | I can't find anything that says it's available in ChatGPT | dragonwriter wrote: | ChatGPT (at least in Plus) with the GPT-4 model | selected (instead of GPT-3.5) currently consistently | reports the April 2023 knowledge cutoff of GPT-4-Turbo | (gpt-4-1106-preview/gpt-4-vision-preview) as its knowledge | cutoff, not the Sep 2021 cutoff for gpt-4-0613, the most | recent pre-turbo GPT-4 model release. | | The most sensible explanation is that ChatGPT is using | GPT-4-Turbo as its GPT-4 model. | Topfi wrote: | I am very much looking forward to, but also dreading, testing | gpt-4-turbo as part of my workflow and projects. The lowered cost | and much larger context window are very attractive; however, I | cannot be the only one who remembers the difference in output | quality and overall perceived capability between gpt-3.5 and | gpt-3.5-turbo, combined with the opaque switching from one | model to the other (calling the older, often more capable model | "Legacy", making it GPT+ exclusive, trying to pass off | gpt-3.5-turbo as a straight upgrade, etc.). If the former had | remained available after the latter became dominant, that may not | have been a problem in itself, but seeing as gpt-3.5-turbo has | fully replaced its precursor (both on the Chat website and via | API) and gpt-4 as offered up to this point wasn't a perfect | replacement for plain gpt-3.5 either, relying on these models as | offered by OpenAI has become challenging. 
| | A lot of ink has been spilled about gpt-4 (via the Chat website, | but also more recently via API) seeming less capable over the | last few months compared to earlier experiences and whilst I | still believe that the underlying gpt-4 model can perform at a | similar level to before, I will admit that the sheer amount of | output one can reliably request from these models has become | severely restricted, even when using the API. | | In other words, in my limited experience, gpt-4 (via API or | especially the Chat website) can perform equally well in tasks | and output complexity, but the amount of output one receives | seems far more restricted than before, often harming existing use | cases and workflows. There appears to be a greater tendency to include | comments ("place this here") even when requesting a specific | section of output in full. | | Another aspect that results from their lack of transparency is | communicating the differences between the Chat Website and API. I | understand why they cannot be fully identical in terms of output | length and context window (otherwise GPT+ would be an even bigger | loss leader), but communicating the status quo should not be an | unreasonable request in my eyes. Call the model gpt-4-web or | something similar to clearly differentiate the Chat Website | implementation from gpt-4 and gpt-4-1106-preview via API (the actual name | for gpt-4-turbo at this point in time). As it stands, people like | myself have to always add whether the Chat website or API is what | our experiences arise from, while people who may only casually | experiment with the free Website implementation of gpt-3.5-turbo | may have a hard time grasping why these models create such | intense interest in those more experienced. | doctoboggan wrote: | In the keynote @sama claimed GPT-4-turbo was superior to the | older GPT-4. Have any benchmarks or other examples been shown? I | am curious to see how much better it is, if at all. 
I remember | when 3.5 got its turbo version there was some controversy on | whether it was really better or not. | tornato7 wrote: | A few notes on pricing: | | - GPT-4 Turbo vision is much cheaper than I expected. A 768×768 | px image costs $0.00765 to input. That makes it practical to replace | more specialized computer vision models for many use-cases. | | - ElevenLabs is $0.24 per 1K characters while OpenAI TTS HD is | $0.03 per 1K characters. ElevenLabs still has voice copying but | for many use-cases it's no longer competitive. | | - It appears that there's no additional fee for the 128K context | model, as opposed to previous models that charged extra for the | longer context window. This is huge. | taf2 wrote: | Does this mean OpenAI TTS is available via API? I saw Whisper | but not TTS - maybe I'm missing it? | davidbarker wrote: | It is, indeed! | | https://platform.openai.com/docs/guides/text-to-speech | og_kalu wrote: | The new TTS is much cheaper than ElevenLabs and better too. | | I don't know how the model works so maybe what I'm asking isn't | even feasible but I wish they gave the option of voice cloning or | something similar or at least had a lot more voices for other | languages. The default voices tend to make other language output | have an accent. | | Uh if turbo's the much faster model a few have had access to in | the past week, then pressing x on the "more intelligent than | legacy 4" statement. | obiefernandez wrote: | My profit margins at https://olympia.chat just got 3x better <3 | saliagato wrote: | I think your startup just died | leobg wrote: | Elaine Jusk...lol | whytai wrote: | Every day this video ages more and more poorly [1]. 
| | categories of startups that will be affected by these launches: | | - vectorDB startups -> don't need embeddings anymore | | - file processing startups -> don't need to process files anymore | | - fine tuning startups -> can fine tune directly from the | platform now, with GPT4 fine tuning coming | | - cost reduction startups -> they literally lowered prices and | increased rate limits | | - structuring startups -> json mode and GPT4 turbo with better | output matching | | - vertical ai agent startups -> GPT marketplace | | - anthropic/claude -> now GPT-turbo has 128k context window! | | That being said, Sam Altman is an incredible founder for being | able to have this close a watch on the market. Pretty much any | "ai tooling" startup that was created in the past year was | affected by this announcement. | | For those asking: vectorDB, chunking, retrieval, and RAG are all | implemented in a new stateful AI for you! No need to do it | yourself anymore. [2] Exciting times to be a developer! | | [1] https://youtu.be/smHw9kEwcgM | | [2] https://openai.com/blog/new-models-and-developer-products- | an... | Der_Einzige wrote: | Startups built around actual AI tools, like if one formed | around automatic1111 or oogabooga, would be unaffected, but | because so much VC money went to the wrong places in this | space, a whole lot of people are about to be burned hard. | throwaway-jim wrote: | damn hahaha it's oobabooga not oogabooga | atleastoptimal wrote: | There will be a lot of startups who rely on marketing | aggressively to boomer-led companies who don't know what email | is and hoping their assistant never types OpenAI into Google | for them. | yawnxyz wrote: | i'm excited for the open source, local inferencing tech to | catchup. The bar's been raised. | morkalork wrote: | If you want to be a start-up using AI, you have to be in | another industry with access to data and a market that | OpenAI/MS/Google can't or won't touch. Otherwise you end up | eaten like above. 
| ushakov wrote: | We just launched our AI-based API-Testing tool | (https://ai.stepci.com), despite having competitors like | GitHub Co-Pilot. | | Why? Because they lack specificity. We're domain experts, we | know how to prompt it correctly to get the best results for a | given domain. The moat is having model do one task extremely | well rather than do 100 things "alright" | darkwater wrote: | Sorry to be blunt but they can be totally right, if you do | not succeed and have to shut down your startup. | ushakov wrote: | It certainly will be a fun experience. But our current | belief is that LLMs are a commodity and the real value is | in (application-specific) products built on top of them. | esafak wrote: | If you just launched it is too soon to speak. | ushakov wrote: | Of course! Today our assumption is that LLMs are | commodities and our job is to get the most out of them | for the type of problem we're solving (API Testing for | us!) | sharemywin wrote: | Time will tell | parkerhiggins wrote: | Domain specialization could be the moat, not only in the | business domain but the sheer cost of | deployment/refinement. | | Check out Will Bennett's "Small language models and | building defensibility" - https://will- | bennett.beehiiv.com/p/small-language-models-and... (free | email newsletter subscription required) | renewiltord wrote: | Writer.ai is quite successful, and is totally in another | industry that Google+MS participate in. | colordrops wrote: | I haven't been paying attention, why are embeddings not needed | anymore? | lazzlazzlazz wrote: | OP is incorrect. Embeddings are still needed since (1) | context windows can't contain all data and (2) data | memorization and continuous retraining is not yet viable. | nextworddev wrote: | "yet" | coding123 wrote: | It's also much slower. LLMs are generating text token at | a time. That's not very good for search. | | Pre-search tokenization however, probably a good fit for | LLMs. 
| zwily wrote: | But the common use case of using a vector DB to pull in | augmentation appears to now be handled by the Assistants | API. I haven't dug into the details yet but it appears you | can upload files and the contents will be used (likely with | some sort of vector searching happening behind the scenes). | emadabdulrahim wrote: | I believe their API can be stateful now: | https://openai.com/blog/new-models-and-developer-products- | an... | sharemywin wrote: | Retrieval: augments the assistant with knowledge from outside | our models, such as proprietary domain data, product | information or documents provided by your users. This means | you don't need to compute and store embeddings for your | documents, or implement chunking and search algorithms. The | Assistants API optimizes what retrieval technique to use | based on our experience building knowledge retrieval in | ChatGPT. | | The model then decides when to retrieve content based on the | user Messages. The Assistants API automatically chooses | between two retrieval techniques: | | it either passes the file content in the prompt for short | documents, or performs a vector search for longer documents | Retrieval currently optimizes for quality by adding all | relevant content to the context of model calls. We plan to | introduce other retrieval strategies to enable developers to | choose a different tradeoff between retrieval quality and | model usage cost. | sjnair96 wrote: | Really cool to see the Assistants API's nuanced document | retrieval methods. Do you index over the text besides | chunking it up and generating embeddings? I'm curious about | the indexing and the depth of analysis for longer docs, | like assessing an author's tone chapter by chapter--vector | search might have its limits there. Plus, the process to | shape user queries into retrievable embeddings seems | complex. Eager to hear more about these strategies, at | least what you can spill! 
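The two-way choice quoted by sharemywin above (short documents go straight into the prompt, longer ones go through a vector search) can be sketched roughly as follows. This is only an illustration, not OpenAI's implementation: the 2,000-token cutoff, the 4-characters-per-token estimate, the chunk size, and the toy word-count "embedding" are all assumptions invented for the example.

```python
import math
from collections import Counter

# All constants and helpers below are hypothetical, for illustration only.
SHORT_DOC_TOKEN_LIMIT = 2000   # assumed cutoff for a "short" document
CHUNK_WORDS = 50               # assumed chunk size, in words


def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def embed(text):
    # Toy stand-in for a real embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=CHUNK_WORDS):
    # Split the document into consecutive fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(document, question, top_k=2):
    """Return (strategy, context_pieces) for a question about a document."""
    if estimate_tokens(document) <= SHORT_DOC_TOKEN_LIMIT:
        # Short document: pass the whole thing in the prompt.
        return "full_document", [document]
    # Long document: rank chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(chunk(document), key=lambda c: cosine(embed(c), q), reverse=True)
    return "vector_search", ranked[:top_k]
```

A real system would swap `embed` for an actual embedding model and a proper tokenizer; the point is only the branch between "put it all in the prompt" and "search, then put the best chunks in the prompt".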
| lazzlazzlazz wrote: | Embeddings are still important (context windows can't contain | all data + memorization and continuous retraining is not yet | viable), and vertical AI agent startups can still lead on UX. | Finbarr wrote: | Context windows can't contain all data... yet. | ren_engineer wrote: | depends on how much developers are willing to embrace the risk | of building everything on OpenAI and getting locked onto their | platform. | | What's stopping OpenAI from cranking up the inference pricing | once they choke out the competition? That combined with the | expanded context length makes it seem like they are trying to | lead developers towards just throwing everything into context | without much thought, which could be painful down the road | keithwhor wrote: | I suspect it is in OpenAI's interest to have their API as a | loss leader for the foreseeable future, and keep margins slim | once they've cornered the market. The playbook here isn't to | lock in developers and jack up the API price, it's the | marketplace play: attract developers, identify the highest- | margin highest-volume vertical segments built atop the | platform, then gobble them up with new software. | | They can then either act as a distributor and take a | marketplace fee or go full Amazon and start competing in | their own marketplace. | baq wrote: | Checking hn and product hunt a few times a week gives you most | of that awareness and I don't need to remind you about the | person behind hn 'sama' handle. | bluecrab wrote: | Vector DBs should never have existed in the first place. I feel | sorry for the agent startups though. | m3kw9 wrote: | How does this absolve vectordbs | danielbln wrote: | It doesn't, but semantic search is a lot less relevant if | you can squeeze 350 pages of text into the context. 
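For a sense of what "throwing everything into context" costs, here is a small sketch using the GPT-4 Turbo input price quoted further down the thread ($0.01 per 1K tokens). It assumes a stateless API where the whole history is resent on every call and ignores output-token charges; the function names are made up for illustration.

```python
# Input price as quoted in the thread; treat it as an assumption that may change.
INPUT_PRICE_PER_1K = 0.01  # USD per 1K input tokens


def call_cost(input_tokens, price_per_1k=INPUT_PRICE_PER_1K):
    # Cost of a single API call, counting input tokens only.
    return input_tokens / 1000 * price_per_1k


def session_cost(base_tokens, added_tokens_per_turn, turns):
    # Total input cost of a conversation in which every turn resends
    # the full context, which grows by a fixed amount each turn.
    total = 0.0
    context = base_tokens
    for _ in range(turns):
        total += call_cost(context)
        context += added_tokens_per_turn
    return total
```

With a 100K-token document in context, `call_cost(100_000)` comes to about $1.00 per call, so a ten-turn conversation over the same document lands a little above $10 before any output tokens are counted.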
| dragonwriter wrote: | If you are using OpenAI, the new Assistants API looks like | it will handle internally what you used to handle | externally with a vector DB for RAG (and for some things, | GPT-4-Turbo's 128k context window will make it unnecessary | entirely.) There are some other uses for Vector DBs than | RAG for LLMs, and there are reasons people might use non- | OpenAI LLMs with RAG, so there is still a role for | VectorDBs, but it shrunk a lot with this. | echelon wrote: | We don't want Open AI to win everything. | blibble wrote: | HN is quite notorious for _that_ Dropbox comment | | I suspect that video is going to end up more notorious, it's | even funnier given it's the VCs themselves | arcanemachiner wrote: | More context, please. | | EDIT: I guess it's this: | | https://news.ycombinator.com/item?id=8863#9224 | blibble wrote: | that's the one | bilsbie wrote: | Why don't you need embedding? | riku_iki wrote: | > - vectorDB startups -> don't need embeddings anymore | | they don't provide embeddings, but storage and query engines for | embeddings, so still very relevant | | > - file processing startups -> don't need to process files | anymore | | curious what is that exactly?.. | | > - vertical ai agent startups -> GPT marketplace | | sure, those startups will be selling their agents on | marketplace | make3 wrote: | they definitely do provide embeddings, | https://openai.com/blog/new-models-and-developer-products- | an... ctrl+f retrieval, "... won't need to ... compute or | store embeddings" | riku_iki wrote: | I mean embeddingsDB startups don't provide embeddings. They | provide databases which allow you to store and query computed | embeddings (e.g. computed by ChatGPT), so they are | complementary services. | larodi wrote: | Well, if said startups were visionaries, they could've known | better the business they're entering. On the other hand - there | are plenty of VC-inflated balloons, making lots of noise, that | everyone would be happy to see go.
If you mean these startups - | well, farewell. | | There's plenty more to innovate, really, saying OpenAI killed | startups it's like saying that PHP/Wordpress/NameIt killed | small shops doing static HTML. or IBM killing the... typewriter | companies. Well, as I said - they could've known better. | Competition is not always to blame. | karmasimida wrote: | TBH those are low-hanging fruits for OpenAI. Much of the value | still being captured by OpenAI's own model. | | The sad thing is, GPT-4 is its own league in the whole LLM | game, whatever those other startups are selling, it isn't | competing with OpenAI. | schrodingerscow wrote: | I'm confused by the pricing. Gpt-4 turbo appears to be better in | every way, but is 3x cheaper?! | dragonwriter wrote: | The same as true of GPT-3.5-turbo compared to the GPT-3 models | which preceded it. | | They want everyone on GPT-4-turbo. It may also be a smaller (or | otherwise more efficient) but more heavily trained model that | is cheaper to do inference on. | tornato7 wrote: | According to [1], the new gpt-4-1106-preview model should be | available to all, but the API is telling me "The model | `gpt-4-1106-preview` does not exist or you do not have access to | it." | | Anyone able to call it from the API? | | 1. https://help.openai.com/en/articles/8555510-gpt-4-turbo | anotherpaulg wrote: | Same. I am eager to run my code editing benchmark [1] against | it, to compare it with gpt-4-0314 and gpt-4-0613. | | Edit: Ha, I just re-read the announcement [2] and it says 1pm | in the 5th sentence: We'll begin rolling out | new features to OpenAI customers starting at 1pm PT today. | | [1] https://aider.chat/docs/benchmarks.html | | [2] https://openai.com/blog/new-models-and-developer-products- | an... | naiv wrote: | rumours on x are that it will be available 1pm san francisco | time | tekacs wrote: | > We'll begin rolling out new features to OpenAI customers | starting at 1pm PT today. 
| | ^ It says exactly this in the linked article. | naiv wrote: | oh, totally overread this :D | reqo wrote: | Didn't the tickets to Dev Day cost around $600? They basically | took that money and gave it back to developers as credits so they | can start using their API today! Pretty smart move! | longnguyen wrote: | Awesome. Adding GPT-4 Turbo and DALL-E 3 to my ChatGPT macOS | client[0] | | [0]: https://boltai.com | gwern wrote: | > We're also launching a feature to return the log probabilities | for the most likely output tokens generated by GPT-4 Turbo and | GPT-3.5 Turbo in the next few weeks, which will be useful for | building features such as autocomplete in a search experience. | | This is very surprising to me. Are they not worried about people | not just training on GPT-4 outputs to steal the model | capabilities, but doing full blown logit knowledge-distillation? | (Which is the reason everyone assumed that they disabled logit | access in the first place.) | leobg wrote: | How many GBs worth of logits would you need to reverse engineer | their model? Also, if it's a conglomerate of models that | they're using, you'd end up in a blind alley. | danielmarkbruce wrote: | I thought the same thing.... My guess is they did a lot of | analysis and decided it would be safe enough to do? "most | likely" might be literally a handful and cover little of the | entire distribution % wise? | saliagato wrote: | You can now [1] pay from $2 to $3 million to pretrain a custom | gpt-n model. This has gone unnoticed but seems really neat. | Provided that a start-up has enough money to spend on that, it would | certainly give a competitive advantage.
| | [1] https://openai.com/form/custom-models | | Edit: forgot to put the link | llmllmllm wrote: | While this makes some of what my startup https://flowch.ai does a | commodity (file uploads and embeddings based queries are an | example, but we'll see how well they do it - chunking and | querying with RAG isn't easy to do well), the lower prices of | models make my overall platform way better value, so I'd say | overall it's a big positive. | | Speaking more generally, there's always room for multiple | players, especially in specific niches. | mediaman wrote: | Their system also does not seem to support techniques like | hybrid search, automated cleaning/modifying of chunks prior to | embedding, or the ability to access citations used, all of | which are pretty important for enterprise search. | | Could just mean it's coming, though. | aantix wrote: | Can I pay someone to have my ChatGPT transcripts searchable? | raylad wrote: | So with 128K context window, if you actually input 100K it would | cost you: | | Input: $0.01 per 1K tokens * 100 = $1.00 | | $1.00 per query? | | Given that each query uses the entire context window, the session | would start at $1 for the first query and go up from there? Or do | I have it wrong? | minimaxir wrote: | It would be $1 for each individual API call, if you were | continuing the conversation based on the same 100K input. | ChatGPT is stateless. | raylad wrote: | Right, so that adds up very fast. | 0xDEF wrote: | If it truly is GPT-4+ with a 128K context window it's still | absolutely worth the high price. However if they are cheating | like everyone else who has promised gigantic context windows | then we are better off with RAG and a vector database. | shanusmagnus wrote: | This is kind of the wrong place for this, but given the burst of | attention from LLM-loving people: is there any open source chat | scaffolding that actually provides a good UI for organizing chat | streams and doing stuff with them? 
| | A trivial example is how the LHS of the ChatGPT UI only allows | you a handful of characters to name your chat, and you can't even | drag the pane to the right to make it bigger; so I have all these | chats with cryptic names from the last eleven months that I can't | figure out wtf they are; and folders are subject to the same | problem. | | Seriously, just being able to organize all my chats would be a | massive help; but there are so many cool things you could do | beyond this! But I've found nothing other than literal clones of | the ChatGPT UI. Is there really nothing? Nobody has made anything | better? | bluecrab wrote: | Also natural language search of the chat history would be | great. | nextworddev wrote: | Organize how? | sharemywin wrote: | tree structure. like email. | shanusmagnus wrote: | That would be one very obvious way and a big improvement | over the current state of affairs. | sharemywin wrote: | I agree why not vector search for history. | davidbarker wrote: | This may not be useful to you, but there are browser extensions | that add a bunch of functionality to ChatGPT. | | The first that comes to mind: | https://chrome.google.com/webstore/detail/superpower-chatgpt... | shanusmagnus wrote: | No joy with the one you linked (can't see what problem that | one is actually solving), but I'll look through browser | extensions -- I hadn't considered that. | ryanklee wrote: | ChatGPT Keeper Chrome extension at least allows for search. | singularity2001 wrote: | did they break the api? | | from openai import OpenAI | | Traceback (most recent call last): File "<stdin>", line 1, in | <module> ImportError: cannot import name 'OpenAI' from 'openai' | | If so where is the current documentation? | ofermend wrote: | Excited to see GPT4-Turbo and longer sequence lengths from | OpenAI. We just released Vectara's "Hallucination Evaluation | Model" (aka HEM) today | https://huggingface.co/vectara/hallucination_evaluation_mode... 
| (along with this leaderboard: | https://github.com/vectara/hallucination-leaderboard). GPT-4 was | already in the lead. Looking forward to seeing GPT4-Turbo there | soon. | m3kw9 wrote: | How many startups got shafted today? | dangrigsby wrote: | Is there a special "developer" designation? I am a paying API | customer, but can't see gpt-4-1106-preview in the playground and | can't use it via the API. | danenania wrote: | Apparently they'll be granting access at 1pm PST. We'll see | what happens. Rate limits also don't seem to be updated yet to | reflect their new "Usage Tiers" - | https://platform.openai.com/docs/guides/rate-limits/usage-ti... | karmajunkie wrote: | As other comments have noted it seems to be rolling out at 1pm | PST today | wilg wrote: | What context length will ChatGPT have on GPT-4-Turbo? It wasn't | using the full 32K before was it? | bluck wrote: | Copyright Shield | | > OpenAI is committed to protecting our customers with built-in | copyright safeguards in our systems. Today, we're going one step | further and introducing Copyright Shield--we will now step in and | defend our customers, and pay the costs incurred, if you face | legal claims around copyright infringement. This applies to | generally available features of ChatGPT Enterprise and our | developer platform. | | So essentially they are giving devs a free pass to treat any | output as free of copyright infringement? Pretty bold when | training data sources are kinda unknown. | fnordpiglet wrote: | It's not unknown to OpenAI, presumably? And I assume the shield | evaporates if their court cases goes against them. | layer8 wrote: | It probably also means having to remain a paying customer as | long as you want that protection to persist for any previous | output. | tyree731 wrote: | I am not a lawyer, but this doesn't seem quite "free". 
Note | that they aren't indemnifying customers for any consequences of | said legal claims, meaning that customers would seem to bear | the full brunt of those consequences should there be a credible | copyright infringement claim. | ShakataGaNai wrote: | For large-scale usage, it doesn't matter what the devs want. If | the lawyers show up and say "We can't use this technology | because we're probably going to get sued for copyright | infringement", it's dead in the water. | | It's a logical "feature" for them to offer this "shield" as it | significantly mitigates one of the large legal concerns to | date. It doesn't make the risks fully go away, but if someone | else is going to step up and cover the costs, then it could be | worthwhile. | | For large enterprises, IP is a big deal, probably the single | biggest concern. They'll spend years and billions of dollars | attempting to protect it, _cough_ SCO/Oracle _cough_, right | or wrong. | conorh wrote: | We just changed a project we've been working on to try out the | new gpt-4-turbo model and it is MUCH faster. I don't know if this | is a factor of the number of people using it or not, but | streaming a response for the prompts we are interested in went | from 40-50 seconds to 6 seconds. | activescott wrote: | It is interesting that the updates are largely developer | experience updates. It doesn't appear that significant | innovations are happening on the core models outside of | performance/cost improvements. Both devex and perf/cost are | important to be sure, but incremental. | Davidzheng wrote: | presumably next model is coming next year? | danielmarkbruce wrote: | 128k context? | layer8 wrote: | The TTS seems really nice, though still relatively expensive, and | probably limited to English (?). I can't wait until that level of | TTS will become available basically for free, and/or self-hosted, | with multi-language support, and ubiquitous on mobile and | desktop.
___________________________________________________________________ (page generated 2023-11-06 21:00 UTC)