[HN Gopher] 100K Context Windows ___________________________________________________________________ 100K Context Windows Author : samwillis Score : 578 points Date : 2023-05-11 16:46 UTC (6 hours ago) (HTM) web link (www.anthropic.com) (TXT) w3m dump (www.anthropic.com) | fire wrote: | god I'd love to work there | tikkun wrote: | This is the first time I've felt like Anthropic may be a true | competitor to OpenAI. | | I see 6 ways to improve foundation LLMs other than cost. If your | product is best at one of the below, and has parity at the other | 5 items, then customers will switch. I'm currently using | GPT-4-8k. I regularly run into the context limit. If Claude-100K | is close enough on "intelligence" then I will switch. | | Six Dimensions to Compare Foundation LLMs: | | 1. Smarter models | | 2. Larger context windows | | 3. More input and output modes | | 4. Lower time to first response token and to full response | | 5. Easier prompting | | 6. Integrations | rizky05 wrote: | [dead] | RobotToaster wrote: | >Six Dimensions to Compare Foundation LLMs | | I'd add open source to the list, which neither "open"AI nor this | is. | ugh123 wrote: | I don't think most of the large customers will care about OSS | AI. Over the last decade they've learned (trained | themselves?) where to put their money (cloud vs. in- | house infra for all manner of things, for better or worse) | and I think AI tools will follow similar trends. | | Businesses will certainly care about cost, but just as | important will be: | | - Customization and fine-tuning capabilities (also 'white | labeling' where appropriate) | | - Integrations (with 3rd party and in-house services & data | stores) | | - SLA & performance concerns | | - Safety features | | Open Source AI will have a place, but may be more towards | personal-use and academic work. And it will certainly drive | competition with the major players (OpenAI, Google, etc) and | push them to innovate more, which is starting to play out now. | ibains wrote: | A lot of B2B startups can technically use the cloud API to | provide value added applications to Enterprises, but often | the banks and healthcare companies will not want their data | running through a startup's pipes to OpenAI's pipes. | | We provide a low code data transformation product | (prophecy.io), and we'll never close sales at any volume, | if we have to get an MSA that approves this. Might get | easier if we become large :) | lannisterstark wrote: | >I don't think most of the large customers will care about | OSS AI | | The problem, again, is centralization of LLMs by either | governments (and they always act in your best interest, | amirite?) or corporations, which Non-FOSS LLMs prevent. | | Democratization of the models is the only way to actually | prevent bad actors from doing bad things. | | "But they'll then have access to it too" you say. Yes, they | will, but given how many more people will also have | access to open LLMs, we'd have tools to prevent actually | malicious acts. | dragonwriter wrote: | > I don't think most of the large customers will care about | OSS AI. | | OSS AI will open up more diverse and useful services than | the first-party offerings from relatively risk averse major | vendors, which customers *will* care about. | simonw wrote: | Here's a really important reason to care about open source | models: prompt engineering is fiddly enough without the | risk of your model provider "upgrading" the model you are | using in a way that breaks your existing prompts.
| | OpenAI already upset a lot of (admittedly non-paying | academic) users when they shut off access to the old Ada | code model with only a few weeks' notice. | danysdragons wrote: | The OpenAI API has model checkpoints, right now the chat | options are: | | gpt-4, gpt-3.5-turbo, gpt-4-0314, gpt-3.5-turbo-0301 | spacebanana7 wrote: | I'm curious about how enterprises will manage model | upgrades. | | On one hand, as you mention, upgrades could break or | degrade prompts in ways that are hard to fix. However, | these models will need constant streams of updates for | bugs and security fixes just like any other piece of | software. Plus the temptation to get better performance. | | The decisions around how and whether to upgrade LLMs will | be much more complicated than upgrading Postgres | versions. | Vecr wrote: | Why would the models themselves need security fixes? The | software running the models, sure, but you should be able | to upgrade that without changing anything observable | about the actual model. | ebiester wrote: | Yes, but I think for most companies this has more to do | with cost. They're not going to pay for the OSS model, and | if they can use an OSS model + fine tuning, they'll choose | to save the money. | hdjjhhvvhga wrote: | > I don't think most of the large customers will care about | OSS AI. | | One would think the same in the 90s, yet for some | reason, Open Source prevailed and took over the world. I | don't believe it was about cost, at least not only. In my | career I had to evaluate many technical solutions and | products and OSS was often objectively superior at several | levels without taking the cost into account. | | The first really successful alternative to "Open"AI will: | | * gather many talented developers | | * quickly become a de facto standard solution | | * people will rapidly start developing a wide range of | integrations for it | | * everybody will be using it, including large orgs, | because, well, it's open source | ugh123 wrote: | True, but the difference here is that running a | performant and capable AI solution will be | infrastructure-dependent, which has real costs. | [deleted] | nullc wrote: | Companies that aren't mindful of vendor lock-in aren't long | for the world. | | Though those cloud platforms all have their own proprietary | components, most users are savvy enough to constrain and | compartmentalize their use of them lest they find | themselves having all their profits taken by a platform | that knows it can set its prices arbitrarily. The cloud vs | in-house adoption is what it is in large part because the | cloud offerings are a commodity and a big part of them | being a commodity is that much of the underlying software | is free software. | deltree7 wrote: | History is littered with companies that went dead because | they focused on things that don't matter (open source, | anti-microsoft, pro-linux). | | There will be a time when those things matter, when it | hurts the bottom line (Dropbox), but to prematurely | optimize for that while you are finding product-market- | fit is crazy, and _all_ companies are finding product- | market-fit in the new AI era. | throwawayadvsec wrote: | now that I think about it | | is it that important to open source models that can only run | on hardware worth tens of thousands of dollars? | | who does that benefit besides their competitors and nefarious | actors?
| | I've been trying to run one of the largest models for a | while; unless $30,000 falls into my hands I'll probably never be | able to run the current SOTA | chrisco255 wrote: | > is it that important to open source models that can only | run on hardware worth tens of thousands of dollars? | | Yes, because as we've seen with other open source AI | models, it's often possible for people to fork code and | modify it in such a way that it runs on consumer grade | hardware. | YetAnotherNick wrote: | I agree the utility of open source for personal use cases is | overblown. | | But for commercial use cases, open source is very relevant | for privacy reasons as many enterprises have strict policies | not to share data with third parties. Also it could be a lot | cheaper for bulk inference or to have a small model for a | particular task. | turtles3 wrote: | However, the same thing could be achieved with closed | source models. There's nothing to stop an LLM being made | available to run on prem under a restrictive license. It | would really be no different to ye olde desktop software | - keeping ownership over bits shipped to a customer is | solved with the law rather than technical means. | | That said, I really hope open source models can succeed, | it would be far better for the industry if we had a Linux | of LLMs. | sanxiyn wrote: | > Keeping ownership over bits shipped to a customer is | solved with the law rather than technical means. | | Yes in theory... In practice, what happened with LLaMA | showed people will copy and distribute weights while | ignoring the license. | chaxor wrote: | They don't only run on high end systems. Good models can | run on a desktop you have at home. If you don't have a | desktop... I'm not sure what you're doing on HN. | circuit10 wrote: | It will create price competition for different providers of | the model though, which should drive down prices | iknowstuff wrote: | Even a small startup, a researcher or a tinkerer can get a | cloud instance with a beefy GPU. Also of note, Apple's M1 | Max/Ultra should be able to run it on their GPUs given | their 64/128GB of memory, right? That's an order of | magnitude cheaper. | mejutoco wrote: | I am confused. Those amounts are RAM, not GPU RAM, aren't | they? Macs' CPUs are impressive, but not for ML. The most | realistic option for a consumer is an RTX 4090 with 24 GB. A lot | of models do not fit in that, so you need an A6000 48GB or above | from the professional cards. That might be around 9000 EUR | already. | codedokode wrote: | > Macs' CPUs are impressive, but not for ML | | On a Mac, the GPU has access to all memory. | piperswe wrote: | Apple Silicon has unified memory - all memory is | accessible to both the CPU and GPU parts of the SoC. | karmasimida wrote: | But don't they top out at a 32GB model? | [deleted] | mkl wrote: | Mac Studio (desktop) is up to 128GB, and Macbook Pro is | up to 96GB. | himlion wrote: | I overlooked the unified memory on those machines. Can it | really run this performantly? | lannisterstark wrote: | "It only benefits bad people" is a pretty shitty argument | at this point tbf. You can apply this logic to any | expensive thing at this point. | | I _can_, for example, afford the hardware worth tens of | thousands of dollars. I don't want to, but I can if I | needed to. Does that automagically make me their competitor | or a bad actor? | fnordpiglet wrote: | Yes, because it can always be down ported by people with | more constraints than the original authors. We've seen a lot | of this in the LLM space, and in a lot of other OSS efforts.
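A rough back-of-the-envelope for the hardware discussion above (mejutoco, iknowstuff, fnordpiglet): what has to fit in GPU or unified memory is dominated by the weights, and quantization is the "down-porting" that lets big models fit consumer hardware. This is a minimal sketch; the 7B/13B/65B sizes are illustrative LLaMA-style figures, not a claim about any particular model, and real usage adds activations and a KV cache that grows with context length.

    # Illustrative only: approximate memory needed just to hold the weights.
    def weight_gib(n_params_billion, bytes_per_param):
        return n_params_billion * 1e9 * bytes_per_param / 2**30

    for name, b in [("7B", 7), ("13B", 13), ("65B", 65)]:
        print(name, round(weight_gib(b, 2)), "GiB fp16,",
              round(weight_gib(b, 0.5)), "GiB 4-bit")
    # 7B:  ~13 GiB fp16, ~3 GiB 4-bit   -> fits a 24 GB RTX 4090 or a 32 GB Mac
    # 65B: ~121 GiB fp16, ~30 GiB 4-bit -> multiple GPUs, or a 64/128 GB
    #                                      unified-memory Mac once quantized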
| RobotToaster wrote: | When linux was first released in 1991 a 386 to run it would | cost about $2000. | | We've already seen big advancements in tools to run them on | lesser hardware. It wouldn't surprise me if we see some big | advancements in the hardware to run them over the next few | years, currently they are mostly being run of graphics | processors that aren't optimised for the task. | dfadsadsf wrote: | $30000 is less than price of average car that Americans buy | (and most families have two of them) - that's definitely in | the realm of something that affluent family can buy if it | provides enough value. I also expect price to go down and | at $10k it's less than mid-range bathroom update. The | question is only if it provides enough value or using in | the cloud better option for almost all families. | overgard wrote: | Considering the very smart people asking for a moratorium on | AI development, and it's potential to disrupt a lot of jobs, | this may be a good thing. | nr2x wrote: | For me I'd say speed trumps all else. It's impossible to truly | reach scale with the glacial response times you get from | current API. | sebzim4500 wrote: | >speed trumps all else | | Then use GPT-2 | nr2x wrote: | I actually do prefer 3.5-turbo over 4 for many tasks. | IshKebab wrote: | Reliability surely? They still haven't managed to make a model | that says "I don't know" rather than bullshitting. That's by | far the biggest unsolved problem. | srowaway2 wrote: | 7. Price! | | GPT4-32K costs ~$2 if you end up using the full 32K tokens, so | if you're doing any chaining or back-and-forth it can get | expensive very quickly. | hesdeadjim wrote: | Oof, got access to the 8k model recently and was wondering | what costs would be on the 32k one. That's brutal. | zomglings wrote: | Also if you allow users to receive vector representations of | context and provide such representations as side information | when querying LLMs. | danenania wrote: | One question is how much other factors really matter compared | to the raw "intelligence" of the model--how good its | completions are. You're not going to care very much about | context window, prompting, or integrations if the output isn't | good. It would be sort of like a car that has the best steering | and brakes on the market, but can't go above 5 mph. | majormajor wrote: | Big question on that for me is that there's a variety of | "completion styles" and I'm curious how "universal" | performance on them is. Probably more than this, but a quick | list that comes to mind: | | * Text summary/compression | | * Creative writing (fiction/lyrics/stylization) | | * Text comparison | | * Question-answering | | * Logical reasoning/sequencing ("given these tools and this | scenario, how would you perform this task") | | IMO, for stuff like text comparison and question-answering, | some combo of speed/cost/context-size could make up for a | lot, even if they do "worse" versions of stuff just that's | too slow or expensive or context-limited in a different | model. | solarkraft wrote: | I don't know. While using Phind I regularly get annoyed by | long prose that doesnt answer anything (yes, "concise" is | always on). Claude seems to be directly geared towards | solving stuff over nice writing. | Tostino wrote: | I generally add to my initial prompts to GPT4 to: From now | on, please use the fewest tokens possible in all replies to | save tokens and provide brief and accurate answers. 
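To make srowaway2's "~$2 if you use the full 32K tokens" figure above concrete, here is a quick sketch assuming OpenAI's published GPT-4-32k list prices at the time ($0.06 per 1K prompt tokens, $0.12 per 1K completion tokens); the numbers are illustrative and prices may change.

    PROMPT_PER_1K, COMPLETION_PER_1K = 0.06, 0.12   # GPT-4-32k list prices

    def call_cost(prompt_tokens, completion_tokens):
        return (prompt_tokens * PROMPT_PER_1K +
                completion_tokens * COMPLETION_PER_1K) / 1000

    # One call that nearly fills the 32K window:
    print(round(call_cost(31_000, 1_000), 2))   # ~$1.98

    # Ten chat turns where the growing history is resent every time:
    history, total = 2_000, 0.0
    for _ in range(10):
        total += call_cost(history, 500)
        history += 1_000                # each user turn + reply gets appended
    print(round(total, 2))              # ~$4.50, mostly spent re-reading history

This is the back-and-forth effect described above: the per-token price looks small, but resending a long context on every turn multiplies it.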
| modernpink wrote: | Or rather, more analogously, a self-driving car that has a | range of 10 000 miles but sometimes makes mistakes when | driving vs a self-driving car with a range of 800 miles that | never makes mistakes. Once you've have a taste of | intelligence it's hard to give up. | | However, in many applications there is a limit on how | intelligent you need the LLM to be. I have found I am able to | fall back to the cheaper and faster GPT-3.5 to do the grunt | work of forming text blobs into structured json within a | chain involving GPT-4 for higher-level functions. | tikkun wrote: | Strongly agree. They are ordered by how much I think they | generally will lead to users choosing one model over the | other. | | Intelligence is the most important dimension by far, perhaps | an order of magnitude or more above the second item on the | list. | danenania wrote: | On that note, can anyone speak to how Anthropic (or other | models) are doing on catching up to OpenAI for pure model | intelligence/quality of completions? Are any others | approaching GPT-4? I've only used GPT-based tools so I have | no idea. | og_kalu wrote: | The best claude model is closer to GPT-4 than 3.5 | jll29 wrote: | More languages? | nico wrote: | Faster, cheaper fine-tuning and training | | If I could train a useful model, on my own data, in a | reasonable time | | I would want to have a CI-training pipeline to always have my | models up to date | makestuff wrote: | Yeah I remember in undergrad I was working on using | transformation learning to train an object detector. | Basically you only needed 100ish images to get the model to | detect that new object really well. | | I'm not sure what the analogous term is for a similar process | on LLMs, but that will be huge when there is a service for | it. | visarga wrote: | LLMs can do that without any examples (zero shot) or with | one or a few demonstrations in the prompt, if you can | describe the task in the limited context window. | | If you want for example to train the model to learn to use | a very large API, or access the knowledge in a whole book, | it might need fine-tuning. | nico wrote: | Could I just train a very small LLM with an English | dictionary + Python + large API documentation + large | Python code base? | | Then do some chat fine tuning (like what HF did with | StarCoder to get ChatCoder) | | And get a lightweight LLM that knows the docs and code | for the thing I need it for | | After that, maybe incrementally fine tune the model as | part of your CI/CD process | toss1 wrote: | How similar were the object to other objects? | | E.g., were you trying to distinguish an object vs nothing, | a bicycle vs a fish, a bird vs a squirrel, or two different | species of songbird at a feeder? | | How much would the training requirements increase or | decrease moving up or down that scale? | ilaksh wrote: | The PaLM 2 stuff released yesterday has fine tuning for their | newest large models as a core feature. | moffkalast wrote: | Until they actually make any of it available in anything but an | obscure expensive API you have to request access to, they might | as well not even exist. | r_thambapillai wrote: | there are many services that integrate with them that would | allow you to self-serve signup | williamstein wrote: | The landing page says "Easy integration via standard APIs | Claude can be incorporated into any product or toolchain | you're building with minimal effort." 
Then there is a big | button "Request Access", which for me right now just does | nothing. OpenAI has really faced the pain to make their | product available via an API to the general public at scale, | but Anthropic/Google/etc. don't quite seem to be there yet. | It's frustrating. | chaxor wrote: | I don't think the person you're responding to wants a | network based or cloud based solution. | | When someone says they want it available they mean running | on their own device. | | This is hackernews, nearly everyone on this site should | have their own self hosted LLM running on a computer/server | or device they have at their house. | | Relying on 'the cloud' for everything makes us worse | developers in just about every imaginable way, creates a | ton of completely unnecessary and complicated source code, | and creates far too many calls to the internet which are | unnecessary. Using local hard drives for example is | thousands of times faster than using cloud data storage, | and we should take advantage of that in the software we | write. So instead of making billions of calls to download a | terabyte database query-by-query (seen this 'industry- | standard' far too many times), maybe make _one_ call and | build it locally. This is effectively the same problem in | LLMs /ML in general, and the same incredible stupidity is | being followed. Download the model once, run your queries | locally. That's the solution we should be using. | akiselev wrote: | Try a browser or a clean profile without any ad blocking | turned on. It took me a couple of tries to figure out how | to get it working but you should see a modal with a form | when it works. | | FYI the waitlist form submits a regular POST request so | it'll reload the main page instead of closing the modal | dialog. I opened network monitor with preserved logs to | double check that I made it on the list :facepalm: | dkarras wrote: | I've been using it through poe and I prefer it to ChatGPT but | can't pinpoint why. It just "gets" me better I guess? | winstonprivacy wrote: | Don't forget the ability to fine tune based on one's own data | sources. For me, this is more important than any of the six | reasons you mentioned. | ianhawes wrote: | We use Anthropic Instant in production and it has been much | faster than Davinci/GPT4 for awhile. In terms of quality, | Instant is at least as good as GPT3.5. | [deleted] | timsuchanek wrote: | Curious what this will mean for the vector db vendors. Imagine | finetuning would be quick and cheap. Could there be a world where | vector dbs aren't needed anymore? | shri_krishna wrote: | 100k context limit is still a limit (we have no idea how | Anthropic is achieving this - if it is extension of the base | model context limit itself or some vector db trickery in the | backend or probably even RAG). Even in this example, though it | could fit entire text of Great Gatsby it still is 1 | book/text/document. Typical business use cases require | searching through hundreds if not thousands of documents/books | and finding similar vector embeddings through all of them and | fetching top-K results (this is how Google search works when it | has to scan through embeddings for billions of websites). These | top-K results can be stuffed into the 100k context limit and | produce an even more holistic picture rather than just stuff | one book/pdf/file into the context. Depends on the requirements | though. 
I don't see how it might affect vector db vendors who | can process billions of vectors per query and provide top-K | results. | | Also having a massive context length is not necessarily a good | thing from perspective of cost. It also doesn't work great with | a chatbot as you will have to feed the same 100k worth context | back into the chatbot for every question which will turn out to | be very expensive. At some point you will have to discard some | parts of the context to be specific to the question being asked | and that is where vector embeddings come into play. For one off | research/Q&A 100k limit works great! | PeterisP wrote: | All I see in the link is empty PR claims - is there any | information about _how_ they 're doing that? There are all kinds | of known techniques that "expand" context window without really | doing so, with different tradeoffs, and unless they provide | actual information, any claims should be taken with a pile of | salt, we shouldn't just assume that they actually have "true" | 100k context windows. | justanotheratom wrote: | This is nice, but it can get quite expensive. | | Let's say I have a book and I want to ask multiple questions | about it. Every query will pay the price of the book's text. It | would be awesome if I could "index" the book once, i.e. pay for | the context once, and then ask multiple questions. | mikrl wrote: | The analogy I can think of here is a pointer, but AFAIK the | context would always need to go along with the prompt unless | you could tweak internal state to bias towards the context. | | Otherwise, it might make sense to have a separate routine which | compresses the context as efficiently as possible. Auto | encoder? | wahnfrieden wrote: | Not sure about this one but you can usually ask multiple | questions in one shot at least | minimaxir wrote: | Generation is more expensive than the prompt input (for | Claude v1, generation is 3x the cost; for GPT-4 it's 2x the | cost) | | It makes the economics slightly trickier. | newhouseb wrote: | I wonder why this is? Naively there's no difference between | the two from a transformer standpoint. | | Perhaps it's because under the hood there's additional | safety analysis/candidate generate that is resource | intensive? | pyth0 wrote: | Normally the inputs are padded out to the context length | [1] and so the cost to embed 1 token or N tokens is the | same. The output is produced token-by-token and so the | amount of GPU time increases with the number of output | tokens. | | [1] I'm not sure if these huge context lengths are | achieved the same way (i.e. a single input vector of | length N) but given the cost is constant for input I | would assume the resource usage is too. | newhouseb wrote: | This doesn't match my mental model (or implemented model | in the case of GPT2) of how self-attention works (you | need to calculate the residual stream for each individual | token, attending to all prior tokens before it). Have a | link? | pyth0 wrote: | I work on infrastructure for serving large language | models but I don't have any background in ML, so my | perspective is looking at these models as a black box | (and also conversations with the people that do the ML | stuff). It is the case in practice at least from a | latency side that with a fixed context length N, | embedding any number of tokens from 0 to N takes the same | amount of time. Perhaps it's a difference between the | conceptual and actual implementation on GPU? 
| | _edit_ - This occurred to me after the fact but I wonder | if the difference is that the use case I work with is | processing batches of many different embedding requests | (but computed in one batch), therefore it has to process | `min(longest embedding, N)` tokens so any individual | request in theory has no difference. This would also be | the case for Anthropic however. | newhouseb wrote: | Ah, you're thinking about embeddings which are basically | the encoder stack on a traditional transformer | architecture. Modern GPT-like models (including Claude), | however, drop the encoder and use decoder-only | architectures. | | I could imagine something where encoders pad up to the | context length because causal masking doesn't apply and | the self attention has learned to look across the whole | context-window. | sebzim4500 wrote: | Everyone serious batches together short prompts so the | cost is roughly proportional to the tokens. | space_fountain wrote: | Well each additional token generated requires rerunning | the model right to find the next likely token given the | previous one | newhouseb wrote: | Naively, yes, but you can cache the bulk of that | "rerunning" [1]. That said the (non-flash) attention | costs go up with the length of the sequence so perhaps | this is just a simpler way to approximate these costs. | | [1] https://kipp.ly/blog/transformer-inference- | arithmetic/ | tikkun wrote: | With embeddings, you essentially can. Group the book into | sections, embed each section, then when you do a prompt, add in | the N most similar embedded sections to your prompt. | adamgordonbell wrote: | What if the question is "What are the main themes of this | work?" | | Or anything where the question answer isn't 'close' to the | words used in the question? | | How well does this work vs giving it the whole thing as a | prompt? | | I assume worse but I'm not sure how this approach compares to | giving it the full thing in the prompt or splitting it into N | sections and running on each and then summarizing. | summarity wrote: | That is solved by hypothetical embeddings. | | Background: https://summarity.com/hyde | | Demo: https://youtu.be/elNrRU12xRc?t=1550 (or try it on | findsight.ai and compare results of the "answer" vs the | "state" filter) | | For even deeper retrieval consider late interaction models | such as ColBERT | akiselev wrote: | Any material comparing the different embedding models? | I'm working on information retrieval from government | documents and without any ML experience it's daunting | jtlicardo wrote: | You pretty much summed up the drawbacks of the embeddings | approach. In my experience it's pretty hard to extract the | relevant parts of text, especially when the text is | uniform. | abraxas wrote: | You could do multi level summaries etc but yeah this is all | just band aids around token limits. | Spivak wrote: | I don't think it's as much of a band-aid as it first | appears since this roughly mimics how a human would do | it. | | The problem is that humans have continuous information | retrieval and storage where the current crop of embedding | systems are static and mostly one shot. | crucialfelix wrote: | Humans have limited working memory, they quickly forget | short term memory (unless it's super significant) and our | long term memory fades selectively if not reactivated or | significant (intense). | | This weird leaky memory has advantages and disadvantages. | Forgetting is useful, it removes garbage. 
| | Machine models could vary the balance of temporal types, | drop out Etc. We may get some weird behavior. | | I would guess we will see many innovations in how memory | is stored in systems like these. | make3 wrote: | Yes, caching the states of the sequence would make sense. An | issue is that it's still more expensive to compute the new | tokens even if you cache the states viewed so far | fdgsdfogijq wrote: | The price on this will plummet over the next few years, the | economic benefits are too large | moffkalast wrote: | The economic benefits of mining asteroids are also too large | to ignore yet here we are, levelling villages to dig for | coal. | | Just a few manufacturers hold the effective cartel monopoly | on LLM acceleration and you best bet they will charge out the | ass for it. | modernpink wrote: | Market competition and innovation in both ML and hardware | has consistently driven down the price of AI in the past | decade. You only have to look at where we are with | capabilities today compared to ten years ago when CIFAR100 | classifiers were the state of the art. | | Barring a Chinese invasion of Taiwan, these APIs will halve | in price over the next year. | [deleted] | moffkalast wrote: | Well here's to hoping I guess. | skybrian wrote: | I'm wondering what level you're thinking. Cloud vendors? | GPU vendors? Fabs? | moffkalast wrote: | Given what's used right now to my knowledge, the main | ones would be Nvidia's tensor cores, Apple's M chips and | Google's cloud TPUs. All of that's TSMC I think? | nr2x wrote: | Yes, but physics trumps economics. | pyth0 wrote: | This more or less is already a thing and it's called RAG | [1][2]. It essentially allows you to have a database of | embeddings (in this case your book) from which a model can pull | knowledge from while producing answers. As for the standard | operation of these generative models, the context window is the | only working memory it has and so it must see the entire text | each time. | | [1] https://arxiv.org/abs/2005.11401 | | [2] https://huggingface.co/docs/transformers/model_doc/rag | m1sta_ wrote: | Cam you help me understand this? The research appears to be | from a few years ago. Can this be used with Claude (for | example)? How is it different to the approach many people are | taking with vector stores and embeddings? | make3 wrote: | it's not different. RAG is a way to train embedding stores | end to end | pyth0 wrote: | Other people seem to be suggesting that the user would do | the retrieval of the relevant parts of the book from a | vectordb first, and then feed those sections along with the | question as the prompt. Conceptually it is very similar | (and it too uses vector database), but with RAG it would | happen as part of the inferencing pipeline and therefore | achieve better performance than the end user emulating it. | [deleted] | helen___keller wrote: | This seems like it could be a game changer. Modern LLM based | applications face a balancing act of context limitations, which | often results in some kind of mapreduce-type behavior when that | context can't fit the input | | If contexts keep growing, the landscape of LLM application | engineering will as well | whimsicalism wrote: | The problem is there are no public benchmarks usually so it is | hard to really compare on long context lengths to see if they | are still performing equally intelligent of tasks. | terabytest wrote: | How does Claude stack up to GPT-4? 
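A minimal sketch of the chunk-embed-retrieve pattern tikkun and pyth0 describe above, assuming the pre-1.0 openai Python client and the text-embedding-ada-002 model; the file name, chunk size, and question are placeholders, not anyone's actual pipeline.

    import numpy as np
    import openai

    book_text = open("gatsby.txt").read()

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    # 1. One-time indexing: split the book and embed every chunk.
    chunks = [book_text[i:i + 2000] for i in range(0, len(book_text), 2000)]
    chunk_vecs = embed(chunks)

    # 2. Per question: embed the query and keep the k most similar chunks.
    def top_k_chunks(question, k=8):
        q = embed([question])[0]
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(-sims)[:k]]

    # 3. Only those chunks go into the prompt, so it fits a small context
    #    window -- but the model never sees the rest of the book.
    context = "\n---\n".join(top_k_chunks("What does the green light symbolize?"))

As adamgordonbell points out above, this works well for needle-in-a-haystack questions and much less well for "what are the main themes of this book" questions, which is exactly the gap a 100K window is meant to close.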
| tempusalaria wrote: | Would be great to see some benchmarks on how loss changes across | this very large context. It's been technically possible to do | 1mln+ token context for some time with performance deterioration | so it would be interesting to see how this compares to those | efforts | Imnimo wrote: | >For example, we loaded the entire text of The Great Gatsby into | Claude-Instant (72K tokens) and modified one line to say Mr. | Carraway was "a software engineer that works on machine learning | tooling at Anthropic." When we asked the model to spot what was | different, it responded with the correct answer in 22 seconds. | | This sort of needle-in-a-haystack retrieval is definitely | impressive, and it makes a lot more sense to achieve this in- | context rather than trying to use a vector database if you can | afford it. | | I'm curious, though, whether there are diminishing returns in | terms of how much _analysis_ the model can do over those 100k | tokens in a single forward pass. A human reading modified-Gatsby | might eventually spot the altered line, but they 'd also be able | to answer questions about the overarching plot and themes of the | novel, including ones that cannot be deduced from just a small | number of salient snippets. | | I'd be curious to see whether huge-context models are also able | to do this, or if they start to have trouble when the bottleneck | becomes reasoning capacity rather than input length. I feel like | it's hard to predict one way or the other without trying it, just | because LLMs have already demonstrated a lot of surprising | powers. | fzliu wrote: | I'm also not entirely convinced by "huge" context models just | yet, especially as it relates to fuzzy knowledge such as | overarching themes or writing style. | | In particular, there are 0 mentions of the phrase "machine | learning" in The Great Gatsby, so adding one sentence that | introduces the phrase should be easy for self-attention to pick | out. | EGreg wrote: | This sounds like all the other skepticism about what AI can | do. And then it can spot 200x more than any human and | correlate it into common themes, and you'll say what? | devmor wrote: | Doing more than a human can isn't impressive. Most computer | problems for any purpose can do more of something, or | something faster than a human can. | | A better comparison would be if it can pick out any | differences that can't be picked out by more traditional | and simple algorithms. | EGreg wrote: | Or course it can very soon, since those were also written | by humans. Like AlphaZero vs Rybka | chaxor wrote: | It does, using this method. | | My immediate thought as well was '... Yeah, well vimdiff | can do that in milliseconds rather than 22 seconds' - but | that's obviously missing the point entirely. Of course, | we need to tell people to use the right tool for the job, | and that will be more and more important to remind people | of now. | | However, it's pretty clear that the reason they used this | task is to give something simple to understand what was | done in a very simple example. Of course it can do more | semantic understanding related tasks, because that's what | the model does. | | So, without looking at the details we all know that it | can summarize full books, give thematic differences | between two books, write what a book may be like if a | character switch from one book to another is done, etc. | | If it _doesn 't_ do these things (not just badly, but | can't at all) I would be surprised. 
If it does them, but | badly, I wouldn't be surprised, but it also wouldn't be | mind bending to see it do better than any human at the | task as well. | lumost wrote: | I'd be more impressed if it could rewrite Mr. Carraway as an | ML engineer in the entire novel. However it's not | intrinsically clear that it cannot do this... | | It'll be tough to find good benchmarks on long context | windows. A human cannot label using 100k tokens of context. | zooch wrote: | My thoughts exactly - rewrite the novel with Mr. Carraway | as an ML engineer while maintaining themes/motifs (possible | adding new ones too). I'm guessing what's impressive is | that these are the first steps towards something like this? | Or is it already possible? Someone please correct me here. | SkyPuncher wrote: | Further, the problem with this example is it relies on a | comparison against public data. | | Most of these AI start failing pretty hard when you ask it to | do the same task on something completely novel to it (like a | company document). Sometimes they'll get it right. Other times, | they'll spit out gibberish that's clearly some generic answer. | dmix wrote: | I'd imagine working with an entire company document would | require a lot more hand holding and investment in prompt | engineering. You can definitely get better results if you add | much more context of what you're expecting and how the LLM | should do it. Treating these LLMs as just simple Q&A machines | is usually not enough unless you're doing simple stuff. | nomel wrote: | > Most of these AI | | This is as meaningful as saying most of the hominids can't | count. You can't usefully generalize AI models with the rate | of change that exists right now. Any statements/comparisons | about AI has to contain specific models and versions, | otherwise it's increasingly irrelevant noise. | robotresearcher wrote: | Asking to spot the difference between a given document and an | unseen document is impossible. | lkbm wrote: | A couple years ago, I read Superfudge by Judy Blume, a book | originally published in 1980. In it, the protagonist writes | a letter to Santa: "Please bring me one or more of the | following items. A clock-radio, a remote-controlled model | airplane, a laptop computer, an MP3 player and six CD's." | | I didn't need to have seen this book before to know this | wasn't in the original 1980s text. | | Similarly, if I were reading the Great Gatsby for the first | time, and it identified a character as a software engineer, | I would notice. | drusepth wrote: | I think there are plenty of humans who wouldn't notice, | though. | | And probably plenty of AI implementations that would | notice. | tunesmith wrote: | I've been curious about this for a while, I have a hobby use- | case of wanting to input in-progress novellas and then asking | it questions about plot holes, open plot threads, and if new | chapter "x" presents any serious plot contradiction problems. I | haven't tried exploring that with a vectordb-embeddings | approach yet. | make3 wrote: | This is an exact example of something a vector dbs would be | terrible at. | | Vector dbs work by fetching segments that are similar in | topics to the question, so like "Where did <Character> go | after <thing>" will retrieve segments with locations & the | character & maybe talking about <thing> as a recent event. 
| | Your question has no similarity with the segments required in | any way; & it's not the segments that are wrong it's the way | they relate to the rest of the story | HarHarVeryFunny wrote: | Do the OpenAI APIs support converting prompts to vectors, | or are people running their own models locally to do this? | Can you recommend any good resources to read up on vector | DB approaches to working around context length limits ? | toss1 wrote: | Good points - LLMs are ok at finding things that exist, but | they have zero ability to abstract and find what is missing | (actually, probably negative; they'd likely hallucinate and | fill in the gaps). | | Which makes me wonder if the opposite, but more laborious | approach might work - request it identify all characters | and plot themes, then request summaries of each. You'd have | to review the summaries for holes. Lotsa work, but still | maybe quicker than re-reading everything yourself? | TeMPOraL wrote: | > _LLMs are ok at finding things that exist, but they | have zero ability to abstract and find what is missing | (actually, probably negative; they 'd likely hallucinate | and fill in the gaps)._ | | I feel this is mostly a prompting issue. Specifically | GPT-4 shows surprising ability to abstract to some degree | and work with high-level concepts, but it seems that, | quite often, you need to guide it towards the right | "mode" of thinking. | | It's like dealing with a 4 year old kid. They may be | perfectly able to do something you ask them, but will | keep doing something else, until you give them specific | hints, several times, in different ways. | vidarh wrote: | Firstly, I don't at all agree that they have zero ability | to abstract. Doesn't fit my experience at all. A lot of | the tasks I use ChatGPT for is exactly to analyse gaps in | specifications etc. And have it tell me what is missing, | suggest additions or ask for clarifications. It does that | just fine. | | But I've started experimenting with the second part, of | sorts, not to find plot holes but to have it create | character sheets for my series of novels for my own | reference. | | Basically have it maintain a sheet and feed it chunks of | one or more chapters and asking it to output an a new | sheet augmented with the new details. | | With a 100K context window I might just test doing it | over while novels or much larger chunks of one. | sashank_1509 wrote: | How are LLM's increasing their context size? I guess you just | increase input size if it's for the self supervised GPT3 style | training but for RLHF? Are they creating datasets of books to | input to the LLM and then making human labelers label the | response? There might be a smart way that does not involve new | datasets | sp332 wrote: | Mosaic wrote about their new model here. | https://www.mosaicml.com/blog/mpt-7b It was trained on 65k | inputs and has decent performance working with 80k+ tokens. | potatoman22 wrote: | I don't think RLHF datasets need to take full advantage of the | context window. There's also many ways to programatically | generate NLP datasets. | mark_l_watson wrote: | With quadratic time complexity for context size, that gets | expensive. | ginger2016 wrote: | How do I sign-up? What is the cost? | karmasimida wrote: | Going to be absolutely expensive. | gigel82 wrote: | Nice, that's roughly a 250-page book based on average word | counts. | maxutility wrote: | I don't see this in the article. 
Has Anthropic explained the | mechanism by which they were able to cost-effectively expand the | context window, and whether there was additional training or a | design decision (e.g. alternative positional embedding approach) | that helped the model optimize for a larger window? | cheeselip420 wrote: | Maybe this model can finish Winds of Winter and the rest of GoT | for us... | babuloseo wrote: | Add Berserk to that list. | azakai wrote: | 75,000 words is a drop in the bucket for A Song of Ice and | Fire: | | https://blog.fostergrant.co.uk/2017/08/03/word-counts-popula... | akiselev wrote: | You'd want to generate it in multiple steps to make it | feasible to control the text generation anyway. First call | generates the broad outline, several parallel calls flesh out | character development and some other details so that they're | consistent, then generate the story piece by piece by feeding | in bits of the outline. | nottorp wrote: | And then you end up with what the movie did which is not | exactly a GRRM novel. | camel-cdr wrote: | Meanwhile web serial authors: [0] [1] | | [0] https://wanderinginn.neocities.org/statistics | | [1] https://www.reddit.com/r/Parahumans/comments/rz8ogt/wildb | ows... | thepasswordis wrote: | That's actually a really interesting use case! | pclmulqdq wrote: | That may need a million tokens just for one book, though! | f6v wrote: | I'd be excited for Dexter ending that doesn't suck. | [deleted] | gumballindie wrote: | I am noticing a different tone coming from Anthropic. Unlike | openai they dont appear to be focused on fud and replacement. | Gives the impression it's run by adults instead of crypto bros | turned ai experts. Curious how their models will work. | lubesGordi wrote: | Um Ilya Sutskever isn't a crypto bro. | gumballindie wrote: | No but sam altman is. That company can go whistling. | Workaccount2 wrote: | Is there any path towards folding tokens into the actual model? | That is, continual training rather than the current "training | first then just tokens after" | ilaksh wrote: | PaLM 2 on Vertex AI which Google just released yesterday has | fine tuning the large models as a core part of their offering. | whimsicalism wrote: | We need public benchmarks. | | This is incredibly fast progress on large contexts and I would | like to see if they are actually attending equally as well to all | of the information or there is some sparse approximation leading | to intelligence/reasoning degradation. | monlockandkey wrote: | https://lmsys.org/blog/2023-05-10-leaderboard/ | | https://chat.lmsys.org/?arena | | Claude by Anthropic has more favourable responses then ChatGPT | Workaccount2 wrote: | ChatGPT3.5* | | It's still below GPT4, but it is closer to 4 than 3.5 | polishdude20 wrote: | So I tried this prompt in their chatbot arena multiple times. | Each time getting the wrong answer: | | "Given that Beth is Sue's sister and Arnold is Sue's father | and Beth Junior is Beth's Daughter and Jacob is Arnold's | Great Grandfather, who is Jacob to Beth Junior?" | jefftk wrote: | Is the right answer pointing out that Arnold might not be | Beth's father, and so Beth Junior might be unrelated to | Jacob? | svachalek wrote: | I just tried it and gpt-3.5-turbo got it right. | nynx wrote: | There has got to be a number of fascinating tricks that they're | using to support context lengths that long. Shame it's all | closed-source. | sweezyjeezy wrote: | Can LLMs take advantage of this bigger window to solve meaningful | tasks though? 
I can't imagine in the training data, knowing what | happened 100k tokens ago would be _that_ relevant to predicting | the current token very often, so unless this is something that | the model learns to leverage more implicitly, I'd be a bit | pessimistic. | ttul wrote: | Yes. For instance, a large context window allows you to have a | chat for months where the model can remember and make use of | everything you've ever talked about. That enables creating a | much more effective "assistant" that can remember key details | months later that may be valuable. | | A second example is the analysis of long documents. Today, | hacks like chunking and HyDE enable us to ask questions about a | long document or a corpus of documents. But is far superior if | the model can ingest the whole document and apply attention to | everything, rather than just one chunk at a time. Chunking | effectively means that the model is limited to drawing | conclusions from one chunk at a time and cannot synthesize | useful responses relating to the entire document. | m3kw9 wrote: | Gets pricier as you chat for longer, imagine having to chat a | line with a history with 20k token. | sweezyjeezy wrote: | I'm not questioning whether it would be useful, just whether | it's actually something that token masking in training is | going to work to make the model learn this. | woeirua wrote: | It remains to be seen just how effective longer contexts are | because if the attention vectors don't ever learn to pick up | specific items from further back in the text then having more | tokens doesn't really matter. | | Given that the conventional cost of training attention layers | grows quadratically with the number of tokens I think | Anthropic is doing some kind of approximation here. Not clear | at all that you would get the same results as vanilla | attention. | ttul wrote: | They did mention that the inference time to answer a | question about the book was something like 22 seconds, so | perhaps they are indeed still using self-attention. | SomewhatLikely wrote: | I would guess that semantic similarity would be the stronger | training signal than distance once you go beyond a sentence or | two away. | sweezyjeezy wrote: | I'm pretty dubious - how would the model not get absolutely | swamped by the vast amount of potential context if it's not | learning to ignore long range signals for the most part? | [deleted] | dr_dshiv wrote: | I often prefer Claude over GPT4 (partially due to speed), but it | degrades more quickly. Like I can get a better response early, | but usually the quality drops faster. But, sometimes if it can | really vibe with it, it gets better over time. | ilaksh wrote: | Did anyone else get on the waitlist, get in, and now their | console link doesn't work? I remember deciding the code | generation wasn't good enough to bother. Not sure if I actually | ever activated it but I guess not. | | Now I tried to request access again on their form and it just | redirected. Can't even tell if that worked. | | Does anyone know if this can program as well as GPT-4? Because if | so then the larger context window is a big improvement. | M4v3R wrote: | I do have access to it and from my very limited testing it | looks like it can program at least on par with GPT-3.5. I | didn't have time yet to test it more comprehensively against | GPT-4. | ilaksh wrote: | OK great thanks that's what I heard. Very interested to hear | about comparisons with GPT-4. | ablyveiled wrote: | What's the catch? 
Using GPT-4 relative to its own marketing copy | was a letdown. | SeanAnderson wrote: | big if true? :) | | Exciting to see competition across LLMs for increasing context | window size. | | I can't find updated pricing anywhere. Previous prices are here: | https://cdn2.assets-servd.host/anthropic-website/production/... | but don't seem to be embedded directly on the Anthropic website. | I tried messing with the URL (apr -> may/jun) but 404'ed. | kordlessagain wrote: | > Exciting to see competition across LLMs for increasing | context window size. | | Maybe. I think the debate is going to continue about prompt | optimization vs. context window size. | | A while ago, I had a rather interesting conversation with | GPT-3.5 about forgetting things. Knowing what to forget, or | delete from the prompt, may be just as important as what to put | in it. | | Putting the kitchen sink into the prompt probably isn't going | to help much, past a certain point and it may be putting | certain things in there based on time and context is a better | strategy. | SeanAnderson wrote: | Yeah, there's definitely diminishing returns. I just wanted | to talk to ChatGPT about a game I'm developing. I have pages | upon pages of product design notes and I'm not able to just | copy/paste the whole thing in and start talking to it at 8k | context length. There's not really duplicate information as | far as I can tell since each section covers new topics. I'm | sure there's a way to express the same ideas more succinctly, | but I kind of want ChatGPT to do that for me rather than me | figuring out how to do that just to interface the ideas into | it. | seydor wrote: | so i m going to just paste a few physics book and ask it "make | fusion" | | What is the approach to increase the sequence length here? | [deleted] | [deleted] | swiftcoder wrote: | > When we asked the model to spot what was different, it | responded with the correct answer in 22 seconds. | | Now we've gone from using ML to implement slow, unreliable | databases, to using ML to implement slow, unreliable string | comparison, I guess | we_never_see_it wrote: | Google is really trying to catch up to OpenAI & MS. The truth is | they have never been in the race to begin with. All they had and | still have is PR stunts. Let's see if their copying of MS model | will produce anything useful. | oars wrote: | Google has multiple horses in this race. | | They invested $300m in Anthropic in late 2022: | https://www.ft.com/content/583ead66-467c-4bd5-84d0-ed5df7b5b... | | (Non-paywall: https://archive.is/Y5A9B) | thewataccount wrote: | > The truth is they have never been in the race to begin with. | | Product race? My understanding is they've been so concerned | with safety/harm that they've been slow to implement a lot of | tools - then OpenAI made an attempt at it anyway. | | Google has generally been ahead from a research perspective | though. And honestly it's going to be really sad if they just | stop releasing papers outright - hopefully the release their | previous gen stuff as they go :/ | andreyk wrote: | Curious why you think this? PaLM2 looks great, and Google has | been productizing cutting edge AI pretty fast for years. | sebzim4500 wrote: | I guess PaLM2 is competitive with GPT-3.5 so for people not | willing to pay it will be an attractive offering. | | I'm not sure that counts as 'great' though. | rsstack wrote: | Based on what do you think it's comparable to GPT-3.5 and | not to 4? Did we see a lot of public performance? 
| sebzim4500 wrote: | They claim it is already being used in Bard, also if you | read the paper it does much worse at the important | benchmarks. | MacsHeadroom wrote: | PaLM 2 can't even solve "Write three sentences ending in the | word Apple." | | It's worse than GPT-3.5. Go see for yourself at | bard.google.com, which is running on PaLM 2 everywhere but | the EU as of yesterday. | Garrrrrr wrote: | Ah yes, the famous benchmark for all LLMs. I just tried | your novel example with GPT-3.5 and it couldn't solve it | either: | | > After lunch, I like to snack on a juicy and crisp apple | to satisfy my sweet tooth. | | > In the fall, many families enjoy going to apple orchards | to pick their own apples and make homemade apple pies. | | > The new MacBook Pro features a powerful M1 chip and a | stunning Retina display, making it the perfect tool for | creative professionals who work with Apple software. | mustacheemperor wrote: | Eh, I think as "human evaluated" metrics go, it's a | decent test of how well it can parse a reasonably complex | sentence and reply accurately. | | For me: | | GPT4 3/3: I couldn't resist the temptation to take a bite | of the juicy, red apple. Her favorite fruit was not a | pear, nor an orange, but an apple. When asked what type | of tree to plant in our garden, we unanimously agreed on | an apple. | | GPT3.5 2/3: "After a long day of hiking, I sat under the | shade of an apple tree, relishing the sweet crunch of a | freshly picked apple." "As autumn approached, the air | filled with the irresistible aroma of warm apple pie | baking in the oven, teasing my taste buds." "The teacher | asked the students to name a fruit that starts with the | letter 'A,' and the eager student proudly exclaimed, | 'Apple!'" | | Bard 0/3: Sure, here are three sentences ending in the | word "apple": I ate an apple for breakfast.The apple tree | is in bloom. The apple pie was delicious. Is there | anything else I can help you with? | | Bard definitely seems to fumble the hardest, it's pretty | funny how it brackets the response too. "Here's three | sentences ending with the word apple!" nope. | | Edit: Interesting enough, Bard seems to outperform GPT3.5 | and at least match 4 on my pet test prompt, asking it | "What's that Dante quote that goes something like "before | me there were no something, and only something | something." 3.5 struggled to find it, 4 finds it | relatively quickly, Bard initially told me that quote | isn't in the poem but when I reiterated I couldn't | remember the whole thing it found it immediately and | sourced the right translation. It answered as if it were | reading out of a specific translation too - "The source I | used was..." Is there agent behavior under the hood of | bard or is just how the model is trained to communicate? | kernal wrote: | OpenAI is the Microsoft Explorer of AI. | endisneigh wrote: | I don't know how anyone can say this with a straight face when | Google is the one who invented LLMs as used today to begin | with. | | Google has a product issue, not an AI research one. | cubefox wrote: | DeepMind and Google invented many other things, but I think | the first GPT style token predictor was actually ... GPT, a | model by OpenAI. RLHF was also invented at OpenAI. They also | had the first text-to-image model. | onlyrealcuzzo wrote: | It's usually the least informed with the most self-assured | sweeping opinions. 
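On kordlessagain's point upthread that knowing what to forget matters as much as what you stuff into the prompt: a minimal sketch of budgeted history trimming using the tiktoken tokenizer. The token budget, the per-message overhead constant, and the model name are placeholder assumptions.

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")

    def n_tokens(msg):
        return len(enc.encode(msg["content"])) + 4   # rough per-message overhead

    def trim_history(messages, budget=7_000):
        """Keep the system message plus as many recent turns as fit."""
        system, turns = messages[0], messages[1:]
        kept, total = [], n_tokens(system)
        for msg in reversed(turns):                  # walk newest to oldest
            if total + n_tokens(msg) > budget:
                break
            kept.append(msg)
            total += n_tokens(msg)
        return [system] + list(reversed(kept))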
| darig wrote: | [dead] | meghan_rain wrote: | The most interesting bit is that for the first time since the | release of ChatGPT in December 2022, OpenAI does not have the | lead on LLMs anymore. | | At least, for people who need large context windows, they would | not be the first choice anymore. | sebzim4500 wrote: | GPT-4 still leads in the chatbot arena[1] but at least it is a | two horse race now. | | [1] https://lmsys.org/blog/2023-05-10-leaderboard/ | refulgentis wrote: | Claude's very quietly better on everything but pricing, for a | while, it just got buried because they announced on "AI | Tuesday" (iirc gpt4 and Bing announcement day) | | The ChatGPT equivalent is 3x speed and was somewhere between | ChatGPT and GPT4 on my TriviaQA benchmark replication I did | | Couple tweets with data and examples. Note they're from 8 weeks | ago, I know Claude got a version bump, GPT3.5/4 accessible via | API seem the same. | | [1] brief and graphical summary of speed and TriviaQA | https://twitter.com/jpohhhh/status/1638362982131351552?s=46&... | | [2] ad hoc side by sides | https://twitter.com/jpohhhh/status/1637316127314305024?s=46&... | com2kid wrote: | > I know Claude got a version bump, GPT3.5/4 accessible via | API seem the same. | | GPT3.5 just got an update a few days ago that resulted in a | pretty good improvement on its creativity. I saved some | sample outputs from the previous March model, and for the | same prompt the difference is quite dramatic. Prose is much | less formulaic overall. | ndr_ wrote: | Is this update made visible somewhere? The language models | offered on my Playground are still the ones from March, | same with ChatGPT. | refulgentis wrote: | Thank you, every little comment I get from fellow boots on | the ground is so valuable, lotta noise these days. | | Random Q, I don't use the ChatGPT front end much past month | or two, used it a week back and it seemed blazingly faster | than my integration: do you have a sense of if it got | faster too? | ilaksh wrote: | How is the code generation of Claude? | esafak wrote: | And is code generation ability equivalent to code | understanding and search ability? | technics256 wrote: | I have access to claude. It's not bad, but decently behind | gpt4 for code | refulgentis wrote: | Note, all impressions based on Claude 1.2, got an email | from Poe in the last week saying it was version bumped to | 1.3 with a focus on coding improvements. | | Impressions: | | Bad enough compared to GPT-4 that I default to GPT-4. I | think if I had api access I'd use it instead, right now it | requires more coaxing, and using Poe. | | I did find "long-term" chats went better, was really | impressed with how it held up when I was asking it a nasty | problem that was hard to even communicate verbally. Wrong | at first, but as I conversed it was a real conversation. | | GPT4 seems to circle a lower optima. My academic guess it's | what Anthropic calls it "sycophancy" in its papers, tldr | GPT really really wants to do more like what's in the | context, so the longer the conversation with initial errors | goes, it's actually harder to talk it out of the errors. | flerovium wrote: | It means nothing as long as they don't actually let us test the | API. | | Good luck waiting for it. 
| jackson1372 wrote: | See the pricing PDF[^1] and API docs[^2], but TL;DR: | | - Price per token doesn't change compared to regular models | | - Existing API users have access now by setting the `model` param to "claude-v1-100k" or "claude-instant-v1-100k" | | - New customers can join the waitlist at anthropic.com/product | | [1]: https://cdn2.assets-servd.host/anthropic-website/production/... [2]: https://console.anthropic.com/docs/api/reference#parameters | nr2x wrote: | "POC or GTFO" as the security people say. :-) | qwertox wrote: | The day a quantum computer is able to host a huge LLM, things will get really interesting for humanity. | | I say this because I'm not sure how all of this is really going to scale on GPUs. It feels like LLMs are just as magical as quantum computing. | gdiamos wrote: | Nice. Will we be able to get to 1M tokens? | programmarchy wrote: | Seems like a good target. Even 100K seems too small. As a reference point, the Bible is ~750,000 words. | smallerfish wrote: | "You are a Hebrew god and below the dashes is The Word. Who will you smite today?" | vrglvrglvrgl wrote: | [dead] | jacooper wrote: | Anthropic is basically Google's OpenAI. | cubefox wrote: | It's not a Google company; their share amounts to ~10%. | m3kw9 wrote: | Is this real input context or is it some vectordb-in-the-background type trickery? | HarHarVeryFunny wrote: | Pretty sure it's not "real" (model) context width. | | Another wide context model is MosaicML's MPT-7B-StoryWriter-65k+, which they describe as having a context width of 65k, but then give a bit more detail to say they are using ALiBi - a type of positional encoding that allows longer contexts at inference time than training (i.e. beyond the real context width of the model). | | For these types of "extended context" models to actually reason over inputs longer than the native context width of the model, I _assume_ that there is indeed some sort of vector DB trickery - maybe paging thru the input to generate vector DB content, then using some type of Retrieval Augmented Generation (RAG) to process that using the extended contexts? | | Maybe someone from Anthropic or MosaicML could throw us a bone and give a bit more detail of how these are working! | | https://www.mosaicml.com/blog/mpt-7b | | https://arxiv.org/abs/2005.11401 | [deleted] | minimaxir wrote: | No pricing, but given that OpenAI's GPT-4 doubles the cost-per-token if you go from 8k to a 32k context window, I suspect the pricing here will be 2-4x that of the base Claude model, which is 9k: https://cdn2.assets-servd.host/anthropic-website/production/... | | Although with flash attention, who knows if marginal cost scales that consistently. | adamkochanowicz wrote: | https://cdn2.assets-servd.host/anthropic-website/production/... | minimaxir wrote: | Those are the same SKUs I linked. | | The new models use a different model identifier that's not listed in the pricing doc, although from the API docs it sounds like the intent may be to replace the base model: https://console.anthropic.com/docs/api/reference#-v1-complet... | f_devd wrote: | <4x would be quite optimistic: at ~11x the tokens, attention compute/memory grows roughly with the square of the sequence length, i.e. on the order of 100x (even with the lower starting point of flash attention), so unless they already have excessive margins it wouldn't make much sense to go that low.
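For jackson1372's point above that existing users only need to change the `model` parameter, here is a minimal sketch of one completion request against the 100K model. The endpoint path, header, and field names ("/v1/complete", "x-api-key", "max_tokens_to_sample", "completion") are assumptions based on Anthropic's 2023 API reference and should be checked against the linked docs; the document and question strings are placeholders.

    import os
    import requests

    def ask_claude_100k(document: str, question: str) -> str:
        # One request: the entire document goes straight into the prompt,
        # which is the point of the 100K context window.
        prompt = f"\n\nHuman: {document}\n\n{question}\n\nAssistant:"
        resp = requests.post(
            "https://api.anthropic.com/v1/complete",
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                "content-type": "application/json",
            },
            json={
                "model": "claude-v1-100k",  # or "claude-instant-v1-100k"
                "prompt": prompt,
                "max_tokens_to_sample": 512,
                "stop_sequences": ["\n\nHuman:"],
            },
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["completion"]

Switching back to the 9K base model would, on this reading of the docs, just mean passing "claude-v1" instead; nothing else in the request changes.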
| sp332 wrote: | I was assuming they used a different architecture to get the increase instead of just letting it eat hardware that way. Especially with the speed numbers in the post. | l1n wrote: | Pricing is the same as the base model. | jimsimmons wrote: | Confirmation here: | | https://twitter.com/AnthropicAI/status/1656743460769259521?s... | minimaxir wrote: | Huh. Well that changes things. | rat9988 wrote: | Only for the duration of the beta. | jimsimmons wrote: | Source? | felixgallo wrote: | The actual tweet you linked. | jimsimmons wrote: | It doesn't say exclusively for the beta period. | scoopertrooper wrote: | With an extremely literal reading you are correct, but there was clearly an implication. | alpark3 wrote: | I use GPT-4 through the API, but I can't help but hate the per-token/character pricing of these LLM APIs we've seen so far. Because the entire context needs to be fed back into the model, as my conversation gets longer, it gets more expensive. Yeah, it's fractions of a cent and cheaper, but something about it is so psychologically taxing that I'd rather pay a flat sum/month and get unlimited access, even if it costs more considering my usage. | WA wrote: | Have you tried starting a new chat after your first question, but refining your new prompt to include some info you gathered from the first response? This way, you know exactly how many tokens you're going to send. | RoddaWallPro wrote: | I requested & have been waiting for access to Claude for nearly 3 months now. Guess the waitlist must be really long... | jazzyjackson wrote: | API access or just access to the chatbot? | | You can go through Poe.com | technics256 wrote: | You likely got rejected. It was the same for me; I reapplied with a good use case and was let in. | melvinmelih wrote: | > You can drop multiple documents or even a book into the prompt and then ask Claude questions that require synthesis of knowledge across many parts of the text. | | This is cool, but does it also work the other way around? Generate a book's worth of content based on a single prompt? | cubefox wrote: | That's a good question. Can Claude write a coherent book? | Chabsff wrote: | Kinda. But it's going to be a lot like how data compression works. There will always be a somewhat fundamental limit to how much "creativity" you can get out of a small prompt generating large texts when using an isolated model. | worik wrote: | Their sign-up form does not let me sign up for early access. | | A bit disappointing | skilled wrote: | My wallet is hardly capable of handling 8k GPT-4. | ibitto wrote: | Anyone using Claude? How long did it take you to get access? | harisec wrote: | Claude is available for free in the Poe app (poe.com). I think it's good and underappreciated. | danysdragons wrote: | It is good, but the free subscription to Poe only provides access to Claude Instant. It's impressively fast but not their smartest model (claude-v1.3). | dkarras wrote: | Yeah, been using it instead of ChatGPT and it performs better IMO. My conversational LLM of choice for sure. | Mizza wrote: | I've got access; it's _blazing_ fast and seems very good. Solved some of my little puzzles that other models couldn't. I haven't tried ChatGPT-4 yet, but it's the best one that I have used. | thewataccount wrote: | You need to try GPT4, if only because GPT3.5 really doesn't compare to it in a lot of ways.
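WA's suggestion above (know exactly how many tokens you are about to send) and alpark3's cost concern can both be handled client-side: count tokens locally and drop the oldest turns before each request, keeping the initial prompt pinned. A minimal sketch using OpenAI's tiktoken tokenizer; the 6,000-token budget is an arbitrary placeholder, and the count ignores the few framing tokens each chat message adds.

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")

    def count_tokens(messages) -> int:
        # Approximate: per-message framing tokens are not counted here.
        return sum(len(enc.encode(m["content"])) for m in messages)

    def trim_history(system_prompt: str, history: list, budget: int = 6000) -> list:
        # Keep the system prompt pinned; drop the oldest turns until the
        # running total fits the budget.
        messages = [{"role": "system", "content": system_prompt}] + list(history)
        while count_tokens(messages) > budget and len(messages) > 2:
            messages.pop(1)
        return messages

The same idea caps per-request spend: since you pay per token sent, a fixed budget per request makes the cost of a long-running conversation predictable instead of growing without bound.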
| iEchoic wrote: | GPT-4 is a major leap ahead of everything else I've used (including GPT-3.5), so definitely worth trying for comparison. | pk-protect-ai wrote: | OK. It has some level of spatial comprehension. Unlike GPT-4, it lacks proper time comprehension because it is bad at calculus. Unlike GPT-4, it can't properly solve the traveling salesman problem. | com2kid wrote: | I am curious how consistent Claude is at obeying detailed instructions. One issue ChatGPT 3.5 and 4 have, even with just a few hundred words of instructions, is that they forget instructions given to them earlier on.[1] | | This huge context window is awesome though; I'm trying to use LLMs to do small-town social interaction simulations, with output in a structured format. Finding ways to compress existing state and pass it around, so the LLM knows the current state of what people in the town did on a given day, is hard with a tiny token limit! | | [1] For my use cases, early instructions tend to describe a DSL syntax for responses; if I add too much info after the instructions, the response syntax starts getting wonky! | rescripting wrote: | A simple example I ran into: I asked ChatGPT to generate a story in madlibs format for my 4-year-old daughter. They're in the format "The young _____ went to the ______, ...", and she fills in the blanks with silly nouns/adjectives. | | As she kept asking for more, I prompted "great, do another one" and eventually my original instruction fell out of the context window. It continued to generate a children's story, but with no more blanks. | com2kid wrote: | This is actually a different issue, largely a UI one, although one I wish ChatGPT would fix. | | There is no good way to tell it "this isn't a conversation, just repeat the answer to the initial prompt again". | | The solution is to just re-paste the initial prompt each time, but it still isn't ideal. There isn't a good way to tell ChatGPT "you can throw away all the context after the initial prompt and up until now". | | Of course the entire point of ChatGPT is that it maintains a conversation thread, so I get why they don't fix up this edge case. | | My problem is more that I give ChatGPT some complicated instructions, and it'll start forgetting the early instructions long before any token limit is reached. | | So for example, if early on I ask for certain tokens to be returned in parens and my initial prompt is too long, it'll forget the parens thing and start returning tokens without the surrounding (), which then breaks my parser! | orost wrote: | Almost every UI for LLMs I've seen has a way to specify an initial prompt that never goes out of context; it's strange that it's not a feature in ChatGPT. | throwaway012919 wrote: | Sounds expensive. I guess we know where the $580M 'investment' from SBF is going now. ___________________________________________________________________ (page generated 2023-05-11 23:00 UTC)