[HN Gopher] Could you train a ChatGPT-beating model for $85k and...
       ___________________________________________________________________
        
       Could you train a ChatGPT-beating model for $85k and run it in a
       browser?
        
       Author : sirteno
       Score  : 297 points
       Date   : 2023-03-31 18:21 UTC (4 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | nwoli wrote:
        | What we need is a RETRO-style model where, basically, after the
        | input you go through a small net that just fetches a desired set
        | of weights from a server (serving data without compute is dirt
        | cheap), which are then executed locally. We'll get there
        | eventually.
        
         | tinco wrote:
          | Can anyone explain or link some resource on why none of these
          | big GPT models incorporate any RETRO-style retrieval? I'm only
          | very superficially following ML developments, and I was so
          | hyped by RETRO, and then none of the modern world-changing
          | models apply it.
        
           | nwoli wrote:
            | OpenAI might very well be using that internally; who knows
            | how they implement things. Also, Emad retweeted a RETRO-
            | related thing a while back, so they might very well be using
            | that for their awaited LM. Here's hoping.
        
       | ushakov wrote:
       | Now imagine loading 3.9 GB each time you want to interact with a
       | webpage
        
         | KMnO4 wrote:
         | Yeah, I've used Jira.
        
           | neilellis wrote:
           | :-)
        
         | sroussey wrote:
         | 10yrs from now models will be in the OS. Maybe even in silicon.
         | No downloads required.
        
           | swader999 wrote:
           | The OS will be in the cloud interfacing into our brain by
           | then. I don't want this btw.
        
           | pessimizer wrote:
           | Not in mine. I don't even want redhat's bullshit in there.
           | I'm not installing some black box into my OS that was
            | programmed with _motives_ that can't be extracted from the
           | model at rest.
        
             | sroussey wrote:
              | iOS has already had this to a degree for a couple of years.
        
       | brrrrrm wrote:
        | The WebGPU demo mentioned in this post is insane. It blows any
        | WASM approach out of the water. Unfortunately that performance
        | isn't supported anywhere but Chrome Canary (behind a flag).
        
         | raphlinus wrote:
         | This will be changing soon. I believe Chrome M113 is scheduled
         | to ship to stable on May 2, and will support WebGPU 1.0. I
         | agree it's a game-changing technology.
        
       | ChumpGPT wrote:
       | [dead]
        
       | agnokapathetic wrote:
       | > My friends at Replicate told me that a simple rule of thumb for
       | A100 cloud costs is $1/hour.
       | 
        | AWS charges $32/hr for 8xA100s (p4d.24xlarge), which comes out
        | to $4/hour/GPU. Yes, you can get lower pricing with a 3-year
        | reservation, but that's not what this question is asking.
       | 
       | You also need 256 nodes to be colocated on the same fabric --
       | which AWS will do for you but only if you reserve for years.
        
         | pavelstoev wrote:
          | Depending on the model, you can train on lesser (cheaper)
          | GPUs, but system-level optimizations are needed. Which is what
          | we provide at centml.ai.
        
         | sebzim4500 wrote:
         | Maybe they are using spot instances? $1/hr is about right for
         | those.
        
         | thewataccount wrote:
          | AWS certainly isn't the cheapest for this; did they mention
          | using AWS? Lambda Labs is $12/hr for 8xA100s, and there are
          | others relatively close to this price on demand. I assume you
          | can get a better deal if you contact them for a large project.
         | 
         | Replicate themselves rent out GPU time so I assume they would
         | definitely know as that's almost certainly the core of their
         | business.
        
         | IanCal wrote:
          | Lambda Labs charges about $11-12/hr for 8xA100.
        
           | robmsmt wrote:
           | and is completely at capacity
        
             | IanCal wrote:
              | But it reflects an upper bound on the cost of running
              | A100s.
        
         | celestialcheese wrote:
          | Lambda Labs will let you do on-demand 8xA100 @ 80GB VRAM/GPU
          | for $12/hr, or reserved @ $10.86/hr.
          | 
          | 8xA100 @ 40GB is $8/hr.
          | 
          | The Replicate friend isn't far off.
        
       | pavelstoev wrote:
        | Training a ChatGPT-beating model for much less than $85,000 is
       | entirely feasible. At CentML, we're actively working on model
       | training and inference optimization without affecting accuracy,
       | which can help reduce costs and make such ambitious projects
       | realistic. By maximizing (>90%) GPU and platform hardware
       | utilization, we aim to bring down the expenses associated with
       | large-scale models, making them more accessible for various
       | applications. Additionally, our solutions also have a positive
       | environmental impact, addressing the excess CO2 concerns. If
       | you're interested in learning more about how we are doing it,
       | please reach out via our website: https://centml.ai
        
       | astlouis44 wrote:
        | WebGPU is going to be a major component in this. The modern GPUs
        | prevalent in mobile devices, desktops, and laptops are more than
        | enough to do all of this client-side.
        
       | nope96 wrote:
       | I remember watching one of the final episodes of Connections 3:
       | With James Burke, and he casually said we'd have personal
       | assistants that we could talk to (in our PDAs). That was 1997 and
       | I knew enough about computers to think he was being overly
       | optimistic about the speed of progress. Not in our lifetimes.
       | Guess I was wrong!
        
       | TMWNN wrote:
       | Hey, that means it can be turned into an Electron app!
        
       | breck wrote:
       | Just want to say SimonW has become one of my favorite writers
       | covering the AI revolution. Always fun thought experiments with
       | linked code and very constructive for people thinking about how
       | to make this stuff more accessible to the masses.
        
       | fswd wrote:
        | There is somebody finetuning a 160M RWKV-4 on Alpaca on the
        | RWKV Discord. I am out of the office and can't link, but the
        | person posted in the prompt showcase channel.
        
         | buzzier wrote:
         | RWKV-v4 Web Demo (169m/430m params)
         | https://josephrocca.github.io/rwkv-v4-web/demo/
        
       | skybrian wrote:
       | I wonder why anyone would want to run it in a browser, other than
       | to show it could be done? It's not like the extra latency would
       | matter, since these things are slow.
       | 
       | Running it on a server you control makes more sense. You can pick
       | appropriate hardware for running the AI. Then access it from any
       | browser you like, including from your phone, and switch devices
       | whenever you like. It won't use up all the CPU/GPU on a portable
       | device and run down your battery.
       | 
       | If you want to run the server at home, maybe use something like
       | Tailscale?
        
         | simonw wrote:
         | The browser thing is definitely more for show than anything
         | else - I used it to help demonstrate quite how surprisingly
         | lightweight these models can be.
        
       | GartzenDeHaes wrote:
        | It's interesting to me that LLaMA-nB models still produce
        | reasonable results after 4-bit quantization of the 16-bit
        | weights. Does this indicate some possibility of reducing the
        | compute required for training?
        
       | lxe wrote:
       | Keep in mind that image transformer models like stable diffusion
       | are generally smaller than language models, so they are easier to
       | fit in wasm space.
       | 
        | Also, you can finetune LLaMA-7B on a 3090 for about $3 using
        | LoRA.
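        | 
        | Roughly, with HF peft - a sketch, not the exact alpaca-lora
        | setup (the model id and hyperparameters here are illustrative):
        | 
        |     from transformers import AutoModelForCausalLM
        |     from peft import LoraConfig, get_peft_model
        | 
        |     # 8-bit base model fits on a single 24GB 3090
        |     model = AutoModelForCausalLM.from_pretrained(
        |         "decapoda-research/llama-7b-hf",
        |         load_in_8bit=True, device_map="auto")
        | 
        |     # train tiny rank-decomposition matrices on the
        |     # attention projections, not the full 7B weights
        |     config = LoraConfig(
        |         r=8, lora_alpha=16, lora_dropout=0.05,
        |         target_modules=["q_proj", "v_proj"],
        |         task_type="CAUSAL_LM")
        |     model = get_peft_model(model, config)
        |     model.print_trainable_parameters()  # well under 1%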
        
         | bitL wrote:
          | Only for images. People want to generate videos next, and
          | those models will likely be GPT-sized.
        
           | Metus wrote:
           | There is a video model making the rounds on
           | /r/stablediffusion and it is just a tiny bit larger than
           | Stable Diffusion.
        
             | isoprophlex wrote:
              | You're not kidding! It's far from perfect, but pretty
              | funny still...
             | 
             | https://www.reddit.com/r/StableDiffusion/comments/126xsxu/n
             | i...
             | 
             | Too bad SD learned the Shutterstock watermark so well, lol
        
             | bitL wrote:
              | It's cool, though not very stable in detail over the
              | temporal axis.
        
         | danielbln wrote:
         | Generative image models don't use transformers, they're
         | diffusion models. LLMs are transformers.
        
           | lxe wrote:
            | Ah yes, that's right. Well, technically they do use a
            | transformer for the CLIP text encoder, as I understand it.
        
       | jedberg wrote:
       | With the explosion of LLMs and people figuring out ways to
       | train/use them relatively cheaply, unique data sets will become
       | that much more valuable, and will be the key differentiator
       | between LLMs.
       | 
       | Interestingly, it seems like companies that run chat programs
       | where they can read the chats are best suited to building "human
       | conversation" LLMs, but someone who manages large text datasets
       | for others are in the perfect place to "win" the LLM battle.
        
       | captaincrowbar wrote:
       | The big problem with AI R&D is that nobody can keep up with the
       | big bux companies. It makes this kind of project a bit pointless.
       | Even if you can run a GPT3-equivalent on a web browser, how many
       | people are going to bother (except as a stunt) when GPT4 is
       | available?
        
         | simonw wrote:
          | An increasingly common complaint I'm hearing about GPT-3/4/etc
          | is from people who don't want to pass any of their private
          | data to another company.
         | 
         | Running models locally is by far the most promising solution
         | for that concern.
        
         | adeon wrote:
          | The ones that can't use GPT-4 for whatever reason. Maybe you
          | are a company and you don't want to send OpenAI your prompts.
          | Or a person who has very private prompts and feels sketchy
          | about sending them over.
         | 
         | Or maybe you are an individual who has a use case that's too
          | edgy for OpenAI or a Silicon Valley corporate image. When
         | Replika shut down people trying to have virtual
         | boyfriend/girlfriends on their platform, their reddit filled up
         | with people who mourned like they just lost a partner.
         | 
         | I think it's important that alternative non-big bux company
         | options exist, even if most people don't want to or need to use
         | them.
        
           | moffkalast wrote:
            | Or maybe you're in Italy, and OpenAI has just been banned
            | from the country for not adhering to GDPR. I suspect the
            | rest of
           | the EU may follow soon.
        
           | psychphysic wrote:
            | Those are seriously niche use cases. They exist, but can
            | they fund GPT-5-level development?
        
             | r00fus wrote:
             | Most corporations/governments would prefer to keep their AI
             | conversations private. Definitely mainstream desire, not
             | niche.
        
               | psychphysic wrote:
               | Who does your government and corporate email? In the UK
                | it's all either Gmail (for government) or Outlook (NHS).
               | For compliance reasons they simply want data center
               | certification and location restrictions.
               | 
               | If you think a small corp is going to get a big gov
                | contract outside of a nepo-state, you're in for a shock.
        
             | adeon wrote:
              | Given the Replika debacle, I personally suspect the AI
              | partner use case is not really very niche. It's just that
              | few people openly want to talk about wanting it, because
              | having an emotional AI partner is seen as creepy.
             | 
              | And companies would not want to do that. Imagine you make
              | a partner AI that goes unhinged like Bing did and tells
              | you
             | to kill yourself or something similar. I can't imagine
             | companies would want that kind of risk.
        
               | [deleted]
        
               | psychphysic wrote:
                | If your AI partner data can't be stored in Azure or a
                | similar data centre, you are a seriously small niche
                | person!
                | 
                | Even Jennifer Lawrence stored her nudes on iCloud.
        
       | make3 wrote:
       | Alpaca uses knowledge distillation (it's trained on outputs from
       | OpenAI models). It's something to keep in mind. You're teaching
        | your model to copy another model's outputs.
        
         | thewataccount wrote:
          | > You're teaching your model to copy another model's outputs.
         | 
         | Which itself was trained on human outputs to do the same thing.
         | 
         | Very soon it will be full Ouroboros as humans use the model's
         | output to finetune themselves.
        
         | visarga wrote:
          | > You're teaching your model to copy another model's outputs.
         | 
         | That's a time honoured tradition in ML, invented by the father
         | of the field himself, Geoffrey Hinton, in 2015.
         | 
         | > Distilling the Knowledge in a Neural Network
         | 
         | https://arxiv.org/abs/1503.02531
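          | 
          | The core trick from that paper, as a minimal PyTorch sketch
          | (the temperature value is illustrative): soften both output
          | distributions, then push the student toward the teacher.
          | 
          |     import torch.nn.functional as F
          | 
          |     def distill_loss(student_logits, teacher_logits, T=2.0):
          |         # softened teacher probs and student log-probs
          |         p_t = F.softmax(teacher_logits / T, dim=-1)
          |         log_p_s = F.log_softmax(student_logits / T, dim=-1)
          |         # KL divergence, rescaled by T^2 as in the paper
          |         return F.kl_div(log_p_s, p_t,
          |                         reduction="batchmean") * T * T
          | 
          | (Alpaca is the blunter variant: it only sees sampled text
          | from the teacher, never the full output distribution.)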
        
       | thih9 wrote:
       | > as opposed to OpenAI's continuing practice of not revealing the
       | sources of their training data.
       | 
       | Looks like that choice makes it more difficult to adopt, trust,
       | or collaborate on the new tech.
       | 
       | What are the benefits? Is there more to that than competitive
       | advantage? If not, ClosedAI sounds more accurate.
        
       | holloworld wrote:
       | [dead]
        
       | whalesalad wrote:
       | Are there any training/ownership models like Folding@Home? People
       | could donate idle GPU resources in exchange for access to the
       | data, and perhaps ownership. Then instead of someone needing to
       | pony up $85k to train a model, a thousand people can train a
       | fraction of the model on their consumer GPU and pool the results,
       | reap the collective rewards.
        
         | dekhn wrote:
         | A few people have built frameworks to do this.
         | 
         | There is still a very large open problem in how to federate
         | large numbers of loosely coupled computers to speed up training
         | "interesting" models. I've worked in both domains (protein
         | folding via Folding@Home/protein folding using supercomputers,
         | and ML training on single nodes/ML training on supercomputers)
         | and at least so far, ML hasn't really been a good match for
         | embarrassingly parallel compute. Even in protein folding,
         | folding@home has a number of limitations that are much better
         | addressed on supercomputers (for example: if your problem
         | requires making extremely long individual simulations of large
         | proteins).
         | 
         | All that could change, but I think for the time being,
         | interesting/big models need to be trained on tightly coupled
         | GPUs.
        
           | whalesalad wrote:
            | Probably going to mirror the transition from single-threaded
            | to multi-threaded compute. It took a while for application
            | architectures that utilize multiple cores to take hold among
            | the populace.
        
             | PaulDavisThe1st wrote:
             | Probably not. Multicore has been a thing for 30 years (We
             | had a 32 core Sequent Systems and a 64 core KSR-1 at UW
             | CS&E in the early 1990s). Everything about these models has
             | been developed in a multicore computing context, and thus
             | far, it still isn't massively-parallel-distributable. An
             | algorithm can be massively parallel without being sensibly
              | distributable. Changing the latency between compute nodes
              | does not always cause a neutral, or even just linear,
              | decrease in performance.
        
           | itissid wrote:
            | And you can rule out most of the Monte Carlo stuff too.
            | Which rules out parallelizing modern statistical frameworks
            | like Stan used for explainable models; things like financial
            | risk modeling, which samples posteriors using MCMC, also
            | can't be parallelized.
        
             | MontyCarloHall wrote:
             | Assuming the chains can reach an equilibrium point (i.e.
             | burn in) quickly, M samples from an MCMC can be
             | parallelized by running N chains in parallel each for M/N
             | iterations. You still end up with M total samples from your
             | target distribution.
             | 
              | You're only out of luck if each iteration is too compute-
              | intensive to fit on one worker node, even if each iteration
             | might be embarrassingly parallelizable, since the overhead
             | of having to aggregate computations across workers at every
             | iteration would be too high.
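              | 
              | A toy sketch of the embarrassingly parallel case (the
              | target and proposal here are illustrative):
              | 
              |     import numpy as np
              |     from multiprocessing import Pool
              | 
              |     def chain(args):
              |         seed, n = args
              |         rng = np.random.default_rng(seed)
              |         x, out = 0.0, []
              |         for _ in range(n):
              |             # Metropolis step, standard normal target
              |             prop = x + rng.normal(scale=0.5)
              |             if np.log(rng.uniform()) < (x*x - prop*prop)/2:
              |                 x = prop
              |             out.append(x)
              |         return out
              | 
              |     if __name__ == "__main__":
              |         M, N = 100_000, 8  # total samples, chains
              |         with Pool(N) as p:
              |             runs = p.map(chain,
              |                          [(s, M // N) for s in range(N)])
              |         # M draws, assuming burn-in is negligible
              |         samples = np.concatenate(runs)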
        
         | neoromantique wrote:
         | How long until somebody creates a crypto project on that?
        
           | buildbuildbuild wrote:
           | Bittensor is one, not an endorsement. chat.bittensor.com
        
         | ellisv wrote:
         | That'd be cool but I don't think most idle consumer GPUs
         | (6-8GB) would have large enough memory for a single iteration
         | (batch size 1) of modern LLMs.
         | 
         | But I'd love to see more federated/distributed learning
         | platforms.
        
           | whalesalad wrote:
           | Is it possible to break the model apart? Or does the entire
           | thing need to be architected from the get-go such that an
           | individual GPU can own a portion end to end?
        
           | mirekrusin wrote:
            | At 16-bit precision (2 bytes per parameter), 6GB can store
            | 3 billion parameters; GPT-3.5 has 175 billion parameters.
        
         | mirekrusin wrote:
          | Unfortunately training is not an embarrassingly parallelisable
          | [0] problem. It would require a new architecture. Current
          | models diverge too fast. By the time you'd download and/or
          | calculate your contribution, the model would have descended
          | somewhere else and your delta would not be applicable - based
          | off the wrong initial state.
          | 
          | It would be great if merge-ability existed. It would also
          | likely apply to efficient/optimal shrinking of models.
          | 
          | Maybe you could dispatch tasks to train on many variations of
          | similar tasks and take the average of the results? It could
          | probably help in some way, but you'd still have a large
          | serialized pipeline to munch through, and you'd likely require
          | some serious hardware, i.e. dual RTX 4090s, on the client side.
         | 
         | [0] https://en.wikipedia.org/wiki/Embarrassingly_parallel
        
           | amitport wrote:
           | hmmm... seems like you're reinventing distributed learning.
           | 
           | merge-ability does exist and you can average the results.
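            | 
            | e.g. FedAvg-style merging, as a sketch (assuming every
            | worker fine-tuned from the same base weights):
            | 
            |     import numpy as np
            | 
            |     def fedavg(workers, sizes):
            |         # workers: one list of weight arrays per worker
            |         # sizes: examples each worker trained on
            |         total = sum(sizes)
            |         return [sum(w[i] * n / total
            |                     for w, n in zip(workers, sizes))
            |                 for i in range(len(workers[0]))]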
        
             | mirekrusin wrote:
              | You can if you have the same base weights.
              | 
              | If you have similar variants of the same task, you can
              | accelerate it more where the diff is.
              | 
              | You can't average past results computed from historic
              | base weights - it's a sequential process.
              | 
              | If you could do that, you'd just map training examples to
              | diffs and merge them all.
              | 
              | Or take two distinct models and merge them to get a model
              | that is roughly the sum of them. You can't do it; it's not
              | a linear process.
        
         | _trampeltier wrote:
         | Start a Boinc project.
         | 
         | https://boinc.berkeley.edu/projects.php
        
         | spyder wrote:
         | Learning@Home using Decentralized Mixture-of-Expert models:
         | 
         | https://learning-at-home.github.io/
         | 
         | https://training-transformers-together.github.io/
         | 
         | https://arxiv.org/abs/2002.04013
        
         | ftxbro wrote:
         | Yes there is petals/bloom https://github.com/bigscience-
         | workshop/petals but it's not so great. Maybe it will improve or
         | a better one will come.
        
           | whalesalad wrote:
           | Really interesting live monitor of the network:
           | http://health.petals.ml
        
           | polishdude20 wrote:
           | I wonder how they handle illegal content. Like, if you're
           | running training data on your computer, what's to stop
            | someone else's data that is illegal from being uploaded to
            | your computer as part of training?
        
           | riedel wrote:
            | I read that it only scores the model collaboratively, but it
            | allows some fine-tuning, I guess.
           | 
           | Getting the actual gradient descent to parallelize is more
           | difficult because one needs to average the gradient when
           | using data/batch parallelism. It becomes more a network speed
           | than GPU speed problem. Or are LLMs somehow different?
        
       | ultrablack wrote:
       | If you could, you should have done it 6 months ago.
        
         | munk-a wrote:
         | I mean - is there a developer alive that'd be unable to write
         | the nascent version of Twitter? I think that Twitter as a
         | business exists entirely because of the concept - the code to
         | cover the core functionality is absolutely trivial to
         | replicate.
         | 
          | I don't think this is a very helpful statement, because
          | actually finding the idea of what to build is the hard part -
          | or even just believing it's possible. The company I work at
          | has been using NLP for years now and we have a model that's
          | great at what we do... but if you had asked two years ago
          | whether we could develop that into a chatbot as functional as
          | ChatGPT, you'd probably have been met with some pretty heavy
          | skepticism.
         | 
          | Cloning something that has been proven possible is always
          | easier than taking the risk of building the first version with
          | no real grasp of feasibility.
        
       | v4dok wrote:
        | Can someone at the EU, the only player in this thing with no
        | strategy yet, just pool together enough resources so the open-
        | source people can train models? We don't ask much, just give us
        | compute power.
        
         | 0xfaded wrote:
         | No, that could risk public money benefitting a private party.
         | 
         | Feel free to form a multinational consortium and submit a grant
         | application to one of our distribution partners under the
         | Horizon program though.
         | 
         | Now, how do you plan to create jobs and reduce CO2?
        
       | alecco wrote:
       | Interesting blog but the extrapolations are way overblown. I
        | tried one of the 30B models and it's not even remotely close to
       | GPT-3.
       | 
       | Don't get me wrong, this is very interesting and I hope more is
       | done in the open models. But let's not over-hype by 10x.
        
       | lmeyerov wrote:
       | It seems the quality goes up & cost goes down significantly with
       | Colossal AI's recent push:
       | https://medium.com/@yangyou_berkeley/colossalchat-an-open-so...
       | 
        | Their writeup makes it sound like, net, 2X+ over Alpaca, and
       | that's an early run
       | 
       | The browser side is interesting too. Browser JS VMs have a memory
       | cap of 1GB, so that may ultimately be the bottleneck here...
        
         | SebJansen wrote:
          | Does the 1GB limit extend to WASM?
        
           | jesse__ wrote:
            | WASM is specified to have 32-bit pointers, which caps memory
            | at 4GB. AFAIK browser implementations respect that (from
            | some nominal testing I did a couple of years ago).
        
         | lmeyerov wrote:
         | Interesting, since I looked last year, Chrome has started
         | raising the caps internally on buffer allocation to potentially
         | 16GB:
         | https://chromium.googlesource.com/chromium/src/+/2bf3e35d7a4...
         | 
         | Last time I tried on a few engines, it was just 1-2GB for typed
         | arrays, which are essentially the backing structure for this
          | kind of work. It'd be interesting to try again...
         | 
         | For our product, we actually want to dump 10GB+ on to the WebGL
         | side, which may or may not get mirrored on the CPU side. Not
         | sure if additional limits there on the software side. And after
         | that, consumer devices often have another 10GB+ CPU RAM free,
         | which we'd also like to use for our more limited non-GPU stuff
         | :)
        
         | jesse__ wrote:
         | I thought the memory limit (in V8 at least) was 2GB due to the
         | GC not wanting to pass 64 bit pointers around, and using the
         | high bit of a 32-bit offset for .. something I now forget ..?
         | 
         | Do you have a source showing a JS runtime with a 1GB limit?
        
           | jesse__ wrote:
           | UPDATE: After a nominal amount of googling around it appears
           | valid sizes have increased on 64-bit systems to a maximum of
           | 8GB, and stayed at 2GB on 32-bit systems, for FF at least. I
           | guess it's probably 'implementation defined'
           | 
           | https://developer.mozilla.org/en-
           | US/docs/Web/JavaScript/Refe...
           | 
           | https://developer.mozilla.org/en-
           | US/docs/Web/JavaScript/Refe...
        
       | JasonZ2 wrote:
       | Does anyone know how the results from a 7B parameter model with
       | bloomz.cpp (https://github.com/NouamaneTazi/bloomz.cpp) compares
       | to the 7B parameter Alpaca model with llama.cpp
       | (https://github.com/ggerganov/llama.cpp)?
       | 
        | I have the latter working on an M1 MacBook Air with very good
       | results for what it is. Curious if bloomz.cpp is significantly
       | better or just about the same.
        
       | rspoerri wrote:
        | So cool it runs in a browser /sarcasm/ I might not even need a
        | computer. Or the internet, while we're at it.
        | 
        | It either runs locally or it runs in the cloud. Data could come
        | from both locations as well. So it's mostly technically
        | irrelevant whether it's displayed in a browser or not.
        | 
        | Except when it comes to usability. I don't get why people love
        | software running in a browser. I often close important tools I
        | have not saved when they're in a browser. I can't have offline
        | tools which work if I am in a tunnel (living in Switzerland,
        | this is an issue). Or it's incompatible because I am running
        | LibreWolf.
        | 
        | /sorry to be nitpicking on this topic ;-)
        
         | ftxbro wrote:
         | > I don't get it why people love software running in a browser.
         | 
         | If you read the article, part of the argument was for the
         | sandboxing that the browser provides.
         | 
         | "Obviously if you're going to give a language model the ability
         | to execute API calls and evaluate code you need to do it in a
         | safe environment! Like for example... a web browser, which runs
         | code from untrusted sources as a matter of habit and has the
         | most thoroughly tested sandbox mechanism of any piece of
         | software we've ever created."
        
           | rspoerri wrote:
            | OSX does app sandboxing as well (not everywhere). But yeah,
            | you're right, I only skimmed the content and missed that
            | part.
        
           | rspoerri wrote:
           | Thinking about it...
           | 
           | I don't know exactly about the browser sandboxing. But isn't
            | its purpose to prevent access to the local system, while it
           | mostly leaves access to the internet open?
           | 
            | Is that really a good way to limit an AI system's API
           | access?
        
             | simonw wrote:
             | The same-origin policy in browsers defaults to preventing
             | JavaScript from making API calls out to any domain other
             | than the one that hosts the page - unless those other
             | domains have the right CORS headers.
             | 
             | https://developer.mozilla.org/en-
             | US/docs/Web/Security/Same-o...
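              | 
              | So a local model server has to opt in explicitly. A toy
              | stdlib-only sketch (the origin value is illustrative):
              | 
              |     from http.server import (BaseHTTPRequestHandler,
              |                              HTTPServer)
              | 
              |     class Handler(BaseHTTPRequestHandler):
              |         def do_GET(self):
              |             self.send_response(200)
              |             # without this header, JS from any other
              |             # origin cannot read the response
              |             self.send_header(
              |                 "Access-Control-Allow-Origin",
              |                 "https://example.com")
              |             self.end_headers()
              |             self.wfile.write(b'{"ok": true}')
              | 
              |     HTTPServer(("", 8000), Handler).serve_forever()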
        
         | sp332 wrote:
          | Browser software is great because I don't have to build
         | separate versions for Windows, Mac, and Linux, or deal with app
         | stores, or figure out how to update old versions.
        
         | pmoriarty wrote:
         | There are a bunch of reasons people/companies like web apps:
         | 
         | 1 - Everyone already has a web browser, so there's no software
         | to download (or the software is automatically downloaded,
         | installed and run, if you want to look at it that way... either
         | way, the experience is a lot easier and more seamless for the
         | user)
         | 
         | 2 - The website owner has control of the software, so they can
         | update it and manage user access as they like, and it's easier
         | to track users and usage that way
         | 
         | 3 - There are a ton of web developers out there, so it's easier
         | to find people to work on your app
         | 
         | 4 - You ostensibly don't need to rewrite your app for every OS,
         | but may need to modify it for every supported browser
        
           | rspoerri wrote:
            | Most of these aspects make it better for the company or
            | developer; only in some cases does it make things easier
            | for the user, in my opinion. Some arguments against it are:
           | 
           | 1 - Not everyone has or wants fast access to the internet all
           | the time.
           | 
           | 2 - I try to prevent access of most of the apps to the
           | internet. I don't want companies to access my data or even
           | metadata of my usage.
           | 
           | 3 - sure, but it doesn't make it better for the user.
           | 
           | 4 - Also supporting different screen sizes and interaction
           | types (touch or mouse) can be a big part of the work.
           | 
            | The most important part for a user is if he/she is only
            | using the app rarely or once. Not having to install it will
            | make the difference between using it or not. However, with
            | the app stores most OSes feature today, this could soon
            | change and be equally simple.
           | 
            | I might be old school on this, but I resent subscription-
            | based apps. For applications that do not need to change,
            | deliver no additional service, or aren't absolutely vital
            | for me, I will never subscribe. And browser-based apps are
            | at the core of this unfortunate development. But that's gone
            | very far from the original topic :-)
        
         | nanidin wrote:
         | Browser is the true edge compute.
        
       | fzliu wrote:
       | I was a bit skeptical about loading a _4GB_ model at first. Then
       | I double-checked: Firefox is using about 5GB of memory for me. My
       | current open tabs are mail, calendar, a couple Google Docs, two
       | Arxiv papers, two blog posts, two Youtube videos, milvus.io
       | documentation, and chat.openai.com.
       | 
       | A lot of applications and developers these days take memory
       | management for granted, so embedding a 4GB model to significantly
       | enhance coding and writing capabilities doesn't seem too far-
       | fetched.
        
       | munk-a wrote:
        | A wonderful thing about software development is that there is so
        | much reserved space for creativity that we have huge gaps
        | between costs and value. Whether the average person could do
        | this for $85k I'm uncertain of - but there is a very significant
        | slice of people that could do it for well under $85k now that
        | the groundwork has been done. This leads to the hilarious
        | paradox where a software-based business worth millions could be
        | built on top of code valued at around $60k to write.
        
         | nico wrote:
          | > This leads to the hilarious paradox where a software-based
          | business worth millions could be built on top of code valued
          | at around $60k to write.
         | 
          | Or the fact that software-based businesses just took a massive
         | hit in value overnight and cannot possibly defend such high
         | valuations anymore.
         | 
         | The value of companies is quickly going to shift from tech
         | moats to brands.
         | 
         | Think CocaCola - anyone can create a drink that tastes as good
         | or better than coke, but it's incredibly hard to compete with
         | the CocaCola brand.
         | 
         | Now think what would have happened if CocaCola had been super
         | expensive to make, and all of a sudden, in a matter of weeks,
         | it became incredibly cheap.
         | 
         | This is what happened to the saltpeter industry in 1909 when
         | synthetic saltpeter was invented. The whole industry was
         | extinct in a few years.
        
         | prerok wrote:
         | Nit: not to write but to run. The cost of development is not
         | considered in these calculations.
        
       | ftxbro wrote:
       | His estimate is that you could train a LLaMA-7B scale model for
       | around $82,432 and then fine-tune it for a total of less than
        | $85K. But when I saw the fine-tuned LLaMA-like models, they
        | were worse, in my opinion, even than GPT-3 - more like GPT-2.5.
        | Not nearly as good as ChatGPT 3.5 and certainly not
       | ChatGPT-beating. Of course, far enough in the future you could
       | certainly run one in the browser for $85K or much less, like even
       | $1 if you go far enough into the future.
        
         | icelancer wrote:
         | Yeah, the constant barrage of "THIS IS AS GOOD AS CHATGPT AND
         | IS PRIVATE" screeds from LLaMA-based marketing projects are
         | getting ridiculous. They're not even remotely close to the same
         | quality. And why would they be?
         | 
         | I want the best LLMs to be open source too, but I'm not
         | delusional enough to make insane claims like the hundreds of
         | GitHub forks out there.
        
           | robertlagrant wrote:
           | > I want the best LLMs to be open source too
           | 
           | How do you do this without being incredibly wealthy?
        
             | nickthegreek wrote:
              | Crowdsource to pay for the GPU rentals.
        
             | mejutoco wrote:
             | Pooling resources a la SETI@home would be an interesting
             | option I would love to see.
        
               | simonw wrote:
               | My understanding is that can work for model inference but
               | not for model training.
               | 
               | https://github.com/bigscience-workshop/petals is a
               | project that does this kind of thing for running
                | inference - I tried it out in Google Colab and it seemed
               | to work pretty well.
               | 
               | Model training is much harder though, because it requires
               | a HUGE amount of high bandwidth data exchange between the
               | machines doing the training - way more than is feasible
               | to send over anything other than a local network
               | connection.
        
             | crdrost wrote:
             | You (1) are a company who (2) understands the business
             | domain and has an appropriate business plan.
             | 
             | Sadly the reality of funding today makes it unlikely that
             | these two will both be simultaneously satisfied. The
             | problem is that history will look back on the necessary
             | business plan and deem it a failure even if it generates a
             | company that does a billion dollars plus in annual revenue.
             | 
             | This is actually not unique to large language models but
             | most innovation around computers. The basic problem is that
             | if you build a force-multiplier (spreadsheets, personal
             | computing, large-language models all come to mind) then
             | what will make it succeed is its versatility: people want a
             | hammer that can be used for smashing all manner of things,
             | not just your company's particular brand of matching nails.
             | And most people will only pick up that hammer once per week
             | or once per month, only like 1% of the economy if that will
             | be totally revolutionized, "we use this force-multiplier
             | every day, it is now indispensable, we can't imagine life
             | without it," and it's never predictable what that sector
             | will be -- it's going to be like "oh, who ever dreamed that
             | the killer application for LLMs would be them replacing
             | AutoCAD at mechanical contractors" or some shit.
             | 
             | In those strange eons, to wildly succeed, one must give up
             | on anticipating all usages of the software, one must cease
             | controlling it and set it free. "Well where's the profit in
             | that?" -- it is that this company was one of the first
             | players in the overall market, they got an early chance to
             | stake out as much territory as possible. But the market
             | exploded way larger than they could handle and then
             | everybody looks back on them and says "wow, what a failure,
             | they only captured 1% of that market, they could have been
             | so much more successful." Yeah, they captured 1% of a $100B
             | market, some failure, right?
             | 
             | But what actually happens is that companies see the
             | potential, investors get dollar signs in their eyes,
             | everyone starts to lock down and control these, "you may
             | use large language models but only in the ways that we say,
             | through the interfaces which we provide," and then the only
             | thing that you can use it for is to get generic
             | conversational advice about your hemorrhoids, so after 5-10
             | years the bubble of excitement fizzles out. Nobody ever
             | dreams to apply it to AutoCAD or whatever, and the world
             | remains unchanged.
        
               | javajosh wrote:
               | History is littered with great software that died because
               | no-one used it because the business model was terrible.
               | Capturing $1B of value is better than 0, and everyone
               | understands this. And who cares what history thinks
               | anyway?
               | 
               | OpenAI has spent a lot of money to get their result. It's
               | safe to assume it will take a lot of money to get a
               | similar result, and then to share it (although I assume
               | bit torrent will be good enough). Once people are running
               | their models, they can innovate to their hearts content.
               | It's not clear how or why they'd give money back to the
               | enabling technology. So how does money flow back to the
               | innovators in proportion to the value produced, if not a
               | SaaS?
        
               | ftxbro wrote:
               | what stage of capitalism is this
        
               | robertlagrant wrote:
               | If those are all that's required, why don't you start a
               | company with a business plan written so it satisfies your
               | criteria? Then you can lead the way with OSS LLMs.
        
             | ftxbro wrote:
             | Yes a rugged individual would have to be incredibly wealthy
             | to do it!
             | 
             | But maybe the governments will make one and maintain it
             | with taxes as an infrastructure service, like roads, giving
             | everyone expanded powers of cognition, memory, and
             | expertise, and raising the consciousnesses of humanity to
             | new heights. Probably in USA it wouldn't happen if we judge
             | ourselves only in zero sum relation to others - helping
             | everyone would be a wash and only waste our money!
        
               | szundi wrote:
                | Some governments probably already do, and use it against
               | so-called terrorists or enemies of the people...
        
         | simonw wrote:
         | Yeah, you're right. I wrote this a couple of weeks ago at the
         | height of LLaMA hype, but with further experience I don't think
         | the GPT-3 comparisons hold weight.
         | 
         | My biggest problem: I haven't managed to get a great
         | summarization out of a LLaMA derivative that runs on my laptop
         | yet. Maybe I haven't tried the right model or the right prompt
         | yet though, but that feels essential to me for a bunch of
         | different applications.
         | 
         | I still think a LLaMA/Alpaca fine-tuned for the ReAct pattern
         | that can execute additional tools would be a VERY interesting
         | thing to explore.
         | 
         | [ ReAct: https://til.simonwillison.net/llms/python-react-
         | pattern ]
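          | 
          | The loop itself is tiny - roughly this shape, assuming any
          | completion function llm() and a toy tool registry:
          | 
          |     import re
          | 
          |     TOOLS = {"calculate": lambda e: str(eval(e))}
          | 
          |     def react(llm, question, max_steps=5):
          |         prompt = f"Question: {question}\n"
          |         for _ in range(max_steps):
          |             reply = llm(prompt)
          |             prompt += reply
          |             m = re.search(r"Action: (\w+): (.*)", reply)
          |             if not m:  # no tool call: final answer
          |                 return reply
          |             tool, arg = m.groups()
          |             # run the tool, feed result back in
          |             prompt += ("\nObservation: "
          |                        f"{TOOLS[tool](arg)}\n")
          | 
          | The hard part is fine-tuning a small model to emit the
          | Thought/Action/Observation format reliably at all.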
        
           | avereveard wrote:
            | My biggest problem with these models is that they cannot
            | reliably produce structured data.
            | 
            | Even davinci can be used as part of a chain, because you can
            | direct it to structure and unstructure data, and then
            | extract the single components and build them into tasks.
            | Cohere, LLaMA et al. are currently struggling to produce
            | these results consistently, even if you can chat with them,
            | and frankly it's not about the chat.
           | 
            | An example from a Stack Overflow question that splits the
            | questions before sending them down the chain to be answered
            | individually:
           | 
           | This is a customer question:
           | 
           | I'm a beginner RoR programmer who's planning to deploy my app
           | using Heroku. Word from my other advisor friends says that
           | Heroku is really easy, good to use. The only problem is that
           | I still have no idea what Heroku does...
           | 
           | I've looked at their website and in a nutshell, what Heroku
           | does is help with scaling but... why does that even matter?
            | How does Heroku help with:
            | 
            |     Speed - My research implied that deploying AWS on the
            |     US East Coast would be the fastest if I am targeting a
            |     US/Asia-based audience.
            |     Security - How secure are they?
            |     Scaling - How does it actually work?
            |     Cost efficiency - There's something like a dyno that
            |     makes it easy to scale.
            | 
            | How do they fare against their competitors? For example,
            | Engine Yard and bluebox?
           | 
           | Please use layman English terms to explain... I'm a beginner
           | programmer.
           | 
           | Extract the scenario from the question including a summary of
           | every detail, list every question, in JSON:
           | 
           | { "scenario": "A beginner RoR programmer is planning to
           | deploy their app using Heroku and is seeking advice about
           | deploying it.", "questions": [ "What does Heroku do?", "How
           | does deploying AWS on the US East Coast help with speed?",
           | "How secure is Heroku?", "How does scaling with Heroku
           | work?", "What is a dyno and why is it cost efficient?", "How
           | does Heroku compare to its competitors, such as Engine Yard
           | and Bluebox?" ] }
        
             | newhouseb wrote:
             | Last weekend I built some tooling that you can integrate
             | with huggingface transformers to force a given model to
             | _only_ output content that validates against a JSON schema
             | [1].
             | 
              | The challenge is that for it to work cost-effectively you
              | need to be able to append what is basically an
              | algorithmically designed final network layer to the model,
              | and until OpenAI exposes the full logits and/or some way
              | to modify them on the fly, you're going to be stuck with
              | open source models. I've run things against GPT-2 mostly,
              | but it's on my list to try LLaMA.
             | 
             | [1] "Structural Alignment: Modifying Transformers (like
             | GPT) to Follow a JSON Schema" @
             | https://github.com/newhouseb/clownfish
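              | 
              | For anyone curious about the shape of the trick: with HF
              | transformers you can hook generation with a logits
              | processor, which is roughly where such a layer slots in.
              | A toy sketch - a real version would derive the allowed
              | token set from the JSON schema state at each step:
              | 
              |     import torch
              |     from transformers import LogitsProcessor
              | 
              |     class Allowlist(LogitsProcessor):
              |         def __init__(self, allowed_ids):
              |             self.allowed = allowed_ids
              | 
              |         def __call__(self, input_ids, scores):
              |             # push every token outside the allowed
              |             # set to -inf before sampling
              |             mask = torch.full_like(scores,
              |                                    float("-inf"))
              |             mask[:, self.allowed] = 0.0
              |             return scores + mask
              | 
              | passed via model.generate(..., logits_processor=
              | LogitsProcessorList([Allowlist(ids)])). That hook is
              | exactly what the OpenAI API doesn't expose.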
        
             | simonw wrote:
              | This feels solvable to me. I wonder if you could use fine-
              | tuning against LLaMA to teach it to do this better?
             | 
             | GPT-3 etc can only do this because they had a LOT of code
             | included in their training sets.
             | 
             | The LLaMA paper says Github was 4.5% of the training
             | corpus, so maybe it does have that stuff baked in and just
             | needs extra tuning or different prompts to tap into that
             | knowledge.
        
               | avereveard wrote:
               | I have done it trough stages, so first stages emits in
               | natural language in the format of "context: ... and
               | question: ...." and then the second stage collect it as
               | json, but then wait time doubles.
        
           | Tepix wrote:
              | Have you tried bigger models? LLaMA-65B can indeed compete
           | with GPT-3 according to various benchmarks. The next thing
           | would be to get the fine-tuning as good as OpenAI's.
        
             | mewpmewp2 wrote:
              | I wonder how accurate those benchmarks are in terms of
              | actual problem-solving capability. I think there's a major
              | line at which an LLM becomes actually useful, where it
              | feels like you are speaking to something intelligent that
              | can be useful for you in terms of productivity, etc.
        
       | version_five wrote:
        | If you have ~$100k to spend, aren't there options to buy a GPU
       | rather than just blow it all on cloud? How much is an 8xA100
       | machine?
       | 
       | 4xA100 is 75k, 8 is 140k https://shop.lambdalabs.com/deep-
       | learning/servers/hyperplane...
        
         | dekhn wrote:
         | you're comparing the capital cost of acquiring a GPU machine
         | with the operational cost of renting one in the cloud.
         | 
         | Ignoring the operational costs of on-prem hardware is pretty
         | common, but those costs are significant and can greatly change
         | the calculation.
        
           | digitallyfree wrote:
           | For a single unit one could have it in their home or office,
           | rather than a datacenter or colo. If the user sets up and
           | manages the machine themselves there is no additional IT
           | cost. The greatest operating expense would be the power cost.
        
             | dekhn wrote:
             | "If the user sets up and manages the machine themselves
             | there is no additional IT cost" << how much do you value
             | your time?
             | 
             | In my experience, physical hardware has a management
             | overhead over cloud resources. Backups, large disk storage
             | for big models, etc.
        
           | pessimizer wrote:
           | Or from another perspective, comparing the cost of training
           | one model in the cloud to the cost of training as many as you
           | want on your machine, then (as mentioned by siblings) selling
           | the machine for nearly as much as you paid for it, unless
           | there's some shortage, in which case you'll get more back
           | than you paid for it.
           | 
           | One is buying capital that produces models, the other is
           | buying a single model.
        
           | sounds wrote:
           | Remember to discount the tax depreciation for the hardware
           | and deduct any potential future gains from either reselling
           | it or using it.
        
           | capableweb wrote:
           | Heh, you work at AWS or Google Cloud perhaps? ;) (Only joking
           | about this as I constantly see employees from AWS/GCloud and
           | other cloud providers claim that cloud is always cheaper than
           | hosting things yourself)
           | 
           | Sure, if you're planning to service a large number of users,
           | building your infrastructure in-house might be a bit
            | overkill, as you'll need an infrastructure team to service it
           | as well.
           | 
            | If you just want to buy 4 GPUs to put in one server to run
           | some training yourself, I don't think it's that much
           | overkill. Especially considering you can recover much of the
           | cost even after a year by selling much of the equipment you
           | bought. Most of your losses will be costs for electricity and
           | internet connection.
        
             | throwaway50601 wrote:
              | Cloud gives you a very good price for what they offer -
              | excellent reliability and hyper-scalability. Most people
              | don't need either, and use it as a glorified VPS host.
        
             | dekhn wrote:
             | I used to work for Google Cloud (I built a predecessor to
             | Preemptible VMs and also launched Google Cloud Genomics).
             | But even before I worked at Google I was a big fan of AWS
             | (EC2 and S3).
             | 
             | Buying and selling hardware isn't free; it comes with its
             | own cost. I would not want to be in the position of selling
             | a $100K box of computer equipment- ever.
        
               | capableweb wrote:
               | :)
               | 
               | True, but some things are harder to sell than others.
               | A100's in today's market would be easy to sell. Harder to
               | buy, because the supply is so low unless you're Google or
               | another big name, but if you're trying to sell them, I'm
               | sure you can get rid of them quickly.
        
           | jcims wrote:
           | No kidding. I worked for a company that had multiple billions
           | of dollars invested in a data center refresh in North America
           | and Europe.
        
           | version_five wrote:
           | For a server farm, sure, for one machine, I don't know.
           | Assuming it plugs into a normal 15A circuit, and you have a
            | WeWork or something where you don't pay for power, is the
           | operational cost of one machine really material?
        
             | dekhn wrote:
             | it's hard to tell from what you're saying: you're planning
             | on putting an ML infrastructure training server on a
             | regular 15A circuit, not in a data center or machine room?
             | And power is paid for by somebody else?
             | 
             | My thinking about pricing doesn't include that option
             | because I wouldn't just hook a server like that up to a
             | regular outlet in an office and use it for production work.
             | If that works for you- you can happily ignore my comments.
             | But if you go ahead and build such a thing and operate it
              | for a year, please let us know if there were any costs -
              | either in dollars or in suffering - associated with your
              | decision.
             | 
             | [edit: adding in that the value of this machine also
             | suggests it cannot live unattended in an insecure location,
             | like an office]
             | 
             | signed, person who used to build closet clusters at
             | universities
        
               | KeplerBoy wrote:
               | Nvidia happily sells what you're describing. They call it
               | "DGX Station A100", it has 4 80GB A100 and retails for
               | 80k. Not sure i believe their claimed noise level of <37
               | dB though.
               | 
               | Of course that's still a very small system when talking
               | LLM training, the only reason why i would not put that in
               | a regular office is it's extreme price. Do you really
               | want something worth 80k in a form factor that could be
               | casually carried through the door?
        
               | amluto wrote:
               | If you live near an inexpensive datacenter, you can park
               | it there. Throw in a storage machine or two (TrueNAS MINI
               | R looks like a credible low-effort option). If your
               | workload is to run a year long computation on it and
               | otherwise mostly ignore it, then your operational costs
               | will be quite low.
               | 
               | Most people who rent cloud servers are not doing this
               | type of workload.
        
         | modernpink wrote:
          | You can sell the A100s once you're done as well. Possibly even
          | at a profit?
        
         | girthbrooks wrote:
         | These are wild pieces of hardware, thanks for linking. I wonder
         | how loud they get.
        
         | sacred_numbers wrote:
         | If you bought an 8xA100 machine for $140k you would have to run
         | it continuously for over 10,000 hours (about 14 months) to
         | train the 7B model. By that time the value of the A100s you
         | bought would have depreciated substantially; especially because
         | cloud companies will be renting/selling A100s at a discount as
         | they bring H100s online. It might still be worth it, but it's
         | not a home run.
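          | 
          | (Back of the envelope, using the 82,432 A100-hours cited
          | upthread: 82,432 / 8 GPUs = 10,304 hours, or roughly 14
          | months of continuous training on one box.)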
        
           | inciampati wrote:
           | If 8-bit training methods take off, I think the calculus is
           | going to change rapidly, with newer cards that have decent
           | amounts of memory and 8-bit acceleration starting to become
           | dramatically more cost and time effective than the venerable
           | A100s.
        
       ___________________________________________________________________
       (page generated 2023-03-31 23:00 UTC)