[HN Gopher] Are GPUs Worth It for ML?
___________________________________________________________________

Are GPUs Worth It for ML?

Author : varunkmohan
Score  : 81 points
Date   : 2022-08-29 18:34 UTC (4 hours ago)

(HTM) web link (exafunction.com)
(TXT) w3m dump (exafunction.com)

| mpaepper wrote:
| This also very much depends on the inference use case / context. For example, I work in deep learning on digital pathology, where images can be up to 100000x100000 pixels in size and inference needs GPUs as it's just way too slow otherwise.

| PeterStuer wrote:
| "It feels wasteful to have an expensive GPU sitting idle while we are executing the CPU portions of the ML workflow"
|
| What is expensive? Those 3090 Tis are looking very tasteful at current prices.

| ummonk wrote:
| What a clickbaity article. It's an interesting discussion of GPU multiplexing for ML inference merged together with a sales pitch, but the clickbait title made me hate the article's bait and switch. This wasn't even an example of Betteridge's law, just a completely misleading headline.

| Eridrus wrote:
| Is everyone with relevant inference costs not doing this already?
|
| I am so confused how there seems to be a startup around having a work queue that does batching...

| Kukumber wrote:
| An interesting question; it shows how insanely overpriced GPUs still are, especially in the cloud environment.

| pqn wrote:
| Disclaimer: I work at Exafunction
|
| I empathize a bit with the cloud providers, as they have to upgrade their data centers every few years with new GPU instances and it's hard for them to anticipate demand.
|
| But if you can easily use every trick in the book (CPU version of the model, autoscaling to zero, model compilation, keeping inference in your own VPC, using spot instances, etc.) then it's usually still worth it.

| NavinF wrote:
| *only in the cloud environment
|
| Throw some 3090s in a rack and you'll break even in 3 months.

| mistrial9 wrote:
| The HPC crowd are not able to add GPUs, that I know of. The deep learning family of algorithms does kick butt for lots of kinds of problems and data, though I will advocate that DL is NOT the only game in town, despite what you often read here.

| Frost1x wrote:
| In what context? HPC and certain code bases have been effectively leveraging heterogeneous CPU/GPU workloads for a variety of applications for quite a while. I know of some doing so in at least 2009, and plenty of prior art was already there by that point; it's just a specific time I happen to remember.

| fancyfredbot wrote:
| There are some pretty elegant solutions out there for the problem of having the right ratio of CPU to GPU. One of the nicer ones is rCUDA. https://scholar.google.com/citations?view_op=view_citation&h...

| varunkmohan wrote:
| rCUDA is super cool! One of the issues, though, is that a lot of the common model frameworks are not supported and a new release has not come out in a while.

| fancyfredbot wrote:
| Fair point. It's not obvious from the website which model frameworks Exafunction supports, or when the last Exafunction release was.

| PeterisP wrote:
| For some reason they focus on inference, which is the computationally cheap part. If you're working on ML (as opposed to deploying someone else's ML) then almost all of your workload is training, not inference.

| cardine wrote:
| I have not found this to be true at all in my field (natural language generation).
|
| We have a 7-figure GPU setup that is running 24/7 at 100% utilization just to handle inference.

| fartcannon wrote:
| How do you train new models if your GPUs are being used for inference? I guess the training happens significantly less frequently?
|
| Forgive my ignorance.

| jacquesm wrote:
| Typically a different set of hardware for model training.

| cardine wrote:
| We have different servers for each. But the split is usually 80%/20% for inference/training. As our product grows in usage the 80% number is steadily increasing.
|
| That isn't because we aren't training that often - we are almost always training many new models. It is just that inference is so computationally expensive!

| dheera wrote:
| Also true of self-driving. You train a perception model for a week and then log millions of vehicle-hours on inference.

| MichaelBurge wrote:
| Think Google: every time you search, some model somewhere gets invoked, and the aggregate inference cost would dwarf even very large training costs if you have billions of searches.
|
| Marketing blogspam like this is always targeting big (not Google, but big) companies, hoping to divert their big IT budgets to their coffers: "You have X million queries to your model every day. Imagine if we billed you per-request, but scaled the price so in aggregate it's slightly cheaper than your current spending."
|
| People who are training-constrained are early-stage (i.e. correlate with not having money), and then they need to buy an entirely separate set of GPUs to support you (e.g. T4s are good for inference, but they need V100s for training). So they choose to ignore you entirely.

| Jensson wrote:
| If you are training models that are intended to be used in production at scale then training is dirt cheap compared to inference. There is a reason why Google focused on inference first with their TPUs even though Google does a lot of ML training.

| dllthomas wrote:
| I think another part of the question is whether you're scaling on your own hardware or the customers' hardware.

| gowld wrote:

| jldugger wrote:
| > If you're working on ML (as opposed to deploying someone else's ML) then almost all of your workload is training, not inference.
|
| Wouldn't that depend on the size of your customer base? Or at least, requests per second?

| karamanolev wrote:
| With more customers usually the revenue and profit grow, then the team becomes larger, wants to perform more experiments, spends more on training and so on. Inference is just so computationally cheap compared to training.
|
| That's what I've seen in my experience, but I concur that there might be cases where the ML is a more-or-less solved problem for a very large customer base, where inference is the bigger cost. I've rarely seen it happen, but other people are sharing scenarios where it happens frequently. So I guess it massively depends on the domain.

| mgraczyk wrote:
| This depends a lot on what you're doing. If you are ranking 1M qps in a recommender system, then training cost will be tiny compared to inference.

| lajamerr wrote:
| I wonder if there's room for model caching. Sacrifice some personalization for near-similar results so you aren't hitting the model so often.

| mgraczyk wrote:
| Yeah, we did lots of things like this at Instagram. It can be very brittle and dangerous, though, to share any caching amongst multiple users.
| If you work at Facebook you can search for some SEVs related to this, lol.

| varunkmohan wrote:
| Agreed that there are workloads where inference is not expensive, but it's really workload-dependent. For applications that run inference over large amounts of data in the computer vision space, inference ends up being a dominant portion of the spend.

| PeterisP wrote:
| The way I see it, generally every new data point (on which the production model inference gets run once) becomes part of the data set which then gets used in training every next model, processing the same data point many more times in training, so training unavoidably takes more effort than inference.
|
| Perhaps I'm a bit biased towards all kinds of self-supervised or human-in-the-loop or semi-supervised models, but the notion of discarding large amounts of good domain-specific data that get processed _only_ for inference and not used for training afterward feels a bit foreign to me, because you can usually extract an advantage from it. But perhaps that's the difference between data-starved domains and overwhelming-data domains?

| varunkmohan wrote:
| Yup, exactly. It's a good point that for self-supervised workloads, the training set can become arbitrarily large. For a lot of other workloads in the vision space, most data needs to be labeled to be usable for training.

| pdpi wrote:
| There's one piece of the puzzle you're missing: field-deployed devices.
|
| If I play chess on my computer, the games I play locally won't hit the Stockfish models. When I use the feature on my phone that allows me to copy text from a picture, it won't phone home with all the frames.

| version_five wrote:
| What you say re saving all data is the ideal. I'd add a couple of caveats. One is that in many fields you often get lots of redundant data that adds nothing to training (for example, if an image classifier is looking for some rare class you can be drowning in images of the majority class). Or you can just have lots of data that is unambiguously and correctly classified - some kind of active learning can tell you what is worth keeping.
|
| The other is that for various reasons the customer doesn't want to share their data (or at least have sharing built into the inference system), so even if you'd like to have everything they record, it's just not available. Obviously something to discourage, but it seems common.

| acchow wrote:
| Is your inference running on some daily jobs? That's not a ton of inference compared to running online for every live request (10k QPS?).

| sabotista wrote:
| It depends a lot on your problem, of course.
|
| Game-playing (e.g. AlphaGo) is computationally hard, but the rules are immutable, target functions (e.g., heuristics) don't change much, and you can generate arbitrarily sized clean data sets (play more games). On these problems, ML-scaling approaches work very well. For business problems where the value of data decays rapidly, though, you probably don't need the power of a deep or complex neural net with millions of parameters, and expensive specialty hardware probably isn't worth it.

| scosman wrote:
| We did a big analysis of this a few years back. We ended up using a big spot-instance cluster of CPU machines for our inference cluster. Much more consistently available than spot GPUs, at greater scale, and at a better price per inference (at least at the time). Scaled well to many billions of inferences.
| Of course, compare cost per inference on your own models to make sure the logic applies. Article on how it worked: https://www.freecodecamp.org/news/ml-armada-running-tens-of-...
|
| Training was always GPUs (for speed), non-spot-instance (for reliability), and cloud-based (for infinite parallelism). Training work tended to be chunky; it never made sense to build servers in-house that would be idle some of the time and queued at other times.

| machinekob wrote:
| What cloud is even remotely worth it over buying 20x RTX 3090s, or even some Quadros, for training? Maybe if you have a very small team and small problems, but if you have CV/video tasks and a team of more than 3 (maybe even 2) people, in-house servers are always the better choice, as you'll get your money back in 2-3 months of training versus a cloud solution, and maybe even more if you wait for the RTX 4090.
|
| And if you are a solo dev it's an even easier choice, as you can reuse your rig for other stuff when you don't train anything (for example gaming :D).
|
| The only other possibility is if you get a free 100k from AWS and then 100k from GCP; you can live with that for a year or even two if you stack both providers, but it is a special case and I'm not sure how easy it is to get 100k right now.

| beecafe wrote:
| You are years behind if you think you're training a model worth anything on consumer grade GPUs. Table stakes these days are 8x A100 pods, and lots of them. Luckily you can just get DGX pods so you don't have to build racks, but for many orgs just renting the pods is much cheaper.

| rockemsockem wrote:
| 300 billion parameters or GTFO, eh?
|
| There is tons of value to be had from smaller models. Even some state-of-the-art results can be obtained on a relatively small set of commodity GPUs. Not everything is GPT-scale.

| cjbgkagh wrote:
| Years behind what? Table stakes for what? There is much more to ML than the latest transformer and diffusion models. While those get the attention, the amount of research not in that space dominates.

| mushufasa wrote:
| > You are years behind if you think you're training a model worth anything on consumer grade GPUs
|
| Ah yes, my code can't be useful to people unless it takes a long time to compile...

| skimo8 wrote:
| To be fair, I think ML workloads are quite a bit different than the days of compiling over lunch breaks.
|
| What the above post was probably trying to get at is that ML-specific hardware is far more efficient these days than consumer GPUs.

| machinekob wrote:
| Ahh yes, because there is only one way to do deep learning, and it is of course stacking models large enough to not be useful outside pods of GPUs, and this is for sure the way to go if you want to make money (from VCs, of course, because you won't have many users willing to pay so much that you'll ever break even, as with OpenAI and other big model providers; maybe you can get some money/sponsoring from the state or a university).
|
| The market for small, efficient models running locally on device is pretty big, maybe even the biggest that exists right now [iOS, Android and macOS are pretty easy to monetize with low-cost models that are useful]. I can assure you of that, and you can do it on even 4x RTX 3090 [it won't be fast but you'll get there :)].

| scosman wrote:
| As mentioned in the comment, ML training workloads tend to be super chunky (at least in my experience). Some days we want to train 50 models; some weeks we are evaluating and don't need any compute.
|
| I'd rather be able to spin up 200 GPUs in parallel when needed (yes, at a premium), but ramp to 0 when not. Data scientists waiting around are more expensive than GPUs. Replacing/maintaining servers is more work/money than you expect. And for us the training data was cloud-native, so transfer/privacy/security is easier; nothing on-prem, data scientists can design models without having access to raw data, etc.

| machinekob wrote:
| If you are a cloud-only company then for sure it is just easier, but it still won't be cheaper, just more convenient to use. If the data science team is very big, probably the "best" solution without unlimited money is just to run locally and go cloud [premium] if you don't have free resources for your teams. (That was the case when I was working in a pretty big EU bank, but it wasn't "true" deep learning yet [about 4-5 years ago].)

| varunkmohan wrote:
| You have a good point. I think for small enough workloads, self-managing instances on-prem is more cost-effective. There is a simplicity gain in being able to scale instances up and down in the cloud, but it may not make sense if you can self-manage without too much work.

| varunkmohan wrote:
| Disclaimer: I'm the Cofounder / CEO at Exafunction
|
| That's a great point. We'll be addressing this in an upcoming post as well.
|
| We've served workloads that run entirely on spot GPUs where it makes sense, since a small number of spot GPUs can make up for a large amount of spot CPU capacity. The best of all worlds is if you can manage both spot and on-demand instances (with a preference towards spot instances). Also, for latency-sensitive workloads, running on spot instances or CPUs is sometimes not an option.
|
| I could definitely see cases where it makes sense to run on spot CPUs though.

| fortysixdegrees wrote:
| Disclaimer != Disclosure
|
| Probably one of HN's most common mistakes in comments.

| adgjlsfhk1 wrote:
| Perhaps, but I think "disclaimer" in this context is just an abbreviation, since the disclosure carries with it the implicit disclaimer of "so the things I'm saying are subconsciously influenced by the fact that they could potentially make me money".

| synergy20 wrote:
| I think the TPU is the way to go for ML, be it training or inference.
|
| We're using GPUs (some contain a TPU block inside) due to 'historical reasons'. With a vector unit (x86 AVX, ARM SVE, RISC-V RVV) as part of the host CPU, either putting a TPU on a separate die of the chiplet or just putting it on a PCIe card will handle the heavy-lifting ML job fine. It should be much cheaper than the GPU model for ML nowadays, unless you are both a PC gamer and an ML engineer.

| andrewmutz wrote:
| At training time they sure are. The only thing more expensive than fancy GPUs is the ML engineers whose productivity they are improving.

| jacquesm wrote:
| This is an ad.

| triknomeister wrote:
| I thought this post would be about how ASICs are probably a better bet.

| rfrey wrote:
| Not related to the article, but how would one begin to get smart about optimizing GPU workloads? I've been charged with deploying an application that is a mixture of heuristic search and inference, and that has been exclusively single-user to this point.
|
| I'm sure every little thing I've discovered (e.g. measuring CPU/GPU workloads, trying to multiplex access to the GPU, etc.) was probably covered in somebody's grad school notes 12 years ago, but I haven't found a source of info on the topic.
| pqn wrote:
| Let's just take the topic of measuring GPU usage. This alone is quite tricky -- tools like nvidia-smi will show full GPU utilization even if not all SMs are running. Also, the workload may change behavior over time, if for instance inputs to transformers get longer over time. And then it gets even more complicated to measure when considering optimizations like dynamic batching. I think if you peek into some MLOps communities you can get a flavor of these nuances, but I'm not sure there are good exhaustive guides around right now.

| einpoklum wrote:
| > And CPUs are so much cheaper
|
| Doesn't look like it. Consumer:
|
| AMD ThreadRipper 3970X: ~3000 USD on NewEgg
| https://www.newegg.com/amd-ryzen-threadripper-2990wx/p/N82E1...
|
| NVIDIA RTX 3080 Ti Founders' Edition: ~2000 USD
| https://www.newegg.com/nvidia-900-1g133-2518-000/p/1FT-0004-...
|
| For servers, a comparison is even more complicated and it wouldn't be fair to just give two numbers, but I still don't think GPUs are more expensive.
|
| ... besides, none of that may matter if your budget is a power budget.

| 37ef_ced3 wrote:
| For small-scale transformer CPU inference you can use, e.g., Fabrice Bellard's https://bellard.org/libnc/
|
| Similarly, for small-scale convolutional CPU inference, where you only need to do maybe 20 ResNet-50 inferences (batch size 1) per second per CPU (cloud CPUs cost $0.015 per hour), you can use inference engines designed for this purpose, e.g., https://NN-512.com
|
| You can expect about 2x the performance of TensorFlow or PyTorch.

| tombert wrote:
| Is there a thing that Fabrice Bellard hasn't built? I had no idea that he was interested in something like machine learning, but I guess I shouldn't have been surprised because he has built every tool that I use.

| mistrial9 wrote:
| https://en.wikipedia.org/wiki/Fabrice_Bellard

| [deleted]

| [deleted]

| [deleted]
___________________________________________________________________
(page generated 2022-08-29 23:00 UTC)
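
On the measurement point raised in the thread (nvidia-smi reporting 100% utilization even when few SMs are busy), here is a minimal sketch of sampling that same coarse counter programmatically. It assumes the nvidia-ml-py (pynvml) bindings and an NVIDIA driver are installed; the "gpu" utilization NVML returns is only the fraction of time at least one kernel was resident, so a nearly empty kernel can still read as a fully busy GPU.

    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

    for _ in range(10):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory are percentages
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes
        # util.gpu = % of the sample period in which *any* kernel was executing,
        # not how many SMs were actually occupied.
        print(f"kernel-active={util.gpu}%  mem-controller={util.memory}%  "
              f"mem-used={mem.used / 2**30:.1f} GiB")
        time.sleep(1)

    pynvml.nvmlShutdown()

For per-SM occupancy or kernel-level detail you would need profiling tools (e.g. DCGM metrics or the Nsight tools) rather than these NVML counters.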
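To put the CPU-inference numbers quoted in the thread in per-inference terms, here is a back-of-the-envelope calculation. The CPU figures ($0.015/hour, ~20 ResNet-50 inferences per second) come from the comment above; the GPU instance price and throughput are illustrative assumptions for a T4-class cloud instance, not measurements.

    def cost_per_million(hourly_price_usd, inferences_per_second):
        """USD per one million inferences at a given hourly price and throughput."""
        return hourly_price_usd / (inferences_per_second * 3600) * 1_000_000

    cpu = cost_per_million(0.015, 20)    # cloud-CPU figures quoted in the thread
    gpu = cost_per_million(0.526, 800)   # assumed T4-class instance price and throughput

    print(f"CPU: ~${cpu:.2f} per million inferences")   # ~$0.21
    print(f"GPU: ~${gpu:.2f} per million inferences")   # ~$0.18 under these assumptions

Under these assumptions the two come out roughly comparable, which is consistent with the thread's point that the answer hinges on utilization and instance pricing rather than raw hardware speed.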