[HN Gopher] Train an AI model once and deploy on any cloud
       ___________________________________________________________________
        
       Train an AI model once and deploy on any cloud
        
       Author : GavCo
       Score  : 178 points
       Date   : 2023-07-08 07:54 UTC (15 hours ago)
        
 (HTM) web link (developer.nvidia.com)
 (TXT) w3m dump (developer.nvidia.com)
        
       | paganel wrote:
        | The AI shovels industry is doing good business. Other than that,
        | is there any major use case behind the recent AI hype? One that
        | has brought tangible benefits, or at the very least a positive
        | ROI.
        
         | comfypotato wrote:
         | It's only a matter of time before adoption catches up to the
         | tech. HN is the epitome of cutting edge when it comes to this
         | stuff. It's only natural that readers don't yet see the
         | adoption.
         | 
          | I was in an 800-level (PhD) course last semester, and the
          | professor ran a fun lecture where each student had to present
          | a paper from the last 5 years that's been completely outdone
          | by GPT-4. You wouldn't believe how casually it outperforms the
          | state of the art from just 5 years ago. My paper was about
          | natural language to bash commands. GPT-4 is light-years ahead
          | of the previous state of the art. You could probably make a
          | business off just a natural language interface to the Linux
          | operating system.
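          | 
          | A minimal sketch of what such an interface could look like
          | (hypothetical prompt and helper; assumes the pre-1.0 openai
          | Python package and an OPENAI_API_KEY in the environment):
          | 
          |     import openai  # pip install "openai<1.0"
          | 
          |     def nl_to_bash(request: str) -> str:
          |         # Ask the model for a single bash command and
          |         # nothing else.
          |         resp = openai.ChatCompletion.create(
          |             model="gpt-4",
          |             messages=[
          |                 {"role": "system", "content":
          |                  "Translate the user's request into one "
          |                  "bash command. Reply with the command "
          |                  "only."},
          |                 {"role": "user", "content": request},
          |             ],
          |         )
          |         return resp.choices[0].message.content.strip()
          | 
          |     # Always review before executing anything.
          |     print(nl_to_bash("list the five largest files here"))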
        
         | spaghetti1535 wrote:
         | I feel like the AI hype is putting people off but I do see
         | genuine value being created in all kinds of places.
         | 
          | major: ChatGPT for answering questions, explaining topics, and
          | helping with coding has brought me personally massive ROI
          | 
          | minor: a lot of companies are integrating LLMs to upgrade
          | their offerings, and a lot of small SaaS products now exist
          | because of LLMs. I have to guess at least some of those have a
          | positive ROI
        
         | andrewcamel wrote:
          | I'm starting to outline them here: ctlresearch.com . Upcoming
          | interviews with the Chief Architect at Intuit, the Head of
          | Procurement at DoD, etc. DoD has already shortened the process
          | of writing structured "requests from industry" from 3 months
          | to 1 day. That makes it far easier to get requests out to
          | vendors. The next step is an auto-complete bot that helps
          | vendors respond to RFPs with the required language.
          | 
          | I have 20 interviews in the pipeline -- all of which have
          | highly tactical, near-term valuable ideas like this.
        
           | moneywoes wrote:
            | Do you have a blog, or are these market research ideas?
        
             | andrewcamel wrote:
              | It's a collection of interviews posted in the form of a
              | library. So it's a bit blog-like in structure, but really
              | just a collection of ideas on how to leverage this new
              | tech.
        
         | throwawaybbq1 wrote:
          | I work at an industry research lab. The key challenges for
          | LLMs are the legal questions and the massive resources needed
          | to train them. I have research colleagues who are convinced
          | that even OpenAI may be on shaky legal ground. A lot of
          | non-profit and academic liaising helps to muddy the issue
          | (academics have fair-use exceptions).
         | 
         | If you don't see the potential of the tech and the rapid
         | advances, I can't help you. But the issue around deployment is
         | more legal (and perhaps not enough GPUs to go around).
        
           | Havoc wrote:
           | > even OpenAI may be on shaky legal ground
           | 
           | Not sure it matters. We're very much in do first ask
           | permission later territory here and nobody is putting the
           | genie back in the bottle.
           | 
            | The legal landscape will have to bend toward reality.
        
         | hospitalJail wrote:
          | Between generating code, producing recipes for a non-profit,
          | combining my expertise with an obscure application, and
          | helping me with social things... yes. And we have huge savings
          | coming up, but we'll need to use local models.
        
         | jabradoodle wrote:
          | Image recognition, interpreting/translating text, speech
          | recognition for video transcriptions. Machine learning for
          | boring stuff you won't see, making predictions with data, etc.
          | 
          | There are lots of use cases; we just seem to talk only about
          | LLMs recently.
        
           | paganel wrote:
            | Yeah, I had forgotten about image recognition; I agree that
            | the field has changed significantly because of AI.
            | 
            | Indeed, I was thinking mostly about LLMs, as it seems to me
            | that the kind of news presented in the article is mostly
            | targeting that field.
        
       | villgax wrote:
        | It's more about the framework you use than about Nvidia at this
        | point. Anything dockerized works with any compatible underlying
        | hardware with no issues. Any optimization is again fragmented
        | across FasterTransformer or TensorRT conversion, with half-baked
        | layer support that pretty much lags by 6 months or more.
        | 
        | The NVAIE license is what Nvidia wants enterprises to pay to use
        | their bespoke cards in shared-VRAM configurations, kneecapping
        | consumer cards which could very well do the same job better,
        | with more CUDA cores but less memory.
        | 
        | And don't even get me started on the RIVA stack.
        | 
        | FP8 emulation is also never going to get backported; instead,
        | only the H100 and 4090 can make use of it.
        
         | homarp wrote:
          | NVAIE, aka Nvidia AI Enterprise: https://docs.nvidia.com/ai-
          | enterprise/overview/0.1.0/platfor...
          | 
          | RIVA: NVIDIA(r) Riva, a premium edition of the NVIDIA AI
          | Enterprise software, is a GPU-accelerated speech and
          | translation AI SDK.
          | 
          | FasterTransformer: https://github.com/NVIDIA/FasterTransformer
          | a highly optimized transformer-based encoder and decoder
          | component, supported on PyTorch, TensorFlow, and Triton.
          | 
          | TensorRT: a custom ML framework / inference runtime from
          | Nvidia, https://developer.nvidia.com/tensorrt, but you have to
          | port your models.
        
         | codethief wrote:
         | > Any optimization is again fragmented with FasterTransformer
         | or TensorRT conversion with half baked layer supports which
         | lags by 6months or more pretty much.
         | 
         | Thanks, I came here to see whether anything had changed since I
         | last did ML stuff on Nvidia GPUs, and it looks like things are
         | still the same.
        
           | villgax wrote:
            | At this point the benefits of a GPU get outmatched by CPUs
            | even if the latency is 5-10x, since you can scale CPU cores
            | more cheaply than GPUs, both on-prem and in the public
            | cloud.
        
             | kkielhofner wrote:
             | I'm not sure I agree with you.
             | 
             | An RTX 4090 has over 16,000 cores and 1 TB/s of memory
             | bandwidth. From what I understand (not really my thing)
             | DDR5 tops out at 51 GB/s per module.
             | 
              | CPUs and GPUs are fundamentally different architecturally,
              | but for the extremely parallel tasks GPUs are designed
              | for, the CPU is very, very far behind.
              | 
              | When I've done performance tests between CPU and GPU for
              | my applications (speech), a $100 six-year-old GTX 1070 is
              | 5x faster than an AMD Ryzen Threadripper PRO 5955WX[0]
              | while consuming a fraction of the power, at a fraction of
              | the cost. If you look at the table, the RTX 3090 and RTX
              | 4090 are 17x and 27x respectively. The H100 benchmark of
              | 12x is from a very early-access benchmark with some driver
              | and other issues.
             | 
             | [0] - https://github.com/toverainc/willow-inference-
             | server/tree/wi...
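              | 
              | If you want a rough feel for the gap on your own
              | hardware, here's a toy matmul throughput check (a
              | sketch, not the benchmark in [0]; assumes PyTorch and
              | a CUDA-capable GPU):
              | 
              |     import time
              |     import torch
              | 
              |     def matmuls_per_sec(device, n=4096, iters=10):
              |         a = torch.randn(n, n, device=device)
              |         b = torch.randn(n, n, device=device)
              |         a @ b  # warm-up (CUDA kernels, caches)
              |         if device == "cuda":
              |             torch.cuda.synchronize()
              |         start = time.perf_counter()
              |         for _ in range(iters):
              |             a @ b
              |         if device == "cuda":
              |             torch.cuda.synchronize()
              |         return iters / (time.perf_counter() - start)
              | 
              |     print("cpu :", matmuls_per_sec("cpu"))
              |     if torch.cuda.is_available():
              |         print("cuda:", matmuls_per_sec("cuda"))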
        
       | thih9 wrote:
        | Very off topic, but every time I see Nvidia expand toward AI
        | products, I'm reminded that they had every opportunity to expand
        | toward crypto products and didn't. I like that they work on what
        | they believe in - and skip what they don't. At a time when AI is
        | becoming a buzzword, this feels refreshing.
        
         | m3kw9 wrote:
          | These are real pros making products; they know what's real,
          | not helping grifters pump to pass bags, i.e. hedge funds,
          | banks, startups, influencers.
        
         | teaearlgraycold wrote:
         | I think you just aren't aware of what NVidia was doing over the
         | last few years.
        
           | [deleted]
        
         | jcq3 wrote:
          | It's very naive to think profit-oriented companies have
          | beliefs and convictions... The religion of money is way
          | stronger.
        
           | raincole wrote:
            | Weird statement. Of course Nvidia has a belief: they believe
            | AI will bring more profit in the long term than crypto will.
        
             | jcq3 wrote:
              | Following a juicy trend doesn't mean you believe in it;
              | also, GPU mining is not a thing anymore in crypto. Nvidia
              | has nothing to bring to crypto.
        
               | petesergeant wrote:
               | > is not a thing anymore
               | 
               | Isn't this very recent?
        
           | thih9 wrote:
           | Why not both?
           | 
            | Even if they believe in a technology only because they
            | believe they can deliver a profitable product (and reject
            | something else because they think there are no long-term
            | gains), I still prefer that to a company that would blindly
            | try to profit from everything short-term.
        
         | YetAnotherNick wrote:
          | I wouldn't be so sure of that. It would be very hard to sell a
          | $30k GPU for crypto, like they are doing for AI, as AI
          | requirements differ from gaming's while crypto's do not. The
          | FLOP/s difference between the A100 (a $15k card) and the 4090
          | (a $1.5k card) is just 2x. Nvidia could constrain VRAM on
          | consumer cards because 24 GB is enough for games or AI
          | inference.
        
           | Namidairo wrote:
           | > Nvidia could constrain VRAM for consumer cards because 24
           | GB is enough for games or AI inference.
           | 
           | Given some of the commentary in launch reviews for the 4000
           | series, I wouldn't be surprised if the overwhelming opinion
           | was that they already are.
        
         | quickthrower2 wrote:
          | Maybe they are hype-immune - clearly crypto is zero-sum and
          | somewhat seasonal. Machine learning (and matmul and ReLU in
          | particular) is here to stay and will expand.
        
         | callalex wrote:
         | How do you feel about the GeForce Partner Program?
        
         | Culonavirus wrote:
         | > and didn't
         | 
         | Uh huh.
         | 
         | > Nvidia will pay $5.5 million to settle charges that it
         | unlawfully obscured how many of its graphics cards were sold to
         | cryptocurrency miners...
         | 
         | And
         | 
         | > The CMP HX is a pro-level cryptocurrency mining GPU that
         | provides maximum performance...
         | 
         | Just a quick google away.
         | 
         | Nvidia will develop and sell whatever will make Nvidia more
         | money. They just think the world of AI is two or three orders
         | of magnitude more lucrative than mining ever was. Hence the
         | maximum push on the AI front.
        
           | KaoruAoiShiho wrote:
           | > Nvidia will pay $5.5 million to settle charges that it
           | unlawfully obscured how many of its graphics cards were sold
           | to cryptocurrency miners...
           | 
           | This is because they didn't serve the market... so they
           | didn't understand how many buyers were coming from crypto.
        
           | bushbaba wrote:
            | Crypto mining using GPUs has crashed. Ether was the main
            | source of profit, and the shift away from proof of work
            | dried that up. Bitcoin requires ASICs, a market Nvidia isn't
            | in, and recent conditions have only made this worse.
            | 
            | Nvidia knows its biggest revenue sources today, which are
            | growing, and is investing in its business units based on
            | that data.
            | 
            | It's just smart business.
        
         | archerx wrote:
          | Do people have short memories? Nvidia did a lot of shady stuff
          | during the crypto boom: they made dedicated mining cards [1],
          | and they even software-gimped gaming cards to force people to
          | buy their mining cards [2]. Nvidia is a shitty anti-consumer
          | company that has no issues fucking you over. Don't forget
          | that, or let their PR department make you think otherwise.
          | 
          | [1] https://www.nvidia.com/en-us/cmp/ [2]
          | https://arstechnica.com/gadgets/2021/05/nvidia-will-add-anti...
        
           | rcme wrote:
           | They gimped consumer GPU cards because crypto miners were
           | buying them all and their core gamer market was being priced
           | out. Making dedicated mining cards was actually trying to do
           | less for crypto, not more.
        
             | gymbeaux wrote:
              | If Nvidia did something that sounds pro-consumer in any
              | way, it's not because they give a damn about consumers;
              | it's because it coincidentally made good business sense
              | too.
        
             | tinco wrote:
              | I've heard someone say they did that because this new type
              | of buyer randomly affected their sales tremendously, and
              | they had zero insight into how that market behaved. By
              | establishing separate product lines and sales channels,
              | they could in theory better distinguish between their
              | products doing well because of competitive gaming
              | performance and random fluctuations in the crypto market.
              | That way an investor/shareholder could more accurately
              | price the stock.
              | 
              | I have no idea if they were successful at achieving that
              | goal; I just thought it was interesting that market
              | differentiation wouldn't just be useful for marketing but
              | also for corporate accounting. They would even risk
              | alienating the crypto market, and possibly lose revenue,
              | if it meant they'd get a better handle on what they were
              | selling to whom.
        
               | rcme wrote:
               | I'm sure market segmentation was part of the decision. I
               | talked to a high up person at Nvidia about their general
               | strategy around gaming. Nvidia sells graphics cards by
               | having the absolute best graphics performance for gaming.
               | This isn't purely about raw compute power. There are lots
               | of graphics extensions and features available to game
               | developers on Nvidia that aren't available elsewhere. If
               | game developers use these extensions, they get a better
               | looking game when played on Nvidia hardware. This comes
               | with a cost, however; it's more work to use these extra
               | rendering features when developing a game. If gamers
               | can't buy Nvidia GPUs, then there isn't a reason for game
               | developers to use Nvidia's proprietary features. If game
               | developers don't use the proprietary features, then games
               | don't look that much better on an Nvidia card. This makes
               | Nvidia a less desirable choice for gamers.
        
             | blitzar wrote:
              | Crypto miners were buying them all because Nvidia was
              | selling entire production runs of cards directly to crypto
              | miners.
        
         | getmeinrn wrote:
         | Fellow organic user, I also find the outlook and integrity of
         | Nvidia(tm) extremely refreshing. Finally a company we can
         | believe in to play the game The Way It's Meant To Be Played(tm)
        
           | smoldesu wrote:
           | "a company we can believe in" should be the subtitle of
           | Hacker News.
        
           | somsak2 wrote:
           | > Please don't post insinuations about astroturfing,
           | shilling, brigading, foreign agents, and the like. It
           | degrades discussion and is usually mistaken. If you're
           | worried about abuse, email hn@ycombinator.com and we'll look
           | at the data.
           | 
           | https://news.ycombinator.com/newsguidelines.html
        
             | flangola7 wrote:
             | That's a dumb rule that often deserves to be broken. Appeal
             | to authority is an abdication of responsibility.
        
             | jlund-molfese wrote:
             | Can we get a rule that bans copying and pasting the rules
             | into comments? It's just noise that lowers the quality of
             | discussion.
             | 
              | And most of the time, the person isn't even breaking any
              | rules. In this case, I'm pretty sure they were making a
              | joke and didn't actually think that a longtime HN user was
              | astroturfing.
        
       | lee101 wrote:
       | [dead]
        
       | jokethrowaway wrote:
       | Cool!
       | 
        | Is the cost at AWS levels of waste - or something reasonable?
        | 
        | I can get an A4000 with 16 GB of VRAM, which can run some
        | models, for $140 per month.
        | 
        | I can't say the setup is anything special really, but not
        | having to do it yourself has some value.
        
       | zaalps wrote:
       | [flagged]
        
       | ommz wrote:
        | It would be nice if Nvidia did not enforce artificial driver and
        | legal kneecaps on consumer GeForce cards for cloud usage to prop
        | up their enterprise ones... but shareholder rights come before
        | anyone else's.
        
         | konschubert wrote:
          | If they were not such a monopoly, they could not pull this
          | off.
        
           | sanxiyn wrote:
           | NVIDIA became a monopoly by building superior products. It's
           | not like they became a monopoly by anti-competitive
           | practices.
        
             | konschubert wrote:
             | I'm not saying they did anything bad.
             | 
             | But a monopoly can be harmful for a market without anyone
             | doing anything illegal.
        
               | sanxiyn wrote:
               | True. But it is also self-correcting, since monopoly
               | profit will attract competitors. AMD seems to be the most
               | likely candidate.
        
         | ChuckNorris89 wrote:
          | But then what's stopping cloud customers from scalping up all
          | the consumer GeForce stock for cheap and putting those cards
          | in the data center, like in the crypto mining days?
          | 
          | Cloud customers can afford to pay more for those GPUs than
          | gamers because they generate revenue with them; gamers don't.
          | 
          | So it makes sense to have some product segmentation in place
          | to prevent one market from completely cannibalizing the other
          | while leaving Nvidia with less profit.
          | 
          | The current situation is still caused by manufacturing
          | constraints at TSMC on the cutting-edge nodes, which both the
          | consumer and data center parts occupy, so it makes sense for
          | Nvidia to prioritize the higher-margin parts.
          | 
          | There have been great points made that Nvidia should split
          | into Nvidia, the general compute company oriented toward data
          | center customers with deep pockets, and GeForce, the gaming
          | GPU company with access to all of Nvidia's cutting-edge tech
          | but which seeks to be more scrappy, optimizes designs for
          | rasterization performance rather than generic compute, and
          | chases smaller die sizes on cheaper nodes to be price
          | competitive. This way the data center compute market would
          | stop cannibalizing the consumer gaming one, and we'd be back
          | to having better GPUs at competitive prices.
        
           | kkielhofner wrote:
            | There are some debatable licensing terms in various Nvidia
            | driver releases that prohibit consumer cards from being
            | hosted in "datacenters".
            | 
            | But the real issue is physical form factor and power. As has
            | been noted in the press, something like an RTX 3090 (and
            | even more so a 4090) is literally designed to push frames as
            | fast as possible, power and heat be damned. They're
            | multi-slot (which results in poor density) and have card
            | design/cooling challenges, power configuration issues, etc.
           | 
           | There's a story out there about the only dual-slot RTX 3090.
           | Gigabyte came up with one (I have several - they're great)
           | but supposedly Nvidia put pressure on them to pull them from
           | the market[0] because people were putting them in x8 server
           | configurations and using them instead of their much more
           | expensive datacenter products.
           | 
           | [0] - https://www.tomshardware.com/news/gigabyte-rains-
           | partners-pa...
        
         | neximo64 wrote:
         | You could always use a Geforce card at home. Are you saying the
         | cloud should use those Geforce cards and completely distort the
         | price of the GPUs for home use?
        
         | rmbyrro wrote:
          | They're just trying to eat the consumer surplus from
          | enterprise customers, which are higher up the demand curve.
          | Everyone does that.
          | 
          | An individual developer is happy to charge a higher salary for
          | their services at a larger corporation than when working for
          | an SME, simply because in a large org their services generate
          | more value, allowing them to capture more of it.
        
           | cj wrote:
            | I don't disagree, but I think that's a poor analogy. I don't
            | think devs take into account the business value their future
            | job will bring their employer when negotiating salary. And
            | if they do, they only do so when the balance is in their
            | favor; they definitely wouldn't lower their salary if they
            | thought the job had less impact than another job.
        
           | __MatrixMan__ wrote:
           | As a human, I do not want a level playing field when it comes
           | to humans exploiting corporations vs corporations exploiting
           | humans.
        
             | smoldesu wrote:
              | You have long since missed the boat on changing that. This
              | is how business is done: "well, we _can_ charge you 5x the
              | market price for the RAM/SSD upgrade, so we will!"
        
               | __MatrixMan__ wrote:
               | In some cases, yes. But not entirely. Open source exists
               | to give people a way to opt out of would-be exploitation
               | of a related kind.
               | 
               | Things can still get a lot worse: The fight isn't over
               | until all roads are toll roads and you have to pay for
               | the oxygen you consume.
        
           | immibis wrote:
           | That's because developers are people and corporations aren't.
        
         | izacus wrote:
          | Nvidia making sure that their consumer business isn't
          | out-scalped and destroyed by VC-funded companies is a good
          | thing.
          | 
          | This is also how they came out on top of the crypto craze
          | without destroying their gaming market.
        
           | sdflhasjd wrote:
            | They didn't come out on top; they revelled in it. What
            | brought us back to some relative normalcy was the crypto
            | crash and Ethereum's switch away from PoW. Even after that,
            | the 40-series pricing and range seems to be Nvidia cashing
            | in on the scalper prices.
        
             | KaoruAoiShiho wrote:
              | Nvidia maintained MSRP on 30-series cards during the WFH
              | boom and did not allow AIBs to increase prices; this was
              | one of the main complaints from EVGA, which ended up
              | pulling out of the GPU market. The scalping was done by
              | third parties.
        
       | hospitalJail wrote:
        | We need local models for our confidential data. Nvidia, we can
        | already train using OpenAI or a beefy hosted server.
        | 
        | But this particular data is air-gapped.
        
       | politelemon wrote:
        | I'm failing to see why k8s needs to be involved here - it's
        | overkill for most model-serving cases, and its involvement here
        | adds additional overhead. So it's not really any cloud; it's any
        | cloud where you're running your EKS/AKS etc.
        
         | ianpurton wrote:
          | Kubernetes means you don't have to learn each cloud's way of
          | doing a deployment. You just learn the k8s way, then use that
          | with Google, Azure, or whatever.
          | 
          | So your skillset is reusable.
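          | 
          | For example, here's a minimal sketch with the official
          | kubernetes Python client (image name and labels are
          | placeholders): the same Deployment applies unchanged to
          | whichever cluster your kubeconfig points at - EKS, GKE,
          | or AKS.
          | 
          |     from kubernetes import client, config
          | 
          |     # Talks to whatever cluster kubectl is pointed at.
          |     config.load_kube_config()
          | 
          |     labels = {"app": "model-server"}
          |     deployment = client.V1Deployment(
          |         metadata=client.V1ObjectMeta(name="model-server"),
          |         spec=client.V1DeploymentSpec(
          |             replicas=2,
          |             selector=client.V1LabelSelector(
          |                 match_labels=labels),
          |             template=client.V1PodTemplateSpec(
          |                 metadata=client.V1ObjectMeta(labels=labels),
          |                 spec=client.V1PodSpec(containers=[
          |                     client.V1Container(
          |                         name="server",
          |                         image="example.com/model:latest"),
          |                 ]),
          |             ),
          |         ),
          |     )
          |     client.AppsV1Api().create_namespaced_deployment(
          |         namespace="default", body=deployment)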
        
           | oceanplexian wrote:
            | > Kubernetes means you don't have to learn each cloud's way
            | of doing a deployment.
           | 
           | So instead of learning how to deploy on GCP, AWS, and Azure,
           | which is only 3x more complicated than deploying to a single
           | cloud, you should learn K8s, which is 10-15x more
           | complicated, in addition to still having to learn about all
           | the various ingress controllers and weird quirks that are
           | completely different on each cloud provider. Doesn't really
           | track for me.
        
             | echelon wrote:
             | > which is 10-15x more complicated
             | 
             | You can learn k8s in a day. It's really simple.
             | 
             | > various ingress controllers and weird quirks that are
             | completely different on each cloud provider
             | 
             | Which are thoroughly documented and not that hard to
             | implement or understand. You'd be reading about each
             | cloud's nonstandard ingress even without k8s.
             | 
             | The beauty of k8s is you can run your software locally and
             | have a much easier time lifting and shifting to another
             | cloud.
             | 
             | Fitting to the shape of a cloud provider is a great way to
             | never leave.
             | 
             | Another benefit of k8s is that you treat your services as
             | cattle you can easily spawn and kill. Adoption of k8s
             | naturally leads to anti-fragility, anti-brittle best
             | practices.
        
               | bg24 wrote:
                | Having spent 4 years working with Kubernetes (as a PM,
                | but pretty hands-on), I can say getting started is easy
                | - like, less than a week. The problem comes when you run
                | into issues. That can suck up a lot of time. Also, if
                | you are new to containers, it might not be a good step
                | to venture into k8s.
        
               | hosh wrote:
               | I think being able to abstract the cloud providers is
               | secondary to k8s's ability to self-heal and its
               | modularity.
               | 
               | But I think most developers don't care, and instead,
               | should interact with a platform built with Kubernetes as
               | a foundation.
        
               | profunctor wrote:
                | Maybe this is because I'm not that smart, but I could
                | not learn real Kubernetes in a day. I had to build a
                | system for loading models and returning predictions over
                | an HTTPS API. It had to connect to storage to load the
                | model, needed secrets, etc. It took more than a day. And
                | I think it would take most people more than a day to go
                | from zero to creating a useful, real-world deployment.
                | I'm sure you can rush through the documentation in a
                | day, but I wouldn't call that learning.
        
             | beebmam wrote:
             | No shot that kubernetes is 10-15x more complicated than
             | cloud offerings.
        
             | hosh wrote:
             | K8S is not _that_ complicated. Once you know the big ideas
             | behind it, and how to reason with it, it becomes a very
             | versatile platform substrate.
             | 
             | Probably the biggest one is understanding you don't ever do
             | anything directly with Kubernetes.
        
               | el_benhameen wrote:
               | Do you have any favorite learning resources?
        
               | doctoboggan wrote:
                | I've learned k8s over the past few months, and what was
                | absolutely instrumental to my understanding was: 1. use
                | Helm, and 2. daily chat sessions with GPT-4.
                | 
                | I use GPT-4 through the API, where you can set your own
                | system prompt. I developed one that basically instructed
                | it to give me kubectl commands to solve my problems and
                | then wait for me to give it the result before
                | continuing. Through this I learned the practical
                | techniques and the kubectl commands you use on a daily
                | basis, which is so much more helpful than reading the
                | documentation, which just gives all commands equal
                | weight.
                | 
                | EDIT: Oh, and definitely watch a few "TechWorld with
                | Nana" videos on YouTube. She does a great job of
                | explaining the architecture, terminology, and philosophy
                | of k8s, which I think is very helpful to know.
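                | 
                | A sketch of the kind of loop I mean (the prompt
                | wording here is illustrative, not my exact one;
                | assumes the pre-1.0 openai SDK):
                | 
                |     import openai
                | 
                |     history = [{"role": "system", "content":
                |         "You are a Kubernetes tutor. Answer with a "
                |         "kubectl command for me to run, then wait "
                |         "for me to paste its output back before "
                |         "continuing."}]
                | 
                |     while True:
                |         history.append(
                |             {"role": "user", "content": input("> ")})
                |         answer = openai.ChatCompletion.create(
                |             model="gpt-4", messages=history,
                |         ).choices[0].message.content
                |         history.append(
                |             {"role": "assistant", "content": answer})
                |         print(answer)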
        
           | berkle4455 wrote:
           | This argument is reminiscent of ORMs and "you can switch your
           | database and only change the config!"
           | 
           | Switching your database, just like switching your cloud
           | provider, rarely happens in practice.
        
             | danryan wrote:
             | That feature was always a byline at best.
        
           | finikytou wrote:
            | No, it doesn't mean that. You still need to know how to
            | operate k8s on each specific cloud.
        
             | hhh wrote:
             | Ideally the developer doesn't. At scale some platform or
             | infra team should.
        
             | artdigital wrote:
              | I run stuff on hosted k8s from DO, Google, and Vultr. I
              | can absolutely reuse my knowledge, and deployments are
              | almost identical (minus smaller differences like the
              | storage CSI driver, etc.)
        
               | oceanplexian wrote:
               | I work at a place running a million containers deployed
               | in all 3 (Azure, AWS, GCP). I can assure you they are
               | radically different; autoscaling works differently, the
               | load balancers work differently, the networking
               | infrastructure is completely different, the failure modes
               | and limits behaviors are different, the instances perform
               | differently, observability is different, and they all
               | suck in unique and different ways that we discover on a
               | daily basis. Shit even AWS can't keep their regions
               | consistent; each region has different products and
               | features and they fail in different ways.
               | 
               | If you are the one maintaining it it's a full time job
               | handling all these edge cases, it's completely miserable
               | and I wouldn't recommend it to anyone.
        
               | Infernal wrote:
               | > I work at a place running a million containers deployed
               | in all 3 (Azure, AWS, GCP)
               | 
               | Are you using AKS, EKS, GKE on those providers, or
               | deploying your own k8s on top of the compute those
               | providers offer? It sounds to me like the former.
        
               | hosh wrote:
                | I've done smaller deployments on GKE and another on EKS,
                | and I can tell you they are different enough. It's when
                | you start having to autoscale, optimize resources by
                | instance type, and manage network ingress that these
                | quirks really start to come out. The essential ideas are
                | invariant across cloud providers, though.
                | 
                | But I enjoy working with Kubernetes.
        
               | Infernal wrote:
               | I should've been clearer about what I was getting at. I
               | agree AKS, EKS, GKE etc (cloud gnostic k8s) are different
               | enough to cause a growth of complexity when managing a
               | mixed environment of them.
               | 
               | The post I was replying to seemed to be saying (by
               | analogy) "Linux is hard to manage because I run into all
               | sorts of trouble trying to support a mixed environment of
               | SuSE, Ubuntu and RHEL, therefore Linux is just too
               | complicated".
        
               | gymbeaux wrote:
               | The latter wouldn't make sense. In such a case, there's
               | already little value to being in the cloud, but to be in
               | several?..
        
               | floomk wrote:
               | Once you get to the scale of a million containers (or
               | "apps") spread over multiple clouds then everything will
               | be miserable.
        
               | hosh wrote:
                | The essential ideas that Kubernetes exposes concretely
                | are invariant across cloud providers. There absolutely
                | are nuances and quirks that differ for each cloud
                | provider and are unique to the workload you have.
                | However, those same ideas also act as a kind of mental
                | framework in which these quirks can be understood. It
                | isn't as if those quirks are randomly there, unconnected
                | to anything, and therefore not part of a coherent
                | design.
                | 
                | For example, the consistent use of labels as a way to
                | identify groups of resources that need to coordinate
                | with each other is very useful for any distributed
                | system. I find myself looking for them in, say, CI/CD
                | systems (in the form of agent tags), or at the
                | application level in, say, matching players to game
                | servers.
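                | 
                | Concretely, a sketch with the kubernetes Python
                | client (the label values are made up):
                | 
                |     from kubernetes import client, config
                | 
                |     config.load_kube_config()
                |     # Select every pod carrying these labels,
                |     # regardless of which controller created it.
                |     pods = client.CoreV1Api().list_namespaced_pod(
                |         namespace="default",
                |         label_selector="app=game-server,tier=match")
                |     for pod in pods.items:
                |         print(pod.metadata.name, pod.status.phase)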
        
             | Art9681 wrote:
             | No you don't have to. You can deploy your own cluster
             | instead of using the managed option if you want to. A good
             | SRE can deploy and manage EKS. A great SRE can deploy and
             | manage a cluster to any Cloud without ever touching the
             | dashboards.
        
             | okamiueru wrote:
              | Some part of it creates cloud-specific resources, and you
              | might also, for good reasons, have a cloud-managed
              | database or data storage that your k8s services use.
              | However, "k8s on a specific cloud" is mostly the same,
              | except at the outer edges.
        
               | hosh wrote:
               | Until you want to scale and optimize resources.
               | 
               | I enjoy working with Kubernetes, but forcing a complex
               | domain into something legible is a recipe for
               | catastrophe. There are quirks, across cloud providers,
               | and this is just another day in Ops, with or without
               | Kubernetes. (See:
               | https://www.ribbonfarm.com/2010/07/26/a-big-little-idea-
               | call... )
        
             | jml78 wrote:
              | Just like you have to know how to operate each cloud in
              | general.
              | 
              | There is no free lunch. But if you learn k8s, moving from
              | AWS EKS to Google GKE to DigitalOcean's hosted k8s is
              | easy.
        
             | api wrote:
             | It's more like learning different Linux distributions than
             | learning different OSes.
        
             | quickthrower2 wrote:
             | hell yes.
        
           | jcims wrote:
           | Kubernetes gets a lot of shit on HN but for all of its
           | challenges it has proven to be a fantastic method to abstract
           | many of the idiosyncrasies of hosting on-prem vs various
           | cloud providers. I've worked for two companies now with 8
           | figure monthly cloud spend and hundreds/thousands of
           | applications operating in one or more of the main cloud
           | providers and k8s has been essential in making that happen.
           | Teams can migrate to an on-premise hosted option if they
           | want, then transition to cloud if/when it makes sense, or
           | just stay where they are.
        
             | stavros wrote:
             | _Takes notes_
             | 
             | "Kubernetes has been essential in making 8-figure monthly
             | cloud spend happen"
        
               | jcims wrote:
               | When you're spending billions of dollars on data center
               | refresh, it's a bargain.
        
               | windexh8er wrote:
                | The last startup I was at was obsessed with putting
                | everything into k8s for no apparent reason - even the
                | product they were selling, which most customers hated.
                | It forced our customer base either to deal with the pain
                | of paying for k8s, despite having no need or intention
                | to use it for anything other than our product, or to
                | work with a cross-functional team, which created a time
                | sink and a dependency that wouldn't have been there
                | otherwise.
                | 
                | The best, though, was when I ran across someone in the
                | org trying to run a single container for a periodic job
                | in its own cluster. They spent half the day trying to
                | get it to work with ingress.
                | 
                | You can imagine how it came to a head when the company
                | realized they were spending hundreds of thousands per
                | month on idle clusters in AWS.
        
               | immibis wrote:
                | One company I worked at was obsessed with k8s for a
                | while. On a local arrangement of about 4 servers, each
                | build would start up a new container on Kubernetes and
                | rebuild an entire operating system from scratch.
        
         | bjornsing wrote:
          | > So it's not really any cloud, it's any cloud where you're
          | running your EKS/AKS etc.
          | 
          | As I understand it, this new Nvidia VM image comes with
          | Kubernetes on the inside, so to speak - perhaps microk8s with
          | the nvidia extension enabled.
          | 
          | BTW, this is how I've started running my own little AI
          | experiments. Sure, there's some overhead. But compared with
          | constantly downloading new versions of drivers, it's quite
          | lightweight. Also, k8s is turning into the lingua franca of
          | software platforms, so it's well worth learning and paying the
          | overhead on, IMHO.
        
       | csears wrote:
       | Congrats to the Run:ai team. This looks like a pretty big
       | endorsement from Nvidia.
        
       ___________________________________________________________________
       (page generated 2023-07-08 23:00 UTC)