[HN Gopher] Use one big server Author : pclmulqdq Score : 841 points Date : 2022-08-02 14:43 UTC (8 hours ago) web link (specbranch.com) | londons_explore wrote: | Hybrid! | | If you are at all cost sensitive, you should have some of your | own infrastructure, some rented, and some cloud. | | You should design your stuff to be relatively easily moved and | scaled between these. Build with docker and kubernetes and that's | pretty easy to do. | | As your company grows, the infrastructure team can schedule which | jobs run where, and get more computation done for less money than | just running everything in AWS, and without the scaling headaches | of on-site stuff. | dekhn wrote: | Science advances as RAM on a single machine increases. | | For many years, genomics software was non-parallel and depended | on having a lot of RAM - often a terabyte or more - to store data | in big hash tables. Converting that to distributed computing was | a major effort and to this day many people still just get a Big | Server With Lots of Cores, RAM, and SSD. | | Personally, after many years of working with distributed systems, I | absolutely enjoy working on a big fat server that I have all to | myself. | bee_rider wrote: | On the other hand in science, it sure is annoying that the size | of problems that fit in a single node is always increasing. | PARDISO running on a single node will always be nipping at your | heels if you are designing a distributed linear system | solver... | notacoward wrote: | > Science advances as RAM on a single machine increases. | | Also as people learn that correlation does not equal causation. | ;) | rstephenson2 wrote: | It seems like lots of companies start in the cloud due to low | commitments, and then later, when they have more stability and | demand and want to save costs, making bigger cloud commitments | (RIs, enterprise agreements, etc.) is a turnkey way to save money | but always leaves you on the lower-efficiency cloud track. Has | anyone had good experiences selectively offloading workloads from | the cloud to bare metal servers nearby? | reillyse wrote: | Nope. Multiple small servers. | | 1) you need to get over the hump and build multiple servers | into your architecture from the get-go (the author says you need | two servers minimum), so really we are talking about two big | servers. | | 2) having multiple small servers allows us to spread our service | into different availability zones | | 3) multiple small servers allow us to do rolling deploys without | bringing down our entire service | | 4) once we use the multiple small servers approach it's easy to | scale up and down our compute by adding or removing machines. | With one server it's difficult to scale up or down without | buying more machines. Small servers we can add incrementally but | with the large server approach scaling up requires downtime and | buying a new server. | zhte415 wrote: | It completely depends on what you're doing. This was pointed out | in the first paragraph of the article: | | > By thinking about the real operational considerations of our | systems, we can get some insight into whether we actually need | distributed systems for most things. | Nextgrid wrote: | > you need to get over the hump and build multiple servers | into your architecture from the get-go (the author says you | need two servers minimum), so really we are talking about two | big servers.
| | Managing a handful of big servers can be done manually if | needed - it's not pretty but it works, and people were | doing it just fine before the cloud came along. If you | intentionally plan on having dozens/hundreds of small servers, | manual management becomes unsustainable and now you need a | control plane such as Kubernetes, and all the complexity and | failure modes it brings. | | > having multiple small servers allows us to spread our service | into different availability zones | | So will 2 big servers in different AZs (whether cloud AZs or | old-school hosting providers such as OVH). | | > multiple small servers allow us to do rolling deploys | without bringing down our entire service | | Nothing prevents you from starting multiple instances of your | app on one big server, nor doing rolling deploys with big | bare-metal, assuming one server can handle the peak load (so you | take your first server out of the LB, upgrade it, put it back in | the LB, then do the same for the second and so on). | | > once we use the multiple small servers approach it's easy to | scale up and down our compute by adding or removing machines. | With one server it's difficult to scale up or down without | buying more machines. Small servers we can add incrementally | but with the large server approach scaling up requires downtime | and buying a new server. | | True, but the cost premium of the cloud often offsets the | savings of autoscaling. A bare-metal server capable of handling | peak load is often cheaper than your autoscaling stack at low | load, therefore you can just overprovision to always meet peak | load and still come out ahead. | SoftTalker wrote: | I manage hundreds of servers, and use Ansible. It's simple | and it gets the job done. I tried to install Kubernetes on a | cluster and couldn't get it to work. I mean I know it works, | obviously, but I could not figure it out and decided to stay | with what works for me. | eastbound wrote: | But it's specific, and no-one will want to take over your | job. | | The upside of a standard AWS CloudFormation file is that | engineers are replaceable. They're cargo-cult engineers, | but they're not worried about their careers. | Nextgrid wrote: | > But it's specific, and no-one will want to take over | your job. | | It really depends what's on the table. Offer just half of | the cost savings vs an equivalent AWS setup as a bonus | and I'm sure you'll find people who will happily do it | (and you'll be happy to pocket the other half). For a lot | of companies even just _half_ of the cost savings would be | a significant sum (reminds me of an old client who spent | _thousands_ per month on an RDS cluster that not only was | slower than my entry-level MacBook, but ended up crapping | out, stuck in an inconsistent state for 12 hours, and | required manual intervention from AWS to recover - so much | for managed services - they ended up restoring a backup, | but I wish I could've SSH'd in and recovered it in-place).
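To make the rolling-deploy idea Nextgrid describes above concrete (drain one server from the load balancer, upgrade it, put it back, repeat), here is a minimal TypeScript sketch. The host names, the load-balancer admin endpoint, and the deploy command are hypothetical placeholders, not any particular vendor's API:

    import { execSync } from "node:child_process";

    // Hypothetical fleet of two big servers behind one load balancer.
    const servers = ["app1.internal", "app2.internal"];
    // Hypothetical LB admin API that can mark a backend as draining or active.
    const lbAdmin = "http://lb.internal:9000/backends";

    async function setBackendState(host: string, state: "drain" | "active") {
      await fetch(`${lbAdmin}/${host}`, {
        method: "PUT",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ state }),
      });
    }

    async function waitHealthy(host: string) {
      // Poll a conventional health endpoint until the upgraded app responds.
      for (let i = 0; i < 30; i++) {
        try {
          if ((await fetch(`http://${host}:8080/healthz`)).ok) return;
        } catch {}
        await new Promise((r) => setTimeout(r, 2000));
      }
      throw new Error(`${host} did not become healthy`);
    }

    async function rollingDeploy() {
      for (const host of servers) {
        await setBackendState(host, "drain");                      // stop new traffic
        execSync(`ssh ${host} ./deploy.sh`, { stdio: "inherit" }); // upgrade in place
        await waitHealthy(host);                                   // verify first
        await setBackendState(host, "active");                     // back into rotation
      }
    }

    rollingDeploy().catch((err) => { console.error(err); process.exit(1); });

The point is only that each server is upgraded while the other keeps serving, so the service never goes fully down.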
| | As someone who uses tech as a means to an end and is more | worried about the _output_ said tech produces than the | tech itself (aka I'm not looking for a job nor resume | clout nor invites to AWS/Hashicorp/etc conferences; | instead I bank on the business problems my tech solves), | I'm personally very happy to get my hands dirty with | old-school sysadmin stuff if it means I don't spend 10-20x | the money on infrastructure just to make Jeff Bezos | richer - my end customers don't know nor care either way | while my wallet appreciates the cost savings. | [deleted] | rubiquity wrote: | The line of thinking you follow is what is plaguing this | industry with too much complexity and simultaneously throwing | away incredible CPU and PCIe performance gains in favor of | using the network. | | Any technical decision about how many instances to have and | how they should be spread out needs to start as a business | decision and end in crisp numbers about recovery point/time | objectives, and yet somehow that nearly never happens. | | To answer your points: | | 1) Not necessarily. You can stream data backups to remote | storage and recover from that on a new single server as long as | that recovery fits your Recovery Time Objective (RTO). | | 2) What's the benefit of multiple AZs if the SLA of a single AZ | is greater than your intended availability goals? (Have you | checked your provider's single AZ SLA?) | | 3) You can absolutely do rolling deploys on a single server. | | 4) Using one large server doesn't mean you can't complement it | with smaller servers on an as-needed basis. AWS even has a | service for doing this. | | Which is to say: there aren't any prescriptions when it comes | to such decisions. Some businesses warrant your choices; the | vast majority do not. | reillyse wrote: | Ok, so to your points. | | "It depends" is the correct answer to the question, but the | least informative. | | One Big Server or multiple small servers? It depends. | | It always depends. There are many workloads where one big | server is the perfect size. There are many workloads where | many small servers are the perfect solution. | | My point is that the ideas put forward in the | article are flawed for the vast majority of use cases. | | I'm saying that multiple small servers are a better solution | on a number of different axes. | | For 1) "One Server (Plus a Backup) is Usually Plenty": now I | need some kind of remote storage streaming system and some | kind of manual recovery. Am I going to fail over to the | backup (in which case it needs to be as big as my "one | server"), or will I need to manually recover from my backup? | | 2) Yes it depends on your availability goals, but you get | this as a side effect of having more than one small instance | | 3) Maybe I was ambiguous here. I don't just mean rolling | deploys of code. I also mean changing the server code, | restarting, upgrading and changing out the server. What | happens when you migrate to a new server (when you scale up | by purchasing a different box)? Now we have a manual process | that doesn't get executed very often and is bound to cause | downtime. | | 4) Now we have "Use one Big Server - and a bunch of small | ones" | | I'm going to add a final point on reliability. By far the | biggest risk factor for reliability is me, the engineer. I'm | responsible for bringing down my own infra way more than any | software bug or hardware issue.
The probability of me messing | up everything when there is one server that everything | depends on is much, much higher, speaking from experience. | | So. Like I said, I could have said "It depends" but instead I | tried to give a response that was somewhat illuminating and | helpful, especially given the strong opinions expressed in | the article. | | I'll give a little color with the current setup for a site I | run. moustachecoffeeclub.com runs on ECS. | | I have 2 on-demand instances and 3 spot instances: one tiny | instance running my caches (redis, memcache), one "permanent" | small instance running my web server, two small spot instances | running the web server, and one small spot instance running | background jobs - "small" being about 3 GB and 1024 CPU units. | | Add an RDS instance with backup at about $67/month and, all in, | I'm well under $200 per month including the database. | | So you can do multiple small servers inexpensively. | | Another aspect is that I appreciate being able to go on | vacation for a couple of weeks, go camping or take a plane | flight without worrying if my one server is going to fall | over when I'm away and my site is going to be down for a | week. In a big company maybe there is someone paid to monitor | this, but with a small company I could come back to a smoking | hulk of a company and that wouldn't be fun. | bombcar wrote: | > Any technical decision about how many instances to have | and how they should be spread out needs to start as a | business decision and end in crisp numbers about recovery | point/time objectives, and yet somehow that nearly never | happens. | | Nobody wants to admit that their business or their department | actually has an SLA of "as soon as you can, maybe tomorrow, as | long as it usually works". So everything is | pretend-engineered to be fifteen nines of reliability (when in | reality it sometimes explodes _because_ of the "attempts" to | make it robust). | | Being honest about the _actual_ requirements can be extremely | helpful. | bob1029 wrote: | > Nobody wants to admit that their business or their | department actually has an SLA of "as soon as you can, maybe | tomorrow, as long as it usually works". So everything is | pretend-engineered to be fifteen nines of reliability (when | in reality it sometimes explodes because of the "attempts" | to make it robust). | | I have yet to see my principal technical frustrations | summarized so concisely. This is at the heart of | _everything_. | | If the business and the engineers could get over their | ridiculous obsession with statistical outcomes and strict | determinism, they would be able to arrive at a much more | cost effective, simple and human-friendly solution. | | The businesses that are _actually_ sensitive to >1 | minute of annual downtime are already running on top of IBM | mainframes and have been for decades. No one's business is | as important as the Federal Reserve or Pentagon, but they | don't want to admit it to themselves or others. | marcosdumay wrote: | > The businesses that are actually sensitive to >1 | minute of annual downtime are already running on top of | IBM mainframes and have been for decades. | | Are there any? | | My bank certainly has way less than 5 9s of availability. | It's not a problem at all. Credit/debit card processors | seem to stay around 5 nines, and nobody is losing sleep | over it. As long as your unavailability isn't all on the | Christmas promotion day, I've never seen anybody lose any | sleep over web-store unavailability.
The Fed probably | doesn't have 5 9's of availability. It's way overkill for | a central bank, even one that processes online | interbank transfers (which the Fed doesn't). | | The organizations that need more than 5 9's are probably | all in the military and science sectors. And those aren't | using mainframes; they certainly use good old redundancy | of equipment with simple failure modes. | bob1029 wrote: | > simultaneously throwing away incredible CPU and PCIe | performance gains | | We _really_ need to double down on this point. I worry that | some developers believe they can defeat the laws of physics | with clever protocols. | | The amount of time it takes to round trip the network _in the | same datacenter_ is roughly 100,000 to 1,000,000 nanoseconds. | | The amount of time it takes to round trip L1 cache is around | half a nanosecond. | | A trip down PCIe isn't much worse, relatively speaking. Maybe | hundreds of nanoseconds. | | Lots of assumptions and hand waving here, but L1 cache _can | be_ around 1,000,000x faster than going across the network. | SIX orders of magnitude of performance are _instantly_ | sacrificed to the gods of basic physics the moment you decide | to spread that SQLite instance across US-EAST-1. Sure, it | might not wind up a million times slower on a relative basis, | but you'll never get access to those zeroes again. | roflyear wrote: | I agree! Our "distributed cloud database" just went down last | night for a couple of HOURS. Well, not entirely down. But | there were connection issues for hours. | | Guess what never, never had this issue? The hardware I keep | in a datacenter lol! | dvfjsdhgfv wrote: | > The line of thinking you follow is what is plaguing this | industry with too much complexity and simultaneously throwing | away incredible CPU and PCIe performance gains in favor of | using the network. | | It will die out naturally once people realize how much the | times have changed and that the old solutions based on weaker | hardware are no longer optimal. | deathanatos wrote: | > _2) What's the benefit of multiple AZs if the SLA of a | single AZ is greater than your intended availability goals? | (Have you checked your provider's single AZ SLA?)_ | | ... my provider's single AZ SLA is less than my company's | intended availability goals. | | (IMO our goals are nuts, too, but it is what it is.) | | Our provider, in the worst case (a VM using a managed hard | disk), has an SLA of 95% within a month (I ... think. Their | SLA page uses incorrect units on the top line items. The | examples in the legalese -- examples are normative, right? -- | use a unit of % / mo...). | | You're also assuming a provider (a) typically meets their | SLAs and (b) if they don't, honors them. IME, (a) is highly | service dependent, with some services being just _stellar_ at | it, and (b) is usually "they will if you can _prove_ to them | with your own metrics they had an outage, and push for a | credit." Also (c) the service doesn't fail in a way that's | impactful, but not covered by SLA. (E.g., I had a cloud | provider once whose SLA was over "the APIs should return | 2xx", and the APIs, during the outage, always returned "2xx, | I'm processing your request". You then polled the API and got | "2xx your request is pending". Nothing was happening, because | they were having an outage, but that outage could continue | indefinitely without impacting the SLA! _That_ was a fun | support call...) | | There's also (d) AZs are a myth; I've seen multiple global | outages.
E.g., when something like the global authentication | service falls over and takes basically every other service | with it. (Because nothing can authenticate. What's even | better is the provider then listing those services as "up" / | not in an outage, because _technically_ it 's not _that_ | service that 's down, it is just the authentication service. | Cause God forbid you'd have to give out _that_ credit. But | the provider calling a service "up" that is failing 100% of | the requests sent its way is just rich, from the customer's | view.) | ericd wrote: | On a big server, you would probably be running VMs rather than | serving directly. And then it becomes easy to do most of what | you're talking about - the big server is just a pool of | resources from which to make small, single purpose VMs as you | need them. | Koshkin wrote: | Why VMs when you can use containers? | ericd wrote: | If you prefer those, go for it. I like my infra tech to be | about as boring and battle tested as I can get it without | big negatives in flexibility. | Koshkin wrote: | In theory, VMs should only be needed to run different | OSes on one big box. Otherwise, what should have sufficed | (speaking of what I 'prefer') is a multiuser OS that does | not require additional layers to ensure security and | proper isolation of users and their work environments | from each other. Unfortunately, looks like UNIX and its | descendants could not deliver on this basic need. (I | wonder if Multics had something of a better design in | this regard.) | cestith wrote: | Why containers when you can use unikernel applications? | Koshkin wrote: | But can unikernel applications share a big server | (without themselves running inside VMs)? | mixmastamyk wrote: | Better support when at least in the neighborhood of the | herd. | PeterCorless wrote: | We have a different take on running "one big database." At | ScyllaDB we prefer vertical scaling because you get better | utilization of all your vCPUs, but we still will keep a | replication factor of 3 to ensure that you can maintain [at | least] quorum reads and writes. | | So we would likely recommend running 3x big servers. For those | who want to plan for failure, though, they might prefer to have | 6x medium servers, because then the loss of any one means you | don't take as much of a "torpedo hit" when any one server goes | offline. | | So it's a balance. You want to be big, but you don't want to be | monolithic. You want an HA architecture so that no one node kills | your entire business. | | I also suggest that people planning systems create their own | "torpedo test." We often benchmark to tell maximal optimum | performance, presuming that everything is going to go right. | | But people who are concerned about real-world outage planning may | want to "torpedo" a node to see how a 2-out-of-3-nodes-up cluster | operates, versus a 5-out-of-6-nodes-up cluster. | | This is like planning for major jets, to see if you can work with | 2 of 3 engines, or 1 of 2. | | Obviously, if you have 1 engine, there is nothing you can do if | you lose that single point of failure. At that point, you are | updating your resume, and checking on the quality of your | parachute. | vlovich123 wrote: | > At that point, you are updating your resume, and checking on | the quality of your parachute | | The ordering of these events seems off but that's | understandable considering we're talking about distributed | systems. 
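A small worked example of the trade-off PeterCorless describes above (replication factor 3 on three big nodes versus six medium ones); the capacity framing is a sketch, with numbers made up purely for illustration:

    // With replication factor (RF) 3, a quorum read or write needs
    // floor(RF / 2) + 1 = 2 replicas to respond.
    const rf = 3;
    const quorum = Math.floor(rf / 2) + 1; // 2

    // "Torpedo test": how much of the cluster's total capacity disappears
    // when a single node goes down?
    const capacityLost = (nodeCount: number) =>
      `${((1 / nodeCount) * 100).toFixed(1)}% of total capacity`;

    console.log(`quorum size at RF=${rf}: ${quorum}`);
    console.log(`3 big nodes, one down:    lose ${capacityLost(3)}`); // 33.3%
    console.log(`6 medium nodes, one down: lose ${capacityLost(6)}`); // 16.7%

Either way a single failure still leaves a quorum; the difference is how big the "torpedo hit" to the remaining capacity is.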
| pclmulqdq wrote: | I think this is the right approach, and I really admire the | work you do at ScyllaDB. For something truly critical, you | really do want to have multiple nodes available (at least 2, | and probably 3 is better). However, you really want to | have backup copies in multiple datacenters, not just the one. | | Today, if I were running something that absolutely needed to be | up 24/7, I would run a 2x2 or 2x3 configuration with async | replication between primary and backup sites. | PeterCorless wrote: | Exactly. Regional distribution can be vital. Our customer | Kiwi.com had a datacenter fire. 10 of their 30 nodes were | turned to a slag heap of ash and metal. But 20 of 30 nodes in | their cluster were in completely different datacenters so | they lost zero data and kept running non-stop. This is a rare | story, but you do NOT want to be one of the thousands of | others that only had one datacenter, and their backups were | also stored there and burned up with their main servers. Oof! | | https://www.scylladb.com/2021/03/23/kiwi-com-nonstop-operati... | zokier wrote: | If you have just two servers how are you going to load-balance | and fail over them? Generally you need at least 3 nodes for any | sort of quorum? | titzer wrote: | Last year I did some consulting for a client using Google cloud | services such as Spanner and cloud storage. They were storing and | indexing mostly timeseries data with a custom index for specific | types of queries. It was difficult for them to define a schema to | handle the write bandwidth needed for their ingestion. In | particular it required a careful hashing scheme to balance load | across shards of the various tables. (It seems to be a pattern | with many databases to suck at append-often, read-very-often | patterns, like logs). | | We designed some custom in-memory data structures in Java but | also used some of the standard high-performance concurrent data | structures. Some reader/writer locks. gRPC and some pub/sub to get | updates on the order of a few hundred or thousand qps. In the | end, we ended up with JVM instances that had memory requirements | in the 10GB range. Replicate that 3-4x for failover, and we could | serve queries at higher rates and lower latency than hitting | Spanner. The main thing cloud was good for was the storage of the | underlying timeseries data (600GB maybe?) for fast server | startup, so that they could load the index off disk in less than | a minute. We designed a custom binary disk format to make that | blazingly fast, and then just threw binary files into a cloud | filesystem. | | If you need to serve < 100GB of data and most of it is | static... IMHO, screw the cloud, use a big server and replicate it | for fail-over. Unless you've got really high write rates or | seriously stringent transactional requirements, then man, a | couple of servers will do it. | | YMMV, but holy crap, servers are huge these days. | eastbound wrote: | When you say "screw the cloud", you mean "administer an EC2 | machine yourself" or really "buy your own hardware"? | titzer wrote: | The former, mostly. You don't necessarily have to use EC2, | but that's easy to do. There are many other, smaller | providers if you really want to get out from under the big 3. | I have no experience managing hardware, so I personally | wouldn't take that on myself. | sllabres wrote: | I would think that it can hold 1TB of RAM _per socket_ (with 64GB | DIMMs), so _2TB_ total.
| bob1029 wrote: | > 1 million IOPS on a NoSQL database | | I have gone well beyond this figure by doing clever tricks in | software and batching multiple transactions into IO blocks where | feasible. If your average transaction is substantially smaller | than the IO block size, then you are probably leaving a lot of | throughput on the table. | | The point I am trying to make is that even if you think "One Big | Server" might have issues down the road, there are always some | optimizations that can be made. Have some faith in the vertical. | | This path has worked out _really_ well for us over the last | ~decade. New employees can pick things up much more quickly when | you don't have to show them the equivalent of a nuclear reactor | CAD drawing to get started. | mathisonturing wrote: | > batching multiple transactions into IO blocks where feasible. | If your average transaction is substantially smaller than the | IO block size, then you are probably leaving a lot of | throughput on the table. | | Could you expand on this? A quick Google search didn't help. A | link to an article or a brief explanation would be nice! | bob1029 wrote: | Sure. If you are using some micro-batched event processing | abstraction, such as the LMAX Disruptor, you have an | opportunity to take small batches of transactions and process | them as a single unit to disk. | | For event sourcing applications, multiple transactions can be | coalesced into a single IO block & operation without much | drama using this technique. | | Surprisingly, this technique also _lowers_ the amount of | latency that any given user should experience, despite the | fact that you are "blocking" multiple users to take | advantage of small batching effects. | lanstin wrote: | One point I didn't see made is that cloudy services can be easier | to manage. If some team gets a capital budget to buy that one big | server, they will put everything on it, no matter your architectural | standards. Cron jobs editing state on disk, tmux sessions shared | between teams, random web servers doing who knows what, non-DBA | team Postgres installs, etc. At least in the cloud you can limit | certain features and do chargeback calculations. | | Not sure if that is a net win for cloud or physical, of course, | but I think it is a factor | kgeist wrote: | One of our projects uses 1 big server and indeed, everyone | started putting everything on it (because it's powerful): the | project itself, a bunch of corporate sites, a code review tool, | and god knows what else. Last week we started having issues | with the projects going down because something is overloading | the system and they still can't find out what exactly without | stopping services/moving them to a different machine | (fortunately, it's internal corporate stuff, not user-facing | systems). The main problem I've found with this setup is that | random stuff can accumulate with time and then one | tool/process/project/service going out of control can bring | down the whole machine. If it's N small machines, there's | greater isolation. | pclmulqdq wrote: | It sounds like you need some containers. | kbenson wrote: | One server is for a hobby, not a business. Maybe that's fine, but | keep that in mind. Backups at that level are something that keeps | you from losing all data, not something that keeps you running | and gets you up in any acceptable timeframe for most businesses.
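To make bob1029's micro-batching point from earlier in this thread concrete: the sketch below (not the LMAX Disruptor itself, just the coalescing idea, with the file name and flush interval chosen arbitrarily) lets many small "transactions" accumulate for a few milliseconds and then writes them to disk as one block with a single IO call.

    import { appendFile } from "node:fs/promises";

    // Coalesce many small records into one write per flush interval.
    class BatchingWriter {
      private pending: Buffer[] = [];
      private waiters: (() => void)[] = [];
      private flushing = false;

      constructor(private path: string, flushEveryMs = 5) {
        setInterval(() => void this.flush(), flushEveryMs);
      }

      // Each caller "commits" a small record and waits until it is on disk.
      write(record: string): Promise<void> {
        this.pending.push(Buffer.from(record + "\n"));
        return new Promise((resolve) => this.waiters.push(resolve));
      }

      private async flush(): Promise<void> {
        if (this.flushing || this.pending.length === 0) return;
        this.flushing = true;
        const batch = Buffer.concat(this.pending); // many transactions...
        const waiters = this.waiters;
        this.pending = [];
        this.waiters = [];
        await appendFile(this.path, batch);        // ...one IO operation
        waiters.forEach((resolve) => resolve());   // ack the whole batch
        this.flushing = false;
      }
    }

    // Usage: a thousand concurrent writers, far fewer physical writes.
    const log = new BatchingWriter("events.log");
    Promise.all(Array.from({ length: 1000 }, (_, i) => log.write(`event ${i}`)))
      .then(() => console.log("1000 records committed"));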
| | That doesn't mean you need to use the cloud, it just means one | big piece of hardware with all its single points of failure is | often not enough. Two servers get you so much more than one. You | can make one a hot spare, or actually split services between them | and have each be ready to take over specific services for the | other, greatly increasing your burst handling capability and | giving you time to put more resources in place to keep n+1 | redundancy going if you're using more than half of a server's | resources. | secabeen wrote: | This is exactly the OP's recommended solution: | | > One Server (Plus a Backup) is Usually Plenty | kbenson wrote: | Then I guess my first sentence is about as click-baity | as the article title. ;) | vitro wrote: | Let's Encrypt's database server [1] would beg to differ. For | businesses at a certain scale two servers are really overkill. | | [1] https://letsencrypt.org/2021/01/21/next-gen-database-servers... | mh- wrote: | That says they use a single _database_, as in a logical | MySQL database. I don't see any claim that they use a single | _server_. In fact, the title of the article you've linked | suggests they use multiple. | simonw wrote: | https://letsencrypt.status.io/ shows a list of their | servers, which look to be spread across three data centers | (one "public", two "high availability"). | kbenson wrote: | Do we know if it shows cold spares? That's all I think is | needed at a minimum to avoid the problems I'm talking | about, and I doubt they would note those if they don't | necessarily have a hostname. | kbenson wrote: | Do they actually say they don't have a slave of that database | ready to take over? I seriously doubt Let's Encrypt has no | spare. | | Note I didn't say you shouldn't run one service (as in | daemon) or set of services from one box, just that one box is | not enough and you need that spare. | | If Let's Encrypt actually has no spare for their database | server and they're one hardware failure away from being down | for what may be a large chunk of time (I highly doubt it), | then I wouldn't want to use them even if free. Thankfully, I | doubt your interpretation of what that article is saying. | vitro wrote: | You're right, from the article: | | > The new AMD EPYC CPUs sit at about 25%. You can see in | this graph where we promoted the new database server from | replica (read-only) to primary (read/write) on September | 15. | kubb wrote: | As per usual, don't copy Google if you don't have the same | requirements. Google Search never goes down. HN goes down from | time to time and nobody minds. Google serves tens (hundreds?) of | thousands of queries per second. HN serves ten. HN is fine with | one server because it's small. How big is your service going to | be? Do that boring math :) | FartyMcFarter wrote: | Even Google search has gone down apparently, for five minutes | in 2013: | | https://www.cnet.com/tech/services-and-software/google-goes-... | terafo wrote: | There were huge availability issues as recently as December | 14th 2020, for 45 minutes. | roflyear wrote: | Correct. I like to ask "how much money do we lose if the site | goes down for 1hr? a day?" etc., and plan around that. If you | are losing $1M an hour, or $50M if it goes down for a day, hell | yeah you should spend a few million on making sure your site | stays online! | | But, it is amazing how often c-levels cannot answer this | question!
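roflyear's question above ("how much money do we lose if the site goes down for 1hr? a day?") turns into trivial arithmetic once someone supplies the numbers; a sketch with figures that are made up purely for illustration:

    // Hypothetical figures - the point is that the math is easy once the
    // business states its downtime cost and availability target.
    const revenueLostPerHourDown = 10_000; // $
    const hoursPerYear = 24 * 365;

    const expectedAnnualDowntimeCost = (availability: number) =>
      hoursPerYear * (1 - availability) * revenueLostPerHourDown;

    // Going from "two nines" to "three nines" is worth at most this much per
    // year - a rough ceiling on what the extra redundancy should cost.
    const saving =
      expectedAnnualDowntimeCost(0.99) - expectedAnnualDowntimeCost(0.999);
    console.log(`~$${Math.round(saving).toLocaleString()} per year`); // ~$788,400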
| _nhh wrote: | I agree | rbanffy wrote: | I wouldn't recommend one, but at least two, for redundancy. | londons_explore wrote: | Don't be scared of 'one big server' for reliability. I'd bet that | if you hired a big server today in a datacenter, the hardware | will have more uptime than something cloud-native with az- | failover hosted on AWS. | | Just make sure you have a tested 30 minute restoration plan in | case of permanent hardware failure. You'll probably only use it | once every 50 years on average, but it will be an expensive event | when it happens. | cpursley wrote: | You've got features to ship. Stick your stuff on Render.com and | don't think about it again. Even a dummy like me can manage that. | alexpotato wrote: | My favorite summary of why not to use microservices is from Grug: | | "grug wonder why big brain take hardest problem, factoring system | correctly, and introduce network call too | | seem very confusing to grug" | | https://grugbrain.dev/#grug-on-microservices | fleddr wrote: | Our industry summarized: | | Hardware engineers are pushing the absolute physical limits of | getting state (memory/storage) as close as possible to compute. A | monumental accomplishment as impactful as the invention of | agriculture and the industrial revolution. | | Software engineers: let's completely undo all that engineering by | moving everything apart as far as possible. Hmmm, still too fast. | Let's next add virtualization and software stacks with shitty | abstractions. | | Fast and powerful browser? Let's completely ignore 20 years of | performance engineering and reinvent...rendering. Hmm, sucks a | bit. Let's add back server rendering. Wait, now we have to render | twice. Ah well, let's just call it a "best practice". | | The mouse that I'm using right now (an expensive one) has a 2GB | desktop Electron app that seems to want to update itself twice a | week. | | The state of us, the absolute garbage that we put out, and the | creative ways in which we try to justify it. It's like a mind | virus. | | I want my downvotes now. | GuB-42 wrote: | Actually, for those who push for these cloudy solutions, they | do that in part to make data close to you. I am talking mostly | about CDNs, I don't thing YouTube and Netflix would have been | possible without them. | | Google is a US company, but you don't want people in Australia | to connect to the other side of the globe every time they need | to access Google services, it would be an awful waste of | intercontinental bandwidth. Instead, Google has data centers in | Australia to serve people in Australia, and they only hit US | servers when absolutely needed. And that's when you need to | abstract things out. If something becomes relevant in | Australia, move it in there, and move it out when it no longer | matters. When something big happens, copy it everywhere, and | replace the copies by something else as interest wanes. | | Big companies need to split everything, they can't centralize | because the world isn't centralized. The problem is when small | businesses try to do the same because "if Google is so | successful doing that, it must be right". Scale matters. | Foomf wrote: | You've more or less described Wirth's Law: | https://en.wikipedia.org/wiki/Wirth%27s_law | fleddr wrote: | I had no idea, thanks. Consider this a broken clock being | sometimes right. | kkielhofner wrote: | Great article overall with many good points worth considering. 
| Nothing is one size fits all, so I won't get into the crux of the | article: "just get one big server". I recently posted a comment | breaking down the math for my situation: | | https://news.ycombinator.com/item?id=32250470#32253635 | | For the most "extreme" option of buying your own $40k server from | Dell, I'm always surprised at how many people don't consider | leasing. No matter what, it breaks the cost into an operating | expense vs a capital one, which is on par with the other options | in terms of accounting and doesn't require laying out $40k. | | Adding on to that, in the US we have some absolutely wild tax | advantages for large "capital expenditures" that also apply to | leasing: | | https://www.section179.org/section_179_leases/ | phendrenad2 wrote: | The problem with "one big server" is, you really need good | IT/ops/sysadmin people who can think in non-cloud terms. (If you | catch them installing docker on it, throw them into a lava pit | immediately). | henry700 wrote: | What's the problem with installing Docker so you can run | containers of different distros, languages & flavors using the | same one big server though? | londons_explore wrote: | One-big-VM is another approach... | | A big benefit is that some providers will let you resize the VM | to a bigger size as you grow. The behind-the-scenes implementation | is that they migrate your VM to another machine with near-zero | downtime. Pretty cool tech, and it takes away a big disadvantage | of bare metal, which is growth pains. | lrvick wrote: | A consequence of one-big-server is decreased security. You become | discouraged from applying patches because you must reboot. Also, | if one part of the system is compromised, every service is now | compromised. | | Microservices on distinct systems offer damage control. | jvanderbot wrote: | No thanks. I have a few hobby sites, a personal vanity page, and | some basic CPU-expensive services that I use. | | Moving to AWS serverless has saved me so much headache with | system updates, certificate management, archival and backup, | networking, and so much more. Not to mention with my | low-but-spiky load, my breakeven is a long way off. | SassyGrapefruit wrote: | > Use the Cloud, but don't be too Cloudy | | The number of applications I have inherited that were messes | falling apart at the seams because of misguided attempts to avoid | "vendor lock-in" with the cloud cannot be overstated. There is | something I find ironic about people paying to use a platform but | not using it because they feel like using it too much will make | them feel compelled to stay there. It's basically starving | yourself so you don't get too familiar with eating regularly. | | Kids, this PSA is for you. Auto Scaling Groups are just fine, as | are all the other "Cloud Native" services. Most business partners | will tell you a dollar of growth is worth 5x-10x the value of a | dollar of savings. Building a huge tall computer will be cheaper, | but if it isn't 10x cheaper (and that is total cost of ownership, | not the cost of the metal) and you are moving more slowly than | you otherwise would, it's almost a certainty you are leaving money | on the table. | meeks wrote: | The whole argument comes down to bursty vs. non-bursty workloads. | What types of workloads make up the fat part of the distribution? | If most use cases are bursty (which I would argue they are) then | the author's argument only applies to specific applications. | Therefore, most people do indeed see cost benefits from the | cloud.
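meeks's bursty-versus-steady framing above reduces to a back-of-the-envelope comparison. A sketch with hypothetical prices (a flat-rate big server sized for peak versus cloud capacity billed by the hour and scaled to average load plus headroom):

    // Hypothetical prices - substitute real quotes for a real decision.
    const bigServerPerMonth = 1_300; // flat rate, sized for peak load
    const cloudUnitPerHour = 0.05;   // $ per unit of capacity per hour
    const hoursPerMonth = 730;

    // Autoscaled cloud cost: pay roughly for average load plus some headroom.
    const cloudCostPerMonth = (avgUnits: number, headroom = 1.3) =>
      avgUnits * headroom * cloudUnitPerHour * hoursPerMonth;

    // Steady workload (average close to peak): the big server wins easily.
    console.log("steady:", cloudCostPerMonth(80).toFixed(0), "vs", bigServerPerMonth); // ~3796 vs 1300
    // Bursty workload (short peaks, low average): autoscaling wins.
    console.log("bursty:", cloudCostPerMonth(10).toFixed(0), "vs", bigServerPerMonth); // ~475 vs 1300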
| galkk wrote: | One of the first experiences in my professional career was a | situation where the "one big server" serving the system that was | making money actually failed on a Friday, and HP's warranty was | something like 1 or 2 business days to get a replacement. | | The entire situation ended up in a conference call with | multiple department directors who were deciding which server from | other systems to cannibalize (even if it was underpowered) to get | the system going. | | Since that time I'm quite skeptical about "one", and to me this | is one of the big benefits of cloud providers: most likely, there | is another instance available, and stockouts are rarer. | jmull wrote: | The article is really talking about one big server plus a | backup vs. cloud providers. | mochomocha wrote: | > Why Should I Pay for Peak Load? [...] someone in that supply | chain is charging you based on their peak load | | Oh it's even worse than that: this someone oversubscribes your | hardware a little during your peak and a lot during your trough, | padding their great margins at the expense of extra cache | misses/perf degradation of your software that most of the time | you won't notice if they do their job well. | | This is one of the reasons why large companies such as my | employer (Netflix) are able to invest in their own compute | platforms to reclaim some of these gains, so that any | oversubscription & colocation gains materialize into a lower | cloud bill - instead of having your spare CPU cycles be funneled | to a random co-tenant customer of your cloud provider, the latter | capturing the extra value. | robertlagrant wrote: | This is why I like Cloudflare's worker model. It feels like the | usefulness of cloud deployments, but with a pretty restrained | pricing model. | system2 wrote: | It blows my mind that people are spending $2000+ per month for a | server they can get used for a one-time cost of $4000-5000. | | VMware + Synology Business Backup + Synology C2 backup is our way | of doing business and it has never failed us in over 7 years. Why | do people spend so much money on cloud when they can host it | themselves for less than 5% of the cost? (2-year usage assumed). | adlpz wrote: | I've tried it all except this, including renting bare metal. | Nowadays I'm in the cloud-but-not-_cloudy_ camp. Still, I'm | intrigued. | | Apart from the $4-5k server, what are your running costs? | Licenses? Colocation? Network? | vgeek wrote: | https://www.he.net/colocation.html | | They have been around forever and their $400 deal is good, | but that is for 42U, 1G and only 15 amps. With beefier | servers, you will need more than that (both bandwidth and | amperage) if you intend to fill the rack. | soruly wrote: | That's why Let's Encrypt uses a single database on a powerful | server: https://letsencrypt.org/2021/01/21/next-gen-database-servers... | wahnfrieden wrote: | I've started augmenting one big server with iCloud (CloudKit) | storage, specifically syncing local Realm DBs to the user's own | iCloud storage. This means I can avoid taking custody of | PII/problematic data, can include non-custodial privacy in | product value/marketing, and can charge enough of a | premium for the one big server to keep it affordable. I know how | to scale servers in and out, so I feel the value of avoiding all | that complexity.
This is a business approach that leans into | that, with a way to keep the business growing with domain | complexity/scope/adoption (iCloud storage, probably other good | APIs like this to work with along similar lines). | dugmartin wrote: | I think Elixir/Erlang is uniquely positioned to get more traction | in the inevitable microservice/kubernetes backlash and the return | to single server deploys (with a hot backup). Not only does it | usually sip server resources but it also scales naturally as more | cores/threads are available on a server. | lliamander wrote: | Going _from_ an Erlang "monolith" to a java/k8s cluster, I was | amazed at how much more work it takes to build a "modern" | microservice. Erlang still feels like the future to me. | dougmoscrop wrote: | Can you imagine if even a fraction of the effort poured into | k8s tooling had gone into the Erlang/OTP ecosystem instead? | dboreham wrote: | This is the norm. It's only weird things like Node.js and Ruby | that don't have this property. | hunterloftis wrote: | While individual Node.js processes are single-threaded, | Node.js includes a standard API that distributes its load | across multiple processes, and therefore cores. | | - https://nodejs.org/api/cluster.html#cluster | throwaway787544 wrote: | I have been doing this for two decades. Let me tell you about | bare metal. | | Back in the day we had 1,000 physical servers to run a large-scale | web app. 90% of that capacity was used only for two months. | So we had to buy 900 servers just to make most of our money over | two events in two seasons. | | We also had to have 900 servers because even one beefy machine | has bandwidth and latency limits. Your network switch simply | can't pump more than a set amount of traffic through its | backplane or your NICs, and the OS may have piss-poor packet | performance too. Lots of smaller machines allow easier scaling of | network load. | | But you can't just buy 900 servers. You always need more | capacity, so you have to predict what your peak load will be, and | buy for that. And you have to do it well in advance because it | takes a long time to build and ship 900 servers and then assemble | them, run burn-in, replace the duds, and prep the OS, firmware, | software. And you have to do this every 3 years (minimum) because | old hardware gets obsolete and slow, hardware dies, disks die, | support contracts expire. But not all at once, because who knows | what logistics problems you'd run into and possibly not get all | the machines in time to make your projected peak load. | | If back then you told me I could turn on 900 servers for 1 month | and then turn them off, no planning, no 3-year capital outlay, no | assembly, burn-in, software configuration, hardware repair, etc., | I'd call you crazy. Hosting providers existed but _nobody_ | could just give you 900 servers in an hour, _nobody_ had that | capacity. | | And by the way: cloud prices are _retail prices_. Get on a | savings plan or reserve some instances and the cost can be half. | Spot instances are a quarter of the price or less. Serverless is | pennies on the dollar with no management overhead. | | If you don't want to learn new things, buy one big server. I just | pray it doesn't go down for you, as it can take up to several | days for some cloud vendors to get some hardware classes in some | regions. And I pray you were doing daily disk snapshots, and can | get your dead disks replaced quickly.
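For reference, the Node.js cluster API hunterloftis mentions further up (one worker process per core sharing a single listening port) looks roughly like this; a minimal sketch in TypeScript, with the port number arbitrary:

    import cluster from "node:cluster";
    import { createServer } from "node:http";
    import { cpus } from "node:os";

    if (cluster.isPrimary) {
      // Fork one worker per core; the primary process only supervises.
      for (let i = 0; i < cpus().length; i++) cluster.fork();
      cluster.on("exit", () => cluster.fork()); // replace crashed workers
    } else {
      // Every worker listens on the same port; connections are distributed
      // across the workers, so all cores are used.
      createServer((_req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      }).listen(8080);
    }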
| MrStonedOne wrote: | I handled an 8x increase in traffic to my website from a | youtuber reviewing our game by increasing the cache timer and | fixing the wiki so it stopped creating session table entries for | logged-out users (on a wiki that required accounts to edit it). | | We were already getting multiple millions of page hits a month | before this happened. | | This server had 8 cores, but 5 of them were reserved for the | game servers (10 TB a month in bandwidth) running on the same | machine. | | If you needed 1,000 physical computers to run your webapp, you | fucked up somewhere along the line. | toast0 wrote: | > I have been doing this for two decades. Let me tell you about | bare metal. | | > Back in the day we had 1,000 physical servers to run a | large-scale web app. 90% of that capacity was used only for two | months. So we had to buy 900 servers just to make most of our | money over two events in two seasons. | | > We also had to have 900 servers because even one beefy | machine has bandwidth and latency limits. Your network switch | simply can't pump more than a set amount of traffic through its | backplane or your NICs, and the OS may have piss-poor packet | performance too. Lots of smaller machines allow easier scaling | of network load. | | I started working with real (bare metal) servers on real | internet loads in 2004 and retired in 2019. While there's truth | here, there's also missing information. In 2004, all my servers | had 100M ethernet, but in 2019, all my new servers had 4x10G | ethernet (2x public, 2x private); actually some of them had 6x, | but with 2x unconnected, I dunno why. In the meantime, CPUs, | NICs, and operating systems have improved such that if you're | not getting line rate for full MTU packets, it's probably | because your application uses a lot of CPU, or you've hit a | pathological case in the OS (which happens, but if you're | running 1000 servers, you've probably got someone to debug | that). | | If you still need 1000 beefy 10G servers, you've got a pretty | formidable load, but splitting it up into many more smaller | servers is asking for problems of different kinds. Otoh, if | your load really scales to 10x for a month, and you're at that | scale, cloud economics are going to work for you. | | My seasonal loads were maybe 50% more than normal, but usage | trends (and development trends) meant that the seasonal peak | would become the new normal soon enough; cloud managing the | peaks would help a bit, but buying for the peak and keeping it | running for the growth was fine. Daily peaks were maybe 2-3x | the off-peak usage, 5 or 6 days a week; a tightly managed cloud | provisioning could reduce costs here, but probably not enough | to compete with having bare metal for the full day. | taylodl wrote: | That's a good point about cloud services being retail. My | company gets a very large discount from one of the most | well-known cloud providers. This is available to everybody - | typically if you commit to 12 months of a minimum usage then | you can get substantial discounts. What I know is that so far, | everything we've migrated to the cloud has resulted in | _significantly_ reduced total costs, increased reliability, | improved scalability, and is easier to enhance and remediate. | Faster, cheaper, better - that's been a huge win for us! | fleddr wrote: | The entire point of the article is that your dated example no | longer applies: you can fit the vast majority of common loads | on a single server now; they are that powerful.
| | Redundancy concerns are also addressed in the article. | PaulDavisThe1st wrote: | > If you don't want to learn new things, buy one big server. I | just pray it doesn't go down for you | | There's intermediate ground here. Rent one big server, reserved | instance. Cloudy in the sense that you get the benefits of the | cloud provider's infrastructure skills and experience, and | uptime, plus easy backup provisioning; non-cloudy in that you | can just treat that one server instance like your own hardware, | running (more or less) your own preferred OS/distro, with | "traditional" services running on it (e.g. in our case: nginx, | gitea, discourse, mantis, ssh) | yardie wrote: | Let me take you back to March, 2020. When millions of Americans | woke up to find out there was a pandemic and they would be | working from home now. Not a problem, I'll just call up our | cloud provider and request more cloud compute. You join a queue | of a thousand other customers calling in that morning for the | exact same thing. A few hours on hold and the CSR tells you | they aren't provisioning anymore compute resources. east-us is | tapped out, central-europe tapped out hours ago, California got | a clue and they already called to reserve so you can't have | that either. | | I use cloud all the time but there are also blackswan events | where your IaaS can't do anymore for you. | tempnow987 wrote: | I never had this problem on AWS though I did see some | startups struggle with some more specialized instances. Are | midsize companies actually running into issues with non- | specialized compute on AWS? | kardianos wrote: | That sounds like you have burst load. Per the article, cloud | away, great fit. | | The point was most people don't have that and even their bursts | can fit in a single server. This is my experience as well. | maxbond wrote: | The thing that confuses me is, isn't every publicly | accessible service bursty on a long timescale? Everything | looks seasonal and predictable until you hit the front page | of Reddit, and you don't know what day that will be. You | don't decide how much traffic you get, the world does. | genousti wrote: | Funily hitting reddit front page might ruin you if you run | on aws | NorwegianDude wrote: | Hitting the front page of reddit is insignificant, it's not | like you'll get anywhere near thousands upon thousands of | requests each second. If you have a somewhat normal website | and you're not doing something weird then it's easily | handled with a single low-end server. | | If I get so much traffic that scaling becomes a problem | then I'll be happy as I would make a ton of money. No need | to build to be able to handle the whole world at the same | time, that's just a waste of money in nearly all | situations. | taylodl wrote: | If you're hosting on-prem then you have a cluster to configure | and manage, you have multiple data centers you need to provision, | you need data backups you have to manage plus the storage | required for all those backups. Data centers also require power, | cooling, real estate taxes, administration - and you need at | least two of them to handle systemic outages. Now you have to | manage and coordinate your data between those data centers. None | of this is impossible of course, companies have been doing this | everyday for decades now. But let's not pretend it doesn't all | have a cost - and unless your business is running a data center, | none of these costs are aligned with your business' core mission. 
| | If you're running a start-up, it's pretty much a no-brainer that | you're going to start off in the cloud. | | What's the real criterion for evaluating on-prem versus the cloud? | Load consistency. As the article notes, serverless cloud | architectures are perfect for bursty loads. If your traffic is | highly variable then the ability to quickly scale up and then | scale down will be of benefit to you - and there's a lot of | complexity you don't have to manage to boot! Generally speaking, | such a solution is going to be cheaper and easier to configure | and manage. That's a win-win! | | If your load isn't as variable and you therefore have cloud | resources always running, then it's almost always cheaper to host | those applications on-prem - assuming you have on-prem hosting | available to you. As I noted above, building data centers isn't | cheap and it's almost always cheaper to stay in the cloud than it | is to build a new data center, but if you already have data | center(s) then your calculus is different. | | Another thing to keep in mind at the moment is that even if you | decide to deploy on-prem, you may not be able to get the hardware | you need. A colleague of mine is working on a large project that's | to be hosted on-prem. It's going to take 6-12 months to get all | the required hardware. Even prior to the pandemic the backlog was | 3-6 months because the major cloud providers are consuming all the | hardware. Vendors would rather deal with buyers buying hardware | by the tens of thousands than a shop buying a few dozen servers. | You might even find your hardware delivery date getting pushed | out as the "big guys" get their orders filled. It happens. | ozim wrote: | You know you can run a server in the cellar under your stairs. | | You know that if you are a startup you can just keep servers in | a closet and hope that no one turns on the coffee machine while | the aircon runs, because that will pop the circuit breakers and | take down your server - or maybe you have a UPS at least, so | maybe not :) | | I have read horror stories about companies having such setups. | | While they don't need multiple data centers - power, cooling and | redundancy sound to them like some kind of STD - getting a | cheap VPS should be the default for such people. That is a win as | well. | nostrebored wrote: | As someone who's worked in cloud sales and no longer has any skin | in the game, I've seen firsthand how cloud native architectures | improve developer velocity, offer enhanced reliability and | availability, and actually decrease lock-in over time. | | Every customer I worked with who had one of these huge servers | introduced coupling and state in some unpleasant way. They were | locked in to persisted state, and couldn't scale out to handle | variable load even if they wanted to. Beyond that, hardware | utilization became contentious at any mid-enterprise scale. | Everyone views the resource pool as theirs, and organizational | initiatives often push people towards consuming the same types of | resources. | | When it came time to scale out or do international expansion, | every single one of my customers who had adopted this strategy | had assumptions baked into their access patterns that made sense | given their single server. When it came time to store some part | of the state in a way that made sense for geographically | distributed consumers, it was months, not sprints, of time spent | figuring out how to hammer this into a model that's | fundamentally at odds.
| | From a reliability and availability standpoint, I'd often see | customers tell me that 'we're highly available within a single | data center' or 'we're split across X data centers' without | considering the shared failure modes that each of these data | centers had. Would a fiber outage knock out both of your DCs? | Would a natural disaster likely knock something over? How about | _power grids_? People often don't realize the failure modes | they've already accepted. | | This is obviously not true for every workload. It's tech, there | are tradeoffs you're making. But I would strongly caution any | company that expects large growth against sitting on a | single-server model for very long. | secabeen wrote: | The common element in the above is scaling and reliability. | While lots of startups and companies are focused on the 1% | chance that they are the next Google or Shopify, the reality is | that nearly all aren't, and the overengineering and | redundancy-first model that cloud pushes does cost them a lot of | runway. | | It's even less useful for large companies; there is no world in | which Kellogg is going to increase sales by 100x, or even 10x. | nostrebored wrote: | But most companies aren't startups. Many companies are | established, growing businesses with a need to be able to | easily implement new initiatives and products. | | The benefits of cloud for LEs (large enterprises) are completely | different. I'm happy to break down why, but I addressed the SMB | and mid-enterprise space here because most large enterprises | already know they shouldn't run on a single rack. | secabeen wrote: | > I addressed the SMB and mid-enterprise space here because | most large enterprises already know they shouldn't run on a | single rack. | | This is a straw man. No one, anywhere in this thread or in | the OP's original article, proposed a single-rack solution. | | From the OP: > Running a primary and a backup server is | usually enough, keeping them in different datacenters. | nostrebored wrote: | This is just a complete lack of engagement with the post. | Most LEs know they shouldn't run a two-rack setup | either. That is not the size or layout of any LE that | I've interacted with. The closest is a bank in the | developing world that had a few racks split across data | centers in the same city and was desperately trying to | move away given power instability in the country. | tboyd47 wrote: | Could confirmation bias affect your analysis at all? | | How many companies went cloud-first and then ran out of money? | You wouldn't necessarily know anything about them. | | Were the scaling problems your single-server customers called | you to solve unpleasant enough to put their core business in | danger? Or was the expense just a rounding error for them? | nostrebored wrote: | From this and the other comment, it looks like I wasn't clear | that I was talking about SMB/ME rather than a seed/pre-seed | startup, which I understand can be confusing given that we're | on HN. | | I can tell you that I've never seen a company run out of | money from going cloud-first (sample size of over 200 that I | worked with directly). I did see multiple businesses scale | down their consumption to near-zero and ride out the | pandemic. | | The answer to scaling problems being unpleasant enough to put | the business in danger is yes, but that was also during the | pandemic when companies needed to make pivots to slightly | different markets.
Doing this was often unaffordable from an | implementation cost perspective at the time when it had to | happen. I've seen acquisitions fall through due to an | inability to meet technical requirements because of stateful | monstrosities. I've also seen top-line revenue get severely | impacted when resource contention causes outages. | | The only times I've seen 'cloud-native' truly backfire were | when companies didn't have the technical experience to move | forward with these initiatives in-house. There are a lot of | partners in the cloud implementation ecosystem who will | fleece you for everything you have. One such example was a | k8s microservices shop with a single contract developer | managing the infra and a partner doing the heavy lifting. The | partner gave them the spiel on how cloud-native provides | flexibility and allows for reduced opex and the customer was | very into it. They stored images in a RDBMS. Their database | costs were almost 10% of the company's operating expenses by | the time the customer noticed that something was wrong. | stevenjgarner wrote: | If you are not maxing out or even getting above 50% utilization | of _128 physical cores (256 threads), 512 GB of memory, and 50 | Gbps of bandwidth for $1,318 /month_, I really like the approach | of multiple low-end consumable computers as servers. I have been | using arrays of Intel NUCs at some customer sites for years with | considerable cost savings over cloud offerings. Keep an extra | redundant one in the array ready to swap out a failure. | | Another often overlooked option is that in several fly-over | states it is quite easy and cheap to register as a public | telecommunication utility. This allows you to place a powered | pedestal in the public right-of-way, where you can get situated | adjacent to an optical meet point and get considerable savings on | installation costs of optical Internet, even from a tier 1 | provider. If your server bandwidth is peak utilized during | business hours and there is an apartment complex nearby you can | use that utility designation and competitively provide | residential Internet service to offset costs. | warmwaffles wrote: | > I have been using arrays of Intel NUCs at some customer sites | for years | | Stares at the 3 NUCs on my desk waiting to be clustered for a | local sandbox. | titzer wrote: | This is pretty devious and I love it. | tzs wrote: | I don't understand the pedestal approach. Do you put your | server in the pedestal, so the pedestal is in effect your data | center? | saulrh wrote: | > competitively provide residential > Internet service to | offset costs. | | I uh. Providing residential Internet for an apartment complex | feels like an entire business in and of itself and wildly out | of scope for a small business? That's a whole extra competency | and a major customer support commitment. Is there something I'm | missing here? | stevenjgarner wrote: | It depends on the scale - it does not have to be a major | undertaking. You are right, it is _a whole extra competency | and a major customer support commitment_ , but for a lot of | the entrepreneurial folk on HN quite a rewarding and | accessible learning experience. | | The first time I did anything like this was in late 1984 in a | small town in Iowa where GTE was the local telecommunication | utility. Absolutely abysmal Internet service, nothing | broadband from them at the time or from the MSO (Mediacom). I | found out there was a statewide optical provider with cable | going through the town. 
I incorporated an LLC, became a | utility and built out less than 2 miles of single mode fiber | to interconnect some of my original software business | customers at first. Our internal moto was "how hard can it | be?" (more as a rebuke to GTE). We found out. The whole 24x7 | public utility thing was very difficult for just a couple of | guys. But it grew from there. I left after about 20 years and | today it is a thriving provider. | | Technology has made the whole process so much easier today. I | am amazed more people do not do it. You can get a small rack- | mount sheet metal pedestal with an AC power meter and an HVAC | unit for under $2k. Being a utility will allow you to place | that on a concrete pad or vault in the utility corridor | (often without any monthly fee from the city or county). You | place a few bollards around it so no one drives into it. You | want to get quotes from some tier 1 providers [0]. They will | help you identify the best locations to engineer an optical | meet and those are the locations you run by the | city/county/state utilities board or commission. | | For a network engineer wanting to implement a fault tolerant | network, you can place multiple pedestals at different | locations on your provider's/peer's network to create a route | diversified protected network. | | After all, when you are buying expensive cloud based services | that literally is all your cloud provider is doing ... just | on a completely more massive scale. The barrier to entry is | not as high as you might think. You have technology offerings | like OpenStack [1], where multiple competitive vendors will | also help you engineer a solution. The government also | provides (financial) support [2]. | | The best perk is the number of parking spaces the requisite | orange utility traffic cone opens up for you. | | [0] https://en.wikipedia.org/wiki/Tier_1_network | | [1] https://www.openstack.org/ | | [2] https://www.usda.gov/reconnect | MockObject wrote: | In 1984, I am guessing the only use case for broadband | internet was running an NNTP server? | marktangotango wrote: | This is some old school stuff right here. I have a hard | time believing this sort of gumption and moxy are as | prevalent today. | | > The best perk is the number of parking spaces the | requisite orange utility traffic cone opens up for you. | | That's hilarious. | bombcar wrote: | You're missing "apartment complex" - you as the service | provider contract with the apartment management company to | basically cover your costs, and they handle the day-to-day | along with running the apartment building. | | Done right, it'll be cheaper for them (they can advertise | "high speed internet included!" or whatever) and you won't | have much to do assuming everything on your end just works. | | The days where small ISPs provided things like email, web | hosting, etc, are long gone; you're just providing a DHCP IP | and potentially not even that if you roll out carrier-grade | NAT. | erichocean wrote: | > _it is quite easy and cheap to register as a public | telecommunication utility_ | | Is North Carolina one of those states? I'm intrigued... | stevenjgarner wrote: | I have only done a few midwestern states. Call them and ask | [0] - (919) 733-7328. You may want to first call your | proposed county commissioner's office or city hall (if you | are not rural), and ask them who to talk with about a new | local business providing Internet service. 
If you can show | the Utilities Commission that you are working with someone at | the local level I have found they will treat you more | seriously. In certain rural counties, you can even qualify | for funding from the Rural Utilities Service of the USDA. | | [0] https://www.ncuc.net/ | | EDIT: typos + also most states distinguish between | facilities-based ISP's (ie with physical plant in the | regulated public right-of-way) and other ISPs. Tell them you | are looking to become a facilities-based ISP. | xen2xen1 wrote: | What other benefits are there to being a "public | telecommunication utility"? | stevenjgarner wrote: | The benefit that is obvious to the regulators is that you | can charge money for services. So for example, offering | telephone services requires being a LEC (local exchange | carrier) or CLEC (competitive local exchange carrier). | But even telephone services have become considerably | unregulated through VoIP. It's just that at some point, | the VoIP has to terminate/interface with a (C)LEC | offering real dial tone and telephone numbering. You can | put in your own Asterisk server [0] and provide VoIP | service on your burgeoning optical utilities network, | together with other bundled services including | television, movies, gaming, metering etc.. All of these | offerings can be resold from wholesale services, where | all you need is an Internet feed. | | Other benefits to being a "public telecommunication | utility" include the competitive right to place your own | facilities on telephone/power poles or underground in | public right-of-way under the Telecommunications Act of | 1996. You will need to enter into and pay for a pole | attachment agreement. Of course local governments can | reserve the right to tariff your facilities, which has | its own ugliness. | | One potentially valuable thing a utility can do is place | empty conduit in public right of way that can be | used/resold in the future at a (considerable) gain. For | example, before highways, roadways, airports and other | infrastructure is built, it is orders of magnitude | cheaper just to plow conduit under bare ground before the | improvements are placed. | | [0] https://www.asterisk.org/ | eek2121 wrote: | > Other benefits to being a "public telecommunication | utility" include the competitive right to place your own | facilities on telephone/power poles or underground in | public right-of-way under the Telecommunications Act of | 1996. You will need to enter into and pay for a pole | attachment agreement. Of course local governments can | reserve the right to tariff your facilities, which has | its own ugliness. | | Note that in many parts of the country, the | telcos/cablecos themselves own the poles. Google had a | ton of trouble with AT&T in my state thanks to this. They | lost to AT&T in court and gave up. | count wrote: | While VOIP is mostly unregulated, be acutely aware of | e-911 laws and requirements. This isn't the Wild West | shitshow it was in 2003 when I was doing similar things | :) | | https://www.intrado.com/life-safety/e911-regulations has | a good overview and links to applicable CFR/rules. | erichocean wrote: | Thanks! | stevenjgarner wrote: | Feel free to reach out at my gmail [0] | | [0] https://news.ycombinator.com/user?id=stevenjgarner | cfors wrote: | Yep, there's a premium on making your architecture more cloudy. | However, the best point for Use One Big Server is not necessarily | running your big monolithic API server, but your database. | | Use One Big Database. 
| | Seriously. If you are a backend engineer, nothing is worse than | breaking up your data into self contained service databases, | where everything is passed over Rest/RPC. Your product asks will | consistently want to combine these data sources (they don't know | how your distributed databases look, and oftentimes they really | do not care). | | It is so much easier to do these joins efficiently in a single | database than fanning out RPC calls to multiple different | databases, not to mention dealing with inconsistencies, lack of | atomicity, etc. etc. Spin up a specific reader of that database | if there needs to be OLAP queries, or use a message bus. But keep | your OLTP data within one database for as long as possible. | | You can break apart a stateless microservice, but there are few | things as stagnant in the world of software than data. It will | keep you nimble for new product features. The boxes that they | offer on cloud vendors today for managed databases are giant! | s_dev wrote: | >Use One Big Database. | | It may be reasonable to have two databases e.g. a class a and | class b for pci compliance. So context still deeply matters. | | Also having a dev DB with mock data and a live DB with real | data is a common setup in many companies. | belak wrote: | This is absolutely true - when I was at Bitbucket (ages ago at | this point) and we were having issues with our DB server | (mostly due to scaling), almost everyone we talked to said "buy | a bigger box until you can't any more" because of how complex | (and indirectly expensive) the alternatives are - sharding and | microservices both have a ton more failure points than a single | large box. | | I'm sure they eventually moved off that single primary box, but | for many years Bitbucket was run off 1 primary in each | datacenter (with a failover), and a few read-only copies. If | you're getting to the point where one database isn't enough, | you're either doing something pretty weird, are working on a | specific problem which needs a more complicated setup, or have | grown to the point where investing in a microservice | architecture starts to make sense. | thayne wrote: | One issue I've seen with this is that if you have a single, | very large database, it can take a very, very long time to | restore from backups. Or for that matter just taking backups. | | I'd be interested to know if anyone has a good solution for | that. | rszorness wrote: | Try out pg_probackup. It works on database files directly. | Restore is as fast as you can write on your ssd. | | I've setup a pgsql server with timescaledb recently. | Continuing backup based on WAL takes seconds each hour and | a complete restore takes 15 minutes for almost 300 GB of | data because the 1 GBit connection to the backup server is | the bottleneck. | Svenstaro wrote: | I found this approach pretty cool in that regard: | https://github.com/pgbackrest/pgbackrest | dsr_ wrote: | Here's the way it works for, say, Postgresql: | | - you rsync or zfs send the database files from machine A | to machine B. You would like the database to be off during | this process, which will make it consistent. The big | advantage of ZFS is that you can stop PG, snapshot the | filesystem, and turn PG on again immediately, then send the | snapshot. Machine B is now a cold backup replica of A. Your | loss potential is limited to the time between backups. | | - after the previous step is completed, you arrange for | machine A to send WAL files to machine B. It's well | documented. 
You could use rsync or scp here. It happens | automatically and frequently. Machine B is now a warm | replica of A -- if you need to turn it on in an emergency, | you will only have lost one WAL file's worth of changes. | | - after that step is completed, you give machine B | credentials to login to A for live replication. Machine B | is now a live, very slightly delayed read-only replica of | A. Anything that A processes will be updated on B as soon | as it is received. | | You can go further and arrange to load balance requests | between read-only replicas, while sending the write | requests to the primary; you can look at Citus (now open | source) to add multi-primary clustering. | hamandcheese wrote: | Do you even have to stop Postgres if using ZFS snapshots? | ZFS snapshots are atomic, so I'd expect that to be fine. | If it wasn't fine, that would also mean Postgres couldn't | handle power failure or other sudden failures. | dsr_ wrote: | You have choices. | | * shut down PG. Gain perfect consistency. | | * use pg_dump. Perfect consistency at the cost of a | longer transaction. Gain portability for major version | upgrades. | | * Don't shut down PG: here's what the manual says: | | However, a backup created in this way saves the database | files in a state as if the database server was not | properly shut down; therefore, when you start the | database server on the backed-up data, it will think the | previous server instance crashed and will replay the WAL | log. This is not a problem; just be aware of it (and be | sure to include the WAL files in your backup). You can | perform a CHECKPOINT before taking the snapshot to reduce | recovery time. | | * Midway: use SELECT pg_start_backup('label', false, | false); and SELECT * FROM pg_stop_backup(false, true); to | generate WAL files while you are running the backup, and | add those to your backup. | mgiampapa wrote: | This isn't really a backup, it's redundancy, which is a good | thing but not the same as a backup solution. You can't | get out of a drop-table-in-production type event this way. | hamandcheese wrote: | If you stop at the first bullet point then you have a | backup solution. | dsr_ wrote: | Precisely so. | thayne wrote: | It doesn't solve the problem that sending that snapshot | to a backup location takes a long time. | maxclark wrote: | Going back 20 years with Oracle DB it was common to use | "triple mirror" on storage to make a block level copy of | the database. Lock the DB for changes, flush the logs, | break the mirror. You now have a point in time copy of | the database that could be mounted by a second system to | create a tape backup, or as a recovery point to restore. | | It was the way to do it, and very easy to manage. | Twisell wrote: | The previous commenter was probably unaware of the | various ways to back up recent PostgreSQL releases. | | For what you describe, a "point in time recovery" backup | would probably be the more appropriate flavor: | https://www.postgresql.org/docs/current/continuous- | archiving... | | It was first released around 2010 and has gained robustness | with every release, hence not everyone is aware of it. | | For instance, I don't think it's really required | anymore to shut down the database to do the initial sync | if you use the proper tooling (pg_basebackup, | if I remember correctly). | mike_hearn wrote: | Presumably it doesn't matter if you break your DB up into | smaller DBs, you still have the same amount of data to back | up no matter what.
However, now you also have the problem | of snapshot consistency to worry about. | | If you need to backup/restore just one set of tables, you | can do that with a single DB server without taking the rest | offline. | thayne wrote: | > you still have the same amount of data to back up no | matter what | | But you can restore/back up the databases in parallel. | | > If you need to backup/restore just one set of tables, | you can do that with a single DB server without taking | the rest offline. | | I'm not aware of a good way to restore just a few tables | from a full db backup. At least that doesn't require | copying over all the data (because the backup is stored | over the network, not on a local disk). And that may be | desirable to recover from say a bug corrupting or | deleting a customer's data. | nick__m wrote: | On mariadb you can tell the replica to enter into a | snapshotable state[1] and take a simple lvm snapshot, tell | the the database it's over, backup your snapshot somewhere | else and finally delete the snapshot. | | 1) https://mariadb.com/kb/en/storage-snapshots-and-backup- | stage... | altdataseller wrote: | What if your product simply stores a lot of data (ie a search | engine) How is that weird? | skeeter2020 wrote: | This is not typically going to be stored in an ACID- | compliant RDBMS, which is where the most common scaling | problem occurs. Search engines, document stores, adtech, | eventing, etc. are likely going to have a different storage | mechanism where consistency isn't as important. | rmbyrro wrote: | a search engine won't need joins, but other things (ie text | indexing) that can be split in a relatively easier way. | belak wrote: | That's fair - I added "are working on a specific problem | which needs a more complicated setup" to my original | comment as a nicer way of referring to edge cases like | search engines. I still believe that 99% of applications | would function perfectly fine with a single primary DB. | zasdffaa wrote: | Depends what you mean by a database I guess. I take it to | mean an RDBMS. | | RDBMSs provide guarantees that web searching doesn't need. | You can afford to lose a pieces of data, provide not-quite- | perfect results for web stuff. It's just wrong for an | RDBMS. | altdataseller wrote: | What if you are using the database as a system of record | to index into a real search engine like Elasticsearch? | For a product where you have tons of data to search from | (ie text from web pages) | IggleSniggle wrote: | In regards to Elasticsearch, you basically opt-in to | which behavior you want/need. You end up in the same | place: potentially losing some data points or introducing | some "fuzziness" to the results in exchange for speed. | When you ask Elasticsearch to behave in a guaranteed | atomic manner across all records, performing locks on | data, you end up with similar constraints as in a RDBMS. | | Elasticsearch is for search. | | If you're asking about "what if you use an RDBMS as a | pointer to Elasticsearch" then I guess I would ask: why | would you do this? Elasticsearch can be used as a system | of record. You could use an RDBMS over top of | Elasticsearch without configuring Elasticsearch as a | system of record, but then you would be lying when you | refer to your RDBMS as a "system of record." It's not a | "system of record" for your actual data, just a record of | where pointers to actual data were at one point in time. | | I feel like I must be missing what you're suggesting | here. 
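As a rough sketch of the pattern being discussed here -- keeping a relational database as the system of record and treating Elasticsearch as a rebuildable index -- consider the following Python fragment. The `pages` table, the connection details, and the local Elasticsearch endpoint are all hypothetical, and a production setup would more often feed the index from a queue or change-data-capture stream rather than a dual write like this:

    import psycopg2
    import requests

    def save_page(conn, url: str, body: str) -> int:
        # Write to the relational primary store first; it remains the
        # system of record and the search index can be rebuilt from it.
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO pages (url, body) VALUES (%s, %s) RETURNING id",
                (url, body),
            )
            return cur.fetchone()[0]

    def index_page(page_id: int, url: str, body: str) -> None:
        # Then index the same row into Elasticsearch for full-text search.
        requests.put(
            f"http://localhost:9200/pages/_doc/{page_id}",
            json={"url": url, "body": body},
            timeout=5,
        ).raise_for_status()

Losing the index is then an inconvenience rather than data loss: it can be repopulated by walking the table.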
| altdataseller wrote: | Having just an Elasticsearch index without also having | the data in a primary store like an RDBMS is an anti- | pattern and not recommended by almost all experts. | Whether you want to call it a "system of record", I won't | argue semantics. But the point is, it's recommended to have | your data in a primary store from which you can index into | Elasticsearch. | ladyattis wrote: | At my current job we have four different databases so I concur | with this assessment. I think it's okay to have some data in | different DBs if they're significantly different - say, the | user login data could be in its own database. But anything that | we do which is a combination of e-commerce and | testing/certification should be in one big | database so I can do reasonable queries for information that we | need. This doesn't include two other databases we have on-prem, | of which one is a Salesforce setup and another is an internal | application system that essentially marries Salesforce to that. | It's a weird, wild environment to navigate when adding features. | jasonwatkinspdx wrote: | A relative worked for a hedge fund that used this idea. They | were a C#/MSSQL shop, so they just bought whatever was the | biggest MSSQL server at the time, updating frequently. They | said it was a huge advantage, where the limit in scale was more | than offset by productivity. | | I think it's an underrated idea. There's a lot of people out | there building a lot of complexity for datasets that in the end | are less than 100 TB. | | But it also has limits. Infamously, Twitter delayed going to a | sharded architecture a bit too long, making it more of an ugly | migration. | manigandham wrote: | Server hardware is so cheap and fast today that 99% of | companies will never hit that limit in scale either. | AtNightWeCode wrote: | If you get your services right there is little or no | communication between the services, since a microservice should | have all the data it needs in its own store. | HeavyStorm wrote: | > they don't know how your distributed databases look, and | oftentimes they really do not care | | Nor should they. | markandrewj wrote: | Just FYI, you can have one big database without running it on | one big server. As an example, databases like Cassandra are | designed to be scaled horizontally (i.e. scale out, instead of | scale up). | | https://cassandra.apache.org/_/cassandra-basics.html | 1500100900 wrote: | Cassandra may be great when you have to scale your database | that you no longer develop significantly. The problem with | this DB system is that you have to know all the queries | before you can define the schema. | threeseed wrote: | > The problem with this DB system is that you have to know | all the queries before you can define the schema | | Not true. | | You just need to optimise your schema if you want the best | performance. Exactly the same as an RDBMS. | mdasen wrote: | There are trade-offs when you scale horizontally even if a | database is designed for it. For example, DataStax's Storage | Attached Indexes or Cassandra's hidden-table secondary | indexing allow for indexing on columns that aren't part of | the clustering/partitioning, but when you're reading you're | going to have to ask all the nodes to look for something if | you aren't including a clustering/partitioning criterion to | narrow it down. | | You've now scaled out, but you now have to ask each node when | searching by secondary index.
If you're asking every node for | your queries, you haven't really scaled horizontally. You've | just increased complexity. | | Now, maybe 95% of your queries can be handled with a | clustering key and you just need secondary indexes to handle | 5% of your stuff. In that case, Cassandra does offer an easy | way to handle that last 5%. However, it can be problematic if | people take shortcuts too much and you end up putting too | much load on the cluster. You're also putting your latency | for reads at the highest latency of all the machines in your | cluster. For example, if you have 100 machines in your | cluster with a mean response time of 2ms and a 99th | percentile response time of 150ms, you're potentially going | to be providing a bad experience to users waiting on that | last box on secondary index queries. | | This isn't to say that Cassandra isn't useful - Cassandra has | been making some good decisions to balance the problems | engineers face. However, it does come with trade-offs when | you distribute the data. When you have a well-defined | problem, it's a lot easier to design your data for efficient | querying and partitioning. When you're trying to figure | things out, the flexibility of a single machine and much | cheaper secondary index queries can be important - and if you | hit a massive scale, you figure out how you want to partition | it then. | markandrewj wrote: | Cassandra was just an example, but most databases can be | scaled either vertically or horizontally via sharding. You | are right that, if misconfigured, performance can be hindered, but | this is also true for a database which is being scaled | vertically. Generally speaking you will get better | performance if you have a large dataset by growing | horizontally than you would by growing vertically. | | https://stackoverflow.blog/2022/03/14/how-sharding-a- | databas... | robertlagrant wrote: | > Your product asks will consistently want to combine these | data sources (they don't know how your distributed databases | look, and oftentimes they really do not care). | | I'm not sure how to parse this. What should "asks" be? | cfors wrote: | The feature requests (asks) that product wants to build - | sorry for the confusion there. | delecti wrote: | The phrase "Your product asks will consistently" can be de- | abbreviated to "product owners/product managers you work with | will consistently request". | wefarrell wrote: | "Your product asks will consistently want to combine these data | sources (they don't know how your distributed databases look, | and oftentimes they really do not care)." | | This isn't a problem if state is properly divided along the | proper business domain and the people who need to access the | data have access to it. In fact many use cases require it - | publicly traded companies can't let anyone in the organization | access financial info and healthcare companies can't let anyone | access patient data. And of course there are performance concerns as | well if anyone in the organization can arbitrarily execute | queries on any of the organization's data. | | I would say YAGNI applies to data segregation as well and | separations shouldn't be introduced until they are necessary. | Mavvie wrote: | "combine these data sources" doesn't necessarily mean data | analytics. Just as an example, it could be something like | "show a badge if it's the user's birthday", which if you had | a separate microservice for birthdays would be much harder | than joining a new table.
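As a rough illustration of that birthday-badge case, here is what it looks like when the data lives in one database: a single join instead of a fan-out call to a separate service. The `users` and `birthdays` tables are hypothetical, and this is only a sketch of the shape of the query:

    import psycopg2

    def users_with_birthday_today(conn):
        # One query against the shared database: no cross-service calls,
        # no stitching partial results together in application code.
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT u.id, u.display_name
                FROM users u
                JOIN birthdays b ON b.user_id = u.id
                WHERE b.month = EXTRACT(MONTH FROM CURRENT_DATE)
                  AND b.day = EXTRACT(DAY FROM CURRENT_DATE)
                """
            )
            return cur.fetchall()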
| wefarrell wrote: | Replace "people" with "features" and my comment still | holds. As software, features, and organizations become more | complex the core feature data becomes a smaller and smaller | proportion of the overall state and that's when | microservices and separate data stores become necessary. | lmm wrote: | If you do this then you'll have the hardest possible migration | when the time comes to split it up. It will take you literally | years, perhaps even a decade. | | Shard your datastore from day 1, get your dataflow right so | that you don't need atomicity, and it'll be painless and scale | effortlessly. More importantly, you won't be able to paper over | crappy dataflow. It's like using proper types in your code: | yes, it takes a bit more effort up-front compared to just | YOLOing everything, but it pays dividends pretty quickly. | riku_iki wrote: | > Shard your datastore from day 1 | | what about using something like cocroach from day 1? | lmm wrote: | I don't know the characteristics of bikesheddb's upstream | in detail (if there's ever a production-quality release of | bikesheddb I'll take another look), but in general using | something that can scale horizontally (like Cassandra or | Riak, or even - for all its downsides - MongoDB) is a great | approach - I guess it's a question of terminology whether | you call that "sharding" or not. Personally I prefer that | kind of datastore over an SQL database. | riku_iki wrote: | > over an SQL database | | it is actually distributed SQL Db with auto sharding, | their goal is to be SQL compatible with Postgres. | Rantenki wrote: | This is true IFF you get to the point where you have to split | up. | | I know we're all hot and bothered about getting our apps to | scale up to be the next unicorn, but most apps never need to | scale past the limit of a single very high-performance | database. For most people, this single huge DB is sufficient. | | Also, for many (maybe even most) applications, designated | outages for maintenance are not only acceptable, but industry | standard. Banks have had, and continue to have designated | outages all the time, usually on weekends when the impact is | reduced. | | Sure, what I just wrote is bad advice for mega-scale SaaS | offerings with millions of concurrent users, but most of us | aren't building those, as much as we would like to pretend | that we are. | | I will say that TWO of those servers, with some form of | synchronous replication, and point in time snapshots, are | probably a better choice, but that's hair-splitting. | | (and I am a dyed in the wool microservices, scale-out Amazon | WS fanboi). | lmm wrote: | > I know we're all hot and bothered about getting our apps | to scale up to be the next unicorn, but most apps never | need to scale past the limit of a single very high- | performance database. For most people, this single huge DB | is sufficient. | | True _if_ the reliability is good enough. I agree that many | organisations will never get to the scale where they need | it as a performance /data size measure, but you often will | grow past the reliability level that's possible to achieve | on a single node. And it's worth saying that the various | things that people do to mitigate these problems - read | replicas, WAL shipping, and all that - can have a pretty | high operational cost. Whereas if you just slap in a | horizontal autoscaling datastore with true master-master HA | from day 1, you bypass all of that trouble and just never | worry about it. 
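For what the "shard your datastore from day 1" advice argued above can look like at the application level, here is a minimal sketch of deterministic routing by tenant key. The shard hosts are hypothetical, and a real deployment would also need a plan for resharding (consistent hashing or a directory service) as shards are added:

    import hashlib

    SHARD_DSNS = [
        "host=shard0.db.internal dbname=app",
        "host=shard1.db.internal dbname=app",
        "host=shard2.db.internal dbname=app",
        "host=shard3.db.internal dbname=app",
    ]

    def shard_for(customer_id: str) -> str:
        # Hash the tenant key so the distribution stays even regardless of
        # how ids are generated, then map it onto the fixed shard list.
        digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(SHARD_DSNS)
        return SHARD_DSNS[index]

The payoff claimed here is that, because every query is forced through a routing function like this from the start, cross-shard assumptions never creep into the data access code.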
| | > Also, for many (maybe even most) applications, designated | outages for maintenance are not only acceptable, but | industry standard. Banks have had, and continue to have | designated outages all the time, usually on weekends when | the impact is reduced. | | IME those are a minority of applications. Anything | consumer-facing, you absolutely do lose out (and even if | it's not a serious issue in itself, it makes you look bush- | league) if someone can't log into your system at 5AM on | Sunday. Even if you're B2B, if your clients are serving | customers then they want you to be online whenever their | customers are. | johnbellone wrote: | I agree with this sentiment but it is often misunderstood as a | means to force everything into a single database schema. More | people need to learn about logically separating schemas with | their database servers! | clairity wrote: | > "Use One Big Database." | | yah, this is something i learned when designing my first server | stack (using sun machines) for a real business back during the | dot-com boom/bust era. our single database server was the | beefiest machine by far in the stack, 5U in the rack (we also | had a hot backup), while the other servers were 1U or 2U in | size. most of that girth was for memory and disk space, with | decent but not the fastest processors. | | one big db server with a hot backup was our best tradeoff for | price, performance, and reliability. part of the mitigation was | that the other servers could be scaled horizontally to | compensate for a decent amount of growth without needing to | scale the db horizontally. | FpUser wrote: | >"Use One Big Database." | | I do, it is running on the same big (relatively) server as my | native C++ backend talking to the database. The performance | smokes your standard cloudy setup big time. Serving thousand | requests per second on 16 core without breaking sweat. I am all | for monoliths running on real no cloudy hardware. As long as | the business scale is reasonable and does not approach FAANG | (like for 90% of the businesses) this solution is superior to | everything else money, maintenance, development time wise. | BenoitEssiambre wrote: | I'm glad this is becoming conventional wisdom. I used to argue | this in these pages a few years ago and would get downvoted | below the posts telling people to split everything into | microservices separated by queues (although I suppose it's | making me lose my competitive advantage when everyone else is | building lean and mean infrastructure too). | | In my mind, reasons involve keeping transactional integrity, | ACID compliance, better error propagation, avoiding the | hundreds of impossible to solve roadblocks of distributed | systems (https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS- | TM-394...). | | But also it is about pushing the limits of what is physically | possible in computing. As Admiral Grace Hopper would point out | (https://www.youtube.com/watch?v=9eyFDBPk4Yw ) doing distance | over network wires involves hard latency constraints, not to | mention dealing with congestions over these wires. | | Physical efficiency is about keeping data close to where it's | processed. Monoliths can make much better use of L1, L2, L3, | and ram caches than distributed systems for speedups often in | the order of 100X to 1000X. | | Sure it's easier to throw more hardware at the problem with | distributed systems but the downsides are significant so be | sure you really need it. | | Now there is a corollary to using monoliths. 
Since you only | have one db, that db should be treated as somewhat sacred, and you | want to avoid wasting resources inside it. This means being a | bit more careful about how you are storing things, using the | smallest data structures, normalizing when you can, etc. This is | not to save disk, disk is cheap. This is to make efficient use | of L1, L2, L3 and RAM. | | I've seen boolean true or false values saved as large JSON | documents. {"usersetting1": true, "usersetting2": false, | "setting1name": "name", ...} with 10 bits of data ending up as a | 1k JSON document. Avoid this! Storing documents means the | keys - the full table schema - are in every row. It has its uses, | but if you can predefine your schema and use the smallest types | needed, you gain a lot of performance, mostly through much | higher cache efficiency! | Swizec wrote: | > I'm glad this is becoming conventional wisdom | | My hunch is that computers caught up. Back in the early | 2000s horizontal scaling was the only way. You simply | couldn't handle even reasonably mediocre loads on a single | machine. | | As computing becomes cheaper, horizontal scaling is starting | to look more and more like unnecessary complexity for even | surprisingly large/popular apps. | | I mean you can buy a consumer off-the-shelf machine with | 1.5TB of memory these days. 20 years ago, when microservices | started gaining popularity, 1.5TB RAM in a single machine was | basically unimaginable. | FpUser wrote: | >"I'm glad this is becoming conventional wisdom. " | | Yup, this is what I've always done and it works wonders. | Since I do not have bosses, just clients, I do not give a | flying fuck about the latest fashion and do what actually makes | sense for me and said clients. | tsmarsh wrote: | 'over the wire' is less obvious than it used to be. | | If you're in a k8s pod, those calls are really kernel calls. | Sure you're serializing and process switching where you could | be just making a method call, but we had to do something. | | I'm seeing fewer 'balls of mud' with microservices. That's not | zero balls of mud. But it's not a given for almost every code | base I wander into. | threeseed wrote: | > I'm glad this is becoming conventional wisdom | | It's not though. You're just seeing the most popular opinion | on HN. | | In reality it is nuanced, like most real-world tech decisions | are. Some use cases necessitate a distributed or sharded | database, some work better with a single server and some are | simply going to outsource the problem to some vendor. | rbanffy wrote: | > Use One Big Database. | | I emphatically disagree. | | I've seen this evolve into tightly coupled microservices that | could be deployed independently in theory, but required | exquisite coordination to work. | | If you want them to be on a single server, that's fine, but | having multiple databases or schemas will help enforce | separation. | | And, if you need one single place for analytics, push changes | to that space asynchronously. | | Having said that, I've seen silly optimizations being employed | that make sense when you are Twitter, and to nobody else. Slice | services up to the point they still do something meaningful in | terms of the solution and avoid going any further. | marcosdumay wrote: | Yeah... Dividing your work into microservices while your data | is in an interdependent database doesn't lead to great | results. | | If you are creating microservices, you must segment them all | the way through. | zmmmmm wrote: | I have to say I disagree with this ...
you can only | separate them if they are really, truly independent. Trying | to separate things that are actually coupled will quickly | take you on a path to hell. | | The problem here is that most of the microservice | architecture divisions are going to be driven by Conway's | law, not what makes any technical sense. So if you insist | on separate databases per microservice, you're at high risk | of ending up with massive amounts of duplicated and | incoherent state models and half the work of the team | devoted to synchronizing between them. | | I quite like an architecture where services are split | _except_ the database, which is considered a service of its | own. | Joeri wrote: | I have done both models. My previous job we had a monolith on | top of a 1200 table database. Now I work in an ecosystem of | 400 microservices, most with their own database. | | What it fundamentally boils down to is that your org chart | determines your architecture. We had a single team in charge | of the monolith, and it was ok, and then we wanted to add | teams and it broke down. On the microservices architecture, | we have many teams, which can work independently quite well, | until there is a big project that needs coordinated changes, | and then the fun starts. | | Like always there is no advice that is absolutely right. | Monoliths, microservices, function stores. One big server vs | kubernetes. Any of those things become the right answer in | the right context. | | Although I'm still in favor of starting with a modular | monolith and splitting off services when it becomes apparent | they need to change at a different pace from the main body. | That is right in most contexts I think. | zmmmmm wrote: | > splitting off services when it becomes apparent they need | to change at a different pace from the main body | | yes - this seems to get lost, but the microservice argument | is no different to the bigger picture software design in | general. When things change independently, separate and | decouple them. It works in code and so there is no reason | it shouldn't apply at the infrastructure layer. | | If I am responsible for the FooBar and need to update it | once a week and know I am not going to break the FroggleBot | or the Bazlibee which are run by separate teams who don't | care about my needs and update their code once a year, hell | yeah I want to develop and deploy it as a separate service. | manigandham wrote: | There's no need for "microservices" in the first place then. | That's just logical groupings of functionality that can be | separate as classes, namespaces or other modules without | being entirely separate processes with a network boundary. | danpalmer wrote: | To clarify the advice, at least how I believe it should be | done... | | Use One Big Database Server... | | ... and on it, use one software database per application. | | For example, one Postgres server can host many databases that | are mostly* independent from each other. Each application or | service should have its own database and be unaware of the | others, communicating with them via the services if | necessary. This makes splitting up into multiple database | servers fairly straightforward if needed later. In reality | most businesses will have a long tail of tiny databases that | can all be on the same server, with only bigger databases | needing dedicated resources. 
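A minimal sketch of that layout, with hypothetical service names and host: every application gets its own logical database and credentials on the same physical server, so splitting one out later is a connection-string change rather than an application change.

    import psycopg2

    # One physical Postgres server, one logical database per application.
    # Services share the hardware but never reach into each other's schema.
    DSNS = {
        "billing":  "host=bigdb.internal dbname=billing user=billing_svc",
        "catalog":  "host=bigdb.internal dbname=catalog user=catalog_svc",
        "accounts": "host=bigdb.internal dbname=accounts user=accounts_svc",
    }

    def connect(service: str):
        # If one database later outgrows the box, only its DSN changes.
        return psycopg2.connect(DSNS[service])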
| | *you can have interdependencies when you're using deep | features sometimes, but in an application-first development | model I'd advise against this. | goodoldneon wrote: | OP mentioned joining, so they were definitely talking about | a single database | danpalmer wrote: | You can still do a ton of joining. | | I'd start with a monolith, that's a single app, single | database, single point of ownership of the data model, | and a ton of joins. | | Then as services are added after the monolith they can | still use the main database for ease of infra | development, simpler backups and replication, etc. but | those wouldn't be able to be joined because they're | cross-service. | [deleted] | riquito wrote: | Not suggesting it, but for the sake of knowledge you can | join tables living in different databases, as long as | they are on the same server (e.g. mysql, postgresql, SQL | server supports it - doesn't necessarily come for free) | yellowapple wrote: | In PostgreSQL's case, it doesn't even need to be the same | server: https://www.postgresql.org/docs/current/postgres- | fdw.html | giardini wrote: | _" >Use One Big Database Server... | | ... and on it, use one software database per | application.<"_ | | FWIW that is how it is usually is done(and has been done | for decades) on mainframes (IBM & UNISYS). | | ----------------------- | | _" Plus ca change, plus c'est la meme chose."_ | | English: _" the more things change, the more they stay the | same."_ | | - old French expression. | ryanisnan wrote: | Definitely use a big database, until you can't. My advice to | anyone starting with a relational data store is to use a proxy | from day 1 (or some point before adding something like that | becomes scary). | | When you need to start sharding your database, having a proxy | is like having a super power. | chromatin wrote: | Are there postgres proxies that can specifically facilitate | sharding / partitioning later? | _ben_ wrote: | Disclaimer: I am the founder of PolyScale [1]. | | We see both use cases: single large database vs multiple | small, decoupled. I agree with the sentiment that a large | database offer simplicity, until access patterns change. | | We focus on distributing database data to the edge using | caching. Typically this eliminates read-replicas and a lot of | the headache that goes with app logic rewrites or scaling | "One Big Database". | | [1] https://www.polyscale.ai/ | bartread wrote: | Not to mention, backups, restores, and disaster recovery are so | much easier with One Big Database(tm). | 1500100900 wrote: | How is backup restoration any easier if your whole PostgreSQL | cluster goes back in time when you only wanted to rewind that | one tenant? | fleddr wrote: | Your scenario is data recovery, not backup restoration. | Wildly different things. | wizofaus wrote: | Surely having separate DBs all sit on the One Big Server is | preferable in many cases. For cases where you really to extract | large amounts of data that is derived from multiple DBs, | there's no real harm in having some cross-DB joins defined in | views somewhere. If there are sensible logical ways to break a | monolithic service into component stand-alone services, and | good business reasons to do (or it's already been designed that | way), then having each talk to their own DB on a shared server | should be able to scale pretty well. | abraae wrote: | Another area for consolidation is auth. Use one giant keycloak, | with individual realms for every one of the individual apps you | are running. 
Your keycloak is back ended by your one giant | database. | doctor_eval wrote: | I agree that 1BDB is a good idea, but having one ginormous | schema has its own costs. So I still think data should be | logically partitioned between applications/microservices - in | PG terms, one "cluster" but multiple "databases". | | We solved the problem of collecting data from the various | databases for end users by having a GraphQL layer which could | integrate all the data sources. This turned out to be | absolutely awesome. You could also do something similar using | FDW. The effort was not significant relative to the size of the | application. | | The benefits of this architecture were manifold but one of the | main ones is that it reduces the complexity of each individual | database, which dramatically improved performance, and we knew | that if we needed more performance we could pull those | individual databases out into their own machine. | throwaway894345 wrote: | I'm pretty happy to pay a cloud provider to deal with managing | databases and hosts. It doesn't seem to cause me much grief, | and maybe I could do it better but my time is worth more than | our RDS bill. I can always come back and Do It Myself if I run | out of more valuable things to work on. | | Similarly, paying for EKS or GKE or the higher-level container | offerings seems like a much better place to spend my resources | than figuring out how to run infrastructure on bare VMs. | | Every time I've seen a normal-sized firm running on VMs, they | have one team who is responsible for managing the VMs, and | _either_ that team is expecting a Docker image artifact or they | 're expecting to manage the environment in which the | application runs (making sure all of the application | dependencies are installed in the environment, etc) which | typically implies a lot of coordination between the ops team | and the application teams (especially regarding deployment). | I've never seen that work as smoothly as deploying to | ECS/EKS/whatever and letting the ops team work on automating | things at a higher level of abstraction (automatic certificate | rotation, automatic DNS, etc). | | That said, I've never tried the "one big server" approach, | although I wouldn't want to run fewer than 3 replicas, and I | would want reproducibility so I know I can stand up the exact | same thing if one of the replicas go down as well as for | higher-fidelity testing in lower environments. And since we | have that kind of reproducibility, there's no significant | difference in operational work between running fewer larger | servers and more smaller servers. | cogman10 wrote: | > Use One Big Database. | | > Seriously. If you are a backend engineer, nothing is worse | than breaking up your data into self contained service | databases, where everything is passed over Rest/RPC. Your | product asks will consistently want to combine these data | sources (they don't know how your distributed databases look, | and oftentimes they really do not care). | | This works until it doesn't and then you land in the position | my company finds itself in where our databases can't handle the | load we generate. We can't get bigger or faster hardware | because we are using the biggest and fastest hardware you can | buy. | | Distributed systems suck, sure, and they make querying cross | systems a nightmare. 
However, by giving those aspects up, what | you gain is the ability to add new services, features, etc. | without running into Scotty yelling "She can't take much more | of it!" | | Once you get to that point, it becomes SUPER hard to start | splitting things out. All of a sudden you have 10,000 "just a one- | off" queries against several domains that are broken by trying | to carve out a domain into a single owner. | Flow wrote: | Do you have Spectre countermeasures active in the kernel of | that machine? | runjake wrote: | What does it matter, in this context? | | If it's about bare metal vs. virtual machines, know that | Spectre affects virtual machines, too. | rkagerer wrote: | I think they are implying disabling them (if on) could | squeeze out a bit more performance. | kedean wrote: | Many databases can be distributed horizontally if you put in | the extra work - would that not solve the problems you're | describing? MariaDB supports at least two forms of | replication (one master/replica and one multi-master), for | example, and if you're willing to shell out for a MaxScale | license it's a breeze to load balance it and have automatic | failover. | hot_gril wrote: | Not without big compromises and a lot of extra work. If you | want a truly horizontally scaling database, and not just | multi-master for the purpose of availability, a good | example solution is Spanner. You have to lay your data out | differently, you're very restricted in what kinds of | queries you can make, etc. | kbenson wrote: | For what it's worth, I think distributing horizontally is | also much easier if you've already limited your database to | specific concerns by splitting it up in different ways. | Sharding a very large database with lots of deeply linked data | sounds like much more of a pain than something with | a limited scope that isn't too deeply linked with data | because it's already in other databases. | | To some degree, sharding brings in a lot of the same | complexities as different microservices with their own data | store, in that you sometimes have to query across multiple | sources and combine in the client. | throwaway9870 wrote: | How do you use one big database when some of your info is stuck | in an ERP system? | marcosdumay wrote: | > Use One Big Database | | Yep, with a passive replica or online (log) backup. | | Keeping things centralized can reduce your hardware requirement | by multiple orders of magnitude. The one huge exception is a | traditional web service; those scale very well, so you may not | even want to get big servers for them (until you need them). | Closi wrote: | Breaking apart a stateless microservice and then basing it | around a giant single monolithic database is pretty pointless - | at that stage you might as well just build a monolith and get | on with it, as every microservice is tightly coupled to the db. | adrianmsmith wrote: | That's true, unless you need | | (1) Different programming languages, e.g. you've written your | app in Java but now you need to do something for which the | perfect Python library is available. | | (2) Different parts of your software need different types of | hardware. Maybe one part needs a huge amount of RAM for a | cache, but other parts are just a web server. It'd be a shame | to have to buy huge amounts of RAM for every server. | Splitting the software up and deploying the different parts | on different machines can be a win here.
| | I reckon the average startup doesn't need any of that; I'm not | suggesting that monoliths aren't the way to go 90% of the | time. But if you do need these things, you can still go the | microservices route - it still makes sense to stick to a | single database if at all possible, for consistency and | easier JOINs for ad-hoc queries, etc. | Closi wrote: | These are both true - but neither requires service- | oriented architecture. | | You can split up your application into chunks that are | deployed on separate hardware, and use different languages, | without composing your whole architecture into | microservices. | | A monolith can still have a separate database server and a | web server, or even many different functions split across | different servers which are horizontally scalable, and be | written in both Java and Python. | | Monoliths have had separate database servers since the 80s | (and probably before that!). In fact, part of these | applications' defining characteristics at the enterprise | level is that they often shared one big central database, | as often they were composed of lots of small applications | that would all make changes to the central database, which | would often end up in a right mess of software that was | incredibly hard to unpick! (And all the software writing | to that database would, as you described, be written in | lots of different languages.) People would then come along | and cake these central databases full of stored procedures | to make magic changes to implement functionality that | wasn't available in the legacy applications that they can't | change because of the risk, and then you have even more of a | mess! | AtNightWeCode wrote: | Agree. Nothing is worse than having different programs changing | data in the same database. The database should not be an | integration point between services. | jethro_tell wrote: | If you have multiple microservices updating the database, | you need to have a database access layer service as well. | | There's some real value with abstraction and microservices, | but you can try to run them against a monolithic database | service. | bergkvist wrote: | No amount of abstraction is going to save you from the | problem of 2 processes manipulating the same state | machine. | noduerme wrote: | I disagree. Suppose you have an enormous DB that's mainly | written to by workers inside a company, but has to be widely | read by the public outside. You want your internal services | on machines with extra layers of security, perhaps only | accessible by VPN. Your external-facing microservices have | other things like e.g. user authentication (which may be tied | to a different monolithic database), and you want to put them | closer to users, spread out in various data centers or on the | edge. Even if they're all bound to one database, there's a | lot to recommend keeping them on separate, light, cheap | servers that are built for HTTP traffic and occasional DB | reads. And even more so if those services do a lot of | processing on the data that's accessed, such as building up | reports, etc. | Closi wrote: | You've not really built microservices then in the purest | sense though - i.e. all the microservices aren't | independently deployable components.
| | I'm not saying what you are proposing isn't a perfectly | valid architectural approach - it's just usually considered | an anti-pattern with microservices (because if all the | services depend on a single monolith, and a change to a | microservice functionality also mandates a change to the | shared monolith which then can impact/break the other | services, we have lost the 'independence' benefit that | microservices supposedly gives us where changes to one | microservice does not impact another). | | Monoliths can still have layers to support business logic | that are seperate to the database anyway. | roflyear wrote: | Absolutely. I know someone who considers "different domains" | (as in web domains) to count as a microservice! | | What is the point of that? it doesn't add anything. Just more | shit to remember and get right (and get wrong!) | manigandham wrote: | Why would you break apart a microservice? Any why do you need | to use/split into microservices anyway? | | 99% of apps are best fit as monolithic apps _and_ databases | and should focus on business value rather than scale they 'll | never see. | Gigachad wrote: | Where I work we are looking at it because we are starting | to exceed the capabilities of one big database. Several | tables are reaching the billions of rows mark and just | plain inserts are starting to become too much. | nicoburns wrote: | Yeah, the at the billions of rows mark it definitely | makes sense to start looking at splitting things up. On | the other hand, the company I worked for split things up | from the start, and when I joined - 4 years down the line | - their biggest table had something like 50k rows, but | their query performance was awful (tens of seconds in | cases) because the data was so spread out. | threeseed wrote: | > 99% of apps are best fit as monolithic apps and databases | and should focus on business value rather than scale | they'll never see | | You incorrectly assume that 99% of apps are building these | architectures for scalability reasons. | | When in reality it's far more for development productivity, | security, use of third party services, different languages | etc. | jethro_tell wrote: | reliability, sometimes sharding just means you don't have | to get up in the middle of the night. | Closi wrote: | Totally agree. | | I guess I just don't see the value in having a monolith | made up of microservices - you might as well just build a | monolith if you are going down that route. | | And if your application fits the microservices pattern | better, then you might as well go down the microservices | pattern properly and not give them a big central DB. | adgjlsfhk1 wrote: | The one advantage of microservice on a single database | model is that it lets you test the independent components | much more easily while avoiding the complexity of | database sharding. | [deleted] | radu_floricica wrote: | To note that quite a bit of the performance problems come | when writing stuff. You can get away with A LOT if you accept | 1. the current service doesn't do (much) writing and 2. it | can live with slightly old data. Which I think covers 90% of | use cases. | | So you can end up with those services living on separate | machines and connecting to read only db replicas, for | virtually limitless scalability. And when it realizes it | needs to do an update, it either switches the db connection | to a master, or it forwards the whole request to another | instance connected to a master db. | cfors wrote: | No disagreement here. 
I love a good monolith. | Guid_NewGuid wrote: | I think a strong test a lot of "let's use Google scale | architecture for our MVP" advocates fail is: can your | architecture support a performant paginated list with dynamic | sort, filter and search where eventual consistency isn't | acceptable? | | Pretty much every CRUD app needs this at some point and if | every join needs a network call your app is going to suck to | use and suck to develop. | SkyPuncher wrote: | > Pretty much every CRUD app needs this at some point and if | every join needs a network call your app is going to suck to | use and suck to develop. | | _at some point_ is the key word here. | | Most startups (and businesses) can likely get away with this | well into Series A or Series B territory. | threeseed wrote: | > if every join needs a network call your app is going to | suck to use and suck to develop. | | And yet developers do this every single day without any | issue. | | It is bad practice to have your authentication database be | the same as your app database. Or you have data coming from | SaaS products, third party APIs or a cloud service. Or even | simply another service in your stack. And with complex | schemas often it's far easier to do that join in your | application layer. | | All of these require a network call and join. | mhoad wrote: | I've found the following resource invaluable for designing | and creating "cloud native" APIs where I can tackle that kind | of thing from the very start without a huge amount of hassle | https://google.aip.dev/general | | The patterns section covers all of this and more | gnat wrote: | This is a great resource but the RFC-style documentation | says what you SHOULD and MUST do, not HOW to do it ... | lmm wrote: | I don't believe you. Eventual consistency is how the real | world works, what possible use case is there where it | wouldn't be acceptable? Even if you somehow made the display | widget part of the database, you can't make the reader's | eyeballs ACID-compliant. | skyde wrote: | thanks a lot for this comment. I will borrow this as an | interview question :) | cdkmoose wrote: | >>(they don't know how your distributed databases look, and | oftentimes they really do not care) | | Nor should they, it's the engineer's/team's job to provide the | database layer to them with high levels of service without them | having to know the details | z3t4 wrote: | The rule is: Keep related data together. Exceptions are: | Different customers (usually don't require each others data) | can be isolated. And if the database become the bottleneck you | can separate unrelated services. | bebrws wrote: | Someone call Brahm | notacoward wrote: | At various points in my career, I worked on Very Big Machines and | on Swarms Of Tiny Machines (relative to the technology of their | respective times). Both kind of sucked. Different reasons, but | sucked nonetheless. I've come to believe that the best approach | is generally somewhere in the middle - enough servers to ensure a | sufficient level of protection against failure, _but no more_ to | minimize coordination costs and data movement. Even then there | are exceptions. The key is _don 't run blindly toward the | extremes_. Your utility function is probably bell shaped, so you | need to build at least a rudimentary model to explore the problem | space and find the right balance. | mamcx wrote: | Yes, totally. | | Among the setups the one that I think is _the golden_ is BIG Db | Server, 1-4 front-end(web /api/cache) servers. 
Off-hand the | backups and CDN. | | That is. | rcarmo wrote: | I once fired up an Azure instance with 4TB of RAM and hundreds of | cores for a performance benchmark. | | htop felt incredibly roomy, and I couldn't help thin how my three | previous projects would fit in with room to spare (albeit lacking | redundancy, of course). | gregmac wrote: | > However, cloud providers have often had global outages in the | past, and there is no reason to assume that cloud datacenters | will be down any less often than your individual servers. | | A nice thing about being in a big provider is when they go down a | massive portion of the internet goes down, and it makes news | headlines. Users are much less likely to complain about _your_ | service being down when it 's clear you're just caught up in the | global outage that's affecting 10 other things they use. | arwhatever wrote: | When migrating from [no-name CRM] to [big-name CRM] at a recent | job, the manager pointed out that when [big-name CRM] goes | down, it's in the Wall Street Journal, and when [no-name] goes | down, it's hard to get their own Support Team to care! | ramesh31 wrote: | Nobody ever got fired for buying IBM! | notjustanymike wrote: | We may need to update this one, I would definitely fire | someone today for buying IBM. | kkielhofner wrote: | Nobody ever got fired for buying AWS! | lanstin wrote: | The AWS people now are just like the IBM people in the | 80s - mastering a complex and not standards based array | of products and optional product add-ons. The internet | solutions were open and free for a few decades and now | it's AWS SNADS I mean AWS load balancers and edge | networks. | namose wrote: | AWS services are usually based on standards anyway. If | you use an architecturally sound approach to AWS you | could learn to develop for GCP or Azure pretty easily. | riku_iki wrote: | that's funny, since IBM is actually promoting one very fat | and reliable server. | dtparr wrote: | These days we just call it licensing Red Hat. | ustolemyname wrote: | This has given me a brilliant idea: deferring maintenance | downtime until some larger user-visible service is down. | | This is terrible for many reasons, but I wouldn't be surprised | to hear someone has done this. | gorjusborg wrote: | Ah yes, the 'who cut the cheese?' maintenance window. | pdpi wrote: | Another advantage is that the third-party services you depend | on are also likely to be on one of the big providers, so it's | one less point of failure. | hsn915 wrote: | No. Your users have no idea that you rely on AWS (they don't | even know what it is), and they don't think of it as a valid or | reasonable excuse as to why your service is down. | andrepew wrote: | This is a huge one -- value in outsourcing blame. If you're | down because of a major provider outage in the news, you're | viewed more as a victim of a natural disaster rather than | someone to be blamed. | oceanplexian wrote: | I hear this repeated so many times at my workplace, and it's | so totally and completely uninformed. | | Customers who have invested millions of dollars into making | their stack multi-region, multi-cloud, or multi-datacenter | aren't going to calmly accept the excuse that "AWS Went Down" | when you can't deliver the services you contractually agreed | to deliver. There are industries out there where having your | service casually go down a few times a year is totally | unacceptable (Healthcare, Government, Finance, etc). 
I worked | adjacent to a department that did online retail a while ago | and even an hour of outage would lose us $1M+ in business. | darkr wrote: | > Customers who have invested millions of dollars > ... > | an hour of outage would lose us $1M+ in business | | Given (excluding us-east-1) you're looking at maybe an hour | a year on average of regional outage, sounds like best case | break even on that investment? | oceanplexian wrote: | I'm going to say that an hour a year is wildly | optimistic. But even then, that puts you at 4 nines | (99.99%) which is comparatively awful, consider that an | old fashioned telephone using technology from the 1970s | will achieve on average, 5 9's of reliability, or 5.26 | minutes of downtime per year, and that most IT shops | operating their own infrastructure contractually expect 5 | 9's from even fairly average datacenters and transit | providers. | nicoburns wrote: | I was amused when I joined my current company to find | that our contracts only stipulate one 9 of reliability | (98%). So ~30 mins a day or ~14 hours a month is | permissible. | rapind wrote: | I wonder if the aggregate outage time from misconfigured | and over-architected high availability services is greater | than the average AWS outage per year. | | Similar to security, the last few 9s of availability come | at a heavily increasing (log) complexity / price. The | cutoff will vary case by case, and I'm sure the decision on | how many 9s you need is often irrational (CEO says it can | never go down! People need their pet food delivered on | time!). | mahidhar wrote: | Agreed. Recently I was discussing the same point with a non- | technical friend who was explaining that his CTO had decided | to move from Digital Ocean to AWS, after DO experienced some | outage. Apparently the CEO is furious at him and has assumed | that DO are the worst service provider because their services | were down for almost an entire business day. The CTO probably | knows that AWS could also fail in a similar fashion, but by | moving to AWS it becomes more or less an Act of God type of | situation and he can wash his hands of it. | tjoff wrote: | This seems like a recently popular exaggeration, I'd wager no | one but a select few in the HN-bubble actually cares. | | You will primarily be judged by how much of an inconvenience | the outage was to every individual. | | The best you can hope for is that the local ISP gets the | blame, but honestly. It can't be more than a rounding error | in the end. | treis wrote: | I think it's more of a shield against upper management. AWS | going down is treated like an act of god rendering everyone | blameless. But if it's your one big server that goes down | then it's your fault. | phkahler wrote: | >> AWS going down is treated like an act of god rendering | everyone blameless. | | Someone decided to use AWS, so there is blame to go | around. I'm not saying if that blame is warranted or not, | just that it sounds like a valid thing to say for people | who want to blame someone. | flatiron wrote: | "Nobody gets fired for using aws" is pretty big now a | days. We use GCP but if they have an issue and it bubbles | down to me nobody bats an eye when I say the magical | cloud man made ut oh whoopsie and it wasn't me. | sebzim4500 wrote: | I doubt anyone has ever been fired for choosing AWS. I | know for a fact that people have been fired after | deciding to do it on bare metal and then it didn't work | very well. 
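To put numbers on the "nines" being traded back and forth above, the downtime implied by an availability target is simple arithmetic; a quick sketch follows, using only the targets mentioned in the comments:

      # Downtime per year implied by an availability target ("nines").
      MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

      for availability in (0.98, 0.999, 0.9999, 0.99999):
          downtime = (1 - availability) * MINUTES_PER_YEAR
          print(f"{availability:.3%} -> {downtime:,.1f} minutes of downtime/year")

      # 98%     -> ~10,512 minutes/year (roughly 14 hours/month)
      # 99.9%   -> ~526 minutes/year
      # 99.99%  -> ~53 minutes/year ("four nines")
      # 99.999% -> ~5.3 minutes/year (the 5.26 minutes quoted above)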
| jasonlotito wrote: | "I think it's more of a shield against upper management." | | "Someone decided to use AWS, so there is blame to go | around." | | Upper management. | ozim wrote: | So it does not really work in B2B. | | I don't really have much to do with contracts - but my | company is stating that we have an uptime of 99.xx%. | | In terms of the contract, customers don't care if I have Azure/AWS | or I keep my server in a box under the stairs. Yes, they do | due diligence and would not buy my services if I kept it in a | shoe box. | | But then if they lose business they come to me... I can go | after Azure/AWS, but I am so small they will throw some free | credits at me and tell me to go off. | | Maybe if you are in the B2C area then yeah - your customers will | probably shrug and say it was M$ or Amazon if you write a sad | blog post with excuses. | zerkten wrote: | It's going to depend on the penalties for being | unavailable. Small B2B customers are very different from | enterprise B2B customers too, so you ultimately have to | build for your context. | | If you have to give service credits to customers then with | "one box" you have to give 100% of customers a credit. If | your services are partitioned across two "shards" then one | of those shards can go down, but your credits are only paid | out at 50%. | | Getting to this place doesn't prevent a 100% outage and it | imposes complexity. This kind of design can be planned for | enterprise B2B apps when the team are experienced with | enterprise clients. Many B2B SaaS are tech folk with zero | enterprise experience, so they have no idea of the relatively | simple things that can be done to enable a shift to this | architecture. | | Enterprise customers do care where things are hosted. They | very likely have some users in the EU, or other locations, | which care more about data protection and sovereignty than | the average US organization. Since they are used to hosting | on-prem and doing their own due diligence they will often | have preferences over hosting. In industries like | healthcare, you can find out what the hosting preferences | are, as well as understand how the public clouds are | addressing them. While not viewed as applicable by many on | HN due to the focus on B2C and smaller B2B here, this is | the kind of thing that can put a worse product ahead in the | enterprise scenario. | HWR_14 wrote: | Because you have a vendor/customer relationship. The big | thing for AWS is employer/employee relationships. If you | were a larger company, and AWS goes down, who blames you? | Who blames anyone in the company? At the C-level, does the | CEO expect more uptime than _Amazon_? Of course not. And so | it goes. | | Whereas if you do something other than the industry | standard of AWS (or Azure/GCP) and it goes down, clearly | it's _your fault_. | andrepew wrote: | Depends on the scale of the B2B. Between enterprises, not as much. | Between small businesses, it works very well (at least in my | experience; we are tiny B2B). | lanstin wrote: | It really varies a lot. I have seen very large lazy sites | suddenly pick up a client that wanted an RCA for each bad | transaction, and suddenly get religion quickly (well, as | quickly as a large org can). Those are precious clients | because they force investment into useful directions of | availability instead of just new features. | travisgriggs wrote: | "Value in outsourcing blame" | | The real reason that talented engineers secretly support all | of the middle management we vocally complain about.
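zerkten's service-credit point above is easy to see with a toy model: one box means every customer gets a credit when it goes down, while two independent shards halve the exposure for a single-shard outage. All numbers below are made up for the illustration, and it assumes the shards really do fail independently.

      # Toy model of SLA credit exposure: one box vs. two independent shards.
      CUSTOMERS = 1000
      CREDIT_PER_CUSTOMER = 50  # illustrative credit, in dollars

      one_box_outage = CUSTOMERS * CREDIT_PER_CUSTOMER               # everyone is down
      single_shard_outage = (CUSTOMERS // 2) * CREDIT_PER_CUSTOMER   # half are down

      print(one_box_outage)       # 50000
      print(single_shard_outage)  # 25000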
| ocdtrekkie wrote: | I find this entire attitude disappointing. Engineering has | moved from "provide the best reliability" to "provide the | reliability we won't get blamed for the failure of". Folks | who have this attitude missed out on the dang ethics course | their college was teaching. | | If rolling your own is faster, cheaper, and more reliable (it | is), then the only justification for cloud is assigning | blame. But you know what you also don't get? Accolades. | | I throw a little party of one here when Office 365 or Azure | or AWS or whatever Google calls its cloud products this week | is down but all our staff are able to work without issue. =) | jeroenhd wrote: | If you work in B2B you can put the blame on Amazon and your | customers will ask "understandable, take the necessary steps | to make sure it doesn't happen again". AWS going down isn't | an act of God, it's something you should've planned for, | especially if it happened before. | nrmitchi wrote: | There is also the consideration that this isn't even an | argument of "other things are down too!" or "outsourcing blame" | as much as, depending on what your service is of course, you | are unlikely to be operating in a bubble. You likely have some | form of external dependencies, or you are an external | dependency, or have correlated/cross-dependency usage with | another service. | | Guaranteeing isolation between all of these different moving | parts is _very difficult_. Even if you're not directly | affected by a large cloud outage, it's becoming less and less | common that you, or your customers, are truly isolated. | | As well, if your AWS-hosted service mostly exists to service | AWS-hosted customers, and AWS is down, it doesn't matter if you | are down. None of your customers are operational anyways. Is | this a 100% acceptable solution? Of course not. But for 95% of | services/SaaS out there, it really doesn't matter. | [deleted] | taylodl wrote: | Users are much more sympathetic to outages when they're | widespread. But, if there's a contractual SLA then their | sympathy doesn't matter. You have to meet your SLA. That | usually isn't a big problem as SLAs tend to account for some | amount of downtime, but it's important to keep the SLA in mind. | hans1729 wrote: | This only holds when you are B2B. If you're serving end | users, they don't care about the contract, they care about | their UX. | z3t4 wrote: | You also have to calculate in the complexity of running | thousands of servers vs running just one server. If you run | just one server it's unlikely to go down even once in its | lifetime. Meanwhile cloud providers are guaranteed to have | outages due to the sheer complexity of managing thousands of | servers. | bilekas wrote: | I can't tell if this is a good thing or a bad thing though! | | Imagine the clout of saying: "we stayed online while AWS died" | dghlsakjg wrote: | Depends on how technical your customer base is. Even as a | developer I would tend not to ascribe too much signal to that | message. All it tells me is that you don't use AWS. | | "We stayed online when GCP, AWS, and Azure go down" is a | different story. On the other hand, if those three go down | simultaneously, I suspect the state of the world will be such | that I'm not worried about the internet. | lanstin wrote: | I would expect there are BGP issues that could do that, at | least for large swaths of the internet.
| [deleted] | namose wrote: | I do also remember in one of the recent AWS outages, the | google cloud compute service had lower availability due to | failovers hitting all at once | Nextgrid wrote: | HN implicitly gets this clout - it became the _real_ status | page of most of the internet. | cal85 wrote: | > In comparison, buying servers takes about 8 months to break | even compared to using cloud servers, and 30 months to break even | compared to renting. | | Can anyone help me understand why the cloud/renting is still this | expensive? I'm not familiar with this area, but it seems to me | that big data centers must have some pretty big cost-saving | advantages (maintenance? heat management?). And there are several | major providers all competing in a thriving marketplace, so I | would expect that to drive the cost down. How can it still be so | much cheaper to run your own on-prem server? | WJW wrote: | Several points: | | - The price for on-prem conveniently omits costs for power, | cooling, networking, insurance and building space, it's only | the purchase price. | | - The price for the cloud server includes (your share of) the | costs of replacing a broken power supply or hard drive, which | is not included in the list price for on-prem. You will have to | make sure enough of your devs know how to do that or else hire | a few sysadmin types. | | - As the article already mentions, the cloud has to provision | for peak usage instead of average usage. If you buy an on-prem | server you always have the same amount of computing power | available and can't scale up quickly if you need 5x the | capacity because of a big event. That kind of flexibility costs | money. | cal85 wrote: | Thank you, that explains it. | zucker42 wrote: | Not included in the break even calculation was the cost of | colocation, or the cost of hiring someone to make sure the | computer is in working order, or the less hassle upon hardware | failures. | | Also, as the author even mention in an article, a modern server | basically obsoletes a 10 year old server. So you're going to | have to replace your server at least every 10 years. So the | break even in the case of renting makes sense when you consider | that the server depreciates really quickly. | manigandham wrote: | You're paying a premium for _flexibility_. If you don 't need | that then there are far cheaper options like some managed | hosting from your local datacenter. | klysm wrote: | The huge capital required to get a data center with those cost | savings serves as a nice moat to let people price things high. | marcosdumay wrote: | Renting is not very expensive. 30 months is a large share of a | computer's lifetime, and you are paying for space, electricity, | and internet access too. | merb wrote: | > If you compare to the OVHCloud rental price for the same | server, the price premium of buying your compute through AWS | lambda is a factor of 25 | | and there is a factor of 25 that ovh is not a company where you | should rent servers: | | https://www.google.com/search?q=ovh+fire | siliconc0w wrote: | One thing to keep in mind is separation. The prod environment | should be completely separated from the dev ones (plural, it | should be cheap/fast to spin up dev environments). Access to | production data should be limited to those that need it (ideally | for just the time they need it). 
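The break-even months quoted from the article a few comments above are essentially the purchase price divided by the monthly cost of the alternative. A sketch of that arithmetic follows, with placeholder prices (these are not the article's actual figures), plus WJW's caveat that the naive ratio understates the true cost of ownership.

      # Naive break-even: months until buying a server costs less than
      # paying monthly for an equivalent. Prices are placeholders.
      def breakeven_months(purchase_price, monthly_alternative):
          return purchase_price / monthly_alternative

      server_price   = 16000.0  # hypothetical one-time cost of one big server
      cloud_monthly  = 2000.0   # hypothetical equivalent cloud spend per month
      rental_monthly = 550.0    # hypothetical equivalent rented dedicated server

      print(breakeven_months(server_price, cloud_monthly))   # ~8 months
      print(breakeven_months(server_price, rental_monthly))  # ~29 months

      # As WJW notes above, the purchase price alone omits power, cooling,
      # space and the people who rack and repair the hardware, so the real
      # break-even comes later than this ratio suggests.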
Teams should be able to deploy | their app separately and not have to share dependencies (i.e. | operating system libraries), and it should be possible to test OS | upgrades (containers do not make you immune from this). It's | _kinda_ possible to sort of do this with 'one big server' but | then you're running your own virtualized infrastructure, which has | its own costs/pains. | | Definitely also don't recommend one big database, as that becomes | a hairball quickly - it's possible to have several logical | databases on one physical database server, though. | lordleft wrote: | Interesting write-up that acknowledges the benefits of cloud | computing while starkly demonstrating the value proposition of | just one powerful, on-prem server. If it's accurate, I think a | lot of people are underestimating the mark-up cloud providers | charge for their services. | | I think one of the major issues I have with moving to the cloud | is a loss of sysadmin knowledge. The more locked in you become to | the cloud, the more that knowledge atrophies within your | organization. Which might be worth it to be nimble, but it's a | vulnerability. | phpisthebest wrote: | Given that AWS holds up the entire Amazon company, and is a | large part of Bezos's personal wealth, I think the mark-up is | pretty good. | evilotto wrote: | Many people will respond that "one big server" is a massive | single point of failure, but in doing so they miss that it is | also a single point of success. If you have a distributed system, | you have to test and monitor lots of different failure scenarios. | With a SPOS, you only have one thing to monitor. For a lot of | cases the reliability of that SPOS is plenty. | | Bonus: Just move it to the cloud, because AWS is definitely not | its own SPOF and it never goes down taking half the internet with | it. | MrStonedOne wrote: | /tg/station, the largest open-source multiplayer video game on | GitHub, gets cloudheads trying to help us "modernize" the game | server for the cloud all the time. | | Here's how that breaks down: | | The servers (sorry, I mean compute) cost the same (before | bandwidth, more on that at the bottom) to host one game server as | we pay (amortized) per game server to host 5 game servers on a | rented dedicated server. ($175/month for the rented server with | 64gb of ram and a 10gbit uplink) | | They run twice as slow because high-core-count, slow-clock-speed | servers aren't all they are cracked up to be, and our game engine | is single threaded, but even if it wasn't, there is an overhead | to multithreading things which, combined with most high-core-count | servers also having slow clock speeds, rarely squares out to an | actual increase in real-world performance. | | You can get the high-clock-speed units, but they are two to three | times as expensive. And they still run 20% slower than Windows VMs on | rented bare metal, because the sad fact is that enterprise CPUs from | either Intel or AMD have slower clock speeds and single-threaded | performance than their gaming CPU counterparts, and getting | gaming CPUs for rented servers is piss easy, but next to | impossible for cloud servers. | | Each game server uses 2tb of bandwidth to host 70-player high | pops. This works with 5 servers on 1 machine because our hosting | provider gives us 15tb of bandwidth included in the price of the | server. | | Well, now the cloud bill just got a new 0. 10 to 30x more | expensive once you remember to price in bandwidth isn't looking | too great.
| | "but it would make it cheaper for small downstreams to start out" | until another youtuber mentions our tiny game, and every game | server is hitting the 120 hard pop cap, and a bunch of | downstreams get a surprise 4 digit bill for what would normally | run 2 digits. | | The take away from this being that even adding in docker or k8s | deployment support to the game server is seen as creating the | risk some kid bankrupts themselves trying to host a game server | of their favorite game off their mcdonalds paycheck, and we tell | such tech "pros" to sod off with their trendy money wasters. | mwcampbell wrote: | > $175/month for the rented server with 64gb of ram and a | 10gbit uplink) | | Wow, what provider is that? | corford wrote: | Hetzner's PX line offers 64GB ECC RAM, Xeon CPU, dual 1TB | NVME for < $100/month. A dedicated 10Gbit b/w link (plus | 10Gbit NIC) is then an extra ~$40/month on top (incls. | 20TB/month traffic, with overage billed at $1/TB). | twblalock wrote: | YetAnotherNick wrote: | This post raises small issues like reliability, but missed lot of | much bigger issues like testing, upgrades, reproducibility, | backups and even deployments. Also, the author is comparing on | demand pricing, which to me doesn't make sense if you are paying | for the server with reserved pricing. Still I agree there would | be a difference of 2-3x(unless your price is dominated by AWS | egress fees), but most server with fixed workload, even for very | popular but simple sites, it could be done in $1k/month in cloud, | less than 10% of one developer salary. For non fixed workload | like ML training, you would anyways need some cloudy setup. | softfalcon wrote: | So... I guess these folks haven't heard of latency before? Fairly | sure you have to have "one big server" in every country if you do | this. I feel like that would get rather costly compared to | geographically distributed cloud services long term. | gostsamo wrote: | The article explicitly mentiones CDN as something that you can | outsource and also notes that the market there is competitive | and the prices are low. | Nextgrid wrote: | As opposed, to "many small servers" in every country? The vast | majority of startups out there run out of a single AWS region | with a CDN caching read-only content. You can apply the same | CDN approach to a bare-metal server. | softfalcon wrote: | Yeah, but if I'm a startup and running only a small server, | the cloud hosting costs are minimal. I'm not sure how you | think it's cheaper to host tiny servers in lots of countries | and pay someone to manage that for you. You'll need IT in | every one of those locations to handle the service of your | "small servers". | | I run services globally for my company, there is no way we | could do it. The fact that we just deploy containers to k8s | all over the world works very well for us. | | Before you give me the "oh k8s, well you don't know bare | metal" please note that I'm an old hat that has done the | legacy C# ASP.NET IIS workflows on bare metal for a long | time. I have learned and migrated to k8s on AWS/GCloud and it | is a huge improvement compared to what I used to deal with. | | Lastly, as for your CDN discussion, we don't just host CDN's | globally. We also host geo-located DB + k8s pods. Our service | uses web sockets and latency is a real issue. We can't have | 500 ms ping if we want to live update our client. 
We choose | to host locally (in what is usually NOT a small server) so we | get optimal ping for the live-interaction portion of our | services that are used by millions of people every day. | Nextgrid wrote: | > the cloud hosting costs are minimal | | Disagreed. The cloud equivalent of a small server is still | a few hundred bucks a month + bandwidth. Sure, it's still a | relatively small cost but you're still overpaying | significantly over the Hetzner equivalent which will be | sub-$100. | | > pay someone to manage that for you | | The same guy that manages your AWS can do this. Having | bare-metal servers doesn't mean renting colo space and | having people on-site - you can get them from | Hetzner/OVH/etc and they will manage all the hardware for | you. | | > The fact that we just deploy containers to k8s all over | the world works very well for us. | | It's great that it works well for you and I am in no way | suggesting you should change, but I wouldn't say it would | apply to everyone - the cloud adds significant costs with | regards to bandwidth alone and makes some services outright | impossible with that pricing model. | | > We also host geo-located DB | | That's a complex use-case that's not representative of most | early/small SaaS which are just a CRUD app backed by a DB. | If your business case requires distributed databases and | you've already done the work, great - but a lot of services | don't need that (at least not yet) and can do just fine | with a single big DB server + application server and good | backups, and that will be dirt-cheap on bare-metal. | nostrebored wrote: | Claiming that Hetzner is equivalent is fallacious. The | offerings are completely different. | | Agreed on networking though! | Nextgrid wrote: | In context of a "small server", I think they are | equivalent. AWS gives you a lot more functionality but | you're unlikely to be using any of it if you're just | running a single small "pet" server. | kkielhofner wrote: | You don't need IT in every location or even different | hosting facility contracts. Most colo hosting companies | have multiple regions. From the 800lb gorilla (Equinix): | | https://www.equinix.com/data-centers | | Or a smaller US focused colo provider: | | https://www.coresite.com/data-centers/locations | | Between vendor (Dell, HP, IBM, etc) and the remote hands | offered by the hosting facility you don't ever have to have | a member of your team even enter a facility. Anywhere. | Depending on the warranty/support package the vendor will | dispatch someone to show up to the facility to replace | failed components with little action from you. | | The vendor will be happy to ship the server directly to the | facility (anywhere) and for a nominal fee the colo provider | will rack it and get IPMI, iLo, IP KVM, whatever up for you | to do your thing. When/if something ever "hits the fan" | they have on site 24 hour "remote hands" that can either | take basic pre-prescribed steps/instructions -or- work with | your team directly and remotely. | | Interestingly, at my first startup we had a facility in the | nearest big metro area that not only hosted our hardware | but also provided an easy, cheap, and readily available | meeting space: | | https://www.coresite.com/data-centers/data-center- | design/ame... | kgeist wrote: | >The vast majority of startups out there run out of a single | AWS region with a CDN caching read-only content. 
| | I wonder how many of them violate GDPR and similar laws in | other countries in regards to personal data processing by | processing everything in the US. | treis wrote: | This is one of those problems that basically no one has. RTT | from Japan to Washington D.C. is 160ms. There's very few | applications where that amount of additional latency matters. | naavis wrote: | It adds up surprisingly quickly when you have to do a TLS | handshake, download many resources on pageload etc. The TLS | handshake alone costs 3 round-trips over the network. | treis wrote: | TLS is cached though. Your 3 round trips is 1/2 second on | initial load but then should be reused for subsequent | requests. | | Resources should be served through a CDN so you'll get | local servers for those. | yomkippur wrote: | What holds me back from doing this is how will I reduce latency | from the calls coming from other side of the world when OVHcloud | seemingly does not have datacenters all over the world? There is | an noticeable lag when it comes to multiplayer games or even web | applications. | tonymet wrote: | people don't account for the cpu & wall-time cost of encode- | decode. I've seen it take up 70% of cpu on a fleet. That means | 700/1000 servers are just doing encode decode. | | You can see high efficiency setups like stackexchange & | hackernews are orders of magnitude more efficient. | pclmulqdq wrote: | This is exactly correct. If you have a microservice running a | Rest API, you are probably spending most of your CPU time on | HTTP and JSON handling. | adam_arthur wrote: | I'm building an app with Cloudflare serverless and you can | emulate everything locally with a single command and debug | directly... It's pretty amazing. | | But the way their offerings are structured means it will be quite | expensive to run at scale without a multi cloud setup. You can't | globally cache the results of a worker function in CDN, so any | call to a semi dynamic endpoint incurs one paid invocation, and | there's no mechanism to bypass this via CDN caching because the | workers live in front of the CDN, not behind it. | | Despite their media towards lowering cloud costs, they have | explicitly designed their products to contain people in a cost | structure similar to but different than via egress fees. And in | fact it's quite easily bypassed by using a non Cloudflare CDN in | front of Cloudflare serverless. | | Anyway, I reached a similar conclusion that for my app a single | large server instance works best. And actually I can fit my whole | dataset in RAM, so disk/JSON storage and load on startup is even | simpler than trying to use multiple systems and databases. | | Further, can run this on a laptop for effectively free, and cache | everything via CDN, rather than pay ~$100/month for a cloud | instance. | | When you're small, development time is going to be your biggest | constraint, and I highly advocate all new projects start with a | monolithic approach, though with a structure that's conducive to | decoupling later. | bilekas wrote: | I don't agree with EVERYTHING in the article such as getting 2 | big rather than multiple smaller, this is really just a | cost/requirement issue though. | | The biggest cost I've noticed with enterprises who go full cloud | is that they are locked in for the long term. 
I don't mean | contractually though; basically, the way they design and implement | any system or service MUST follow the provider's "way". This can be | very detrimental when leaving the provider, or god forbid the | provider decides to sunset certain service versions, etc. | | That said, for enterprise it can make a lot of sense, and the | article covers it well by admitting some "clouds" are beneficial. | | For anything I've ever done outside of large businesses the go-to | has always been "if it doesn't require an SRE to maintain, just | host your own". | amelius wrote: | Nice until your server gets hugged by HN. | runeks wrote: | > The big drawback of using a single big server is availability. | Your server is going to need downtime, and it is going to break. | Running a primary and a backup server is usually enough, keeping | them in different datacenters. | | What about replication? I assume the 70k postgres IOPS fall to | the floor when needing to replicate the primary database to a | backup server in a different region. | arwhatever wrote: | A recent team I was on used one big server. | | Wound up spawning off a separate thread from our would-be | stateless web api to run recurring bulk processing jobs. | | Then coupled our web api to the global singleton-esque bulk | processing jobs thread in a stateful manner. | | Then wrapped actors up on actors on top of everything to try to | wring as much performance as possible out of the big server. | | Then decided they wanted to have a failover/backup server but it | was too difficult due to the coupling to the global singleton- | esque bulk processing job. | | [I resigned at this point.] | | So yeah, color me skeptical. I know every project's needs are | different, but I'm a huge fan of dumping my code into some cloud | host that auto-scales horizontally, and then getting back to | writing more code that provides some freeeking business value. | the_duke wrote: | This has nothing to do with cloud vs big server. You can build | horrible, tightly coupled architectures anywhere. You can | also cleanly separate workloads on a single server just fine. | malkia wrote: | It was all good until NUMA came, and now you have to carefully | rethink your process, or you get lots of performance issues in | your (otherwise) well-threaded code. Speaking from first-hand | experience: when our level editor ended up being used by artists | on a server-class machine, the supposedly 4x faster machine was | actually going 2x slower. Why? Lots of std::shared_ptr<> use on | our side - or any atomic reference counting - caused slowdowns, as | the cache (my understanding) had to be synchronized between the | two physical CPUs, each having 12 threads. | | But that's really not the only issue; just pointing out that you can't | expect everything to scale smoothly there unless it's well thought | out - like asking your OS to allocate your threads/memory only on one of | the physical CPUs (and their threads), putting some big | disconnected part of your process(es) on the other one(s), and | making sure the communication between them is minimal... which | actually wants micro-services design again at that level. | | So why not go with micro-services instead... | faizshah wrote: | In the paper on Twitter's "Who to Follow" service they mention | that they designed the service around storing the entire Twitter | graph in the memory of a single node: | | > An interesting design decision we made early in the Wtf project | was to assume in-memory processing on a single server.
At first, | this may seem like an odd choice, running counter to the | prevailing wisdom of "scaling out" on cheap, commodity clusters | instead of "scaling up" with more cores and more memory. This | decision was driven by two rationales: first, because the | alternative (a partitioned, distributed graph processing | engine) is significantly more complex and difficult to build, and, | second, because we could! We elaborate on these two arguments | below. | | > Requiring the Twitter graph to reside completely in memory is | in line with the design of other high-performance web services | that have high-throughput, low-latency requirements. For | example, it is well-known that Google's web indexes are served | from memory; database-backed services such as Twitter and | Facebook require prodigious amounts of cache servers to operate | smoothly, routinely achieving cache hit rates well above 99% and | thus only occasionally require disk access to perform common | operations. However, the additional limitation that the graph | fits in memory on a single machine might seem excessively | restrictive. | | I always wondered if they still do this and if this influenced | any other architectures at other companies. | | Paper: | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69... | 3pt14159 wrote: | Yeah, I think single machine has its place, and I once sped up a | program by 10000x by just converting it to Cython and having it | all fit in the CPU cache, but the cloud still does have a | place! Even for non-bursty loads. Even for loads that | theoretically could fit in a single big server. | | Uptime.
| | Or are you going to go down as all your workers finish? Long | connections? Etc. | | It is way easier to gradually handover across multiple API | servers as you do an upgrade than it is to figure out what to | do with a single beefy machine. | | I'm not saying it is always worth it, but I don't even think | about the API servers when a deploy happens anymore. | | Furthermore if you build your whole stack this way it will be | non-distributed by default code. Easy to transition for some | things, hell for others. Some access patterns or algorithms are | fine when everything is in a CPU cache or memory but would fall | over completely across multiple machines. Part of the nice part | about starting with cloud first is that it is generally easier | to scale to billions of people afterwards. | | That said, I think the original article makes a nuanced case | with several great points and I think your highlighting of the | Twitter example is a good showcase for where single machine | makes sense. | efortis wrote: | Some comments wrongly equate bare-metal with on-premise. Bare- | metal servers can be rented out, collocated, or installed on- | premise. | | Also, when renting, the company takes care of hardware failures. | Furthermore, as hard disk failures are the most common issue, you | can have hot spares and opt to let damaged disks rot, instead of | replacing them. | | For example, in ZFS, you can mirror disks 1 and 2, while having 3 | and 4 as hot spares, with the following command: | zpool create pool mirror $d1 $d2 spare $d3 $d4 | | --- | | The 400Gbps are now 700Gbps | | https://twitter.com/DanRayburn/status/1519077127575855104 | | --- | | About the break even point: | | Disregarding the security risks of multi-tenant cloud instances, | bare-metal is more cost-effective once your cloud bill exceeds | $3,000 per year, which is the cost of renting two bare-metal | servers. | | --- | | Here's how you can create a two-server infrastructure: | | https://blog.uidrafter.com/freebsd-jails-network-setup | drewg123 wrote: | 720Gb/s actually. Those last 20-30Gb/s were pretty hard fought | :) | efortis wrote: | Yeah. Thank you! | zhoujianfu wrote: | 10 years ago I had a site running on an 8GB of ram VM ($80/mo?) | that ran a site serving over 200K daily active users on a | completely dynamic site written in PHP running MySQL locally. | Super fast and never went down! | porker wrote: | I like One Big (virtual) Server until you come to software | updates. At a current project we have one server running the | website in production. It runs an old version of Centos, the web | server, MySQL and Elasticsearch all on the one machine. | | No network RTTs when doing too many MySQL queries on each page - | great! But when you want to upgrade one part of that stack... we | end up cloning the server, upgrading it, testing everything, and | then repeating the upgrade in-place on the production server. | | I don't like that. I'd far rather have separate web, DB and | Elasticsearch servers where each can be upgraded without fear of | impacting the other services. | rlpb wrote: | You could just run system containers (eg. lxd) for each | component, but still on one server. That gets you multiple | "servers" for the purposes of upgrades, but without the rest of | the paradigm shift that Docker requires. 
| 0xbadcafebee wrote: | Which is great until there's a security vuln in an end-of- | life piece of core software (the distro, the kernel, lxc, | etc) and you need to upgrade the whole thing, and then it's a | 4+ week slog of building a new server, testing the new | software, fixing bugs, moving the apps, finding out you | missed some stuff and moving that stuff, shutting down the | old one. Better to occasionally upgrade/reinstall the whole | thing with a script and get used to not making one-off | changes on servers. | | If I were to buy one big server, it would be as a hypervisor. | Run Xen or something and that way I can spin up and down VMs | as I choose, LVM+XFS for snapshots, logical disk management, | RAID, etc. But at that point you're just becoming a personal | cloud provider; might as well buy smaller VMs from the cloud | with a savings plan, never have to deal with hardware, make | complex changes with a single API call. Resizing an instance | is one (maybe two?) API call. Or snapshot, create new | instance, delete old instance: 3 API calls. Frickin' magic. | | _" the EC2 Instance Savings Plans offer up to 72% savings | compared to On-Demand pricing on your Amazon EC2 Instances"_ | - https://aws.amazon.com/savingsplans/ | rlpb wrote: | Huh? Using lxd would be identical to what you suggest (VMs | on Xen) from a security upgrade and management perspective. | Architecturally and operationally they're basically the | equivalent, except that VMs need memory slicing up but lxd | containers don't. There are security isolation differences | but you're not talking about that here? | 0xbadcafebee wrote: | I would want the memory slicing + isolation, plus a | hypervisor like Xen doesn't need an entire host OS so | there's less complexity, vulns, overhead, etc, and I'm | not aware if LXD does the kind of isolation that ex. | allows for IKE IPSec tunnels? Non-hypervisors don't allow | for it iirc. Would rather use Docker for containers | because the whole container ecosystem is built around it. | rlpb wrote: | > I would want the memory slicing + isolation... | | Fine, but then that's your reason. "until there's a | security vuln in an end-of-life piece of core | software...and then it's a 4+ week slog of building a new | server" isn't a difference in the context of comparing | Xen VMs and lxd containers. As an aside, lxd does support | cgroup memory slicing. It has the advantage that it's not | mandatory like it is in VMs, but you can do it if you | want it. | | > Would rather use Docker for containers because the | whole container ecosystem is built around it. | | This makes no sense. You're hearing the word "container" | and inferring an equivalence that does not exist. The | "whole container ecosystem" is something that exists for | Docker-style containers, and is entirely irrelevant for | lxd containers. | | lxd containers are equivalent to full systems, and exist | in the "Use one big server" ecosystem. If you're familiar | with running a full system into a VM, then you're | familiar with the inside of a lxd container. They're the | same. In userspace, there's no significant difference. | YetAnotherNick wrote: | Even lxd has updates, many a times security updates. | ansible wrote: | I use LXC a lot for our relatively small production setup. | And yes, I'm treating the servers like pets, not cattle. | | What's nice is that I can snapshot a container and move it to | another physical machine. Handy for (manual) load balancing | and upgrades to the physical infrastructure. 
It is also easy | to run a snapshot of the entire server and then run an | upgrade, then if the upgrade fails, you roll back to the old | snapshot. | pclmulqdq wrote: | Containers are your friend here. The sysadmin tools that have | grown out of the cloud era are actually really helpful if you | don't cloud too much. | cxromos wrote: | is this clickbait? | | although i do like the alternate version: use servers, but don't | be too serverly. | jedberg wrote: | I'm a huge advocate of cloud services, and have been since 2007 | (not sure where this guy got 2010 as the start of the "cloud | revolution"). That out of the way, there is something to be said | for starting off with a monolith on a single beefy server. You'll | definitely iterate faster. | | Where you'll get into trouble is if you get popular quickly. You | may run into scaling issues early on, and then have to scramble | to scale. It's just a tradeoff you have to consider when starting | your project -- iterate quickly early and then scramble to scale, | or start off more slowly but have a better ramping up story. | | One other nitpick I had is that OP complains that even in the | cloud you still have to pay for peak load, but while that's | strictly true, it's amortized over so many customers that you | really aren't paying for it unless you're very large. The more | you take advantage of auto-scaling, the less of the peak load | you're paying. The customers who aren't auto-scaling are the ones | who are covering most of that cost. | | You can run a pretty sizable business in the free tier on AWS and | let everyone else subsidize your peak (and base!) costs. | rmbyrro wrote: | Isn't this simplistic? | | It really depends on the service, how it is used, the shape of | the data generated/consumed, what type of queries are needed, | etc. | | I've worked for a startup that hit scaling issues with ~50 | customers. And have seen services with +million users on a | single machine. | | And what does "quickly" and "popular" even mean? It also | depends a lot on the context. We need to start discussing about | mental models for developers to think of scaling in a | contextual way. | Phil_Latio wrote: | > Where you'll get into trouble is if you get popular quickly. | You may run into scaling issues early on | | Did it ever occur to you that you can still use the cloud for | on demand scaling? =) | jedberg wrote: | Sure but only if you architect it that way, which most people | don't if they're using one big beefy server, because the | whole reason they're doing that is to iterate quickly. It's | hard to build something that can bust to the cloud while | moving quickly. | | Also, the biggest issue is where your data is. If you want to | bust to the cloud, you'll probably need a copy of your data | in the cloud. Now you aren't saving all that much money | anymore and adding in architectural overhead. If you're going | to bust to the cloud, you might as well just build in the | cloud. :) | EddySchauHai wrote: | > But if I use Cloud Architecture, I Don't Have to Hire Sysadmins | | > Yes you do. They are just now called "Cloud Ops" and are under | a different manager. Also, their ability to read the arcane | documentation that comes from cloud companies and keep up with | the corresponding torrents of updates and deprecations makes them | 5x more expensive than system administrators. 
| | I don't believe "Cloud Ops" is more complex than system | administration, having studied for the CCNA and so being on the | Valley of Despair slope of the Dunning-Kruger effect. If keeping | up with cloud companies' updates is that much of a challenge that it | warrants a 5x price over a SysAdmin, then that's telling you | something about their DX... | abrax3141 wrote: | I may be misunderstanding, but it looks like the micro-services | comparison here is based on very high usage. Another use for | micro-services, like lambda, is exactly the opposite. If you have | very low usage, you aren't paying for cycles you don't use the | way you would be if you either owned the machine, or rented it | from AWS or DO and left it on all the time (which you'd have to | do in order to serve that randomly-arriving one hit per day!) | pclmulqdq wrote: | If you have microservices that truly need to be separate | services and have very little usage, you probably should use | things like serverless computing. It scales down to 0 really | well. | | However, if you have a microservice with very little usage, | turning that service into a library is probably a good idea. | abrax3141 wrote: | Yes. I think that the former case is the situation we're in. | Lambdas are annoying (the whole of AWS is annoying!) but, as you | say, they scale to 0 very well. | marcosdumay wrote: | Why open yourself to random $300k bills from Amazon when the | alternative is wasting a $5/month server? | abrax3141 wrote: | I don't understand what these numbers are referring to. | marcosdumay wrote: | One is a normal size for those rare, but not too rare, bills | people get from Amazon when their unused, unoptimized | application gets some surprise usage. | | The other is how much it costs to have an always-on paid VPS | capable of answering the once-a-day request you | specified. ___________________________________________________________________ (page generated 2022-08-02 23:00 UTC)