[HN Gopher] Scaling to 100k Users
___________________________________________________________________
 
Scaling to 100k Users
 
Author : sckwishy
Score  : 280 points
Date   : 2020-02-05 16:29 UTC (6 hours ago)
 
(HTM) web link (alexpareto.com)
(TXT) w3m dump (alexpareto.com)
 
 | majkinetor wrote:
 | This is relevant only for multimedia apps.
 |
 | I have fintech systems in production with 100k+ users, including a complex Gov app for an entire country, that run on commodity hardware (the majority of the work is done by 1 backend server, 1 database server, and all reporting by 1 reporting backend server using the same db). Based on our Grafana metrics it can survive 10x the number of users without an upgrade of any kind. It runs on Linux, .NET Core, and SQL Server.
 |
 | Most software is not multimedia in nature, and those numbers are off the charts for such systems.
 | rlander wrote:
 | I'll bite. What's a "fintech system" with "complex Gov app"?
 | jermaustin1 wrote:
 | Looking at his CV [1], it looks like he's done a lot of work inside the Serbian Treasury.
 |
 | 1: https://gist.github.com/majkinetor/877d5174ba322fbb808cc47a8...
 | Kiro wrote:
 | Yeah, it really depends on the kind of app. My app with a couple of hundred thousand users runs on a single $5 Digital Ocean droplet with standard PHP and MySQL.
 | leeoniya wrote:
 | > My app with a couple of hundred thousand users
 |
 | per year? per month? per day? simultaneously? doing what?
 |
 | it matters.
 |
 | i ask this as someone who runs a $40/mo Linode with a debian/nginx/node/mysql stack that's definitely 20x over-provisioned for an e-commerce site with 10k daily visitors, 15 simultaneous backend users (reporting, order-entry, CRM, analytics) and 0 caching tricks. i could easily run the site on any 5 year old laptop with an SSD and 8GB RAM.
 |
 | normalize/de-normalize when needed, understand and hand-write efficient SQL queries (ditch ORMs), choose small/fast libs carefully (or write your own), and you can easily serve 100k users per day on a single cheap VPS with no orchestration/replication/hz-scaling bullshit. definitely can't say the same about 200k _simultaneous_ users - that would need proper hardware, but can still be a single server.
 |
 | Monoliths Are the Future: https://news.ycombinator.com/item?id=22193383
 | neillyons wrote:
 | What is the e-commerce site?
 | Demiurge wrote:
 | Are you offering a free stress test? :)
 | Scarbutt wrote:
 | _hand-write efficient SQL queries (ditch ORMs)_
 |
 | Raw SQL strings or a query builder?
 | karambir wrote:
 | Yeah, for a normal web app, we can easily have 100x the users at each step mentioned in the article.
 |
 | For our company, we have had more than 500k users so far with 1 small nginx + 1 medium app server (with autoscaling, though we never needed it) + 1 small cache server and RDS. We just added an AWS managed load balancer to the setup and think it might be overkill.
 |
 | For a client with NewsFeed needs, I used a dedicated server (64GB, 2TB space) to run nginx, the app, the cache, a huge Elasticsearch, and Postgres. It was a great (and cheap) option for an MVP and let them validate the product for a few months with >10k users.
 |
 | It was awesome to learn, a few years ago, how much compute power we don't use.
 | bob1029 wrote:
 | Thank you for this post. I read "10 Users: Split out the Database Layer" and about had an aneurysm.
 |
 | I also work with fintech systems built upon .NET Core and have similar experiences regarding scaling of these solutions.
 | You can get an incredible amount of throughput from a single box if you are careful with the technologies you use.
 |
 | A single .NET Core web API process using Kestrel and whatever RDBMS (even SQLite) can absolutely devour requests in the <1 megabyte range. I would feel confident putting 10x the largest customer we can imagine on a single 16~32 core server for the solution we provide today. Obviously, if you are pushing any form of multimedia this starts to break down rapidly.
 | thinkmassive wrote:
 | It seems like they recommend splitting out the database at the start because using a managed service is much easier than properly managing your own production database.
 | throwaway5752 wrote:
 | I can't speak for high-performance or near-realtime systems people, but I would not trust a managed database service for those needs. My experience is that the managed offerings lag behind upstream and are usually economical because they are multitenant. So you have a bit less predictability in io wait / cpu queue, you lose host-level tunings (page sizes or hugepages, shared memory allocation, etc.), and - not naming names - some managed db services are so far behind they lack critical query planner profiling features. That's not even going into application-workload-specific tuning for various nosql stores. This is a nice article, but its audience is people that haven't scaled up a system and are trying to cope with success. It's not great generalized scaling advice.
 | tempestn wrote:
 | Even though the absolute numbers may well not apply to other types of apps, the general concepts of how to scale do. We have 100k+ users on effectively a single box too (actually two for redundancy, ease of upgrades, etc., but one can handle it), but this is a great overview of how to think about scaling beyond that, at however many users that's done. Honestly, when I was reading the article I read the 1/10/100 as more of a unitless degree of scale than actual numbers of humans.
 | grezql wrote:
 | SQL Server is a different beast. You get a lot of performance enhancement out of the box. Yes, it costs money, but you save a tremendous amount of tweaking time and headaches.
 | viggity wrote:
 | or you can just use AzureSQL and essentially just pay what it costs for the box, because it is platform as a service. It's far cheaper (and easier to maintain) than what it'd cost to buy a SQL Server license and run it on a VM.
 | kernoble wrote:
 | Hey, really specific question regarding your deployment. A teammate of mine reported difficulties with the .NET SQL Server database driver establishing a connection from a Linux client (a container instance based on the public .NET Core image) to a SQL Server instance (on Windows). Are you familiar with this problem? I think moving our systems over to .NET Core on Linux is the future, but this one experience has somewhat soured the idea for some decision makers and the team managing our db.
 | NicoJuicy wrote:
 | Any tests in progress? I'm going with Postgres because of BSON support and no-sql with SQL duplicate fields for search.
 |
 | Using .NET Core also.
 |
 | In general: I agree, there's rarely a case for really using the cloud. Page loads of my e-commerce project are 8ms for the basket; I'm wondering what should kill it first on a big load, even without caching. Probably the database, not sure yet.
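 | For a sense of how little is involved in the single-process-plus-embedded-database setup bob1029 describes above, here is a rough sketch in Python terms (Flask and SQLite stand in for Kestrel and the RDBMS; the table, route, and file name are invented for illustration):
 |
 |     # Minimal single-process HTTP API backed by an embedded SQLite file.
 |     # Flask/SQLite are stand-ins here; the schema and route are made up.
 |     import sqlite3
 |     from flask import Flask, jsonify
 |
 |     app = Flask(__name__)
 |     DB_PATH = "app.db"  # hypothetical local database file
 |
 |     # Create the example table once at startup.
 |     init = sqlite3.connect(DB_PATH)
 |     init.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
 |     init.commit()
 |     init.close()
 |
 |     @app.route("/users/<int:user_id>")
 |     def get_user(user_id):
 |         conn = sqlite3.connect(DB_PATH)
 |         conn.row_factory = sqlite3.Row
 |         try:
 |             row = conn.execute(
 |                 "SELECT id, name FROM users WHERE id = ?", (user_id,)
 |             ).fetchone()
 |         finally:
 |             conn.close()
 |         if row is None:
 |             return jsonify({"error": "not found"}), 404
 |         return jsonify(dict(row))
 |
 |     if __name__ == "__main__":
 |         app.run()  # a real deployment would sit behind gunicorn/nginx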
 | munns wrote:
 | As the original creator of the presentation referenced by the blog author (later re-delivered by Joel in the linked post), I am super excited to see this still have an impact on people, but I'd say today, in 2020, you'd probably do things very differently (as others call out).
 |
 | Tech has progressed really far and there are tools like Netlify for hosting that would replace 90% of the non-DB parts of this. Cloud providers have also grown drastically, so again a lot of this would/could look a lot different.
 |
 | Fwiw, the original deck is from spring of 2013; it was delivered at a VC event and then went on to be the most viewed/shared deck on Slideshare for a bit: https://www.slideshare.net/AmazonWebServices/scaling-on-aws-...
 |
 | thanks, - munns@AWS
 | debaserab2 wrote:
 | Does it look that much different if you exclude solutions that increase vendor lock-in?
 | munns wrote:
 | On your own metal it looks like it does in this post.
 |
 | With managed services it looks a world different.
 | Swizec wrote:
 | You always pay the vendor. Whether that's in sweat and tears or in dollars is up to you.
 |
 | Fwiw, you are almost certainly shooting yourself in the foot by avoiding vendor lock-in at stages before 8 figures of revenue per year. Your engineering takes longer, is more brittle, and because you're only using 1 vendor actively, your solution is still vendor locked-in.
 |
 | Love, ~ Guy who learned his lesson many times
 | munns wrote:
 | Fwiw: Slack, Lyft, Airbnb, Snapchat, Stripe are all 100% public cloud based (insofar as I know). So up through 8+ figures they are still doing it too.
 |
 | removed Uber as it's not 100% cloud (or at least wasn't in the past)
 | jayp wrote:
 | Uber has always been self-hosted. Some workloads are on the cloud, and more are migrating there. I last worked there 2 years ago.
 | munns wrote:
 | Thank you for clarifying! I know quite a lot has supposedly shifted. Will update my original comment.
 | debaserab2 wrote:
 | I think it depends on the type of vendor lock-in -- sure, the trade-off of having a managed Postgres instance is obvious, but it becomes less obvious to me when you're using things like a proprietary queueing or deployment service.
 |
 | Writing integration code against a service's API, instead of code that interfaces directly with the underlying technology, makes code quite brittle. If/when the vendor deprecates the service, introduces backwards-incompatible changes, or abandons development of the product, you're left on the hook to engineer your way out of that problem. Oftentimes that effort is equal to or greater than the effort of an in-house solution in the first place.
 |
 | I had the same mentality as you until this happened to the SaaS product I work on for a few different services. Now, at the very least, I try to make sure solutions are cloud-agnostic.
 | thaniri wrote:
 | This blog post is almost entirely a re-hash of http://highscalability.com/blog/2016/1/11/a-beginners-guide-...
 |
 | The primary difference is that this post tries to be more generic, whereas the original is specific to AWS.
 |
 | The original, for what it is worth, is far more detailed than this one.
 | lixtra wrote:
 | That's on purpose:
 |
 | >> This post was inspired by one of my favorite posts on High Scalability. I wanted to flesh the article out a bit more for the early stages and make it a bit more cloud agnostic. Definitely check it out if you're interested in these kind of things.
 | [deleted]
 | gfodor wrote:
 | It's probably a bad idea to switch to read-only replicas for reads pre-emptively, vs vertically scaling up the database. Doing so adds a lot of incidental complexity since you have to avoid read-after-writes, or ensure those reads come from the master.
 |
 | The reason punting on this is a good idea is that you can get pretty far with vertical scaling, database optimization, and caching. And when push comes to shove, you are going to need to shard the data anyway to scale writes, reduce index depths, etc. A re-architecture of your data layer will need to happen eventually, so it may turn out that you can avoid the intermediate "read from replica" overhaul by just punting the ball until sharding becomes necessary.
 | danenania wrote:
 | For those who have reached vertical database write scaling limits and had to start sharding, I'm curious what kind of load that entails? Looking at RDS instances, the biggest one is db.r5.24xlarge with 48 cores and 768 gb ram. I imagine that can take you quite a long way--perhaps even into millions of users territory for a well-designed CRUD app that's read-heavy and doesn't do anything too fancy?
 | adventured wrote:
 | > the biggest one is db.r5.24xlarge with 48 cores and 768 gb ram. I imagine that can take you quite a long way--perhaps even into millions of users territory
 |
 | For reference, that will run Stack Overflow's db by itself, along with sensible caching (they're very read-heavy and cache like crazy). Here's the hardware for their SQL Server in 2016:
 |
 | 2 Dell R720xd Servers featuring: Dual E5-2697v2 Processors (12 cores @2.7-3.5GHz each), 384 GB of RAM (24x 16 GB DIMMs), 1x Intel P3608 4 TB NVMe PCIe SSD (RAID 0, 2 controllers per card), 24x Intel 710 200 GB SATA SSDs (RAID 10), Dual 10 Gbps network (Intel X540/I350 NDC).
 |
 | https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...
 | bcrosby95 wrote:
 | Very far, I would guess. 10 years ago we took a single bare metal database server running MySQL with 8 cores and 64GB of memory to 8 million daily users. 15k requests per second of per-user dynamic pages at peak load.
 |
 | We did use memcached where we could.
 | jedberg wrote:
 | The problem with scaling to the "top of the vertical," so to speak, is that one day, if you're lucky, you'll have enough traffic that you'll reach the limit. And it will be like hitting a wall.
 |
 | And then you have to rearchitect your data layer under extreme duress as your databases are constantly on fire.
 |
 | So you really need to find the balance point and start doing it _before_ your databases are on fire all the time.
 | NicoJuicy wrote:
 | I actually implemented domain-driven design WITH an API layer (so core, application, infrastructure + API). They are also split into basket, catalog, checkout, shipping, and pricing domains with separate DBs.
 |
 | So just splitting up the heaviest part (e.g. catalog) into "a microservice" would be easy while adding nginx as a load balancer. I already separated domain events from integration events.
 |
 | Both now use in-memory events in the application; I only need a message broker like NATS for the integration events.
 |
 | It would be an easy wall ;). I have multiple options, like heavier hardware, splitting the DB off from the application server, or splitting a domain-bound API out to a separate server.
 |
 | As long as I don't need multimedia streaming, Kubernetes, or Kafka, the future is clear.
 | PS: Load balancing based on tenant and cookie would be an easy fix in extreme circumstances.
 |
 | The thing I'm afraid of the most is hitting the identity server for authentication/token verification. Not sure if it's justified though.
 |
 | Side note: one application has an insane amount of complex joins and will not scale :)
 | toast0 wrote:
 | Assuming you have a relatively stable growth curve, you should have some ability to predict how long your hardware upgrades will last.
 |
 | With that, you can start planning your rearchitecture if you're running out of upgrades, and start implementing when your servers aren't yet on fire, but are likely to be.
 |
 | Today's server hardware ecosystem isn't advancing as reliably as it was 8 years ago, but we're still seeing significant capacity upgrades every couple of years. If you're CPU-bound, the new Zen2 Epyc processors are pretty exciting; I think they also increased the amount of accessible RAM, which is another potential scaling bottleneck.
 | jedberg wrote:
 | > Assuming you have a relatively stable growth curve, you should have some ability to predict how long your hardware upgrades will last.
 |
 | But that's not how the real world works. The databases don't just slowly get bad. They hit a wall, and when they do it is pretty unpredictable. Unless you have your scaling story set ahead of time, you're gonna have a bad day (or week).
 | wolco wrote:
 | That's exactly how the real world works. Databases will get slow, then slower. Resources get used. Unpredictable? Not really. Maybe you've run out of space or RAM, or processes are hanging. The database will never just start rendering HTML, or formatting your disk, or emailing someone. It is pretty predictable.
 | jedberg wrote:
 | The failure I've seen multiple times is that the database is returning data within normal latencies, and then there is a traffic tipping point and the latencies go up 1000x for all requests.
 | toast0 wrote:
 | If you're lucky, the wall is at 95-100% CPU. Oftentimes, we're not that lucky, and when you approach 60%, everything gets clogged up; I've even worked on systems where it was closer to 30%.
 |
 | Usually, databases are pretty good at running up to 100%, though. And if you started with small hardware, and have upgraded a few times already, you should have a pretty good idea of where your wall is going to hit. Some systems won't work much better on a two-socket system than a one-socket system, because the work isn't open to concurrency, but again, we're talking about scaling databases, and database authors spend a lot of time working on scaling, and do a pretty good job. Going vertically up to a two-socket system makes a lot of sense for a database; four- and eight-socket systems could work too, but get a lot more expensive pretty fast.
 |
 | Sometimes, the wall on a database is from bad queries or bad tuning; sharding can help with that, because maybe you isolate the bad queries and they don't affect everyone at once, but fixing those queries would help you stay on a single-database design.
 | bcrosby95 wrote:
 | The minute your RDBMS's hot dataset doesn't fit into memory, it's going to shit itself. I've seen it happen anywhere from 90% CPU down to around 10%. Queries that were instant can start to take 50ms.
 |
 | It can be an easy fix (buy more memory), but the first time it happens it can be pretty mysterious.
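 | A quick way to spot it coming is to watch how often the database serves reads from its buffer cache vs. hitting disk. A rough Postgres-flavoured sketch (Python with psycopg2 assumed; the DSN is a placeholder, and the query just sums the built-in pg_stat_database counters):
 |
 |     # Prints the overall buffer cache hit ratio; a falling ratio is an early
 |     # sign that the hot dataset no longer fits in memory.
 |     import psycopg2
 |
 |     conn = psycopg2.connect("host=localhost dbname=app")  # placeholder DSN
 |     with conn, conn.cursor() as cur:
 |         cur.execute("SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database")
 |         hits, reads = cur.fetchone()
 |
 |     total = float(hits or 0) + float(reads or 0)
 |     ratio = float(hits or 0) / total if total else 1.0
 |     print(f"buffer cache hit ratio: {ratio:.3%}")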
 | lllr_finger wrote:
 | DID is an extremely important concept that is alien to a lot of developers: Deploy for 1.5X, Implement for 3X, Design for 10X (your numbers may vary slightly).
 | [deleted]
 | cactus2093 wrote:
 | There are some cases where adding a read replica can be helpful at almost no extra overhead - for instance, if your product has something like a stats dashboard, you'll have some heavy queries that are never going to result in a write after read, and it doesn't matter if they are sometimes a few ms, or even a few seconds or tens of seconds, out of date. Similarly, if you have analysts poking around running exploratory queries, a read replica can be the first step towards an analytics workflow/data warehouse.
 | charlesju wrote:
 | These posts are great and there is always great information in them. But to nitpick, it would be a lot easier to digest at face value if you led with concurrency rather than raw total users, as that's the true gauge of what your server infrastructure looks like.
 | stingraycharles wrote:
 | Yeah, I still don't understand the need to split servers at 10 users. Even if this is in parallel, it must still mean there is well-beyond-average resource consumption per user.
 | cwingrav wrote:
 | Probably so that when your 10 users grow to 1000, your efforts at 10x are good for 1000x, and you're working on 100000x.
 | superphil0 wrote:
 | Use Firebase or any other serverless architecture and forget about scaling and devops. Not only will you save development time but also money, because you need fewer developers. Yes, I understand that at some point it will get expensive, but you can still optimize later and move expensive parts to your own infrastructure if needed.
 | ablekh wrote:
 | Nice, but very simplistic (on purpose, it seems) write-up on the topic. For much more comprehensive and excellent coverage of designing & implementing large-scale systems, see https://github.com/donnemartin/system-design-primer. Also, I want to mention that an important - and relevant to scaling - aspect of _multi-tenancy_ is very often (as in Alex's post) not addressed. Most large-scale software-intensive systems are SaaS, hence the importance and relevance of multi-tenancy.
 | huzaif wrote:
 | We can now achieve pretty high scalability from day 1 with a tiny bit of "engineering cost" up front. Serverless on AWS is pretty cheap and can scale quickly.
 |
 | App load: |User| <-> |Cloudfront| <-> |S3 hosted React/Vue app|
 |
 | App operations: |App| <-> |Api Gateway| <-> |Lambda| <-> |Dynamo DB|
 |
 | Add in Route53 for DNS, ACM to manage certs, Secrets Manager to store secrets, SES for email, and Cognito for users.
 |
 | All this will not cost a whole lot until you grow. At that point, you can make additional engineering decisions to manage costs.
 | aratakareigen wrote:
 | Great, but this reads like a particularly blunt Amazon ad. Is there a way to achieve "high scalability" without selling my soul to Amazon?
 | dumbfoundded wrote:
 | I think if you use something like Serverless, you can abstract the cloud layer. I've never used it for anything more than a toy project though.
 |
 | https://serverless.com/
 | huzaif wrote:
 | Yes, it does read like that.
 |
 | In the context of a start-up, cost is a big factor, and then perhaps (hopefully) handling growth. You could start small and refactor apps/infrastructure as you grow, but I am unsure how one could afford to do that efficiently while also managing a growing startup.
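 | To make the stack above a bit more concrete: the Api Gateway -> Lambda -> Dynamo DB leg is typically not much more than a small handler. A rough sketch (Python with boto3; the table name, route parameter, and fields here are invented for illustration):
 |
 |     # Hypothetical Lambda handler behind an API Gateway proxy route like
 |     # GET /users/{id}, reading one record from a DynamoDB table.
 |     import json
 |     import boto3
 |
 |     table = boto3.resource("dynamodb").Table("users")  # made-up table name
 |
 |     def handler(event, context):
 |         user_id = event["pathParameters"]["id"]
 |         resp = table.get_item(Key={"id": user_id})
 |         item = resp.get("Item")
 |         if item is None:
 |             return {"statusCode": 404,
 |                     "body": json.dumps({"error": "not found"})}
 |         # default=str copes with DynamoDB's Decimal number type
 |         return {"statusCode": 200, "body": json.dumps(item, default=str)}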
 | On selling your soul to a cloud provider, I don't see it like that. I have a start-up to bootstrap and I want to see it grow before making altruistic decisions that would sustain the business model.
 |
 | Once you are past the initial growth stage, there are many options for serverless, gateways, caches, and proxies that can be orchestrated in K8s on commodity VMs in the datacenter. Though this is where you would need some decent financial backing.
 |
 | (I am not associated with Amazon, Google or Azure. I do run my start-up on Azure.)
 | ignoramous wrote:
 | I'm down a similar route, but I must point out that beyond a certain number of users / scale, Serverless becomes cost-prohibitive. For instance, per a back-of-the-napkin calculation, the Serverless load I run right now, though very cost-effective for the smaller userbase I've got, would quickly spiral out of control once I cross a threshold (which is at 40k users). At 5M users, I'd be paying an astonishing 100x what it would cost if I hosted the services on a VPS. That said, Serverless does reduce DevOps to an extent, but it introduces different, though fewer, complications.
 |
 | As patio11 would like to remind us all, _we've got a revenue problem, not a cost problem._ [0]
 |
 | [0] https://news.ycombinator.com/item?id=22202301
 | sky_rw wrote:
 | Yes, sell your soul to Google.
 | fragmede wrote:
 | The big clouds have similar enough products, just with the names changed, so at a high level, GP's list of AWS products can be swapped with, e.g., Azure's product names. https://www.wintellect.com/the-rosetta-stone-of-cloud-servic...
 |
 | Sadly, for anything more in-depth than that, you'll need to sign an NDA with AWS to learn anything about the performance limits of their services (e.g. Redshift), and you won't get that unless you're already a big customer there. Azure's not going to be falling over themselves to let you know where they fall short, either. This is vendor lock-in, and is why there are so many free cloud credits to be had for startups.
 |
 | This is also a reason I believe SaaS companies will find it harder than they realize to arbitrage between clouds, and business models based on doing so may not be able to get it right.
 | papito wrote:
 | I bet that DNC Iowa primaries app was serverless. Problem solved! [dusts off hands].
 | marriedWpt wrote:
 | AWS seems like it would be expensive long-term.
 |
 | Between my issues with AWS currently and the exterior look of Amazon, I'm skeptical AWS is a good solution.
 | lbriner wrote:
 | Like most providers, it does depend. Some products are priced very competitively while others seem over-the-top. For smaller companies, the cloud is a cheaper starting point for many systems, but even for larger organisations, there are savings to be made by outsourcing your servers. Do you know how much it costs to install and maintain a decent air-con system for your server room?
 |
 | One of the other major advantages of the cloud is that you can save a lot in support staff. Compare the wages of even one decent sysadmin looking after your own hardware to several thousand dollars of AWS, and it's still loads cheaper. Hardware upgrades, OS updates, etc. are often automatic or hidden.
 | ludamad wrote:
 | I hated DynamoDB. What good is there about it other than convenience?
 | ignoramous wrote:
 | I've found that KV stores like DynamoDB make for a good control-plane configuration repository.
 | For instance, say you need to know if a client, X, is allowed to access a resource, Y. And say you've got clients on the order of millions and resources on the order of 100s, and you've got very specific queries to execute on such denormalized data and need consistently low latency and high throughput across key combinations.
 |
 | Another good use-case is storing checkpointing information. Say you've processed some task and would like to check in the result. Either the information fits the 400KB DynamoDB limit, or you use DynamoDB as an index to an S3 file.
 |
 | You could do those things with a managed or self-hosted RDBMS, but DynamoDB takes away the need to manage the hardware, the backups, the scale-ups, and the scale-outs, and reduces ceremony whilst dealing with locks, schemas, misbehaving clients, and myriad other configuration knobs, whilst also fitting your query patterns to a tee.
 |
 | KV stores typically give you consistent performance on reads and writes, if you avoid cascading relationships between two or more keys, and make just the right amount of trade-offs in terms of both cross-cluster data consistency and cross-table data consistency.
 |
 | Besides, in terms of features, one can add a write-through cache in front of a DynamoDB table, can point-in-time-restore data down to a minute of granularity, can create on-demand tables that scale with load (no need to worry about provisioned capacity anymore), can auto-stream updates to Elasticsearch for materialised views or consume the updates in real time themselves, can replicate tables world-wide with lax consistency guarantees, and so on... with very little fuss, if any.
 |
 | Running databases is hard. I pretty much exclusively favour a managed solution over a self-hosted one, at this point. And for denormalized data, a managed KV store makes for a viable solution, imo.
 | danenania wrote:
 | All good points, but one thing people should look at very closely before choosing DynamoDB as a primary db is the transaction limits. Most apps are going to have some operations that should be atomic and involve more than 25 items. With DynamoDB, your only option currently is to break these up into multiple transactions and hope none of them fail. But as you scale, eventually some _will_ fail, while others in the same request succeed, leaving your data in an inconsistent state.
 |
 | While this could be ok for some apps, I think for most use cases it's really bad and ends up being more trouble than what you save on ops in the long run, especially considering options like Aurora that, while not as hands-off as Dynamo, are still pretty low-maintenance and don't limit transactions at all.
 | Scarbutt wrote:
 | If you don't mind watching a video: https://www.youtube.com/watch?v=6yqfmXiZTlM
 | dkarras wrote:
 | It will cost an arm and a leg when you eventually grow though.
 | agumonkey wrote:
 | odd, that was my Google query yesterday..
 |
 | I'm curious what kind of hardware can sustain 100k concurrent connections these days.
 | lbriner wrote:
 | We were running a speed test with Node vs .NET Core, and even on a small Linux box (4GB, 1 core) we could reach nearly 10K concurrent requests for a basic HTTP response, but the exact nature of the system will affect that massively.
 |
 | Add large request/response sizes or CPU/RAM-bound operations, and your servers can very quickly reach their limits with far fewer concurrent requests.
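 | As an illustration, the kind of test we're talking about can be as simple as firing a pile of parallel requests and looking at latency percentiles. A rough sketch (Python standard library only; the URL and counts are made up, and OS threads won't truly reach thousands of simultaneous connections the way a purpose-built tool like wrk can):
 |
 |     # Crude concurrency smoke test: N workers hammering one endpoint.
 |     import time
 |     from concurrent.futures import ThreadPoolExecutor
 |     from urllib.request import urlopen
 |
 |     URL = "http://localhost:8080/health"  # hypothetical endpoint
 |     CONCURRENCY = 200
 |     REQUESTS = 5000
 |
 |     def hit(_):
 |         start = time.time()
 |         with urlopen(URL) as resp:
 |             resp.read()
 |         return time.time() - start
 |
 |     with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
 |         latencies = sorted(pool.map(hit, range(REQUESTS)))
 |
 |     p50 = latencies[len(latencies) // 2]
 |     p99 = latencies[int(len(latencies) * 0.99)]
 |     print(f"p50={p50 * 1000:.1f}ms  p99={p99 * 1000:.1f}ms")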
 | Architecture is a big-picture task, since you have to consider the whole system before implementing part of it; otherwise you end up having to start again.
 | agumonkey wrote:
 | thanks, that's already a lower-bound point of reference
 | bcrosby95 wrote:
 | Conversely, rent or buy 1 bare metal server. That's how we went until we hit around 300k users. Back in 2008.
 | brokencode wrote:
 | I think it's kind of crazy that we have 64-core processors available, but still need so many servers to handle only a hundred thousand users. That's what, a few thousand requests per second max?
 |
 | Having many servers gives you redundancy and horizontal scalability, but also comes at a high complexity and maintenance cost. Also, with many machines communicating over the network, latency and reliability can become much harder to manage.
 |
 | Most smaller companies can probably get away with having a single powerful server with one extra server for failover, and probably two more for the database, with failover as well. I think this would result in better performance and reliability as well. I'm curious to know whether the author tried vertical scaling first or went straight to horizontal scaling.
 | jedberg wrote:
 | Heh, this is one of the questions I liked to use for interviews.
 |
 | "Let's work together and design a system that scales appropriately but isn't overbuilt. Let's start with 10 users."
 |
 | Then we talk about what we need and go from there. The end result looks a lot like this blog post, for those who are qualified.
 | ignoramous wrote:
 | /offtopic
 |
 | Heh, you're being modest. I'm sure you've dealt with far more complex distributed systems than the hypothetical one in the blog post.
 | jedberg wrote:
 | Sure, but most of the people I was interviewing hadn't, so it was a good way to test their knowledge. :)
 |
 | If you can scale to 100K users, you can probably learn the rest to scale to 100M users.
 | k2xl wrote:
 | A bit of overkill in the recommendations here.
 |
 | With 10 users you don't "need" to separate out the database layer. Heck, you don't need to do that with 100 users. A website I ran back in 2007-2010 had tens of thousands of users on a single machine running the app, database, and caching just fine.
 |
 | Users are actually a really poor measure to use for scalability planning. What's more relevant is queries/data transmission per interval, and also the distribution of the types of data transfers.
 |
 | I'd say replace the "Users" in this post with "queries per second" and then I think it's a better general guide.
 | segmondy wrote:
 | IMHO, I believe discussions about scale should begin with at least 1 million users these days. 100k has been old news for more than a decade.
 | lbriner wrote:
 | As stated above, the number of users is not a measure of a system; what matters is the concurrency multiplied by the typical system load per action.
 |
 | Clearly, a million users on Facebook is much heavier than a million registered with online banking who only use it once a month.
 | erkken wrote:
 | We now use a DigitalOcean managed database with 0 standby nodes, coupled with another instance running Django. It is working well.
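 | Concretely, the split is little more than the Django settings pointing at the managed instance. A sketch (Postgres assumed; the hostname, credentials, and port below are placeholders, and the engine is Django's stock Postgres backend):
 |
 |     # settings.py excerpt: the app server simply points at the managed DB.
 |     DATABASES = {
 |         "default": {
 |             "ENGINE": "django.db.backends.postgresql",
 |             "NAME": "appdb",                              # placeholder
 |             "USER": "appuser",                            # placeholder
 |             "PASSWORD": "change-me",                      # placeholder
 |             "HOST": "db-example.db.ondigitalocean.com",   # placeholder
 |             "PORT": "25060",                              # placeholder
 |             "OPTIONS": {"sslmode": "require"},
 |         }
 |     }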
 | We are, however, actually thinking about switching to a new dedicated server at another provider (Hetzner), where we are looking at having the web server and the DB on the same server; the new server will have hugely improved performance (which is sometimes needed), still at a reduced cost compared to the DigitalOcean setup.
 |
 | The thing we are unsure about is whether having a managed db is worth it. The selling point is that everything is, of course, managed. But what does this mean in reality? Updating packages is easy, backups as well (to some extent), and we still do not use any standby nodes and doubt we will need any replication. So far we have never had the need to recover anything (about 5 years). Before we got the managed db we had it on the same machine (which is what we are now looking at going back to) and never had any issues.
 |
 | Any input?
 | heffer wrote:
 | Do note that with dedicated servers you are subjecting yourself to things such as hardware upgrades and failures, which you will have to manage yourself if you want to prevent downtime.
 |
 | And while Hetzner customer support is generally excellent, in my experience their handling of DDoS incidents will generally leave your server blackholed and sometimes requires manual intervention to get back online.
 |
 | This is something you need to account for in terms of redundancy if you are planning to expose your application directly to the net without any CDN/load balancer/DDoS filter in place.
 |
 | From my experience it makes sense to work with a data centre that is less focussed on the mass market but allows for individual client relations, to mitigate risks like that. I love Hetzner for what they are and do host some services with them, but I wouldn't build a business around services hosted there.
 |
 | And this goes not only for Hetzner but for pretty much any provider whose business model is based on low margin/high throughput.
 | erkken wrote:
 | Oh, their DDoS protection was one of the reasons we were thinking about moving away from DO.
 |
 | It is a public-facing SaaS API which does not have much traffic in terms of requests, but it would be catastrophic for it to be blackholed. So it's that bad?
 |
 | Regarding hardware failures - we have never experienced any so far, but I guess it's just a question of when, then.
 | heffer wrote:
 | > It is a public-facing SaaS API which does not have much traffic in terms of requests, but it would be catastrophic for it to be blackholed. So it's that bad?
 |
 | Well, it depends on whether you have people that don't like you. For them it can be rather easy to stage a DDoS against your server and take that server offline for some time.
 |
 | > Regarding hardware failures - we have never experienced any so far, but I guess it's just a question of when, then.
 |
 | It has been happening to me much less often since they switched most of their portfolio over to "Enterprise Grade" disks. These days I tend to go with NVMe anyway, so it has become less of an issue.
 | ryanar wrote:
 | I thought part of their managed service was that they optimized/tuned your Postgres db based on how you were using it. If that is true, then moving off of the managed service means you are tuning Postgres yourself now.
 |
 | I also want to throw in that it is important to compare not only specs but also the actual hardware. If DO has newer chips and faster RAM, then you will take a performance hit moving to the new provider even if the machine is beefier.
 | sb8244 wrote:
 | They quote "We'll handle setting up, backing up, and updating". I interpret that as literally the database itself, not the application-specific nature of it--how it's used.
 |
 | For example, I would be surprised if they noticed that your IOPS was high and you needed to upgrade the storage/disk components. (It would be cool if that's the type of thing they offer.)
 | adventured wrote:
 | Pretty certain that DO tunes for broad-usage performance optimization (all the easy, obvious performance wins), not dynamically per client based on each client's usage.
 |
 | Here's their pitch: easy setup & maintenance, scaling, daily backups, optional standby nodes & automated failover, fast reliable performance including SSDs, can run on the private network at DO, and encrypts data at rest & in transit.
 | marcinzm wrote:
 | That seems pretty aggressive for just 100k users, unless they mean concurrent users (in which case they should say so).
 |
 | Let's say that maybe 10% of your users are on at any given time and they each make 1 request a minute. That's under 200 QPS, which a single server running a half-decent stack should be able to handle fine.
___________________________________________________________________
(page generated 2020-02-05 23:00 UTC)