[HN Gopher] Single dependency stacks
       ___________________________________________________________________
        
       Single dependency stacks
        
       Author : jeffreyrogers
       Score  : 142 points
       Date   : 2022-02-09 16:53 UTC (6 hours ago)
        
 (HTM) web link (brandur.org)
 (TXT) w3m dump (brandur.org)
        
       | bcrosby95 wrote:
       | I always start with just MySQL and introduce things as needed -
        | not as guessed. These days I don't work on anything with enough
        | traffic to need more than that.
       | 
        | An RDBMS is a lot more than just SQL these days, and it offers a
        | lot of good-enough solutions to a wide variety of problems.
        
         | mrweasel wrote:
          | Completely agreed. Sadly we're seeing a ton of developers who
          | are honestly more interested in getting half-baked solutions
          | out the door so they can move on to the next project. We have
          | one customer who runs a huge national project on a few MariaDB
          | servers; technically one server could run the whole thing, no
          | problem. Another customer is smaller but insists on using
          | Hibernate without really knowing how to use it, so they
          | frequently kill the database with silly queries. Instead of
          | accepting that maybe they don't fully understand their chosen
          | stack, they try to "solve" the problem by adding things like
          | Kubernetes and Kafka, complicating everything.
         | 
          | Modern databases, and servers in general, are capable of
          | amazing things, but there's a shortage of developers with the
          | skills to utilize them.
        
       | alilleybrinker wrote:
        | Apparently Tailscale for a long time just used a JSON file as
        | their data storage, then moved from that to SQLite with a hot-
        | swappable backup via Litestream [0], and they've done fine with
        | that.
       | 
       | [0]:
       | https://securitycryptographywhatever.buzzsprout.com/1822302/...
        
       | fizx wrote:
       | This is great, but you might want to have multiple postgreses for
       | the different workloads. DB postgres != rate-limit PG != search
       | PG. It's pretty hard to optimize one DB for every workload.
        
         | goostavos wrote:
          | Counterpoint: most people operate on workloads so trivial that
          | they don't need to be optimized.
         | 
         | I think the most important line in the article is the "let's
         | see how far it gets us." It is absolutely trivial to invent
         | situations where an architecture wouldn't work well, or scale,
         | or "be optimal." It's far, far harder to just exist in reality,
         | where most things are boring, and your "bad" architecture is
         | all you ever need.
        
         | RedShift1 wrote:
          | Why? You can have multiple databases in one instance; running
          | multiple pg instances seems counterproductive?
        
           | rtheunissen wrote:
           | Maybe instance-level configuration?
        
           | mikeklaas wrote:
           | Multiple databases in postgres fundamentally share the same
            | underlying infrastructure (e.g., the WAL), and so do not offer
           | much in terms of scalability or blast-radius protection
           | compared to putting all tables in the same database.
        
       | theptip wrote:
        | I'm a big fan of this approach, having built a Django monolith
        | with the standard Celery/RMQ, dabbled in Redis for latency-
        | sensitive things like session caching, and never hit the scale
        | where any of those specialized tools was actually required
        | (despite generating substantial revenue with the company).
       | 
        | One thing to note: if you use pq or another Postgres-as-queue
        | approach, you should be aware of the work you'll need to do to
        | move off it -- this pattern lets you do exactly-once processing
        | by consuming your tasks in the same DB transaction where you
        | process the side effects. In general, when using a separate task
        | queue (RMQ, SQS, etc.) you need to do idempotent processing (at-
        | least-once message semantics). A possible exception is Kafka
        | with transactional event processing, but even that isn't
        | serializable isolation.
       | 
       | This is probably a reason in favor of using a Postgres task queue
       | initially since exactly-once is way simpler to build, but just be
       | aware that you're going to need to rethink some of your
       | architectural foundation if/when you need to move to a higher-
       | throughput queue implementation.
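        | 
        | A minimal sketch of that exactly-once pattern (Python with
        | psycopg2; the tasks/results tables and the process() helper are
        | hypothetical):
        | 
        |     import psycopg2
        | 
        |     conn = psycopg2.connect("dbname=app")
        |     with conn:  # commits on success, rolls back on exception
        |         with conn.cursor() as cur:
        |             # atomically claim and delete one task
        |             cur.execute("""
        |                 DELETE FROM tasks
        |                 WHERE id = (
        |                     SELECT id FROM tasks
        |                     ORDER BY id
        |                     FOR UPDATE SKIP LOCKED
        |                     LIMIT 1)
        |                 RETURNING id, payload""")
        |             row = cur.fetchone()
        |             if row:
        |                 task_id, payload = row
        |                 # side effects commit in the same transaction,
        |                 # so the task is processed exactly once
        |                 cur.execute(
        |                     "INSERT INTO results (task_id, output)"
        |                     " VALUES (%s, %s)",
        |                     (task_id, process(payload)))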
        
         | btown wrote:
         | https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...
         | describes the benefits of the above approach.
         | 
         | Something to bear in mind is that if you have a bug or crash in
         | your task handler that causes a rollback, another worker will
         | likely try to grab the same task again, and you might end up
         | clogging all of your workers trying the same failed tasks over
         | and over again. We use a hybrid approach where a worker takes
         | responsibility for a task atomically using SKIP LOCKED and
         | setting a work-in-progress flag, but actually does the bulk of
         | the work outside of a transaction; you can then run arbitrary
         | cleanup code periodically for things that were set as work-in-
         | progress but abandoned, perhaps putting them into lower-
         | priority queues or tracking how many failures were seen in a
         | row.
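          | 
          | The claim step of that hybrid might look roughly like this
          | (Python/psycopg2 sketch; the table and column names are
          | illustrative):
          | 
          |     # short transaction: atomically claim one queued task
          |     with conn:  # conn is a psycopg2 connection
          |         with conn.cursor() as cur:
          |             cur.execute("""
          |                 UPDATE tasks
          |                 SET status = 'in_progress',
          |                     claimed_at = now()
          |                 WHERE id = (
          |                     SELECT id FROM tasks
          |                     WHERE status = 'queued'
          |                     ORDER BY id
          |                     FOR UPDATE SKIP LOCKED
          |                     LIMIT 1)
          |                 RETURNING id, payload""")
          |             task = cur.fetchone()
          | 
          |     if task:
          |         handle(task)  # bulk of the work, outside the txn
          |         # a periodic sweeper re-queues or demotes rows stuck
          |         # 'in_progress' past some deadline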
         | 
          | Postgres is absolutely incredible. If you are at anything less
          | than unicorn scale, then outside of analytics, where columnar
          | stores are better (though Citus apparently now has a solution
          | for this -
          | https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...),
          | and highly customizable full-text search (it's hard to beat
          | Elastic/Lucene's bitmap handling), there are very few things
          | other databases can do better, given the right indices,
          | tuning, and read replicas on your Postgres database.
        
           | [deleted]
        
         | samwillis wrote:
         | Your whole first paragraph literally describes our situation
         | exactly, same stack and all. Classic premature optimisation.
         | 
          | It's made it so clear to me that so much of what you read on HN
          | about the latest and greatest scaling tricks is relevant only
          | to a tiny, tiny fraction of businesses.
        
           | jrumbut wrote:
           | Getting good at solving problems with relational databases is
           | a highly underrated skill.
           | 
           | They are really underutilized by many projects.
           | 
           | Not to say any other software is bad, just that keeping the
           | stack simple can help small teams move quickly and not get
           | bogged down fighting fires. Also, the paths to scale up the
           | major RDBMSes are well documented and understood. With a
           | newer service and many interacting systems in your back end
           | you end up having to be a pioneer (which takes time away from
           | implementing new features).
        
             | jaxrtech wrote:
              | Absolutely. I've also seen my fair share of horrendous
              | home-grown "ETL" programs that waste more time shuffling
              | bytes to and from the database with poor ORM queries in
              | loops than it would take to do the job with a couple of
              | half-decent SQL queries.
              | 
              | Probably the most useful thing for me was learning
              | relational algebra in college, and having been thrown in
              | at the deep end on a team that was very SQL-heavy
              | (notwithstanding attempts to debug Oracle PL/SQL syntax
              | errors while pulling your hair out over a missing closing
              | parenthesis -- which isn't actually the problem).
             | 
              | The usual challenge seems to be fetching data from external
              | services or performing complex business logic that may
              | conditionally load things -- things that can be awkward in
              | procedural SQL. At the end of the day, you're building a
              | messy ad-hoc dependency graph that is being manually
              | executed very inefficiently. It would be better to have
              | your code describe the dependency graph, treat each value
              | transparently as a promise/future, and have a separate
              | engine execute it.
             | 
             | Anyhow, something something monads with lipstick, I
             | digress...
        
               | KptMarchewa wrote:
               | ETL using ORM? Really?
        
           | jjice wrote:
            | FWIW, setting up a Redis server and adding some basic
            | caching middleware is pretty straightforward in my
            | experience. Did this at my job a month ago in an afternoon.
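            | 
            | Something like this, roughly (a sketch with redis-py; the
            | key scheme, TTL, and load_user_from_db() are made up):
            | 
            |     import json
            | 
            |     import redis
            | 
            |     r = redis.Redis(host="localhost")
            | 
            |     def get_user(user_id):
            |         key = f"user:{user_id}"
            |         cached = r.get(key)
            |         if cached is not None:
            |             return json.loads(cached)  # cache hit
            |         user = load_user_from_db(user_id)  # app code
            |         r.setex(key, 300, json.dumps(user))  # 5 min TTL
            |         return user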
           | 
           | I'd say the biggest overhead is adding Redis if you don't
           | already have it, and that addition's difficulty will vary
           | based on how you host Redis. We use Elasticache on AWS, so
           | just a few clicks and set it in the same VPC.
           | 
           | I guess the real question comes down to how you feel about an
           | extra moving part. Redis is probably the part of our system
           | that has had the least hiccups (very much set and forget and
           | no resource issues so far), but I can understand in the case
           | where you'd rather not add more than a DB.
           | 
            | I'd say it's just as easy to set up as Postgres.
            | Elasticsearch I hear is a pain, though I have no personal
            | experience.
        
             | blowski wrote:
             | The pain is not initially setting it up, it's in the
             | ongoing maintenance. Redis is one of the less painful
             | services to support, especially if using a managed version.
             | But I don't like the trend of defaulting to using Redis
             | without really justifying it.
        
               | jjice wrote:
               | You're right for sure if the service is small - Redis
               | would be overkill. I guess I'm coming at it from the
               | perspective I'm most used to where we use it for data
               | caching and session data since we have multiple servers,
               | so handling it any other way would be more work.
        
             | barrkel wrote:
             | Setting up caching usually isn't the problem, it's
             | invalidation and eviction that bites you.
        
             | smoe wrote:
            | From my view, the issue in this case is not how difficult it
            | is to set up Redis for caching (it is indeed just a couple
            | of clicks/commands), but the new issues one has to deal with
            | when resorting to caching things prematurely, instead of
            | making the app fast enough with minimal effort.
        
         | smoe wrote:
         | > just be aware that you're going to need to rethink some of
         | your architectural foundation if/when you need to move to a
         | higher-throughput queue implementation
         | 
          | I haven't seen many projects, if any, that didn't require some
          | architectural rethinking over their lifetime. I have seen more
          | that were arguably over-engineered in the beginning but then
          | never lived long enough to actually benefit from it.
          | 
          | Not saying everyone should use Postgres-as-queue for every
          | project. But for a lot of projects it is going to be much
          | harder to acquire an active user base generating throughput
          | Postgres can't handle than to keep refactoring the system to
          | deal with changing requirements.
        
         | rattray wrote:
         | For Node folks interested in a postgres-based task queue, I
         | find graphile-worker[1] to be pretty terrific. Docs make it
         | sounds like it's only for postgraphile/postgrest but it's great
         | with any Node app IMO.
         | 
         | [1] https://www.npmjs.com/package/graphile-worker
        
         | closeparen wrote:
         | Dependencies are not equal in this regard. For example in a
         | corporate context, we have basically 1.5 people in Eastern
         | Europe maintaining Redis for 5,000 engineers. Kafka is more
         | like 15.
        
         | [deleted]
        
       | kelp wrote:
       | I kind of love this idea.
       | 
       | It reminds me of a redis use case we had at a former employer.
       | 
       | We had a cluster with a high double digit number of nodes that
       | delivered a lot of data to various external APIs.
       | 
       | Some of those APIs required some metadata along with the payload.
       | That metadata was large enough that it had made sense to cache it
        | in Redis. But over time, with growth, the cluster got large
        | enough and high-volume enough that just fetching that data from
        | Redis was saturating the 10Gbps NIC on the ElastiCache instance,
       | creating a significant scaling bottleneck. (I don't remember if
       | we moved up to the 25Gbps ones or not.)
       | 
       | But we could have just as easily done a local cache (on disk or
       | something) for this metadata on each node and avoided the cost of
       | the ElastiCache and all the scaling and operational issues. It
       | would have also avoided the network round trips to Redis, and the
       | whole thing probably would have just performed better.
        
       | rkhacker wrote:
        | I am sure there is a momentary thrill in achieving minimalism,
        | but alas the world is not so simple anymore. I would refer the
        | OP and the community here to this paper from the creator of
        | PostgreSQL:
       | http://cs.brown.edu/~ugur/fits_all.pdf
        
         | finiteseries wrote:
         | (2005)
        
           | recuter wrote:
           | Exactly. The title is - "One Size Fits All": An Idea Whose
           | Time Has Come and Gone
           | 
           | As is so often the case in this industry an idea comes, goes,
           | and comes back around again. Time to reevaluate.
        
         | luhn wrote:
         | I think that paper is making an argument orthogonal to OP. OP
         | is saying Postgres is a good enough solution, that the
         | advantages of simplifying the stack outweigh the disadvantages
         | of using a non-optimal database for basic use cases.
        
       | samwillis wrote:
        | I am so for this. Being the sole developer in my company for the
        | last 10 years, I introduced far too many "moving parts" as it
        | grew, and I'm now going through the process of simplifying it
        | all.
        | 
        | I love Redis but it's next to go; it's currently used for user
        | sessions and a task queue, both of which Postgres is more than
        | capable of handling [0]. Also, as a mostly Python backend, I
        | want to rip out a couple of Node parts and shrink that Docker
        | image.
       | 
       | 0: https://news.ycombinator.com/item?id=21536698
        
       | deathanatos wrote:
       | I'm not sure what qualifies as "stateful", but
       | 
       | > _Fewer dependencies to fail and take down the service._
       | 
        | No logging? No metrics? No monitoring? (& _yes_, you'd think
        | those shouldn't take down the stack if they went offline. And I'd
        | agree. And yet, I've witnessed that failure mode multiple times.
       | In one, a call to Sentry was synchronous & a hard-fail, so when
       | Sentry went down, that service 500'd. In another, syslog couldn't
       | push logs out to the logging service, as that was _very_ down,
       | having been inadvertently deleted by someone who ran  "terraform
       | apply", didn't read the plan, & then said "make it so"; syslog
       | then responded to the logging service being down by logging that
       | error to a local file. Repeatedly. As fast as it possibly could.
       | Fill the disk. Disk is full. Service fails.)
       | 
        | I've also seen our alerting provider have an outage _during an
        | outage we're having_ & thus not sending pages for our outage,
        | causing me to wonder how I'd just rolled a 1 on the SRE d20 and
        | which god I'd angered. Also, who watches the watchmen?
       | 
       | > _A common pitfall is to introduce something like ElasticSearch,
       | only to realize a few months later that no one knows how to run
       | it._
       | 
       | Yeah I've seen that exact pit fallen into.
       | 
       | No DNS? Global Cloudflare outage == fun.
       | 
       | No certificates?
       | 
        | I've seen certs fail so many different ways. Of course not
        | getting renewed, that's your table-stakes "welcome to certs!"
        | failure mode. Certs get renewed, but an _allegedly semver-
        | compatible_ upgrade changed the defaults, and required extensions
        | don't get included, leading to the client rejecting the cert.
        | I've seen a service which watches certs to make sure they don't
        | expire (see the outage earlier in this paragraph!) have an outage
        | (which, b/c it's monitoring, wasn't customer visible) because a
        | tool issued a malformed cert (...by... default...) that the
        | monitor failed to parse (as it was malformed). Oh, and then the
        | LE cross-signing expiration took out an Azure service that wasn't
        | ready for it, a service from a third party of ours that wasn't
        | ready for it, _and_ our CI system b/c several tools were out of
        | date, including _an up-to-date system on Debian that was
        | theoretically "supported"_... but still shipped an ancient crypto
        | library riddled with bugs in its path validation.
       | 
       | > _Okay fine, S3 too, but that's a different animal._
       | 
        | _Is it?_ I've seen that have outages too, & bring down a
       | service with it. (There really wasn't a choice there; S3 was the
       | service's backing store, & without it, the service was truly
       | screwed.)
       | 
       | But of course, all this is to say I violently agree with the
       | article's core point: think carefully about each dependency, as
       | they have a very real production cost.
       | 
       | (I've recently been considering changing my title to SRE because
       | I have done very little in the way of SWE recently...)
        
       | AtNightWeCode wrote:
       | Redis is overused in my opinion. For many requests it does not
       | beat a database for the same amount of money. There can be other
        | reasons for using a cache, though. I often hear people claim
        | that the cache "protects" the database. From my experience it is
        | more common that once the database has problems they spill over
        | to the cache. If, for instance, a circuit breaker then opens on
        | the cache, the database will be smacked senseless.
        
         | rtheunissen wrote:
         | Often, cache is relied on so much that we are afraid to clear
         | it because no one knows what the impact will be on the
         | database. We now duplicate our data in many cases, have to deal
         | with cache invalidation, and ironically create more risk than
         | protection. Cache should be extremely selective and
         | encapsulated very well.
        
           | AtNightWeCode wrote:
            | Most projects I work on use a lot of edge caching, but it is
            | not business critical; it is for speed. It is a problem if
            | the design depends on both a cache and a database and the
            | cache itself depends on the database.
        
       | rglover wrote:
       | Personally staking my own future on this "less is more" approach
       | having seen some serious horror flicks in terms of
       | app/infrastructure stacks the past few years.
       | 
       | What continues to surprise me: a lot of time and money has been
       | or is being wasted on reinventing the wheel (speaking
       | specifically about webdev here).
       | 
       | There are a ton of great tools out there that are battle-tested
       | (e.g., my favorite find of late as someone just digging into
       | hand-rolled--meaning no k8s/docker/etc--infrastructure is
       | haproxy).
        
         | xupybd wrote:
         | Simple manually deployed docker images have been a great win
         | for us.
         | 
         | You get to declare all your dependencies in the docker build.
         | All config is in one .env file.
         | 
         | Installs and roll backs are trivial.
         | 
         | Setting up new dev environments is easy.
        
           | justin_oaks wrote:
           | What do you mean when you say "manually deployed docker
           | images"?
           | 
           | It could mean that you build the images on one machine, then
            | export the images as tar files, copy those to the
           | destination server, and then import the images.
           | 
           | Or it could mean that you copy the Dockerfile and any
           | necessary context files to the destination server and run the
           | Docker build there.
           | 
           | Or it could mean you still use a Docker registry (Docker Hub,
           | AWS ECR, or a self-hosted registry), but you're manually
           | running docker or docker-compose commands on the destination
           | server instead of using an orchestrator like Kubernetes.
           | 
           | As for me, I've done pretty well with that last option. I
           | still use either an in-house Docker registry or AWS ECR, but
           | I haven't needed anything like Kubernetes yet.
        
         | ftlio wrote:
         | I've seen millions of dollars spent on infrastructure to
         | support what Heroku could do for a few thousand a month. Not to
         | mention the egregious dev heartache caused by having to work
         | against it. Anyone who argued it was a waste of time and money
         | just "didn't get it" apparently.
         | 
         | I'm a huge fan of all the cool container stuff, queues, stream
         | processing, all the weird topologies for apps built with node,
        | Golang. I'm with it, kids. But for an MVP, just use Heroku,
        | GAE, or a Droplet with SCP, for god's sake.
         | 
         | If you need to do something more complicated, growth will tell
         | you.
        
           | pphysch wrote:
           | > If you need to do something more complicated, growth will
           | tell you.
           | 
           | +1. Needing to refactor & scale your infrastructure to enable
           | more growth is almost always a "good problem to have".
           | 
           | You've steadily grown to $100mm revenue and your backend is
           | starting to show it because you prioritized productivity over
           | premature optimizations? Oh no, the world is ending! (said no
           | one ever)
        
           | VWWHFSfQ wrote:
           | I worked for a very profitable small internet biz whose
           | entire deployment was git-archive + rsync. I never had to
           | troubleshoot that thing even once. Now it seems like
           | everybody is playing golf trying to see how many AWS services
           | they can use to unpack a gzip on a server.
        
       | rlawson wrote:
        | There are so many benefits of keeping things as simple as
        | possible:
        | 
        | - easier troubleshooting
        | - easier-to-maintain documentation
        | - quicker onboarding of new devs
        | - easier to migrate to new hosting if needed
        | - quicker to add features (or decide not to add)
        
       | baggy_trough wrote:
       | The next level up of this approach is running everything on one
       | box.
        
       | pnathan wrote:
        | my take on this looks similar, but I'll have more going on:
        | 
        | 1. kubernetes
        | 2. postgres
        | 3. application
        | 
        | where the kubernetes bit is used mostly for the integration-test
        | side of things.
        | 
        | a lot of machinery can be employed that gets in the way of "wtf
        | just happened".
        
       | eezing wrote:
       | I was worried about how long the initial indexing would take for
       | a recent full text search implementation in Postgres.
       | 
       | Took less than a second on a few hundred thousand rows.
       | 
       | Naive and simple is good enough for now.
        
       | cjfd wrote:
        | I think this is the right idea. The pendulum between having as
        | many dependencies as possible and having no dependencies at all
        | has swung way too far to the 'as many dependencies as possible'
        | side. It is a major PITA when yet another random component
        | breaks. Say that A can be emulated in B in, let us say, three
        | man-weeks. I would say it is worth it. The advantage is that A
        | will never break, because it is not there. Note also that A may
        | break three years from now, when everybody who knows anything
        | about A has left the company... B is now used in more places, so
        | people are more likely to have been forced to learn it, so when
        | the emulation of A breaks there is a better chance that people
        | will know what to do. I see mostly advantages.
        
       | mberning wrote:
        | I was a bit disappointed. I thought they were going to implement
        | their entire system using stored procedures. That would be
        | "single dependency". As it stands, it is "all my app-tier
        | dependencies and postgres".
        
       | [deleted]
        
       | M0r13n wrote:
        | This is why I love Ansible. As a DevOps engineer I do not design
        | or implement complex systems or programs. But I am responsible
        | for the reliability of our systems and infrastructure. And
        | Ansible is just pleasant to use, for the same reasons stated by
        | the author:
        | 
        | - a single package without any additional dependencies
        | - no client-side software - pure SSH
        | - simple playbooks written in only YAML
        | 
        | Focusing on simplicity and maintainability has helped me deliver
        | reliable systems.
        
       | chishaku wrote:
       | What other examples are there of "single dependency stacks"?
       | 
       | This article is really about the versatility and reliability of
       | postgres.
       | 
       | And I'm all in agreement.
       | 
       | Reminiscent of:
       | 
       | https://www.craigkerstiens.com/2017/04/30/why-postgres-five-...
       | 
       | https://webapp.io/blog/postgres-is-the-answer/
       | 
       | http://rachbelaid.com/postgres-full-text-search-is-good-enou...
       | 
       | http://boringtechnology.club/
       | 
       | As much as HN could lead you astray with the hype of this and
       | that tech, articles like the above are some of the most
       | consistently upvoted on this website.
        
         | chubot wrote:
         | Also:
         | 
         | https://sive.rs/pg2 - Simplify: move code into database
         | functions
         | 
         | https://sive.rs/pg - PostgreSQL example of self-contained
         | stored procedures
         | 
         | some linked examples:
         | https://github.com/sivers/store/tree/master/store/functions
         | 
         | I like this idea in theory ... although it would cause me to
         | need to know a lot more SQL, which is a powerful but hostile
         | language :-/
         | 
         | I care about factoring stuff out into expressions / functions
         | and SQL fails in that regard ...
         | 
         | https://www.scattered-thoughts.net/writing/against-sql/
         | 
          | It's hard to imagine doing this without a ton of duplication.
          | I have written SQL by hand and there are probably more
          | confusing corners than in shell, which is saying a lot!
        
       | np_tedious wrote:
       | I have nothing to disagree with here, but it's worth noting that
        | his company Crunchy Data is itself a postgres provider. So they,
        | more than most, have the chops and incentive to do a great deal
        | in postgres alone.
       | 
       | https://www.crunchydata.com/
        
       | 0xbadcafebee wrote:
       | > 1 Okay fine, S3 too, but that's a different animal.
       | 
       | I think people forget that AWS S3 isn't immutable. Unlike an EBS
       | volume, it is impossible to "snapshot" S3 the way you can a
       | database. There are arbitrary global limitations outside the
       | scope of your control, and a dozen different problems with trying
       | to restore or move or version all the things about buckets that
       | aren't objects (although the objects too can be problematic
       | depending on a series of factors).
       | 
       | If you want real simplicity/repeatability/reliability, but have
       | to use S3, host your own internal S3 service. This way you can
       | completely snapshot both the metadata and block devices used by
       | your S3 service, making it actually immutable. Plus you can do
       | things like re-use a bucket name or move it to any region without
       | worrying about global conflicts. (All of that is hard/expensive
       | to do, however, so you should just use AWS S3 and be very careful
       | not to use it in a way that is unreliable)
        
         | [deleted]
        
         | simonw wrote:
         | My guess is that they use S3 mainly for things like backups,
         | where you write once to a brand new key.
         | 
         | I'd be surprised if they were using mutable S3 objects that
         | constantly get updated in-place. They have PostgreSQL for that!
        
           | [deleted]
        
       | tulstrup wrote:
       | This idea is super cool.
       | 
       | I am not sure it necessarily has to be just one single
       | dependency, but keeping the number of dependencies as low as
        | possible makes a lot of sense to me. At the least, the overhead
        | of introducing any given new dependency should be taken into
        | serious consideration and weighed against the concrete benefits
        | that will be gained from it.
       | 
       | I wrote a blog post on a very similar subject, essentially all of
       | the same arguments, but targeted more towards the dependencies
        | and abstractions found within a given system's code structure
       | application architecture.
       | 
       | If you are interested, you can read it here:
       | https://betterprogramming.pub/avoiding-premature-software-ab...
        
       | wwweston wrote:
       | > normally I'd push all rate limiting to Redis. Here, we rate
       | limit in memory and assume roughly uniform distribution between
       | server nodes.
       | 
       | Dumb question: what do they mean by rate limiting in memory vs
       | via Redis? Does that mean keeping track of request origins+counts
       | using those storage mechanisms, or something else?
        
         | winrid wrote:
          | You can use an in-memory LRU cache of request origin+count. You
         | can also periodically take that data and do an INCREMENT
         | against your DB to get fairly scalable rate limiting.
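          | 
          | A toy version of the in-memory half (Python; the window size,
          | limit, and flush_to_db() are invented for illustration):
          | 
          |     import time
          |     from collections import Counter
          | 
          |     WINDOW = 60   # seconds per counting window
          |     LIMIT = 100   # max requests per origin per window
          |     counts = Counter()
          |     window_start = time.monotonic()
          | 
          |     def allow(origin):
          |         global window_start
          |         if time.monotonic() - window_start > WINDOW:
          |             flush_to_db(counts)  # e.g. an INCREMENT upsert
          |             counts.clear()
          |             window_start = time.monotonic()
          |         counts[origin] += 1
          |         return counts[origin] <= LIMIT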
        
         | VWWHFSfQ wrote:
         | I'm guessing process-local memory. Like a Python dict or
         | something
        
       | KwisaksHaderach wrote:
       | Many can even get away with less: sqlite.
       | 
       | One less process to worry about.
        
       | chucke wrote:
        | operationally, makes sense. but at the inevitable moment (if you
        | survive) when you need to migrate to something else on a
        | different queue system, it'll be a pain to retrofit the code
        | relying on db-level transactions and locks.
        
         | bcrosby95 wrote:
         | Except it's not inevitable. We have a few 15+ year old
         | profitable projects that are still working fine on RDBMS backed
         | queues.
        
         | jjav wrote:
         | > but the inevitable moment (if you survive)
         | 
         | It's probably not inevitable. Simple is fast, and fast can
         | scale you really far.
         | 
         | Sure, if you end up being google-scale then yeah, the world
          | changes. But there are very few companies that large, and yours
          | is probably not growing to that size.
         | 
         | Over a decade ago I joined a mid-sized startup and took
          | ownership of a service using MySQL. The first urgent warning I
          | was given was that they had to migrate to cassandra ASAP
         | because soon MySQL couldn't possibly handle it.
         | 
         | I took a look at the traffic and the growth curve and projected
         | customer adoption. And then put that project on hold, no need
         | yet. Company went on to an IPO, grew a lot, pretty successful
         | in its industry. And years later when I left, it was still
         | going strong on MySQL with no hint of approaching any
         | limitations.
        
         | xupybd wrote:
          | The "if you survive" bit is key. If you survive to the point
          | where you need to scale like this, you will no doubt have more
          | resources available. Do what you can to get going now. Solve
          | future problems as they come.
        
         | Nextgrid wrote:
         | This only makes sense if the effort to migrate is more than the
         | accumulated effort of working with and maintaining that
         | solution from the start.
        
       | VWWHFSfQ wrote:
        | Note that you can approximate Redis-style rate-limiting with
        | Postgres' UNLOGGED tables [0]. They're a lot faster than regular
        | tables since they don't write to the WAL. Of course, if you
        | restart your server or if it crashes then you lose the table
        | contents. But for stuff like rate-limiting you probably don't
        | care. And unless you're using some kind of persistence in Redis,
        | the same is true there.
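        | 
        | A sketch of the idea (Python/psycopg2, assuming an open cursor
        | cur; the schema and the 100-per-minute limit are made up):
        | 
        |     cur.execute("""
        |         CREATE UNLOGGED TABLE IF NOT EXISTS rate_limits (
        |             key  text,
        |             win  timestamptz,
        |             n    int NOT NULL DEFAULT 1,
        |             PRIMARY KEY (key, win))""")
        | 
        |     # one upsert per request: bump this client's counter
        |     # for the current one-minute window
        |     cur.execute("""
        |         INSERT INTO rate_limits (key, win)
        |         VALUES (%s, date_trunc('minute', now()))
        |         ON CONFLICT (key, win)
        |             DO UPDATE SET n = rate_limits.n + 1
        |         RETURNING n""", (client_key,))
        |     over_limit = cur.fetchone()[0] > 100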
       | 
       | I tend to run this kind of stuff on a separate PG server so that
       | the query velocity doesn't affect more biz-critical things.
       | 
       | [0] https://www.postgresql.org/docs/current/sql-
       | createtable.html...
        
         | timando wrote:
         | You don't lose data on a clean restart.
        
         | jamil7 wrote:
         | Nice, Redis is awesome but definitely something I've seen
         | pulled in prematurely all the time.
        
         | hcarvalhoalves wrote:
         | > But for stuff like rate-limiting you probably don't care.
         | 
         | I guess you would care in this scenario, otherwise you have
         | cascading failures (something makes pg crash, and the lack of
         | rate limit allows the abuse to continue).
         | 
         | So implementing a rate limiter separate from the rest may make
          | sense too. I like their idea of doing it in memory and keeping
         | the load balanced, as it doesn't rely on any dependency.
        
         | simonw wrote:
         | I hadn't seen UNLOGGED tables before, that's a really neat
         | trick, thanks.
        
       | pphysch wrote:
       | There's definitely potential to go too far into monolith
       | territory and misinterpret how simple your architecture actually
       | is.
       | 
       | An example: Django backed by Postgres. I tend to view this as 1
       | architectural unit, i.e. Postgres is wholly embedded in Django. I
       | am under no illusion that I have _both_ a Django project and a
       | PostgreSQL instance. I have a Django-backed-by-Postgres. I _can_
       | have that PostgreSQL instance be a standalone interface, but that
       | means increasing my architectural units from 1 to 2. Instead, if
       | I want to integrate with Django 's raw tables, I'm going to do it
       | on Django's terms (via a custom HTTP API) rather than fighting
       | the ORM over who gets to DBA the database. Bad for performance?
       | No doubt. We'll worry about that when we get there.
       | 
       | Yes, you can run a web app server directly out of Postgres
       | without an additional "app layer" like Django (Crunchy has some
       | cool tools for this). But should you?
       | 
       | To be clear, I'm a big fan of KISS, just skeptical of false
       | minimalism.
        
         | justin_oaks wrote:
         | Agreed. This quote seems relevant: "Everything should be made
         | as simple as possible, but not simpler."
         | 
          | The article talks about doing rate limiting with Redis and
          | dropping it in favor of handling it on each server node and
          | assuming uniform distribution of requests. Doing that trades
          | precision rate-limiting for a simpler architecture.
         | 
         | That may be a good trade-off, but only if you can get away with
         | it. If they were required to have more precise rate-limiting
         | then the simpler architecture would not have been possible.
         | 
         | In my own work, I used Memcached instead of Redis for rate
         | limiting data. The applications were coded to fall back to the
          | per-node rate limiting if Memcached wasn't reachable.
         | Memcached may have been another dependency, but it was one of
         | the less troublesome dependencies. I never experienced a
         | problem with it in production. The fallback behavior meant that
         | we didn't even need Memcached in a dev environment.
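          | 
          | The shape of that fallback, roughly (Python with pymemcache;
          | the timeouts and local_count_hit() are invented):
          | 
          |     from pymemcache.client.base import Client
          |     from pymemcache.exceptions import MemcacheError
          | 
          |     mc = Client(("memcached", 11211), timeout=0.05,
          |                 connect_timeout=0.05)
          | 
          |     def count_hit(key):
          |         try:
          |             n = mc.incr(key, 1)
          |             if n is None:  # first hit in this window
          |                 mc.add(key, 1, expire=60)
          |                 n = 1
          |             return n
          |         except (MemcacheError, OSError):
          |             # Memcached unreachable: per-node fallback
          |             return local_count_hit(key)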
         | 
         | I guess my point is this: Not all dependencies are as
         | troublesome as others.
        
         | KwisaksHaderach wrote:
         | What's the crunchy tool for this?
        
           | craigkerstiens wrote:
            | I believe they're referring to some tools like pg_tileserv,
            | which gives you a turnkey tile server on top of PostGIS. As
            | it stands today we don't have anything to automatically run
            | that app from Postgres itself (but stay tuned, we might be
            | launching something around that in just a few weeks).
            | Tileserv is in an interesting category, like many other
            | turnkey APIs or services (like PostgREST or Postgraphile) on
            | top of a database, but I don't view them as different from,
            | say, running a Django app.
        
       ___________________________________________________________________
       (page generated 2022-02-09 23:00 UTC)