[HN Gopher] In defense of simple architectures
___________________________________________________________________
  In defense of simple architectures
  Author : abzug
  Score  : 209 points
  Date   : 2022-04-06 19:04 UTC (3 hours ago)
  (HTM) web link (danluu.com)
  (TXT) w3m dump (danluu.com)
| lifefeed wrote:
| I was interviewing for software jobs recently, and while I was studying up on the "system design" portion I kept circling around the same insight that Dan Luu writes about so well here.
|
| I would sit down at an interview and try to create these "proper" system designs with boxes and arrows and failovers and caches and well-tuned databases. But in the back of my mind I kept thinking, "didn't Facebook scale to a billion users with PHP, MySQL, and Memcache?"
|
| It reminds me of "Command-line Tools can be 235x Faster than your Hadoop Cluster" at https://adamdrake.com/command-line-tools-can-be-235x-faster-... , and the occasional post by https://rachelbythebay.com/w/ where she builds a box that's just fast and with very basic tooling (and a lot of know-how).
| aadvark69 wrote:
| Simple architectures work well, until they don't. A good example is ye olde Ruby on Rails monolith. Dead simple to set up and iterate quickly, but once you reach a certain organization and/or codebase size, velocity starts to degrade exponentially.
| aidos wrote:
| In terms of the choices they're unsure about: I'd say it's best to stay away from Celery / RabbitMQ if you don't really need it. For us, just using RQ (Redis-backed queue) has been a lot less hassle. Obviously it's all going to depend on your scale, but it's a lot simpler.
|
| Re the SQLAlchemy concern: you do need to decide on where your transactions are going to be managed from and have a strict rule about not allowing functions to commit / rollback themselves. Personally I think that sqla is a great tool; it saves a lot of boilerplate code (and data modelling and migrations are a breeze).
|
| But overall the sentiments in this article resonate with my experience.
| bob1029 wrote:
| I think the biggest problem for most developers is not understanding what one computer can actually do and how reliable they are in practice.
|
| Additionally, understanding how tolerant 99% of businesses are to real-world problems that could hypothetically arise can help one not get frustrated over insane edge-case circumstances. I suspect a non-zero number of us have spent time thinking about how we could provide deterministic guarantees of uptime that even unstoppable cosmic radiation or regional nuclear war couldn't interrupt.
|
| I genuinely hope that the recent reliability issues with cloud & SaaS providers have really driven home the point that a little bit of downtime is almost never a fatal issue for a business.
|
| "Failover requires manual intervention" is a _feature_, not a caveat.
| nomemory wrote:
| Some people don't even realise how much traffic a simple web app with server-side rendering (decently written), hosted on an average dedicated server, can handle... They don't need cloud, autoscaling, microservices, Kafka, event-driven architectures, etc.
|
| We've lost our way amid the marketing the cloud providers are creating to help us solve problems we will never encounter, unless we are building the next Netflix or Facebook.
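A minimal sketch of the transaction rule aidos describes above: one place owns commit/rollback (here, the request teardown), and business-logic functions only add and flush. Flask, SQLAlchemy, and all model, route, and helper names below are illustrative assumptions, not anything specified in the thread.

    # app.py - single transaction boundary per request (illustrative sketch)
    from flask import Flask
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import sessionmaker, scoped_session, declarative_base

    app = Flask(__name__)
    engine = create_engine("sqlite:///app.db")   # any SQLAlchemy URL works here
    Session = scoped_session(sessionmaker(bind=engine))
    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        email = Column(String, unique=True, nullable=False)

    def create_user(session, email):
        # business logic never commits or rolls back; it only adds and flushes
        user = User(email=email)
        session.add(user)
        session.flush()   # get the PK and surface constraint errors early
        return user

    @app.route("/signup/<email>", methods=["POST"])
    def signup(email):
        user = create_user(Session(), email)
        return {"id": user.id}, 201

    @app.teardown_appcontext
    def end_transaction(exc):
        # the one and only place that decides commit vs. rollback
        if exc is None:
            Session.commit()
        else:
            Session.rollback()
        Session.remove()

    Base.metadata.create_all(engine)

Because every code path funnels through the same teardown, a half-finished request can never leave a half-committed transaction behind, which is the point of the rule.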
| bob1029 wrote:
| If you want to get an idea of where things are at right now, this is a good place to start looking:
|
| https://www.techempower.com/benchmarks
|
| If you just need plaintext services, something like ~7 million requests per second is feasible at the moment.
|
| By being clever with threading primitives, you can preserve that HTTP framework performance down through your business logic and persistence layers too.
| cameronh90 wrote:
| These requirements don't come out of nowhere. Normally they come from:
|
| 1. CEOs/whoever that don't listen to how much additional complexity it is to build a system with extremely high uptime and demand it anyway.
|
| 2. Developers with past experience that systems going down means they get called in the middle of the night.
|
| 3. Industry expectations. Even if you're a small finance company where all your clients are 9-5 and you could go down for hours without any adverse impacts, regulators will still want to see your triple redundant, automated monitoring, high uptime, geographically distributed, tested fault tolerant systems. Clients will want to see it. Investors will check for it when they do due diligence.
|
| Look at how developers build things for their own personal projects and you'll see that quite often they're just held together with duct tape running on a single DO instance. The difference is, if something goes wrong, nobody is going to be breathing down their neck about it and nobody is getting fired.
| _jal wrote:
| Past the proof of concept, "developers" should frankly not be making these decisions. People who understand systems and failure analysis should be. You might have devs with that experience, but they're comparatively rare.
|
| As far as complexity... if you get big enough, you can't avoid it. My meta-rule is to only accept additional complexity if solving the issue some other way is impractical.
|
| It is almost always far, far easier to add additional moving parts to your production environment than it is to remove them after they're in use.
| trasz wrote:
| Also, those complicated architectures are often quite unreliable anyway - just in ways that don't show in metrics. Slack comes to mind: not only is its functionality poor compared to eg IRC, but it fails in hilarious ways, eg showing duplicated messages, or not showing them at all. Another example is YouTube - the iOS app gets confused when displaying an ad, which results in starting the playback at a wrong time offset. I guess it's because companies like those don't care about actual reliability - what they do care about is availability.
| joshlemer wrote:
| How could you say that Slack has poor functionality compared to IRC?
| exfascist wrote:
| When you type something into IRC that message shows up in the log and every online user's client pretty reliably. Furthermore the high degree of diversity among clients provides a pretty extreme amount of client-side functionality that Slack _completely_ lacks (scripting is a huge one.)
| spicybright wrote:
| I love IRC, but it is just silly.
|
| Slack has much better history because you don't need to have been online when messages are sent to log them. Slack is absolutely more reliable in this regard.
|
| IRC is easy to script because the protocol is so simple. But you leave so much on the table for that cost.
|
| Obviously, if your use case is text only, you don't care about being persistent, and you lean heavily on scripting to get things done, then IRC will do the trick. Otherwise it's such a crutch to do anything beyond that.
| exfascist wrote:
| IRC has logs for history, they're fast and you can run your own logger to control the retention policy if you want. These heavyweight IM tools have extremely short log retention (months) and searching through the logs is _extremely_ slow and frustrating IME.
| zie wrote:
| Slack is not instantly "better" than IRC, it's just a different approach to the chat problem and it's arguably more approachable for people that don't want to learn about the chat space.
|
| Logging is just different between the two.
|
| For IRC, logging is outside the scope of the IRC protocol. Anyone can log anything anytime anywhere with whatever policies and procedures they want. This usually leads to each channel/project having some "official" log of the channel somewhere, using whatever they feel is good for them.
|
| Slack on the other hand centralizes the logs, which moves lots of control to the administrators/Slack developers.
|
| So Slack's logs are likely easier to find, but that doesn't necessarily make them easier to use.
|
| Persistency is also just different: IRC makes it your problem, but it's a solved problem if you care about it. irccloud.com and sr.ht both offer persistence in different ways, as two differing approaches to the problem.
|
| Slack of course centralizes the problem and removes some control.
|
| I personally think Slack and approaches like it (I prefer MatterMost) are great for internal things where administrators need central control of stuff for various reasons. For public things, I think Slack is a bad solution, and something like IRC or Matrix is a better solution to the problem of public chat.
| ryukafalz wrote:
| The versatility of clients is indeed a huge benefit of IRC. I used to use IRC at work and always had my Weechat window split with a small pane up top showing either my highlights or a channel I needed to monitor at the time. With Slack, you can't do that, which means you have to repeatedly click between channels if you need to pay attention to multiple at a time.
| diroussel wrote:
| You can use split view to keep an eye on another channel. But another window would be better.
|
| https://slack.com/intl/en-gb/help/articles/4403608802963-Ope...
| snvzz wrote:
| Slack comically uses gigabytes of RAM and plenty of CPU time on the client side.
| musicale wrote:
| It's a nice demonstration of the efficiency of web apps vs. native apps.
| sitkack wrote:
| It really has nothing to do with that. The Slack client is just written poorly.
| ChrisMarshallNY wrote:
| Obligatory I Am Developer toon: https://twitter.com/iamdevloper/status/1072503943790497798/p...
| yakshaving_jgt wrote:
| I wonder who he stole that joke from.
| ChrisMarshallNY wrote:
| I would assume bruised_blood, but I can't [easily] find the original, so I posted that.
| yakshaving_jgt wrote:
| No, sure. That's fair enough.
|
| My point simply being that iamdevloper is a notorious joke thief and is especially unsporting about it when it's pointed out.
| WJW wrote:
| Wtf are you doing with it? My Slack instance (on Linux) is resting around 300 MB resident set size and 0% CPU. 300 MB is still a lot for a chat app, but it is definitely not gigabytes.
| guelo wrote:
| Make sure you're counting all the sub-processes it spawns (at least on Mac, don't know about Linux).
| dan-robertson wrote:
| If you just add up memory usage for subprocesses you are likely to over-count due to shared memory. The number you typically want to add up in Linux is 'proportional set size' which is, I think, the sum over every page of the process's memory of page_size / number of processes which can access the page. I don't know what happens if you mmap some physical memory twice (I think some newish Java GC does this).
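A small sketch of the measurement dan-robertson describes above: summing proportional set size (PSS) across a group of processes instead of naively adding RSS. This assumes Linux with /proc/<pid>/smaps_rollup available (kernel 4.14+); the default process-name pattern is only illustrative.

    # pss_total.py - sum PSS across all processes whose name matches a pattern
    import os, re, sys

    def pss_kb(pid):
        # smaps_rollup has a single "Pss: <n> kB" line covering the whole process
        try:
            with open(f"/proc/{pid}/smaps_rollup") as f:
                for line in f:
                    if line.startswith("Pss:"):
                        return int(line.split()[1])
        except (FileNotFoundError, PermissionError):
            pass
        return 0

    def matches(pid, pattern):
        try:
            with open(f"/proc/{pid}/comm") as f:
                return re.search(pattern, f.read().strip()) is not None
        except FileNotFoundError:
            return False

    if __name__ == "__main__":
        pattern = sys.argv[1] if len(sys.argv) > 1 else "slack"
        pids = [p for p in os.listdir("/proc") if p.isdigit()]
        total = sum(pss_kb(p) for p in pids if matches(p, pattern))
        print(f"total PSS for '{pattern}': {total / 1024:.1f} MiB")

Reading smaps_rollup for other users' processes generally needs elevated privileges, and the name filter may need adjusting for a given app's helper processes.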
| [deleted]
| lkxijlewlf wrote:
| > I think the biggest problem for most developers is...
|
| ... reading blogs and such where some loudmouth is telling them about so-called "best practices" and so they bring that back to work with them.
|
| There are not enough loudmouths telling people to keep it simple (until you can't or know better).
| dang wrote:
| What's the year on this? Anybody know?
|
| Normally I check the Internet Archive, but https://web.archive.org/web/*/https://danluu.com/simple-arch....
| Beltalowda wrote:
| Based on Dan's Twitter, March 2022: https://twitter.com/danluu/status/1501644166983421953
|
| That links to the original on wave.com, dated March 9th this year.
| Jtsummers wrote:
| The "previous" article at the bottom is the most recent article in his archive, which was apparently published in March 2022. So I'm guessing this year, and either this month or last month. But the archive doesn't seem to have been updated yet with this article.
| dang wrote:
| Ah ok - it's new then. Thanks to both of you!
| [deleted]
| gherkinnn wrote:
| At the risk of making an ad-hominem attack, I found this website unreadable.
|
| Minimalism is fine. But there comes a point when there's so little, it is nothing. danluu.com is a bucket of sand facing an overbuilt cathedral.
| chubot wrote:
| Reader mode in your browser goes a long way. I think they all have it now.
| dan-robertson wrote:
| You can read the same content here: https://www.wave.com/en/blog/simple-architecture/
| Beltalowda wrote:
| I set a user style in Stylus for danluu.com:
| body { font: 16px/1.6em sans-serif; max-width: 50em; margin: auto; }
|
| Can even add it manually in the inspector if you want.
| sydthrowaway wrote:
| What is Wave?
| taeric wrote:
| My favorite trap in all of this is that this thinking will fail most tech interviews. It is incredibly frustrating.
| winrid wrote:
| Yep. Failed an interview because I used EJS (SSR) and Node to build a simple Twitter in 30 mins. The interviewer saw that it was three files and did not seem impressed.
|
| I guess they wanted me to use lots of little components in an SPA, which I did in my day job, but it didn't seem necessary for the task...
| syngrog66 wrote:
| 3 files? "Luxury!"
|
| I could implement a Twitter in 1 Python or Go file, hosted on 1 machine.
|
| Granted, its concurrent user capacity and traffic load capacity would be insufficient for actual Twitter. But all the basics would work, in the small.
| mkl95 wrote:
| I guess the only thing you can do is avoid those places. Last time I checked, Wave were on a hiring spree.
| wanda wrote:
| I think that probably says more about the tech companies than anything else.
| bob1029 wrote:
| Trap or integrated win-win?
|
| We use one of these "aggressively simple" architectures too. At this point, I would quit my job instantaneously if I had to even look at k8s or whatever the cool kids are using these days.
| adra wrote:
| Man, Kubernetes is so much easier than the smattering of crap that you have to juggle together before it. Puppet and co? No thanks. Terraform? It's fine, but only a part of a CI/CD picture. If you think the alternatives are better, I really have to wonder how much of the trenches crap that people in your org deal with regularly that you're insulated from. That, or you're a release-quarterly kinda company?
| throw0101a wrote:
| > _Puppet and co? No thanks._
|
| Puppet? _Luxury_. I started my configuration management journey with cfengine. And the folks that I first heard about CM from started with Makefiles:
|
| * http://www.infrastructures.org/papers/bootstrap/bootstrap.ht...
|
| * https://www.usenix.org/legacy/publications/library/proceedin...
| hajhatten wrote:
| We're using Nomad + Consul + a custom little CLI and I would never go back to K8s from this.
|
| Not a YAML document in sight.
| DenseComet wrote:
| Nomad is pretty great for a lot of things, especially self-hosted. The only reason I prefer k8s is the ecosystem. Even though there are standardized specs like CSI, they were written with k8s in mind, so some drivers are completely broken on Nomad. Also, most cloud providers offer managed k8s, but very few offer managed Nomad.
| bob1029 wrote:
| We wrote our own tools for most things. Our build is a single dotnet publish command, followed by copying the output to an S3 bucket for final consumption.
|
| That output is 100% of what you need to run our entire product stack on a blank VM.
|
| Monolithic pays for itself in so many ways. SQLite and other in-process database solutions are a major factor in our strategy.
| WrtCdEvrydy wrote:
| > look at k8s or whatever the cool kids are using these days.
|
| I'm fine with complex architecture and would actually welcome someone choosing something complex, but the issue is that we have perverse incentives at work to introduce stuff just to pad our resume.
|
| Kubernetes was designed for companies deploying thousands of small APIs/applications where management is a burden. I've seen companies that deploy 3 APIs running Kubernetes and having issues...
| reggieband wrote:
| I understand his point but I actually think micro-services can be simpler than monoliths.
|
| Even for his architecture, it sounds like they have an API service, a queue and some worker processes. And they already have kubernetes, which means they must be wrapping all of that in docker. It seems like a no-brainer to me to at least separate out the code for the API service from the workers so that they can scale independently. And depending on the kind of work the workers are doing you might separate those out into a few separate code bases. Or not, I've had success on multiple projects where all jobs are handled by a set of workers that have a massive `switch` statement on a `jobType` field.
|
| I think there is some middle ground between micro-services and monoliths where the vast majority of us live. And in our minds we're creating these straw-man arguments against architectures that rarely exist. Like a literal single app running on a single machine vs. a hundred independent micro-services stitched together with ad-hoc protocols. Micro-services vs. monoliths is actually a gradient where we rarely exist at either ludicrous extreme.
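A minimal sketch of the "one set of workers, dispatch on jobType" pattern reggieband describes above. Python is used here for consistency with the other examples in this thread; the queue interface, job names, and handlers are all illustrative.

    # worker.py - one worker loop, one dispatch table keyed on jobType
    import json

    def send_welcome_email(payload):
        print("emailing", payload["email"])        # stand-in for real work

    def resize_image(payload):
        print("resizing", payload["path"])

    def sync_to_crm(payload):
        print("syncing account", payload["account_id"])

    HANDLERS = {
        "welcome_email": send_welcome_email,
        "resize_image": resize_image,
        "crm_sync": sync_to_crm,
    }

    def run_worker(queue):
        # queue is anything with a blocking .get() that returns a JSON string,
        # e.g. a Redis list behind a tiny adapter (an assumption, not shown here)
        while True:
            job = json.loads(queue.get())
            handler = HANDLERS.get(job["jobType"])
            if handler is None:
                print("unknown jobType:", job["jobType"])
                continue
            try:
                handler(job["payload"])
            except Exception as exc:
                # real code would retry or dead-letter; keep the loop alive
                print("job failed:", job["jobType"], exc)

Adding a new background task is a new function and a new dict entry rather than a new deployable service, which is much of what makes this middle point on the gradient cheap to operate.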
| calpaterson wrote:
| > GraphQL libraries weren't great when we adopted GraphQL (the base Python library was a port of the Javascript one so not Pythonic, Graphene required a lot of boilerplate, Apollo-Android produced very poorly optimized code)
|
| What do people use instead of Graphene? Strawberry?
| fernandogrd wrote:
| There is also ariadne
| scrubs wrote:
| Nah, I don't much like the tone of this article. Not at all.
|
| The engineering message should be: keep your architecture as simple as possible. And here are some ways (to follow) on how to find that minimal and complete size 2 outfit foundation in your size 10 hoarder-track-suit eyesore.
|
| Do we really need to be preached at with a warmed-over redo of "X cut it for me as a kid, so I really don't know why all the kids think their newfangled Y is better"? No, we don't.
|
| If you have stateless, share-nothing events, your architecture should be simple. Should or could you have stateless share-nothing even if that's not what you have today? That's where we need to be weighing in.
|
| Summary: less old-guy whining/showing-off and more education. Thanks. From the Breakfast Club kids.
| endisneigh wrote:
| How far can you get with a single Postgres instance on a single machine? I know things like Cockroach and Citus exist, but generally Postgres isn't sharded as far as I know.
| zozbot234 wrote:
| Postgres supports sharding out of the box. The documentation tells you how to do it, using foreign data wrapper and table partitioning.
| dan-robertson wrote:
| You can scale up that one machine a lot. If you start with a normal sized machine you have a lot of overhead in increasing ram/cpu on that machine (eg you could start with say 16 cores and 100G ram or less and scale up to like 2TB ram and 64/128 cores). There's also runway for scaling things by eg shooting down certain long-running queries that cause performance problems or setting up read replicas.
|
| So even if you're a bit worried about scaling it, you can at least feel the problems are far away enough that you shouldn't care until later.
| zie wrote:
| Pretty far!
| endisneigh wrote:
| How far exactly? Like TPS for reads and writes with what specs?
|
| I've been looking for real-world performance.
| zie wrote:
| That's complicated based on workload, etc. A single PG node will obviously never scale to Google or Facebook levels.
|
| Attend a PG conference and you will run into plenty of people running PG with similar use cases (and maybe similar loads) to you.
|
| I can say we run a few hundred concurrent users backed by PG on a small-to-medium-sized VPS without issues. Our DB is in the 3-digit GB range on disk, but not yet TB range.
| bpicolo wrote:
| Hundreds of thousands of reads per second was pretty doable even back in the 2014-2015 era.
|
| You can get a 60TB NVMe instance with 96 cores these days - https://aws.amazon.com/ec2/instance-types/i3en/. Relational databases just scream on the dang things.
| AceJohnny2 wrote:
| > _one major African market requires we operate our "primary datacenter" in the country_
|
| What country could that be? That sounds challenging.
| surfer7837 wrote:
| Just boils down to not optimising until you need to.
| Start with a 3-tier web app (unless your requirements lead you to another solution), then start with read replicas, load balancing, sharding, redis/RabbitMQ etc.
| zrail wrote:
| Realistically almost every web app can start as a one-tier web app that uses SQLite as a data store and serves mostly HTML.
| [deleted]
| a9h74j wrote:
| I have a dumb question ...
|
| In almost all performance areas -- gaming, PCs, autos, etc -- there are usually _whole publications_ dedicated to performing benchmarks and publishing those results.
|
| Are there any publications or sites which implement a few basic applications against various new-this-season "full stacks" or whatnot, and document performance numbers and limit-thresholds on different hardware?
|
| Likewise, there must be stress-test frameworks out there. Are there stress-test and scalability-test third-party services?
| zie wrote:
| Fossil SCM is a great example of a SQLite application that has stood the test of time. I don't know what sqlite.org's traffic is like, but it's not tiny and it runs on a tiny VPS without issue (and has for years now).
| SpikeMeister wrote:
| TechEmpower has benchmarks for different web stacks: https://www.techempower.com/benchmarks/
| ryanbrunner wrote:
| I think especially for small teams starting out, complex architecture can be a huge trap.
|
| Our architecture is extremely simple and boring - it would probably be more-or-less recognizable to someone from 2010 - a single Rails MVC app, 95+% server-rendered HTML, really only a smattering of Javascript (some past devs did some stuff with Redshift for certain data that was a bad call - we're in the process of ripping that out and going back to good old Postgres).
|
| Our users seem to like it though, and talk about how easy it is to get set up. Looking at the site, the interactions aren't all that different from what we would build if we were using a SPA. But we're just 2 developers at the moment, and we can move faster than much larger teams just because there's less stuff to contend with.
| woah wrote:
| That doesn't sound like it's really any simpler than a JSON API server (written in node, python, go, or anything else), and a SPA. Maybe the lesson is "build with what you know if you want to go fast".
| ryanbrunner wrote:
| In my experience SPAs bring a lot of headaches that you just don't really need to think about with traditional HTML. Browser navigation, form handling, a lot of accessibility stuff comes out of the box for free, and there's one source of truth about what makes a particular object valid or how business logic works (which is solvable in the SPA world but brings a lot of complexity when you need to share logic between the client and the server, especially when they're in different languages).
|
| Frankly, out of all the things that make our architecture simple and efficient, I would say server-rendered HTML is by far the biggest one.
| woah wrote:
| Probably depends on the requirements. If the product should basically feel like a static web page, and you are OK making design and product decisions that work easily in that paradigm, then a server-side framework built to make static web pages is going to be simpler.
|
| If you have product or design requirements that it should feel more dynamic like a native app, then trying to patch that on top of a static webpage might get messy.
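A minimal sketch of the one-tier setup zrail mentions above (and the server-rendered style ryanbrunner describes): one process, SQLite as the data store, HTML rendered on the server. Flask and the table and route names are illustrative choices, not anything from the thread.

    # app.py - one process, one SQLite file, server-rendered HTML
    import sqlite3
    from flask import Flask, request, redirect, render_template_string

    app = Flask(__name__)
    DB = "notes.db"   # hypothetical data file

    PAGE = """
    <h1>Notes</h1>
    <form method="post"><input name="body"><button>Add</button></form>
    <ul>{% for n in notes %}<li>{{ n }}</li>{% endfor %}</ul>
    """

    def db():
        conn = sqlite3.connect(DB)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
        return conn

    @app.route("/", methods=["GET", "POST"])
    def index():
        conn = db()
        if request.method == "POST":
            conn.execute("INSERT INTO notes (body) VALUES (?)",
                         (request.form["body"],))
            conn.commit()
            return redirect("/")
        notes = [r[0] for r in conn.execute("SELECT body FROM notes ORDER BY id DESC")]
        return render_template_string(PAGE, notes=notes)

    if __name__ == "__main__":
        app.run()

sqlite3 ships with Python, so the whole deployment is the script plus the .db file, much like the setups described elsewhere in this thread.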
| treis wrote:
| IMHO the important thing is where your data is. If it can all be client side then write a SPA. If it's on the server then the more you do on the server the better.
|
| Returning HTML and doing a simple element replace with the new content is 99.9% indistinguishable from a SPA.
| ggpsv wrote:
| There is some overhead in using SPAs when your application could have been built in the way that the parent comment suggests.
|
| Some front-end frameworks are closing this gap but I wouldn't necessarily say they're equally as simple. See https://macwright.com/2020/05/10/spa-fatigue.html
|
| In other words, choose the right tool for the job.
| pavlov wrote:
| There are some web apps still in production that I wrote almost a decade ago in Node+Express in the simplest, dumbest style imaginable. The only dependencies are Express and some third-party API connectors. The database is an append-only file of JSON objects separated by newlines. When the app restarts, it reads the file and rebuilds its memory image. All data is in RAM.
|
| I figured these toys would be replaced pretty quickly, but turns out they do the job for these small businesses and need very little maintenance. Moving the app to a new server instance is dead simple because there's basically just the script and the data file to copy over, so you can do OS updates and RAM increases that way. Nobody cares about a few minutes of downtime once a year when that happens.
|
| There are good reasons why we have containers and orchestration and stuff, but it's interesting to see how well this dumb single-process style works for apps that are genuinely simple.
| dmw_ng wrote:
| > database is an append-only file of JSON objects separated by newlines. When the app restarts, it reads the file and rebuilds its memory image. All data is in RAM
|
| Apps like this tend to perform like an absolute whippet too (or if they don't, getting them to perform well is often a 5-line change). It's really freeing to be able to write scans and filters with simple loops that still return results faster than a network roundtrip to a database.
|
| The problem is always growth, either GC jank from a massive heap, running out of RAM, or those loops eventually catching up with you. Fixing any one of these eventually involves either serialization or IO, at which point the balance is destroyed and a real database wins again.
| Beltalowda wrote:
| Another issue with "just a JSON file" as a database is that you need to be a bit careful to avoid race conditions and the like, e.g. if two web pages try to write the same database at the same time. It's not an issue for all applications, and not _that_ hard to get right, but it does require some effort. This is a huge reason I prefer SQLite for simple file storage needs.
| endorphine wrote:
| Doesn't the fact that it's opened in append-only mode (Linux) mitigate data races with regards to writes?
| Beltalowda wrote:
| Your write will be fine; that is, it's not as if data from one write will be interspersed with the data from another write. It's just that the order might be wrong, or opening the file multiple times (possibly from multiple processes) could be fun too. The program or computer crashing mid-write can also cause problems. Things like that.
|
| Again, it may not be an issue at all for loads of applications. But I used a lot of "flat file databases" in the past, and found it's not an issue right up to the point that it is. Overall, I found SQLite simple, fast, and ubiquitous enough to serve as a good fopen() replacement. In some cases it can even be faster!
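A minimal sketch of the append-only store pavlov describes above: one JSON object per line, replayed into memory on startup, with the concurrent-writer caveat Beltalowda raises handled by an exclusive lock around appends. The record shape and file name are illustrative; this assumes a single host and POSIX fcntl locking.

    # store.py - newline-delimited JSON log, full state kept in RAM
    import fcntl, json, os

    class Store:
        def __init__(self, path="data.jsonl"):
            self.path = path
            self.items = {}                      # in-memory image, keyed by id
            if os.path.exists(path):
                with open(path) as f:
                    for line in f:               # replay the log on startup
                        rec = json.loads(line)
                        self.items[rec["id"]] = rec

        def put(self, rec):
            line = json.dumps(rec, separators=(",", ":")) + "\n"
            with open(self.path, "a") as f:
                fcntl.flock(f, fcntl.LOCK_EX)    # guard against concurrent writers
                f.write(line)
                f.flush()
                os.fsync(f.fileno())             # survive a crash mid-write (mostly)
            self.items[rec["id"]] = rec

    # usage
    s = Store()
    s.put({"id": "42", "name": "example"})
    print(len(s.items), "records in RAM")

Reads are just dictionary lookups and plain Python loops over self.items, which is where the "faster than a network roundtrip" feel comes from.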
| dmoy wrote:
| Here is my list of numbers: 1,Here is my list of letters: a,b,2,3,d
| pavlov wrote:
| Yes, you need to be sure that you understand the growth pattern if you want to YOLO in RAM. If your product aims to be the next Instagram, this is clearly not the architecture.
|
| But a lot of small businesses are genuinely small. They may not sign up new customers that often. When they do, the impact to the service is often very predictable ("Amy at customer X uses this every other day, she's very happy, it generates 100 requests / week"). If growth picks up, there would be signs well in advance of the toy service becoming an actual problem.
| jkaptur wrote:
| > If your product aims to be the next Instagram, this is clearly not the architecture.
|
| But maybe! https://instagram-engineering.com/dismissing-python-garbage-...
| bob1029 wrote:
| > The problem is always growth, either GC jank from a massive heap, running out of RAM, or those loops eventually catching up with you
|
| Absolutely. The challenge is having enough faith that it will take long enough to catch up to you.
|
| Statistically speaking, it won't catch up to you, and if it does, it will take so long you should have seen it coming from miles away and had time to prepare.
|
| In my systems that use an in-memory/append-only technique, I try to keep only the pointers and basic indexes in memory. With modern PCIe flash storage, there is no good justification for keeping big fat blobs around in memory anymore.
| epolanski wrote:
| I often rewrite in my free time what I do at work without dependencies and I'm often amazed at how far and how fast you can move.
| mftb wrote:
| I do almost this exact thing for all my personal stuff. I have 5 or 6 going in a VM for simple things like my bookmarks, etc... works great. I could definitely see it solving many small business use-cases.
| ammanley wrote:
| Reminds me a lot of this (first paragraph): https://litestream.io/blog/why-i-built-litestream/
|
| Well done on building an easy-to-maintain single-node app with few dependencies. You would be the SWE I would send prayers of thanks to after onboarding (and for not making me crawl through a massive Helm chart/CloudFormation template hell).
| danenania wrote:
| Built-in first-class concurrency (a la node, golang, rust, etc.) is a huge win for simple architectures, since it lets you avoid adding a background queue, or at least delay it for a very long time.
|
| I think people are also too quick to add secondary data stores and caches. If you can do everything with a transactional SQL database + app process memory instead, that is generally going to save you tons of trouble on ops, consistency, and versioning issues, and it can perform about as well with the right table design and indexes.
|
| For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache. When an object is requested, hit the DB with an indexed query for just the 'updatedAt' timestamp (should be a sub-10ms query). If it hasn't been modified, return the cached object from memory, otherwise fetch the full object from the DB and update the local cache. For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity. It's also quite economical, since the RAM it uses is likely already over-provisioned.
|
| This is exactly the approach that EnvKey v2[1] is using, and it's a huge breath of fresh air compared to our previous architecture. Just MySQL, Node/TypeScript, and eventually consistent replication to S3 for failover. We also moved to Fargate from EKS (AWS kubernetes product), and that's been a lot simpler to manage as well.
|
| 1 - https://v2.envkey.com
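A minimal sketch of the in-process cache danenania describes above: a bounded LRU map, a cheap indexed 'updatedAt' check against the database, and a full fetch only on a miss or a stale entry. It is sketched in Python rather than the Node/TypeScript the commenter uses; the table, column, and class names are illustrative.

    # cache.py - in-process LRU keyed on id, validated by an indexed updated_at check
    from collections import OrderedDict

    class ObjectCache:
        def __init__(self, db, max_items=10_000):
            self.db = db                          # anything with an execute() -> rows API
            self.entries = OrderedDict()          # id -> (updated_at, obj)
            self.max_items = max_items

        def get(self, obj_id):
            # cheap, indexed single-column read: the "is it still fresh?" query
            row = self.db.execute(
                "SELECT updated_at FROM objects WHERE id = ?", (obj_id,)).fetchone()
            if row is None:
                self.entries.pop(obj_id, None)
                return None
            updated_at = row[0]

            cached = self.entries.get(obj_id)
            if cached and cached[0] == updated_at:
                self.entries.move_to_end(obj_id)  # refresh LRU position
                return cached[1]

            # miss or stale: fetch the full object and remember it
            obj = self.db.execute(
                "SELECT * FROM objects WHERE id = ?", (obj_id,)).fetchone()
            self.entries[obj_id] = (updated_at, obj)
            self.entries.move_to_end(obj_id)
            if len(self.entries) > self.max_items:
                self.entries.popitem(last=False)  # evict the least recently used entry
            return obj

        def invalidate(self, obj_id):
            # optional: called when another app instance reports a change
            self.entries.pop(obj_id, None)

Because every read re-checks updated_at against the primary database, the cache stays consistent even if a cross-instance invalidation message is lost, which is the property danenania points out in the exchange below.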
Now you have a fast, scalable, consistent, | distributed cache with minimal ops complexity. It's also quite | economical, since the RAM it uses is likely already over- | provisioned. | | This is exactly the approach that EnvKey v2[1] is using, and | it's a huge breath of fresh air compared to our previous | architecture. Just MySQL, Node/TypeScript, and eventually | consistent replication to S3 for failover. We also moved to | Fargate from EKS (AWS kubernetes product), and that's been a | lot simpler to manage as well. | | 1 - https://v2.envkey.com | gregmac wrote: | > For example: instead of memcache/redis, set aside a ~100 MB | of memory in your app process for an LRU cache. When an | object is requested, hit the DB with an indexed query for | just the 'updatedAt' timestamp (should be a sub-10ms query). | If it hasn't been modified, return the cached object from | memory, otherwise fetch the full object from the DB and | update the local cache. | | I've never built something with this type of mechanism for a | DB query, but it's interesting. I don't think I've ever timed | a query like this, but I feel like it's going to be an "it | depends" situation based on what fields you're pulling back, | if you're using a covering index, just how expensive the | index seek operation is, and how frequently data changes. | I've mainly always treated it as "avoid round trips to the | database" -- zero queries is better than one, and one is | better than five. | | I also guess it depends on how frequently it's updated: if | 100% of the time the timestamp is changed, you might as well | just fetch (no caching). Based on all the other variables | above, the inflection point where it makes sense to do this | is going to change. | | Interesting idea though, thanks. | | > For bonus points, send an internal invalidation request to | any other app instances you have running when an object gets | updated. Now you have a fast, scalable, consistent, | distributed cache with minimal ops complexity. | | Now you have to track what other app servers exist, handle | failures/timeouts/etc in the invalidation call, as well as | have your app's logic able to work properly if this | invalidation doesn't happen for any reason (classic cache | invalidation problem). My inclination is at this point you're | on the path of replicating a proper cache service anyways, | and using Redis/Memcache/whatever would ultimately be | simpler. | danenania wrote: | It definitely does depend on various factors, but if your | query is indexed, both the SQL DB request and the | Redis/Memcache lookup of the full object are likely to be | dominated by internal network latency. If your object is | large, the DB single-field lookup could easily be faster | since you're sending less back over the wire. | | In other words, a single-field indexed DB lookup can be | treated more like a cache request. Though for heavier/un- | indexed queries, your "avoid round trips to the database" | advice certainly applies. | | With this architecture, the internal invalidation request | is just an optimization. It isn't necessary and it doesn't | matter if it fails, since you always check the timestamp | with a strongly consistent DB read before returning a | cached object. | smm11 wrote: | This. Not that I'm all about janky, but my road is littered | with stuff I didn't think would make it through summer, and | everything I check is still ticking 5, 7, 10 years later. | | LONG ago I was amused by a Sun box in a closet that nobody knew | anything about. 
| I heard about the serial label printer that stopped working eight months ago, which was eight months after I shut off the Sun. I brought it back up again late one Friday, and the old/broken label printer magically worked again.
|
| Now my stuff is that.
| uxamanda wrote:
| Ha, now you can use the label printer to label the machine!
| uuyi wrote:
| I would love to see more stuff like that.
|
| An application I have written recently for personal use is a double-entry accounting system, after GnuCash hosed itself and gave me a headache. This is based on Go and SQLite. The entire thing is one file (go embed rocks) and serves a simple HTTP interface with a few JS functions like it is 2002 again. The back end is a proper relational model that is stored in one .db file. It is fully transactional with integrity checks. To run it you just start the program and open a browser. To back up you just copy the .db file. You can run reports straight out of SQLite in a terminal if you want.
|
| This whole concept could scale to tens of users fine for LOB applications and consume little memory or resources.
| powersurge360 wrote:
| Check out alpinejs or stimulusjs and combine it with htmx to get to a SPA-like experience with very little additional complexity! Htmx lets you serve partials over the wire instead of a page load, so you can update the page incrementally, and alpine and stimulus are both tools to add JS sprinkles like you've described in a way that is unobtrusive.
| uuyi wrote:
| I appreciate the notion but my objective was to do the exact opposite of this and keep away from external dependencies and scripts where possible, apart from the solitary go-sqlite3.
|
| The result is about 30K of source (including Go, CSS, HTML templates), which is less than minified alpinejs!
| CraigJPerry wrote:
| >> This whole concept could scale to tens of users
|
| I strongly suspect this approach scales to tens of thousands of users. Maybe 30-40k users would be my guess on a garden-variety Intel i5 desktop from the past 3 years or so.
|
| I say this because that hardware (assuming NVMe storage) will do north of 100k connect + select per second (connect is super cheap in sqlite, you're just opening a local file); assuming 2-3 selects per page serve gets me to the 30-40k number. The HTTP server side won't be the bottleneck unless there's some seriously intensive logic being run.
| uuyi wrote:
| Interesting point. I may have to write a performance test suite for it now and test this.
| ilovecaching wrote:
| The vast, vast, vast majority of organizations don't need micro services, don't need half of the products they bought and now have to integrate into their stack, and are simply looking to shave their yak to meet the bullet list of "best practices" for year 202X. Service-oriented architectures and micro services solve a particular problem for companies that are operating on a massive scale and can invest (read: waste money) on teams devoted to tooling. What most companies should do is build a monolith that makes money, but hire good software engineers that can write packages/modules/whatever with high levels of cohesion and loose coupling, so that _one day_ when you become the next Google, it will be less of a pain to break it into services.
| But in the end it really doesn't matter if it's painful anyway, because you'll have the money to hire an army of people to do it while the original engineers take their stock and head off to early retirement.
| danielvaughn wrote:
| I'd never worked with micro-services before this latest freelance project. I started working with this platform that is basically "note taking but with a bit of AI/ML". So okay, a bit of complexity with the ML stuff, but otherwise a standard CRUD app.
|
| The application itself is a total of 3 pages, encompassing maybe 20 endpoints at the most, with about 100 daily active users. For the backend, some genius decided to build a massive kubernetes stack with 74 unique services, which has been costing said company over $1K/month just in infra costs. It took me literally weeks to get comfortable working on the backend, and so much stuff has broken that I have no idea how to fix it.
|
| Not only that, but the company has never had more than 1 engineer working on it at a time (they're very small even though they've been around a bit). If there were such a thing as developer malpractice, I'd sue whoever built it.
| DerArzt wrote:
| > 3 pages .... 74 unique services
|
| Just, wat.
|
| Sounds like the architect was doing some resume-driven development, cause damn.
| danielvaughn wrote:
| In all honesty I'm _very_ angry about it. It was built a while ago, and the dev is no longer here, but I almost want to track him down and make him help fix it. This founder isn't technical, so he's been leaning on developers for guidance, and this guy basically built him a skyscraper when what he really needed was a shed. It hurts to think about all the time and money that he's poured into just maintaining it. Crazy.
| ammanley wrote:
| _74_ services on a k8 stack for _3_ pages and 100 active daily users?????
|
| This has to be a crime.
| danielvaughn wrote:
| Ok I feel very validated now - I'm not used to microservices so didn't know what was typical. It _felt_ crazy, so good to know based on this comment's responses that it is indeed crazy.
|
| For example, in order to sign up a user... the client hits the /signup endpoint, which first lands on the server-gateway service. Then that is passed along to an account-service which creates the user. Then the accounts-service hits a NATS messaging service twice - one message to send a verification email, and another to create a subscription. The messaging service passes the first message along to the verification-service, which sends out a sendgrid email. Then the second message gets passed along to a subscription-service-worker. The subscription-service-worker adds a job to a queue, which, when it gets processed, hits the actual subscription-service, which sends along a request to Stripe to create the customer record and trial subscription.
|
| 6 services in order to sign up a user, in what could have been done with about 100-300 lines of Node.
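For contrast with the six-service flow danielvaughn describes above, a sketch of the same signup as one handler in a monolith. Flask and SQLite are used here for consistency with the other examples (the commenter suggested Node); the Stripe and SendGrid calls follow their documented Python SDKs, but the price ID, environment variables, and table layout are assumptions.

    # signup.py - the whole signup flow as one request handler (illustrative)
    import os, secrets, sqlite3
    import stripe
    from flask import Flask, request
    from sendgrid import SendGridAPIClient
    from sendgrid.helpers.mail import Mail

    app = Flask(__name__)
    stripe.api_key = os.environ["STRIPE_KEY"]
    PRICE_ID = os.environ["STRIPE_PRICE_ID"]     # assumed config, not from the thread

    def db():
        conn = sqlite3.connect("app.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS users
                        (id INTEGER PRIMARY KEY, email TEXT UNIQUE,
                         verify_token TEXT, stripe_customer TEXT)""")
        return conn

    @app.route("/signup", methods=["POST"])
    def signup():
        email = request.json["email"]
        token = secrets.token_urlsafe(32)

        # 1. create the account
        conn = db()
        cur = conn.execute(
            "INSERT INTO users (email, verify_token) VALUES (?, ?)", (email, token))

        # 2. send the verification email
        SendGridAPIClient(os.environ["SENDGRID_KEY"]).send(Mail(
            from_email="hello@example.com", to_emails=email,
            subject="Verify your account",
            html_content=f"Click to verify: https://example.com/verify/{token}"))

        # 3. create the Stripe customer and trial subscription
        customer = stripe.Customer.create(email=email)
        stripe.Subscription.create(customer=customer.id,
                                   items=[{"price": PRICE_ID}],
                                   trial_period_days=14)

        conn.execute("UPDATE users SET stripe_customer = ? WHERE id = ?",
                     (customer.id, cur.lastrowid))
        conn.commit()
        return {"id": cur.lastrowid}, 201

Error handling, retries, and pushing the two outbound calls onto a small background queue are natural follow-ups, and none of them require a second service.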
| Beltalowda wrote:
| Even the most fanatical microservices proponent will tell you that's just bonkers.
|
| At my very first programming job many years ago I was given a bunch of code written by a string of "previous guys" (mostly interns) over a period of 10 years or more and was told "good luck with it". I was the only developer, with no real technical oversight. It was my first "real" programming job, but I had been programming for many years already (mostly stuff for myself, open source stuff, etc, but never "real" production stuff).
|
| In hindsight, I did some things that were clearly overcomplicated. I had plenty of time, could work on what I wanted, and it was fun to see if I could get the response speed of the webpage down from 100ms to 50ms, so I added a bunch of caching and such that really wasn't needed. Varnish had just been released and I was eager to try it, so I added that too. It was nowhere near the craziness you're describing though, and considering the state of the system when I took things over, things were still massively improved, but I'd definitely do things differently now because none of that was really needed.
|
| Maybe if it had been today instead of 15 years ago I would have gone full microservice, too.
___________________________________________________________________
  (page generated 2022-04-06 23:00 UTC)