[HN Gopher] GoodJob - a Postgres-based ActiveJob back end for Ru...
___________________________________________________________________
GoodJob - a Postgres-based ActiveJob back end for Ruby on Rails

Author : another-dave
Score  : 119 points
Date   : 2020-07-23 16:12 UTC (6 hours ago)

(HTM) web link (island94.org)
(TXT) w3m dump (island94.org)

| samokhvalov wrote:
| To make it scale well, two things are required:
|
| 1. TRUNCATE if cleanup is needed (no DELETEs) + partitioning.
|
| 2. SELECT .. FOR UPDATE SKIP LOCKED or advisory locks.
|
| It's worth learning from Skype's PgQ, developed in the 2000s.
|
| Update: found the work with advisory locks in the code, great.
| rubyn00bie wrote:
| Despite my username, I haven't written any serious Ruby in like
| 5+ years; that is to say, I'm fairly removed from a lot of the
| changes in the Rails world.
|
| Now, pardon me for my ignorance, but... isn't this just like
| Resque back in the day? Or like doesn't ActiveJob just support
| any ol' backend outta the box? What's the real use case for this?
| I'm genuinely curious, not trying to be rude or diminish the work.
| sideofbacon wrote:
| GoodJob author here.
|
| The core reason I wrote it was to see how lean/simple I could
| write a database-backed ActiveJob adapter today. And hope
|
| In practice any adapter should _just work_, but also all the
| other adapters pre-date ActiveJob (Rails 4.2) and
| Concurrent::Ruby (adopted in Rails 5.0 I think). The assumption
| is that by building GoodJob on top of what already exists in
| Rails today, it can be performant (enough) and simple (easy to
| understand, maintain, keep compatible with new versions of
| Rails, etc). That's not a big claim, but maybe it's compelling.
| techscruggs wrote:
| You can implement background processing with less operational
| complexity. This allows you to leverage activejob to get
| something up and running quickly and easily.
|
| Granted, if you start having higher throughput, you want to
| move on from using postgres as your message bus.
| why-el wrote:
| I didn't look into this one yet besides the home page, but I
| always thought Postgres would make a killer backend for
| ActiveJob, leveraging advisory locks for synchronizing jobs and
| such. Resque is Redis backed, but if you don't want to add that
| as a dependency and already have Postgres set up, then this
| would be a great fit.
| thibaut_barrere wrote:
| I believe Resque was fork based (at least initially, it may
| have changed, I haven't looked at it in a while!), while
| GoodJob appears to be multithreaded (like Sidekiq), so this is
| quite different in terms of performance & throughput.
| joemclarke wrote:
| This is awesome, thanks for sharing! Sidekiq works well for me
| but it's good to have an option that doesn't require Redis.
| simplify wrote:
| This is the exact sort of thing I love about the Rails community.
| Always innovating to keep things simple.
| abc_321 wrote:
| this is a test comment
| SkyLinx wrote:
| Would this work with pgbouncer? I'm wondering because of the
| advisory locks.
| sideofbacon wrote:
| GoodJob author here. Excellent observation.
|
| It should work if PgBouncer is using session-level pooling, but
| not transaction-level pooling. I made a ticket to document
| this: https://github.com/bensheldon/good_job/issues/52
|
| I don't have much experience with PgBouncer. What's the
| scale/need when a team would use PgBouncer? That would help me
| prioritize supporting it better.
| shay_ker wrote:
| I'm excited about ActiveJob because, finally, people will be able
| to build background jobs on things that aren't Redis. No hate on
| Redis - I'm a huge fan of it - but sometimes you have stronger
| consistency reqs that Redis does not have, and stronger
| availability reqs that SQS or PG don't have (aka multi-region
| jobs).
|
| If the concurrency in ActiveJob becomes as good as or better than
| Sidekiq, we can really get a lot of interesting stuff out there.
| scruple wrote:
| I used Que [0] in a number of projects in the last few years
| that had stronger consistency demands. It can plug into
| ActiveJob. Those projects are still humming along without
| trouble today.
|
| [0]: https://github.com/que-rb/que
| arcticfox wrote:
| As a fellow Que user, I'm happy to see this line in the
| README:
|
| > For example, GoodJob is currently ~600 lines of code,
| whereas Que is ~1,200 lines,
|
| I've tried to dig into the Que internals before (looking into
| deadlocks) and they were gnarly; skimming through the GoodJob
| internals, it looks a lot more like what I'd hope for.
| nthj wrote:
| Que also has a Golang port [1], which can be a tool to get
| towards The Citadel [2] for performance or dependency-heavy
| jobs without going full SOA
|
| [1] https://github.com/bgentry/que-go
|
| [2] https://m.signalvnoise.com/the-majestic-monolith-can-become-...
| cpursley wrote:
| For a very nice library in the Elixir/Phoenix ecosystem have a
| look at https://github.com/sorentwo/oban
| regulation_d wrote:
| I think one of my biggest annoyances, from a development
| perspective, with Sidekiq is that it typically moves you in the
| direction of running foreman or something like that, which
| makes using pry a pain.
|
| One thing I really like about Phoenix development is that I
| don't have to change my flow to use a REPL while still being
| able to run background jobs.
| dwheeler wrote:
| This looks very promising! I'm currently using active job, so I
| will need to take a look at this. Thanks for letting me know.
| bmn__ wrote:
| Also see: https://news.ycombinator.com/item?id=9576864 Postgres
| Job Queues and Failure by MVCC
| bdcravens wrote:
| In this post, I noticed comparisons with other job libraries, but
| not Sidekiq, the most popular one. Why?
| sideofbacon wrote:
| GoodJob author here.
| The reason is that I don't have any
| production experience with Sidekiq.
|
| My target right now is someone, like myself, who would
| typically spin up a new project with delayed_job or que, and
| provide a better alternative with GoodJob.
|
| Maybe in the future I'll do some more research on understanding
| why someone might switch from Sidekiq/Redis to not-Redis.
| geoffharcourt wrote:
| I think they specifically wanted to compare themselves to queue
| systems that use the database as the storage for queued jobs.
| jbverschoor wrote:
| Hm. Got falsely excited. No listen/notify etc? This really is
| just delayed_job
| techscruggs wrote:
| This looks great. I'd love to remove Redis from my tech stack on
| applications with low throughput background jobs. Sadly, it
| doesn't appear to have a UI.
|
| Resque and Sidekiq are both good enough solutions, but what makes
| them my de facto choice is the robust web UIs that come with
| them.
| block_dagger wrote:
| Keeping your jobs in Redis (like Sidekiq does) seems to be a good
| feature to me. I'm curious how GoodJob compares when it comes to
| UI and queue management.
| another-dave wrote:
| It would be nice to have one less dependency in the tech stack
| but honestly have no complaints with Sidekiq so far.
|
| I'm quite new to Ruby/Rails though -- would be interesting to
| hear from others how they think it stacks up (unfortunately the
| author doesn't compare in the blog post)
| mberning wrote:
| Sidekiq is one of the best pieces of software I have ever used.
| chriszhu12 wrote:
| Another small thing to consider: Adding redis to your stack
| adds a new failure mode. A service I built a while ago was
| running on Heroku where Redis add-ons aren't cheap, and our
| Redis would always be running up against memory limits, which
| would cause our entire service to fail.
|
| Meanwhile, our 250GB Postgres server has been churning along no
| problem.
|
| Also, conceptually, architecture where state = DB, and business
| logic = web server is generally easier to reason about during
| service migrations and such.
|
| If you don't use ActionCable which I think is the other core
| feature that relies on Redis, being able to remove it would've
| saved me a lot of early morning firefighting.
| debaserab2 wrote:
| This is exactly why I stuck with DelayedJob after all of
| these years. The community seemed to flock to Resque or
| Sidekiq, but I always felt my needs were met just fine with
| Delayed Job. I never was interested in adding more
| unnecessary complexity to our dev and staging environments
| for a trade-off I didn't need (there wouldn't be a
| significant performance increase for my needs).
|
| I've seen this pattern happen with a lot of Ruby projects:
| there's a popular Gem that people use that grows over time,
| then someone writes a blog post about why that package isn't
| suitable for their needs due to a design trade-off (often
| introducing a new Gem that is advertised as superior to the
| previous one) and then suddenly the old Gem stagnates,
| causing the maintainer to lose interest and updates stop
| keeping pace with Rails versions. Sometimes the old Gem gets
| deprecated altogether and maintainers stick a big DO NOT USE
| sign at the top of the github README, all but guaranteeing
| there is no further community traction or organization for the
| project.
|
| Meanwhile those of us using the old project chug along with a
| forked Gem that we cobble a bunch of patches into to meet our
| needs because there's no longer a centralized place to
| contribute to anymore. In some ways it's the double-edged
| sword of OSS. Maintainers aren't obligated to stay as
| maintainers of course, but it does get frustrating how
| quickly the community is willing to drop support of things
| that a lot of people are using in production systems.
|
| It definitely makes me reconsider the "don't reinvent the
| wheel" advice that is so adamantly thought of as a common
| sense convention in our craft.
| learc83 wrote:
| That's exactly why I implemented my own queue system. If
| it's something that will only take a few hours to build,
| and I'll need to maintain it long term, I'll always favor
| rolling it myself.
|
| There are very few "feature gems" that I've used in the
| past 15 years doing Rails development that I've been happy
| I used a year later. Development gems on the other hand
| like byebug are usually fine.
| cactus2093 wrote:
| The standard advice used to be that neither Redis nor Postgres
| is very good for using as a queue.
|
| The creator and former maintainer of Redis was up until a few
| years ago discouraging its use as a queue, I think mainly
| because of its lack of durability and high availability at the
| time. He built a prototype Disque[0] to address the issues but
| it never became production ready. The other downside is that
| Redis is in-memory which means the queues have less
| capacity/are more expensive for the same capacity than an
| on-disk solution, but as memory gets cheaper over the years this
| becomes less and less of an issue. The upside is the throughput
| of Redis is very high.
|
| I have personally worked on rails apps using redis-based queues
| like resque (and to a lesser extent sidekiq), and actually
| haven't run into any redis crashes or downtime in years of
| runtime, redis is very solid in general. You can also snapshot
| the redis instance periodically, to limit the number of jobs
| you would lose if it did crash.
|
| In terms of using a primary db like postgres or mysql as a
| queue, I have personally run into issues with this multiple
| times. I would recommend never to do it, except on the smallest
| of side projects.
|
| The issue is that eventually your queues will back up, whether
| it's due to a bug, surge of traffic, or just complex
| interaction of behavior in your app that cascades a ton of jobs
| at once when you run a backfill or something. When your app
| starts to get overloaded it's pretty trivial to increase the
| number of web instances running, so your bottleneck in these
| situations is going to be the db performance. As your queues
| get backed up, your queue workers are running at full speed
| processing jobs nonstop, which puts strain on your DB.
| Additionally, the act of enqueueing and dequeuing a job itself
| also puts strain on the db, so you can easily get into an
| unstable situation where each job that gets added to the queue
| makes every other job take longer.
|
| If you allocate a separate DB instance that is only running
| your queue, that is much safer. Still, a DB like postgres is
| not great at doing constant writes and deletes, it creates
| additional auto-vacuum pressure for instance. But this will
| manifest as just getting worse throughput on the same hardware
| than you could get from a dedicated queue like RabbitMQ, so if
| you're not at large scale it's a fine option.
|
| Edit: And one other thing to add, for a lot of web apps the
| scope of what is needed from a queue these days is a lot less
| now than it was in the past. It used to be, and in large
| enterprise systems it often still is, the case that when people
| talked about a message queue they wanted something to
| facilitate passing messages between many completely separate
| apps. Now most apps just use a REST API for that (or perhaps
| protobufs or graphql or something but still over http). So I
| think historically an additional reason against using a simple
| datastore as a queue was that it didn't have enough features so
| you'd end up re-inventing the wheel with things like brokers,
| fan out and broadcast patterns, at-most-once vs at-least-once
| semantics, etc.
| But here I'm just considering the very limited
| use case of a sidekiq-like queue, for processing jobs in the
| background for a single web app.
|
| tl;dr: Never use your primary DB as a queue. Using a separate
| Postgres instance can work if you over-provision capacity and
| don't need to maximize throughput, and a Redis-based solution
| can work if you don't need high availability and can tolerate
| some messages lost if something goes wrong.
|
| [0] https://github.com/antirez/disque
| debaserab2 wrote:
| > In terms of using a primary db like postgres or mysql as a
| queue, I have personally run into issues with this multiple
| times. I would recommend never to do it, except on the
| smallest of side projects.
|
| Postgres is fine as a queue for medium to large sized
| projects as it depends completely on your workload and what
| sort of performance characteristics you need. Thinking about
| it in terms of project size is the wrong way to evaluate it.
| learc83 wrote:
| That completely depends on the number of items you plan on
| adding to the queue. I've used the primary DB as a queue
| often over the last 10 years for things like sending
| notifications.
|
| Sure there's a point where this doesn't make sense, but for
| the majority of cases there are ways to mitigate the positive
| reinforcement loop you're talking about.
|
| The easiest is to limit the number of workers to some number
| that won't impact DB performance if they are running full
| tilt. You can even use an enum on existing records to
| determine the background job status if you have few enough
| rows (we do this for a table with a few hundred thousand job
| applicants).
|
| I've found that in most cases we want a record that the
| background job was performed, so we were often updating the
| database anyway when a job was complete.
|
| Sure if you are firing off so many events that PG can't keep
| up with writing them, then PG isn't a good option.
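
The two mitigations learc83 describes (a capped worker count and a job-status enum on existing records) can be sketched roughly as below. This is a minimal in-memory illustration, not code from GoodJob or from that commenter: `Applicant`, `MAX_WORKERS`, and `process_pending` are hypothetical names, and a `Mutex` stands in for the row locking a real database would provide.

```ruby
# Hypothetical in-memory stand-in for database rows that carry a
# job-status enum (:pending -> :running -> :done).
Applicant = Struct.new(:id, :job_status)

# Assumed cap on concurrent workers; tune it to what the database tolerates.
MAX_WORKERS = 3

def process_pending(records, max_workers: MAX_WORKERS)
  lock = Mutex.new # stands in for the row locking a real database provides
  processed = []

  workers = Array.new(max_workers) do
    Thread.new do
      loop do
        # Claim one pending record by flipping its status under the lock,
        # so no two workers ever pick up the same record.
        record = lock.synchronize do
          candidate = records.find { |r| r.job_status == :pending }
          candidate.job_status = :running if candidate
          candidate
        end
        break unless record

        # The real work (e.g. sending a notification) would happen here.

        lock.synchronize do
          record.job_status = :done
          processed << record.id
        end
      end
    end
  end

  workers.each(&:join)
  processed
end

applicants = (1..10).map { |i| Applicant.new(i, :pending) }
done_ids = process_pending(applicants)
```

Because the pool is fixed at `MAX_WORKERS` threads, a backed-up queue cannot put more than that many concurrent claims on the datastore, which is the point of the cap.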
| pselbert wrote:
| > Never use your primary DB as a queue. Using a separate
| Postgres instance can work if you over-provision capacity and
| don't need to maximize throughput, and a Redis-based solution
| can work if you don't need high availability and can tolerate
| some messages lost if something goes wrong.
|
| That is a broad generalization that assumes most applications
| are operating at mega scale. The benefits of simplified
| dependencies (a single database instance), transactional
| guarantees (a single database instance) and persistence (not
| using Redis) far outweigh the eventual possibility that the
| queue will place too large a load on your database.
|
| As the author of Oban[0] (a PG-backed persistent queue in
| Elixir) I'm definitely biased. However, the level of adoption
| in the Elixir community seems to signal that a lot of
| companies favor simplicity and safety over a possible scale
| issue down the road. The primary application I work on
| processes ~500k-1m jobs a day and the queue overhead is
| virtually invisible.
|
| [0] https://github.com/sorentwo/oban
| jordic wrote:
| There's also another benefit to using PG as a queue:
| you can just publish your messages from within the same
| transaction you are using on the request, and that's nice,
| all or nothing..
|
| I will dig into your Elixir queue. I just built my own some
| days ago (mostly for fun) but also to solve some
| limitations in rabbit (mostly time based scheduling at
| short periods of time.. and control of the throughput, to
| solve some rate limits).
|
| Mine is here, https://github.com/vinissimus/jobs
|
| The main feature is that it's built with pl/pgsql and
| integrates well with the rest of the server backend
| (publishing jobs from triggers...). Also listening on
| results with pg_notify
| cactus2093 wrote:
| You actually bring up another point that in my mind is yet
| another argument against using your primary DB as a queue,
| so I'm sure I am also biased from being burned in the past
| :)
|
| But in a past company I worked at, the company started out
| thinking sure just throw the queue in the primary db for
| simplicity. Eventually our slow query logs and db
| performance monitoring tools were showing that ~40% of the
| db load was due to the queue inserts and queries. It may be
| that it was doing something incredibly inefficient and
| unnecessary in the particular library we were using but we
| did look into it pretty thoroughly, this was a few years
| ago though so I don't recall all of the details. And that
| was at normal operation, then we ran into an issue that
| basically brought our site down when queues backed up.
|
| At that point it was definitely time to split out the
| queue, and when we did it we realized that we had
| implicitly been depending on transactional consistency
| between the queue and the app data in a few places, which
| was then extra work to track down and fix these types of
| issues. This is IMO a code smell as well in general - your
| data and your infrastructure should ideally not be so
| tightly coupled.
|
| Managed databases are so easy to set up these days, I would
| still definitely recommend a separate instance for the
| queue vs the primary db from day 1 in any new app I build.
| If you do want to combine them to save money on infra, use
| a separate logical db and separate connection pool and
| everything so that it's easy to split out in the future.
| paulryanrogers wrote:
| Agree that separate is more scalable. Still it also
| distributes the state, and that can be surprising if not
| planned for.
| simplify wrote:
| Agreed, Sidekiq is great and I don't imagine anyone switching
| away from it for existing projects.
|
| But I can see GoodJob being better for spinning up new
| projects, as there are fewer moving parts to worry about, and
| it's easy to upgrade to Sidekiq, etc. when you need to (thanks
| to ActiveJob's standard abstraction).
| dyeje wrote:
| Yea I agree, a comparison to sidekiq would be helpful in
| evaluating and was what I was expecting to be covered on the
| page.
| dorianmariefr wrote:
| i'm not sure why i should use this instead of delayed job for
| instance
| thibaut_barrere wrote:
| delayed_job hasn't seen a release in 1 year, if that matters to
| you https://rubygems.org/gems/delayed_job/
| pqdbr wrote:
| If you're using delayedjob, I strongly suggest you check out
| the Performance table in Sidekiq's readme. 465 seconds for
| DelayedJob against 14 seconds for Sidekiq for the same
| workload.
|
| It's 7100 jobs/sec vs. 215 jobs/sec.
|
| I wonder how GoodJob will compare.
|
| https://github.com/mperham/sidekiq#performance
| mbell wrote:
| That benchmark is just showing job overhead. That Sidekiq can
| do nothing 7100 times a second only matters when overhead is
| significant, i.e. when your jobs do almost nothing.
| mhoad wrote:
| I feel like Rails is really starting to see some new life that
| has felt a bit absent from it for a while now. It almost seems
| like there is a new generation of people extremely burned out
| from the constant churn of the "JS-all-the-things" movement that
| just took over most of web development for the past several
| years.
|
| There is a lot of exciting stuff landing in Rails and its
| surrounding ecosystems at the moment and I think it's still
| probably the best choice for many new SAAS / web app based
| companies and startups.
| WrtCdEvrydy wrote:
| Rails was always great, JS just paid a lot more...
|
| It felt great to get back into it having people pay to migrate
| their Rails apps to React and Node.
| gav wrote:
| It's funny you say that, I've seen people pick a tech stack
| based on developer cost.
|
| Maybe five years back I had a client in NYC who had a
| predominantly Java stack, but due to internal politics,
| wouldn't pay more than 135k for a senior developer when the
| market was wanting at least 25-30k more than that. Everyone
| they hired was pretty terrible.
|
| At the time, Node developers were cheaper, so we split the
| application in two, wrapped all the old Java code with APIs,
| which was then able to be maintained by a small team while
| the front-end and any new features were all re-written with
| Node. It was architecturally a better system--but the driving
| factor was really developer cost (the business hated to make
| any investments in technology).
|
| The problem was that Node got popular and it got harder and
| harder to hire good Node developers at what the company
| wanted to pay. They had a big re-org and fired everyone I
| knew, so I don't know what they are doing now, but I often
| wonder if they just repeated the process with whatever
| up-and-coming language was more hip.
| WrtCdEvrydy wrote:
| > I've seen people pick a tech stack based on developer
| cost
|
| You'd be surprised how often cost of developers comes into
| the overall conversation especially when you have tech
| sprawl and have to spend to get qualified people.
|
| I've seen projects green-light just on the premise that
| removing legacy code will lead to a payroll reduction.
| techscruggs wrote:
| What are some of the other exciting things happening in the
| Rails world these days?
| czbond wrote:
| Thanks for asking - was curious the same. Always loved Rails,
| but then got into the "JS" toolset - which has been a cluster
| of extra required work.
| mhoad wrote:
| There's a new release about to drop which is Rails 6.1 that
| covers some interesting developments.
|
| GitHub took all of their code associated with running "web
| scale" applications that need to talk to multiple databases
| at the same time and put that into a dead simple native Rails
| framework.
|
| Basecamp have spent the last couple of years (I think?)
| working in secret on what they claim is an entirely new way
| of doing all things front end. There aren't a LOT of details
| on that at the moment and it's kind of built up as a big
| surprise. But everyone who has had a behind the scenes look
| seems unusually excited.
|
| There's a kind of interesting web based interface for
| describing and developing all kinds of boilerplate parts of
| your application that could really speed up development time
| https://github.com/rails/rails/pull/35489
|
| Some cool security stuff like this
| https://twitter.com/dhh/status/1268236728134889475?s=21 and
| promoting things like WebAuthn to be first class citizens.
|
| But basically at this stage you have 3 major contributors of
| Shopify, Basecamp / Hey and Github extracting huge parts of
| their internal systems and rewriting them as mini Rails
| frameworks with famously simple APIs for people to use.
|
| That's obviously in addition to everyone / everything else. I
| think the future looks bright.
| TheRealDunkirk wrote:
| > an entirely new way of doing all things front end.
|
| Are you referring to ViewComponent:
| https://github.com/github/view_component? As someone who's
| happily been continuing to make Rails apps for the past few
| years (and avoiding Angular like the plague after being
| burned badly with it), this seems really intriguing, but
| I'm waiting on the 6.1 release before I try it.
| mhoad wrote:
| Nah this is entirely separate from that although I think
| View Component (also from GitHub in case anyone is
| wondering) is super exciting.
|
| They are basically doing a major rewrite of Turbolinks
| and Stimulus I think. If you go to app.hey.com and jump
| into dev tools you can kind of start to piece things
| together somewhat since they have made all of the source
| maps available.
| But I'm yet to see too many people really
| figure out what is happening and I'm excited for the
| official announcement.
|
| I think the idea that Rails has somewhat lagged behind
| the rest of the front end world is in many ways an
| overblown talking point but is at the same time in no way
| an unfair criticism.
|
| I've heard the core team make several references over the
| last year that this new approach is basically to front
| end what Rails was to backend when it launched in terms
| of developer experience while keeping 80-90% of the
| performance benefits of SPAs but with 0% of the drama. Or
| at least this is how I understand them to be promoting
| it.
| thibaut_barrere wrote:
| Here are a few interesting pieces in Rails specifically:
|
| - https://github.com/hopsoft/stimulus_reflex is inspired by
| Phoenix LiveView and is getting a bit of traction around me
|
| - https://github.com/hopsoft/cable_ready is a companion
| project
|
| - https://github.com/discourse/message_bus by Sam Saffron
| from Discourse is a nice way to implement live updates too,
| quite easily (video demo at
| https://twitter.com/thibaut_barrere/status/12565974431075860...)
|
| - https://lamby.custominktech.com is a Rails + AWS Lambda
| integration which is also gaining a bit of traction
|
| I also see interesting stuff in Ruby more generally these
| days:
|
| - my own https://www.kiba-etl.org (data processing framework)
| is growing nicely
|
| - https://sidekiq.org is very solid and used in most Rails
| apps I've seen
|
| - https://github.com/contribsys/faktory allows interop-jobs
| (e.g. create from Ruby, consume from something else), which
| is also interesting
|
| - https://github.com/oracle/truffleruby is making very good
| progress
|
| Just a few cherry-picked links, but I definitely think there
| are some nice evolutions going on in the Ruby world (speaking
| as someone also using Elixir in production!).
| keesj wrote:
| In addition to that Hey.com[1] was recently released which
| is a new app built by Basecamp, where Ruby on Rails
| originated.
|
| This new app comes with many new technologies. Some of
| which are already extracted into Rails (ActionText and
| ActionMailbox), and some that will be (completely new
| approach to frontend with an all new Turbolinks).
|
| At the same time Stimulus is getting mature.
|
| All the excitement around this is motivating people to start
| creating helpful Rails resources[2], which in turn leads to
| more excitement.
|
| [1] https://hey.com
|
| [2] https://twitter.com/marckohlbrugge/status/127174984488606105...
| Scarbutt wrote:
| They seem to be recreating turbolinks every couple of
| years heh.
___________________________________________________________________
(page generated 2020-07-23 23:00 UTC)