[HN Gopher] Asynchronous Task Scheduling at Dropbox
       ___________________________________________________________________
        
       Asynchronous Task Scheduling at Dropbox
        
       Author : pimterry
       Score  : 54 points
       Date   : 2020-11-11 19:26 UTC (3 hours ago)
        
 (HTM) web link (dropbox.tech)
        
       | ryanworl wrote:
       | "To avoid this situation, there is a termination logic in the
       | Executor processes whereby an Executor process terminates itself
       | as soon as three consecutive heartbeat calls fail. Each heartbeat
       | timeout is large enough to eclipse three consecutive heartbeat
       | failures. This ensures that the Store Consumer cannot pull such
       | tasks before the termination logic ends them--the second method
       | that helps achieve this guarantee."
       | 
       | Neither this nor the first method guarantees a lack of concurrent
       | execution. A long GC pause or VM migration after the second check
       | could allow the job to get rescheduled due to timeout. The first
       | worker could resume thinking it still had one heartbeat left to
       | execute before giving up on the job and it could've already been
       | handed out to another worker in the meantime.
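       | 
       | A minimal sketch of that race, assuming a lease-style store and a
       | manually advanced clock. FakeClock, LeaseStore and Worker are
       | hypothetical names for illustration, not anything from the
       | article:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeClock:
    """Manually advanced clock so the race is deterministic (assumption)."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class LeaseStore:
    """Hypothetical store: a task can be re-claimed once its lease expires."""
    clock: FakeClock
    lease_timeout: float
    owner: Optional[str] = None
    expires_at: float = 0.0

    def claim(self, worker: str) -> bool:
        if self.owner is None or self.clock.now >= self.expires_at:
            self.owner = worker
            self.expires_at = self.clock.now + self.lease_timeout
            return True
        return False

    def heartbeat(self, worker: str) -> bool:
        if self.owner == worker and self.clock.now < self.expires_at:
            self.expires_at = self.clock.now + self.lease_timeout
            return True
        return False


@dataclass
class Worker:
    name: str
    failed_heartbeats: int = 0
    max_failures: int = 3

    def believes_it_owns_task(self) -> bool:
        # Local view only: the counter cannot advance while the process is paused.
        return self.failed_heartbeats < self.max_failures


def simulate_pause_race() -> Tuple[bool, bool, Optional[str]]:
    clock = FakeClock()
    store = LeaseStore(clock=clock, lease_timeout=30.0)
    a, b = Worker("A"), Worker("B")

    store.claim(a.name)
    clock.advance(10)
    store.heartbeat(a.name)  # A's last successful heartbeat

    # A stalls (GC pause, VM migration) for longer than the lease timeout.
    clock.advance(45)

    # The lease has expired, so the task is handed to B.
    b_claimed = store.claim(b.name)

    # A resumes with zero recorded failures and still thinks the task is its own.
    return a.believes_it_owns_task(), b_claimed, store.owner
```

       | Here simulate_pause_race() reports that A still believes it owns
       | the task while the store has already handed it to B.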
        
       | richardARPANET wrote:
       | Why didn't they just use Celery?
        
       | newfeatureok wrote:
       | Why is it that this task scheduling problem appears so often? Why
       | hasn't this problem been "solved" in the same way sorting strings
       | has?
       | 
       | I understand companies have different requirements, but if you
       | look at the history even on Hacker News this problem is basically
       | being resolved by different companies at least once a quarter.
        
         | mfateev wrote:
         | Because it is a hard problem to solve holistically.
         | 
         | It looks simple on the surface. So almost any company ends up
         | creating an implementation similar to the one described in the
         | article. Then it learns that it is much harder than it looks, but
         | it is usually too late. So they end up maintaining it for a
         | long time with the original team long gone.
         | 
         | BTW I believe that temporal.io (I'm tech lead of the project)
         | is so far the best open source solution to this problem.
        
       | Thaxll wrote:
       | If you work in Go and need to work on a similar problem, I highly
       | recommend: https://cadenceworkflow.io/
        
         | mfateev wrote:
         | It is not Go specific. It already supports Go, Java and
         | Ruby.
         | 
         | temporal.io, our fork of Cadence, will have PHP support very
         | soon. Support for other languages is coming. Python and
         | Typescript are the highest priority.
        
           | swyx wrote:
           | Clickable link: http://temporal.io/
           | 
           | i worked on this site btw - would be happy to receive
           | feedback on the site, particularly if any wording was
           | confusing or unclear!
        
       | rkagerer wrote:
       | I wish folks building Task frameworks would provide a standard
       | mechanism for tasks to signal their progress. I realize it might
       | not be relevant here but I've noticed this gap in more general
       | frameworks as well.
        
       | jeffbee wrote:
       | If you have the opportunity, _please_ do not build it like this.
       | Referring to the architectural diagram, it is going to be much
       | more efficient for the "Frontend" to persist the task data into
       | a durable data store, like they show, but then the Frontend
       | should simply directly call the "Store Consumer" with the task
       | data in an RPC payload. There is _no_ reason in the main
       | execution path why the store consumers should ever need to read
       | from the database, because almost all tasks can cut-through
       | immediately and be retired. Reading from the database should only
       | need to happen due to restarts and retries of tasks that fail to
       | cut through.
       | 
       | Disclaimer/claim: I worked on this system and on gmail delivery.
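       | 
       | A rough sketch of that cut-through path, assuming an in-memory
       | stand-in for the database and a plain method call in place of
       | the RPC. TaskStore, StoreConsumer.run and recover are
       | hypothetical names, not the article's API:

```python
from typing import Callable, Dict, List, Tuple


class TaskStore:
    """In-memory stand-in for the durable task database (hypothetical)."""

    def __init__(self) -> None:
        self.pending: Dict[str, dict] = {}
        self.reads = 0  # how many times a consumer had to read from the store

    def persist(self, task_id: str, payload: dict) -> None:
        self.pending[task_id] = payload

    def retire(self, task_id: str) -> None:
        self.pending.pop(task_id, None)

    def scan_pending(self) -> List[Tuple[str, dict]]:
        self.reads += 1
        return list(self.pending.items())


class StoreConsumer:
    def __init__(self, store: TaskStore, handler: Callable[[dict], None]) -> None:
        self.store = store
        self.handler = handler

    def run(self, task_id: str, payload: dict) -> bool:
        """Execute a task handed over directly; retire it only on success."""
        try:
            self.handler(payload)
        except Exception:
            return False  # task stays pending in the store for a later retry
        self.store.retire(task_id)
        return True

    def recover(self) -> int:
        """Slow path: read the store for tasks that failed to cut through."""
        return sum(self.run(tid, p) for tid, p in self.store.scan_pending())


class Frontend:
    def __init__(self, store: TaskStore, consumer: StoreConsumer) -> None:
        self.store = store
        self.consumer = consumer

    def submit(self, task_id: str, payload: dict) -> None:
        self.store.persist(task_id, payload)  # durable before acknowledging
        self.consumer.run(task_id, payload)   # cut-through (an RPC in practice)
```

       | On the happy path the consumer never calls scan_pending();
       | only a task whose handler raised is picked up later by recover().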
        
         | mrfox321 wrote:
         | In other words, the frontend gets some ACK from the DB before
         | calling the "Store Consumer"? (just trying to make sure I
         | understand your critique of the design)
        
           | jeffbee wrote:
           | Well it doesn't necessarily need to happen in that order, I
           | think. Frontend needs to ensure that the task is durably
           | stored before it acknowledges the end of the operation to its
           | caller.
           | 
           | Using email as an analogy, you have to commit the message to
           | durable storage before you respond 250 to DATA.
        
         | neolog wrote:
         | If you do it that way, how do you make sure the task gets
         | completed successfully and exactly once?
        
           | jeffbee wrote:
           | I wouldn't. Exactly-once is a fool's quest, and the scheme in
           | this article does not offer it.
           | 
           | To achieve at-least-once, you need only track which tasks
           | have been successfully retired, and persist that knowledge in
           | the database by either deleting or mutating the task. During
           | a cold start you scan the persistent store to find tasks that
           | were still pending/live at the time your process began.
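           | 
           | A small sketch of that bookkeeping, assuming SQLite as the
           | persistent store and a "done" flag as the mutation. The
           | table layout and function names are made up for
           | illustration:

```python
import sqlite3
from typing import Callable, List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id TEXT PRIMARY KEY, payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, task_id: str, payload: str) -> None:
    conn.execute("INSERT INTO tasks (id, payload) VALUES (?, ?)", (task_id, payload))
    conn.commit()


def retire(conn: sqlite3.Connection, task_id: str) -> None:
    # Mutate the row to record that the task finished successfully.
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()


def pending(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    return conn.execute(
        "SELECT id, payload FROM tasks WHERE done = 0 ORDER BY id"
    ).fetchall()


def cold_start(conn: sqlite3.Connection, handler: Callable[[str], None]) -> None:
    """Re-run every task that was still pending when the process began."""
    for task_id, payload in pending(conn):
        handler(payload)
        retire(conn, task_id)


def demo() -> Tuple[List[str], List[Tuple[str, str]]]:
    conn = open_store()  # same in-memory connection stands in for a restart
    executed: List[str] = []
    enqueue(conn, "t1", "send-email")

    # First attempt: the side effect happens, then the process dies
    # before retire() is called.
    executed.append("send-email")

    # After the restart the task is still pending, so it runs again.
    cold_start(conn, executed.append)
    return executed, pending(conn)
```

           | In demo() the task runs twice and ends up retired, which is
           | at-least-once rather than exactly-once, so handlers need to
           | tolerate duplicates.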
        
       | imstil3earning wrote:
       | What does this solve that something like Rabbitmq doesn't? Am I
       | missing some key points?
        
         | Thaxll wrote:
         | Rabbitmq does not solve this problem. Rabbitmq offers a solution
         | for message passing and that's it; it does not offer a framework
         | to execute tasks etc ...
        
         | aeyes wrote:
         | Everything listed under "Features", "System guarantees" and
         | "Lambda requirements"?
         | 
         | Since Dropbox uses Python, the real question is what Celery
         | didn't solve for them. My guess would be scalability.
        
           | solumos wrote:
           | It seems that Nextdoor also had issues with celery[0].
           | 
           | "Scalability" is a great scapegoat for making dubious
           | decisions, but my guess here would be the "task priority"
           | requirement.
           | 
           | [0] https://engblog.nextdoor.com/nextdoor-taskworker-simple-
           | effi...
        
       | sna1l wrote:
       | Why not use something like Cadence/Temporal/Amazon Workflow
       | Service?
        
         | stunt wrote:
         | Flyte is also in the same space.
         | 
         | https://github.com/lyft/flyte
        
           | sna1l wrote:
           | Yeah lots of different options here, mostly surprised they
           | didn't talk about why they didn't choose any of the
           | alternatives.
        
       | jonpurdy wrote:
       | I don't mean to take away from the article, but it makes me sad
       | to see such awesome people building and writing about really cool
       | bespoke solutions. It's obvious that Arun knows their stuff and
       | is able to communicate it clearly.
       | 
       | The sad thing is that Dropbox Product has so heavily dropped the
       | ball that users like myself (from back in 2009) have switched
       | away in droves over the past few years.
       | 
       | I understand that Dropbox core functionality wouldn't have been
       | enough to multiply the valuation of the company to what investors
       | expected. But it would have been nice to not jam collaboration
       | features into the product and mess up the simple, platform-native
       | UI with its current abomination. I'd pay $10/mo forever if I
       | could get the 2010-esque Dropbox Mac client and sync service back
       | since it's way better than anything else (especially iCloud).
        
         | rkagerer wrote:
         | Dropbox customer here, agree wholeheartedly.
         | 
         | They've really gone downhill by adding unwanted bloat, and it
         | just seems to be accelerating. Meanwhile their core product is
         | degrading. Abandonment of the Public folder in spite of a huge
         | outcry from customers was disappointing. The user experience is
         | plastered with advertising to try their other products, even if
         | you turn off all the relevant notification settings. And lately
         | I've been running into subtle functionality bugs in the client.
         | 
         | Would happily give my money to a competitor focused on a lean,
         | reliable product.
        
         | donor20 wrote:
         | This - we are a business user of Dropbox. On Windows the task
         | tray is a mess, and the collab / editing / paper features are so
         | annoying. Sync I think is still OK if you can ignore everything
         | on the website.
         | 
         | I do wish you could PAY for a basic version (maybe make the
         | collab stuff free as part of some trial or something).
        
         | svara wrote:
         | I don't know, I think it's pretty great. Can't live without it.
         | 
         | The client's UI is a bit odd, but at the end of the day it's
         | really good at what it's supposed to do: Syncing files.
         | 
         | Performance is also great: I'm using multiple machines to write
         | code on, and I keep my local git repo on Dropbox. I can
         | literally save a change on my notebook and run it on some other
         | machine 3 seconds later.
         | 
         | On Mac and Linux you might want to check out maestral
         | (https://github.com/SamSchott/maestral), a third-party client
         | that works really well.
        
           | secondcoming wrote:
           | Does Dropbox use proper filesystems? I considered using
           | Amazon's S3 to host a repo but apparently it may not work
           | properly since it's not a 'proper' file system
        
         | draw_down wrote:
         | Come on
        
         | Osiris wrote:
         | I can understand the need for a company to be constantly trying
         | to add value to their product, but that tendency to be changing
         | so much can easily cause you to lose sight of what made you
         | popular in the first place.
         | 
         | I use Dropbox personally to keep documents synced between my
         | computer and my wife's and also to grab documents I need from
         | the web if I'm on another computer. I occasionally share a
         | folder if I need to give a large number of files to someone.
         | 
         | I recently had a notification come up on the dropbox taskbar
         | icon and it popped up this huge window that looked like a
         | massive electron app. In the old days, there wasn't even a UI,
         | just a context menu that also showed the state of the sync.
         | 
         | For me, Dropbox provides the most benefit when it's not
         | visible, running invisibly in the background doing its thing.
        
       | staticassertion wrote:
       | Error: 4xx Error (4xx) We can't find the page you're looking for.
       | Check out our Help center and forums for help, or head back to
       | home.
       | 
       | Getting this when I click the link.
        
       ___________________________________________________________________
       (page generated 2020-11-11 23:01 UTC)