[HN Gopher] Rethinking serverless with FLAME
___________________________________________________________________

Rethinking serverless with FLAME

Author : kiwicopple
Score : 304 points
Date : 2023-12-06 12:03 UTC (10 hours ago)

(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)

| 8organicbits wrote:
| > With FLAME, your dev and test runners simply run on the local backend.
|
| Serverless with a good local dev story. Nice!
| victorbjorklund wrote:
| Totally. One reason I don't like serverless is because the local dev exp is so much worse compared to running a monolith.
| sergiomattei wrote:
| This is incredible. Great work.
| amatheus wrote:
| > Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.
|
| That's interesting, sounds like what fork does but for serverless. Great work
| abdellah123 wrote:
| Wow, this is amazing. Great work.
|
| One could really spin up a whole Hetzner/OVH server and create a KVM for the workload on the fly!!
| MoOmer wrote:
| WELL, considering the time delay in provisioning on Hetzner/OVH, maybe Equinix Metal would work better? But, if you're provisioning + maybe running some configuration, and speed is a concern, probably using Fly or Hetzner Cloud, etc. still makes sense.
| chrismccord wrote:
| Author here. I'm excited to get this out and happy to answer any questions. Hopefully I sufficiently nerd sniped some folks to implement the FLAME pattern in js, go, and other langs :)
| ryanjshaw wrote:
| This looks great. Hopefully Microsoft are paying attention because Azure Functions are way too complicated to secure and deploy, and have weird assumptions about what kind of code you want to run.
| bbkane wrote:
| I had a lot of problems trying to set up Azure Functions with Terraform a couple of years ago. Wonder if it's gotten better?
|
| https://www.bbkane.com/blog/azure-functions-with-terraform/
| orochimaaru wrote:
| I used them with Python. Simple enough but opinionated. I didn't play around with durable functions.
|
| Don't have strong feelings there. It worked. I did have some issues with upgrading the functions but found the workarounds.
| bob1029 wrote:
| > weird assumptions about what kind of code you want to run
|
| Those "weird assumptions" are what makes the experience wonderful for the happy path. If you use the C#/v4 model, I can't imagine you'd have a hard time. Azure even sets up the CI/CD for you automatically if your functions are hosted in GitHub.
|
| If your functions need to talk to SQL, you should be using Managed Identity authentication between these resources. We don't have any shared secrets in our connection strings today. We use Microsoft Auth to authenticate access to our HttpTrigger functions. We take a dep on IClaimsPrincipal right in the request and everything we need to know about the user's claims is trivially available.
|
| I have zero experience using Azure Functions outside of the walled garden. If you are trying to deploy python or rust to Az Functions, I can imagine things wouldn't be as smooth. Especially, as you get into things like tracing, Application Insights, etc.
|
| I feel like you should only use Microsoft tech if you intend to drink a large amount of their koolaid. The moment you start using their tooling with non C#/.NET stacks, things go a bit sideways.
| You might be better off in a different cloud if you want to use their FaaS runners in a more "open" way. If you _can_ figure out how to dose yourself appropriately with M$ tech, I'd argue the dev experience is unbeatable.
|
| Much of the Microsoft hate looks to me like a stick-in-bike-wheels meme. You can't dunk on the experience until you've tried the one the chef actually intended. Dissecting your burger and only eating a bit of the lettuce is not a thorough review of the cuisine on offer.
| jorams wrote:
| > You can't dunk on the experience until you've tried the one the chef actually intended. Dissecting your burger and only eating a bit of the lettuce is not a thorough review of the cuisine on offer.
|
| But Microsoft isn't selling burgers that people are taking a bit of lettuce from. They're selling lettuce, and if that lettuce sucks in any context that isn't the burger that they're _also_ selling, then complaining about the quality of their lettuce is valid.
| jabradoodle wrote:
| A cloud vendor where using some of the most popular languages in the world makes your life harder is a genuine reason to dislike something.
| kapilvt wrote:
| Azure Functions don't fit the common definition of serverless. I've had a few convos with them over several years, but there is a real mismatch owing to the product's origins at Azure and a real lack of understanding of the space: it was built on top of Web Apps, i.e. a hack to enter the serverless market at its origin. How many websites do you need to run? You can't run more than 50 functions, and there's the 16-cell table of different runtime options (i.e. provision servers for your serverless). Consumption is better, but the Web Apps origin means it's just a different product - hey, every function has a URL by default :shrug:. Azure needs a radical rethink of what serverless is, and I haven't seen any evidence they got the memo. In AWS, Lambda originated out of S3, re bringing compute to storage.
| danielskogly wrote:
| Great article and video, and very exciting concept! Looking forward to a JS implementation, but that looks like a challenge to get done.
|
| And now I feel (a tiny bit) bad for sniping ffmpeg.fly.dev :)
| tlivolsi wrote:
| On an unrelated note, what syntax highlighting theme did you use for the code? I love it.
| willsmith72 wrote:
| Pretty cool idea, and that api is awesome.
|
| > CPU bound work like video transcoding can quickly bring our entire service to a halt in production
|
| Couldn't you just autoscale your app based on cpu though?
| quaunaut wrote:
| Yes and no: Maybe the rest of your workloads don't require much CPU - you only need this kind of power for one or two workloads, and you don't want them getting crowded out by other work potentially.
|
| Or they require a GPU.
|
| Or your core service only needs 1-2 servers, but you need to scale up to dozens/hundreds/thousands on demand, for work that only happens maybe once a day.
| willsmith72 wrote:
| fair enough.
|
| i think it's cool tech, but none of those things are "hair on fire" problems for me. i'm sure they are for some people.
| chrismccord wrote:
| Thanks! I try to address this thought in the opening. The issue with this approach is you are scaling at the wrong level of operation. You're scaling your entire app, ie webserver, in order to service specific hot operations. Instead what we want (and often reach for FaaS for) is _granular_ elastic scale. The idea here is we can do this kind of granular scale for our existing app code rather than smashing the webserver/worker scale buttons and hoping for the best. Make sense?
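To make that concrete, here is a minimal sketch of the pattern being described, modeled on the thumbnail example from the post. The pool name MyApp.FFMpegRunner, the Videos module, the video.url field, and the exact ffmpeg arguments are illustrative stand-ins, not code from the article:

      defmodule MyApp.Videos do
        # Existing app code is wrapped, not rewritten. FLAME.call/2 ships the
        # closure (capturing `video` and `interval`) to a temporary copy of
        # the whole app, runs it there, and returns the result to the caller.
        def generate_thumbnails(%Video{} = video, interval) do
          FLAME.call(MyApp.FFMpegRunner, fn ->
            tmp_dir = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
            File.mkdir_p!(tmp_dir)

            # Shell out to ffmpeg; pattern-match asserts a zero exit status.
            {_output, 0} =
              System.cmd("ffmpeg", [
                "-i", video.url,
                "-vf", "fps=1/#{interval}",
                Path.join(tmp_dir, "%02d.png")
              ])

            File.ls!(tmp_dir)
          end)
        end
      end

Per the post, dev and test use the local backend, so the same call simply runs in-process there.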
| stuartaxelowen wrote:
| If you autoscale based on CPU consumption, doesn't the macro level scaling achieve the same thing? Is the worry scaling small scale services where marginal scaling is a higher multiple, e.g. waste from unused capacity?
| sofixa wrote:
| Very interesting concept, however it's a bit soured by the fact that container-based FaaS is never mentioned, and it removes a decent chunk of the negatives around FaaS. Yeah, you still need to deal with the communication layer (probably with managed services such as SQS or Pub/Sub), but there's no proprietary runtime needed, no rewrites needed between local/remote runtime environments.
| willsmith72 wrote:
| what are some examples of container-based faas? like you put your docker image onto lambda?
| sofixa wrote:
| * Google Cloud Run - https://cloud.google.com/run/docs/deploying#command-line
|
| * OpenFaaS - https://www.openfaas.com/blog/porting-existing-containers-to...
|
| * AWS Lambda - https://docs.aws.amazon.com/prescriptive-guidance/latest/pat...
|
| * Scaleway Serverless Containers - https://www.scaleway.com/en/serverless-containers/
|
| * Azure Container Instances - https://learn.microsoft.com/en-us/azure/container-instances/...
|
| Probably others too, those are just the ones I know off the top of my head. I see very little reason to use traditional Function-based FaaS, which forces you into a special, locked-in framework, instead of using containers that work everywhere.
| willsmith72 wrote:
| ok yeah so like an image on lambda, totally agree, a lot of the pros of serverless without a lot of the cons
| dprotaso wrote:
| https://knative.dev/ - (CloudRun API is based on this OSS project)
| chrismccord wrote:
| Bring-your-own-container is certainly better than proprietary js runtimes, but as you said it carries every other negative I talk about in the post. You get to run your language of choice, but you're still doing all the nonsense. And you need to reach for the mound of proprietary services to actually ship features. This doesn't move the needle for me, but I would be happy to have it if forced to use FaaS.
| agundy wrote:
| Looks like a great integrated take on carving out serverless work. Curious to see how it handles the server parts of serverless like environment variables, db connection counts, etc.
|
| One potential gotcha I'm curious about is whether there's a good story for guarding against code that depends on other processes in the local supervision tree. I'm assuming since it's talking about Ecto inserts it brings over and starts the whole app's supervision tree on the function executor, but that may or may not be desired for various reasons.
| chrismccord wrote:
| It starts your whole app, including the whole supervision tree, but you can turn on/off services based on whatever logic you want. I talk a bit about this in the screencast. For example, no need to start the phoenix endpoint (webserver) since we aren't serving web traffic. For the DB pool, you'd set a lower pool size or single connection in your runtime configuration based on the presence of a FLAME parent or not.
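Presumably that runtime check looks something like the sketch below; `FLAME.Parent.get/0` is the check the FLAME library documents for detecting a runner node, while the app and module names here are made up:

      # config/runtime.exs -- a sketch, not code from the post.
      if FLAME.Parent.get() do
        # This copy of the app was booted by a FLAME parent: use a single
        # DB connection and don't serve web traffic.
        config :my_app, MyApp.Repo, pool_size: 1
        config :my_app, MyAppWeb.Endpoint, server: false
      end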
| agundy wrote:
| Oh cool! Thanks for the reply, haven't had time to watch the screencast yet. Looking forward to it.
| OJFord wrote:
| This is one reason I really don't like US headline casing as enforced by HN - it looks like Serverless, as in the capital-S company, serverless.com, is what's being rethought, not the small-s principle.
|
| (Aside: I _wish_ someone would rethink Serverless, heh.)
| davidjfelix wrote:
| This is a very neat approach and I agree with the premise that we need a framework that unifies some of the architecture of cloud - shuttle.rs has some thoughts here. I do take issue with this framing:
|
| - Trigger the lambda via HTTP endpoint, S3, or API gateway ($)
|   * Pretending that starting a fly machine doesn't cost the same as triggering via s3 seems disingenuous.
|
| - Write the bespoke lambda to transcode the video ($)
|   * In go this would be about as difficult as flame -- you'd have to build a different entrypoint that would be 1 line of code but it could be the same codebase. Node it would depend on bundling but in theory you could do the same -- it's just a promise that takes an S3 event, that doesn't seem much different.
|
| - Place the thumbnail results into SQS ($)
|   * I wouldn't do this at all. There's no reason the results need to be queued. Put them in a deterministically named s3 bucket where they'll live and be served from. Period.
|
| - Write the SQS consumer in our app (dev $)
|   * Again -- this is totally unnecessary. Your application *should forget* it dispatched work. That's the point of dispatching it. If you need subscribers to notice it or do some additional work I'd do it differently rather than chaining lambdas.
|
| - Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)
|   * Your lambda really should be doing the DB work, not your main application. If you've got subscribers waiting to be informed the lambda can fire an SNS notification and all subscribed applications will see "job 1234 complete"
|
| So really the issue is:
|
| * s3 is our image database
|
| * our app needs to deploy an s3 hook for lambda
|
| * our codebase needs to deploy that lambda
|
| * we might need to listen to SNS
|
| which is still some complexity, but it's not the same and it's not using the wrong technology like some chain of SQS nonsense.
| chrismccord wrote:
| Thanks for the thoughts - hopefully I can make this more clear:
|
| > * Pretending that starting a fly machine doesn't cost the same as triggering via s3 seems disingenuous.
|
| You're going to be paying for resources wherever you decide to run your code. I don't think this needs to be spelled out. The point about costs is rather than paying to run "my app", I'm paying at multiple layers to run a full solution to my problem. Lambda gateway requests, S3 put, SQS insert, each have their own separate costs. You pay a toll at every step instead of a single step on Fly or wherever you host your app.
|
| > * I wouldn't do this at all. There's no reason the results need to be queued. Put them in a deterministically named s3 bucket where they'll live and be served from. Period. This is totally unnecessary. Your application _should forget_ it dispatched work. That's the point of dispatching it. If you need subscribers to notice it or do some additional work I'd do it differently rather than chaining lambdas.
|
| You still need to tell your app about the generated thumbnails if you want to persist the fact they exist, where you placed them in S3, how many exist, where you left off, etc.
|
| > * Your lambda really should be doing the DB work, not your main application. If you've got subscribers waiting to be informed the lambda can fire an SNS notification and all subscribed applications will see "job 1234 complete"
|
| This is _exactly_ my point. You bolt on ever more Serverless offerings to accomplish any actual goal of your application. SNS notifications are exactly the kind of thing I don't want to think about, code around, and pay for. I have Phoenix.PubSub.broadcast and I continue shipping features. It's already running on all my nodes and I pay nothing for it because it's already baked into the price of what I'm running - my app.
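For contrast, the Phoenix.PubSub path being referred to is roughly one call on each side. The topic name and payload below are invented for illustration:

      # Wherever the thumbnails are produced (including inside a FLAME call):
      Phoenix.PubSub.broadcast(MyApp.PubSub, "video:#{video.id}", {:thumbnails_ready, urls})

      # In any interested process (e.g. a LiveView) on any node in the cluster:
      Phoenix.PubSub.subscribe(MyApp.PubSub, "video:#{video.id}")
      # ...then handle the {:thumbnails_ready, urls} message in handle_info/2.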
| davidjfelix wrote:
| > This is exactly my point. You bolt on ever more Serverless offerings to accomplish any actual goal of your application. SNS notifications are exactly the kind of thing I don't want to think about, code around, and pay for. I have Phoenix.PubSub.broadcast and I continue shipping features. It's already running on all my nodes and I pay nothing for it because it's already baked into the price of what I'm running - my app.
|
| I think this is fine if and only if you have an application that can subscribe to PubSub.broadcast. The problem is that not everything is Elixir/Erlang or even the same language internally to the org that runs it. The solution (unfortunately) seems to be reinventing everything that made Erlang good but for many general purpose languages at once.
|
| I see this more as a mechanism to signal the runtime (combination of fly machines and erlang nodes running on those machines) you'd like to scale out for some scoped duration, but I'm not convinced that this needs to be initiated from inside the runtime for erlang in most cases -- why couldn't something like this be achieved externally by noticing a high watermark of usage and adding nodes, much like a kubernetes horizontal pod autoscaler?
|
| Is there something specific about CPU bound tasks that makes this hard for erlang that I'm missing?
|
| Also, not trying to be combative -- I love the Phoenix framework and the work y'all are doing at fly, especially you Chris, just wondering if/how this abstraction leaves the walls of Elixir/Erlang, which already has it significantly better than the rest of us for distributed abstractions.
| tonyhb wrote:
| You're literally describing what we've built at https://www.inngest.com/. I don't want to talk about us much in this post, but it's _so relevant_ it's hard not to bring it up. (Huge disclaimer here, I'm the co-founder).
|
| In this case, we give you global event streams with a durable workflow engine that any language (currently Typescript, Python, Go, Elixir) can hook into. Each step (or invocation) is backed by a lightweight queue, so queues are cheap and are basically a 1LOC wrapper around your existing code. Steps run as atomic "transactions" which must commit or be retried within a function, and are as close to exactly once as you could get.
| ekojs wrote:
| I don't know if I agree with the argument regarding durability vs elastic execution. If I can get both (with a nice API/DX) via something like Temporal (https://github.com/temporalio/temporal), what's the drawback here?
| bovermyer wrote:
| As an alternative to Lambdas I can see this being useful.
|
| However, the overhead concerns me. This would only make sense in a situation where the function in question takes long enough that the startup overhead doesn't matter, or where the main application is running on hardware that can't handle the resource load of many instances of the function in question.
|
| I'm still, I think, in the camp of "monoliths are best in most cases." It's nice to have this in the toolbox, though, for those edge cases.
| cchance wrote:
| He commented in another post that they use pooling, so you don't really pay the cold start penalty as often as you'd think, so maybe not an issue?
| freedomben wrote:
| I don't think this goes against "monoliths are best in most cases" at all. In fact it supports that by letting you code like it's all one monolith, but behind-the-scenes it spins up the instance.
|
| Resource-wise if you had a ton of unbounded concurrency then that would be a concern as you could quickly hit instance limits in the backend, but the pooling strategy discussed lower in the post addresses that pretty well, and gives you a good monitoring point as well.
| tonyhb wrote:
| This is great! It reminds me of a (very lightweight) Elixir specific version of what we built at https://www.inngest.com/.
|
| That is, we both make your existing code available to serverless functions by wrapping it with something that, essentially, makes the code callable via remote RPC.
|
| Some things to consider, which are called out in the blog post:
|
| Often code like this runs in a series of imperative steps. Each of these steps can run in series or parallel as additional lambdas. However, there's implicit state captured in variables between steps. This means that functions become _workflows_. In the Inngest model, Inngest captures this state and injects it back into the function so that things are durable.
|
| On the note of durability, these processes should also be backed by a queue. The good thing about this model is that queues are cheap. When you make queues cheap (eg. one line of code) _everything becomes easy_: any developer can write reliable code without worrying about infra.
|
| Monitoring and observability, as called out, is critical. Dead letter queues suck absolute major heaving amounts of nauseous air, and being able to manage and replay failing functions or steps is critical.
|
| A couple differences wrt. FLAME and Inngest. Inngest is queue backed, event-driven, and servable via HTTP across any language. Because Inngest backs your state externally, you can write a workflow in Elixir, rewrite it in Typescript, redeploy, and running functions live migrate across backend languages, similar to CRIU.
|
| Being event-driven allows you to manage flow control: everything from debounce to batching to throttling to fan-out, across any runtime or language (eg. one Elixir app on Fly can send an event over to run functions on TypeScript + Lambda).
|
| I'm excited where FLAME goes. I think there are similar goals!
| chrismccord wrote:
| Inngest looks like an awesome service! I talk about job processors/durability/retries in the post. For Elixir specifically, for durability, retries, and workflows we reach for Oban, which we'd continue to do here. The Oban job would call into FLAME to handle the elastic execution.
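A sketch of that combination, with hypothetical MyApp modules: Oban supplies the durability and retries, while the job's perform/1 hands the CPU-heavy part to a FLAME pool:

      defmodule MyApp.ThumbnailWorker do
        use Oban.Worker, queue: :media, max_attempts: 3

        @impl Oban.Worker
        def perform(%Oban.Job{args: %{"video_id" => id, "interval" => interval}}) do
          video = MyApp.Videos.get_video!(id)

          # If the runner or its machine dies mid-call, the raised error
          # fails the job and Oban retries it like any other failure.
          FLAME.call(MyApp.FFMpegRunner, fn ->
            MyApp.Videos.generate_thumbnails(video, interval)
          end)

          :ok
        end
      end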
| darwin67 wrote:
| FYI: there's an Elixir SDK for Inngest as well. Haven't fully announced it yet, but plan to post it in ElixirForum some time soon.
|
| https://github.com/inngest/ex_inngest
| gitgud wrote:
| > _It then finds or boots a new copy of our entire application and runs the function there._
|
| So for each "Flame.call" it begins a whole new app process and copies the execution context in?
|
| A very simple solution to scaling, but I'd imagine this would have some disadvantages...
|
| Adding 10ms to the app startup time adds 10ms to every "Flame.call" part of the application too... same with memory I suppose
|
| I guess these concerns just need to be considered when using this system
| chrismccord wrote:
| The FLAME.Pool discussed later in the post addresses this. Runners are pooled and remain hot for a configurable time before idling down. Under load you are rarely paying the cold start time because the pool is already hot. We are also adding more sophisticated pool growth techniques to the Elixir library next, so you also avoid hitting an at-capacity runner and cold starting one.
|
| For hot runners, the only overhead is the latency between the parent and child, which should be in the same datacenter, so 1ms or sub-1ms.
| bo0tzz wrote:
| Currently the per-runner concurrency is limited by a fixed number. Have you thought about approaches that instead base this on resource usage, so that runners can be used optimally?
| chrismccord wrote:
| Yes, more sophisticated pool growth options are something I want longer term. We can also provide knobs that will let you drive the pool growth logic yourself if needed.
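For reference, these pool knobs are set where the pool is started in the app's supervision tree. A sketch using the option names from the FLAME docs; the sizes here are arbitrary:

      # In MyApp.Application, alongside the rest of the supervision tree:
      {FLAME.Pool,
       name: MyApp.FFMpegRunner,
       min: 0,                                  # scale to zero when idle
       max: 10,                                 # cap on concurrent runners
       max_concurrency: 5,                      # calls per runner before growing
       idle_shutdown_after: :timer.minutes(3)}  # how long runners stay hot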
| solatic wrote:
| Cold start time is _the_ issue with most serverless runtimes.
|
| Your own mission statement states: "We want on-demand, granular elastic scale of specific parts of our app code." Doing that _correctly_ is _fundamentally_ a question of how long you need to wait for cold starts, because if you have a traffic spike, the spiked part of the traffic is simply not being served until the cold start period elapses. If you're running hot runners with no load, or if you have incoming load without runners (immediately) serving them, then you're not really delivering on your goal here. AWS EC2 has had autoscaling groups for more than a decade, and of course, a VM is essentially a more elaborate wrapper for any kind of application code you can write, and one with a longer cold-start time.
|
| > Under load you are rarely paying the cold start time because the pool is already hot.
|
| My spiky workloads beg to differ.
| bo0tzz wrote:
| Depending of course on the workload and request volume, I imagine you could apply a strategy where code is run locally while waiting for a remote node to start up, so you can still serve the requests on time?
| solatic wrote:
| No, because then you're dividing the resources allocated to the function among the existing run + the new run. If you over-allocate ahead of time to accommodate for this, you might as well just run ordinary VMs, which always have excess allocation locally; the core idea of scaling granularly is that you only allocate the resources you need for that single execution (paying a premium compared to a VM but less overall for spiky workloads since less overhead will be wasted).
| conradfr wrote:
| In an Elixir/Phoenix app I don't think this will really be used for web traffic, more for background/async jobs.
| tardismechanic wrote:
| > FLAME - Fleeting Lambda Application for Modular Execution
|
| Reminds me of 12-factor app (https://12factor.net/), especially "VI. Processes" and "IX. Disposability"
| rubenfiszel wrote:
| That's great. I agree with the whole thesis.
|
| We took an alternative approach with https://www.windmill.dev which is to consider the unit of abstraction to be at the source code level rather than the container level. We then parse the main function and imports to extract the args and dependencies, and then run the code as is in the desired runtime (typescript, python, go, bash). Then all the secret sauce is to manage the cache efficiently so that the workers are always hot regardless of your imports.
|
| It's not as integrated in the codebase as this, but the audience is different; our users build complex workflows from scratch, cron jobs, or just one-off scripts with the auto-generated UI. Indeed the whole context in FLAME seems to be snapshotted and then rehydrated on the target VM. Another approach would be to introduce syntax to specify what is required context from what is not and only load the minimally required. That's what we are currently exploring for integrating Windmill better with existing codebases instead of having to rely on http calls.
| bo0tzz wrote:
| > Indeed the whole context in FLAME seems to be snapshotted and then rehydrated on the target VM. Another approach would be to introduce syntax to specify what is required context from what is not and only load the minimally required.
|
| This isn't strictly what is happening. FLAME just uses the BEAM's built in clustering features to call a function on a remote node. That implicitly handles transferring only the context that is necessary. From the article:
|
| > FLAME.call accepts the name of a runner pool, and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our %Video{} struct and interval) are passed along automatically.
| rubenfiszel wrote:
| Fair point, TIL about another incredible capability of the BEAM. As long as you're willing to write Elixir, this is clearly a superior scheme for deferred tasks/background jobs.
|
| One issue I see with this scheme still is that you have to be careful of what you do at initialization of the app, since now all your background jobs are gonna run that. For instance, maybe your task doesn't need to be connected to the db, but as per the article it will be if your app does. They mention having hot modules, but what if you want to run 1M of those jobs on 100 workers? You now have 100 unnecessary apps. It's probably a non-issue: the number of things done at initialization could be kept minimal, and FLAME could just have some checks to skip initialization code when in a flame context.
| chrismccord wrote:
| This is actually a feature. If you watch the screencast, I talk about Elixir supervision trees and how all Elixir programs carefully specify the order their services start and stop in. So if your flame functions need DB access, you start your Ecto.Repo with a small or single DB connection pool. If not, you flip it off.
|
| > It's probably a non-issue: the number of things done at initialization could be kept minimal, and FLAME could just have some checks to skip initialization code when in a flame context.
|
| Exactly :)
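What "flipping it off" might look like in practice: a sketch of a conditional child list in the application module (module layout hypothetical; the runtime-config approach shown earlier works just as well):

      defmodule MyApp.Application do
        use Application

        @impl true
        def start(_type, _args) do
          # Skip the web endpoint when this node was booted as a FLAME runner.
          web_children = if FLAME.Parent.get(), do: [], else: [MyAppWeb.Endpoint]

          children =
            [
              MyApp.Repo,
              {FLAME.Pool, name: MyApp.FFMpegRunner, min: 0, max: 10}
            ] ++ web_children

          Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
        end
      end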
| jrmiii wrote:
| So, Chris, how do you envision the FLAME child understanding what OTP children it needs to start on boot? Because this could be FLAME.call dependent if you have multiple types of calls as described above. Is there a way to pass along that data or for it to be pulled from the parent?
|
| Acknowledging this is brand new; just curious what your thinking is.
|
| EDIT: Would it go in the pool config, and a runner as a member of the pool has access to that?
| chrismccord wrote:
| Good question. The pools themselves in your app will be per use case, and you can reference the named pool you are a part of inside the runner, ie by looking in the system env passed as pool options. That said, we should probably just encode the pool name along with the other parent info in the `%FLAME.Parent{}` for easier lookup.
| jrmiii wrote:
| Ah, that makes a lot of sense - I think the `%FLAME.Parent{}` approach may enable backends that wouldn't be possible otherwise.
|
| For example, if I used the heroku api to do the equivalent of ps:scale to boot up more nodes - those new nodes (dynos in heroku parlance) could see what kind of pool members they are. I don't think there is a way to do dyno specific env vars - they apply at the app level.
|
| If anyone tries to do a Heroku backend before I do, an alternative might be to use distinct process types in the Procfile for each named pool and ps:scale those to 0 or more.
|
| Also, might need something like Supabase's libcluster_postgres[1] to fully pull it off.
|
| EDIT2: So the heroku backend would be a challenge. You'd maybe have to use something like the formation api[2] to spawn the pool, but even then you can't idle them down because Heroku will try to start them back, i.e. there's no `restart: false` from what I can tell from the docs. Or you could use the dyno api[3] with a timeout set up front (no idle awareness).
|
| [1] https://github.com/supabase/libcluster_postgres
|
| [2] https://devcenter.heroku.com/articles/platform-api-reference...
|
| [3] https://devcenter.heroku.com/articles/platform-api-reference...
| Nezteb wrote:
| Oops, you've got an extra w; here is the URL for anyone looking: https://www.windmill.dev/
|
| I love the project's goals; I'm really hoping Windmill becomes a superior open-source Retool/Airtable alternative!
| rubenfiszel wrote:
| Thanks, fixed! (and thanks)
| AlchemistCamp wrote:
| This looks fantastic! At my last gig we had exactly the "nuts" FaaS setup described in the article for generating thumbnails and alternate versions of images, and it was a source of unnecessary complexity.
| RcouF1uZ4gsC wrote:
| This seems a lot like the "Map" part of map-reduce.
| neoecos wrote:
| Awesome work, let's see how long it takes to get the Kubernetes backend.
| anonyfox wrote:
| Amazing addition to Elixir for even more scalability options! Love it!
| isoprophlex wrote:
| Whoa, great idea, explained nicely!
|
| Elixir looks ridiculously powerful. How's the job market for Elixir -- could one expect to have a chance at making money writing Elixir?
| ed wrote:
| Yep! Elixir is ridiculously powerful. Best place to look for work is the phoenix discord, which has a pretty active job channel.
| anonyfox wrote:
| It's indeed very powerful and there are jobs out there. Besides being an excellent modern toolbox for lots of problems (scaling, performance, maintenance) and having arguably the best frontend tech in the industry (LiveView), the Phoenix framework also is the most loved web framework and Elixir itself the 2nd most loved language according to the Stack Overflow survey.
|
| It's still a more exotic choice of tech stack, and IMO it's best suited for when you have fewer but more senior devs around; this is where it really shines. But I also found that a Phoenix codebase survived being "tortured" by a dozen juniors over years quite well.
|
| I basically make my money solely with Elixir and have been for ~5 years now, interrupted only by gigs as a devops for the usual JS nightmares including serverless (where the cure always has been rewriting to Elixir/Phoenix at the end).
| imafish wrote:
| Having dealt with the pain and complexity of a 100+ lambda function app for the last 4 years, I must say this post definitely hits the spot wrt. the downsides of FaaS serverless architectures.
|
| When starting out, these downsides are not really that visible. On the contrary, there is a very clear upside, which is that everything is free when you have low usage, and you have little to no maintenance.
|
| It is only later, when you have built a hot mess of lambda workflows, which become more and more rigid due to interdependencies, that you wish you had just gone the monolith route and spent the few extra hundreds on something self-managed. (Or even less now, e.g. on fly.io)
|
| A question for the author: what if not using Elixir?
| chrismccord wrote:
| I talk about FLAME outside Elixir in one of the sections in the blog. The tl;dr is it's a generally applicable pattern for languages with a reasonable concurrency model. You likely won't get all the ergonomics that we get for free, like functions with captured variable serialization, but you can probably get 90% of the way there in something like js, where you can move your modular execution to a new file rather than wrapping it in a closure. Someone implementing a flame library will also need to write the pooling, monitoring, and remote communication bits. We get a lot for free in Elixir on the distributed messaging and monitoring side. The process placement stuff is also really only applicable to Elixir. Hope that helps!
| jrmiii wrote:
| > functions with captured variable serialization
|
| Can't wait for the deep dive on how that works
| hinkley wrote:
| A pattern I see over and over, which has graduated to somewhere between a theorem and a law, is that motivated developers can make just about any process or architecture work for about 18 months.
|
| By the time things get bad, it's almost time to find a new job, especially if the process was something you introduced a year or more into your tenure and are now regretting. I've seen it with a handful of bad bosses, at least half a dozen times with (shitty) 'unit testing', scrum, you name it.
|
| But what I don't know is how many people are mentally aware of the sources of discomfort they feel at work, instead of a more nebulous "it's time to move on". I certainly get a lot of pushback trying to name uncomfortable things (and have a lot less bad feelings about it now that I've read Good to Great). Nobody wants to say, "Oh look, the consequences of my actions."
|
| The people materially responsible for the Rube Goldberg machine I help maintain were among the first to leave. The captain of that ship asked a coworker of mine if he thought it would be a good idea to open source our engine. He responded that nobody would want to use our system when the wheels it reinvented already exist (and are better). That guy was gone within three to four months, under his own steam.
| antod wrote:
| That's why I'm always wary of people who hardly ever seem to stay anywhere more than a couple of years.
|
| There's valuable learning (and empathy too) in having to see your own decisions and creations through their whole lifecycle. Understanding how tech debt comes to be, what tradeoffs were involved and how they came to bite later. Which ideas turned out to be bad in hindsight through the lens of the people making them at the time.
|
| Rather than just painting the previous crowd as incompetent while simultaneously making worse decisions you'll never experience the consequences of.
|
| Moving on every 18-24 months leaves you with a potentially false impression of your own skills/wisdom.
| katzgrau wrote:
| And don't forget that the developer fought like hell to use that new process, architecture, pattern, framework, etc
| icedchai wrote:
| I couldn't even stand having a dozen lambdas. The app was originally built by someone who didn't think much about maintenance or deployment. Code was copy-pasted all over the place. Eventually, we moved to a "fat lambda" monolith where a single lambda serves multiple endpoints.
| viraptor wrote:
| > that you wish you had just gone the monolith route
|
| Going from hundreds of lambdas to a monolith is overreacting to one extreme by going to the other one. There's a whole spectrum of possible ways to split a project in useful ways, which simplify development and maintenance.
| p10jkle wrote:
| I'm working on something that I think might solve the problem in any language (currently have an sdk for typescript, and java in the works). You can avoid splitting an application into 100s of small short-running chunks if you can write normal service-orientated code, where lambdas can call each other. But this isn't possible without paying for all that time waiting around. If the Lambdas can pause execution while they are blocked on IO, it solves the problem. So I think durable execution might be the answer!
|
| I've been working on a blog post to show this off for the last couple of weeks:
|
| https://restate.dev/blog/suspendable-functions-make-lambda-t...
| solardev wrote:
| Superficially, this sounds similar to how Google App Engine and Cloud Run already work (https://cloud.google.com/appengine/migration-center/run/comp...). Both are auto-scaling containers that can run a monolith inside.
|
| Is that a fair comparison?
| chrismccord wrote:
| They handle scaling only at the highest level, similar to spinning up more dynos/workers/webservers like I talk about in the intro. FLAME is about elastically scaling individual hot operations of your app code. App Engine and such are about scaling at the level of your entire app/container. Splitting your operations into containers then breaks the monolith into microservice pieces and introduces all the downsides I talk about in the post. Also, while it's your code/language, you still need to interface with the mound of proprietary offerings to actually accomplish your needs.
| hq1 wrote:
| So how does it work if there are workers in flight and you redeploy the main application?
| bo0tzz wrote:
| The workers get terminated. If the work they were doing is important, it should be getting called from your job queue and so it should just get started up again.
| chrismccord wrote:
| If you're talking about in-flight work that is running on the runner, there is a Terminator process on the runner that will see the parent go away, then block on application shutdown for the configured `:shutdown_timeout` as long as active work is being done. So active processes/calls/casts are given a configurable amount of time to finish and no more work is accepted by the runner.
|
| If you're talking about a FLAME.call at app shutdown that hasn't yet reached the runner, it will follow the same app shutdown flows as the rest of your code and eventually drop into the ether like any other code path you have. If you want durability you'd reach for your job queue (like Oban in Elixir) under the same considerations as regular app code. Make sense?
| arianvanp wrote:
| One thing I'm not following is how this would work with IAM etc. The power of Lambda to me is that it's also easy to deal with authorization to a whole bunch of AWS services. If I fire off a flame to a worker in a pool and it depends on, say, accessing DynamoDB, how do I make sure that that unit of work has the right IAM role to do what it needs to do?
|
| Similarly, how does authorization/authentication/encryption work between the host and the forked-off work? How is this all secured with minimal permissions?
| xavriley wrote:
| > how does authorization work between the host and the forked-off work?
|
| On fly.io you get a private network between machines so comms are already secure. For machines outside of fly.io it's technically possible to connect them using something like Tailscale, but that isn't the happy path.
|
| > how do I make sure that the unit of work has the right IAM
|
| As shown in the demo, you can customise what gets loaded on boot - I can imagine that you'd use specific creds for services as part of that boot process based on the node's role.
| timenova wrote:
| I have a question about distributed apps with FLAME. Let's say the app is running in 3 Fly regions, and each region has 2 "parent" servers with LiveViews and everything else.
|
| In that case, how should the Flame pools look? Do they communicate in the same region and share the pools? Or are Flame pools strictly children of each individual parent? Does it make a difference in pricing or anything else to run on hot workers instead of starting up per parent?
|
| What would you recommend the setup be in such a case?
|
| Aside: I really liked the idea of Flame with Fly. It's a really neat implementation for a neat platform!
| chrismccord wrote:
| > Or are Flame pools strictly children of each individual parent?
|
| Confirm. Each parent node runs its own pool. There is no global coordination by design.
|
| > Does it make a difference in pricing or anything else to run on hot workers instead of starting up per parent?
|
| A lot would depend on what you are doing, the size of runner machines you decide to start in your pools (which can be different sizes from the app or other pools), etc. In general Elixir scales well enough that you aren't going to be running your app in every possible region. You'll be in a handful of regions servicing traffic in those regions and the load each region has. You _could_ build your own global coordination on top, ie try to find processes running on the cluster already (which could be running in a FLAME runner), but you're in distributed systems land and it All Depends(tm) what you're building and the tradeoffs you want.
| timenova wrote:
| Thanks for the reply!
|
| Can I suggest adding some docs to Fly for running Flame apps? To cover the more complex aspects of integrating with Fly, such as running Flame machines with a different size compared to the parent nodes, what kind of fly.toml config works and doesn't work with Flame (such as the auto_start and auto_stop configurations on the parent based on the number of requests), and anything else particularly important to remember with Fly.
| hinkley wrote:
| > Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent.
|
| If customers think this is a feature and not a bug, then I have a very different understanding about what serverless/FaaS is meant to be used for. My division is pretty much only looking at edge networking scenarios. Can I redirect you to a CDN asset in Boston instead of going clear across the country to us-west-1? We would definitely NOT run Lambda out of us-west-1 for this work.
|
| There are a number of common ways that people who don't understand concurrency think they can 'easily' or 'efficiently' solve a problem that provably do not work, and sometimes tragicomically so. This feels very similar and I worry that fly is Enabling people here.
|
| Particularly in Elixir, where splitting off services is already partially handled for you.
| aidos wrote:
| I used a service years ago that did effectively this. PiCloud were sadly absorbed into Dropbox, but before that they had exactly this model of fanning out tasks to workers transparently. They would effectively bundle your code and execute it on a worker.
|
| There's an example here. You'll see it's exactly the same model.
|
| https://github.com/picloud/basic-examples/blob/master/exampl...
|
| I've not worked with Elixir but I used Erlang a couple of decades back and it appears BEAM hasn't changed much (fundamentally). My suspicion is that it's much better suited for this work since it's a core part of the design. Still, not a totally free lunch because presumably there's a chance the primary process crashes while waiting?
| thefourthchime wrote:
| I created something similar at my work, which I call "Long Lambda": the idea is, what if a lambda could run more than 15 minutes? Then do everything in a Lambda. An advantage of our system is you can also run everything locally and debug it. I didn't see that with FLAME but maybe I missed it.
|
| We use it for our media supply chain, which processes a few hundred videos daily using various systems.
|
| Most other teams drank the AWS Step Koolaid and have thousands of lambdas deployed, with insane development friction and surprisingly higher costs. I just found out today that we spend 6k a month on "Step Transitions", really?!
| jrmiii wrote:
| > you can also run everything locally and debug it. I didn't see that with FLAME but maybe I missed it.
|
| He mentioned this:
|
| > With FLAME, your dev and test runners simply run on the local backend.
|
| and this
|
| > by default, FLAME ships with a LocalBackend
| seabrookmx wrote:
| I'm firmly in the "I prefer explicit lambda functions for off-request work" camp, with the recognition that you need a lot of operational and organizational maturity to keep a fleet of functions maintainable. I get that isn't everyone's cup of tea or a good fit for every org.
|
| That said, I don't understand this bit:
|
| > Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user's device somehow
|
| I assumed by "worker queue" they were talking about something akin to Celery in python land, but it actually does handle all this glue. As far as I can tell, Celery provides a very similar developer experience to FLAME but has the added benefit that if you do want durability those knobs are there. The only real downside seems to be that you need redis or rabbit to facilitate it? I don't have any experience with them but I'd assume it's the same story with other languages/frameworks (eg ruby+sidekiq)?
|
| Maybe I'm missing something.
| jrmiii wrote:
| Yeah, I think this was more focused inward on things like `Oban` in Elixir land.
|
| He's made the distinction in the article that those tools are great when you need durability, but this gives you a lower ceremony way to make it Just Work(tm) when all you're after is passing off the work.
| josevalim wrote:
| Wouldn't you lose, for example, streaming capabilities once you use Celery? You would have to first upload the whole video, then enqueue the job, and then figure out a mechanism to send the thumbnails back to that client, while with FLAME you get a better user experience by streaming thumbnails as soon as the upload starts.
|
| I believe the main point though is that background workers and FLAME are orthogonal concepts. You can use FLAME for autoscaling, you can use Celery for durability, and you could use Celery with FLAME to autoscale your background workers based on queue size. So being able to use these components individually will enable different patterns and use cases.
___________________________________________________________________
(page generated 2023-12-06 23:00 UTC)