[HN Gopher] Reliability: It's Not Great
       ___________________________________________________________________
        
       Reliability: It's Not Great
        
       Author : bishopsmother
       Score  : 518 points
       Date   : 2023-03-06 17:47 UTC (5 hours ago)
        
 (HTM) web link (community.fly.io)
 (TXT) w3m dump (community.fly.io)
        
       | revskill wrote:
       | Fly.io seems like Vercel 1.0 (where you can just deploy docker
       | image and done), but it's more than that, with configurable
       | volumes, secrets,...
       | 
       | I'm bullish on fly.io.
        
       | [deleted]
        
       | skywhopper wrote:
       | Interesting issues. Nothing surprising for anyone who's run a
       | global SaaS before, especially if growth has been incredibly
       | fast. I find the gripes about Consul, Nomad, and Vault
       | interesting since it sounds like the problems are mainly due to
       | poor architectural decisions. Fly is rewriting those tools rather
       | than invest in deploying them properly and in the process are
       | running into new issues that those tools have already solved,
       | which doesn't give me confidence that the path forward will be
       | any less bumpy.
        
       | victorbjorklund wrote:
       | Great post. Love how transparent Fly is. I'm a customer but
       | without any life and death important apps. And yea they had some
       | issues lately so good they are addressing it.
        
       | tebbers wrote:
       | I really feel for Fly, as a potential customer. They are trying
       | really hard. I would still love to use them one day and this post
       | is definitely a step in the right direction. Growing is painful
       | but they have smart people working there so fingers crossed that
       | they sort this ASAP and it doesn't become existential.
        
       | ChrisMarshallNY wrote:
       | Well, I feel for them. Scaling up is a bitch.
       | 
       | I've been lucky, in the past, but a lot of that, is because I
       | have "overengineered," and the tools/frameworks have advanced to
       | meet the new demand.
       | 
       | I am in the middle of a complete, bottom-to-top rewrite of the
       | app we've been developing for the last couple of years. It's
       | going great, but making this leap was a fraught decision.
       | 
       | It's mainly, so I wouldn't have to write a post like that, in a
       | year or two.
       | 
       | We spent all the time refining it, until we had what we wanted,
       | and it worked great on our small test team.
       | 
       | Then, I loaded up a test server with 10,000 fake users, and
       | tossed the app at that. To be fair, we don't think we'll have
       | even that many users for quite a while. It's a very specialized
       | demographic.
       | 
       | * SOB *
       | 
       | It no do so well.
       | 
       | At that point, I had to decide whether to fix the issues (they
       | were quite fixable), or revisit the architecture.
       | 
       | The main issue with the architecture, was that it was an
       | "accreted" app, with changes gradually being factored in, as we
       | progressed. The main reason for this, is because no one really
       | knew what they wanted, until we ran it up the flagpole (sound
       | familiar?).
       | 
       | The business logic was distributed throughout the app. That was
       | ... _ugly_.
       | 
       | I envisioned myself, a year or two down the road, sucking on a
       | magnum, because the app had turned into a Cruftosaurus, and was
       | coming for me in my nightmares.
       | 
       | So I decided to rewrite, as we hadn't done any kind of MVP or
       | public beta, so we actually had the runway to do this.
       | 
       | I refined the entire business logic of the app into a single,
       | platform-agnostic SPM module, which took just over a month, and
       | have started to develop the app around that. It's pretty much a
       | rewrite, but I am recycling a lot of the app code. We also
       | brought in our designer, and he's looking at every screen I make.
       | It's working well for him.
       | 
       | Like I said, it's going great. Better than I expected.
       | 
       | I know that I have a huge luxury, and I'm grateful. I can credit
       | a lot of that, to doing some stress-testing before we got to a
       | point where we had a bunch of users to support. I was able to go
       | in, and go all Victor Frankenstein on the model.
       | 
       | The result, so far, is that this thing screams, and you don't
       | really even notice that there's that many users on it. The model
       | has already been proven (that SPM module), and all we're doing,
       | is chrome (which is a _ton_ of work).
        
         | bbkane wrote:
         | Hope this isn't a dumb question, but what's an "SPM module"?
        
       | [deleted]
        
       | soperj wrote:
       | For django, they should really contribute to 2 scoops django
       | cookie cutter program, so that you can get an out of the box
       | django instance that can just deploy to Fly.io.
        
         | nilsbunger wrote:
         | Their problem right now isn't adding more customers - they seem
         | to have more than they can handle!
         | 
         | If I were them I'd focus as many resources as possible on
         | making the stack rock-solid, and away from acquiring more
         | customers or adding more capabilities.
         | 
         | In fact I'd try to down-scope some features if at all possible,
         | like the example they give of disabling app deploys while
         | they're doing platform updates.
         | 
         | We use fly.io at a small scale and it's worked really well for
         | us, but the money is in customers at a larger scale who must
         | have 100% reliability.
        
         | llimllib wrote:
         | https://github.com/ehmatthes/django-simple-deploy will deploy
         | to fly (or platform.sh) out of the box, should be pretty much
         | the experience you're describing
        
           | soperj wrote:
           | thank you!
        
         | mixmastamyk wrote:
         | May be wrong, but they seem to be very focused on interesting
         | stacks like Elixir and RoR while building on Go/Rust. The
         | corollary being neglect of the bread/butter stacks with high
         | market share, like Python and Java/JavaScript. Don't think I've
         | seen a blog post discussing those three beyond a passing
         | mention?
         | 
         | Not the end of the world, but mildly disappointing. At least
         | they are all in with Postgres and Linux, a great foundation.
        
           | jjtheblunt wrote:
           | I thought the only thing their tech stack was all in on was
           | docker images being rewritten to run on firecracker as a
           | substrate.
           | 
           | is it not agnostic about things like Elixir etc, at the tech
           | level, though they've got super nice documentation for those
           | tools you mentioned?
        
             | ilaksh wrote:
             | What's necessary to change for them to run on Firecracker?
        
               | ign0ramus wrote:
               | Here's a post explaining how they do it
               | https://fly.io/blog/docker-without-docker/
        
               | ilaksh wrote:
               | Yeah, saw that before. I thought he was saying you have
               | to change your Dockerfiles to use them with fly.io. Just
               | misinterpreted that sentence.
        
           | [deleted]
        
           | tomwojcik wrote:
           | I don't disagree with you, but they tweeted
           | 
           | > We'll readily admit our docs still have a Django-shaped
           | hole in them.
           | 
           | https://twitter.com/flydotio/status/1578039196618575874?t=nu.
           | ..
        
           | ilaksh wrote:
           | You can use any Docker image.
        
         | tomwojcik wrote:
         | I needed a cookie cutter for my side projects so I've created
         | one for sqlite that I actively use and a similar one for
         | postgres. Both are very basic, contributions welcome.
         | 
         | Sqlite https://github.com/tomwojcik/django-fly-sqlite-template
         | 
         | Psql https://github.com/tomwojcik/django-fly-postgres-template
        
       | nu11ptr wrote:
       | Not a client of fly.io, but dang impressive for the company to be
       | this open and honest. Definite respect - wish more companies were
       | like this. It puts them on my short list almost immediately for
       | future needs.
        
         | nathell wrote:
         | Came here to say pretty much this. Technical issues come and
         | go; open communication is a core part of the company culture,
         | and builds up trust.
        
       | samwillis wrote:
       | Fundamentally I think some of the problems come down to the
       | difference between what Fly set out to build and what the market
       | currently wants.
       | 
       | Fly (to my understanding) at its core is about _edge_ compute.
       | That is where they started and what the team are most excited
       | about developing. It's a brilliant idea, they have the skills
       | and expertise. They are going to be successful at it.
       | 
       | However, at the same time the market is looking for a successor
       | to Heroku. A zero dev ops PAAS with instant deployment, dirt
       | simple managed Postgres, generous free level of service, lower
       | cost as you scale, and a few regions around the world. That isn't
       | what Fly set out to do... exactly, but is sort of the market they
       | find themselves in when Heroku then basically told its low value
       | customers to go away.
       | 
       | It's that slight misalignment of strategy and market fit that
       | results in maybe decisions being made that benefit the original
       | vision, but not necessarily the immediate influx of customers.
       | 
       | I don't envy the stress the Fly team are under, but what an
       | exciting set of problems they are trying to solve, I do envy
       | that!
        
         | decidertm wrote:
         | I'm a co-founder at Northflank. This is what we've spent 3+
         | years building. https://northflank.com.
         | 
         | I am sympathetic with much of Kurt's post. We spent a long time
         | building solutions to several of the areas highlighted (managed
         | PG, persistent volumes, secret management and service
         | discovery).
         | 
         | Making radical changes to architecture on a live cloud platform
         | is always a challenge.
         | 
         | On the front-end Northflank is a next-gen PaaS built for high
         | DX, speed, and powerful capability (real-time UI, API, CLI,
         | GitOps, IaC).
         | 
         | Our backend is built using Kubernetes as an OS: providing a
         | huge amount of flexibility on service discovery, load-
         | balancing, persistence/volumes and scale.
         | 
         | The benefit of using Kubernetes is a universal API across all
         | major cloud providers. We can scale clusters and regions across
         | EKS, GKE and AKS in seconds, either in our managed PaaS or
         | inside our customer's own cloud account.
         | 
         | Our managed dataservices: MySQL, Postgres, Redis, Mongo, Minio
         | are all built using Kubernetes Operators with a small but
         | mighty team.
         | 
         | From a generous free tier to autoscaling to managed postgres
         | and other advanced PaaS/DevOps automation workflows Northflank
         | offers something unique.
        
         | trilobyte wrote:
         | > generous free level of service,
         | 
         | This is likely the biggest culprit for a lot of these
         | companies. Too many of us have grown up in the culture of
         | getting hosting and platform for "free", but at some point the
         | companies providing it still have to pay the bills. There has
         | to be a better pricing model that lets someone deploy their
         | relatively small, low-traffic app for $10s/month or even $200 -
         | $300 / year for the basics (e.g. - Heroku free tier type
         | capabilities). It's not going to save these companies but it
         | would limit excessive growth of their own costs from a free
         | tier while at the same time still being affordable for 1 - 2
         | person teams who are trying to get something in front of users.
        
           | karmelapple wrote:
           | I agree. And I know this is unpopular, but I think none of
           | these companies should be expected to have a free tier. A
           | low-cost tier? Certainly. Perhaps even a free trial with a
           | credit card? Great.
           | 
           | But our team, who has used Heroku for over a decade, got bit
           | multiple times by Heroku having a free tier.
           | 
           | Why were we impacted by other apps? Because Heroku's load
           | balancers are shared amongst all their apps. That includes
           | all the sketchy apps running on the platform.
           | 
           | If Heroku could somehow isolate us from everyone else? Great
           | - and they offered that for a while with a reasonably-priced
           | Add-On supported by them called SSL Endpoint. It cost about
           | $15/month and put us into a pool that was shared with other
           | folks willing to spend that much per month to run their app.
           | 
           | I understand that's not great for a hobby project. But for
           | those of us trying to run a large product on Heroku and not
           | have to spend multiple extra thousands of dollars every month
           | for a Heroku Private Space, this was a great way of pooling:
           | put a small fee in place for one pool of resources. Not many
           | malware writers or other misbehaving app creators will
           | probably want to spend that much per month.
           | 
           | But they axed that a few years ago. Only a couple months
           | after when we were thrown back into the load balancer pool
           | with all the other free apps, one of the IPs was marked as
           | spam and we had to figure out a kind of janky solution.
           | 
           | Additionally, Heroku seemingly spent a ton of resources on
           | free tier support, malware fighting, etc. I hope to see more
           | features on Heroku since they've dropped that support... but
           | I haven't seen much evidence of that in roughly six months
           | since they did that. But we'll see.
        
             | trilobyte wrote:
             | Nice write up!
             | 
             | I wish I shared your enthusiasm for where Heroku could go
             | but I have a few friends at Salesforce I've asked about how
             | they see Heroku internally and it really doesn't seem like
             | it is going to get much love. Hope to be wrong though.
        
               | karmelapple wrote:
               | Thanks! I have talked with two Heroku folks who say (to
               | me, a paying customer of Heroku Enterprise) that Heroku
               | is absolutely in active development.
               | 
               | I let them know they need to demonstrate that to me. They
               | have a roadmap [1], but it seems to have barely anything
               | moving forward, including some really important concepts
               | like http/2 support.
               | 
               | [1] https://github.com/orgs/heroku/projects/130
        
             | ignoramous wrote:
             | > _And I know this is unpopular, but I think none of these
             | companies should be expected to have a free tier._
             | 
             | Free tier is a GTM motion which makes sense for novel tech
             | products like Fly because:
             | https://en.wikipedia.org/wiki/Technology_adoption_life_cycle
        
           | mike_hearn wrote:
           | If you don't need all that much, Oracle Cloud does offer a
           | free tier for VMs. You get 2 AMD VMs or 4 ARM VMs and even a
           | free Oracle DB, object storage, load balancing and
           | monitoring.
           | 
           | https://www.oracle.com/cloud/free/
           | 
           | It's still just a free tier so you can't expect good support,
           | but, it's there.
        
           | dcow wrote:
           | Check out DO app platform. It's literally exactly what you
           | describe.
        
         | ec109685 wrote:
         | The CloudFlare folks wrote a good blog post on how they are
         | seeing their customers use Edge compute -- latency is far down
         | on the list: https://blog.cloudflare.com/cloudflare-workers-
         | serverless-we...
        
           | everybodyknows wrote:
           | Hmm, that post is almost three years old -- still accurate?
        
             | prdonahue wrote:
             | Yes, especially as compliance and regulatory frameworks
             | continue to evolve and become more difficult to adhere to
             | as mentioned elsewhere in the comments.
             | 
             | We're inherently faster than other "serverless" platforms
             | due to the scale and homogeneous design of our network, and
             | that network has presence in nearly 50% more cities than it
             | did just 3 years ago. We were plenty fast enough then and
             | we're even faster now.
             | 
             | Other things that customers (still) really care about:
             | developer experience, ease of use, and cost. Nobody likes
             | paying the AWS tax to move data around--they just want to
             | use the best solution from the best cloud provider. Workers
             | and the associated storage primitives allow them to pick
             | and choose from the best that AWS, Azure, Cloudflare, GCP,
             | et al. have to offer.
             | 
             | (Disclaimer: I'm a long time Cloudflare employee focused on
             | App Sec, and I speak to customers regularly who look to
             | Workers largely for compliance reasons, but I don't work on
             | the Developer Platform business. Am sure my Dev Platform
             | peers will chime in with more nuanced answers!)
        
           | fmajid wrote:
           | The US CLOUD Act means a EU customer cannot use a US cloud
           | provider to host PII, even if the server itself is physically
           | in the EU, because US law will still compel the provider to
           | yield the data to US authorities. The European Commission is
           | trying to paper over the cracks with a fig leaf of judicial
           | review, but it's only a matter of time until a Schrems III
           | decision from the CJEU invalidates that polite fiction.
        
             | LunaSea wrote:
             | The number of EU companies following this law is exactly 0.
        
               | speedgoose wrote:
               | It's not true. I know people who lost contracts because
               | they were using Azure and the customer wanted to respect
               | the law.
        
               | LunaSea wrote:
               | I've talked with companies like that as well and they
               | start with strict rules and end up allowing clouds
               | because no solution is compliant anyway.
        
               | pjmlp wrote:
               | I can attest that there are a lot more than zero in
               | Germany.
        
               | LunaSea wrote:
               | I would be glad to be shown a company with AWS, Google
               | Chrome, Google Search, Slack and all the usual suspects.
        
               | e12e wrote:
               | This simply isn't true. At least not for EEA (Norway).
        
               | LunaSea wrote:
               | I have never seen a company without Google Search, Google
               | Chrome, AWS, Microsoft 365 and the lot.
               | 
               | Which alternatives are they based on?
        
               | [deleted]
        
               | huijzer wrote:
               | Please tell the legal department of our uni. I'm stuck
               | with a home-made Kubernetes cluster where I have to mail
               | the admins for provisioning, SSL and domain management.
               | Would love to switch to Fly or Render
        
               | exac wrote:
               | I know I've personally spent a large portion of my time
               | updating systems to be compliant in the last few years,
               | in North American companies.
        
               | mro_name wrote:
               | might well have been yak shaving. If a company is under
               | US jurisdiction it simply cannot comply with EU data
               | protection.
        
               | mananaysiempre wrote:
               | ... Are those North American companies prepared to
               | willingly break EU laws then? Because in my (amateur)
               | understanding it's logically impossible to satisfy both
               | CLOUD Act requirements and EU data protection ones (not
               | just GDPR, but general due-process rights the CJEU
               | considers required for privacy violations and US courts
               | deny noncitizens).
        
         | mrkurt wrote:
         | This is, indeed, the exciting part. As Heroku fans, we never
         | really felt like it needed a replacement. And if it did, it
         | seemed like Render was the natural Heroku v.next.
         | 
         | One thing we've noticed, though, is that people do actually
         | want Heroku but close to users. It's not exactly edge compute.
         | In some cases, it's "Heroku in Tokyo". In others it's "Heroku,
         | but running in all the English-speaking regions".
         | 
         | I think the thing that ate up most of our energy is also the
         | thing that might actually make this business work. We built on
         | top of our own hardware. That's the thing that made it
         | difficult to build managed Postgres. We put way more energy
         | into the underlying infrastructure than most nu-Heroku
         | companies. The cost was extreme, but I'm like 63% sure this was
         | the right choice.
        
           | jrochkind1 wrote:
           | > As Heroku fans, we never really felt like it needed a
           | replacement.
           | 
           | If Salesforce kept investing in heroku, it might not. But
           | there is a huge loss of confidence in heroku's future going
           | on among heroku's customers right now, which is part of what
           | you're seeing, as I'm sure you know. (Also I think to some
           | extent you are being political/kind towards heroku... if
           | heroku's owners were still investing in heroku for real,
           | adding 'edge' functionality like fly.io is focusing on is
           | what one would probably expect...)
           | 
           | And frankly... your tool seems more mature and... not to be
           | rude to competitors but seems to have more of that certain
           | `je ne sais quoi` of Developer Experience Happiness that
           | heroku _used_ to have and other competitors including the one
           | you mention don't really quite seem to have yet. Does what
           | you expect/need in a polished and consistent way.
           | 
           | That work you put into the underlying infrastructure
           | definitely shows, and was the right choice.
           | 
           | So I understand why people are looking to you as a heroku
           | replacement. I am too! (And I don't really need the edge
           | compute stuff; although I could potentially see using it in
           | the future, and it shows you folks are on top of things).
           | 
           | And while I kept reading you saying on HN comments that you
           | _didn't_ want to be a heroku replacement, so were
           | unconcerned with the few places people were mentioning where
           | you still fell short of it -- when I saw your investment in
           | Rails documentation and tools (and contribs back to Rails), I
           | thought, aha, i think they've realized this is a market
           | _looking for them_, which they are only a couple steps from
           | and it would make sense to meet.
           | 
           | When you mention in OP a "heroku exodus" to you... I'm
           | curious if that was all people who left when heroku ended
           | _free_ tier stuff, and they've all come to you for your free
           | tier stuff... because that does seem dangerous, such a giant
           | spike in users who are not paying and don't bring revenue
           | with them! I don't personally use very much heroku free tier
           | stuff. I hope if that's a challenge, it's one you can get
           | over. I don't think you are under any obligation to offer
           | free stuff that can be used for real production workloads
           | indefinitely -- although, as I'm sure you know, free stuff is
           | huge for allowing people to try _before_ they buy, and
           | whatever limits you put on it to try to prevent indefinite
           | production use get in the way of someone's "try before you
           | buy" too... and at this point, _reducing_ your free offerings
           | is a lot harder PR-wise than having started out with less in
           | the first place. :(
        
           | KRAKRISMOTT wrote:
           | Just partner with Neon or other similar companies in this
           | space. Scale-to-zero replicated databases is well understood
           | technology.
           | 
           | https://neon.tech/
        
             | nikita wrote:
             | Yes, we are partnering with many companies like Hasura and
             | Replit to help with managed Postgres. Since Neon scales to
             | 0 and also autoscales to your workload, it is very
             | economical for the long tail of low usage customers.
        
         | mattbillenstein wrote:
         | Yeah, distributed systems at the global scale are very very
         | difficult - at least with the Heroku style problem, you'd be
         | looking at scaling in a single datacenter I think - deployments
         | to multiple datacenters wouldn't share dependencies.
         | 
         | I do wonder however if they'd be better off using less l33t
         | tech - do almost everything on Postgres vs consul and vault,
         | etc. Scaling, failover, consistency, etc. are better-known
         | problems, and a lot more people have run other DBs at
         | tremendous scale than have run the alternatives.
         | 
         | Simplicity is the key to reliability, but this isn't a simple
         | product, so idk.
        
         | ghiculescu wrote:
         | I coincidentally tweeted the exact same thing earlier today.
         | 
         | I selfishly hope Fly put all their focus toward becoming Heroku
         | 2.0. I'm sure some people care about all the edge latency stuff
         | but I don't know many of them.
        
         | hiAndrewQuinn wrote:
         | God, I'm glad someone else sees it as clearly as I do. I
         | learned about Fly from them acquiring freaking Litestream! The
         | SQLite replicator! The canonical database at the edge of the
         | network! Of course that's what they want to do.
        
         | bostik wrote:
         | There's a wonderfully blunt saying that applies here (too): you
         | are not in the business you think you are, you are in the
         | business your customers think you are.
         | 
         | If you offer data volumes, the _low water mark_ is how EBS
         | behaves. If you offer a really simple way to spin up Postgres
         | databases, you are implicitly promising a fully managed
         | experience.
         | 
         | And $deity forbid, if you want global CRUD with read-your-own-
         | writes semantics, the yardstick people measure you against is
         | Google's Spanner.
        
           | quickthrower2 wrote:
           | In a nutshell if you offer cloud services you need to be
           | better than the MAG clan, Digital Ocean too. And people will
           | want it dirt cheap. It's still hard to be a profitable web
           | host as it always was (MAG has the advantage that none of
           | them were web hosts at first base)
        
             | blowski wrote:
             | MAG?
        
               | jerrygenser wrote:
               | microsoft apple google
        
               | gizmo wrote:
               | Microsoft (azure) Amazon (aws) Google (gcloud)
        
               | adw wrote:
               | I'm assuming Azure, AWS, Google Cloud, but it's new to me
               | too
        
               | avidal wrote:
               | From context, I'm assuming Microsoft / Amazon / Google,
               | referring to Azure / AWS / Google Cloud respectively.
        
               | [deleted]
        
           | zamnos wrote:
           | Where does the misalignment between what the customer thinks
           | they want, and what they actually want fit into your
           | philosophy? Google Spanner is a great example of this because
           | who _doesn't_ want instantaneous global writes? It's just
           | that, y'know, there's a ton of businesses, especially smaller
           | ones, that don't actually need that. The smarter customers
           | realize this themselves, and can judge the premium they'd pay
           | for Spanner over something far less complex. What I'm getting
           | to is that sales is a critical company function to bridge the
           | gap between what customers want, and what customers actually
           | need, and for you to make money.
           | 
           | The first releases of EBS weren't very good and took a while
           | to get to where we are. Some places still avoid using EBS due
           | to bad experience back in 2011 when it was first released.
        
             | azurelake wrote:
             | > who doesn't want instantaneous global writes
             | 
             | I want to gently note since I see a lot of misunderstanding
             | around Spanner and global writes: Global writes need at
             | least one round trip to each data center, and so they're
             | still subject to the speed of light.
        
               | mota7 wrote:
               | Like most things, it's more complex than that, and as a
               | result it can be either faster or slower than 'median(RTT
               | to each DC in quorum)'.
               | 
               | It's a delicate balance based on the locations that rows
               | are being read and written. In the case where a row is
               | being repeatedly written from only one location and not
               | being read from a different location, the writes can be
               | significantly faster than would be naively expected.
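               | 
               | A minimal sketch of the 'median RTT' rule of thumb above,
               | assuming a single leader that commits once a majority of
               | voting replicas (itself included) has acknowledged. The
               | function name and RTT figures are made up for
               | illustration; this is not Spanner's actual commit path,
               | and it ignores commit wait, locking and multi-group
               | transactions.

```python
def quorum_commit_latency_ms(rtts_ms):
    """Lower bound on commit latency for a majority-quorum write.

    rtts_ms: round-trip time in milliseconds from the leader to every
    voting replica, including the leader itself as 0.0.
    """
    if not rtts_ms:
        raise ValueError("need at least one replica")
    majority = len(rtts_ms) // 2 + 1
    # The leader can commit once the majority-th fastest replica has acked.
    return sorted(rtts_ms)[majority - 1]


if __name__ == "__main__":
    # Hypothetical RTTs: leader, one nearby replica, three distant ones.
    five_way = [0.0, 12.0, 70.0, 85.0, 150.0]
    print(quorum_commit_latency_ms(five_way))  # 70.0
    # Hypothetical three-replica layout with one nearby replica.
    three_way = [0.0, 12.0, 150.0]
    print(quorum_commit_latency_ms(three_way))  # 12.0
```

               | With the leader placed next to the writer, only the
               | nearest majority sets the floor, which is one way a write
               | can come in under the naive estimate.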
        
         | richardhod wrote:
         | What are the limitations to heroku that people are going to Fly
         | for? Maybe there's a standard article that would be useful to
         | read about it?
        
           | zamnos wrote:
           | It's more about Heroku dropping free and low-cost plans,
           | which is them demonstrating that they don't currently care
           | about the low end of the market, more than any specific
           | feature.
        
         | VWWHFSfQ wrote:
         | > dirt simple managed Postgres
         | 
         | Heroku PostgreSQL is very simple, yes. But once you need non-
         | trivial scale it's expensive and extremely non-performant. Even
         | a medium-sized RDS will outperform Heroku's most expensive
         | database offering by 20x in my experience. My company doesn't
         | even run PG on Heroku anymore. We have a VPC/Private Space
         | connection to AWS Aurora because the cost/performance
         | difference is so extreme.
        
           | snacktaster wrote:
           | I don't know the details of how Heroku implements their
           | hosted postgres service, but I'm _guessing_ that it's just a
           | bunch of PG servers running on EC2 instances. There's
           | probably a lot of CPU stealing "noisy neighbors" going on.
           | But yeah, I've also experienced Heroku's PG databases being
           | dog-slow compared to RDS for the same workloads.
        
             | mixmastamyk wrote:
             | They're probably using older or cheaper instance types. By
             | not upgrading while charging the same or more over time,
             | one can skim more profit.
        
           | karmelapple wrote:
           | I have run the experiment, and Crunchy Data's Postgres
           | servers are 4X more bang for the buck than Heroku's.
           | 
           | I let some folks at Heroku know this who are product
           | managers, and they are investigating it... but I would be
           | shocked if Heroku gets a big performance improvement anytime
           | in, say, 2023.
           | 
           | 20X seems like a lot for RDS, though I'd be curious to learn
           | more! We are switching to Crunchy because of that clear
           | cost/performance difference you mention.
        
         | satvikpendem wrote:
         | I'm going to plug Coolify, an open source Heroku alternative
         | (with Docker support too) that I'm using on a cheap $5 Hetzner
         | server which is a lot cheaper than the equivalent Fly or Render
         | etc service, and it really doesn't have much upkeep from me
         | even if you add in the time setting up the server initially,
         | which is like an hour, and afterwards, it Just Works(tm).
         | 
         | https://coolify.io
        
           | notpushkin wrote:
           | Dokku is also nice and battle-tested: https://dokku.com/
           | 
           | And may I also plug Lunni, a self-hosted Docker Swarm-based
           | PaaS I'm working on right now: https://lunni.dev/
           | 
           | Both work pretty well on $5 servers.
        
             | satvikpendem wrote:
             | Lunni looks really interesting! Looks like a Coolify
             | competitor, I'll definitely check it out. Do you have a
             | Discord to join? Coolify has one and I found it great to
             | discuss the project and talk directly to the creator.
             | 
             | I used to use Dokku but I personally liked the GUI from
             | Coolify so I've been using that. Nice to see that you have
             | a GUI as well, makes configuring apps a lot easier.
        
           | arjvik wrote:
           | No experience with either, but how does Coolify compare to
           | Dokku, the OSS Heroku alternative I've been hearing about
           | until now?
        
         | leishman wrote:
         | This is spot on. I found myself using Fly for a project because
         | it was super easy, not because I needed edge compute. TBH it's
         | still actually unclear to me who needs edge compute? What apps
         | require this sort of infra? It's not 99% of web apps right?
        
           | hinkley wrote:
           | I still think that in the next pendulum swing we'll end up
           | with edge computing and (smaller) self-hosted backends.
           | Everything old is new again, and we haven't entirely
           | recreated Akamai from first principles yet.
        
           | davnicwil wrote:
           | Personally I see this as a 'why not, if it works' type thing.
           | 
           | Sure you don't _need_ it for 99% of usecases, but if it just
           | works using familiar architectures then it _is_ also strictly
           | better for 99% of usecases so you might as well, and people
           | will naturally want it.
           | 
           | That 'familiar architectures' part is the hard bit, though.
        
             | kevincox wrote:
             | But it isn't better in 99% of use cases. Lots of use cases
             | are rendering an API response or HTML page that involves
             | multiple database requests. Therefore the distance between
             | database and app server is more important than the distance
             | between the client and the app server.
             | 
             | Edge compute can be helpful for static or quite cachable
             | content. But often this is handled as well or nearly as
             | well by a caching CDN.
             | 
             | So that leaves a few cases where edge compute is useful:
             | when you are globally distributing the data itself (and
             | ideally moving the data around as your users travel or
             | move), which is incredibly rare and expensive to build, and
             | when you need pure computation that needs no request to
             | your backend. And if 50ms of latency is important for a pure
             | computation, most of the time you can just move it to the
             | client. In my experience these tend to be rare. I would
             | estimate that edge compute is actually helpful for 1-5% of
             | projects, not 99%.
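A rough worked example of the argument above, assuming the database stays in one central region and the page issues its queries one after another. The function name and all latency figures are hypothetical and chosen only to show the shape of the trade-off.

```python
def page_latency_ms(client_to_app_rtt, app_to_db_rtt, db_queries):
    """One client round trip plus db_queries sequential database round trips."""
    return client_to_app_rtt + db_queries * app_to_db_rtt


# Hypothetical user: 80 ms RTT to the central region, 10 ms to the nearest edge.
# App next to the database: slow hop to the user, fast hops to the data.
central = page_latency_ms(client_to_app_rtt=80, app_to_db_rtt=1, db_queries=5)
# App at the edge: fast hop to the user, but every query crosses the 80 ms gap.
edge = page_latency_ms(client_to_app_rtt=10, app_to_db_rtt=80, db_queries=5)

print(central)  # 85
print(edge)     # 410
```

Under these assumptions the edge deployment only wins when a request needs zero or one trip to the database, which matches the static-content and pure-computation cases described in the comment.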
        
           | vineyardmike wrote:
           | One of the big benefits of edge compute is that it's
           | geographically distributed. Doesn't make a big impact across
           | the US, but globally a lot of nations have specific data
           | laws, so it's important to host data in the required nation.
           | Keep customer data in its nation of origin, but have a single
           | control plane and platform for every data center.
        
         | vendiddy wrote:
         | I can second this. We were evaluating moving off Heroku and to
         | Fly.io, but we didn't need all of the edge compute stuff. We
         | just want a better Heroku without having to think about
         | infrastructure and having to think about edge compute just got
         | in our way.
         | 
         | I feel like Next.js is in a similar position. While their main
         | vision is SSR, I wonder if they are missing out on a chunk of
         | the market that simply doesn't want to think about infra. We
         | use them because we just don't have to worry about webpack or
         | fiddling with deployment and hosting. We couldn't care less about
         | SSR and in fact we disabled it app-wide.
        
           | alexgrover wrote:
           | Why would they be missing out? Vercel can host static sites
           | just fine, whether that's one generated by Next or any other
           | framework or written by hand
        
           | leerob wrote:
           | One of the key design choices of Next.js was to enable
           | granularity on the runtime (Node.js or Edge[1]) and the
           | rendering method (static or dynamic[2]) on a per-route basis.
           | If you want a full SSR site, that's okay. If you want a full
           | static site, that's also okay.
           | 
           | We often see folks wanting a mix of both. For example, maybe
           | the /about page is static, but the home page is dynamic and
           | personalized based on the visitor. You can do all of this
           | with Next.js. Our future direction is adding even further
           | granularity, enabling this decision at the data fetch level,
           | allowing you to cache results across deployments[3].
           | 
           | [1]: https://beta.nextjs.org/docs/rendering/edge-and-nodejs-
           | runti...
           | 
           | [2]: https://beta.nextjs.org/docs/rendering/static-and-
           | dynamic-re...
           | 
           | [3]: https://vercel.com/blog/vercel-cache-api-nextjs-cache
        
         | sirsinsalot wrote:
         | Digital Ocean gave me the PaaS replacement and managed PG and I
         | couldn't be happier.
         | 
         | If anyone else is looking.
        
         | erebe__ wrote:
         | You can take a look at www.qovery.com. It provides a Heroku-like
         | experience but runs on your cloud account (aws, scaleway
         | or digital ocean).
         | 
         | They build on existing tech that is already working, so it is
         | more stable.
        
           | ignoramous wrote:
           | Save for a few "in-preview" features, Fly was stable too but
           | then they started growing faster than they could keep up (a
           | good problem to have!). Stability isn't a permanent state.
        
         | vineyardmike wrote:
         | I agree - fly is so easy to use (when it works) that it's hard
         | not to be impressed. BUT what I've found is that we don't need
         | edge compute, since our customers aren't that latency
         | sensitive, so it's lost on us. It's only a few more
         | milliseconds to us-east-1.
         | 
         | I've heard (on HN) of a dozen different companies vying for the
         | heroku replacement spots and yet Fly seemed to capture the
         | attention. I couldn't name another one off hand.
         | 
         | What I truly want and probably lots of other people too is
         | Flyctl (and workflow) for AWS. The same simplicity to run as
         | fly, but give me something cheap in Virginia or the Dalles.
        
           | latchkey wrote:
           | > What I truly want and probably lots of other people too is
           | Flyctl for AWS. The same simplicity to run as fly, but give
           | me something cheap in Virginia or the Dalles.
           | 
           | Google Cloud. It is painfully easy to spin up managed
           | postgres, super easy to deploy gcp cloud functions or gcp
           | cloud run. It isn't expensive either and just works.
        
             | 0cf8612b2e1e wrote:
             | If someone is not already using the holy trinity
             | (AWS/Azure/GCP) there is probably a reason.
        
               | monsieurbanana wrote:
               | I'm not using gcp anymore because it's not worth the risk of
               | losing access to my personal gmail account just to play
               | around with pet projects.
               | 
               | I might be paranoid, but I just don't feel comfortable
               | when there's so much in play.
        
               | guhcampos wrote:
               | Create a new Gmail account?
        
               | vineyardmike wrote:
               | Google associates different accounts that are from the
               | same owner when handling issues FYI. So if they think
               | your account is doing something wrong on GCP, be wary of
               | associated accounts.
        
               | namaria wrote:
               | Separating concerns, isolating things that are not
               | related, these are some basic tenets of good engineering.
               | Yet we all keep rolling the ball of mud downhill and act
               | shocked it keeps growing and swallowing everything.
        
               | 0cf8612b2e1e wrote:
               | Totally agree with this mindset. My digital life is on
               | the line because Google refuses to separate services.
        
               | amluto wrote:
               | Egress pricing, for one.
               | 
               | fly.io charges an outrageous 2 cents/GB. Google is over
               | 4x that.
               | 
               | At fly.io rates, 1Gbps average over a month is $6400/mo.
               | Google is tiered and you're looking at over $10k/mo.
               | 
               | For comparison, a cheap managed switch that can handle
               | 1Gbps cost about $100, maybe a bit more if you want a
               | nice one. A nice router is more. You can rent _an entire
               | rack_ , including power, cooling, and an unmetered 1Gbps
               | for $300-$1k/mo (with maybe some wiggle room on both
               | ends). You can buy a pretty nice server, amortize the
               | price over a week or two, and still come out ahead.
               | 
               | You certainly get considerable value from a major cloud
               | provider, and a lot of their other services are
               | reasonably priced, but, depending on your workload, the
               | egress prices and the corresponding Hotel California
               | factor may make using a major cloud provider a poor
               | proposition.
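The egress arithmetic above can be checked with a short sketch. It assumes a 30-day month, decimal gigabytes, and a single flat per-GB price; tiered pricing is not modelled, and the function names are made up for illustration. The $0.02/GB rate and the $1k/mo rack figure are taken from the comment.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_egress_gb(sustained_gbps):
    """Gigabytes moved in a month at a sustained rate in gigabits per second."""
    return sustained_gbps / 8 * SECONDS_PER_MONTH


def flat_rate_cost_usd(sustained_gbps, usd_per_gb):
    """Monthly bill under a single flat per-GB price (no tiers)."""
    return monthly_egress_gb(sustained_gbps) * usd_per_gb


def break_even_gb(flat_monthly_usd, usd_per_gb):
    """Monthly volume above which a flat-fee link beats per-GB billing."""
    return flat_monthly_usd / usd_per_gb


print(f"{monthly_egress_gb(1):,.0f} GB")       # 324,000 GB
print(f"${flat_rate_cost_usd(1, 0.02):,.0f}")  # $6,480
print(f"{break_even_gb(1000, 0.02):,.0f} GB")  # 50,000 GB
```

A sustained 1Gbps works out to roughly 324 TB a month, so the quoted $6400/mo is in the right range, and at 2 cents/GB a $1k/mo unmetered rack pays for itself at about 50 TB of monthly egress.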
        
             | mattbillenstein wrote:
             | Do you have a guide in mind?
             | 
             | If it's sorting and sifting and clicking a bunch of stuff
             | in the console, that's not painfully simple. If it's some
             | easy cli commands, I think that's in the ballpark...
        
           | cldellow wrote:
           | Render.com is another spiritual successor of Heroku. I'd love
           | a world where Fly and Render are both very successful
           | companies.
        
             | vorticalbox wrote:
             | Render has some great features like making a new sub domain
             | for when a PR is opened so you can test it as a fully
             | working API before you merge
        
               | alexgrover wrote:
               | That's supported on most PAAS these days, including
               | Heroku.
        
               | vorticalbox wrote:
               | On their free tiers though?
        
               | [deleted]
        
               | alexgrover wrote:
               | Well, no longer free on Heroku, but it was
        
             | rychco wrote:
             | Yeah I like them both a lot, having tried deploying small
             | projects on each. However, I've defaulted to render at the
             | moment because I've found it painless for my current
             | project, and edge compute is low on my list of priorities.
             | 
             | Though to be fair, even if render collapsed overnight, I
             | think I'd still be equally satisfied after moving to fly.
        
             | te_chris wrote:
             | Not gonna happen. Both will get acquired because that's how
             | things work now
        
               | jamil7 wrote:
               | Not sure why this is downvoted, it's a valid point.
        
               | anurag wrote:
               | (Render founder) I'd love to understand why you think
               | this is the only outcome. Render has positive gross
               | margin and a clear path to profitability based on both
               | our growth so far and the tailwinds in this space. I'm
               | also aware of other companies like ours that have grown
               | all the way to IPO or are well on their way.
               | 
               | I'm very explicit both internally and externally that an
               | acquisition is a failure mode for Render. We're building
               | this for the very long term and plan to keep it that way.
        
               | te_chris wrote:
               | I guess I'm just default cynical these days seeing how
               | much money's still floating around and the scale of the
               | cloud big 3. Apologies, it wasn't personal. I admire your
               | vision and hope it can work, money always seems to talk
               | eventually though. We need more companies that have the
               | nerve to hold on and develop on their own.
        
               | jstummbillig wrote:
               | Unless a company is very explicit about this not being in
               | the books, I tend to share this outlook.
               | 
               | From the perspective of a recent founder, it's downright
               | spooky to build around any SaaS, considering how few of
               | them have been around for 10+ years, when that is
               | certainly what our business is aiming for.
               | 
               | I know (and share the feels): Devs tend to get excited
               | about the new thing - but if Google Workspace shut down
               | next month, we would be in so much operational trouble.
               | When other people's fancies stand in the way of the entire
               | operation you are responsible for, it actually begs the
               | question how much closed source SaaS you can allow before
               | it starts to be quite frankly irresponsible.
               | 
               | We are not imagining things. SaaS of all sizes shut down
               | all the time, and when you are heavily relying on them
               | and building software around them to run a business the
               | prospect is spooky as hell.
        
               | zamnos wrote:
               | The difference between (free) Gmail and Google workspace
               | is that workspace is a paid product. If you're big enough
               | to warrant an AM, you can get terms which include
               | continuity of business planning if Google _does_ happen
               | to shut down Workspace. (They won't.)
        
               | manmal wrote:
               | Is your argument that Workspace is a paid product and
               | therefore won't be shut down? If yes, let's keep in mind
               | that Stadia was paid-for too. My trust in the longevity
               | of Google products has been damaged beyond repair.
        
               | giovannibonetti wrote:
               | The difference is that Stadia was definitely losing
               | money, whereas Google Workspace might be profitable.
        
               | sethammons wrote:
               | I'm guessing that downvotes come from those who see the
               | macro environment changing. With increased rates,
               | borrowing to purchase companies may make less sense.
        
               | te_chris wrote:
               | Macro makes it harder to raise funding too though - VC no
               | longer as attractive given the risks and higher interest
               | rates available
        
               | morelisp wrote:
               | These threads from mrkurt a few months ago seem relevant
               | here -
               | 
               | https://news.ycombinator.com/item?id=32955520
               | 
               | If they are a multiplier for a whole portfolio, there's
               | not much reason for any particular branch to purchase
               | them.
               | 
               | (This post seems like some evidence they might actually
               | be building the wrong thing, though.)
        
           | trunnell wrote:
           | > Flyctl for AWS
           | 
           | Have you tried AWS Copilot? I'm having good success with it.
           | Probably not quite as simple as flyctl, but still it's only
           | one command to deploy a container.
           | 
           | I would really like fly.io to overcome these hurdles. I bet
           | they will.
        
           | manmal wrote:
           | I can second that I've seen render.com mentioned very often,
           | maybe even more so than fly.
        
           | gen220 wrote:
           | > What I truly want and probably lots of other people too is
           | Flyctl for AWS. The same simplicity to run as fly, but give
           | me something cheap in Virginia or the Dalles.
           | 
           | Pardon the ignorance, is this not the Amplify CLI [1] ?
           | 
           | [1]: https://docs.amplify.aws/cli/
        
             | pid-1 wrote:
             | No
        
               | scubbo wrote:
               | Can you elaborate?
        
               | ctvo wrote:
               | Things that just work and are a delight to use and the AWS
               | Amplify CLI are not often mentioned together. The Amplify
               | CLI is a growing collection of poorly thought out, poorly
               | implemented functionality that looks good in demos, but
               | falls apart under any close inspection.
        
           | zoomzoom wrote:
           | I think this whole category is interesting, from the next-gen
           | PaaS to the cloud-native ecosystem. Totally empathize with
           | how hard what fly is doing in terms of scale and reliability
           | is.
           | 
           | At Coherence (withcoherence.com) we're focused on a developer
           | experience layer on top of AWS/GCP. You might describe it as
           | flyctl for AWS.
        
       | jrochkind1 wrote:
       | I remain kind of amazed about how heroku managed to pull off what
       | they pulled off, in the first case.
       | 
       | Also:
       | 
       | > The Heroku exodus broke our assumptions. Pre-Heroku, most of
       | the apps we were running were spread across regions. And: we were
       | growing about 15% per month. But post-Heroku, we got a huge
       | influx of apps in just a few hot spots -- and at 30% per month.
       | 
       | I hadn't before seen anyone with a big picture view confirm a
       | heroku exodus was happening, although a lot of people _suspected_
       | it or had anecdotes.
       | 
       | But if fly is seeing a pretty enormous number of customers moving
       | from heroku to fly... oh wait, now I'm wondering, is this mainly
       | a result of heroku ending _free_ services, and those are free
       | customers coming to fly for free services?
       | 
       | If so... that's a pretty big burden to take on without revenue to
       | match, it does seem kind of dangerous for fly.
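To put the quoted growth rates in perspective, here is a small sketch that compounds them, assuming the monthly rate stays constant for a year (which real growth rarely does). The helper names are illustrative only.

```python
import math


def yearly_multiple(monthly_growth):
    """Size after 12 months of compounding, relative to today."""
    return (1 + monthly_growth) ** 12


def doubling_time_months(monthly_growth):
    """Months needed to double at a constant monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_growth)


for rate in (0.15, 0.30):
    print(f"{rate:.0%}/month: x{yearly_multiple(rate):.1f} per year, "
          f"doubles every {doubling_time_months(rate):.1f} months")
```

This prints roughly x5.4 per year (doubling about every 5.0 months) for 15% and x23.3 per year (doubling about every 2.6 months) for 30%, so the jump is much larger than "twice as fast" suggests.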
        
       | tiffanyh wrote:
       | Would the simple solve be that Fly.io just mark any new service
       | of theirs "beta" for x-months post launch?
        
         | samwillis wrote:
         | None of the services that they have had issues with in the
         | recent past are new, they have been running for at least a
         | couple of years. They would need to put a "beta" sticker on the
         | whole platform for that suggestion to work.
         | 
         | But the post makes it clear that the issue isn't that they had
         | problems with new services. It was rapid customer growth before
         | they had time to scale up the infrastructure as they had
         | planned to do.
        
       | ec109685 wrote:
       | I wonder what types of RPS they are seeing that required a gossip
       | based protocol to broadcast state around versus a more
       | traditional data store.
       | 
       | I take it that it's far more important that the local region know
       | about changes than a remote region, which makes a mastered store
       | in one location as the source of truth problematic.
       | 
       | I also wonder why these companies don't backstop themselves on
       | the public cloud? Failing over to AWS seems better than running
       | out of capacity, and some of its services could be used in
       | circumstances where an open source technology isn't ready.
        
         | likecarter wrote:
         | Yeah, reading the post made it seem like they followed "best
         | practices" without really thinking things through. KISS.
        
       | [deleted]
        
       | lll-o-lll wrote:
       | At first I was all like "Ha ha, losers can't scale"
       | 
       | And then I was "Huh, these technical challenges are actually
       | pretty difficult"
       | 
       | And _then_ I was all "crap, these are a bunch of technologies I
       | was about to add to our stack"
       | 
       | Thanks heaps fly.io people; having the humility to honestly talk
       | about the challenges and failures massively helps people such as
       | myself as we navigate new unfamiliar technologies. If more
       | companies were willing to do this, it'd be a lot easier to avoid
       | common pitfalls.
        
         | chucky_z wrote:
         | The tech in their stack is still pretty good. Unless you're
         | supporting tens of thousands of customers and trying to make
         | the promises that fly makes today. Look at the fly engineer
         | replies in this thread.
         | 
         | Also they basically only use OSS versions, they could go give
         | Hashicorp some money to solve their Vault problems. They could
         | probably partner with 2ndQuadrant for PG as two examples.
         | That might not make sense for their business though.
         | 
         | Hard problems are hard no matter the choice.
        
           | lll-o-lll wrote:
           | Sure, I was going for a little humour there. A little riff on
           | the whole "we always judge others until we walk in their
           | shoes".
           | 
           | The take away I was hoping for is "providing insights into
           | how we struggle helps others"
        
       | lopatin wrote:
       | It sounds like they need more money to scale the shared stack
        
       | e1g wrote:
       | This reads like a mea culpa from an indie hacker, but Fly.io had
       | 5+ years and raised $40M to get these basic _fundamentals_ right.
       | And we get promises of a new status page.
        
         | sph wrote:
         | Big companies fail spectacularly as well, so it is refreshing
         | to read an indie hacker-style mea culpa rather than a pile of nonsense
         | PR one would expect from a company that raised $40M.
         | 
         | Honesty pays off in the long run, but it's something businesses
         | quickly forget past a certain stage.
        
         | ignoramous wrote:
         | Well, that's one way to look at it.
         | 
         | Fly's been many things over the course of its lifetime [0], but
         | I believe their latest pivot (on what they call "Machines") is
         | pretty darn good. I've been using Machines since Oct last year,
         | and things have gotten better week-over-week. Like with any
         | platform, Fly has its own idiosyncrasies, which don't take much
         | to get the hang of. That said, I am the only person in my tech shop
         | that deals with Fly. Some orgs with larger teams and heavier
         | apps that deploy frequently or run DBs / disks on Fly (I don't)
         | have had a rough few months; so that's there too.
         | 
         | [0] Ex A: https://news.ycombinator.com/item?id=13985940
        
           | mbStavola wrote:
           | Fly Machines, if I understand them correctly, feels like a
           | step backwards. Sure they might work better than "standard"
           | Fly apps, but one of the motivating cases for Fly is being
           | able to effortlessly scale across the world without having a
           | Ph.D. in CS and a fistful of certs for Cloud engineering.
           | That vision for Fly is awesome, game changing even, ignoring
           | their current stability issues.
           | 
           | Machines isn't that. From the documentation, it appears as
           | though it's "just" a VM pinned to a single region and none of
           | the "magic" of Fly really applies. If the server your VM is
           | hosted on goes down, Fly won't redeploy your container. It's
           | just downtime. Spinning up in other regions is something you
           | have to think about and actually _do_. It seems closer to
           | Heroku than it does Fly.
           | 
           | Maybe I am totally misunderstanding Fly Machines and their
           | use-case, maybe they're aiming to close the gap between
           | Machines and Fly apps. It's just a bit of a bummer to see
           | something that looks like walking back the original "promise"
           | of Fly and makes me question whether or not Fly is going to
           | just become like every other PaaS (even if it's a really good
           | one).
        
             | ignoramous wrote:
             | Agree. Kurt's mentioned on the forums that _autoscale_ is
             | coming to Fly Machines. They haven't implemented it just
             | yet.
             | 
             | Even without _autoscale_ , spinning up Machine clones in
             | any of the 30+ Fly regions is as easy an instant scale-out
             | as you'll likely come across on any of the _NewCloud_
             | platforms.
        
         | phreack wrote:
         | It is concerning that they feel notifying a problem on the
         | status page hurts their ego. It is under no circumstances
         | something personal, and ideally it should even be automated.
        
       | [deleted]
        
       | sergiotapia wrote:
       | It's been almost a year since I gave Fly a review
       | (https://news.ycombinator.com/item?id=31391116) and it's a bummer
       | that they're still struggling to get things right. Double bummer
       | because I love Phoenix and Elixir and they employ Chris McCord
       | there.
       | 
       | Maybe they were _too_ ambitious at the start? They have a hard
       | road ahead of them, and competition like Render.com and
       | Northflank have provided me with solutions to all of my problems.
       | Great dev ux, great prices and predictable solutions. They also
       | keep pushing out very useful features. A third competitor also
       | sprung up: Railway! There's certainly blood in the water.
       | 
       | Will they catch up to others before the competition solves the
       | "global mesh" unique value proposition Fly.io currently has?
       | That's the $1MM question.
        
         | zomglings wrote:
         | I read your review, and had a question so I thought I'd follow
         | up here. You mentioned render.com as a competitor - does render
         | host its own infrastructure or do they act as a go-between
         | between their users and AWS/GCP/whatever?
        
           | cldellow wrote:
           | They act as a go-between in that they ultimately host on
           | AWS/GCP. They host their own infrastructure in that they
           | appear to run Kubernetes and have built out their own
           | deployment and service fabric, so they're just using the
           | underlying machines as dumb compute, they're not, eg,
           | building on RDS.
           | 
           | In March 2021, someone asked a question about carbon
           | emissions of their data centres. They said they hosted on
           | both GCP and AWS, but mentioned they were interested in
           | moving to their own bare metal [1].
           | 
           | In April 2021, I asked a question about egress fees to
           | Google, and they walked back a bit the comment about moving
           | to bare metal [2].
           | 
           | As of March 2022, they're still in AWS/GCP [3].
           | 
           | As of September 2022, workloads for new users deploy into
           | AWS, even in regions that were previously served by GCP [4].
           | 
           | [1]: https://community.render.com/t/does-render-use-green-
           | energy/...
           | 
           | [2]: https://community.render.com/t/is-render-com-hosted-in-
           | googl...
           | 
           | [3]: https://community.render.com/t/are-your-servers-owned-
           | by-you...
           | 
           | [4]: https://community.render.com/t/which-render-regions-map-
           | to-w...
        
             | anurag wrote:
             | (Render founder) We're still on public clouds because even
             | if it doesn't help with margins, it helps us move faster on
             | features our customers want. It's all one big
             | prioritization problem (and lots of little ones too!).
        
               | zomglings wrote:
               | I'm curious how significant a risk products like AWS
               | Lightsail are to your business - it seems you are
               | competing in the same market, but:
               | 
               | 1. They have vastly different ongoing capital and
               | cashflow requirements than you do.
               | 
               | 2. They have all the leverage when it comes to the
               | question of your continued operations on their cloud.
               | 
               | I'm also curious if they have already offered to just buy
               | you out since you're clearly succeeding where they seem
               | to just be treading water. (But not expecting you to
               | answer this question. :) )
        
       | iamdbtoo wrote:
       | I'm a big fan of fly.io. From their hiring process to the product
       | itself it's all carried out in a thoughtful manner. I hope they
       | can weather this rough time.
        
       | emschwartz wrote:
       | One of my colleagues keeps repeating "reliability is our number
       | one feature".
       | 
       | I'm not sure it is for 100% of early stage startups, but I guess
       | it is once you exceed some minimum usage threshold.
       | 
       | That said, definitely appreciate the detailed explanation.
        
         | jpdb wrote:
         | > One of my colleagues keeps repeating "reliability is our
         | number one feature".
         | 
         | I think reliability is the #1 feature at any stage because if
         | you're unavailable, you're at best useless and more than likely
         | you are actively harmful because your users have an
         | expectation.
         | 
         | However, if you're unavailable only at times customers don't
         | expect you to be there, then you're not actually unavailable.
         | This is more likely for an early stage start-up, but you don't
         | typically choose or know when you're expected to be available
         | nor do you always get to choose when you're unavailable.
        
           | sa46 wrote:
           | In terms of confidentiality, availability, and integrity:
           | I'll bet LastPass would gladly trade availability right now
           | to regain confidentiality.
        
           | ignoramous wrote:
           | Our team at AWS had a poster up on the wall that more or less
           | went:
           | 
           | 1. Security
           | 
           | 2. Durability
           | 
           | 3. Availability
           | 
           | 4. Speed
           | 
           | Similar:
           | https://twitter.com/colmmacc/status/1071088017190711296
        
         | cschep wrote:
         | TL;DR -- It's very domain specific if reliability is your
         | number one feature.
         | 
         | For a startup that is hosting other people's production
         | application/data then this is absolutely true. Less than 100%
         | always needs to be addressed.
         | 
         | For a startup that is selling bingo cards then reliability
         | probably isn't nearly as important. I'm guessing there were
         | certain holidays that were more important than others as far as
         | reliability goes though? Maybe patio11 can chime in :)
        
         | jrochkind1 wrote:
         | > One of my colleagues keeps repeating "reliability is our
         | number one feature".
         | 
         | > I'm not sure it is for 100% of early stage startups,
         | 
         | I mean, it probably depends on the nature of the startup?
         | Platform-as-service seems particularly sensitive to reliability
         | (whether or not it's "#1 feature"), in a way that might not be
         | true of startups in other spaces.
        
       | yamrzou wrote:
       | I'm not a user of Fly.io. I can't help but notice how remarkable
       | the effect of open communication is on potential end users like me.
       | I remember reading about their reliability problems on HN some
       | time ago. That biased my view of the company. After reading this,
       | the open communication and transparency restored my trust in
       | them, and would make them again a potential candidate for future
       | projects. Because now I know that they acknowledge the problem
       | and that they are trying to improve things.
        
         | willio58 wrote:
         | Agreed, this is how company communication should be.
         | 
         | I don't use Fly but would consider them in the future even
         | given their recent issues.
         | 
         | I look at this in contrast to Twitter who had/has? an outage
         | today. Their leadership is opaque and doesn't take
         | responsibility for the issues they are causing.
        
           | alfalfasprout wrote:
           | In fairness, a CEO who has basically been Kanye-ing himself
           | and his company into irrelevance is a low bar.
        
         | newaccount2021 wrote:
         | [dead]
        
         | alfalfasprout wrote:
         | This is huge. Even as a member of a larger company, this stuff
         | matters. If you have a vendor that doesn't bullshit you when
         | things go wrong, you can actually trust them. This is how you avoid
         | companies having the "hmmm they seem to be having lots of
         | issues recently, let's consider moving off them" conversation.
        
         | snapetom wrote:
         | This is probably therapy, but your message and fly.io's post
         | resonate a lot with what I'm going through. I took a product
         | owner role about 6 months ago, my first, with a company that
         | has turned out to be just a mired mess, and a product
         | universally hated both internally and externally.
         | 
         | Long story short, it's completely over-engineered by a bunch of
         | intellectual engineers with no focus, no discipline, and no
         | oversight. It ended up not delivering on any promises it made,
         | and there were a lot of them.
         | 
         | I was warned left and right before presentations and meetings,
         | "this customer hates your product because of ...." I started
         | off every meeting with saying, "we're rearchitecting the
         | product, this is how we're doing it, this is the tech we are
         | using." Immediately there was a sense of relief from customers,
         | followed by questions like, "why can't <current product>
         | deliver <feature> that was promised?" I'm completely honest
         | with bad decisions that were made and how it impacted the
         | feature. Sure, there is skepticism on what we are doing, and I
         | tell them they should absolutely be skeptical based on our
         | track record. The result has been customers who have hated my
         | product now offering to work with us on development.
         | 
         | I've also been completely forthcoming on configuration,
         | security, resources, and setup issues I am finding, many of
         | them are absolutely freakin' insane. I've flat out told
         | customers it's frankly embarrassing and to never let us do
         | something like this in the future. The best feedback on this
         | was, "At least you're telling us something. We usually get
         | silence from this team."
         | 
         | God, this is the most depressing job ever.
        
           | zamnos wrote:
           | Can you help me in a detailed sense - what did you tell
           | customers? Did you literally say the product is
           | "completely over-engineered by a bunch of intellectual
           | engineers with no focus, no discipline, and no oversight"?
           | That seems a little over-honest to me but of course I wasn't
           | there.
        
           | hinkley wrote:
           | Architectural astronauts.
        
           | mrkurt wrote:
           | I feel this. I hope you get over the hump and your job gets
           | fun. We've had flashes, at least, but I do think what we're
           | doing (and probably what you're doing) requires some
           | irrational behavior.
        
           | ndneighbor wrote:
           | Part of what I hated about Product Management at my last role
           | was the consistent helplessness I felt when I was on calls
           | with our customers. I could tell our product wasn't meeting
           | their needs but all I could do was try my best to give the
           | engineers context on how best to eventually meet them.
           | 
           | I remember my first few days on the job just being ripped to
           | shreds by our customers who (understandably) were slighted.
           | Don't miss those days at all.
        
           | claytonjy wrote:
           | Your job sure does sound depressing, and it's not one I would
           | succeed at, but if you can power through and turn this
           | product around that's a hell of an accomplishment you'll have
           | to be proud of.
           | 
           | I'm curious what you'd like to do next. You could probably
           | have a great career doing these sorts of turnarounds
           | repeatedly across companies, maybe even as a consultant, but
           | would you want to?
        
         | leetrout wrote:
         | mrkurt[1] is also active here and has been very transparent in
         | his comments about scaling issues.
         | 
         | Similar to this post he commented a week ago:
         | 
         | > In a year we'll either be ahead of those, or not growing
         | anymore due to ongoing capacity issues. I'm hoping for the
         | former.
         | 
         | I am rooting for Fly! Great team. The company reminds me of
         | early HashiCorp.
         | 
         | [1] https://news.ycombinator.com/user?id=mrkurt
        
         | gizmo wrote:
         | This post is carefully worded corporate messaging, but because
         | they write for their developer audience it has an informal "oh
         | shucks we messed up bad y'all" vibe to it. But make no mistake,
         | this is 100% corporate messaging.
         | 
         | I get that growing is super hard. And maybe fly will grow up to
         | be a good platform some day. But that's the future. Today,
         | they're flying by the seat of their pants and I mostly feel
         | sorry for people who were tricked into thinking this platform
         | is ready for production use.
        
           | spoiler wrote:
           | I'm not sure why the cynicism around their candor. Do you
           | think it's not genuine just because it was posted by a
           | company employee?
           | 
           | Your post implies corporate messaging is bad. And anything
           | posted by a company--or at least I don't know where you draw
           | the line--can be considered corporate messaging. Am I just
           | reading too much into your phrasing?
        
             | gizmo wrote:
             | It's _strategic_ messaging. It can't be genuine, because
             | of what it is. The benefit they get is publicity and damage
             | control, and as you can tell by the many responses here, it
             | buys them time because many developers are willing to give
             | them the benefit of the doubt.
             | 
             | Companies that engage in this kind of candor are careful
             | not to disclose those things that would really hurt their
             | business. Those things are still kept secret. If the CEO
             | accidentally sexually harassed an employee that's not
             | getting disclosed. A mea culpa is only offered for the
             | issues that are already known regarding scaling, downtime,
             | and missing features. Struggles they have because they're
             | choosing to grow so fast.
        
               | dadrian wrote:
               | Sorry, what? Do you expect that no company can think
               | about what to write before they post it, or that any post
               | about anything internal must cover all internal issues?
               | Posts must be either all roses or a no-thought laundry
               | list of everything bad?
        
               | ignoramous wrote:
               | > _Do you expect that no company can think about what to
               | write before they post it..._
               | 
               | I guess, you and GP are in agreement for the _strategic_
               | part of the argument at least, if not the _genuine_ part
               | of it.
               | 
               | As someone who's been active on Fly's community forums
               | for close to 18 months now, I think Fly employs some of
               | the most genuine and helpful engs you'll see, so I'll
               | give them the benefit of the doubt.
        
           | skrtskrt wrote:
           | professionals don't get tricked into thinking a platform is
           | ready for production use
           | 
           | If you don't have SLOs and SLAs, then you get what you get,
           | essentially. Even a company with a great reputation can
           | completely reverse course with a single bad incident, and you
           | get nothing in return if there's not a contract.
        
             | AtlasBarfed wrote:
             | Honestly, if you are a small fish to AWS... what is an SLA?
             | 
             | They can trot out a low level person to stall you with
             | questions, or an AI question generator that maximizes the
             | amount of time you waste on your end, and call that "SLA
             | met".
             | 
             | And even if they DON'T meet the SLA on occasion, you built
             | your stack on AWS. You are lying in the bed you made.
             | 
             | SO, what, AWS throws some free credits (that their 30-40%
             | margin easily absorbs)?
             | 
             | The only big stick in these types of things is having dual-
             | cloud capability, where you can move your service quickly
             | from one cloud to the other. Stateless API servers? Maybe.
             | Database servers? ouch. Cassandra could reliably span two
             | clouds, man would AWS kill you on their ludicrously
             | overpriced network costs.
             | 
             | Has anyone done Postgres replication across providers as a
             | useful production system? Doubt it.
        
           | Trufa wrote:
           | I don't get what you're saying, this isn't a brag disguised
           | as a confession, they are actually admitting to poor
           | performance, of course it's to eventually make users trust
           | them, but a) I don't see anything bad with that b) they are
           | choosing the hard route.
           | 
           | They are being open and transparent (afaik) even if carefully
           | worded, which I also don't blame them for.
        
         | mbesto wrote:
         | Me too.
         | 
         | However, this is a double-edged sword. Their key value
         | proposition _is_ scale / speed which makes it concerning that
         | they haven't "solved" that yet.
        
         | bodecker wrote:
         | Open communication is great when there are incidents, but even
         | better is having no incidents. (of course there are nuances
         | depending on specific context)
        
       | pier25 wrote:
       | I've been using Fly for over two years or so. The sentiment of
       | this post doesn't align with my personal (anecdotal) experience.
       | 
       | The PG issues hit me two times in the previous weeks but other
       | than that it's been working great for me.
       | 
       | With the move to v2 apps (using their new machines infra) things
       | are actually faster and smoother than ever.
       | 
       | About a year or so ago their CLI was quite buggy but I haven't
       | really hit any bugs in months.
       | 
       | I will remain with Fly for the time being. Hopefully they don't
       | close shop!
        
         | tptacek wrote:
         | We're nowhere even within the line of sight of closing up shop.
         | We just haven't been doing a good job of aggressively
         | communicating (a) when things go wrong and (b) what we're doing
         | to account for it.
         | 
         | The Fly.io of 2023 looks almost nothing like that of 2021 (all
         | for the better), and it's not obvious to our users what's
         | changed. We've been doing a shitty job of communicating, and
         | we're taking our licks for it now.
        
           | chris_st wrote:
           | A lot can happen in 11 years :-)
           | 
           | And thanks a lot for fly.io -- it's working great for my
           | (rather small) use cases.
        
             | mrkurt wrote:
             | Oh my god that's a great callback.
        
         | russellthehippo wrote:
         | Agree - V2 apps on machines are incredibly slick to launch
         | (create/start/stop), get info on with graphql, and scale up and
         | out. Magic. When the PG administration experience is that good
         | I'll move it all over.
        
       | claytonjy wrote:
       | Very interesting to see Kurt assert they're going to "solve
       | managed Postgres", and I'm super curious to know what that means.
       | Does it mean something like RDS, or more like CrunchyData?
       | 
       | I could see them building something RDS-like on their own, but if
       | they're trying to go further than that I wonder if they'll buy or
       | partner with other companies rather than doing it themselves.
       | Neon strikes me as a Postgres-as-a-service that could pair well
       | with Fly.
        
         | mmcclure wrote:
         | That comment jumped out to me too, my recollection was that
         | they've been pretty vocal about that not being something they
         | wanted to solve themselves as a core competency. I'm not quite
         | parsing if these two blurbs should still be taken together or
         | if the second sentence is refuting the first.
         | 
         | > The second problem we have with Postgres was a poor choice on
         | my part. We decided to ship "unmanaged Postgres" to buy
         | ourselves time to wait for the right managed Postgres provider
         | to show up.
         | 
         | > We're going to solve managed Postgres. It's going to take a
         | while to get there, but it's a core component of the
         | infrastructure stack and we can't afford to pretend otherwise.
         | 
         | +1 to Neon seeming like a good fit, but it's also very much a
         | beta (alpha?) both as a product and company (at least from my
         | impression). I'm not sure that's a bet they'd want to make
         | right now given the context of this post.
        
           | nikita wrote:
           | (Neon CEO)
           | 
           | We are launching our paid tier March 15th and will be
           | production ready shortly after. We are running 20K+ databases
           | and measuring reliability and uptime.
           | 
           | Generally reliability is a function of architecture (we are
           | solid there), good SRE practices, and a long tail of events
           | you live through, fix, and make sure they never happen again.
           | The bigger the fleet the faster the learning.
        
         | craigkerstiens wrote:
         | Craig here from Crunchy Data. Not sure if you mean Crunchy Data
         | is like RDS or isn't; in some cases we're very similar as a
         | managed service provider, but we're focused on a better developer
         | experience and quality support.
         | 
         | We've had a number of customers that use us for the database
         | and fly for the app. We had a user benchmark a number of heroku
         | alternatives with various database providers and we
         | actually had better response times than the unmanaged instances on
         | fly themselves in addition to all other providers they tested -
         | https://webstack.dancroak.com/
         | 
         | I won't speak for Fly, but we're big fans of them and think we
         | pair quite well together.
        
           | mrkurt wrote:
           | Yes we think they pair well together too. I believe the ball
           | is in your court though. ;)
        
             | winslett wrote:
             | <3
        
           | claytonjy wrote:
           | I haven't used CrunchyData for work, but I see you as
           | offering what RDS does plus plenty more. RDS does a lot, but
           | after using Timescale Cloud professionally I saw how much RDS
           | _doesn't_ do, like actually-simple upgrades, one-click
           | forks, etc. and Crunchy looks similar in going beyond RDS.
           | 
           | I think the community would really love to see a direct
           | Fly+Crunchy integration!
        
         | aeyes wrote:
         | If I was in their shoes I'd probably aim for a "serverless"
         | Postgres experience where you get a connection string and you
         | know nothing else.
         | 
         | I think RDS, Crunchy, Aiven and others aren't quite there yet.
        
           | chime wrote:
           | They kind of offer that with their Redis (via Upstash). But
           | for our use-case, we needed it to be managed PG and Redis.
           | Going out of the LAN introduces too much latency.
        
             | chronark wrote:
             | Upstash Redis for Fly runs on Fly infrastructure and we
             | observe latencies in the low single digit milliseconds.
        
         | jrochkind1 wrote:
         | I don't even understand what you mean as the difference between
         | "something like RDS" and "something like CrunchyData" -- they
         | seem like similar products to me?
        
           | claytonjy wrote:
           | I see RDS as the absolute bare minimum for a managed
           | database; providers like Timescale or Crunchy tend to add
           | some pretty useful stuff on top.
        
             | jrochkind1 wrote:
             | For my own curiosity, I am interested in hearing what
             | features Crunchy adds on top that RDS doesn't have, that
             | folks find pretty useful!
             | 
             | (Timescale -- I think i know, it adds features specifically
             | about storing time series? But I don't think crunchy has
             | additional domain-specific stuff like this?)
        
       | pyentropy wrote:
       | Almost half of the issues are caused by their use of HashiCorp
       | products.
       | 
       | As someone that has started tons of Consul clusters, analyzed
       | tons of Terraform states, developed providers and written an HCL
       | parser, I must say this:
       | 
       | HashiCorp built a brand of consistent design & docs, security,
       | strict configuration, distributed-algos-made-approachable... but
       | at its core, it's a _very_ fragile ecosystem. The only benefit of
       | HashiCorp headaches is that you will quickly learn Golang while
       | reading some obscure github.com/hashicorp/blah/blah/file.go :)
        
         | tptacek wrote:
         | We are asking HashiCorp products to do things they were not
         | designed to do, in configurations that they don't expect to be
         | deployed in. Take a step back, and the idea of a single global
         | namespace bound up with Raft consistency for a fleet deployed
         | in dozens of regions, providing near-real-time state
         | propagation, is just not at all reasonable. Our state
         | propagation needs are much closer to those of a routing
         | protocol than a distributed key-value database.
         | 
         | I have only positive things to say about every HashiCorp
         | product I've worked with since I got here.
        
           | pyentropy wrote:
           | I respect that. Can you elaborate a bit on the routing
           | protocol thing? I assume you used WAN gossip?
           | 
           | I love the simplicity of fly.io & wish you all the best
           | improving Fly's reliability!
        
             | tptacek wrote:
             | If you've ever implemented IS-IS or OSPF before, like 80%
             | of the work is "LSP flooding", which is just the process
             | that gets updates about available links from one end of the
             | network to another as fast as possible without drowning the
             | links themselves in update messages. Flooding algorithms
             | don't build consensus, unlike Raft quorums, which
             | intrinsically have a centralized set of authorities that
             | keep a single source of truth for all the valid updates.
             | 
             | An OSPF router uses those updates to build a forwarding
             | table with a single-point shortest path first routine, but
             | there's nothing to say that you couldn't instead use the
             | same notion of publishing weighted advertisements of
             | connectivity to, for instance, build a table to map
             | incoming HTTP requests to backends that can field them.
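
A rough sketch of the shortest-path-first step described above, assuming each node has already received every other node's weighted link advertisements through flooding. The node names and costs are made up for illustration. The function is plain Dijkstra and is not taken from any OSPF implementation or from Fly.io's code.

```python
import heapq


def shortest_path_first(links, source):
    """Build a forwarding table from weighted link advertisements.

    links maps node -> {neighbor: cost}, i.e. the flooded link-state database.
    Returns {destination: (total_cost, next_hop)} for every reachable node.
    """
    table = {}
    # Heap entries: (cost so far, node, first hop taken when leaving source).
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in table:
            continue  # already settled via a path that is at least as cheap
        table[node] = (cost, first_hop)
        for neighbor, weight in links.get(node, {}).items():
            if neighbor not in table:
                # When leaving the source, the neighbor itself is the first hop.
                hop = neighbor if first_hop is None else first_hop
                heapq.heappush(heap, (cost + weight, neighbor, hop))
    del table[source]  # the source does not need a route to itself
    return table


# Hypothetical topology: an edge node, two intermediate hops, one backend.
links = {
    "edge": {"hop-a": 1, "hop-b": 4},
    "hop-a": {"edge": 1, "hop-b": 1, "backend": 5},
    "hop-b": {"edge": 4, "hop-a": 1, "backend": 1},
    "backend": {"hop-a": 5, "hop-b": 1},
}
table = shortest_path_first(links, "edge")
```

The same table shape could plausibly be keyed by application rather than by router, so that a lookup returns the cheapest backend able to serve a request. That variation is an assumption here, not something the comment spells out.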
             | 
             | The point is, if you're going to do distributed consensus,
             | you've got a dilemma: either you're going to have the Ents
             | moot in a single forest, close together, and round trip
             | updates from across the globe in and out of that forest
             | (painfully slow to get things in and out of the cluster),
             | or you're going to try to have them moot long distance
             | (painfully slow to have the cluster converge). The other
             | thing you can do, though, is just sidestep this: we really
             | don't have the Raft problem at all, in that different hosts
             | on our network do not disagree with each other about
             | whether they're running particular apps; if worker-sfu-
             | ord-1934 says it's running an instance of app-4839, I
             | pretty much don't give a shit if worker-sfu-maa-382a says
             | otherwise; I can just take ORD's word for it.
             | 
             | That's the intuition behind why you'd want to do something
             | like SWIM update propagation rather than Raft for a global
             | state propagation scheme.
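
One way to picture the "take ORD's word for it" rule is a merge function that needs no quorum: each host is the only authority on what it runs, and receivers keep the newest version they have seen. This is only a sketch under those assumptions. It is not SWIM itself (which also covers membership and failure detection), and the `Claim` fields, host names, and version scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    origin: str    # host that authored the claim
    host: str      # host the claim is about
    app: str
    running: bool
    version: int   # bumped by the authoring host on every change


def merge(view, incoming):
    """Fold gossiped claims into a local view keyed by (host, app).

    Returns the claims that changed the view, which are the ones worth
    passing on to other peers.
    """
    accepted = []
    for claim in incoming:
        if claim.origin != claim.host:
            continue  # only a host's own word about itself counts
        key = (claim.host, claim.app)
        current = view.get(key)
        if current is None or claim.version > current.version:
            view[key] = claim
            accepted.append(claim)
    return accepted


view = {}
ord_says = Claim("worker-ord", "worker-ord", "app-4839", True, 1)
maa_says = Claim("worker-maa", "worker-ord", "app-4839", False, 9)
first = merge(view, [ord_says, maa_says])   # MAA's opinion about ORD is dropped
replay = merge(view, [ord_says])            # a stale replay changes nothing
update = merge(view, [Claim("worker-ord", "worker-ord", "app-4839", False, 2)])
```

Because there is exactly one writer per key, two receivers that have seen the same claims end up with the same view regardless of arrival order, which is why no consensus round is needed for this kind of state.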
             | 
             | But if you're just doing service discovery for a well-
             | bounded set of applications (like you would be if you were
             | running engineering for a single large company and their
             | internal apps), Raft gives you some handy tools you might
             | reasonably take advantage of --- a key-value store, for
             | instance. You're mostly in a single data center anyways, so
             | you don't have the long-distance-Entmoot problem. And
             | HashiCorp's tools will federate out across multiple data
             | centers; the constraints you inherit by doing that
             | federation mostly don't matter for a single company's
             | engineering, but they're extremely painful if you're
             | servicing an _unbounded_ set of customer applications and
             | providing each of them a _single global picture_ of their
             | deployments.
             | 
             | Or we're just holding it wrong. Also a possibility.
        
             | [deleted]
        
       | chucky_z wrote:
       | mrkurt have you considered some of the lower tiers of vault
       | enterprise that allow for performance replicas that just outright
       | solve that problem? might be cheaper than an engineer at this
       | point.
        
       | bmorton wrote:
       | Can we please stop with the fly.io spam.
       | 
       | >I've hesitated to share this because, well, I'm fighting a
       | debilitating feeling of failure. Fear, too.
       | 
       | This is gross and I don't buy the emotional play -- this whole
       | piece is just an ad.
        
         | sph wrote:
         | It's a post on their forum, hardly an ad.
        
       | outworlder wrote:
       | > This is a theme. Existing open source is not designed for
       | global deployment
       | 
       | Eh? Unless you are consuming something as a service and it
       | actually advertises it as a feature, nothing is ready for 'global
       | deployment'.
       | 
       | If you have a 'centralized' secret storage, then you have made it
       | tied to a region. Want to have redundancies and lower latency?
       | You'll have to distribute it. Vault has docs about this:
       | https://developer.hashicorp.com/vault/tutorials/day-one-raft...
        
       | theloco wrote:
       | I love reading stuff like this. I don't use fly, don't plan to,
       | not totally sure everything it does and will check it out. But
       | this is some great raw data on how stressful it is after you
       | launch.
        
       | ashiban wrote:
       | One of the key challenges we observe is that if you're small
       | enough, a Heroku like experience works well - and most of your
       | needs would be covered by virtually any combination of
       | techstacks.
       | 
       | It gets significantly more challenging when you grow, either in
       | feature complexity or scale complexity - and then very few
       | services can offer what AWS/GCP/Azure offer - albeit at the
       | increased engineering/monetary cost of using them.
       | 
       | We're building a different kind of approach[0] that aims to
       | absorb the mechanical cost of using public cloud capabilities
       | (that are proven to scale) without hiding it altogether.
       | 
       | [0] https://github.com/KlothoPlatform/klotho
        
       | deivid wrote:
       | I'm a bit sour reading this. I've always liked fly and
       | particularly the engineering blog, so much so that a couple of
       | months ago I decided to apply for an infra position, to work on
       | some of these very topics. Sadly after 4~5 rounds of interviews
       | (including a workday) they just ghosted me.
        
         | [deleted]
        
         | tptacek wrote:
         | If that happened, it absolutely was not on purpose. Shoot me an
         | email at thomas@fly.io.
        
         | ignoramous wrote:
         | > Don't feel too bad nor take it personal. They probably have a
         | lot of applicants, and are looking to grow their team by hiring
         | someone with very specific skills.
         | 
         | > I also applied a few months ago while I was in the middle of
         | my job search. For one, I couldn't really answer their
         | "favorite syscall question" because I've never dealt with
         | syscalls :) so maybe I just wasn't a good fit.
         | 
         | Surely, everyone's favourite syscall is _exit()_
        
       | 1023bytes wrote:
       | I get it, I like fly.io, but the last outage made me switch to
       | Railway.app
        
       | a3w wrote:
       | What is this about?
        
       ___________________________________________________________________
       (page generated 2023-03-06 23:00 UTC)