[HN Gopher] How we successfully handled 2.5x traffic in a week
       ___________________________________________________________________
        
       How we successfully handled 2.5x traffic in a week
        
       Author : talonx
       Score  : 87 points
       Date   : 2020-05-13 17:01 UTC (5 hours ago)
        
 (HTM) web link (engineering.khanacademy.org)
 (TXT) w3m dump (engineering.khanacademy.org)
        
       | programminggeek wrote:
       | On a largely content based app/site, most of "scaling" comes down
       | to caching. However you do that is up to you, but somewhere
       | between caching at the browser layer, proxy layer, web server
       | layer, or memcache layer, things should be fast and scalable
       | without getting too fancy.
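
As a rough illustration of the "memcache layer" idea in the comment above, here is a minimal cache-aside sketch with a time-to-live. Everything in it is an assumption made for illustration: `TTLCache`, `render_page`, the 60-second TTL, and the `/math/algebra` key are hypothetical names and values, and the in-process dictionary only stands in for a real shared cache.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (a stand-in for memcache)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        """Return the cached value if still fresh, else compute and store it."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value


def render_page(slug: str) -> str:
    # Hypothetical expensive render; a real app would query a datastore here.
    return f"<html><body>content for {slug}</body></html>"


cache = TTLCache(ttl_seconds=60.0)
first = cache.get_or_compute("/math/algebra", lambda: render_page("/math/algebra"))
second = cache.get_or_compute("/math/algebra", lambda: render_page("/math/algebra"))
```

The second call is served from the cache, so the render function runs only once per key until the entry expires. The same pattern applies at the browser, proxy, or web server layer; only where the store lives changes.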
        
       | parhamn wrote:
       | Notably:
       | 
       | - No Rust/Go rewrite
       | 
       | - GC not disabled
       | 
       | - Didn't apply the latest research on k/v storage
       | 
       | Jokes aside, this is one of the fun parts of hosted software, and
       | I'm glad to hear the "things don't have to be so hard" side of
       | things. Hope it continues working out!
        
         | spyspy wrote:
         | If you're already using GCP, my general advice for new projects
         | is almost always some form of "just throw it on AppEngine". No,
         | you don't need multi-region deployments. No, you don't need
         | 32TB of memory per instance. No, you do not need kubernetes.
         | No, istio is not going to solve this. No, you're not hosting
         | your own kafka cluster.
         | 
         | I've found devs are always trying to over-engineer complex
         | solutions to dead simple problems. Just let Google do it and
         | get some sleep.
        
           | gamegod wrote:
           | "Just put all your eggs in one basket!"
           | 
           | K
        
             | batter wrote:
             | AppEngine doesn't scale fast, gradual traffic increase fits
             | better for AE. We have spikes x 1000 and back within one
             | minute, stable 10 boxes with average hardware (compute
             | engine) handle it more reliably than more than 100 AE
             | Golang instances which will be hammering everything
             | downstream. Also GAE costs will be insane compared to
             | compute engine.
        
             | loktarogar wrote:
             | it's easy to put them all in one basket if you only have
             | one or two eggs
        
             | infogulch wrote:
             | Just don't pretend that the two eggs you start with are
             | twenty just yet.
        
             | sg47 wrote:
             | Nope, just one or two eggs. When those eggs hatch and you
             | have a massive chicken farm, you can start putting your
             | eggs in multiple baskets.
        
             | freehunter wrote:
             | Starting simple isn't putting all your eggs in one basket.
             | In fact it's such a common recommendation (and so commonly
             | forgotten about) that there are several phrases designed
             | just to teach this one lesson:
             | 
             | KISS - Keep it simple, silly
             | 
             | YAGNI - You ain't gonna need it
             | 
             | MVP - Minimum viable product
             | 
             | Overengineering
             | 
             | Real artists ship
             | 
             | I'm sure there's a ton more examples but that's just off
             | the top of my head. Point is, until you know that you need
             | high availability and multi-zone disaster recovery and etc
             | etc, just engineer it for the problems you actually have.
        
           | realbarack wrote:
           | For new projects sure, but you need an escape hatch. App
           | Engine costs can spiral out of control. I know of at least
           | one startup that was pretty successful in finding product
           | market fit but sunk their own ship because they weren't able
           | to migrate off of App Engine quickly enough.
        
             | cglace wrote:
             | If you run on app engine flexible it shouldn't be hard to
             | migrate.
        
         | pier25 wrote:
         | Does anyone know which stack they are actually using?
        
           | guessmyname wrote:
           | Khan Academy uses Python [1], Google App Engine [2], React.js
           | [3] and recently Go [4] among other languages [5].
           | 
           | They have used Backbone.js [6] and experimented with other
           | programming languages like Kotlin [7].
           | 
           | Read their engineering blog [8] and you will know more and
           | maybe learn a few things.
           | 
           | [1] http://engineering.khanacademy.org/posts/python-
           | refactor-1.h...
           | 
           | [2] http://engineering.khanacademy.org/posts/transaction-
           | safety....
           | 
           | [3] http://engineering.khanacademy.org/posts/upgrade-buttons-
           | lin...
           | 
           | [4] http://engineering.khanacademy.org/posts/goliath.htm
           | 
           | [5] https://github.com/khan/
           | 
           | [6] http://engineering.khanacademy.org/posts/upgrade-buttons-
           | lin...
           | 
           | [7] http://engineering.khanacademy.org/posts/kotlin-
           | adoption.htm
           | 
           | [8] http://engineering.khanacademy.org/
        
         | snazz wrote:
         | The Go rewrite is in progress:
         | http://engineering.khanacademy.org/posts/goliath.htm
        
         | VWWHFSfQ wrote:
         | But notably missing from the blog post is how much this stack
         | costs to run at that scale. $100k/month? $200k/month?
        
         | minhazm wrote:
         | I know you're joking, but they actually are doing a Go rewrite.
         | https://engineering.khanacademy.org/posts/goliath.htm
        
       | freefriedrice wrote:
       | TL;DR: Load Balancers and a clear policy means the cloud works as
       | advertised.
       | 
       | Seven years ago I was at a medical conference in Portland, Oregon
       | with a panel of "experts" discussing the security and
       | accessibility of medical record systems and wearable devices.
       | There was a principal engineer from Intel on the panel. When
       | someone asked about the cloud, this tall, lanky, long-bearded man
       | with a thick accent stood up and said:
       | 
       | "The cloud? (chuckles) What is the cloud? Where is the cloud? Is
       | it over here? (Points to a table) Is it over there? (points to
       | another table). The cloud is a joke, man. It's a complete joke."
       | 
       | EDIT: added an anecdote for SEO. :)
        
       | cagenut wrote:
       | I love it when people have both the inclination and the political
       | pull to keep an environment super minimalist like this. Fastly to
       | AppEngine is a blazing fast combo and so well sorted to "just
       | work".
        
       | polote wrote:
       | If Khan Academy uses Youtube to serve their video and uses Fastly
       | to serve static content, what makes it hard to scale ?
       | 
       | I mean being able to scale that easily is a great thing, but is
       | there anything worth sharing with the world in their case ?
        
         | infogulch wrote:
         | Yes, for the same reason that researchers should publish null
         | results: _all_ of the data is useful. Getting confirmation that
         | a particular formulation of a strategy works or does not work
         | is valuable in and of itself, regardless of the exact outcome.
         | The only reason why it wouldn't be valuable would be if there
         | were a plethora of similar reports of successfully scaling this
         | solution, which I do not see, so their experience is very welcome.
         | 
         | To turn it around, why would you ever want someone to _not_
         | share their experience with the world if they took time to
         | write it down? It's not like you must read it; it doesn't cost
         | you anything to exist. But if someone's experience adds to the
         | library of human knowledge, even a little bit, why would one
         | try to reject that?
        
         | dangoor wrote:
         | I need to write a blog post about this :)
         | 
         | A lot of people seem to think of Khan Academy as a bunch of
         | videos. Many have also seen the exercises and articles. Those
         | things _are_ all pretty static (though it gets more complex
         | when you consider how _much_ content there is and how many
         | languages it's localized into).
         | 
         | There's a whole bunch of dynamic behavior around that static
         | content. Keeping track of progress to tell a learner how
         | they're doing, plus to help recommend the next place to go in
         | the content. Reporting on progress to parents and teachers.
         | Letting teachers create assignments and manage their
         | classrooms. Bubbling up information to school districts.
         | 
         | Content pages have discussions and clarifications. There are
         | notifications to tell students about new assignments, for
         | example.
         | 
         | There are connections to tests, like the SAT prep or
         | integration with the MAP test, which involve connecting our
         | accounts with external accounts in order to help students based
         | on those test results.
         | 
         | And a bunch of other stuff that isn't coming to mind right now
         | because I'm just naming things off the top of my head.
         | 
         | Doing all these things across a user base of millions of
         | monthly users can get quite involved.
        
         | SilasX wrote:
         | "How to scale: Make it somebody else's problem."
        
           | spyspy wrote:
           | A surprisingly legitimate solution.
        
             | SilasX wrote:
             | Yes, with the caveat that you may have to check that
             | they're actually capable of handling the load and you don't
             | get a surprise notice that "uh we can't do this, you're on
             | your own".
        
               | RcouF1uZ4gsC wrote:
               | I am pretty confident that YouTube will be able to scale
               | to handle any increased load coming from my service.
        
               | judge2020 wrote:
               | But they might not at any time, at least without
               | charging. With an apparent internal push for some
               | services to become self-sustaining (see Google Maps API,
               | Recaptcha), YouTube embeds might be next.
        
               | dangoor wrote:
               | Khan Academy today supports serving video outside of
               | YouTube, which is blocked in some schools. We could
               | essentially flip a switch to not use YouTube, but the
               | cost would be substantial because those videos go
               | Fastly->S3, so anything not in cache is going to result
               | in S3 egress charges.
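
The cost concern in the comment above can be sketched as simple arithmetic: only cache misses reach the origin, so origin egress scales with the miss ratio. The function name, the traffic volume, the hit ratio, and the per-GB price below are all placeholder assumptions, not figures from Khan Academy, Fastly, or AWS.

```python
def monthly_origin_egress_cost(gb_served: float,
                               cache_hit_ratio: float,
                               price_per_gb: float) -> float:
    """Estimate origin egress cost when a CDN sits in front of object storage.

    Only the fraction of traffic that misses the CDN cache is fetched from
    the origin, so cost = gb_served * (1 - cache_hit_ratio) * price_per_gb.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    if gb_served < 0 or price_per_gb < 0:
        raise ValueError("gb_served and price_per_gb must be non-negative")
    return gb_served * (1.0 - cache_hit_ratio) * price_per_gb


# Placeholder inputs: 1000 GB served, 90% cache hit ratio, 0.10 per GB.
baseline = monthly_origin_egress_cost(1000.0, 0.90, 0.10)

# The same inputs after a 2.5x traffic increase, hit ratio held constant.
scaled = monthly_origin_egress_cost(2500.0, 0.90, 0.10)
```

At a fixed hit ratio the estimate grows linearly with traffic, which is why the cache hit ratio matters as much as the raw price per GB.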
        
               | toomuchtodo wrote:
               | It seems unlikely Khan Academy doesn't have the technical
               | competency to deploy their video content to an alternate
               | host rapidly, whether that be an object store (Backblaze)
               | or dedicated servers with very cheap bandwidth (Hetzner,
               | OVH, and similar), perhaps even using PeerTube.
               | 
               | There's a reason other non profits like Wikipedia and the
               | Internet Archive run their own hardware, networking, and
               | connectivity to transit providers. And before the "doing
               | that is expensive!" argument comes up, note how expensive
               | having someone else do these things is. Lots of margin
               | built into cloud services.
        
               | icelancer wrote:
               | They can. Doesn't mean they will.
        
           | HenryBemis wrote:
           | I'd rename it to "pay someone else to do it". In the case of
           | YouTube, even if the hosting is free, YT still makes a profit
           | (ads, tracking, etc.) so it's a win-win for all.
        
             | SilasX wrote:
             | Or, to phrase it differently: their technical problem, your
             | financial problem.
        
         | DevKoala wrote:
         | Yes. In this case, they share the fact they used common sense.
         | This is not saying that all of the other major re-architecture
         | blog posts are flawed. However, it is good to know that scaling
         | when using cloud vendor specific tools is as easy as
         | advertised.
        
         | tomnipotent wrote:
         | Like the rest of the system that isn't just videos and static
         | content?
        
           | cameronfraser wrote:
           | It's a cookie cutter app engine stack which scales
           | automatically
        
         | cbhl wrote:
         | There's quite a bit of dynamic content, no? Things like
         | exercise grading, and progression through the skill tree.
        
           | jonny_eh wrote:
           | Which is powered by Google's AppEngine, which scales very
           | easily, at least technically easily.
        
             | tyree731 wrote:
             | s/easily/expensively/
             | 
             | Khan Academy doesn't have infinite money.
        
               | ses1984 wrote:
               | what makes GAE expensive?
        
               | freehunter wrote:
               | Anything is expensive if you have to scale up beyond your
               | financial ability to do so.
        
         | kzrdude wrote:
         | I'd like to hear engineering stories from Youtube in that case.
         | What does their operation look like? Do they ever tell?
        
           | londons_explore wrote:
           | Make an educated guess about the design of YouTube's serving
           | infrastructure, and I'm pretty certain you're right.
           | 
           | There's only really one sensible way to do it, and that's the
           | way it's done.
        
       | xhkkffbf wrote:
       | This is a good example of how cloud tools make this kind of
       | scaling easy.
       | 
       | The trickier part can be the cost-- which this piece notes will
       | increase roughly linearly with the number of users. If Khan
       | Academy is free, I think this means those who are generous are
       | going to need to keep giving to keep it that way. Let's step up,
       | everyone.
        
       | cheungyinglon wrote:
       | is fastly the best service for caching?
        
         | booi wrote:
         | Depends on who you ask but.. no they are not. But I think
         | they're pretty good from a cost/performance perspective.
        
         | stephenr wrote:
         | I haven't used them but I think one of the bigger benefits (for
         | some) is that they're running (a forked version of) varnish as
         | their caching proxy, and you can provide your own VCL for doing
         | "fancy" stuff.
        
         | batter wrote:
         | from what i've heard Fastly is cheaper
        
         | pier25 wrote:
         | I'm very happy with Cloudflare's workers so far.
         | 
         | You can store stuff in workers KV (sessions, images, complete
         | static sites, etc) even interact with their global cache with
         | an API.
        
       | ashtonkem wrote:
       | 2.5x is a surprisingly small jump for having all of your brick
       | and mortar competitors shut down for an indeterminate period of
       | time. Either Khan Academy had amazing penetration into the
       | education space, or the follow-through rate for kids educating
       | at home is abysmally low.
       | 
       | Disclaimer: I don't have children, so I have no real world
       | experience with Khan.
        
         | arkades wrote:
         | All the kids I know of (Jr High / High School) already actively
         | participate in Khan Academy - on some topics at school
         | requirement, on other topics at their own discretion because
         | they're already accustomed to the platform.
        
           | ShamelessC wrote:
           | I can see myself using it to learn things that my (admittedly
           | lousy) teachers weren't able to teach. But, is it really true
           | that teachers are straight up assigning Khan Academy material
           | as part of the course requirements? That's interesting to me
           | and I had no idea that was going on.
        
         | dangoor wrote:
         | There's a lot going on in your observation, and this is all
         | speculation on my part (even though I am a Khan Academy
         | employee).
         | 
         | We did have quite a bit of usage and awareness among schools
         | already before the shutdowns started. Couple that with there
         | being many options for teaching online... I wouldn't be
         | surprised if a lot of schools just switched to having their
         | teachers attempt to do their normal teaching via Zoom (which
         | sounds really hard to me!). Many schools had contracts of
         | various sorts with other online learning platforms.
         | 
         | Some schools or classes haven't had great follow through rates,
         | which is unfortunate, but educators all over have had to
         | quickly adjust. I suspect that more robust plans will be in
         | place by the fall, given how much uncertainty there is for fall
         | classes. Khan Academy is, at least, an always-free resource
         | that's there for people if they need it.
         | 
         | That 2.5x is starting from a large base, and there's also a lot
         | of activity in online education generally.
        
       | hinkley wrote:
       | Khan Academy is actively soliciting donations right now, as is
       | referenced in the footnote to the article:
       | 
       | > Khan Academy's increased usage has also increased our hosting
       | costs, and we're a not-for-profit that relies on philanthropic
       | donations from folks like you.
        
       | jordache wrote:
       | If they're just referencing YouTube videos, what is there to
       | scale up? Speedier downloads of static content and repeat visits
       | from what is likely a relatively stable user base?
        
         | dangoor wrote:
         | This question is not uncommon, so I really should write a blog
         | post I can refer to. I've got another comment in this thread
         | about this: https://news.ycombinator.com/item?id=23171877
        
       | jliptzin wrote:
       | 2.5x in a week is news? I have worked on many things from viral
       | apps, blogs that get picked up by large news orgs, etc that need
       | to scale sometimes 100x or more in a day.
        
         | bdibs wrote:
         | 1 to 100 is easier than 100M to 250M (made up numbers, but you
         | get the point).
        
           | polote wrote:
           | it depends on what you have to scale, if this is just php
           | rendering, this is still relatively easy to do.
        
         | iliaznk wrote:
         | You mean like scaling from 1 user to 100 users?
        
           | jliptzin wrote:
           | No, something like 1,000 to 100,000
        
         | dangoor wrote:
         | As an extreme, can you imagine if Facebook doubled its usage in
         | a week? It depends on what the site does and what floor it's
         | starting from.
         | 
         | We (Khan Academy) are certainly not at Facebook's scale, but we
         | do run a site with a lot of dynamic behavior and millions of
         | monthly users. If we had our own bare metal in data centers, we
         | either would have been way overprovisioned or would have been
         | scrambling to keep up.
        
       | sdan wrote:
       | Love these sort of explanations on how companies and people run
       | their infra. Great job KA!
        
         | jordache wrote:
         | it's a rather high level, low in insightful detail article.
        
       ___________________________________________________________________
       (page generated 2020-05-13 23:00 UTC)