[HN Gopher] How we successfully handled 2.5x traffic in a week ___________________________________________________________________ How we successfully handled 2.5x traffic in a week Author : talonx Score : 87 points Date : 2020-05-13 17:01 UTC (5 hours ago) (HTM) web link (engineering.khanacademy.org) (TXT) w3m dump (engineering.khanacademy.org) | [deleted] | programminggeek wrote: | On a largely content based app/site, most of "scaling" comes down | to caching. However you do that is up to you, but somewhere | between caching at the browser layer, proxy layer, web server | layer, or memcache layer, things should be fast and scalable | without getting too fancy. | parhamn wrote: | Notably: | | - No Rust/Go rewrite | | - GC not disabled | | - Didn't apply the latest research on k/v storage | | Jokes aside, this is the fun parts of hosted software and glad to | hear the "things don't have to be so hard" side of things. Hope | it continues working out! | spyspy wrote: | If you're already using GCP, my general advice for new projects | is almost always some form of "just throw it on AppEngine". No, | you don't need multi-region deployments. No, you don't need | 32TB of memory per instance. No, you do not need kubernetes. | No, istio is not going to solve this. No, you're not hosting | your own kafka cluster. | | I've found devs are always trying to over-engineer complex | solutions to dead simple problems. Just let Google do it and | get some sleep. | gamegod wrote: | "Just put all your eggs in one basket!" | | K | batter wrote: | AppEngine doesn't scale fast, gradual traffic increase fits | better for AE. We have spikes x 1000 and back within one | minute, stable 10 boxes with average hardware (compute | engine) handle it more reliably than more than 100 AE | Golang instances which will be hammering everything | downstream. Also GAE costs will be insane compared to | compute engine. 
| loktarogar wrote: | it's easy to put them all in one basket if you only have | one or two eggs | infogulch wrote: | Just don't pretend that the two eggs you start with are | twenty just yet. | sg47 wrote: | Nope, just one or two eggs. When those eggs hatch and you | have a massive chicken farm, you can start putting your | eggs in multiple baskets. | freehunter wrote: | Starting simple isn't putting all your eggs in one basket. | In fact it's such a common recommendation (and so commonly | forgotten about) that there are several phrases designed | just to teach this one lesson: | | KISS - Keep it simple, silly | | YAGNI - You ain't gonna need it | | MVP - Minimum viable product | | Overengineering | | Real artists ship | | I'm sure there's a ton more examples but that's just off | the top of my head. Point is, until you know that you need | high availability and multi-zone disaster recovery and etc | etc, just engineer it for the problems you actually have. | realbarack wrote: | For new projects sure, but you need an escape hatch. App | Engine costs can spiral out of control. I know of at least | one startup that was pretty successful in finding product | market fit but sunk their own ship because they weren't able | to migrate off of App Engine quickly enough. | cglace wrote: | If you run on app engine flexible it shouldn't be hard to | migrate. | pier25 wrote: | Anyone knows which stack they are actually using? | guessmyname wrote: | Khan Academy uses Python [1], Google App Engine [2], React.js | [3] and recently Go [4] among other languages [5]. | | They have used Backbone.js [6] and experimented with other | programming languages like Kotlin [7]. | | Read their engineering blog [8] and you will know more and | maybe learn a few things. | | [1] http://engineering.khanacademy.org/posts/python- | refactor-1.h... | | [2] http://engineering.khanacademy.org/posts/transaction- | safety.... | | [3] http://engineering.khanacademy.org/posts/upgrade-buttons- | lin... 
| | [4] http://engineering.khanacademy.org/posts/goliath.htm | | [5] https://github.com/khan/ | | [6] http://engineering.khanacademy.org/posts/upgrade-buttons- | lin... | | [7] http://engineering.khanacademy.org/posts/kotlin- | adoption.htm | | [8] http://engineering.khanacademy.org/ | snazz wrote: | The Go rewrite is in progress: | http://engineering.khanacademy.org/posts/goliath.htm | VWWHFSfQ wrote: | But notably missing from the blog post is how much this stack | costs to run at that scale. $100k/month? $200k/month? | minhazm wrote: | I know you're joking, but they actually are doing a Go rewrite. | https://engineering.khanacademy.org/posts/goliath.htm | freefriedrice wrote: | TL;DR: Load Balancers and a clear policy means the cloud works as | advertised. | | Seven years ago I was at a medical conference in Portland, Oregon | with a panel of "experts" discussing the security and | accessibility of medical record systems and wearable devices. | There was a principal engineer from Intel on the panel. When | someone asked about the cloud, this tall, lanky, long-bearded man | with a thick accent stood up and said: | | "The cloud? (chuckles) What is the cloud? Where is the cloud? Is | it over here? (Points to a table) Is it over there? (points to | another table). The cloud is a joke, man. It's a complete joke." | | EDIT: added an anecdote for SEO. :) | cagenut wrote: | I love it when people have both the inclination and the political | pull to keep an environment super minimalist like this. Fastly to | AppEngine is a blazing fast combo and so well sorted to "just | work". | polote wrote: | If Khan Academy uses Youtube to serve their video and uses Fastly | to serve static content, what makes it hard to scale ? | | I mean being able to scale that easily is a great thing, but is | there anything worth sharing with the world in their case ? | infogulch wrote: | Yes, for the same reason that researchers should publish null | results: _all_ of the data is useful. 
Getting confirmation that | a particular formulation of a strategy works or does not work | is valuable in and of itself, regardless of the exact outcome. | The only reason why it wouldn't be valuable would be if there | were a plethora of similar reports of successfully scaling this | solution, which I do not, so their experience is very welcome. | | To turn it around, why would you ever want someone to _not_ | share their experience with the world if they took time to | write it down? It's not like you must read it; it doesn't cost | you anything to exist. But if someone's experience adds to the | library of human knowledge, even a little bit, why would one | try to reject that? | dangoor wrote: | I need to write a blog post about this :) | | A lot of people seem to think of Khan Academy as a bunch of | videos. Many have also seen the exercises and articles. Those | things _are_ all pretty static (though it gets more complex | when you consider how _much_ content there is and how many | languages it's localized into). | | There's a whole bunch of dynamic behavior around that static | content. Keeping track of progress to tell a learner how | they're doing, plus to help recommend the next place to go in | the content. Reporting on progress to parents and teachers. | Letting teachers create assignments and manage their | classrooms. Bubbling up information to school districts. | | Content pages have discussions and clarifications. There are | notifications to tell students about new assignments, for | example. | | There are connections to tests, like the SAT prep or | integration with the MAP test, which involve connecting our | accounts with external accounts in order to help students based | on those test results. | | And a bunch of other stuff that isn't coming to mind right now | because I'm just naming things off the top of my head. | | Doing all these things across a user base of millions of | monthly users can get quite involved.
| SilasX wrote: | "How to scale: Make it somebody else's problem." | spyspy wrote: | A surprisingly legitimate solution. | SilasX wrote: | Yes, with the caveat that you may have to check that | they're actually capable of handling the load and you don't | get a surprise notice that "uh we can't do this, you're on | your own". | RcouF1uZ4gsC wrote: | I am pretty confident that YouTube will be able to scale | to handle any increased load coming from my service. | judge2020 wrote: | But they might not at any time, at least without | charging. With an apparent internal push for some | services to become self-sustaining (see Google Maps API, | Recaptcha), YouTube embeds might be next. | dangoor wrote: | Khan Academy today supports serving video outside of | YouTube, which is blocked in some schools. We could | essentially flip a switch to not use YouTube, but the | cost would be substantial because those videos go | Fastly->S3, so anything not in cache is going to result | in S3 egress charges. | toomuchtodo wrote: | It seems unlikely Khan Academy doesn't have the technical | competency to deploy their video content to an alternate | host rapidly, whether that be an object store (Backblaze) | or dedicated servers with very cheap bandwidth (Hetzner, | OVH, and similar), perhaps even using PeerTube. | | There's a reason other non profits like Wikipedia and the | Internet Archive run their own hardware, networking, and | connectivity to transit providers. And before the "doing | that is expensive!" argument comes up, note how expensive | having someone else do these things are. Lots of margin | built into cloud services. | icelancer wrote: | They can. Doesn't mean they will. | HenryBemis wrote: | I'd rename it to "pay someone else to do it". In the case of | YouTube, even if the hosting is free, YT still makes a profit | (ads, tracking, etc.) so it's a win-win for all. | SilasX wrote: | Or, to phrase it differently: their technical problem, your | financial problem. 
| DevKoala wrote: | Yes. In this case, they share the fact they used common sense. | This is not saying that all of the other major re-architecture | blog posts are flawed. However, it is good to know that scaling | when using cloud vendor specific tools is as easy as | advertised. | tomnipotent wrote: | Like the rest of the system that isn't just videos and static | content? | cameronfraser wrote: | It's a cookie cutter app engine stack which scales | automatically | cbhl wrote: | There's quite a bit of dynamic content, no? Things like | exercise grading, and progression through the skill tree. | jonny_eh wrote: | Which is powered by Google's AppEngine, which scales very | easily, at least technically easily. | tyree731 wrote: | s/easily/expensively/ | | Khan Academy doesn't have infinite money. | ses1984 wrote: | What makes GAE expensive? | freehunter wrote: | Anything is expensive if you have to scale up beyond your | financial ability to do so. | kzrdude wrote: | I'd like to hear engineering stories from YouTube in that case. | What does their operation look like? Do they ever tell? | londons_explore wrote: | Make an educated guess about the design of YouTube's serving | infrastructure, and I'm pretty certain you're right. | | There's only really one sensible way to do it, and that's the | way it's done. | xhkkffbf wrote: | This is a good example of how cloud tools make this kind of | scaling easy. | | The trickier part can be the cost-- which this piece notes will | increase roughly linearly with the number of users. If Khan | Academy is free, I think this means those who are generous are | going to need to keep giving to keep it that way. Let's step up, | everyone. | cheungyinglon wrote: | Is Fastly the best service for caching? | booi wrote: | Depends on who you ask but.. no they are not. But I think | they're pretty good from a cost/performance perspective.
| stephenr wrote: | I haven't used them but I think one of the bigger benefits (for | some) is that they're running (a forked version of) varnish as | their caching proxy, and you can provide your own VCL for doing | "fancy" stuff. | batter wrote: | from what i've heard Fastly is cheaper | pier25 wrote: | I'm very happy with Cloudflare's workers so far. | | You can store stuff in workers KV (sessions, images, complete | static sites, etc) even interact with their global cache with | an API. | [deleted] | ashtonkem wrote: | 2.5x is a surprisingly small jump for having all of your brick | and mortar competitors shut down for an indeterminate period of | time. Either Khan academy had amazing penetration into the | education space, or the follow through rates for kids educating | at home is abysmally low. | | Disclaimer: I don't have children, so I have no real world | experience with Khan. | arkades wrote: | All the kids I know of (Jr High / High School) already actively | participate in Khan Academy - on some topics at school | requirement, on other topics at their own discretion because | they're already accustomed to the platform. | ShamelessC wrote: | I can see myself using it to learn things that my (admittedly | lousy) teachers weren't able to teach. But, is it really true | that teachers are straight up assigning Khan Academy material | as part of the course requirements? That's interesting to me | and I had no idea that was going on. | dangoor wrote: | There's a lot going on in your observation, and this is all | speculation on my part (even though I am a Khan Academy | employee). | | We did have quite a bit of usage and awareness among schools | already before the shutdowns started. Couple that with there | being many options for teaching online... I wouldn't be | surprised if a lot of schools just switched to having their | teachers attempt to do their normal teaching via Zoom (which | sounds really hard to me!). 
Many schools had contracts of | various sorts with other online learning platforms. | | Some schools or classes haven't had great follow through rates, | which is unfortunate, but educators all over have had to | quickly adjust. I suspect that more robust plans will be in | place by the fall, given how much uncertainty there is for fall | classes. Khan Academy is, at least, an always-free resource | that's there for people if they need it. | | That 2.5x is starting from a large base, and there's also a lot | of activity in online education generally. | hinkley wrote: | Khan Academy is actively soliciting donations right now, as is | referenced in the footnote to the article: | | > Khan Academy's increased usage has also increased our hosting | costs, and we're a not-for-profit that relies on philanthropic | donations from folks like you. | jordache wrote: | if they're just referencing youtube videos, what is there to | scale up? Speedier downloads of static content and repeat visits | from likely the relatively stable set of user base? | dangoor wrote: | This question is not uncommon, so I really should write a blog | post I can refer to. I've got another comment in this thread | about this: https://news.ycombinator.com/item?id=23171877 | jliptzin wrote: | 2.5x in a week is news? I have worked on many things from viral | apps, blogs that get picked up by large news orgs, etc that need | to scale sometimes 100x or more in a day. | bdibs wrote: | 1 to 100 is easier than 100M to 250M (made up numbers, but you | get the point). | polote wrote: | it depends on what you have to scale, if this is just php | rendering, this is still relatively easy to do. | iliaznk wrote: | You mean like scaling from 1 user to 100 users? | jliptzin wrote: | No, something like 1,000 to 100,000 | dangoor wrote: | As an extreme, can you imagine if Facebook doubled its usage in | a week? It depends on what the site does and what floor it's | starting from. 
| | We (Khan Academy) are certainly not at Facebook's scale, but we | do run a site with a lot of dynamic behavior and millions of | monthly users. If we had our own bare metal in data centers, we | either would have been way overprovisioned or would have been | scrambling to keep up. | sdan wrote: | Love these sort of explanations on how companies and people run | their infra. Great job KA! | jordache wrote: | it's a rather high level, low in insightful detail article. ___________________________________________________________________ (page generated 2020-05-13 23:00 UTC)