[HN Gopher] Writing a Mini-CDN to Learn Nginx/Prometheus/Grafana...
       ___________________________________________________________________
        
       Writing a Mini-CDN to Learn Nginx/Prometheus/Grafana/Lua
        
       Author : dreampeppers99
       Score  : 244 points
       Date   : 2022-12-26 12:17 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | vitorbaptistaa wrote:
       | Beautifully written! Thanks for sharing, Leandro.
        
         | dreampeppers99 wrote:
         | <3
        
       | daniels1006 wrote:
       | Great content, helpful and inspiring.
       | 
       | Thanks!
        
       | xmorse wrote:
       | The hard part of building a CDN is scaling it. The best approach
       | imo is to use fly.io to host an anycast IP (with horizontal
       | scaling) and store cache files on disk
       | 
       | Fly.io also has a Grafana dashboard built in for your machines
        
         | berndinox wrote:
         | Agree, Fly.io is great for such usecases. Is there any
         | CDN/Proxy solution or guide available for fly?
        
           | iampims wrote:
           | https://fly.io/blog/the-5-hour-content-delivery-network/
        
           | [deleted]
        
       | nnadams wrote:
       | Is it possible for CDNs to cache per URL per user? I'm thinking
       | of something like /favorites where one URL would list something
       | different for everyone. When I've set up caching on the backend, it was
       | keyed off the user.
       | 
       | This was a very informative read!
        
         | Matthias247 wrote:
         | You can configure whether the cache key includes a particular
         | header or query parameter in a lot of CDNs. So as long as your
         | user identity is transmitted in one of those, it would work.
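
A minimal sketch of the idea above: a cache key built from the path plus
whichever headers and query parameters have been configured as key fields.
The function name, its parameters, and the `X-User-Id` header are
hypothetical and chosen only for illustration; real CDNs expose this as
configuration rather than code.

```python
from urllib.parse import parse_qs, urlsplit


def cache_key(url, headers, key_headers=(), key_params=()):
    """Build a cache key from the path plus selected headers and query params."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    fields = [parts.path]
    for name in key_headers:
        fields.append(f"h:{name.lower()}={lowered.get(name.lower(), '')}")
    for name in key_params:
        fields.append(f"q:{name}={','.join(query.get(name, []))}")
    return "|".join(fields)


# Hypothetical identity header: two users get two distinct cache entries.
alice = cache_key("https://example.com/favorites", {"X-User-Id": "alice"},
                  key_headers=("X-User-Id",))
bob = cache_key("https://example.com/favorites", {"X-User-Id": "bob"},
                key_headers=("X-User-Id",))
```

Anything left out of the key is shared between requests, which is what you
want for tracking parameters and exactly what you do not want for a user
identity.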
        
         | jay6282 wrote:
         | User-aware CDN would require scripting of some kind to handle
         | sessions. However, if the data is not sensitive you could use
         | random string uris to publicly available files. That way it is
         | difficult to guess/brute force the url to the files.
         | (sensitive=person identifiable data)
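
The random-URI approach could look roughly like this, assuming Python's
standard `secrets` module as the token source. The `/files/` prefix and the
function name are made up for the example.

```python
import secrets


def unguessable_path(filename, nbytes=32):
    """Return a public path containing a random, URL-safe token."""
    # nbytes of randomness from the OS CSPRNG, encoded as URL-safe base64.
    token = secrets.token_urlsafe(nbytes)
    return f"/files/{token}/{filename}"


path = unguessable_path("favorites.json")
```

This only makes the URL hard to guess. Anyone who obtains the URL can still
fetch the file, so it is not a substitute for access control.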
        
           | mnutt wrote:
           | Many CDNs support caching based on a particular cookie value,
           | incorporating it into the cache key. I'd just be extra
           | careful, the worst case for many server settings is an
           | inoperable service but choosing the wrong cache key can
           | easily result in a data leak. (serving one user's response to
           | another user)
        
         | rmetzler wrote:
         | I don't know why you want to hurt yourself.
         | 
         | If these are public, put them on /favorites/$USERNAME or
         | something similar. If they are private, don't cache them.
         | 
         | You can cache with specific headers as cache keys, but I would
         | advise against doing this too much / abusing it. It really
         | makes caching complicated. And from a data privacy standpoint
         | it's better to opt-in into caching. I've witnessed incidents
         | where visitors saw the private profile page of another user,
         | because it was cached in the CDN.
        
         | nesarkvechnep wrote:
         | You can use the `Vary` header.
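
A rough sketch of how a cache can honor `Vary`: the response names the
request headers that influenced it, and the cache adds those header values
to its lookup key. The `VaryCache` class is a toy written for this example,
not any CDN's implementation, and CDNs differ in how much of `Vary` they
honor, so check your provider's documentation.

```python
def vary_key(url, request_headers, vary):
    """Combine the URL with the request headers named in the Vary value."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


class VaryCache:
    def __init__(self):
        self._vary = {}     # url -> Vary value of the last stored response
        self._entries = {}  # vary_key -> cached body

    def store(self, url, request_headers, response_headers, body):
        # Assumes the response header dict uses the exact name "Vary".
        vary = response_headers.get("Vary", "")
        if vary.strip() == "*":
            return  # "Vary: *" never matches a later request; skip storing.
        self._vary[url] = vary
        self._entries[vary_key(url, request_headers, vary)] = body

    def lookup(self, url, request_headers):
        if url not in self._vary:
            return None
        key = vary_key(url, request_headers, self._vary[url])
        return self._entries.get(key)


cache = VaryCache()
cache.store("/favorites", {"Cookie": "user=alice"},
            {"Vary": "Cookie"}, "alice's list")
```

Varying on a whole header such as `Cookie` creates one entry per distinct
header value, which can fragment the cache badly.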
        
       | friendlyHornet wrote:
       | Thanks for this
        
         | dreampeppers99 wrote:
         | my pleasure
        
       | chrsig wrote:
       | I'm curious if any HNers have opinions on prometheus vs other
       | time series databases like influxdb?
       | 
       | I periodically consider a grafana & backend setup for when
       | datadog becomes cost prohibitive for metrics with several tags.
        
         | beardedetim wrote:
         | At $dayjob we're considering replacing DataDog with Grafana and
         | friends, already using it elsewhere to great effect.
         | 
         | Haven't used influxdb yet so can't speak as a comparison but
         | from my usage, I'm sold on Grafana, Loki, Prometheus, and
         | friends over DataDog. Mixed with OTel, they have been a real
         | pleasure to use.
        
         | firstSpeaker wrote:
         | Go with Mimir. It is Prometheus compatible and horizontally
         | scalable for read/write path separately.
         | 
         | Mimir: https://github.com/grafana/mimir
        
           | flyingsky wrote:
           | You did not answer OP's question, though: Prometheus vs InfluxDB.
        
           | [deleted]
        
         | xiwenc wrote:
         | We have been using prometheus at a client for little over a
         | year now. Since we need to keep metrics for years, Prometheus
         | does not seem to deal with it well. One behavior we
         | observed is it crashes consistently in k8s. We couldn't pin
         | down the root cause but suspect it's the amount of metrics we
         | collect continuously and keep (archive).
         | 
         | Now we are considering switching to Thanos or Mimir.
        
       | sandGorgon wrote:
       | this is very very cool! One thing i would definitely like to see
       | is domain name resolution. Shopify, Dukaan, Vercel all make a big
       | deal out of it ...going all the way to BGP.
       | 
       | https://twitter.com/subhashchy/status/1536769406801309696
        
       | asjkaehauisa wrote:
       | Why didn't you use varnish for that?
        
         | tecleandor wrote:
         | I guess it's "...to Learn Nginx/Prometheus/Grafana/Lua".
         | 
         | Per the first line of the link: "The objective of this repo is
         | to build a body of knowledge on how CDNs work by coding one
         | from "scratch". "
        
       | jay6282 wrote:
       | The hard part of building a CDN is to know when you need it.
       | 99.9% of all websites with CDN do not need it. Serving static
       | files consumes so little resources that a single server can serve
       | billions of users as long as you don't use a script for serving the
       | file. The most cost-effective with also the lowest latency
       | solution is to never use CDN. If your webserver provider charges
       | you a lot for traffic you are better off using another provider.
        
         | youngtaff wrote:
         | > The most cost-effective with also the lowest latency solution
         | is to never use CDN
         | 
         | The lowest latency solution is to put the content near the user
         | and a CDN is probably the easiest way of doing that if someone
         | needs to serve a geographically dispersed audience
        
         | latchkey wrote:
         | > _The most cost-effective with also the lowest latency
         | solution is to never use CDN._
         | 
         | CloudFlare is free at my tier and gives me the ability to have
         | the lowest latency.
        
       | mnutt wrote:
       | This is nicely written, and a lot of it mirrors my experience
       | using nginx as a pseudo-cdn. Another area worth exploring might
       | be http3, ssl session caching, and general latency/ttfb
       | optimizations.
        
       | jeacken wrote:
       | Another example of a project duped into thinking Lua is
       | "powerful". It is small. That is it. Lua has near zero useful
       | functionality and makes the developer repeatedly reinvent
       | functionality over and over and over again.
       | 
       | https://media1.giphy.com/media/TFO2mwVPIFoOJcuTSC/giphy.gif
        
         | klelatti wrote:
         | Would you like to expand on why you think Lua is a bad choice
         | for this particular project and what you would have used
         | instead? That would be much more helpful than a generic attack
         | on the language itself.
        
         | berkut wrote:
         | It's small, fast, and doesn't have a GIL, so concurrent
         | executions are trivial.
        
       | hardwaresofton wrote:
       | It would be nice to discuss the common approaches to global name
       | resolution --- anycast vs geo-routing.
        
         | wrigby wrote:
         | IIRC the industry standard is to serve your authoritative DNS
         | with anycast, and have those servers do geo-based dns
         | resolution to shift HTTP traffic to a nearby edge POP.
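
The geo-based half of that setup can be sketched as picking the edge POP
closest to the client. The POP names, coordinates, and addresses below are
invented (the IPs come from the 192.0.2.0/24 documentation range), and the
client location is passed in directly. A real authoritative server would
have to estimate it, for example from the resolver's address or an EDNS
Client Subnet option.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical edge POPs: name -> (latitude, longitude, IP address).
POPS = {
    "fra": (50.11, 8.68, "192.0.2.10"),
    "iad": (38.95, -77.45, "192.0.2.20"),
    "gru": (-23.43, -46.47, "192.0.2.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))


def resolve(client_lat, client_lon):
    """Return (pop_name, ip) for the POP nearest to the client."""
    name = min(
        POPS,
        key=lambda n: haversine_km(client_lat, client_lon, *POPS[n][:2]),
    )
    return name, POPS[name][2]


# A client near Paris is sent to the Frankfurt POP.
nearest = resolve(48.85, 2.35)
```

Geographic distance is only a proxy for network latency, so a production
setup would likely also weigh POP health and load.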
        
       | zspace2 wrote:
       | Very good project. thanks for sharing
        
       ___________________________________________________________________
       (page generated 2022-12-26 23:00 UTC)