[HN Gopher] Redis explained
       Redis explained
       Author : mgrouchy
       Score  : 319 points
       Date   : 2022-08-11 15:06 UTC (7 hours ago)
 (HTM) web link (architecturenotes.co)
 (TXT) w3m dump (architecturenotes.co)
       | jrm4 wrote:
       | As someone who doesn't code for a living but teaches it to mostly
       | novices, this helps (because before this I had no clue what it
       | was except that it had something to do with databases.) Typically
       | for my courses we just use some flavor of SQL and call it a day
       | (and that kind of spoils us because of how declarative it tends
       | to be) -- roughly, what's the "explain like I'm 10" use case for
       | Redis over something else? From what I'm seeing, it's mostly an
       | "efficiency" thing?
         | tmpz22 wrote:
         | The traditional line of thinking is:
         | * you're building a web-ish application and need to store
         | session data
         | * you don't want to go through the overhead of building a
         | strongly typed relational table
         | * you know minimal operations stuff
         | * just use redis, it's easy to deploy, easy to code for, and
         | available on all major cloud platforms as a managed service
         | ---
         | The problem is there are tradeoffs and session storage becomes
         | a fundamental architectural decision once your application
         | matures. So something you added as a once-off so you can get
         | back to feature development is now a foundational pillar.
         | avmich wrote:
         | Relational databases are optimized for typical operations over
         | data structured in tables. So, joins and records. However,
         | sometimes you want something simpler, like a LIFO queue, and
         | wouldn't mind having it be faster. Redis allows this: the
         | variety of data structures it has is much bigger than with
         | relational databases. They (Redis and RDBs) both have their
         | uses, of course. Ideally you would structure your system to use
         | one of them where appropriate according to data requests.
           | cerved wrote:
           | What data structures?
             | detaro wrote:
             | https://redis.io/docs/data-types/
         | TheBlight wrote:
         | If you're not super concerned about reliability but really need
         | speed. That's when Redis really makes the most sense, IMO.
         | notyourday wrote:
         | Blazingly fast serialized access to the shared data structures
         | over the network by multiple writers and readers.
         | hobs wrote:
         | I have worked at places where every page load hits the
         | database, and we've scaled ok, mainly because it was b2b stuff.
         | However a simple redis instance in front of the database
         | serving as a readable cache changes the rules of the game
         | significantly - depending on the complexity of your calculation
         | and your end result, subsequent "page loads" or whatever you are
         | doing can be tens of thousands (or more) times as efficient,
         | and if you decide to use an expensive database or a cloud
         | database this can help you a lot.
         | The hard part, eventually, is that you might have bugs in
         | synchronizing the state of redis and your database. Look to
         | existing implementations for your stack instead of reinventing
         | the wheel.
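
The read-through cache described above, including the invalidation step where synchronization bugs tend to appear, might look roughly like the following. This is a minimal sketch, not anyone's production code: `FakeRedis` is an in-process stand-in for a real Redis client, and `SlowDatabase`, `get_total`, `update_total`, and the `total:` key naming are all hypothetical names made up for illustration.

```python
import time


class FakeRedis:
    """Tiny in-process stand-in for a Redis string cache with expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class SlowDatabase:
    """Hypothetical primary database that counts how often it is queried."""

    def __init__(self):
        self.queries = 0
        self.rows = {"acct-1": 1200}

    def expensive_total(self, account_id):
        self.queries += 1
        return self.rows[account_id]


def get_total(cache, db, account_id, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"total:{account_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db.expensive_total(account_id)
    cache.set(key, value, ttl_seconds)
    return value


def update_total(cache, db, account_id, new_total):
    """Write to the database, then invalidate the cached copy."""
    db.rows[account_id] = new_total
    cache.delete(f"total:{account_id}")
```

Forgetting the `cache.delete` call in `update_total` is exactly the kind of stale-state bug the comment warns about; the expiry only bounds how long such a bug stays visible.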
       | lawrencevillain wrote:
       | Really love this style of writing. Pairing the
       | diagrams/illustrations with the easy to grok copy is really
       | helpful for folks like myself who have been mainly focused on the
       | front-end.
       | What tool do you use for your diagramming, is it all hand-drawn?
         | dsmmcken wrote:
         | Font for the handwriting is Skippy Sharp, in case anyone else
         | was wondering.
         | googletron wrote:
         | It's hand-drawn, with some fonts for the titles.
           | stjohnswarts wrote:
           | This is what I do for my presentations (on a wacom of
           | course). I have gotten grief over it, but I work faster and
           | get less distracted by the eccentricities of powerpoint and
           | figma. My handwriting is abysmal so I will do that in a
           | "handwriting font" to sort of look hand drawn. Even if I have
           | to convert them later for some big wig, at least I have my
           | "rough draft". Plus it all feels a little more human.
       | dsmmcken wrote:
       | Beautifully presented, worth reading just for the illustrations.
       | topspin wrote:
       | I am thinking of using Redis as a lightweight queuing mechanism.
       | An event source will MULTI a small amount of metadata as a hash
       | and append a list. Event sinks will BLPOP the list and retrieve
       | and delete the metadata key. One requirement is the events
       | survive power loss.
       | Is there anything inherently wrong with this? Gotchas? A mockup
       | I've done works great so far.
         | CSDude wrote:
         | There used to be disque by antirez, which died.
         | https://github.com/antirez/disque
           | topspin wrote:
           | See my other reply where the disque author talks about
           | exactly that.
         | renonce wrote:
         | In case the event sink crashes or the connection to Redis is
         | lost, you could lose events. Redis Streams are better designed
         | for use cases where more reliable delivery is needed and have a
         | ton more features, though it comes with more complexity.
           | topspin wrote:
           | I hadn't looked at streams yet. Thanks.
           | This page from AWS about Redis streams goes exactly to my use
           | case: Redis Streams and Message Queues:
           | https://aws.amazon.com/redis/Redis_Streams_MQ/
         | zo1 wrote:
         | RabbitMQ. It's so cheap and easy to start up a super performant
         | queuing broker with docker these days. And the libraries are
         | all there, async ready and with established patterns. Closest
         | to zero code you can get for this. You'll likely end up
         | reimplementing all those patterns and support around them using
         | redis.
         | If you want something quick and easy and dirty, go with Redis.
         | But switch to Rabbit when you start having to write a lot of
         | handling and other code.
       | btilly wrote:
       | I think a few more concrete use cases would help.
       | First, a key limitation that every architect should pay
       | attention to. Redis reaches the limits of what you can do in
       | well-written single-threaded C. One of those limits is that you
       | really, _really_ , *really* don't want to go outside of RAM.
       | Think about what is stored, and be sure not to waste space. (It
       | is surprisingly easy to leak memory.)
       | Second, another use case. Replication in Redis is cheap. If your
       | data is small and latency is a concern (eg happened to me with an
       | adserver), then you can locate read-only Redis replicas
       | everywhere. The speed of querying off of your local machine is
       | not to be underestimated.
       | And third, it is worth spending time mastering Redis data
       | structures. For example suppose you have a dynamic leaderboard
       | for an active game. A Redis sorted set will happily let you
       | instantly display any page of that leaderboard, live, with 10
       | million players and tens of thousands of updates per second.
       | There are a lot of features like that which will be just perfect
       | for the right scenario.
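
To make the leaderboard idea concrete, here is a rough pure-Python model of what a sorted set provides: score updates plus "give me page N" and "what is this player's rank" queries. It is only a sketch. The `Leaderboard` class and its method names are hypothetical, and a plain list with `bisect` makes each update O(n), whereas Redis documents its sorted-set updates as O(log n). Tie-breaking here (by player name, ascending) is also a simplification.

```python
import bisect


class Leaderboard:
    """Toy model of a sorted set: members ordered by descending score."""

    def __init__(self):
        self._scores = {}   # player -> current score
        self._ordered = []  # sorted list of (-score, player)

    def add(self, player, score):
        """Insert a player or update their score (loosely like ZADD)."""
        old = self._scores.get(player)
        if old is not None:
            # Remove the stale entry before inserting the new one.
            index = bisect.bisect_left(self._ordered, (-old, player))
            del self._ordered[index]
        self._scores[player] = score
        bisect.insort(self._ordered, (-score, player))

    def page(self, page_number, page_size):
        """Return one page of (player, score), best first (like a range query)."""
        start = page_number * page_size
        chunk = self._ordered[start:start + page_size]
        return [(player, -neg_score) for neg_score, player in chunk]

    def rank(self, player):
        """Zero-based rank of a player, 0 being the top score."""
        key = (-self._scores[player], player)
        return bisect.bisect_left(self._ordered, key)


board = Leaderboard()
board.add("alice", 10)
board.add("bob", 30)
board.add("carol", 20)
top_two = board.page(0, 2)
```

The point of the data structure is that both the update and the page lookup stay cheap as the number of players grows, which is what makes a live leaderboard practical.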
         | koolba wrote:
         | > One of those limits is that you really, really, _really_
         | don't want to go outside of RAM. Think about what is stored, and
         | be sure not to waste space. (It is surprisingly easy to leak
         | memory.)
         | You can have massive amounts of RAM these days. You're sooner
         | to hit big-O limits from bad architectural decisions than run
         | out of memory. If you do get to that point you likely have
         | enough value in your usage to justify scaling out further and
         | sharding.
         | > And third, it is worth spending time mastering Redis data
         | structures.
         | Bingo. The true secret to properly using Redis: understanding
         | the big-O complexity of each operation (...and ensuring that
         | none of your interactions are more than logarithmic).
           | btilly wrote:
           | _You can have massive amounts of RAM these days. You're
           | sooner to hit big-O limits from bad architectural decisions
           | than run out of memory. If you do get to that point you
           | likely have enough value in your usage to justify scaling out
           | further and sharding._
           | Absolute disagreement.
           | It is very easy to accidentally leak a few hundred MB per
           | week in a busy Redis system. The code will look and work
           | fine...at first. It is correspondingly hard to track down and
           | clean up the leak a few months later. (Particularly if there
           | are multiple such to track down.) Yes, you can go for years
           | just buying larger and larger EC2 instances. But that will
           | also come with a shocking price tag.
           | I know of a number of organizations that this happened to.
           | And pretty much every bad Redis story I hear about had this
           | as a root cause. That is why I brought it up as an important
           | consideration.
             | jasonwatkinspdx wrote:
             | Yes, this matches my experience.
             | Redis excels as a memcached alternative with some useful
             | operations. Where people get into trouble with redis is
             | treating it as a persistent data store, when despite its
             | ability to replicate and persist, redis has some
             | constraints you need to work within. At best think of redis
             | as something that can hold a materialized view, but where
             | it can become corrupted at any random time, so you'll need
             | the ability to rematerialize it from something else. And
             | second, you absolutely have to be conscious of how close
             | you are to ram limits.
             | renonce wrote:
             | Redis is production-ready and it has a lot of features to
             | help you track down problems with either memory or CPU
             | usage. For example: `redis-cli --bigkeys` will help you
             | find the very large keys. For smaller keys that occur too
             | often, sampling a few hundred keys should be sufficient to
             | help you find what type of keys are taking more space than
             | necessary.
             | Once you get the Redis database designed well, there are a
             | lot of things you can do before hitting the limit where you
             | can't install any more RAM in a new machine. For
             | example, there are no more than a billion .com domains out
             | there. Say a single record takes 100 bytes on average,
             | consisting of the domain name and a glue record pointing to
             | the IP of its authoritative DNS server. Then it takes just
             | 100GB of memory to store enough information to handle all
             | queries to .com domains in the world. It's not so hard to
             | obtain a machine with 768GB memory these days, and 2TB
             | machines are not uncommon.
             | And if you worry about the price tag - don't use EC2. You
             | can rent a 1TB RAM dedicated server at
             | https://www.hetzner.com/dedicated-rootserver/ax161/configura...
             | for $600 per month. At Scaleway you can rent it for $1000 per
             | month:
             | https://www.scaleway.com/en/pricing/?tags=baremetal,availabl....
             | AWS is notoriously hard to make cost-effective.
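
The back-of-the-envelope sizing in the comment above can be written out as a short calculation. This is a sketch under the comment's own assumptions (one billion records, 100 bytes each); the function name is hypothetical, and the result ignores whatever per-key overhead Redis itself adds, so treat it as a lower bound rather than a measured figure.

```python
def estimate_memory_gb(record_count, bytes_per_record):
    """Raw payload size in decimal gigabytes, ignoring per-key overhead."""
    return record_count * bytes_per_record / 1e9


# Figures taken from the comment: at most a billion .com domains,
# roughly 100 bytes per record.
com_domains_gb = estimate_memory_gb(1_000_000_000, 100)

# Compare against the 768GB machine size mentioned in the comment.
fits_in_768gb = com_domains_gb < 768
```

Running the same function with a measured average record size (for example from sampling keys) gives a more honest estimate than the round numbers used here.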
             | jbboehr wrote:
             | You can also "leak" rows in a traditional RDBMS or even a
             | filesystem. Why is this particularly notable for Redis?
               | btilly wrote:
               | A traditional RDBMS or filesystem is designed for high
               | throughput and concurrency, even if some tasks are
               | blocked on data. Additionally both have options to
               | partition steadily growing things. If needed with old
               | partitions being moved to tape backup while the server
               | continues running.
               | Redis is a single threaded program acting against RAM
               | whose philosophy is that it does things fast then moves
               | to the next job. If it needs to access memory that got
               | paged to disk, the whole server stops and waits to get
               | it. Nobody can do anything.
               | Because Redis doesn't have to deal with locking and
               | concurrency, it can run much faster on the same
               | resources. But when concurrency is required, it is stuck
               | because it doesn't have it.
           | itake wrote:
           | > You can have massive amounts of RAM these days.
           | True, but I am finding that balancing CPU and RAM can be
           | tricky. Slapping 128GB on a 1-core machine means you quickly
           | have CPU limitations.
             | tomnipotent wrote:
             | Redis is single-threaded and will have no problem
             | saturating a 10G NIC with a single socket.
               | itake wrote:
               | My concern is how long it takes a CPU to scan through all
               | of that memory.
           | googletron wrote:
           | > understanding the big-O complexity of each operation
           | (...and ensuring that none of your interactions are more than
           | logarithmic).
           | This is a good idea, maybe a prompt for another post.
         | LewisVerstappen wrote:
         | > Replication in Redis is cheap. If your data is small and
         | latency is a concern (eg happened to me with an adserver), then
         | you can locate read-only Redis replicas everywhere. The speed
         | of querying off of your local machine is not to be
         | underestimated.
         | Do you face any consistency issues with doing this?
           | btilly wrote:
           | No. Replication time was measured in hundredths of a second,
           | and Redis operations are atomic. So all queries got a
           | consistent view of the data, and the lag to update was very
           | reasonable.
             | anonymousDan wrote:
             | So in other words, potentially yes since there is some lag
             | :)?
       | groffee wrote:
       | It's a good article, but a couple of hopefully constructive
       | points
       | 1, .toc-wrap covers the image on desktop
       | 2, the image is way too busy, there's too much going on
       | its_bbq wrote:
       | I've been looking into tech stacks to make a collaborative editor
       | and Redis CRDTs come up a lot. IIUC this requires a Redis db
       | running in each user's machine and they connect P2P with each
       | other. Do I understand right? Anyone have good resources for
       | this? I've also seen Riak come up as an alternative. Do they work
       | similarly?
       | dtertman wrote:
       | At desktop resolution, the floating table of contents menu blocks
       | out two of the (excellent) illustrations (second and second-
       | last). Deleting aside.toc was very helpful.
         | dsmmcken wrote:
         | Yes, I would suggest increasing z-index on images so they pass
         | above the toc. Adding a large dropshadow to the images the same
         | color as the background would make it look like it fades out as
         | it passes by. That's what I did for our blog that has a similar
         | floating TOC + images that escape the text width.
       | xnorswap wrote:
       | I'm not too familiar with redis and this may well help, so thank
       | you.
       | I see some data-types on the right. It surprises me that redis
       | doesn't have a numeric data type. I understand that at its heart
       | it is just a key-value store and doesn't ever need to do range-
       | based lookup but it still surprises me.
       | One consequence of "everything is a string" I've run into
       | (although probably a sign I'm "doing it wrong"), is serialisation
       | overhead in the client.
       | If redis is expecting strings then it's left to the client to
       | choose an appropriate serialisation which can have either
       | performance or other pitfalls.
         | morelisp wrote:
         | How would a native number type avoid some serialization
         | overhead that using e.g. 4 byte BE keys yourself must pay?
         | voxic11 wrote:
         | Numbers in redis can be natively represented using BITFIELDS.
         | > BITFIELD player:1:stats SET u32 #0 1000
         | 1) (integer) 0
         | > BITFIELD player:1:stats INCRBY u32 #0 -900
         | 1) (integer) 100
         | > BITFIELD player:1:stats GET u32 #0
         | 1) (integer) 100
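
For readers without a Redis instance handy, the session above can be imitated in plain Python. This is only a sketch of the arithmetic: the `U32Field` class is hypothetical, it models a single unsigned 32-bit field, and it assumes big-endian storage and wrap-around on overflow, which is how I understand BITFIELD to behave by default.

```python
import struct


class U32Field:
    """One unsigned 32-bit counter stored as 4 big-endian bytes."""

    def __init__(self):
        self._buf = bytearray(4)  # starts as zero, like a missing key

    def get(self):
        return struct.unpack(">I", self._buf)[0]

    def set(self, value):
        """Store a value and return the previous one."""
        old = self.get()
        self._buf[:] = struct.pack(">I", value & 0xFFFFFFFF)
        return old

    def incrby(self, delta):
        """Add delta (may be negative), wrapping modulo 2**32."""
        new = (self.get() + delta) & 0xFFFFFFFF
        self._buf[:] = struct.pack(">I", new)
        return new


stats = U32Field()
previous = stats.set(1000)        # previous value of the field
after_incr = stats.incrby(-900)   # value after the increment
current = stats.get()
```

The value still lives inside a byte string, which is why this does not contradict the observation that Redis has no separate numeric type.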
           | xnorswap wrote:
           | OK, that's helpful thank you.
           | That said, all the keys themselves are still strings and
           | therefore you can't have a SET of numbers or bitfields.
         | [deleted]
       | witnesser wrote:
       | The only person stand out to witness a use case is a adserverer,
       | I read the 1st 100lines of comments. It is like california
       | highway system particularly when I witnessed, the billboard is
       | very outstanding. The jams an pits, people are very nice to them.
         | witnesser wrote:
         | The above is just random comment. So I have a long time
         | question, how cache miss is handled.
       | rfrey wrote:
       | A question so noob I'm almost shy to ask it:
       | The simplest scenario in the article is a single Redis instance
       | residing on the same machine as the application. What's the
       | benefit to this versus just storing data directly within the
       | application?
         | louissm_it wrote:
         | Storing the data directly inside the application still means
         | you need to store it somewhere, likely a SQL database (such as
         | PostgreSQL). These databases are insanely well engineered and
         | very very fast, but compared to a key value store such as Redis
         | and Memcached they are comparatively slow and resource hungry
         | (because they are optimized for different things).
         | So if you can fetch some cached data from a Redis key, even if
         | on the same machine, it will cost you significantly less than
         | querying a relational database.
         | halukakin wrote:
         | Not all applications can store data out of the box. For
         | instance some ways of PHP have embedded caches, some others
         | don't have cache by default and you would need to install cache
         | software (for instance apcu). Also, redis has many different
         | types of data. For instance coding something similar to its
         | "hash" data type is not trivial.
         | ok123456 wrote:
         | Short lived processes/workers.
         | radicalriddler wrote:
         | Redis persists on disk (well, it's optional), if you restart
         | your server I'd assume that it'd be able to restore the disk
         | data into memory, versus your applications memory, which would
         | just be lost.
         | I'm not a Redis user, but that's based on what I've read
         | piaste wrote:
         | Your application and runtime are probably tuned to act as
         | servers, with short-lived requests and little or no persistent
         | state, and they may not play well with keeping a bunch of
         | persistent data around forever.
         | I personally first reached for Redis when I needed to
         | asynchronously process a bunch of JSON uploaded by clients via
         | POST. I initially just stuck them in a ConcurrentQueue in
         | memory, but no matter how much I fiddled with HostedServices
         | and BackgroundWorkers and whatever the MS documentation
         | recommended, the ASP.NET Core app would occasionally 'lose'
         | that queue before it could be consumed (or the consuming loop
         | would get stuck, with the same result).
         | You are also probably running your app on a pretty high-level
         | language, with bytecode and reflection and all that nice stuff
         | - if not even an interpreted language - while Redis is raw C
         | code and will outperform your homebrew doubly linked list or
         | hash set.
       | anton96 wrote:
       | Very interesting.
       | This is leading me to think, using redis as the sole database is
       | very tempting but the Ram requirement is making me think twice.
       | Wouldn't there be a database like redis that only stores the
       | latest data into memory and keeps the rest in an AOF file ?
         | remote-dev wrote:
         | Not to make this an ad, but you can actually do better with
         | Redis Enterprise using Redis on flash (part of the flexible and
         | annual plans). It stores hot data in RAM and "warm" data in
         | flash. Here is a good 68s video on the subject:
         | https://www.youtube.com/watch?v=hFQnhPstqLM
       | googletron wrote:
       | My personal nightmare happened and I accidentally published a rough
       | draft! It has since been updated! Apologies!
       | googletron wrote:
       | I wrote a little post on how Redis works and its various setups!
       | How does everyone set up Redis? Elasticache is a good answer too
       | :P
         | tpmx wrote:
         | [Potentially inaccurate content removed by author]
           | lawrencevillain wrote:
           | The saltiness isn't a good look here. Especially seeing as
           | he's not the poster.
           | It's the HN algorithm which is probably due to the fact that
           | other posts from his domain have done relatively well, plus
           | the actual poster here has quite a bit of karma.
             | tpmx wrote:
             | I really don't like how you're straight-up presuming that
             | the person I'm responding to here identifies as male.
           | xnorswap wrote:
           | It's a new article so it's relatively easy to explain:
           | HN automatically combines submissions so that subsequent
           | submissions count as upvotes for the first submission.
           | If a popular source posts a new article, users will "rush" to
           | post it to HN to reap that sweet karma and the winner will
           | "catch" the upvotes of the others.
             | tpmx wrote:
             | That could explain it. Thanks!
             | Is this behavior documented anywhere on
             | news.ycombinator.com?
               | mindcrime wrote:
               | There isn't a ton of documentation per-se about HN
               | behavior. There is:
               | https://news.ycombinator.com/newsguidelines.html
               | and
               | https://news.ycombinator.com/newsfaq.html
               | and a handful of posts by dang, sama, pg, etc. over the
               | course of the years. most of the rest is what long-time
               | users have just figured out through observation. There's
               | a Git repo[1] out there that aggregates a lot of that
               | stuff, but keep in mind that it's technically unofficial.
               | That said, I think most of what's there is widely
               | considered to be correct.
               | [1]: https://github.com/minimaxir/hacker-news-undocumented
               | tpmx wrote:
               | Thanks, that's a good summary of what I've seen
               | referenced throughout my years here.
               | I can't find any reference to something like "combine the
               | scores of new submissions of the same URL to the first
               | submission's score" though. I guess that's either new
               | information or incorrect.
               | manigandham wrote:
               | You can try this yourself. Go to the 'new' page and
               | submit an existing URL. You'll be redirected to the
               | existing post which will now have at least one more vote.
               | mindcrime wrote:
               | _I can't find any reference to something like "combine
               | the scores of new submissions of the same URL to the
               | first submission's score" though. I guess that's either
               | new information or incorrect._
               | I think that falls into the "noticed through observation"
               | bucket. I'm relatively sure that it is correct, as I've
               | noticed that behavior myself. But, I have no official
               | standing here and I could be totally wrong. But that sure
               | seems to be what happens in my experience.
               | tpmx wrote:
               | So you may have misunderstood your observations, just
               | like I maybe did.
               | mindcrime wrote:
               | That's absolutely possible. This particular pattern has
               | seemed pretty consistent over the years, but unless
               | somebody from the HN admin crew chimes in, I guess we'll
               | never be 100% sure.
         | secondcoming wrote:
         | We use both MemoryStore and normal instances. The latter for a
         | use case where the data is shardable and so we run a redis
         | process on each core and the client picks the right one. It
         | saves a lot of money over using MemoryStore.
         | It also saves you from Google performing maintenance on the
         | machine and deleting all your Lua scripts.
         | KeyDB is becoming increasingly popular though.
         | The biggest problem with Redis, at least in C++ land, is the
         | client libraries. hiredis doesn't support Redis Cluster, and
         | other 3rd party clients that do are of unknown quality.
         | bcjordan wrote:
         | I've been using UpStash's serverless Redis offering and it's
         | worked super well for my needs. Scales to zero/free which was
         | nice for getting started, and using their HTTP SDK I didn't need
         | to worry about concurrent connection limits when calling from
         | simultaneous cloud functions. & not a second of downtime in the
         | few months I've used it so far.
         | Want to move more of my app's datastore to Redis now that I've
         | learned more about sorted sets etc.
           | tpmx wrote:
           | Awesome ad! High-five! [Borat moment]
       | theden wrote:
       | This is great, the visual explanations work really well
       | One thing that threw me off is that it says for an SSD a random
       | read is 150us, but 1MB sequential read is 1ms? Shouldn't
       | sequential reads be faster, or are two different read sizes being
       | compared or something? If so, the ambiguity may confuse some
       | people to think random reads are faster
         | sharikous wrote:
         | My interpretation is that 150us is the minimum latency no
         | matter what for any size, since the seek time is provided for
         | comparison for HDs
         | darkcha0s wrote:
         | Well I'm guessing that it's referring to the fact that 1MB
         | sequential is essentially a bunch of random reads?
         | AFAIK, on SSDs there is no concept/guarantee that blocks are
         | adjacent, so a sequential read is just a bunch of random reads.
           | jasonwatkinspdx wrote:
           | The way the Flash Translation Layer works is complicated, but
           | long story short, there's still an advantage to sequential
           | reads and writes on SSDs. The difference in latency and
           | throughput isn't as dramatic as with spinning disks, but is
           | still there. Random vs sequential writes have big
           | implications for the long term health and performance of the
           | SSD.
       | xnorswap wrote:
       | > Send 1KB over a 1GBps network
       | This is said to have a 10us latency in the chart. But I'm fairly
       | sure that is a calculation of bandwidth based on 1KB / 1GBps.
       | 10us is about 3km at the speed of light, so at most a 1.5km
       | round-trip.
       | For a chart labelled latency, I'm surprised to see bandwidth
       | calculations included. Any network hop would actually have far
       | greater latency, if nothing else because communication typically
       | involves more than a single round-trip for acknowledgement, etc.
       | It might be worth making it clear some of the numbers are about
       | bandwidth not latency.
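
The distinction drawn above between transmission time (a bandwidth effect) and propagation distance can be checked with a few lines of arithmetic. This is a sketch with stated assumptions: the link is taken to be 1 gigabit per second, 1KB is taken as 1000 bytes, and signals are assumed to travel at the vacuum speed of light, which overstates real cable speeds. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def transmission_time_us(payload_bytes, link_bits_per_second):
    """Time to push the payload onto the wire, in microseconds."""
    return payload_bytes * 8 / link_bits_per_second * 1e6


def light_distance_km(duration_us):
    """Distance light covers in the given number of microseconds."""
    return SPEED_OF_LIGHT_M_PER_S * duration_us * 1e-6 / 1000


# Assumed inputs: 1000-byte payload, 1 Gbit/s link.
wire_time_us = transmission_time_us(1000, 1e9)

# How far a signal could get in 10 microseconds, one way.
reach_km = light_distance_km(10)
```

Under these assumptions the wire time comes out at about 8 microseconds and the one-way reach at about 3 km, which supports reading the chart entry as a bandwidth figure rather than a network round-trip latency.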
         | foota wrote:
         | Distance is of course a factor, but at fixed distance size
         | matters a lot, and most applications are at more or less a
         | fixed latency.
         | googletron wrote:
         | Fair point! Will update! I think the focus was on pure line
         | latency. Check out the more detailed post here.
         | https://gist.github.com/jboner/2841832
       | omarshammas wrote:
       | Very informative and love the illustrations.
       | I'm building a new website and am using sidekiq for background
       | job processing which relies on redis behind the scenes to store
       | all the job data. I configured a high availability redis instance
       | with `maxmemory-policy noeviction` to ensure no data is lost.
       | The website is still in its infancy so not thinking about scale
       | for the next little while but curious if you have any tips or
       | gotchas to keep an eye out for. Thanks!
       | ptbg wrote:
       | I love this style of post. You cover a wide range of topics in an
       | easy to understand way. Keep up the great work!
       (page generated 2022-08-11 23:00 UTC)