[HN Gopher] Redis explained
___________________________________________________________________
Redis explained

Author : mgrouchy
Score : 319 points
Date : 2022-08-11 15:06 UTC (7 hours ago)

(HTM) web link (architecturenotes.co)
(TXT) w3m dump (architecturenotes.co)

| jrm4 wrote:
| As someone who doesn't code for a living but teaches it to mostly novices, this helps (because before this I had no clue what it was except that it had something to do with databases). Typically for my courses we just use some flavor of SQL and call it a day (and that kind of spoils us because of how declarative it tends to be) -- roughly, what's the "explain like I'm 10" use case for Redis over something else? From what I'm seeing, it's mostly an "efficiency" thing?
| tmpz22 wrote:
| The traditional line of thinking is:
|
| * you're building a web-ish application and need to store session data
|
| * you don't want to go through the overhead of building a strongly typed relational table
|
| * you know minimal operations stuff
|
| * just use Redis: it's easy to deploy, easy to code for, and available on all major cloud platforms as a managed service
|
| ---
|
| The problem is there are tradeoffs, and session storage becomes a fundamental architectural decision once your application matures. So something you added as a one-off so you can get back to feature development is now a foundational pillar.
| avmich wrote:
| Relational databases are optimized for typical operations over data structured in tables. So, joins and records. However, sometimes you want something simpler, like a LIFO queue, and wouldn't mind having it be faster. Redis allows this: the variety of data structures it has is much bigger than with relational databases. They (Redis and RDBs) both have their uses, of course. Ideally you would structure your system to use each of them where appropriate according to the data requests.
| cerved wrote:
| What data structures?
| detaro wrote:
| https://redis.io/docs/data-types/
| TheBlight wrote:
| If you're not super concerned about reliability but really need speed. That's when Redis really makes the most sense, IMO.
| notyourday wrote:
| Blazingly fast serialized access to the shared data structures over the network by multiple writers and readers.
| hobs wrote:
| I have worked at places where every page load hits the database, and we've scaled OK, mainly because it was B2B stuff.
|
| However, a simple Redis instance in front of the database serving as a readable cache changes the rules of the game significantly. Depending on the complexity of your calculation and your end result, subsequent "page loads" or whatever you are doing can be tens of thousands (or more) times as efficient, and if you decide to use an expensive database or a cloud database this can help you a lot.
|
| Eventually the hard part is that you might have bugs in synchronizing the state of Redis and your database; look to existing implementations for your stack instead of reinventing the wheel.
| lawrencevillain wrote:
| Really love this style of writing. Pairing the diagrams/illustrations with the easy to grok copy is really helpful for folks like myself who have been mainly focused on the front-end.
|
| What tool do you use for your diagramming, is it all hand-drawn?
| dsmmcken wrote:
| Font for the handwriting is Skippy Sharp, in case anyone else was wondering.
| googletron wrote:
| It's hand drawn with some fonts for the titles.
| stjohnswarts wrote:
| This is what I do for my presentations (on a Wacom of course). I have gotten grief over it, but I work faster and get less distracted by the eccentricities of PowerPoint and Figma. My handwriting is abysmal so I will do that in a "handwriting font" to sort of look hand drawn. Even if I have to convert them later for some bigwig, at least I have my "rough draft". Plus it all feels a little more human.
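A minimal sketch of the read cache hobs describes above, under stated assumptions. The `cached` helper, the `DictCache` stand-in and the key name are hypothetical and only for illustration; the sketch assumes a client exposing `get(name)` and `set(name, value, ex=seconds)`, which is the shape redis-py's `Redis` client offers. Invalidating the cache when the database changes, the part hobs warns about, is deliberately left out.

```python
import json
import time
from typing import Any, Callable, Optional


class DictCache:
    """In-memory stand-in that imitates the two client calls used below.

    Assumption: a real client (for example redis-py) would be passed in
    instead; this class exists only so the sketch runs without a server.
    """

    def __init__(self) -> None:
        self._data = {}  # name -> (value, expiry timestamp or None)

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[name]  # expired entries behave like a miss
            return None
        return value

    def set(self, name: str, value: str, ex: Optional[int] = None) -> None:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires_at)


def cached(cache: Any, key: str, compute: Callable[[], Any], ttl: int = 60) -> Any:
    """Return the cached value for key; on a miss, compute and store it."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)  # hit: the expensive work is skipped
    value = compute()  # miss: e.g. run the slow database query
    cache.set(key, json.dumps(value), ex=ttl)  # expire after ttl seconds
    return value
```

With a real client the same `cached(client, "report:1", run_query)` call would apply; the TTL bounds how stale a cached page can get when nothing invalidates it explicitly.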
| dsmmcken wrote:
| Beautifully presented, worth reading just for the illustrations.
| topspin wrote:
| I am thinking of using Redis as a lightweight queuing mechanism. An event source will MULTI a small amount of metadata as a hash and append to a list. Event sinks will BLPOP the list and retrieve and delete the metadata key. One requirement is that the events survive power loss.
|
| Is there anything inherently wrong with this? Gotchas? A mockup I've done works great so far.
| CSDude wrote:
| There used to be disque by antirez, which died. https://github.com/antirez/disque
| topspin wrote:
| See my other reply where the disque author talks about exactly that.
| renonce wrote:
| In case the event sink crashes or the connection to Redis is lost, you could lose events. Redis Streams are better designed for use cases where more reliable delivery is needed and have a ton more features, though they come with more complexity.
| topspin wrote:
| I hadn't looked at streams yet. Thanks.
|
| This page from AWS about Redis streams goes exactly to my use case: Redis Streams and Message Queues: https://aws.amazon.com/redis/Redis_Streams_MQ/
| zo1 wrote:
| RabbitMQ. It's so cheap and easy to start up a super performant queuing broker with Docker these days. And the libraries are all there, async ready and with established patterns. Closest to zero code you can get for this. You'll likely end up reimplementing all those patterns and support around them using Redis.
|
| If you want something quick and easy and dirty, go with Redis. But switch to Rabbit when you start having to write a lot of handling and other code.
| btilly wrote:
| I think a few more concrete use cases would help.
|
| First, a key limitation that every architect should pay attention to. Redis reaches the limits of what you can do in well-written single-threaded C. One of those limits is that you really, _really_, *really* don't want to go outside of RAM.
| Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)
|
| Second, another use case. Replication in Redis is cheap. If your data is small and latency is a concern (e.g. happened to me with an adserver), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.
|
| And third, it is worth spending time mastering Redis data structures. For example, suppose you have a dynamic leaderboard for an active game. A Redis sorted set will happily let you instantly display any page of that leaderboard, live, with 10 million players and tens of thousands of updates per second. There are a lot of features like that which will be just perfect for the right scenario.
| koolba wrote:
| > One of those limits is that you really, really, _really_ don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)
|
| You can have massive amounts of RAM these days. You're sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding.
|
| > And third, it is worth spending time mastering Redis data structures.
|
| Bingo. The true secret to properly using Redis: understanding the big-O complexity of each operation (...and ensuring that none of your interactions are more than logarithmic).
| btilly wrote:
| _You can have massive amounts of RAM these days. You're sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding._
|
| Absolute disagreement.
|
| It is very easy to accidentally leak a few hundred MB per week in a busy Redis system.
| The code will look and work fine...at first. It is correspondingly hard to track down and clean up the leak a few months later. (Particularly if there are multiple such leaks to track down.) Yes, you can go for years just buying larger and larger EC2 instances. But that will also come with a shocking price tag.
|
| I know of a number of organizations that this happened to. And pretty much every bad Redis story I hear about had this as a root cause. That is why I brought it up as an important consideration.
| jasonwatkinspdx wrote:
| Yes, this matches my experience.
|
| Redis excels as a memcached alternative with some useful operations. Where people get into trouble with Redis is treating it as a persistent data store when, despite its ability to replicate and persist, Redis has some constraints you need to work within. At best think of Redis as something that can hold a materialized view, but where it can become corrupted at any random time, so you'll need the ability to rematerialize it from something else. And second, you absolutely have to be conscious of how close you are to RAM limits.
| renonce wrote:
| Redis is production-ready and it has a lot of features to help you track down problems with either memory or CPU usage. For example: `redis-cli --bigkeys` will help you find the very large keys. For smaller keys that occur too often, sampling a few hundred keys should be sufficient to help you find what type of keys are taking more space than necessary.
|
| Once you get the Redis database designed well, there are a lot of things you can do before hitting the limit where you can't install any more RAM in a new machine. For example, there are no more than a billion .com domains out there. Say a single record takes 100 bytes on average, consisting of the domain name and a glue record pointing to the IP of its authoritative DNS server.
| Then it takes just 100GB of memory to store enough information to handle all queries to .com domains in the world. It's not so hard to obtain a machine with 768GB memory these days, and 2TB machines are not uncommon.
|
| And if you worry about the price tag - don't use EC2. You can rent a 1TB RAM dedicated server at https://www.hetzner.com/dedicated-rootserver/ax161/configura... for $600 per month. At Scaleway you can rent it for $1000 per month: https://www.scaleway.com/en/pricing/?tags=baremetal,availabl.... AWS is notoriously hard to make cost-effective.
| jbboehr wrote:
| You can also "leak" rows in a traditional RDBMS or even a filesystem. Why is this particularly notable for Redis?
| btilly wrote:
| A traditional RDBMS or filesystem is designed for high throughput and concurrency, even if some tasks are blocked on data. Additionally both have options to partition steadily growing things. If needed, old partitions can be moved to tape backup while the server continues running.
|
| Redis is a single-threaded program acting against RAM whose philosophy is that it does things fast then moves to the next job. If it needs to access memory that got paged to disk, the whole server stops and waits to get it. Nobody can do anything.
|
| Because Redis doesn't have to deal with locking and concurrency, it can run much faster on the same resources. But when concurrency is required, it is stuck because it doesn't have it.
| itake wrote:
| > You can have massive amounts of RAM these days.
|
| True, but I am finding that balancing CPU and RAM can be tricky. Slapping 128GB on a 1-core machine means you quickly have CPU limitations.
| tomnipotent wrote:
| Redis is single-threaded and will have no problem saturating a 10G NIC with a single socket.
| itake wrote:
| My concern is how long it takes a CPU to scan through all of that memory.
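renonce's back-of-envelope estimate above (a billion records at about 100 bytes each) can be checked with a few lines. This is only a sketch of the arithmetic: the function name is hypothetical, and the figure counts payload bytes only, on the assumption that per-key overhead inside Redis is ignored, so a real deployment would need more.

```python
def estimated_bytes(records: int, bytes_per_record: int) -> int:
    """Raw payload size; ignores any per-key overhead in the store."""
    return records * bytes_per_record


# Figures taken from the comment above: ~1 billion .com domains, ~100 bytes each.
total = estimated_bytes(1_000_000_000, 100)

total_gb = total / 10**9   # decimal gigabytes, as used in the comment
total_gib = total / 2**30  # binary gibibytes, as many memory tools report

# Does the payload fit in the 768GB machine mentioned in the comment?
fits_in_768gb = total <= 768 * 10**9

print(f"{total_gb:.0f} GB ({total_gib:.1f} GiB), fits in 768GB: {fits_in_768gb}")
```

The same three lines make it easy to redo the estimate with a measured average record size, which is the kind of sampling renonce suggests.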
| googletron wrote:
| > understanding the big-O complexity of each operation (...and ensuring that none of your interactions are more than logarithmic).
|
| This is a good idea, maybe a prompt for another post.
| LewisVerstappen wrote:
| > Replication in Redis is cheap. If your data is small and latency is a concern (e.g. happened to me with an adserver), then you can locate read-only Redis replicas everywhere. The speed of querying off of your local machine is not to be underestimated.
|
| Do you face any consistency issues with doing this?
| btilly wrote:
| No. Replication time was measured in hundredths of a second, and Redis operations are atomic. So all queries got a consistent view of the data, and the lag to update was very reasonable.
| anonymousDan wrote:
| So in other words, potentially yes since there is some lag :)?
| groffee wrote:
| It's a good article, but a couple of hopefully constructive points
|
| 1, .toc-wrap covers the image on desktop
|
| 2, the image is way too busy, there's too much going on
| its_bbq wrote:
| I've been looking into tech stacks to make a collaborative editor and Redis CRDTs come up a lot. IIUC this requires a Redis DB running on each user's machine and they connect P2P with each other. Do I understand right? Anyone have good resources for this? I've also seen Riak come up as an alternative. Do they work similarly?
| dtertman wrote:
| At desktop resolution, the floating table of contents menu blocks out two of the (excellent) illustrations (second and second-last). Deleting aside.toc was very helpful.
| dsmmcken wrote:
| Yes, I would suggest increasing z-index on images so they pass above the TOC. Adding a large drop shadow to the images the same color as the background would make it look like it fades out as it passes by. That's what I did for our blog that has a similar floating TOC + images that escape the text width.
| xnorswap wrote:
| I'm not too familiar with Redis and this may well help, so thank you.
|
| I see some data types on the right. It surprises me that Redis doesn't have a numeric data type. I understand that at its heart it is just a key-value store and doesn't ever need to do range-based lookup but it still surprises me.
|
| One consequence of "everything is a string" I've run into (although probably a sign I'm "doing it wrong") is serialisation overhead in the client.
|
| If Redis is expecting strings then it's left to the client to choose an appropriate serialisation, which can have either performance or other pitfalls.
| morelisp wrote:
| How would a native number type avoid some serialization overhead that using e.g. 4 byte BE keys yourself must pay?
| voxic11 wrote:
| Numbers in Redis can be natively represented using BITFIELDs.
|
| > BITFIELD player:1:stats SET u32 #0 1000
|
| 1) (integer) 0
|
| > BITFIELD player:1:stats INCRBY u32 #0 -900
|
| 1) (integer) 100
|
| > BITFIELD player:1:stats GET u32 #0
|
| 1) (integer) 100
| xnorswap wrote:
| OK, that's helpful thank you.
|
| That said, all the keys themselves are still strings and therefore you can't have a SET of numbers or bitfields.
| [deleted]
| witnesser wrote:
| The only person stand out to witness a use case is an adserverer, I read the 1st 100 lines of comments. It is like California highway system particularly when I witnessed, the billboard is very outstanding. The jams and pits, people are very nice to them.
| witnesser wrote:
| The above is just a random comment. So I have a long-standing question: how is a cache miss handled?
| rfrey wrote:
| A question so noob I'm almost shy to ask it:
|
| The simplest scenario in the article is a single Redis instance residing on the same machine as the application. What's the benefit to this versus just storing data directly within the application?
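The serialisation exchange between xnorswap and morelisp above can be made concrete with a small sketch. Redis strings are binary-safe, so a client may store a number either as decimal text or as a fixed-width binary value; either way the client does the encoding. The helper names below are hypothetical, and the "4 byte BE" layout is the one morelisp mentions.

```python
import struct


def encode_decimal(n: int) -> bytes:
    """Decimal text encoding, e.g. 1000 -> b"1000"."""
    return str(n).encode("ascii")


def encode_u32_be(n: int) -> bytes:
    """Fixed-width unsigned 32-bit big-endian encoding (always 4 bytes)."""
    return struct.pack(">I", n)


def decode_u32_be(raw: bytes) -> int:
    """Inverse of encode_u32_be."""
    return struct.unpack(">I", raw)[0]


# Size: decimal text grows with the number of digits, packed form stays at 4 bytes.
text_len = len(encode_decimal(1_000_000))
packed_len = len(encode_u32_be(1_000_000))

# Ordering: big-endian packed values sort bytewise in numeric order,
# decimal text does not (b"10" sorts before b"2").
packed_sorted = encode_u32_be(2) < encode_u32_be(10)
text_sorted = encode_decimal(2) < encode_decimal(10)
```

Which encoding is cheaper depends on the client language and the value range; the sketch only shows that the choice, and its cost, sits on the client side in both cases.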
| louissm_it wrote:
| Storing the data directly inside the application still means you need to store it somewhere, likely a SQL database (such as PostgreSQL). These databases are insanely well engineered and very very fast, but compared to a key-value store such as Redis or Memcached they are slow and resource hungry (because they are optimized for different things).
|
| So if you can fetch some cached data from a Redis key, even if on the same machine, it will cost you significantly less than querying a relational database.
| halukakin wrote:
| Not all applications can store data out of the box. For instance, some PHP setups have embedded caches, while others don't have a cache by default and you would need to install cache software (for instance APCu). Also, Redis has many different data types. For instance, coding something similar to its "hash" data type is not trivial.
| ok123456 wrote:
| Short-lived processes/workers.
| radicalriddler wrote:
| Redis persists to disk (well, it's optional). If you restart your server I'd assume that it'd be able to restore the disk data into memory, versus your application's memory, which would just be lost.
|
| I'm not a Redis user, but that's based on what I've read
| piaste wrote:
| Your application and runtime are probably tuned to act as servers, with short-lived requests and little or no persistent state, and they may not play well with keeping a bunch of persistent data around forever.
|
| I personally first reached for Redis when I needed to asynchronously process a bunch of JSON uploaded by clients via POST. I initially just stuck them in a ConcurrentQueue in memory, but no matter how much I fiddled with HostedServices and BackgroundWorkers and whatever the MS documentation recommended, the ASP.NET Core app would occasionally 'lose' that queue before it could be consumed (or the consuming loop would get stuck, with the same result).
|
| You are also probably running your app on a pretty high-level language, with bytecode and reflection and all that nice stuff - if not even an interpreted language - while Redis is raw C code and will outperform your homebrew doubly linked list or hash set.
| anton96 wrote:
| Very interesting.
|
| This is leading me to think that using Redis as the sole database is very tempting, but the RAM requirement is making me think twice.
|
| Wouldn't there be a database like Redis that only stores the latest data in memory and keeps the rest in an AOF file?
| remote-dev wrote:
| Not to make this an ad, but you can actually do better with Redis Enterprise using Redis on Flash (part of the flexible and annual plans). It stores hot data in RAM and "warm" data in flash. Here is a good 68s video on the subject: https://www.youtube.com/watch?v=hFQnhPstqLM
| googletron wrote:
| My personal nightmare happened and I accidentally published a rough draft! It has since been updated! Apologies!
| googletron wrote:
| I wrote a little post on how Redis works and its various setups! How does everyone set up Redis? ElastiCache is a good answer too :P
| tpmx wrote:
| [Potentially inaccurate content removed by author]
| lawrencevillain wrote:
| The saltiness isn't a good look here. Especially seeing as he's not the poster.
|
| It's the HN algorithm, which is probably due to the fact that other posts from his domain have done relatively well, plus the actual poster here has quite a bit of karma.
| tpmx wrote:
| I really don't like how you're straight-up presuming that the person I'm responding to here identifies as male.
| xnorswap wrote:
| It's a new article so it's relatively easy to explain:
|
| HN automatically combines submissions so that subsequent submissions count as upvotes for the first submission.
|
| If a popular source posts a new article, users will "rush" to post it to HN to reap that sweet karma and the winner will "catch" the upvotes of the others.
| tpmx wrote:
| That could explain it. Thanks!
|
| Is this behavior documented anywhere on news.ycombinator.com?
| mindcrime wrote:
| There isn't a ton of documentation per se about HN behavior. There is:
|
| https://news.ycombinator.com/newsguidelines.html
|
| and
|
| https://news.ycombinator.com/newsfaq.html
|
| and a handful of posts by dang, sama, pg, etc. over the course of the years. Most of the rest is what long-time users have just figured out through observation. There's a Git repo[1] out there that aggregates a lot of that stuff, but keep in mind that it's technically unofficial. That said, I think most of what's there is widely considered to be correct.
|
| [1]: https://github.com/minimaxir/hacker-news-undocumented
| tpmx wrote:
| Thanks, that's a good summary of what I've seen referenced throughout my years here.
|
| I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect.
| manigandham wrote:
| You can try this yourself. Go to the 'new' page and submit an existing URL. You'll be redirected to the existing post which will now have at least one more vote.
| mindcrime wrote:
| _I can't find any reference to something like "combine the scores of new submissions of the same URL to the first submission's score" though. I guess that's either new information or incorrect._
|
| I think that falls into the "noticed through observation" bucket. I'm relatively sure that it is correct, as I've noticed that behavior myself. But, I have no official standing here and I could be totally wrong. But that sure seems to be what happens in my experience.
| tpmx wrote:
| So you may have misunderstood your observations, just like I maybe did.
| mindcrime wrote:
| That's absolutely possible. This particular pattern has seemed pretty consistent over the years, but unless somebody from the HN admin crew chimes in, I guess we'll never be 100% sure.
| secondcoming wrote:
| We use both MemoryStore and normal instances. The latter for a use case where the data is shardable and so we run a Redis process on each core and the client picks the right one. It saves a lot of money over using MemoryStore.
|
| It also saves you from Google performing maintenance on the machine and deleting all your Lua scripts.
|
| KeyDB is becoming increasingly popular though.
|
| The biggest problem with Redis, at least in C++ land, is the client libraries. hiredis doesn't support Redis Cluster, and other 3rd party clients that do are of unknown quality.
| bcjordan wrote:
| I've been using Upstash's serverless Redis offering and it's worked super well for my needs. Scales to zero/free which was nice for getting started, and using their HTTP SDK I didn't need to worry about concurrent connection limits when calling from simultaneous cloud functions. And not a second of downtime in the few months I've used it so far.
|
| Want to move more of my app's datastore to Redis now that I've learned more about sorted sets etc.
| tpmx wrote:
| Awesome ad! High-five! [Borat moment]
| theden wrote:
| This is great, the visual explanations work really well
|
| One thing that threw me off is that it says for an SSD a random read is 150us, but 1MB sequential read is 1ms? Shouldn't sequential reads be faster, or are two different read sizes being compared or something?
| If so, the ambiguity may confuse some people into thinking random reads are faster
| sharikous wrote:
| My interpretation is that 150us is the minimum latency no matter what for any size, since the seek time is provided for comparison for HDs
| darkcha0s wrote:
| Well I'm guessing that it's referring to the fact that 1MB sequential is essentially a bunch of random reads?
|
| AFAIK, on SSDs there is no concept/guarantee that blocks are adjacent, so a sequential read is just a bunch of random reads.
| jasonwatkinspdx wrote:
| The way the Flash Translation Layer works is complicated, but long story short, there's still an advantage to sequential reads and writes on SSDs. The difference in latency and throughput isn't as dramatic as with spinning disks, but is still there. Random vs sequential writes have big implications for the long term health and performance of the SSD.
| xnorswap wrote:
| > Send 1KB over a 1GBps network
|
| This is said to have a 10us latency in the chart. But I'm fairly sure that is a calculation of bandwidth based on 1KB / 1GBps
|
| 10us is about 3km, so at most a 1.5km round-trip.
|
| For a chart labelled latency, I'm surprised to see bandwidth calculations included. Any network hop would actually have far greater latency, if nothing else because communication typically involves more than a single round-trip for acknowledgement, etc.
|
| It might be worth making it clear some of the numbers are about bandwidth not latency.
| foota wrote:
| Distance is of course a factor, but at fixed distance size matters a lot, and most applications are at more or less a fixed latency.
| googletron wrote:
| Fair point! Will update! I think the focus was on pure line latency. Check out the more detailed post here. https://gist.github.com/jboner/2841832
| omarshammas wrote:
| Very informative and love the illustrations.
|
| I'm building a new website and am using Sidekiq for background job processing, which relies on Redis behind the scenes to store all the job data. I configured a high availability Redis instance with `maxmemory-policy noeviction` to ensure no data is lost.
|
| The website is still in its infancy so I'm not thinking about scale for the next little while, but curious if you have any tips or gotchas to keep an eye out for. Thanks!
| ptbg wrote:
| I love this style of post. You cover a wide range of topics in an easy to understand way. Keep up the great work!
___________________________________________________________________
(page generated 2022-08-11 23:00 UTC)