[HN Gopher] In Defense of Linked Lists
___________________________________________________________________
In Defense of Linked Lists
Author : grep_it
Score : 150 points
Date : 2022-11-04 20:46 UTC (2 hours ago)
(HTM) web link (antirez.com)
(TXT) w3m dump (antirez.com)
| c-smile wrote:
| Why do we need to defend a hammer against a sickle?
|
| Each tool has its own value it was designed for ...
|
| Try to imagine LISP without lists ...
| bsder wrote:
| > Try to imagine LISP without lists ...
|
| It's called Clojure.
|
| Getting rid of CONS (effectively removing singly linked lists) is a huge benefit.
| klysm wrote:
| A list can have different memory representations with identical semantics.
| js2 wrote:
| > Linked lists are conceptual. A node pointing to itself is the most self centered thing I can imagine in computing: an ideal representation of the more vulgar infinite loop. A node pointing to NULL is a metaphor of loneliness. A linked list with tail and head connected, a powerful symbol of a closed cycle.
|
| Tongue-in-cheek, but this really made me smile. :-)
| tunesmith wrote:
| How _do_ you detect a cycle with linked lists? :) I actually have a side project where I've run into this - it's a distributed directed graph structure where the nodes can send messages to each other. It's intended to be acyclic, but "walking the graph" is really only used to send truth values, which are combined with others to run Boolean logic. So in that case, the occasional cycle doesn't matter, because if the calculated Boolean value matches that of the node's internal state, it can just cease propagating. So a structural loop won't turn into an infinite processing loop.
|
| The problem then becomes that I can't introduce NOT gates anywhere in a cycle, because then the bit will continue flipping and I'll get an infinite processing loop.
|
| So it seems my only hope is external processes that continually walk the graph and keep track of where they visited to try and detect loops, and I don't like how that scales...
| Smaug123 wrote:
| One of the standard texts on the matter: https://aphyr.com/posts/341-hexing-the-technical-interview
| dragontamer wrote:
| With "just" a linked list, you need a turtle (advance one node at a time), and a hare (advance two nodes at a time).
|
| If the turtle and hare ever meet, you have a cycle. Otherwise, if the hare reaches the end of the list, you don't have a cycle.
| timjver wrote:
| Just in case performance matters, there is a more efficient way: have the tortoise stay in place and advance the hare only one node at a time, moving the tortoise up to the hare's position every time a power-of-2 number of steps has been made. This is known as Brent's algorithm, and it requires fewer advancements than the original tortoise and hare algorithm by Floyd.
|
| Another notable advantage of Brent's algorithm is that it automatically finds the cycle length, rather than (in Floyd's case) some multiple of the cycle length.
|
| https://en.wikipedia.org/wiki/Cycle_detection#Brent's_algori...
| Jtsummers wrote:
| https://rosettacode.org/wiki/Cycle_detection
|
| Cycle detection in (currently) 44 languages. Most (all?) use Brent's algorithm. They're operating on an iterated function, but converting most of these to detect cycles in a linked list would be straightforward.
| klysm wrote:
| Does the graph have static structure? I'm not sure if you're describing something like an actor system.
| tunesmith wrote:
| Akka cluster sharding :) Shape of the graph can change as processes are being run.
| mdavidn wrote:
| It's not difficult. Keep a pointer to a recently-visited node. After visiting N nodes, update the pointer to the current node, but also double N. If the pointer ever matches the next node, the list has a cycle.
| The doubling of N ensures that even large cycles will eventually be detected.
| bewaretheirs wrote:
| Set two pointers to the head of the list.
|
| Step through the list by one element per iteration with one of them, and with the other, step every other iteration.
|
| If the two pointers are ever again equal, you've found a cycle.
|
| If you hit the end of the list, no cycle.
| quickthrower2 wrote:
| Can you just store the first reference as a local variable and then compare each time you move to a new node?
|
| Or to save the comparisons, make the thing the first node points to have a stop flag of some sort.
| mdavidn wrote:
| That approach works only when the first node is included in the cycle. Other cycles are possible.
| Jtsummers wrote:
| Consider this kind of cycle:
|
|     Head -> Node1 -> Node2 -> Node3
|               ^                 |
|               +-----------------+
|
| If it's a circular list then your option would work, but not all cycles will produce circular lists.
| ufo wrote:
| There are a bunch of algorithms for detecting cycles. One of the classics is the tortoise and hare algorithm, which uses only O(1) space.
| https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoi...
| [deleted]
| scottlamb wrote:
| > I thought of writing this blog post full of all the things I love about linked lists.
|
| The blog post is short, and I think the author covered about everything.
|
| Yeah, linked lists are sometimes appropriate, especially intrusive linked lists (the author called them "embedded" linked lists). The hate they get is a backlash against CS programs that introduce people to them early and don't emphasize enough the reasons you should usually pick something else. If we have to err on one side or the other, I'd rather it be telling people not to use them.
|
| btw:
|
| > Redis can be wrong, but both Redis and the Linux kernel can't.
|
| Yes, they can, and I'd rather avoid argument from authority altogether.
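The tortoise-and-hare schemes described in the comments above can be sketched in a few lines. This is a minimal illustration only; the `Node` class and function names are my own, not code from the thread. Brent's variant additionally reports the cycle length.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Floyd: tortoise moves one node per step, hare moves two."""
    tortoise = hare = head
    while hare is not None and hare.next is not None:
        tortoise = tortoise.next
        hare = hare.next.next
        if tortoise is hare:
            return True   # they met inside a cycle
    return False          # hare fell off the end: no cycle

def cycle_length_brent(head):
    """Brent: only the hare advances; the tortoise teleports to the
    hare's position at every power of two. Returns the cycle length,
    or 0 if the list terminates."""
    if head is None:
        return 0
    power = length = 1
    tortoise, hare = head, head.next
    while hare is not None and tortoise is not hare:
        if length == power:          # start a new power of two
            tortoise = hare
            power *= 2
            length = 0
        hare = hare.next
        length += 1
    return length if hare is not None else 0

# Head -> Node1 -> Node2 -> Node3, with Node3 linked back to Node1
nodes = [Node(i) for i in range(4)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[3].next = nodes[1]
assert has_cycle(nodes[0])
assert cycle_length_brent(nodes[0]) == 3
nodes[3].next = None                 # break the cycle
assert not has_cycle(nodes[0])
```

Both run in O(1) space, which is the property the thread keeps coming back to.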
| waynecochran wrote:
| Another cool property of linked lists is that you can add an element to the head of a list without mutating the previous list. There are not many data structures that you can add an element to and leave the previous version unchanged. This form of immutability is a very nice feature that Lisp / functional programmers enjoy. Because of this, you can concurrently add an element to the head of a list to create new lists without having a critical section in your code.
| robocat wrote:
| > you can concurrently add an element to the head of a list to create new lists without having a critical section
|
| The race is on: now you have two lists when before you had one?
| rileyphone wrote:
| Every element of the list, from its own point of view, is another list.
| jbandela1 wrote:
| In my experience, the linked lists that are useful are intrusive linked lists, where the data you care about has an embedded next/previous pointer, as opposed to just storing an arbitrary object inside a linked list.
|
| One example would be if your thread structure has these embedded pointers, you can easily add/remove a thread to a linked list of threads waiting for some resource without any allocation, just by pointer manipulation.
| pclmulqdq wrote:
| This is a very useful pattern when you have small collections of large objects, where the vector/array implementation starts to fall flat on its face. Kernels and high-performance servers use these a lot for state tracking.
|
| If you combine intrusive linking with a pool allocator, you can have decent locality too.
| dclowd9901 wrote:
| I know it's splitting hairs, but doesn't that mean the structure is, indeed, a linked list? A structure can be two things at once.
| antirez wrote:
| This is a very popular use case inside kernel implementations.
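waynecochran's head-prepend point is easy to demonstrate with a persistent cons cell: prepending is O(1), the old list is never touched, and the new lists share its cells. A minimal sketch; the names `Cons`, `prepend`, and `to_list` are my own for illustration.

```python
class Cons:
    """An immutable, persistent singly linked list cell."""
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail       # None represents the empty list

def prepend(value, lst):
    # O(1), and `lst` is untouched: the new list shares its cells
    return Cons(value, lst)

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

shared = prepend(2, prepend(3, None))   # [2, 3]
a = prepend(1, shared)                  # [1, 2, 3]
b = prepend(0, shared)                  # [0, 2, 3]
assert to_list(a) == [1, 2, 3] and to_list(b) == [0, 2, 3]
assert a.tail is b.tail                 # structural sharing, no copying
```

Since no cell is ever mutated, two threads can build `a` and `b` concurrently from the same `shared` tail without any locking, which is the point being made above.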
| colinfinck wrote:
| FWIW, I recently implemented a safe Rust crate around the Windows variant of intrusive linked lists [1] and spoke about it at EuroRust [2].
|
| I did it only for compatibility and would still prefer vectors whenever I can use them. Nevertheless, I put some effort into coming up with a useful and high-performing implementation. To give an example: my `append` function is O(1) and not the poor O(N) version that you rightfully criticized in your tweet.
|
| [1] https://colinfinck.de/posts/nt-list-windows-linked-lists-in-...
| [2] https://www.youtube.com/watch?v=IxhZIyXOIw8
| klysm wrote:
| The other benefit of this kind of data structure is the ability for one object to participate in multiple collections. Otherwise you would have to do some kind of hash map structure to point to them.
| armchairhacker wrote:
| Linked lists are great to know and understand. Linked lists are not good for performance. In most cases, an array or other data structure is much more efficient than a linked list. But there are cases where linked lists' immutability/segmentation or simplicity to implement make them the better choice.
| taeric wrote:
| There are, interestingly, also times when fun facets of the linked list's mutability can lead to benefits. https://taeric.github.io/DancingLinks.html is my attempt at exploring Knuth's DLX algorithm. I can't recommend enough that you look into how he did it. Very fun and exciting use of mutable doubly linked lists.
| taeric wrote:
| I suspect the problem is that many people think of stuff like Java's LinkedList when they think of linked lists. As a standard list implementation, I think it is easy to see that a doubly linked list just isn't that strong. Especially with how cumbersome the List interface makes it to take advantage of the links.
|
| That said, conceptually, linked items are everywhere, such that you can easily list things by following the links.
| You probably just don't bother keeping that in a "List" structure in your code.
| xxs wrote:
| LinkedList is useless in Java. Aside from the lock-free versions, linked lists are better avoided in any language (with few exceptions).
| taeric wrote:
| I'll forever think of this tweet when I see Java's LinkedList. https://twitter.com/joshbloch/status/583813919019573248
|
| Funny, as I constantly have to tell folks to not bother using it at the office. Everyone always assumes it has to be faster than ArrayList for whatever they happen to be doing this time. I think the vast majority of the time it flat doesn't matter, but I also fully expect LinkedList will lose most speed tests.
| xxs wrote:
| LinkedList is terrible as the memory overhead is way too high - if you need something better, use an ArrayDeque.
| jsight wrote:
| I run into people that use LinkedList everywhere in their code "for efficiency". It always confuses me. Are CS professors teaching something that leads to this?
|
| In practice, it seems unusual for it to have any advantages over just using an ArrayList in typical JVM applications.
| taeric wrote:
| I blame an over-reliance on big-O as an arbiter of what is a fast algorithm. Linked lists are commonly taught as having much better insertion speed than arrays.
|
| This ignores constant factors and cache coherency, of course. More, it also ignores that most lists that folks will make have a VERY predictable access pattern. Typically just a one-way scan, if any traversal at all. With very few inserts, and mostly just appending.
| forrestthewoods wrote:
| The first problem with Linked Lists is you should almost never use them. Do they have uses? Of course! However, on modern computers memory access is, roughly, more expensive than CPU cycles. The Linked List has been demoted to a "special case" data structure.
|
| The second problem is Linked Lists are taught too early.
| They're historically taught first in Data Structures 101. This results in new programmers using LinkedLists first when they should be used last.
| readams wrote:
| Linked lists are great. But they have the problem that, almost always, whatever problem you're trying to solve would be better done with a regular resizable vector.
|
| This includes problems that they should be great for, like insert into the middle or the front.
|
| The reason is that, in practice, the way computers actually work is that there is an enormous time penalty for jumping around randomly in memory, and it's large enough that it's often worth paying an O(lg n) cost to switch to something that will be contiguous in memory and allow the CPU to prefetch data.
|
| There are exceptions when you really should use an actual linked list, but your default should be a vector and not a linked list.
| jstimpfle wrote:
| That works if the data that you want to put in the sequence is copyable (doesn't have "identity"), or if you can arrange so that it gets created in the right spot from the start and you can always choose the iteration order to be sequential in memory. For many more complex structures, that is not the case.
| ay wrote:
| Could you give an example of data that is uncopyable? A struct with self-referential pointers?
| Smaug123 wrote:
| Nobody has mentioned immutability yet - linked lists are easily used as one of the simplest immutable persistent data structures. Your definition of "better" appears to be solely about keeping the CPU burning, but if CPU _isn't_ the bottleneck, I prefer "immutable linked list" over "clone a vector many times" merely on aesthetic and reason-ability grounds.
| huhtenberg wrote:
| > almost always
|
| Like any generalization, this one too is, of course, incorrect.
|
| Outside of the realm of academia, the need to keep data pieces in several _containers_ at the same time, with no heap operations on insertion or removal and O(1) removal, is very common, especially in the kernel space and embedded contexts. The only option that fits the bill is - you guessed it - the linked list.
| nostrademons wrote:
| Note that this (and the article) describes an _intrusive_ linked list. _Extrusive_ linked lists (like you might see in a class library or CS 101 project), where the node structure is heap-allocated separately from the data that it points to, have very few practical advantages over vectors, which is why standard libraries are increasingly defaulting to vectors even when the class is named "list".
| forrestthewoods wrote:
| > Like any generalization, this one too is, of course, incorrect.
|
| In the real world of application development, when choosing between Vector and LinkedList the answer is almost always Vector. There are, of course, certain problems where LinkedList is always the correct answer.
|
| However, of all the problems that have chosen either Vector or LinkedList as their container of choice, a supermajority should choose Vector.
| tylerhou wrote:
| If you're iterating, sure, use a vector. If you're pulling from a queue and doing a lot of processing, maybe an RPC -- does the ~70ns memory latency hit really matter that much? Probably not.
| kibwen wrote:
| Why impose that latency if you don't need to? It costs me nothing to reach for a vector instead.
| jalino23 wrote:
| by vector you mean regular js arrays?
| josephg wrote:
| Broadly yes, C++'s vector is the equivalent of JS arrays.
|
| But the javascript array type is much more complicated. It intelligently changes its internal structure based on how you use it - which is pretty wild.
| bjoli wrote:
| While I generally agree with you, I have had multiple occasions where amortized O(1) was unacceptable, whereas the overhead of linked lists was ok.
| imbnwa wrote:
| I've always wondered why GSAP and CreateJS both rely on linked lists for sequencing animations
| im3w1l wrote:
| One data structure I recently found myself wanting is a tree-based "array", with O(log n) access by index / insertion anywhere / deletion anywhere.
|
| Kinda weird how rare it seems to be.
| Munksgaard wrote:
| That sounds like a pretty run-of-the-mill balanced binary tree? Rust has a BTreeMap: https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...
| layer8 wrote:
| That depends on the number of elements and the size of the payload data. Both approaches can be combined by using a linked list of arrays.
| marcosdumay wrote:
| Yes.
|
| There is a reason why it's hard to create either a bare linked list or a bare array in a modern language. It's because both are bad extremes, and the optimum algorithm is almost always some combination of them.
| andirk wrote:
| Understanding, and even sympathizing with, the machine's toil re: vectors vs manual linked lists can separate a good engineer from a great one. We learn linked lists, assembly, and even machine code in Computer Science majors so that we know what's happening under the hood and can more easily surmise the runtime effects.
| kibwen wrote:
| All of these are indeed important, and will eventually (one hopes) result in the student realizing that linked lists are to be avoided because of their poor mechanical sympathy resulting from their awful cache locality.
|
| Valid uses for a linked list are extremely niche. Yes, you can name some, and I can name several valid uses for bloom filters. Just because a simple linked list is easy to implement does not mean it should be used, any more than bubble sort should be used for its simplicity.
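layer8's "linked list of arrays" hybrid is not hypothetical: CPython's `collections.deque` is implemented internally as a doubly linked list of fixed-size blocks, which is how it offers O(1) operations at both ends without the O(n) shift that `list.pop(0)` pays. A small usage sketch:

```python
from collections import deque

# A queue with O(1) operations at both ends; list.pop(0), by
# contrast, shifts every remaining element and is O(n).
q = deque()
for i in range(5):
    q.append(i)          # enqueue at the right end
assert q.popleft() == 0  # dequeue at the left end
assert q.popleft() == 1
q.appendleft(0)          # O(1) at the left end too
assert list(q) == [0, 2, 3, 4]
```

The block size amortizes the pointer-chasing cost over many elements, which is exactly marcosdumay's "combination of both extremes".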
| throwawaymaths wrote: | Wait why do linked lists have bad cache locality? It | depends on how your heap is set up. For example, you could | have an allocator that gives relatively good locality by | having high granularity and only bailing out if say your LL | gets super long (so if your lists are usually short, they | could have awesome locality) | dmitrygr wrote: | Modern CPUs detect patterns of pointer chasing common in linked | list use and prefetch to cache just fine. Your comment would | have been valid a decade or more ago. | | And _plenty_ of use cases are much better with linked lists | than resizable vectors. Eg: queues. | vbezhenar wrote: | RAM latency is huge. No amount of prefetching can fix that. | If anything, prefetching works better with arrays as RAM | reads multiple bytes at once. | xxs wrote: | Queues are best implemented with a pow2 cyclic array, and you | have a free deque, too. | attractivechaos wrote: | A ring-buffer based queue is mostly better than a linked | list but it is not necessarily the best. The typical STL | deque implementation [1], with all its complexity, is | usually faster than a ring buffer. | | [1] https://stackoverflow.com/questions/6292332/what- | really-is-a... | tylerhou wrote: | I have some experience using/modifying linked-list benchmarks | (https://github.com/google/multichase) specifically to test | memory latency. | | It is extremely difficult, maybe impossible, to design a | prefetcher that can predict the next cacheline(s) to prefetch | while traversing in a linked-list. I am not aware of a single | CPU that can do this consistently. For instance, if you run | multichase (a linked-list chaser) on GCP servers, you | generally get the expected memory latency (~70-100ns, | depending on the platform). | Borealid wrote: | I think there's more to cache locality than prefetching. 
| In an array, consecutive elements will usually be adjacent in physical memory (excepting page boundaries), so each cache line load where a cache line is larger than the array element will "overfetch" into the next element. This should mean fewer total cache loads and thus more efficient use of memory bandwidth for situations where it is constrained (such as walking a list).
| howinteresting wrote:
| For most use cases a ring buffer, either with or without a maximum length, will do much better as a queue than a linked list will.
|
| There are some niche use cases where linked lists are good. Lock-free queues are one, and another big set of use cases is where you need hard O(1) insert even with a high constant factor, not amortized O(1).
| anonymoushn wrote:
| High-performance SPSC queues are always implemented with an array.
|
| If you need hard O(1) insert you can usually just allocate an array much bigger than you will need. The combination of hard performance requirements and willingness to wait for an allocator all the time is unusual.
| gamegoblin wrote:
| The place I see linked lists pop up the most is when implementing LRU caches. You need a linked hashmap of sorts, where the hashmap gives O(1) lookup to the linked list node in question, and then you can extract that node and move it to the head of the queue in O(1) as well.
|
| This is a special case of where they also appear: linked hashmap implementations, where you want a hashmap that has a consistent iteration order.
| josephg wrote:
| I've seen them show up in lots of situations through intrusive lists, which I think is the general form of the example you're giving.
|
| Intrusive lists are used in the Linux kernel, in physics engines (eg Chipmunk2d) and game engines. The way Redis uses them for TTL expiry follows the same pattern.
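gamegoblin's LRU pattern above pairs a hash map with a doubly linked list: the map gives O(1) lookup of a node, the list gives O(1) move-to-front and O(1) eviction from the tail. A toy sketch, assuming nothing beyond the thread's description (class and method names are my own):

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Dict for O(1) lookup; sentinel-terminated doubly linked list
    for O(1) move-to-front and O(1) eviction of the LRU entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}
        self.head = _Node()        # sentinel: most recently used side
        self.tail = _Node()        # sentinel: least recently used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.map.get(key)
        if node is None:
            return default
        self._unlink(node)
        self._push_front(node)     # mark as most recently used
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._unlink(self.map[key])
        elif len(self.map) >= self.capacity:
            lru = self.tail.prev   # evict the least recently used
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a" so "b" becomes the LRU entry
cache.put("c", 3)     # evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1 and cache.get("c") == 3
```

Note that every list operation here touches a node reached through the hash map, never by traversal - which is why the linked list's poor locality matters much less in this role.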
| spijdar wrote:
| There are a lot of factors involved, but given a limited number of _cache lines_ that can be stored in low-level cache, I would think there'd be at least some performance penalty for prefetching from non-linear blocks of memory vs the inherent spatial locality in an array/vector, if only because it'd cause more old cache to be evicted.
| colinmhayes wrote:
| Queues are generally implemented with a vector of pointers to vectors. Nicely diagrammed here: https://stackoverflow.com/questions/6292332/what-really-is-a... Ring buffers also work better than linked lists.
| waynesonfire wrote:
| Are Erlang and the JDK general enough for you? Oh, both use linked lists:
|
| https://github.com/erlang/otp/blob/master/lib/stdlib/src/que...
| https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/...
| xxs wrote:
| why would you refer to LinkedBlockingQueue? You have ArrayBlockingQueue, too. ArrayDeque is just better than LinkedList.
| [deleted]
| anonymoushn wrote:
| Even if your linked list is backed by an array and all the elements are right next to each other, computing some cheap function of the elements of a linked list is incredibly slow compared to the alternative because of the unnecessary and very long dependency chain.
|
| Vectors are basically perfect for queues.
| optymizer wrote:
| I'm going to call you out on this one, because it's a bold claim, and I'd love to see an explanation and some perf numbers.
|
| For example, I'm wondering how the CPU knows the next item in the list in order to prefetch it. Unlike the next item in an array, list items could be pointing anywhere in process memory.
|
| Also, what counts as a modern CPU in this context? Are we talking latest-generation desktop CPUs from Intel, or anything after Core 2 Duo? How about mobile devices with ARM CPUs? Would those just be ignored?
|
| Is there a benchmark?
| dmitrygr wrote:
| Feast your eyes on the "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual": https://cdrdv2-public.intel.com/671488/248966_software_optim... (newer link, as requested by one of the child comments)
|
| It details the prefetchers, how they work, what patterns they recognize (section 3.7)
| NavinF wrote:
| Prefetching just one element ahead does fuck-all when RAM is 100x slower than cache, especially if you need to precharge and activate a new row, which is always the case when you're spraying tiny objects all over the place.
| insanitybit wrote:
| This is a great document that will absolutely confirm, in a number of ways, that linked lists are going to be terrible for prefetching.
| [deleted]
| josephg wrote:
| It doesn't claim linked lists are fast. And it doesn't have any benchmarking data.
|
| Actually, the only reference to linked lists I see is in the prefetcher section (3.7.3), where it explicitly recommends that consecutive items are stored in a single 4kb cache line or in consecutive cache lines. They give a positive code example using an array, like other commenters are suggesting.
| fuckstick wrote:
| It's a link to a document on architectures well over a decade old - the most recent mentioned is the original Core architecture (from 2008). Honestly, based on your first comment I thought you were implying something about content-dependent prefetch or other techniques that I am familiar with from the academic literature but unaware of ever being used in mainstream hardware.
|
| > * Organize the data so consecutive accesses can usually be found in the same 4-KByte page.
| > * Access the data in constant strides forward or backward (IP Prefetcher).
|
| > Method 2:
| > * Organize the data in consecutive lines.
| > * Access the data in increasing addresses, in sequential cache lines.
|
| Nothing new here that contradicts the GP's skepticism. Certainly not enough evidence for you to be a dick.
| dmitrygr wrote:
| yup, it was a link to an old document with the same title. updated. thanks.
| fuckstick wrote:
| Ok, but the hardware data prefetch functionality seems basically unchanged:
|
| "Characteristics of the hardware prefetcher are: * It requires some regularity in the data access patterns. -- If a data access pattern has constant stride, hardware prefetching is effective if the access stride is less than half of the trigger distance of hardware prefetcher. -- If the access stride is not constant, the automatic hardware prefetcher can mask memory latency if the strides of two successive cache misses are less than the trigger threshold distance (small-stride memory traffic). -- The automatic hardware prefetcher is most effective if the strides of two successive cache misses remain less than the trigger threshold distance and close to 64 bytes."
| forrestthewoods wrote:
| My benchmarks say you're wrong.
|
| https://www.forrestthewoods.com/blog/memory-bandwidth-napkin...
| dmitrygr wrote:
| You're joking right?
|
| "I made an extra test that wraps matrix4x4 in std::unique_ptr."
|
| And that is your test? One piece of code? Not even presenting disassembly? Who the hell knows what your compiler wrought in response to your "std::unique_ptr" and "matrix4x4"?
| forrestthewoods wrote:
| Well, I've provided 1 benchmark to your 0. I'd say I'm ahead.
|
| My code benchmark is actually far more prefetch-friendly than a LinkedList because it can prefetch more than one "node" at a time. In a LinkedList you can't prefetch N+1 and N+2 at the same time because N+2 is dependent on the result of N+1.
|
| I'm always open to learning new things. Modern compiler magic is deep and dark. If sometimes a compiler magically makes it fast and sometimes slow, that'd be quite interesting! If you have any concrete examples I'd love to read them.
| attractivechaos wrote:
| Sorry for my ignorance - a linked list matching the performance of a vector is something new to me. I would like to learn more. The best way to prove your point is to show us a benchmark. However, I couldn't find one with a quick google search.
| porcoda wrote:
| Google for Intel hardware prefetcher and you'll turn up some info.
|
| Re: benchmarks. Not likely exactly what you're looking for, but the GigaUpdates Per Second benchmark (https://en.wikipedia.org/wiki/Giga-updates_per_second) is sort of the most-pathological-case of pointer jumping, where the jumps are random. Prefetchers don't do well with that (for obvious reasons) - they tend to work best when there is some pattern to the accesses. Linked data structures often live somewhere in between - they may start with good, regular stride patterns, but then over time as elements are added and moved around, it turns into a tangle of spaghetti.
|
| Edit: I should add more. While prefetchers do exist in hardware, they're a bit of a murky subject. Sometimes they speed things up, sometimes they do so badly that they slow everything down. They can be very workload dependent. And to top it off, hardware vendors can sometimes be a bit opaque when explaining what their prefetching hardware actually does well on. It's probably safest to not assume your hardware will help you when it comes to tangled pointer-jumping code, and if you can avoid it via a different data structure it's probably good to do so. That said, other data structures may entail other tradeoffs - better cache usage for lookups, but potentially more expensive insertions. As usual with data structures, it's a game of balancing tradeoffs.
| insanitybit wrote:
| I don't buy this.
|
| https://www.youtube.com/watch?v=WDIkqP4JbkE
|
| Prefetching in CPUs predates this talk.
|
| https://www.youtube.com/watch?v=Nz9SiF0QVKY
|
| That talk is 2018.
|
| Cache prefetching works best for linear accesses like with a vector, not for random pointer lookups. Prefetchers are also going to have an even harder time with double indirection, since they're unable to see which value is going to be fed to the lea.
|
| Prefetching is also expensive in and of itself; the CPU doesn't do it unless you're already incurring cache hits. This makes cache misses _even worse_. There are also multiple prefetchers in the CPU, and _the L1 prefetcher_ is very basic and is only going to help with contiguous accesses. Your allocator might be doing some nice things for you that let one of the L2 prefetchers help; I would expect that you'll see better L2 performance due to prefetching.
|
| If you have an example of a linked list being as fast as a vector for iteration, please do show it.
| waynesonfire wrote:
| andrewmcwatters wrote:
| Would you like to explain to the class?
| [deleted]
| klysm wrote:
| Care to demonstrate why? I have the same experience
| waynesonfire wrote:
| the linked list has a wonderful capability that's frequently seen in C where you can do O(1) removal from a list. E.g. you can have an "object" detach itself. This is not possible in Java's standard linked list implementation, among other languages.
|
| in general, inserting and removing elements from a vector requires memory copying.
|
| There are programming languages that have linked lists as their fundamental data structure, like Lisp and Erlang.
|
| For situations where I don't need random access, linked lists are hard to beat.
|
| Linked lists also make wonderful immutable data structures.
|
| linked lists have wonderful filter and flatmap performance. how much memory copying would a vector require?
| klysm wrote:
| Yes, but in practice the coefficients on the O() almost always end up working in the dense array's favor because of memory locality.
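waynesonfire's O(1)-removal point - an "object" detaching itself - works when the element owns its own links. In a circular doubly linked ring with a sentinel (the shape the Linux kernel's `list_head` uses), detaching is a handful of pointer updates and needs no traversal. A Python sketch under that assumption; the `Item` class is my own illustration:

```python
class Item:
    """An element that participates in a doubly linked ring and can
    detach itself in O(1) - no traversal, no allocation."""
    def __init__(self, name):
        self.name = name
        self.prev = self.next = self   # a lone item is its own ring

    def insert_after(self, other):
        other.prev, other.next = self, self.next
        self.next.prev = other
        self.next = other

    def detach(self):
        self.prev.next = self.next
        self.next.prev = self.prev
        self.prev = self.next = self   # back to a ring of one

def names(head):
    out, cur = [], head.next
    while cur is not head:
        out.append(cur.name)
        cur = cur.next
    return out

# circular ring with a sentinel head, as kernels typically do
head = Item("head")
a, b, c = Item("a"), Item("b"), Item("c")
head.insert_after(c)
head.insert_after(b)
head.insert_after(a)
assert names(head) == ["a", "b", "c"]

b.detach()                 # the object removes itself, O(1)
assert names(head) == ["a", "c"]
```

The sentinel means there are no NULL edge cases: detaching the first, last, or only element is the same four pointer writes.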
| suremarc wrote:
| This is assuming you already have a pointer to the element you're removing, or a neighbor thereof. Otherwise, you'd have to traverse a portion of the linked list, which has a significant amount of per-element overhead.
| sakras wrote:
| The problem with your analysis is that you're using big-O notation. Big-O notation is based on an abstract machine that is pretty unrepresentative of real computers. In practice, the memcpy is always going to be faster (until you get to really big sizes). Put another way, a linked list is O(1) with a massive constant, and memcpy is O(n) with a tiny constant.
|
| And actually this logic extends to other algorithms too. My rule of thumb is that a linear search (despite being O(n)) will be faster up to around n=100 than binary search (which is O(log n)), just due to the way the CPU operates.
| zabzonk wrote:
| yes, he does. linked lists are, except in a few specific cases, the wrong solution
| [deleted]
| koheripbal wrote:
| This is not a substantive comment.
| dragontamer wrote:
| > regular resizable vector
|
| At gross cost in complexity / fragmentation to your memory allocator.
|
| Linked lists are definitely easier to use if you're ever in a position where you're writing your own malloc(). The infrastructure needed to cleanly resize arrays (and also: the pointers all going bad as you do so) has a lot of faults IMO.
|
| EDIT: In particular, linked list nodes have a fixed size, so their hand-written malloc() implementation is extremely simple.
| ay wrote:
| Doubling the vector size each time it needs to be enlarged takes care of fragmentation/memory overhead, in practice. Or preallocating if you know the rough size.
|
| It does take some discipline to avoid the storage of pointers, but once you get used to that it's quite fine.
|
| Source: I work on VPP, which uses vectors quite extensively.
| pfdietz wrote:
| Compacting garbage collectors put linked lists into nearby memory.
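dragontamer's point - fixed-size nodes make a hand-written allocator trivial - comes down to threading the free list through the pool itself, so allocation and free are O(1) with no searching and no external bookkeeping. A toy index-based sketch in Python (the `FixedPool` class is my own illustration; a C version would thread real pointers through the blocks):

```python
class FixedPool:
    """A toy fixed-size-block allocator: the free list lives inside
    the pool, so alloc and free are O(1) pointer (index) updates."""
    def __init__(self, capacity):
        self.slots = [None] * capacity
        # each free slot stores the index of the next free slot; -1 ends the list
        self.next_free = list(range(1, capacity)) + [-1]
        self.free_head = 0

    def alloc(self, value):
        if self.free_head == -1:
            raise MemoryError("pool exhausted")
        i = self.free_head
        self.free_head = self.next_free[i]   # unlink from the free list
        self.slots[i] = value
        return i

    def free(self, i):
        self.slots[i] = None
        self.next_free[i] = self.free_head   # relink at the head
        self.free_head = i

pool = FixedPool(3)
a = pool.alloc("x")
b = pool.alloc("y")
pool.free(a)
c = pool.alloc("z")      # reuses the slot that was just freed
assert c == a
```

This is also the two-list trick described further down the thread: a slot is either threaded on the free list or holds live data, and moving between the two states is a single unlink plus relink.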
| josephg wrote:
| Even if memory was packed by a GC, linked lists still have to
| pay the cost of allocating and garbage collecting every cell
| of memory individually. I doubt a GC would solve linked
| lists' woes.
| | Do you have some benchmarks demonstrating your claim?
| throwawaymaths wrote:
| Erlang GC is pretty good, plus there are some nice highly
| local allocators
| josefx wrote:
| Will they? I don't think they reorder live objects relative
| to each other, so any object allocated between two nodes
| would still end up between those nodes after the GC ran.
| klntsky wrote:
| Not to mention path sharing (also called structural sharing) for
| free.
| chrisseaton wrote:
| I've used linked lists where I wanted allocation of an
| almost-write-only log data structure to be very fast - so not
| resizing a buffer, and nobody cares about cache locality except
| for allocation. In a system with a TLA buffer and
| bump-allocation, this is absolutely ideal because allocation is
| local and very fast, and does not cause any of the existing log
| to be brought into cache.
| | "Nobody uses this data structure stuff in real programming at
| work! You can forget it after college!"
| [deleted]
| eatonphil wrote:
| Here's another example for the crowd: an "unbounded" fifo queue
| in a fixed memory environment.
| | https://github.com/tigerbeetledb/tigerbeetle/blob/main/src/f...
| bitwize wrote:
| Linked lists are one of those things, like textual protocols,
| that people reach for because they're easy and fun but, from an
| engineering standpoint, you should never, ever use them unless
| you can provide a STRONG justification for doing so. The
| locality of reference characteristics of vectors mean that,
| except for extreme cases, they beat linked lists almost every
| time on modern cached CPUs.
| ignoramous wrote:
| > _Linked lists are simple.
It is one of those rare data
| structures, together with binary trees and hash tables and a few
| more, that you can implement just from memory without likely
| stepping into big errors._
| | Well, if one likes linked lists enough, they can replace binary
| search trees / hash tables with a skip list. Super powerful and
| easier to implement than splay trees, red-black trees, AVL trees,
| augmented trees, what-have-you.
| ComputerGuru wrote:
| One of the only times linked lists are "undoubtedly the right
| option" is if you're writing some sort of intrusive data
| structure to avoid an allocation by reusing caller stack from
| another location. Not sure how many run into that often.
| docandrew wrote:
| When I need to avoid an allocation it's because I'm writing an
| allocator :) Overlaying the list nodes on top of a block of
| free memory avoids the need for a secondary data structure to
| keep track of free areas.
| | The other time allocations are undesirable, and therefore
| linked lists are widely-used, is bare metal work when other
| allocators just aren't available. You can do tricks like
| representing pools of free and in-use objects with two lists
| where the nodes occupy the same memory block. Allocations are a
| fast O(1) unlink from the free list and O(1) re-link to the
| used list, and frees are the opposite.
| photochemsyn wrote:
| For educational purposes (i.e. hooking easily into graphics
| packages for visualization) you can make a linked list in Python,
| using elements like:
| | class Node:
|     def __init__(self, dataval=None):
|         self.dataval = dataval
|         self.nodepointer = None
| | The only reason to do this is for education/visualization of data
| structures, although I'm wondering if this can be engineered to
| cause a Python memory leak, via circularization of the linked
| list, or if Python's GC would catch it. Also educational perhaps.
| | Where linked lists really seem to come into play is with custom
| manual memory allocators in embedded programming, something of a
| niche subject.
| lanstin wrote:
| I just used linked lists to make an LRU hard limit on the number
| of items in an in-memory store. Each usage, the pointers are
| updated to move the item to the head. Each time the list is at
| the limit, the tail is removed and freed. I may take advantage of
| the machinery to make it go onto a free list. This was Go and the
| DLL thing is my first use of generics.
| layer8 wrote:
| I think I first learned about linked lists in the context of
| programming for AmigaOS, which has a standardized form of
| (intrusive) linked lists that is used throughout the system. I
| remember being fascinated by the header node which serves as both
| the head and the tail of a list due to a "clever" field layout:
| https://wiki.amigaos.net/wiki/Exec_Lists_and_Queues
| thedracle wrote:
| There was a time when memory was conventionally expensive, memory
| allocation of large blocks was slow, and small blocks was much
| faster (malloc would find a small block faster than a large one
| on a highly fragmented heap).
| | Before having gigantic caches that would engulf nearly any sized
| contiguous list, linked lists were sort of vogue, frugal, and
| thought of as being fairly performant.
| stephc_int13 wrote:
| The good thing about linked lists is how easy they are to
| understand and use, almost trivial.
| | The bad thing is that their cost is not obvious and not
| consistent (difficult to predict).
| | I tend to prefer simple arrays.
| | This talk by Mike Acton is a good introduction to understand why
| going for the linked list as go-to data structure can lead to
| performance issues.
| | https://www.youtube.com/watch?v=rX0ItVEVjHc
| belter wrote:
| "Does anyone use LinkedList? I wrote it, and I never use it."
- | https://news.ycombinator.com/item?id=33418705
| [deleted]
| btilly wrote:
| In scripting languages it is a truism that anything you would
| want to do with a linked list is probably better done with an
| array. And given the overhead of working in the language versus
| things implemented in C, this is generally true.
| | However I have a fun use for linked lists where this is NOT true.
| | Anyone who has played around with algorithms has encountered
| dynamic programming. So you can, for example, find the count of
| subsets that sum to a particular number without actually
| enumerating them all. But what if instead of counting them, you
| wanted to actually find some?
| | The answer turns out to be that instead of using dynamic
| programming to get a count, you use dynamic programming to build
| up a data structure from which you can get the answer. And the
| right data structure to use turns out to be...a type of linked
| list! There is no faster array equivalent for what you get.
| | In other words, with a few extra fields for clarity, here is the
| basic structure for subset sum of positive integers:
| | {
|     current_sum: ...,
|     count_of_solutions: ...,
|     current_value: ...,
|     solutions_using_current_value: (link in one dim of linked list),
|     solutions_using_previous_values: (link in other dim of linked list),
| }
| | I leave figuring out the dynamic programming code to generate
| this as a fun exercise. Likewise how to extract, say, the 500th
| subset with this sum. Both are easy if you understand linked
| lists. If not...well...consider it education!
| Jtsummers wrote:
| Language wars, editor wars, and now data structure wars. What
| random technical thing will people go to virtual blows over next?
| keepquestioning wrote:
| https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
| TillE wrote:
| The Windows kernel also has linked lists as one of its few
| general-purpose data structures available to drivers.
| | Like any tool, it has its place.
As long as you're not traversing
| large linked lists, they're probably fine.
| dilyevsky wrote:
| Traversing a short list frequently is still going to have
| terrible cache performance compared to something like a vector.
| Just use vectors, folks
| slaymaker1907 wrote:
| Allocating a huge chunk of memory all at once when the array
| grows can also cause a bunch of problems. Linked lists are
| also much simpler in a multi-threaded context.
| [deleted]
| adeptima wrote:
| When people ask my opinion of Rust, I love to share with them
| the LinkedList implementation link:
| https://doc.rust-lang.org/src/alloc/collections/linked_list....
| | And the first feedback is: why so many unsafe blocks? What is
| Option<NonNull<Node<T>>>? What is '_?
| | Another reason to share: if you can understand LinkedList, you
| are free to code in Rust ;)
| dekhn wrote:
| I don't know Rust but I would guess that an
| Option<NonNull<Node<T>>> would be a Node of type T, that cannot
| be set to null, but can be optional. This is a type I would
| want to return from a function that could either return
| nothing, or a pointer to a real thing.
| | As for the unsafety, I would assume the authors of collections
| know what they are doing and do that for performance.
| proto_lambda wrote:
| `NonNull` is actually a wrapper around the pointer type, with
| the added invariant that the pointer cannot be null. The
| compiler is made aware of this, which allows
| `Option<NonNull<T>>` to be just a plain pointer, where the
| `None` variant of the option corresponds to a null pointer
| and the `Some(ptr)` case corresponds to a non-null pointer.
| josephg wrote:
| I really wish rust had better syntax for this. Raw pointers
| should probably all be Option<NonNull<T>> rather than *mut T.
| | The latter is easier to type but worse in almost every way.
| Ergonomics should guide you to the former, not the latter.
| kibwen wrote:
| It's an interesting suggestion to impose Option semantics
| on raw pointers by default (presumably with some
| alternative way to receive a nullable pointer, for
| completeness). But in the meantime, in your own codebase,
| it's easy enough to do `type Ptr<T> = Option<NonNull<T>>`.
| andrewflnr wrote:
| I think if you tried to do that, you'd basically be
| mixing type-driven optimizations with the C FFI, which
| sounds sketchy, to me at least. The null-optimization for
| Option is just that, an optimization, and I don't like
| taking it for granted where safety/security is at stake.
| nemothekid wrote:
| > _When people ask my opinion of Rust, I love to share with
| them the LinkedList implementation link:_
| | This LinkedList obsession is a bit bizarre to me, and tends to
| come from older programmers who come from a time when coding
| interviews involved writing linked lists and balancing b-trees.
| To me though it also represents the stubbornness of C
| programmers who refuse to consider things like growable vectors
| a solved problem. My reaction to the LinkedList coders is not
| "well Rust needs to maintain ownership", it's _why does your
| benchmark for how easy a language is involve how easy it is to
| fuck around with raw pointers?_
| | LinkedLists are a tool, but to C programmers they are an
| invaluable fundamental building block that shows up early in
| any C programmer's education due to how simple they are to
| implement and the wide range of use cases they can be used for.
| But they are technically an unsafe data structure, and if you're
| willing to let some of that stubbornness go and finally accept
| some guard rails, you have to be able to see that a data
| structure like linked lists will be harder to implement.
| | It has nothing to do with the language; implementing
| LinkedLists with any sort of guardrails adds a ton of
| complexity, either up front (e.g. borrow checker) or behind the
| scenes (e.g. a garbage collector).
When you accept this fact, | it becomes ludicrous to imply that a LinkedList implementation | is a good benchmark for the ergonomics of a language like Rust. | chlorion wrote: | Considering that unsafe rust basically just allows raw pointer | access (see link below) which is already a thing you can do | normally in C, I do not see how that's a very good argument | against it honestly. | | As for the Option type, it is exactly what it says. It's an | optional, non-nullable node generic over type T. I suppose | generics, optionals and the idea of non-nullable types might be | exotic at first, but this is not a problem with Rust as much as | it's a problem with C not being able to express these concepts | in the first place and instead expecting you to spray null | checks all over the place! | | https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html | andrewflnr wrote: | In the Rust ecosystem philosophy, linked lists are the kind of | thing you want to write once, very carefully, then re-use a | lot. That's exactly the kind of code where unsafe can be | reasonable. It's not like it would actually be safer in any | other language that didn't have an explicit unsafe keyword. | More aesthetically pleasing, perhaps, but not safer. | | You're acting like it's some epic burn on Rust that there's | some ugly code in its stdlib. It's not. Stdlibs are like that. | Furthermore, as others have pointed out, linked lists in | particular are a pathological case for Rust, meaning you're | pointing and snickering at a special case of a special case. | Most people never have to write or look at that kind of code. | svnpenn wrote: | this comes off as extremely defensive. I don't feel like the | person you're responding to, intended any kind of "sick | burn". Calm down dude. | SighMagi wrote: | There's so much code marked "unsafe" in there... 
| estebank wrote:
| That's because in order to interact with pointers you need
| unsafe, and not using unsafe in Rust requires a single
| ownership tree _or_ reference counting. If you're not keen
| on either of those solutions, you shouldn't be writing a
| Doubly Linked List in other languages either (because they
| will have either pointers or reference counting/GC).
| andrewflnr wrote:
| "Those solutions" start off being hierarchical ownership or
| reference counting, but turn into "pointers or reference
| counting". Requiring pointers is implicit to the definition
| of linked lists, and calling it out here is silly. And you
| don't really need either ref counting (again, pretty
| obviously, witness every real-world implementation) or
| hierarchical ownership outside Rust.
| | Conceptually, the list as a vague concept owns all the
| nodes. The ownership just stops being expressible directly
| through references the way Rust wants it to be. But you
| don't have to re-invent hierarchical ownership to implement
| DLLs in other languages, because it's not really there in
| the problem. You just have to accept that they're always
| going to suck under Rust's ownership model.
| tracker1 wrote:
| There be dragons...
| hnov wrote:
| Isn't a doubly linked list basically incompatible with Rust's
| idea of memory ownership?
| proto_lambda wrote:
| It's definitely not a data structure that makes the borrow
| checker happy. Luckily it's also not a data structure that's
| required or desirable in 99% of cases, but the standard
| library still offers an implementation for those 1% so people
| don't try to write their own (turns out getting a doubly
| linked list correct isn't quite as trivial as it may seem).
| Waterluvian wrote:
| I'm actually a bit surprised it's in the standard library
| if it's so uncommonly needed. Isn't Rust's stdlib tiny with
| a philosophy of letting the community fill in most things?
| estebank wrote:
| There were arguments made for its non-inclusion[1].
| | [1]: https://rust-unofficial.github.io/too-many-lists/sixth.html
| Waterluvian wrote:
| I enjoyed reading this. A fun style.
| zozbot234 wrote:
| AIUI, GhostCell and related patterns can be used to implement
| doubly linked lists in safe Rust. These patterns are quite
| obscure and hard to use in practice, though.
| tmtvl wrote:
| From what I remember Rust has some kind of RefCell with weak
| references. Those could be used to make DLLs, if anyone can
| find a reason to use those. That said, you could also use
| zippers...
| slaymaker1907 wrote:
| Now imagine writing the drop implementation for a DLL that
| doesn't cause a stack overflow.
| steveklabnik wrote:
| RefCell<T> and weak references are two different things,
| but Rust has both, yes.
| brundolf wrote:
| It's at odds with it, but so are many other data structures
| found in the standard library. The solution is usually to use
| unsafe { } inside the data structure to allow for those
| criss-crossing pointers (while exposing safe, checker-
| friendly interfaces to the outside world)
| | This is a great resource that outlines several different
| strategies for implementing linked lists in Rust:
| https://rust-unofficial.github.io/too-many-lists/
| kibwen wrote:
| When explaining why the difficulty of modeling linked lists in
| Rust doesn't matter, I like to share the following book, "Learn
| Rust With Entirely Too Many Linked Lists", written by the woman
| who wrote the implementation you've linked above:
| https://rust-unofficial.github.io/too-many-lists/
| keepquestioning wrote:
| I read this and noped out of learning Rust.
| AtNightWeCode wrote:
| People must stop giving attention to old garbage solutions that
| should not be used. Never use a linked list if you don't
| specifically need a linked list. You probably don't know why you
| would need one, so don't go there.
| dragontamer wrote:
| The most interesting thing about linked lists to me is how
| trivially they can be made lock-free and multithreaded with
| atomics.
| | An array/stack can be made multithreaded but it is non-trivial to
| handle the edge-cases. In particular, the ABA problem is rather
| difficult. I've seen many solutions to it (ex: a synchronization
| variable that puts the stack into "push-only" mode and then
| "pop-only" mode. There's no ABA problem if all threads are
| pushing!)
| | However, pushing/popping from a Linked List stack requires no
| such synchronization at all. Simply compare-and-swap the head
| (and on failure, try again). It's about as simple as you can get
| when it comes to atomic / lock-free patterns.
| [deleted]
| yeputons wrote:
| > Simply compare-and-swap the head (and on failure, try again).
| | It doesn't help with ABA at all, does it? Unless you assume
| that nodes are immutable and the same pointer is never reused,
| of course.
| dragontamer wrote:
| I forgot about that other ABA issue.
| | Still, that's easily solved: 64-bit version number + 64-bit
| pointer (or 32-bit version number + 32-bit pointer) for a
| 128-bit (or 64-bit) compare-and-swap.
| | All modern CPUs support 128-bit CAS.
| | EDIT: The version number is incremented by +1 each time. It
| is unlikely that you'd overflow a 64-bit version number and
| risk an ABA problem, though 32 bits can be overflowed
| surprisingly quickly in today's computers.
| | --------
| | EDIT2: Note: I'm pretty sure (but not 100% sure) that the
| Head of a linked-list stack can be "simply" compared-and-
| swapped to remain lock-free and 100% valid (ie: 64-bit
| compare and swap over the 64-bit pointer). I did not mean to
| imply that you can "CAS any arbitrary node of a linked-list".
| A fully generic linked list is possible and I've seen it with
| the 128-bit CAS operator, but that wasn't what I was going
| for originally.
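The structure dragontamer describes is a Treiber-style stack: push is a single CAS retry loop on the head. A C11 sketch with illustrative names; note that the pop shown here is the naive version, which is exactly what yeputons' ABA objection applies to once nodes can be freed and reused:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Treiber-style lock-free stack. Push is safe as written; the naive
 * pop below is only sound if popped nodes are never recycled (real
 * implementations add a version counter, hazard pointers, etc.). */
struct lf_node {
    struct lf_node *next;
    int value;
};

struct lf_stack {
    _Atomic(struct lf_node *) head;
};

static void lf_push(struct lf_stack *s, struct lf_node *n) {
    struct lf_node *old = atomic_load(&s->head);
    do {
        n->next = old;   /* link to the head we observed */
    } while (!atomic_compare_exchange_weak(&s->head, &old, n));
    /* on CAS failure, 'old' is refreshed to the current head and we retry */
}

/* Naive pop: subject to ABA if 'old' is freed and its address reused. */
static struct lf_node *lf_pop(struct lf_stack *s) {
    struct lf_node *old = atomic_load(&s->head);
    while (old && !atomic_compare_exchange_weak(&s->head, &old, old->next))
        ;
    return old;
}

/* Single-threaded demo: push 1 then 2, pop returns 2 (LIFO). */
static int lf_demo(void) {
    struct lf_stack s = { NULL };
    struct lf_node a = { NULL, 1 }, b = { NULL, 2 };
    lf_push(&s, &a);
    lf_push(&s, &b);
    struct lf_node *top = lf_pop(&s);
    return top ? top->value : -1;
}
```

The failure mode murderfs spells out below is visible in `lf_pop`: between loading `old` and the CAS, another thread may pop `old`, push a different node, then push `old`'s address again, so the CAS succeeds while `old->next` is stale.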
| murderfs wrote:
| > EDIT2: Note: I'm pretty sure (but not 100% sure) that the
| Head of a linked-list stack can be "simply" compared-and-
| swapped to remain lock-free and 100% valid (ie: 64-bit
| compare and swap over the 64-bit pointer).
| | No, you cannot. The problem is what you're comparing and
| swapping into the head during a pop. You want to do the
| moral equivalent of `current = list;
| list.compare_exchange(current, current->next)`, but
| current->next might have changed if someone else popped the
| original head, pushed something else, and then pushed the
| original head again.
| | You need double CAS or LL/SC or a more complicated scheme
| to make this work.
| dragontamer wrote:
| Fair catch.
| | That's obscure but it looks like you're correct in this
| case.
| murderfs wrote:
| It might not be immediately obvious that this can happen,
| but in practice, this scenario happens very frequently,
| because if you free a node and then malloc a node, you're
| very likely to get the same address back with most memory
| allocators.
| [deleted]
| ayhanfuat wrote:
| For completeness, here's what Bjarne Stroustrup says on why you
| should avoid using linked lists: https://youtu.be/YQs6IC-vgmo
| elteto wrote:
| I much prefer deques to linked lists when I need O(1) push/pop
| but still want to keep some semblance of cache locality. Although
| you can't splice two deques together as easily as two linked
| lists.
| | Sadly, you can't really control the block size in C++'s
| std::deque so you can't communicate useful information to the
| library about your use case. I think MSVC allocates such a small
| block that it is effectively a linked list.
| woojoo666 wrote:
| I feel like to compare linked lists and deques, you would
| have to give a specific implementation of a deque
| Jtsummers wrote:
| Correct, deques are abstract data types and you can implement
| them multiple ways, including using linked lists or growable
| arrays/vectors.
So they aren't directly comparable to linked
| lists since under the hood they can be implemented with
| linked lists (or doubly linked lists at least). You could
| compare the performance of different deque implementations,
| though.
| intrepidhero wrote:
| How does a deque improve the situation with cache locality
| versus a linked list?
| klyrs wrote:
| It depends on the implementation. Python's deque, for
| example, is implemented as a two-level data structure; a
| circularly linked list of blocks. If you've got page-sized
| blocks, then your deque can be sufficiently local for most
| cases. It introduces a little extra logic, and you'll want to
| keep around a spare block to prevent thrashing memory if you
| repeatedly push/pop at the block boundary, but it's often
| worth the pain if the data structure is anywhere near a hot
| loop.
| zwkrt wrote:
| Mostly, deques are implemented under the hood with arrays. A
| linked list usually requires a heap allocation for each
| element.
| | This is speaking in generalities though, since you could have
| an alternatively implemented deque or a linked list
| entirely on the stack.
| blibble wrote:
| it's typically implemented as an array (e.g. circular buffer)
| layer8 wrote:
| Note that a deque in general is an ADT that isn't necessarily
| implemented in the way std::deque commonly is. A deque can be
| implemented as a linked list.
| koinedad wrote:
| They are really cool. Deleting an element by assigning next to
| next.next still blows my mind.
| Jtsummers wrote:
| If you think that's neat, check out Dancing Links used for
| backtracking algorithms:
| https://en.wikipedia.org/wiki/Dancing_Links
___________________________________________________________________
(page generated 2022-11-04 23:00 UTC)