[HN Gopher] How Python List Works ___________________________________________________________________ How Python List Works Author : tosh Score : 61 points Date : 2021-11-14 13:27 UTC (9 hours ago) (HTM) web link (antonz.org) (TXT) w3m dump (antonz.org) | kgm wrote: | One thing to note, in the gritty details of Python's actual | implementation of the list type, is that there's a little more to | say about how list resizing works than is immediately obvious. | The relevant source is here: | https://github.com/python/cpython/blob/main/Objects/listobje... | | As the article points out, the size progression, as detailed in | the comment in that source, is relatively modest. Your "classic" | dynamic array will simply double the size as needed. This | progression does less than that: 16, 24, 32, 40, and so on. | _But:_ The underlying implementation is using the standard C | library function realloc(). This will perform some amount of in- | place resizing on its own, and will lend a lot to the performance | of a Python list, albeit in a way that might be difficult to nail | down, since it depends on the overall usage of the C library 's | heap. | Zababa wrote: | Related to that, do heterogenous lists make sense as a default? I | can see the argument for dictionaries easily, you might want to | put a lot of different stuff inside one. However, for lists, I | fail to see the advantages of heterogenous lists as a default. | Especially since they are going to be way slower, and nobody seem | to use arrays in Python. | kzrdude wrote: | If you use the static typing eyes, the Python list is | _homogenous_ : every element is of the type reference to python | object. There's no funny business with types, just one | reference after another. | Zababa wrote: | That's a fair point, but if static typing was like that it | would be useless. "Object" is a type, but it's pretty useless | most of the time. | pdonis wrote: | _> do heterogenous lists make sense as a default?_ | | Yes, because then the list object doesn't have to do any extra | work to check the types of objects. If your application needs | that extra checking, you can add it in your application's code; | but there's no reason to burden the basic list object with | extra code that not everyone needs. | | _> Especially since they are going to be way slower_ | | Way slower than what? They're slower than Python arrays, of | course, but Python arrays can only store numeric values (since | the amount of space each value takes up is constant and known | when the array is created). | | For a container that needs to be able to store objects of any | type (including instances of user-defined classes), a | heterogeneous list is _faster_ than a homogeneous one, because, | as above, it doesn 't have to do extra work to check the types | of objects. All the other extra work is there because the size | in memory of the objects isn't known in advance; and that will | be true even if all of the objects are of the same type, since | instances still won't necessarily all have the same in-memory | size (for example, consider instances of user-defined classes | with different attributes). | Zababa wrote: | > For a container that needs to be able to store objects of | any type (including instances of user-defined classes), a | heterogeneous list is faster than a homogeneous one, because, | as above, it doesn't have to do extra work to check the types | of objects. | | My question is about that. Is storing objects of any type a | common use case, compared to storing objects of one type? | sseagull wrote: | I have found, at least in my own use of python, lists are | for "homogeneous" collections of indeterminate length. But | these are not strictly homogeneous - there may be None | entries, for example. So I do use the heterogeneous | capabilities. | | For heterogeneous collections with a fixed length, I tend | to use a tuple. That makes more sense to me, conceptually | (coming from C++). | pdonis wrote: | _> Is storing objects of any type a common use case, | compared to storing objects of one type?_ | | I'm not sure it's possible to know a general answer to | that, since Python is used in such a wide variety of | contexts. | | In my own programming, I use lists that can contain objects | of different types often enough for that use case to be | significant for me. | | One other thing to keep in mind is that Python has duck | typing, so, for example, if you are iterating over a list | and retrieving the same attribute or calling the same | method on each object, that doesn't require each object to | be of the same type, just to have an attribute with the | required name, or a method with the required name and | calling signature. | duckerude wrote: | Python's semantics make homogeneity less useful and hard to | enforce. | | Everything's passed by reference, so the list has to contain | pointers even if you know all its elements have the same type. | And some objects can change their type by assigning to | __class__, which can happen from a distance (because everything | is by reference). | | A careful implementation can solve this for some builtin types. | PyPy does it for ints: if a list only contains ints it's | represented as a proper numeric array. But then as soon as you | append some other type the whole list is converted to a more | expensive more traditional representation. | | PyPy's ints still have to pretend to be heap-allocated objects | at all times. In particular they have to do something tricky | with the id() function, which on CPython just returns an | object's memory address. The return value has to be unique and | consistent for as long as an object lives. | nayuki wrote: | How a Python list really works? See the underlying C | implementation: | https://github.com/python/cpython/blob/main/Objects/listobje... ; | https://github.com/python/cpython/blob/main/Include/listobje... | | For comparison, java.util.ArrayList is implemented in Java | itself, and is much shorter: | http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/... | azinman2 wrote: | Ya I was reading this thinking none of this is telling me how | CPython implements it. It's just a poor-man's version. | matthias509 wrote: | One point which I think bears mentioning with respect to | performance is that even with O(1) access, the performance in the | real world of a list implemented by a list of pointers vs a | contiguous block of memory will be very different. This is | because programs often iterate over lists and a list implemented | by a contiguous block of ram will have MUCH better cache | locality. | | AFAIK, this unfortunately wouldn't be possible to implement in | Python though because I don't think there would be any way to | know how much space to allocate for each item in the list. | seiferteric wrote: | Does it have to be one or the other? You could allocate blocks | of memory for the list then when it will be filled up you | allocate more and add a pointer to it. | dmurray wrote: | This breaks O(1) access by index, because the objects in the | list may have different sizes. | Znafon wrote: | If you have numeric value you can use | https://docs.python.org/3/library/array.html | kzrdude wrote: | But if you're in the real world you use numpy.ndarray instead | (this underpins half of the Python ecosystem, more or less). | killingtime74 wrote: | For more efficient "lists" there are Numpy arrays | https://numpy.org/doc/stable/reference/generated/numpy.array... | which do have cache locality | wodenokoto wrote: | Python does come with built-in arrays, but I've never seen | them used, or heard anyone recommend them for anything. | | https://docs.python.org/3/library/array.html | cozzyd wrote: | These are useful when you need a pointer to an integer for | a C or C++ API. | edgyquant wrote: | I had never heard of them and when I try to Google Python | array v set speed it just returns list v set. I did get the | official documentation but it doesn't go into speed. | uranusjr wrote: | They are recommended when you want speed, but most people | writing Python probably never need that much speed (or | would probably want numpy anyway), and the few who do | likely have more private sources to recommend it for | specific occasions. It should probably be used for more | things (instead of reaching for numpy at first chance) but | is still a niche tool. | abecedarius wrote: | The times I've used array, it was a common data structure | for communicating with a C extension. You could use a | Python list, but that's not just slower but more of a | pain to deal with on the C side. (This was before numpy | was everywhere; I don't know how much fun that might be | to work with from C.) | | If you're going pure Python then space is a better reason | than speed. I'm not sure an array is even as fast as a | list. | uranusjr wrote: | I think I use struct much more when interfacing with C, | but I can see array being useful for arrays specifically | when you expect more than a couple of elements. Both are | still quite niche though. | kgm wrote: | The standard array module was more relevant in the distant | past, before version 2.6 added the bytearray type. | Otherwise, numpy arrays are considerably more useful in | virtually every circumstance. | mark-r wrote: | In CPython at least, there's no other way. Everything is an | object, even simple types like integers. And references to | objects are implemented with pointers. Even a Numpy array which | stores a contiguous array of simple values requires conversion | to an object when you actually try to _use_ those values. | arthurcolle wrote: | I always wondered why there weren't more 'hyper-collection' | abstractions that just implement multiple collection types in one | interface, so that you can get the respective speedups of the | different collection types while still having the flexibility to | use them as you wish | karmakaze wrote: | This is the field of datastructures. There isn't 'just' one be- | all to end-all. There are so many kinds of hashing/tables, | trees/balancing, hybrid (e.g. skiplists), priority queues, etc. | Because there are so many possible combinations, if the choices | matter it's useful to use a language that lets you make your | own. This is one of the main reasons why generics for Go is a | big deal. Go always had generics, but it's only been for the | built-ins array/slice, map, and channel. With user-defined | generics any datastructures/interfaces can be conveniently made | and used without language/library/compiler support. | | Its curious that for Python this is blog post material, but for | Go it's intrinsic to understanding the performance and usage of | slices or maps, or in Java (and other langs) would be | referenced merely by the class name ArrayList. The fact that so | much can be done effectively in Python without always having to | think about such details is an achievement of the language and | ecosystem of libraries and packages. | ketralnis wrote: | C arrays are basically that. Also python dicts (most python | types including all user-defined classes are actually dicts | underneath), Lua tables, Javascript objects, and more. In most | dynamically typed languages, "the" type is actually what you're | calling a hyper-collection and not just conceptually, but also | in implementation | beervirus wrote: | How would that work? Every time you create an object, | internally it creates one of everything? ___________________________________________________________________ (page generated 2021-11-14 23:01 UTC)