hngopher.com

       [HN Gopher] The PEPs of Python 3.9
       ___________________________________________________________________
        
       The PEPs of Python 3.9
        
       Author : zdw
       Score  : 172 points
       Date   : 2020-05-28 15:02 UTC (7 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | nemetroid wrote:
       | > Another kind of clean up comes in PEP 585 ("Type Hinting
       | Generics In Standard Collections"). It will allow the removal of
       | a parallel set of type aliases maintained in the typing module in
       | order to support generic types. For example, the typing.List type
       | will no longer be needed to support annotations like "dict[str,
       | list[int]]" (i.e., a dictionary with string keys and values that
       | are lists of integers).
       | 
       | I think this will go a long way toward making type annotations
       | feel less like a tacked-on feature.
        
         | diarrhea wrote:
         | Looking "back" now, it never occurred to me that importing List
         | when there is list is particularly strange. Now it sticks out
         | sorely. Very glad this change is happening.
        
         | mixmastamyk wrote:
         | There have been so many improvements in the typing area, I wish
         | they'd waited a while to get it right.
        
       | gorgoiler wrote:
       | I was quite surprised -- only a few days ago in fact -- to
       | discover the standard Python library has no support for Olson (as
       | in _tzdata_) timezones. Time arithmetic is impossible without
       | them.
       | 
       | The ipaddress library also has no support for calculating
       | subnets. It is quite hard to go from 2a00:aaaa:bbbb::/48 to
       | 2a00:aaaa:bbbb:cccc::/64. It would be less weird if the essence
       | of the documentation didn't make it sound like the library was
       | otherwise very thorough in the coverage of its implementation.
       | 
       | Can anyone write a PEP? Maybe I should get off my behind and
       | actually submit a patch for proper IP calculations? Or maybe I
       | missed it in the documentation (which, aside, I wish wasn't
       | written with such GNU-info style formality.)
        
         | chc wrote:
         | Unless I misunderstand what you're looking for, I think that
         | functionality is in there.
         | 
         |     from ipaddress import ip_network
         | 
         |     original_net_48 = ip_network("2a00:aaaa:bbbb::/48")
         |     desired_subnet = ip_network('2a00:aaaa:bbbb:cccc::/64')
         |     # subnets(16) yields every /64 inside the /48
         |     subnets_64 = original_net_48.subnets(16)
         |     print(f"{desired_subnet} is one of the computed subnets: "
         |           f"{desired_subnet in subnets_64}")
         |     #=> 2a00:aaaa:bbbb:cccc::/64 is one of the computed subnets: True
        
           | gorgoiler wrote:
           | Thanks, but your second line kind of has the answer in it
           | already. It's more like...
           | 
           |     site = ip_network('2a00:aaaa:bbbb::/48')
           |     subnet = f(site, 64, 0xcccc)
           | 
           | ...and I don't think _f()_ is in the standard library. But
           | maybe I just index the calculated subnets, from your example?
           | I'll give it a go!
           | 
           | Edit: yes! But it's a bit slow...
           | 
           | https://repl.it/repls/PoshPapayawhipNaturaldocs#main.py
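
A minimal sketch of the f() asked for above, assuming the goal is to compute the indexed subnet arithmetically instead of iterating over every /64. The name nth_subnet and its parameters are hypothetical; only ip_network and the network attributes used here come from the standard ipaddress module.

```python
from ipaddress import ip_network


def nth_subnet(site, new_prefix, index):
    """Return subnet number `index` of `site` at prefix length `new_prefix`."""
    extra_bits = new_prefix - site.prefixlen
    if extra_bits < 0 or new_prefix > site.max_prefixlen:
        raise ValueError("new_prefix must lie between the site prefix and the address width")
    if not 0 <= index < (1 << extra_bits):
        raise ValueError("index does not fit in the extra prefix bits")
    # Shift the index into the bits just below the site's own prefix.
    host_bits = site.max_prefixlen - new_prefix
    base = int(site.network_address) + (index << host_bits)
    # Build the same network class (IPv4 or IPv6) from an (int, prefix) tuple.
    return type(site)((base, new_prefix))


site = ip_network("2a00:aaaa:bbbb::/48")
subnet = nth_subnet(site, 64, 0xCCCC)
print(subnet)  # 2a00:aaaa:bbbb:cccc::/64
```

This avoids walking 0xcccc generator items, which is likely why indexing the computed subnets felt slow.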
        
         | sigjuice wrote:
         | The ipaddress library doesn't do link-local IPv6 addresses
         | either.
        
       | heavyset_go wrote:
       | Am I the only one who wants multi-lined anonymous functions in
       | Python? I find myself really wanting to reach for arrow functions
       | sometimes while writing Python, and end up disappointed that they
       | aren't available.
        
         | dragonwriter wrote:
         | > Am I the only one who wants multi-lined anonymous functions
         | in Python?
         | 
         | Lots of people want them (lots of people don't, too), but no
         | one has come up with a great syntax that plays nice with the
         | rest of Python and saves you much over named functions.
        
         | fabatka wrote:
         | I think the basic idea is "if you need multiple lines, you
         | should declare a proper function", so I wouldn't hold my breath
         | waiting for multiline anonymous functions in Python.
        
         | trashburger wrote:
         | I believe Guido was against it, as he is mostly opposed to the
         | functional style. As a matter of fact, he was opposed to lambdas
         | but begrudgingly added them after many requests.
        
         | quietbritishjim wrote:
         | What's wrong with using nested named functions instead?
         | 
         | You may already be aware, but not everyone is: they capture
         | variables from outer scopes in exactly the same way that
         | lambdas do.
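
To illustrate the point about nested named functions, here is a small sketch showing a multi-line inner function that captures (and updates) a variable from the enclosing scope, next to a lambda capturing the same way. All names are made up for the example.

```python
def make_accumulator(start):
    total = start

    def add(amount):
        # Multi-line body: validate, update the captured state, return it.
        nonlocal total
        if amount < 0:
            raise ValueError("amount must be non-negative")
        total += amount
        return total

    return add


def make_scalers(factor):
    # A named function and a lambda both close over `factor` identically.
    def named(x):
        return x * factor

    return named, (lambda x: x * factor)


acc = make_accumulator(10)
acc(5)
print(acc(7))  # 22

named, anonymous = make_scalers(3)
print(named(4), anonymous(4))  # 12 12
```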
        
       | tln wrote:
       | Sometimes I wish that python strings weren't directly iterable...
       | 
       | this article sums it up better than I ever could
       | https://www.xanthir.com/b4wJ1
       | 
       | ...then str.strip and variants could be cleanly and logically
       | extended to allow this functionality, because passing a string
       | and a sequence of strings would be distinguishable.
       | 
       | Alas, clean and logical function design can be hard to do late in
       | a language's life.
       | 
       | PEP 593 and PEP 585 are clean and logical... glad to see that :)
        
         | joshuamorton wrote:
         | On non-iterable strings: The recursive type problem can be
         | solved, with something like what is proposed in [0]. (I have an
         | implementation of a fix on github linked from that thread,
         | there's edge case fixes and PEP's scare me, but technically
         | it's feasible).
         | 
         | [0]: https://mail.python.org/archives/list/typing-sig@python.org/...
        
           | spott wrote:
           | While that may solve the recursive type problem, it doesn't
           | really solve the "iterating over strings is rarely what you
           | actually want to do" problem.
        
         | jorams wrote:
         | I agree with most of the article you link, but there's one
         | thing I don't understand: The article quickly dismisses the
         | obvious fix for recursive iterability, to make strings be
         | composed of "characters":
         | 
         | > And an obvious "fix" for this is worse than the original
         | problem: Common Lisp says that strings are composed of
         | characters, a totally different type, which doesn't implement
         | the same methods and has to be handled specially. It's really
         | annoying.
         | 
         | It seems to me this contradicts most of what the article says.
         | Sure, strings are rarely collections, so they should not be
         | iterable by default. But the final solution offered admits that
         | sometimes they are, and then you want to be able to iterate
         | over _something_. For most instances of _something_, it does
         | not make sense for the individual "elements" to be strings.
         | Bytes are clearly not strings, code points are clearly not
         | strings, grapheme clusters are clearly not strings. Each of
         | those will provide very different methods, because they are
         | very different things. Only after that point (words, sentences,
         | etc.) does the idea of the element being the same type start
         | making sense again.
         | 
         | Clearly the concept of a "character" is too ambiguous, and
         | there is no clear "default" for what it should mean, but the
         | idea of a string consisting of some kind of element that is not
         | string appears obviously correct.
        
         | BiteCode_dev wrote:
         | You can easily distinguish them:
         | 
         |     if isinstance(msg, str)
         | 
         | So I don't think that's a good argument for not accepting
         | iterables of strings in str methods. Things like replace()
         | would benefit a lot and it's not that hard to do, you can even
         | accept regexes optionally:
         | https://wonderful-wrappers.readthedocs.io/en/latest/string_w...
         | 
         | I agree that iterating on string is not proper design however.
         | It's not very useful in practice, and the O(1) access has other
         | performance consequences for more important things.
         | 
         | Swift did it right IMO, but it's a much younger language.
         | 
         | I also wish we stole the file api concepts from swift, and that
         | open() would return a file like object that always gives you
         | bytes. No "b" mode. If you want text, you do open().as_text(),
         | and get a decoding wrapper.
         | 
         | The idea that there are text files and binary files has been
         | toxic for a whole generation of coders.
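
As a rough sketch of the isinstance check described above: a helper that accepts either a single string or an iterable of strings. The function replace_any is hypothetical (it is not a str method and not the linked library's API), and it applies the replacements one after another.

```python
def replace_any(text, targets, replacement):
    """Replace every occurrence of each target; `targets` is a str or an iterable of str."""
    # A lone string is itself iterable, so it has to be special-cased
    # before it can be treated as a collection of targets.
    if isinstance(targets, str):
        targets = [targets]
    for target in targets:
        text = text.replace(target, replacement)
    return text


print(replace_any("a-b_c", "-", " "))         # a b_c
print(replace_any("a-b_c", ["-", "_"], " "))  # a b c
```

Without the isinstance branch, passing "-_" would silently be treated as the two targets "-" and "_", which is the ambiguity the parent comment complains about.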
        
           | Skunkleton wrote:
           | > The idea that there are text files and binary files has
           | been toxic for a whole generation of coders.
           | 
           | It really is nonsense, isn't it? It's like asking a low level
           | api for opening files as a .doc, or as a pdf. Why would that
           | be part of the file io layer?
        
             | aidos wrote:
             | Well, I guess it's easy to argue that it's so common that
             | beginners would expect to open files as text. You can see
             | how it would evolve that way.
             | 
             | Now I'm more familiar with it I'm careful to be explicit
             | with the decoding when using text to make it super obvious
             | what's going on.
        
               | signal11 wrote:
               | I suspect it's also to do with Python's history as a
               | scripting language. Because of Perl's obvious strengths
               | in this area, any scripting language pretty much _has_ to
               | make it very easy to work with text files. Ruby does
               | something similar for instance.
               | 
               | Even languages like Java now recognise the need to
               | provide convenient access to text files as part of the
               | standard API, with Files.readAllLines() in 7,
               | Files.lines() in 8, and Files.readString() in 11.
        
               | heavenlyblue wrote:
               | My first mistake as a beginner was dumping a bunch of
               | binary data as text. Something would happen along the way
               | and not all of the data would be written, because I was
               | writing it in text mode.
               | 
               | It just never occurred to me that the default mode of
               | writing the file would _not_ write the array I was
               | passing it.
               | 
               | It's much more important for beginners to be able to
               | learn clear recipes rather than having double standards
               | with a bunch of edge cases.
        
               | aidos wrote:
               | I've done worse. Using MySQL from php and not having the
               | encoding right somewhere along the way so all my content
               | was being mojibaked on the way in and un-mojibaked on the
               | way out so I didn't notice it until deep into a project
               | when I needed to extract it to another system.
               | 
               | EDIT thanks, I knew that didn't look quite right.
               | "Mojibaked" - such a great term.
        
               | mark-r wrote:
               | The term is actually "Mojibake", not "emoji baked".
               | https://en.wikipedia.org/wiki/Mojibake#Etymology
        
           | diarrhea wrote:
           | The issue is that
           | 
           |     if isinstance(msg, str)
           | 
           | will clutter code that is otherwise clean. A single type has
           | to be specially handled, which sticks out like a sore thumb.
           | 
           | As a second point, do you have more on your last sentence?
           | ("The idea that there are text files and binary files has
           | been toxic for a whole generation of coders."). I have been
           | _thoroughly_ confused about text vs. bytes when learning
           | Python/programming.
           | 
           | The two types are treated as siblings, when text files are
           | really a child of binary files. Binary files are simply
           | regular files, and sit as the single parent, without parents
           | itself, in the tree. Text files are just one of the many
           | children, that happen to yield text when their byte patterns
           | happen to be interpreted using the correct encoding (or, in
           | the spirit of Python, decoding when going from bytes to
           | text), like UTF8. This is just like, say, audio files
           | yielding audio when interpreted with the correct encoding
           | (say MP3).
           | 
           | Is this a valid way of seeing it? I have to ask very
           | carefully because I have never seen it explained this way, so
           | that is just what I put together as a mental model over time.
           | In opposition to that model, resources like books always
           | treat binary and text files as polar opposites/siblings.
           | 
           | This leads me to the initial question of whether you know of
           | resources that would support the above model (assuming it is
           | correct)?
        
             | BiteCode_dev wrote:
             | The open() API is inherited from the C way, where the world
             | is divided between text files and binary files. So you open
             | a file in "text" mode, and "binary" mode, "text" being the
             | default behavior.
             | 
             | This is, of course, utterly BS.
             | 
             | All files are binary files.
             | 
             | Some contain sound data, some image data, some zip data,
             | some pdf data, and some raw encoded text data.
             | 
             | But we don't have a "jpg" mode for open(). We do have
             | higher API we pass file objects to in order to decode their
             | content as jpg, which is what we should be doing to text.
             | Text is not an exceptional case.
             | 
             | VSCode does a lot of work to turn those bytes into pretty
             | words, just like VLC into videos. They are not like that in
             | the file. It's all a representation for human consumption.
             | 
             | The reasoning for this confusing API is that reading text
             | from a file is a common use, which is true. Especially on
             | Unix, where C comes from. But using a "mode" is the wrong
             | abstraction to offer it.
             | 
             | In fact, Python 3 does it partially right. It has an
             | io.FileIO object that just takes care of opening the stuff,
             | and an io.BufferedReader that wraps FileIO to offer
             | practical methods to access its content.
             | 
             | This is what open(mode="rb") returns.
             | 
             | If you do open(mode="rt"), which is the default, it wraps
             | the BufferedReader into an io.TextIOWrapper that does the
             | decoding part transparently for you, and returns that.
             | 
             | There is a great explanation of this by the always
             | excellent David Beazley:
             | http://www.dabeaz.com/python3io_2010/MasteringIO.pdf
             | 
             | What it should do is offer something like this:
             | 
             |     with open('text.txt').as_text():
             | 
             | open() would always return a BufferedReader, as_text() would
             | always return a TextIOWrapper.
             | 
             | This completely separates I/O from decoding, removing
             | confusion in the mind of all those coders that would
             | otherwise live by the illusory binary/text model. It
             | also makes the API much less error prone: you can easily
             | see where the file-related arguments go (in open()) and
             | where the text-related arguments go (in as_text()).
             | 
             | You can keep the mode, but only for "read", "write" and
             | "append", removing the weird mix with "text" and "bytes"
             | which are really related to a different set of operations.
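
The proposed split can be approximated today with the standard io module. The sketch below assumes a free function as_text() standing in for the hypothetical method: open() only ever deals with bytes, and io.TextIOWrapper does the decoding as a separate layer. The temporary file path is just for the demonstration.

```python
import io
import os
import tempfile


def as_text(binary_file, encoding="utf-8", newline=None):
    """Hypothetical stand-in for the proposed .as_text(): wrap a binary file in a decoder."""
    return io.TextIOWrapper(binary_file, encoding=encoding, newline=newline)


path = os.path.join(tempfile.mkdtemp(), "text.txt")

# File-related arguments go to open(); text-related ones go to as_text().
with as_text(open(path, "wb"), encoding="utf-8") as f:
    f.write("héllo")

# Reading without the wrapper gives the raw bytes stored on disk.
with open(path, "rb") as raw:
    raw_bytes = raw.read()

# Reading through the wrapper gives decoded text.
with as_text(open(path, "rb"), encoding="utf-8") as f:
    text = f.read()

print(raw_bytes)  # b'h\xc3\xa9llo'
print(text)       # héllo
```

Closing the wrapper also closes the underlying binary file, so a single with statement is enough.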
        
               | VWWHFSfQ wrote:
               | How would this work
               | 
               |     with open('text.txt', 'w').as_text():
        
               | BiteCode_dev wrote:
               |     with open('text.txt', 'w').as_text() as f:
               |         f.write("text")
        
             | viraptor wrote:
             | That sounds completely like a correct way to look at it.
             | I'd put "stream of bytes" and "seekable stream of bytes"
             | above files, but that's just nitpicking.
             | 
             | For me the toxic idea about text files is that they're a
             | thing at all. They're just binary files containing encoded
             | text, _without_ any encoding marker making them an ideal
             | trap. Is a utf16 file a text file? Is a shift-jis file a
             | text file? Have fun guessing edge cases. We've already
             | accepted with unicode that the "text" or letters are
             | something separate from the encoding.
        
               | ghshephard wrote:
               | Totally agree that everything should be a byte stream.
               | Even with Python 3.x text files are still confusing - if
               | you open a UTF-8 file with a BOM in the front as a text
               | file - should that BOM be part of the file contents, or
               | transparently removed? By default, Python treats it as
               | actual content, which can screw all sorts of things up.
               | In my ideal world, _every file_ is a binary file, and
               | that if you want it to be a text file - just open it with
               | whatever encoding scheme you think appropriate (typically
               | UTF-8).
               | 
               | If you don't know the encoding? Just write a quick
               | detect_bom function (should be part of the standard
               | library, no idea why it isn't) and then open it with that
               | encoding, i.e.:
               | 
               |     encoding = detect_bom(fn)
               |     with open(fn, 'r', encoding=encoding) as f:
               |         ...
               | 
               | That also has the benefit of removing the BOM from your
               | file.
               | 
               | Ultimately, putting the responsibility for determining
               | the CODEC on the user at least makes it _clear_ to them
               | what they are doing: opening a binary file and decoding
               | it. That mental model prepares them for the first time
               | they run into, say, a cp587 file.
               | 
               | I understand why Python doesn't do this - it adds a bit
               | of complexity - though you could have an "auto-detect"
               | encoding scheme that tried to determine the encoding
               | schemes, and defaults to UTF-8 - not perfect, as you
               | can't absolutely determine the CODEC of a file by reading
               | it - but better than what we have today - where your code
               | crashes when you have a BOM that upsets the UTF-8 decoder.
               | 
               | I finally wrote a library function to guess codecs and
               | read text files, inspired by
               | https://stackoverflow.com/a/24370596/1637450 and haven't
               | been tripped up since.
               | 
               | But Python does _not_ make it easy to open "text" files
               | - and I know data engineers who've been doing this for
               | years who are still tripped up.
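
One possible shape for the detect_bom function mentioned above, as a sketch only: it is not the commenter's library and not the code from the linked Stack Overflow answer. It assumes the file either starts with a Unicode BOM or should fall back to a caller-supplied default, and it returns codec names ("utf-8-sig", "utf-16", "utf-32") that consume the BOM while decoding.

```python
import codecs
import os
import tempfile

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(path, default="utf-8"):
    """Return a codec name based on the file's BOM, or `default` if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default


path = os.path.join(tempfile.mkdtemp(), "bom.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))

encoding = detect_bom(path)
with open(path, "r", encoding=encoding) as f:
    content = f.read()

print(encoding, repr(content))  # utf-8-sig 'hello'
```

A file without a BOM cannot be identified this way, which is where heuristic detectors come in.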
        
               | BiteCode_dev wrote:
               | Chardet, written by Mozilla, already detects encodings if
               | you need such a thing.
        
         | danpalmer wrote:
         | Strings being iterable can cause problems, and another
         | commenter has pointed out that Swift handles it well.
         | 
         | However I think strings being iterable is one of the core
         | ergonomics in the language and basic types of Python that make
         | it so nice for many applications. Scripting, scraping, data
         | cleanup, data science, even basic web development, all benefit
         | hugely from little features like this. Without this sort of
         | thing Python would be a different language with different uses.
         | 
         | While I normally like safety and types, I'm personally happy
         | with things like this because it fits with Python's strengths.
        
           | cauthon wrote:
           | I disagree, I don't think there's any meaningful benefit. For
           | example, let's say we iterated over strings as follows.
           | 
           |     for char in my_str.chars(): foo()
           | 
           | That wouldn't sacrifice any ergonomics, being consistent with
           | how we already iterate over dictionary contents with
           | d.items(), and it'd address all the concerns in the parent
           | comment link
        
       | slightwinder wrote:
       | > Eventually, removeprefix() and removesuffix() seemed to gain
       | the upper hand, which is what Sweeney eventually switched to.
       | 
       | Great naming... They missed their chance to make the
       | functionality of strip/lstrip/rstrip clearer by naming the new
       | methods stripword/lstripword/rstripword, which would also have
       | had the benefit of consistency.
        
         | masklinn wrote:
         | stripwords could (/would) imply that it acts on words. As in,
         | whitespace separated things.
        
       | trashburger wrote:
       | I welcome the terser type hints for generics. I was wishing for
       | something terser, like:
       | 
       |     {str: [int]}
       | 
       | being equivalent to what is currently dict[str, list[int]] in the
       | PEP, but I guess it will have to do.
        
       | jefft255 wrote:
       | Could someone explain to me what kind of new language features
       | the new parser will allow? I'm curious and very incompetent when
       | it comes to understanding what LL(1) grammar would imply for the
       | end-user (the python programmer like me).
        
         | MHordecki wrote:
         | The linked LWN article[1] mentions context-sensitive keywords,
         | i.e. a way to treat certain words as language keywords only in
         | specific contexts. For example, a new match statement that
         | wouldn't require reserving the `match` word as a language
         | keyword, which would require a breaking change and break all
         | existing code that uses `match` as a variable name.
         | 
         | Such a feature requires support from the parser.
         | 
         | [1]: https://lwn.net/Articles/816922/
        
           | kalenx wrote:
           | One good example (for those who do not want to read the full
           | article) is the async keyword. Introducing it as a keyword
           | broke a few libraries which were already using it as a kwarg
           | in some functions (e.g. pytorch).
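
The difference between a reserved keyword and an ordinary name can be checked with the standard keyword module. This sketch assumes Python 3.7 or later, where async is fully reserved; match is used as the example of a name that is not reserved and so stays usable as a variable.

```python
import keyword

# `async` is a reserved keyword, so it can no longer be a
# variable or keyword-argument name.
print(keyword.iskeyword("async"))  # True

# `match` is not reserved, so existing code that uses it as a name
# keeps working.
print(keyword.iskeyword("match"))  # False

match = "some value"
print(match)  # some value
```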
        
           | saagarjha wrote:
           | I wonder if Python will go back and address other shortcomings
           | which I assume are tied to the parser, such as the inability
           | to use quotes inside the interpolated segments.
        
         | pixelmonkey wrote:
         | Extensive explanation from Python creator and PEG parser
         | implementor Guido van Rossum himself can be found in this
         | video:
         | 
         | https://youtu.be/QppWTvh7_sI
         | 
         | It's also just a fun video on language parsers in general.
        
           | wenc wrote:
           | I used a PEG parser for a language I designed because I was
           | attracted to the linear time parsing achievable using a
           | packrat parser.
           | 
           | However, I also found that parsing is rarely a performance
           | bottleneck so that wasn't a big plus.
           | 
           | PEGs are, however, easier to reason about, no ambiguities,
           | so that was a good enough reason.
        
             | haberman wrote:
             | > PEGs are, however, easier to reason about, no
             | ambiguities, so that was good enough reason.
             | 
             | I frequently hear this mentioned ("PEGs don't have
             | ambiguity"). It is literally true, but I don't think it's
             | true in the sense that actually matters.
             | 
             | I've blogged about this in the past
             | (https://blog.reverberate.org/2013/09/ll-and-lr-in-context-wh...),
             | but I'm not the only person saying this:
             | 
             | > PEG is not unambiguous in any helpful sense of that word.
             | BNF allows you to specify ambiguous grammars, and that
             | feature is tied to its power and flexibility and often
             | useful in itself. PEG will only deliver one of those
             | parses. But without an easy way of knowing which parse, the
             | underlying ambiguity is not addressed -- it is just
             | ignored.
             | 
             | https://jeffreykegler.github.io/Ocean-of-Awareness-blog/indi...
        
             | justinpombrio wrote:
             | > PEGs are however easier to reason about
             | 
             | Yes and no.
             | 
             | PEG parsers are definitely easier to _implement_ than any
             | of the LR(k)-and-ilk parsers. So if you're writing or
             | debugging a PEG parser, that will be easier.
             | 
             | However, while shift-reduce conflicts are confusing, they
             | are there to give the strong guarantee that the grammar is
             | unambiguous. And the parser generator will tell you this as
             | soon as you've defined the grammar, before you've even used
             | it. PEG grammars instead remove the guarantee, and let you
             | deal with any confusion that arises much later.
             | 
             | Here are some methods of reasoning that standard parsers
             | will give you that PEG parsers will not:
             | 
             | 1. A ::= B | C is exactly the same as A ::= C | B.
             | 
             | 2. If A ::= B | C and you have a program containing a
             | fragment that parses as an "A" because it matched "B", then
             | you can replace that fragment with something that matches
             | "C" and the program will still parse.
             | 
             | Neither of these rules hold in PEGs.
             | 
             | Here's a practical concern that (2) helps with. Say you
             | have a grammar for html, and a grammar for js. And you want
             | to be able to parse html with embedded JS. So you stick the
             | js grammar into the html grammar at the right places. If
             | you're using a standard (e.g. LR(k)) parser, and you _don't_
             | get any shift-reduce (or other) conflicts, then the
             | combined grammar works. In contrast, if you're using a PEG
             | grammar, it's possible that you've ordered things wrong and
             | there are valid JS programs that will never parse because
             | they're clobbered by html parsing rules outside of them. Or
             | vice-versa.
             | 
             | Also, realistically if you're using a PEG parser you'll
             | want one that handles left recursion, because working
             | without left recursion turns your grammar into a mess. And
             | left recursion in PEGs can have some weird behavior.
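
Rule (1) above can be demonstrated with a few hand-rolled parsing functions. This is a toy sketch, not any real PEG library: literal, ordered_choice, sequence and matches are names invented here, and a parser is just a function from (text, position) to the new position or None.

```python
def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def ordered_choice(*alternatives):
    # PEG choice: commit to the first alternative that matches and
    # never come back to try the others.
    def parse(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse


def sequence(*parts):
    def parse(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def matches(rule, text):
    return rule(text, 0) == len(text)


# S <- A "c"  with  A <- "a" / "ab"   versus   A <- "ab" / "a"
short_first = sequence(ordered_choice(literal("a"), literal("ab")), literal("c"))
long_first = sequence(ordered_choice(literal("ab"), literal("a")), literal("c"))

print(matches(short_first, "abc"))  # False: "a" wins, then "c" fails on "b"
print(matches(long_first, "abc"))   # True
print(matches(short_first, "ac"), matches(long_first, "ac"))  # True True
```

With a context-free grammar, A ::= "a" | "ab" would accept both inputs regardless of the order of the alternatives.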
        
         | 1wd wrote:
         | https://pyfound.blogspot.com/2020/04/replacing-cpythons-pars...
         | gives "Parenthesized with-statements" as an example.
        
       | OJFord wrote:
       | > Eric Fahlgren amusingly summed up the name fight this way:
       | 
       | > > I think name choice is easier if you write the documentation
       | first:
       | 
       | > > cutprefix - Removes the specified prefix.
       | 
       | > > trimprefix - Removes the specified prefix.
       | 
       | > > stripprefix - Removes the specified prefix.
       | 
       | > > removeprefix - Removes the specified prefix. Duh. :)
       | 
       | I actually don't agree that it's so obvious, since it returns the
       | prefix-removed string rather than modifying in-place. I think
       | Fahlgren's argument would work better for `withoutprefix`.
        
         | theandrewbailey wrote:
         | Python strings are immutable, so in-place modification would
         | violate lots of rules and conventions.
        
         | kbd wrote:
         | I would have preferred 'stripprefix' for unity with 'strip',
         | 'rstrip', and 'lstrip'
        
           | mixmastamyk wrote:
           | They work differently. Strip removes all given characters
           | from the end in any order. Trim sounds better to me, like one
           | operation.
        
           | pdonis wrote:
           | That's discussed in the article: the "strip" methods don't
           | interpret strings of multiple characters as a single prefix
           | or suffix to be removed, so it was felt to be too confusing
           | to use "strip" type names for methods that _do_ interpret
           | strings that way.
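
A short illustration of the difference described above, assuming Python 3.9 or later for str.removeprefix() and str.removesuffix(). The file names are made up.

```python
name = "test_settings.py"

# lstrip() treats its argument as a set of characters {t, e, s, _},
# so it keeps eating into "settings".
print(name.lstrip("test_"))        # ings.py

# removeprefix() treats it as one literal prefix.
print(name.removeprefix("test_"))  # settings.py

archive = "fizz.gz"
print(archive.rstrip(".gz"))        # fi
print(archive.removesuffix(".gz"))  # fizz

# If the prefix is absent, the string comes back unchanged.
print(name.removeprefix("spam_"))   # test_settings.py
```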
        
         | thijsvandien wrote:
         | Ugh, I do like withoutprefix a lot better. That makes it
         | obvious it returns something new and that there's no reason to
         | raise ever.
        
         | BiteCode_dev wrote:
         | Strings are immutable in Python, and all strings operations
         | return new strings, including all string methods.
         | 
         | So there is no possible confusion.
        
           | ianhorn wrote:
           | If I have a string x = "this is a very long string..." and do
           | y = x[:10], then it's a whole new string? If x is near my
           | memory limits, and I do y = x[:-1] will it basically double
           | my memory usage? Is that what you meant by every string is a
           | new string?
        
             | Erlangen wrote:
             | Slicing in Python always creates a new object. You can test
             | it with a list of integers.
        
               | masklinn wrote:
               | > Slicing in Python always create a new object.
               | 
               | It always creates a new object but it doesn't necessarily
               | copy the contents (even shallowly).
               | 
               | For instance slicing a `memoryview` creates a subview
               | which shares storage with its parent.
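
A small sketch of the memoryview behaviour mentioned above, contrasted with slicing a bytearray directly. Variable names are arbitrary.

```python
data = bytearray(b"hello world")

view = memoryview(data)
sub = view[:5]         # a new memoryview object over the same buffer
sub[0] = ord("J")      # writes through to `data`
print(data)            # bytearray(b'Jello world')

copy = data[:5]        # slicing the bytearray itself copies the bytes
copy[0] = ord("h")     # does not affect `data`
print(data)            # bytearray(b'Jello world')
print(copy)            # bytearray(b'hello')

print(sub is view, sub.obj is data)  # False True
```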
        
               | cecilpl2 wrote:
               | My favorite example of something similar to this, since
               | you brought it up:
               | 
               |     >>> a = [254, 255, 256, 257, 258]
               |     >>> b = [254, 255, 256, 257, 258]
               |     >>> for i in range(5): print(a[i] is b[i])
               |     ...
               |     True
               |     True
               |     True
               |     False
               |     False
               | 
               | In Python, integers in the range [-5, 256] are statically
               | constructed in the interpreter and refer to fixed
               | instances of objects. All other integers are created
               | dynamically and refer to a new object each time they are
               | created.
        
               | ianhorn wrote:
               | It'll always create a new object but my understanding is
               | that at least in numpy the new and old object will share
               | memory. Am I wrong there too?
        
               | dialamac wrote:
               | CPython is pretty terrible. Numpy has the concept of
               | views, cpython doesn't do anything sophisticated.
        
               | aidos wrote:
               | Correct. In Numpy the slices are views on the underlying
               | memory. That's why they're so fast, there's no copying
               | involved. Incidentally that's also why freeing up the
               | original variable doesn't release the memory (the slices
               | are still using it).
        
             | masklinn wrote:
             | > If I have a string x = "this is a very long string..."
             | and do y = x[:10], then it's a whole new string?
             | 
             | Yes. And doing otherwise is pretty risky as the Java folks
             | discovered, ultimately deciding to revert the optimisation
             | of substring sharing storage rather than copying its data.
             | 
             | The issue is that while data-sharing substringing is
             | essentially free, it also _keeps the original string
             | alive_, so if you slurp in megabytes, slice out a few
             | bytes you keep around and "throw away" the original
             | string, that one is still kept alive by the substringing
             | you perform, and you basically have a hard to diagnose
             | memory leak due to completely implicit behaviour.
             | 
             | Languages which perform this sharing explicitly -- and
             | especially statically (e.g. Rust) -- don't have this issue,
             | but it's a risky move when you only have one string type.
             | 
             | Incidentally, Python provides for opting into that
             | behaviour for _bytes_ using memory views.
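
The retention hazard described above can be observed with the opt-in mechanism for bytes. This is only a sketch: it uses the `obj` attribute of `memoryview` to inspect what a view keeps referenced, and the buffer size is arbitrary.

```python
big = bytes(1_000_000)          # stand-in for megabytes of slurped input
small = memoryview(big)[:4]     # a 4-byte window, no copy made

del big                         # "throw away" the original name

# The view still references the whole buffer, so it cannot be freed yet.
retained = len(small.obj)

# An explicit copy detaches the 4 bytes from the large buffer.
detached = bytes(small)
small.release()                 # drop the view's hold on the buffer
print(retained, len(detached))
```

Because the sharing is requested explicitly here, the cost is at least visible at the call site, unlike implicit substring sharing.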
        
             | [deleted]
        
             | sadfklsjlkjwt wrote:
             | Yes.
        
             | ORioN63 wrote:
             | Yes. Although you can use `islice` from itertools to get
             | around this problem, when it is a problem.
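
A short sketch of the `islice` approach, assuming the goal is to walk a prefix of the string without building a second string up front. Note that `islice` yields characters lazily and does not accept negative indices, so it is not a drop-in replacement for `x[:-1]`.

```python
from itertools import islice

x = "this is a very long string..."

prefix_chars = islice(x, 10)        # lazy iterator; nothing is copied yet
prefix = "".join(prefix_chars)      # characters are only materialized here

print(prefix == x[:10])
```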
        
             | BiteCode_dev wrote:
             | If x is near your memory limits, and you do y = x[:-1], you
             | will get a MemoryError :)
             | 
             | For those situations, bytes() + memoryview() or bytearray()
             | can be used, but then you are on your own.
        
               | ianhorn wrote:
               | Huh, I've had a wrong understanding of that for over a
               | decade! TIL, thanks.
        
               | BiteCode_dev wrote:
               | Hey!
               | 
               | https://xkcd.com/1053/
               | 
               | And honestly, I would be rich if I got a dollar every
               | time a student does this:
               |
               |     msg.upper()
               |
               | Instead of:
               |
               |     msg = msg.upper()
               |
               | And then call me to say it doesn't work.
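
The mistake above in runnable form, as a small sketch with an illustrative `msg` value:

```python
msg = "hello"

msg.upper()                # builds "HELLO", then the result is discarded
after_bare_call = msg      # still "hello"

msg = msg.upper()          # rebind the name to the new string
after_rebinding = msg
print(after_bare_call, after_rebinding)
```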
        
               | pbowyer wrote:
               | > And honestly, I would be rich if I got a dollar every
               | time a student does this:
               | 
               | > msg.upper()
               | 
               | > Instead of:
               | 
               | > msg = msg.upper()
               | 
               | > And then call me to say it doesn't work.
               | 
               | On this, isn't the student's reasoning sensible? E.g. "If
               | msg is a String object that represents my string, then
               | calling .upper() on it will change (mutate) the value,
               | because I'm calling it on itself"?
               | 
               | If the syntax was upper(msg) or to a lesser extent
               | String.upper(msg) then the new-to-programming me would
               | have understood more clearly that msg was not going to
               | change. Have you any insights into what your students are
               | thinking?
        
               | BiteCode_dev wrote:
               | A student doesn't know anything about mutability, and
               | since Python signatures are not explicit, there is no way
               | to know they have to do that.
               |
               | It's just something you have to be told: a design
               | decision, like thousands of others to learn in IT, that
               | you just can't guess.
        
               | cesarb wrote:
               | IMO, this is a defect in the language: the lack of a
               | "must_use" annotation or similar. If that annotation
               | existed, and the .upper() method was annotated with it,
               | the compiler could warn in that situation.
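
Python has no `must_use` annotation, but a linter-style check can approximate the idea. The sketch below is an assumption-laden illustration, not an existing tool: `MUST_USE_METHODS` and `find_discarded_calls` are hypothetical names, and the check matches on method names only, since it cannot know that the receiver is really a `str`.

```python
import ast

# Hypothetical allow-list of methods whose return value is their only effect.
MUST_USE_METHODS = {"upper", "lower", "strip", "replace"}

def find_discarded_calls(source):
    """Return line numbers where a listed method is called and its result dropped."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # An ast.Expr statement is an expression whose value is thrown away.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Attribute) and func.attr in MUST_USE_METHODS:
                hits.append(node.lineno)
    return sorted(hits)

sample = (
    "msg = 'a'\n"
    "msg.upper()\n"            # flagged: result discarded
    "msg = msg.upper()\n"      # fine: result bound to a name
    "if msg.upper() == 'A':\n" # fine: result used in a comparison
    "    pass\n"
)
print(find_discarded_calls(sample))
```

A name-based heuristic like this will produce false positives on user classes that define a mutating `upper()`, which is part of why a real annotation would be more reliable.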
        
               | diarrhea wrote:
               | But you are free to do
               |
               |     if title == user_input.upper():
               | 
               | That is, you convert a string to upper without binding
               | the result to a name. You just use it in-place and
               | discard the result, which is fine.
               | 
               | By compiler, do you mean mypy or linters?
        
               | TkTech wrote:
               | That's still "using" the resulting value for a
               | comparison. CPython isn't an optimizing compiler, or it
               | would completely remove the call to upper().
               |
               |     >>> import dis
               |     >>> def up(v):
               |     ...     v.upper()
               |     ...
               |     >>> dis.dis(up)
               |       2           0 LOAD_FAST                0 (v)
               |                   2 LOAD_METHOD              0 (upper)
               |                   4 CALL_METHOD              0
               |                   6 POP_TOP
               |                   8 LOAD_CONST               0 (None)
               |                  10 RETURN_VALUE
               |     >>> def up(v):
               |     ...     if v.upper() == "HelloWorld":
               |     ...         return True
               |     ...
               |     >>> dis.dis(up)
               |       2           0 LOAD_FAST                0 (v)
               |                   2 LOAD_METHOD              0 (upper)
               |                   4 CALL_METHOD              0
               |                   6 LOAD_CONST               1 ('HelloWorld')
               |                   8 COMPARE_OP               2 (==)
               |                  10 POP_JUMP_IF_FALSE       16
               |
               |       3          12 LOAD_CONST               2 (True)
               |                  14 RETURN_VALUE
               |             >>   16 LOAD_CONST               0 (None)
               |                  18 RETURN_VALUE
               | 
               | Notice in the first example, right after CALL_METHOD the
               | return value on the stack is just immediately POP'd away.
               | The parent is saying that when you run `python
               | example.py` CPython should see that the return value is
               | never used and emit a warning. This would only happen
               | because `upper()` was manually marked using the suggested
               | `must_use` annotation.
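
The same observation can be made programmatically. This is a sketch under the assumption that it runs on CPython; opcode names differ between CPython versions (newer releases no longer emit `CALL_METHOD`, for instance), so it only looks for `POP_TOP`. The function names are illustrative.

```python
import dis

def discards(v):
    v.upper()                      # result unused

def compares(v):
    if v.upper() == "HelloWorld":  # result consumed by the comparison
        return True

def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

# A bare expression statement leaves a value on the stack that must be popped.
discards_pops = "POP_TOP" in opnames(discards)
print(discards_pops)
print(opnames(compares))
```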
        
               | delaaxe wrote:
               | He meant that writing a line of code with only contents:
               |
               |     msg.upper()
               | 
               | should trigger a warning as this clearly doesn't do
               | anything.
        
               | BiteCode_dev wrote:
               | Python is interpreted, not compiled, and completely
               | dynamic. You cannot check much statically.
               | 
               | In fact, any program can replace anything on the fly, and
               | swap your string for something similar but mutable.
               | 
               | It's the trade off you make when choosing it.
        
               | [deleted]
        
       | theandrewbailey wrote:
       | No PEP 554 (subinterpreters). That's been moved to 3.10:
       | https://www.python.org/dev/peps/pep-0554/
        
         | BiteCode_dev wrote:
         | Given how heated the debate about those was, it's good we
         | didn't try to go too fast with it.
         |
         | I have high hopes for this feature, but it's going to be
         | slow, hard work, and we'll only reap the benefits in the long
         | run. So there's no use rushing it.
         | 
         | I feel like we rushed asyncio and type hints, and it took years
         | to make them usable after they were introduced.
        
       | traes wrote:
       | Is the next release still planned to be called Python 4? I seem
       | to recall GvR saying that at one point, though I could be
       | mistaken.
        
         | BiteCode_dev wrote:
         | 3.10.
         | 
         | https://www.python.org/dev/peps/pep-0619/
        
           | twic wrote:
           | After that will be Python 3.11, and after that, Python for
           | Workgroups 3.11.
        
             | cozzyd wrote:
             | Followed by inexplicably jumping to pithon.
        
             | doersino wrote:
             | Python 95 is gonna blow everyone's minds.
        
               | BiteCode_dev wrote:
               | With semver, it's going to be too long to reach. I
               | suggest we switch to the Firefox versioning scheme, it
               | seems to already be near this goal.
        
               | agumonkey wrote:
               | the first release with Plug and Pep
        
               | _jal wrote:
               | Yeah, but PythoNT will be around for a long time.
        
               | mixmastamyk wrote:
               | Start me up!
        
               | pansa2 wrote:
               | You make a grown man cry...
        
         | downerending wrote:
         | And very importantly, would Python 4 be a new language, or
         | compatible with Python 3? (compare Python 3 vs Python 2)
        
           | Scarblac wrote:
           | They decided they didn't like the backwards-incompatible
           | changes they made in 3, so Python 4 will go back to how
           | things were in 2.
        
           | BiteCode_dev wrote:
           | There is no Python 4 planned for now.
           | 
           | Python broke compat once in 25 years and gave 13 years to
           | migrate.
           | 
           | It's a very conservative language.
        
             | mixmastamyk wrote:
             | Python breaks compatibility often on minor point releases.
             | Only once as big as 3.0.0 but it happens regularly.
             | 
             | I argued on the list that these should be kept for major
             | releases for planning reasons but they appear to be
             | convinced it is too hard.
        
               | kzrdude wrote:
               | If we look back in python history, the rolling breaking
               | changes have been handled mostly fine, and the actual
               | Python 3 caused a lot of pain in the ecosystem. So I hope
               | they stay away from major versions and keep up the other
               | things they are doing.
        
               | mixmastamyk wrote:
               | That was due to the scope of the breakage, not number
               | format. A good way to handle that and maintain
               | predictability is to constrain breaking changes, yet
               | defer them to 4.X.
        
               | ghshephard wrote:
               | Can you provide an example of where Python has broken
               | backwards compatibility recently between 3.x versions?
               | I'll admit (despite googling for 5 or so minutes) that I
               | don't actually know if it does. It obviously breaks
               | _forward_ compatibility all the time - new language
               | features are landing, and they just aren't present in
               | previous versions - but I don't know if I've ever run
               | into people being tripped up by that.
               |
               | I know some Python _Libraries_ break backwards
               | compatibility (Pandas being a big one) - but, for the
               | most part, hasn't the language been backwards compatible
               | since at least Python 3.4? (And possibly further back,
               | for all I know).
        
               | mixmastamyk wrote:
               | On this page you'll see them. Check the "Porting to X.X"
               | items under each release:
               | 
               | https://docs.python.org/3/whatsnew/index.html
               | 
               | Keep in mind they have deferred a number of them because
               | of the impending EOL of Python 2.7. There are fewer
               | breaking changes during the latter 3.X series, which
               | should resume in 3.9 or 3.10 now that Python2 has passed
               | on.
               | 
               | Here's a commonly mentioned one:
               | 
               | Changes in Python Behavior: async and await names are now
               | reserved keywords. Code using these names as identifiers
               | will now raise a SyntaxError. (Contributed by Jelle
               | Zijlstra in bpo-30406.)
               | 
               | Note: I think this is a bad idea, I'd rather all these
               | small breaking changes and parser be deferred to 4.X. But
               | they need to be _small breaking changes,_ of course, not
               | a new language.
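
The quoted change (from the Python 3.7 release notes) is easy to check on a given interpreter. A small sketch, assuming Python 3.7 or later; the helper names are illustrative.

```python
import keyword

def is_usable_identifier(name):
    """True if `name` can be used as a variable name on the running interpreter."""
    return name.isidentifier() and not keyword.iskeyword(name)

def compiles(source):
    """True if `source` parses on the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(is_usable_identifier("async"), compiles("async = 1"))
```

Code that used `async` as an identifier had to rename it, for example to `async_`, to keep working.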
        
             | downerending wrote:
             | I'm struggling to think of any other language that has done
             | something like this.
             | 
             | It might seem like a quibble, but it seems better to
             | describe Python 3 as a different language versus Python 2.
             | Newbies seem to get that.
             | 
             | (Or, alternatively, "How many Python 2 scripts will run on
             | a Python 3 interpreter?" Answer: "None of them.")
        
               | vasco wrote:
               | PHP, which tried to address Unicode in version 6, but then
               | abandoned it and went straight to 7. Perl, which
               | amusingly also at version 6 decided on a huge rewrite,
               | but then just decided to rename the version as an actual
               | new language, "Raku".
        
               | downerending wrote:
               | Re _Raku_, in hindsight, this was the right call. Give a
               | new language a new name, and the story stays simple.
        
               | The_Colonel wrote:
               | Yep. Also people are not shamed and ridiculed online
               | because they did not upgrade to the new language.
        
               | BiteCode_dev wrote:
               | If it was such a right call, why did it die?
        
               | nostoc wrote:
               | > Or, alternatively, "How many Python 2 scripts will run
               | on a Python 3 interpreter?" Answer: "None of them."
               | 
               | That is completely false. So many libraries support both
               | 2 and 3. That's code that runs just as well under both
               | interpreters.
        
               | downerending wrote:
               | Really? The same .py file runs under _python2_ and
               | _python3_?
               | 
               | Googling quickly, I find this, which does a bit better
               | than 2to3. I suppose one could write to a somewhat
               | constrained intersection of Python2 and Python3, if one
               | is willing to make at least some boilerplate changes to
               | the original Python2 code.
               | 
               | https://python-future.org/overview.html
               | 
               | That said, if you bring a Python2 script and feed it to a
               | Python3 interpreter, no, in general that will not work.
               | They simply aren't the same language. Even a simple
               | "print x" will do you in.
        
               | pdonis wrote:
               | _> The same .py file runs under python2 and python3?_
               | 
               | Sure, as long as it doesn't contain any syntax or
               | spellings which are incompatible between the two. That's
               | a fairly large subset of the language.
               | 
               | _> if you bring a Python2 script and feed it to a
               | Python3 interpreter, no, in general that will not work.
               | They simply aren't the same language. Even a simple
               | "print x" will do you in._
               |
               | But this will work:
               |
               |     from __future__ import print_function
               |     print(x)
               | 
               | This is valid under both Python 2 and Python 3.
               | 
               | Also, as I said above, there is a pretty large subset of
               | the Python language that has the same syntax and
               | spellings in both Python 2 and Python 3, and any script
               | or module or package that only uses that subset will run
               | just fine under both interpreters. You are drastically
               | underestimating both the size and the usage of this
               | subset of the language.
        
               | 4ec0755f5522 wrote:
               | A lot of projects write in that style, i.e. compatible
               | with both python2 and python3. It's really common because
               | there's so much py2 deployed (was default on centOS until
               | very recently, still default on osx, etc.)
               |
               | Nearly every py3 feature was backported to 2; you just
               | need to write it in a compatible way. I'm seeing some
               | drop py2 support now though. Which I'm fine with, I
               | haven't written python2 code in maybe 6 or 7 years now.
        
               | Scarblac wrote:
               | Say Django 1.11, a massive amount of .py files, works
               | completely fine under both 2 and 3. As do many other
               | libraries.
               | 
               | Yes you often need some precautions like "from __future__
               | import" statements and sometimes libraries like `six`,
               | but it's been perfectly normal practice for most of the
               | last decade.
        
               | BiteCode_dev wrote:
               | No, but it is possible to write Python 2 code that runs
               | in Python 3.
               |
               | In fact, the vast majority of popular libs had a 2/3
               | compatible code base for a few years.
               |
               | The hard part was not the syntax, in fact. That's pretty
               | trivial: the languages are not that different.
               |
               | The hard part is the I/O stack, because the stdlib is
               | very different, especially for this part.
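
One illustration of the I/O point, as a sketch: `io.open` exists in both Python 2.6+ and Python 3 with the same text-mode semantics, so it was a common way to read text identically under both interpreters. The helper name `read_text` and the temporary file are illustrative.

```python
import io
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a file as decoded text under either interpreter."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()

# Round-trip a small non-ASCII sample through a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"caf\u00e9\n")
    content = read_text(path)
finally:
    os.remove(path)
```

The built-in `open` of Python 2 returns byte strings instead, which is where much of the porting effort went.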
        
               | cesarb wrote:
               | > Or, alternatively, "How many Python 2 scripts will run
               | on a Python 3 interpreter?" Answer: "None of them."
               | 
               | That's obviously untrue. For example, consider the
               | following Python 2 script:
               |
               |     with open('a.txt', 'r') as a, open('b.txt', 'w') as b:
               |         for line in a:
               |             b.write(line)
               | 
               | It works identically on a Python 3 interpreter, and it
               | doesn't even use "from __future__ import ...".
        
               | downerending wrote:
               | Yes, I should have said, "a vanishingly small proportion,
               | and even then mostly by accident".
               | 
               | My guess is that if you sweep GitHub for Python2 code and
               | push it into Python3, that proportion would be under one
               | percent.
        
               | mixmastamyk wrote:
               | As a big user of logging with little need to deal with
               | character encoding, all of my admin/daemon stuff moved
               | over with almost no changes necessary for 3.0 (actually
               | ~3.3).
               | 
               | For some projects I did bigger refactors for 2.6/7
               | (exceptions) and 3.6 (fstrings).
        
               | dtech wrote:
               | > Or, alternatively, "How many Python 2 scripts will run
               | on a Python 3 interpreter?" Answer: "None of them."
               | 
               | That is not true though, a lot of 2 scripts will run no
               | problem, especially if import future is used
        
               | chc wrote:
               | Ruby did something like this around the same time Python
               | did. Ruby's was a bit smaller, but overall a roughly
               | similar amount of breaking changes. They forced you to
               | think about encodings more with Strings, they changed the
               | signatures of several operators, they changed some of the
               | syntax for case statements, they drastically changed the
               | scope rules for block variables, they restructured the
               | base object hierarchy, etc. In both cases, it was a
               | deliberate decision to make a clean break. I think Ruby's
               | big break didn't make as big a schism mainly because
               | Rails was very supportive, and Rails holds an enormous
               | amount of influence in the Ruby world.
               | 
               | If Python 3 had been introduced as a separate language,
               | I'm pretty sure everyone would have said "Why isn't this
               | just called Python 3? It's 99.9% the same as Python and
               | it's by the same people and they're deprecating Python in
               | favor of it."
        
               | BiteCode_dev wrote:
               | First, there are not a lot of interpreted (not compiled,
               | that's another matter entirely) languages that are as old
               | as Python.
               |
               | And there are really few that are even near Python's
               | popularity, or used with such diversity as Python.
               |
               | I mean, you can get away with keeping AWK the way it was
               | 2 decades ago, nobody is going to use it for machine
               | learning or to teach computing in all universities in the
               | world on 3 operating systems, utilizing C extensions, or
               | processing Web API.
               |
               | Among the few that would even compare, there are the ones
               | that have accumulated so much cruft that they became
               | unusable by today's standards (E.G: bash). Then you have
               | those who have done like Python (E.G: perl 6). The ones
               | that just tried and failed (PHP 6). The ones that broke
               | compat and told everybody move or die (Ruby in a point
               | release, gave basically 2 years). And the ones that
               | created a huge pile of horror they called full stack to
               | keep going (E.G: JS). Also those that got hijacked by
               | vendors and just exploded in myriads of proprietary
               | syntaxes (E.G: SQL) or completely new concepts (E.G:
               | lisp).
               | 
               | At least, in Python you CAN write Python 2/3 compatible
               | code, and you have a LOT of tooling to help you with
               | that, or migrating.
               | 
               | So, yes, the Python 2 -> 3 transition could have been
               | better. Insight is 20/20.
               | 
               | But I'm struggling to think of any other language in a
               | similar situation that has done better.
        
               | delaaxe wrote:
               | Hindsight 20/20, not insight.
        
               | BiteCode_dev wrote:
               | Thanks. French here. Can't edit anymore though
        
               | jnwatson wrote:
               | Swift has had 3 backwards incompatible versions of larger
               | scope in less time.
        
       | tartrate wrote:
       | re.sub?
        
       ___________________________________________________________________
       (page generated 2020-05-28 23:00 UTC)