[HN Gopher] Prevent DoS by large int-str conversions
___________________________________________________________________
Prevent DoS by large int-str conversions
Author : genericlemon24
Score : 78 points
Date : 2022-09-07 17:03 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| wmichelin wrote:
| Can anyone TL;DR why? Why wouldn't it just return that long
| integer of all 1s?
| sp332 wrote:
| Yeah, it's right at the top of the linked page?
| schoen wrote:
| It's stated to be CVE-2020-10735, which is apparently about a
| denial of service by forcing Python to inefficiently convert a
| very large string to an integer, using a potentially ridiculous
| amount of CPU time.
|
| The CVE hasn't been published, but for example there's an
| explanation at
|
| https://bugzilla.redhat.com/show_bug.cgi?id=1834423
| klyrs wrote:
| Looks to me like the actual problem is in string.__mul__ --
| that one's got arbitrary memory usage. Better limit those
| arguments...
| masklinn wrote:
| str.__mul__ is just a conveniently short way to demonstrate
| the issue, the target is pretty much any parsing routine
| exposed to outside users, e.g. any JSON API.
| klyrs wrote:
| Apologies, my comment is snark. The algorithm in question
| is soft-linear, faster implementations exist, this seems
| like an incredibly myopic fix. Just make a bigger JSON
| blob and it will take longer to parse.
| [deleted]
| adgjlsfhk1 wrote:
| this seems like a dumb fix to the CVE to me. why not just use
| a faster algorithm?
| lifthrasiir wrote:
| Because there is no linear-time algorithm for decimal-to-binary
| conversion. If we are to expose the bignum-aware
| `int` function to untrusted input there should be some
| limit anyway. I do think the current limit of 4301 digits
| seems too low, though---if it were something like 1 million
| digits I would be okay.
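[Editor's illustration, not from the thread: the superlinear cost being discussed can be seen in a schoolbook decimal parser. `parse_decimal` is a hypothetical name; this sketches the naive algorithm, not CPython's implementation.]

```python
def parse_decimal(s: str) -> int:
    """Schoolbook decimal parsing (Horner's rule).

    Each step computes n = n * 10 + digit. After i digits, n is
    roughly i digits long, so step i costs O(i) word operations;
    summed over d digits that is O(d^2) total work. This is why
    huge decimal inputs need either a cap or a subquadratic
    conversion algorithm.
    """
    n = 0
    for ch in s:
        n = n * 10 + (ord(ch) - ord('0'))
    return n

assert parse_decimal('1234') == 1234
```

Note that the quadratic term comes from the bignum multiply-and-add on an ever-growing accumulator, not from the loop itself.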
| schoen wrote:
| It looks like there is some discussion of the algorithmic
| options at
|
| https://github.com/python/cpython/issues/95778
|
| https://github.com/python/cpython/issues/90716
|
| Is there something bad going on with Python's internal
| representation of big integers, too? I thought I might
| have understood Tim Peters to be saying that in the
| latter thread.
|
| It does look like gmpy2.mpz() is like 100 times faster
| than int() or something. Is this just because it's doing
| it all in assembly rather than in Python bytecodes, or
| are the Python data structures here also not so hot?
| thehappypm wrote:
| One of the comments showed that the incredibly naive approach
| of just building the integer digit-by-digit:
|
| '1234' => 1x1000 + 2x100 + 3x10 + 4x1
|
| is faster and has room to improve.
| tylerhou wrote:
| This takes (worse than) quadratic time.
| thehappypm wrote:
| I'm not sure it does, in the best case.
|
| There are d additions, so the addition is linear time.
|
| Each multiplication is potentially quadratic, but it
| seems optimizable since it's never multiplication of two
| large numbers--always one large and one small number.
| singron wrote:
| Each addition is linear in d, but there are d additions,
| so it's already quadratic before you even consider the
| multiplications.
|
| In a power-of-2 base, the result of the multiplication is
| a constant number of digits (because the multiplication
| is just a shift of a single digit), so the additions
| could each be constant time in that case.
| klodolph wrote:
| > It does look like gmpy2.mpz() is like 100 times faster
| than int() or something. Is this just because it's doing
| it all in assembly rather than in Python bytecodes, or
| are the Python data structures here also not so hot?
|
| It's not the data structures. The data structures are
| really more or less the same: you have some array of
| words, with a length and a sign.
The only real
| differences are in the particular length of word that you
| choose, which is not a very interesting difference.
|
| Assembly language optimizations do tend to matter here,
| because you're working with the carry bit for lots of
| these operations, and each architecture also has some
| different way of multiplying numbers. Multiplying numbers
| is "funny" because it produces two words of output for
| one word of input.
|
| There are also sometimes some different algorithms in
| use, and GMP uses some different algorithms depending on
| the size. Here's a page describing the algorithms used by
| GMP:
|
| https://gmplib.org/manual/Multiplication-Algorithms
|
| Here's a description of how carries are propagated:
|
| https://gmplib.org/manual/Assembly-Carry-Propagation
|
| IMO, I wouldn't expect my language's built-in bigint type
| to use the best, most cutting-edge algorithms and lots of
| hand-tuned assembly. GMP is a specialized library for
| doing special things.
| tylerhou wrote:
| There is no practical linear-time algorithm for
| multiplication; should Python disable multiplication for
| numbers greater than 10^4301?
|
| Even a naive divide-and-conquer decimal-to-binary
| algorithm is only logarithmically slower than
| multiplication.
| adgjlsfhk1 wrote:
| there isn't a linear-time algorithm, but there is an
| algorithm in O(n*log(n)^2),
| http://maths-people.anu.edu.au/~brent/pd/rpb032.pdf,
| which is pretty close. it also seems weird to have a CVE
| for "some algorithms don't run in linear time". should
| there be a 4000-element maximum for the size of list
| passed to sort?
| lifthrasiir wrote:
| > should there be a 4000 element maximum for the size of
| list passed to sort?
|
| Technically speaking, yes, there should be some limit if
| you are accepting untrusted input.
But there is a good
| argument for making this limit built-in for integers but
| not lists: integers are expected to be atomic while lists
| are widely understood as aggregates, therefore large
| integers can more easily propagate through an
| unsuspecting code base than large lists.
|
| (Or, if you are just saying that once you have
| sub-quadratic algorithms you don't need language-imposed
| limits anymore, maybe you are right.)
| bjourne wrote:
| But why convert it to binary? If you store the number as
| an array of digits the parsing process should be O(n).
| lifthrasiir wrote:
| That means every limb operation should be done modulo
| 10^k, which would be pretty expensive and only makes
| sense if you don't do much computation with them, so that
| the base conversion will dominate the computation.
| wyldfire wrote:
| But the multiplier is unbounded, though. Faster wouldn't help
| in that case.
| klyrs wrote:
| Maybe we should limit the lengths of strings altogether.
| 512k should be enough for anybody.
| eugenekolo wrote:
| Could they not have modified the `int` function to `int(thingy,
| i_really_want_to_do_this=False)`?
|
| Edit: Looks like they added a python argument to increase the
| limit. So if you really need this, I suppose you can search
| around until you figure out why it's not working and pass the
| correct argument to the python bin.
| qbane wrote:
| Yeah, we must prevent DoS at all costs. It seems that Python
| should never have had integers of arbitrary size, for
| "performance" reasons. Aren't int32/int64/int128 nice? The
| number of operations is always bounded. We should stick to them.
| kragen wrote:
| This was Python's behavior until Python 2; `long`, the
| arbitrary-precision integer, was a separate type, and `int`
| arithmetic overflow caused an OverflowError.
One of the big changes
| in Python 2 was to imitate the behavior of Smalltalk and (most)
| Lisps by transparently overflowing `int` arithmetic to `long`
| instead of requiring an explicit `long()` cast. Python 3
| eliminated the separate `long` type altogether.
|
| Having been bitten by the Smalltalk behavior, I am skeptical
| that the Python 2 change was a good idea.
| justinsaccount wrote:
| From the linked bug:
|
| > It takes about 50ms to parse an int string with 100,000 digits
| and about 5sec for 1,000,000 digits. The float type, decimal
| type, int.from_bytes(), and int() for binary bases 2, 4, 8, 16,
| and 32 are not affected.
|
| Sure seems strange to set the limit to 4300. 50ms is not a DoS.
| xani_ wrote:
| ballooning a 2ms request to 50ms is absolutely a DoS
|
| that's only 20 req/sec to fill a core of execution
| schoen wrote:
| If you need to make integers this big from decimal
| representations, I guess you could still use gmpy2.mpz(), and
| then either leave the result as an mpz object (which is generally
| drop-in compatible with Python's int type, with the addition of
| some optimized assembly implementations of arithmetic operations
| and some additional methods), or convert it to a Python int by
| calling int() on it.
| blibble wrote:
| new interpreter argument:
|
| -X int_max_str_digits=number
| limit the size of int<->str conversions.
| This helps avoid denial of service attacks
| when parsing untrusted data. The default is
| sys.int_info.default_max_str_digits. 0 disables.
|
| this should not be a runtime configuration setting, fix the
| sodding algorithm to not be quadratic
|
| will we be getting PHP-style magic quotes soon? that also
| protects developers against untrusted input (bonus! this could be
| configured too!)
|
| or an inability to pass strings into the regular expression
| module? that can also cause DoS
|
| (what happened to Python?)
| simonw wrote:
| My understanding is that there is no algorithm for this that
| isn't quadratic.
|
| Update: I may have understood incorrectly, see
| https://github.com/python/cpython/issues/90716
| blibble wrote:
| > My understanding is that there is no algorithm for this
| that isn't quadratic.
|
| > If you know of one, the Python core development team would
| love to hear about it!
|
| it's mentioned on the issue page that makes up the article...
|
| (before they closed it due to the "code of conduct")
| [deleted]
| jwilk wrote:
| https://github.com/python/cpython/issues/95778 has more
| information.
| dang wrote:
| Ok, we'll change to that from
| https://pythoninsider.blogspot.com/2022/09/python-releases-3....
| Thanks!
|
| All: submitted title was "`int('1' * 4301)` will raise
| ValueError starting with Python 3.10.7" and comments reference
| that, so you might want to take a look at both URLs.
| svet_0 wrote:
| So now an unreasonable user input will crash my server instead of
| slowing it down by 50ms. Great DoS mitigation!
| Ukv wrote:
| In addition to omnicognate's point, calling `int` on user input
| would generally already expect a possible ValueError.
| omnicognate wrote:
| Your server crashes if a request fails?
| xani_ wrote:
| it does with this change where it didn't before. At the very
| best you're still restarting the whole process instead of
| just wasting a bit of time
| fuckstick wrote:
| Who uses a process per request for serving Python apps?
| That must be very uncommon. Even if you use a worker pool
| that isn't going to restart a whole process just because of
| an errant exception in a request handler.
|
| Also as noted if your whole process crashes because of
| errant input to int() you are beyond fucked in other ways.
| aYsY4dDQ2NrcNzA wrote:
| Then don't upgrade Python in your container?
| progval wrote:
| You should always catch ValueError when using int() on user
| input, because that input may not be a valid number.
| [deleted]
| ridiculous_fish wrote:
| Why is base 10 string -> int a quadratic algorithm?
Are there no
| faster ones that could be implemented?
| blahedo wrote:
| No, because 10 is not a power of 2, so any digit in the source
| (base 10) can affect any digit in the result (base 2).
| Converting from e.g. base 16 to base 2 is linear, because 16 is
| a power of 2.
| saghm wrote:
| I was surprised to see this in a bugfix release since it seems
| like a breaking change, but from reading, it seems that this was
| considered a security vulnerability (specifically a DoS
| opportunity) given the CVE status, so I imagine that
| compatibility concerns were secondary here.
|
| This seems in line with how other languages seem to do things
| from what I've seen; semver is important, but in a sense not
| every change is equally "breaking" to users, and breaking code
| that's unlikely to be common and potentially is not behaving
| correctly in the first place is not going to cause as much
| friction as most other types of breaking changes.
|
| Put another way, if there's a valid security concern, breaking
| things loudly for users forces them to double-check their usage
| of this sort of code and ensure that nothing risky is going on.
| (I don't personally have enough domain knowledge here to know
| if the security concern is actually valid or not, but the
| decision to make this change in a patch release seems like a
| reasonable conclusion to come to for people who determine that
| it is a security concern.)
| bo1024 wrote:
| From the link:
|
| > Everyone auditing all existing code for this, adding length
| guards, and maintaining that practice everywhere is not feasible
| nor is it what we deem the vast majority of our users want to do.
|
| It's hard not to read this as "we want to use untrusted input
| everywhere with no consequences". Seems like we'll be kicking as
| many issues under the rug as we're fixing with this change,
| right?
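[Editor's illustration: the "length guards" the quoted rationale mentions would look something like the wrapper below. `safe_int` and the 4300-digit default are assumptions mirroring CPython's new limit, not an API from the thread.]

```python
def safe_int(s: str, max_digits: int = 4300) -> int:
    """Reject oversized numeric strings before int() does quadratic work.

    This is the kind of manual guard every call site handling
    untrusted input would need without a built-in limit.
    """
    if len(s.strip().lstrip('+-')) > max_digits:
        raise ValueError(f'numeric string exceeds {max_digits} digits')
    return int(s)
```

An application that genuinely needs bigger numbers could raise `max_digits` at one audited call site instead of changing a process-wide setting.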
| bostik wrote: | I read it the other way round - untrusted input is used in | various places where doing such inline checks is prohibitively | tricky. The examples given are quite telling: json, xmlrpc, | logging. First two are everywhere in APIs. The third is just | ... everywhere. | | Are you really going to use a JSON or XML stream parser _first_ | before feeding it to the stdlib module? And one that does not | try to expand the read values to native types? As for logging, | that is certainly the place where you are not only expected, | but often required to use untrusted input. | | The fix feels like a heuristic and a compromise. None of the | [easily available] solutions are robust, solid or performant, | so someone picked an arbitrary threshold that should never be | hit in sane code. | | The linked issue mentions that GMP remains fast even in face of | absurdly big numbers. No surprise, the library is _literally_ | designed for it: MP stands for multi-precision (ie. big int and | friends). | adgjlsfhk1 wrote: | this would all make more sense if python was using a | reasonably fast string to int routine, but the one they are | using is asymptotically bad, and the limit they chose is | roughly a million times lower than it should have been. | rwmj wrote: | Did they consider doing tainting (like Perl)? Input strings are | marked as tainted and anything derived from them, except for | some specific operations that untaint strings. If you use a | tainted string for a security-sensitive operation then it | fails. http://perlmeme.org/howtos/secure_code/taint.html | Dylan16807 wrote: | It's easy for me not to read it that way! Converting to an | integer is a very good start for validating many kinds of | input. | machina_ex_deus wrote: | This is way too low, I've used RSA keys in base 10 with half the | size of this string. It corresponds to only 14,000 bit numbers, | there are 8192 bit keys. I'm pretty sure this will break some CTF | challenges. 
The limit should be in the millions at the very
| least.
| munch117 wrote:
| It does seem very low.
|
| However, you shouldn't be passing million-digit numbers around
| as (decimal) text. Even if you're not at risk of DoS attacks,
| there's still the issue that it's very, very slow:
|
| $ python3 -m timeit -s "s='1'*1000000" "i=int(s)"
| 1 loop, best of 5: 5.77 sec per loop
|
| A ValueError alerting you to that fact could be considered a
| service.
|
| Contrast and compare:
|
| $ python3 -m timeit -s "s='1'*1000000" "i=int(s,16)"
| 200 loops, best of 5: 1.45 msec per loop
| adgjlsfhk1 wrote:
| python being slow isn't news. that's not a reason for an
| error.
| nomel wrote:
| > However, you shouldn't be passing million-digit numbers
| around as (decimal) text
|
| This is about numbers that are thousands of digits, not
| millions. Regardless, why not? What's the alternative that
| supports easy exchange? If you stick it in some hexified
| representation, you still have to parse text, and put it into
| some non-machine-native number container. It's going to be
| slow no matter what.
| blibble wrote:
| you can convert hex into binary directly without any
| multiplications
| munch117 wrote:
| No, it's not going to be slow no matter what. Didn't you
| see my example? The hexadecimal non-machine-native textual
| representation was 4000 times faster than the decimal
| ditto. On a number that was much larger, I might add.
|
| Hex number parsing is linear time.
| schoen wrote:
| I could imagine people overlooking that little "m" in
| your example's output!
| nomel wrote:
| Indeed I did!
| im3w1l wrote:
| This will break correct code for a fairly small benefit. I don't
| think they should do this in a patch release.
| [deleted]
| [deleted]
| gfd wrote:
| Why did they close the discussion due to code of conduct? I
| didn't see anything wrong with the previous comments before that
| point.
| klodolph wrote:
| > As a reminder to everybody the Python Community Code Of
| Conduct applies here.
|
| > Closing. This is fixed. We'll open new issues for any follow
| up work necessary.
|
| The issue was marked closed because the associated work was
| completed and the PR was merged. The same comment happened to
| mention the code of conduct, but the code of conduct wasn't why
| the issue was closed--it was just because the work was done.
|
| I think the comment mentioned the CoC because the previous
| comment, "This is appalling", was a bit rude.
| Delk wrote:
| > I think the comment mentioned the CoC because the previous
| comment, "This is appalling" was a bit rude.
|
| The previous comment was indeed a bit rude. I personally
| wouldn't think it was rude enough to invoke a code of
| conduct.
|
| Even just referring to a code of conduct has, IMO, a rather
| strong vibe of policing and perhaps even an implication of
| wrongdoing, more so than merely a suggestion to keep it calm.
|
| I don't know the culture or context of Python development
| (either the language or CPython), but I'm inclined to agree
| with gfd that it's a bit weird to start reminding people of a
| CoC because of a slightly rude sentence or two, especially
| since the rest of the comment was reasonable technical
| argumentation even if unapologetic.
|
| Even if closing the issue were entirely because of other
| reasons and benign (someone did still reference the issue in
| a commit later, though), it's all too easy to see the
| issue-closing comment as shutting out dissenting opinions,
| either because of a somewhat unpleasantly expressed argument
| or simply because "this is fixed, no further discussion
| needed".
|
| The "this is appalling" comment may have been a bit rude but
| the closing one wasn't exactly a triumph in communication
| either.
| Guthur wrote:
| "This is appalling" is not even remotely rude, honestly are
| we all children now?
| blibble wrote:
| your new comment violates the PSF "code of conduct" too!
|
| this particular wording could be used to ban any
| criticism of contributions (regardless of the criticism's
| correctness):
|
| > Being respectful. We're respectful of others, their
| positions, their skills, their commitments, and their
| efforts.
|
| in this sort of environment I guess it's far from
| surprising that the technical decisions are suffering (to
| put it politely)
| klodolph wrote:
| > Even just referring to a code of conduct has, IMO, a
| rather strong vibe of policing and perhaps even an
| implication of wrongdoing, more so than merely a suggestion
| to keep it calm.
|
| I'd say the opposite. A suggestion to "keep it calm" is
| inappropriate, because it carries the implication that
| someone is not calm. This is inappropriate because it is a
| comment on a person's emotional state rather than on what
| they say or how they say it.
|
| In fact, if someone on my team said to "keep it calm", I'd
| take that person aside and explain, in private, the reasons
| why not to say that.
|
| > Even if closing the issue were entirely because of other
| reasons and benign (someone did still reference the issue
| in a commit later, though), it's all too easy to see the
| issue-closing comment as shutting out dissenting opinions,
| [...]
|
| If somebody thought that closing the issue shut out
| dissenting opinions, then that person has forgotten how
| GitHub issues work or how bug trackers work in general.
| Closing an issue just means that someone thinks that the
| work on it is done; it does not stop discussion on the
| issue. I can see why someone might forget and not realize
| that the issue was closed and _not_ the discussion, but I
| don't think that it's a problem that someone visiting the
| bug from HN would forget how GitHub issues work for a
| minute.
| | With any online community above a certain size, there's a | certain amount of policing not just of what is said, but | where people have discussions. Anyone who regularly uses a | forum, Subreddit, Discord server, IRC, Slack, etc. will see | this pattern of behavior everywhere. For example--the | discussion about whether this is the right way to fix a bug | is a discussion which should be held elsewhere, where | people can see the context and interested parties can | respond to it. | | Which is why there is a comment at the bottom, | | > Please redirect further discussion to discuss.python.org. | | It's crystal clear to me that this is not about shutting | out dissenting voices, but just saying that this GitHub | issue is the wrong place for this discussion. | | You can see that there is a related issue which was closed, | but there was a lot of discussion afterwards--but because | the discussion was on-topic, the issue was not locked. | | https://github.com/python/cpython/issues/90716 | Delk wrote: | > I'd say the opposite. A suggestion to "keep it calm" is | inappropriate, because it carries the implication that | someone is not calm. | | Perhaps a suggestion to "keep it calm" wouldn't be the | best. English isn't my first language and my verbal | expression isn't always the greatest. But referring to a | code of conduct does also carry the implication that | someone isn't minding that code, and I don't see how that | would necessarily be better. | | In my view, suggesting that someone isn't calm is less of | a reprimand than suggesting they might be in breach of a | code of conduct which, among other things, includes rules | against outright harassment and other clearly | reprehensible behaviour. It's normal to not be calm at | times; it's another thing if someone needs to be reminded | of the rules of a community. Perhaps it's a cultural | thing but to me the latter is stronger judgement. 
| | There may well be reasons for not saying to keep it calm | (it sometimes simply doesn't work), but I can equally | well see how people might see a reference to a CoC as | strong-armed. | | > If somebody thought that closing the issue shut out | dissenting opinions, then that person has forgotten how | GitHub issues work or how bug trackers work in general. | Closing an issue just means that someone thinks that the | work on it is done; it does not stop discussion on the | issue. | | That's fair enough. Perhaps the intention is clear enough | within the community that it would indeed be deemed as | simply closing that rather specific GitHub issue without | implying that the matter is closed. | | Human communication isn't always quite that simple, | though. People get impressions from the way things are | expressed. "This is fixed." makes it feel that there is | nothing to be discussed about that particular change and | that it is final. | | I don't know the particular community well enough to know | how it would be interpreted, though. | | > Which is why there is a comment at the bottom, | | >> Please redirect further discussion to | discuss.python.org. | | That's after the comment that closed the issue. Had it | been in the issue-closing comment, that would have left a | different taste to the closing. | googlryas wrote: | For anyone wondering, '1' * 4301 creates a string of '11111....' | 4301 characters long. It doesn't result in an integer value of | 4301 like in some other languages. | | I find this a strange modification to the language, though | probably not a particularly painful one. Has python saved you | from yourself when dealing with non-linear built-in algorithms | before? 
IIRC it is also possible to have the regex engine take an
| inordinate amount of time for certain matching constructs (I
| think Stack Overflow was affected by this?), but the engine
| wasn't hobbled to throw in those cases; it is merely up to the
| user to write efficient regexes that aren't subject to those
| problems.
| ffhhj wrote:
| They should have made the analogous inverse operation:
|
| '1234' / 2 = ['12', '34']
| bsdz wrote:
| I was more expecting '1111' / 4 = '1'. This would be the
| inverse operation. However, it opens up even more questions,
| like what to do if your string has mixed values, etc.
| ffhhj wrote:
| The string multiplication is about _joining_ strings, the
| inverse is about _splitting_ them into several parts. It's
| only confusing because the * appends the string to itself;
| the / is actually very clear.
| dekhn wrote:
| Disagree. The inverse of "string" * value is logically
| splitting, _and then collapsing the repeated values_. The
| logical split can be omitted, but the collapsing cannot.
| [deleted]
| tremon wrote:
| That's not the inverse of the multiplication though. The
| inverse would be '33' / 2 = '3', and '1234'/2 should then
| probably raise a ValueError.
| hyperpape wrote:
| Backtracking regular expressions as an intentional or
| accidental DoS vector are a moderately well-known issue, and
| while I prefer that a standard library implementation be robust
| against them, I can see the POV that it's buyer beware.
|
| Converting a string to an integer is somewhat less well known
| as a DoS vector, more painful to avoid as an application
| creator, and easier to fix in code.
|
| So there's a cost-benefit argument that you should just do this
| before you rewrite your regex engine.
| masklinn wrote:
| > I can see the POV that it's buyer beware.
|
| On the other hand, lots of buyers are not aware that it's an
| issue, and more frustratingly there are regex engines which
| are very resilient to it... but are not widely used.
|
| Python's stdlib will fall over on any exponential
| backtracking pattern, but last time I tried to make postgres
| fall over I didn't succeed. Even though it does have
| lookahead, lookbehind, and backrefs, so it should be
| susceptible to the issue (aka it's not a pure DFA).
| bo1024 wrote:
| This does seem like a strange level of handholding, even if the
| motivation makes lots of sense. If you start going down the
| road of protecting people who don't sanitize user input, you
| may have quite a long journey ahead...
| mjevans wrote:
| Operator overloading sure seems to increase the prevalence of
| foot-guns, security issues, and other gotchas.
|
| str.ccClone(4301)  # ConCatenate Clones of the source string N times.
|
| Would even an abbreviated, named function not be more
| self-documenting and better for human and machine reviews?
| proto_lambda wrote:
| Other than that being a terrible name (it's almost impossible
| to be sure what it does without consulting documentation), I
| personally do prefer fewer implicit/overloaded operations.
| mjevans wrote:
| What name would you suggest? That was my 5 min of thought
| version.
|
| cc prefix for concatenate, because that word is very long
| and it seemed likely that strings may have a large number
| of different concatenation-focused functions that could all
| share the prefix.
|
| Clone as the type of concatenation operation to perform.
| proto_lambda wrote:
| Rust uses `repeat()`, which sounds much more descriptive
| to me. The types in the function signature make the
| "clone" part of the name redundant.
| mjevans wrote:
| Offhand, is repeat(0) an empty string, repeat(1) the
| input string, etc.? If so, that's a great name for the
| function.
| pezezin wrote:
| Repeat is an iterator, so you can apply it to any type
| you want, not just strings. You can chain it with other
| iterators, or collect it into some data structure. But
| yes, repeat(0) returns an empty iterator.
|
| https://doc.rust-lang.org/std/iter/fn.repeat.html
| slaymaker1907 wrote:
| I think how Rust does it is fine, but I agree operators are
| often a mess. Yesterday I was looking at a memory dump where
| there was a problem in a destructor (a double free was
| detected) and it was an absolute mess trying to figure out
| the exact execution location in source code since it was
| setting the value of a smart pointer which triggered a
| decrement of a reference counted value in turn triggering a
| free. It's junk like that which starts to convince me that
| Linus was right to avoid C++. Rust obviously also has
| destructors, but it doesn't have the nightmare that is
| inheritance+function overloading+implicit casting.
| cma wrote:
| > and it was an absolute mess trying to figure out the
| exact execution location in source code since it was
| setting the value of a smart pointer which triggered a
| decrement of a reference counted value in turn triggering a
| free.
|
| Isn't all that context there in the stack trace?
| jlarocco wrote:
| Yes, probably. Depends on the compiler settings. Stuff
| can get optimized out and stripped.
|
| When writing the code in the first place, though, it's
| difficult to see problems like that because it's all
| hidden behind magic calls to copy constructors, move
| semantics, and destructor calls. Out of sight, out of
| mind.
| DSMan195276 wrote:
| I think it's separate from his point, but some of those
| things could potentially be tail calls, meaning the
| functions actually leading to the free/delete might not
| be in the stack trace even if they were called.
| UncleEntity wrote:
| It is really useful sugar for:
|
| llama = []
| for _ in range(4301):
|     llama.append('1')
|
| (there's probably an easier way to do that but you get the
| point)
|
| where python can see both sides of the operation and optimize
| it on the C side of things.
|
| The issue really has nothing to do with that though, it is
| converting a string to an int which is the whole point of the
| security update.
| Gordonjcp wrote:
| > Operator overloading sure seems to increase the prevalence
| of foot-guns, security issues, and other gotchas.
|
| How exactly? What would you expect an expression like ('1' *
| 4301) to give you, and why would you think it would be
| different from ('caterpillar' * 4301)?
| qayxc wrote:
| Well, let's assume that the "expected" behaviour holds,
| shall we? Let's open up a python REPL and try
|
| >>> 'caterpillar' * 2
| 'caterpillarcaterpillar'
|
| OK, now for something different:
|
| >>> [1, 2, 3] * 2
| [1, 2, 3, 1, 2, 3]
|
| Marvellous! How about this then:
|
| >>> True * 2
| 2
|
| Wait, what? Hm.
|
| >>> False * 2
| 0
|
| Whoops! Implicit type conversion takes place... Even worse:
|
| >>> 'abc' + 'efg'
| 'abcefg'
| >>> 'efg' + 'abc'
| 'efgabc'
|
| Now I'm stumped. Isn't addition supposed to be commutative?
|
| So yeah, without contracts in place, operator overloading
| is BAD. You can never know what the operator does, or what
| its properties are, by just looking at how it's used.
| There are simply no enforced rules and so no-one's stopping
| you from doing
|
| >>> class Complex:
| ...     def __init__(self, real, imag):
| ...         self.real = real
| ...         self.imag = imag
| ...     def __add__(self, other):
| ...         return Complex(self.real - other.real,
| ...                        self.imag - other.imag)
| ...     def __repr__(self):
| ...         return f'Complex({self.real}+{self.imag}j)'
| ...
| >>> x = Complex(1, 2)
| >>> y = Complex(1, 2)
| >>> x + y
| Complex(0+0j)
|
| Now this is intentionally malicious of course, but
| plenty of libraries overload operators in non-intuitive
| ways so that the operator's properties and behaviour aren't
| obvious. This is especially true if commutative operators
| are implemented as being non-commutative (e.g. abusing '+'
| for concatenation instead of using another symbol like '&'
| for example) or if the behaviour changes depending on the
| order of operands.
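[Editor's note: the stock-interpreter behaviors walked through above can be verified directly; these are editor-added checks of standard Python semantics, no custom classes involved.]

```python
# Standard CPython operator behaviors referenced in the thread:
assert 'caterpillar' * 2 == 'caterpillarcaterpillar'  # str.__mul__ repeats
assert [1, 2, 3] * 2 == [1, 2, 3, 1, 2, 3]            # list.__mul__ does too
assert True * 2 == 2 and False * 2 == 0               # bool is an int subclass
assert 'abc' + 'efg' == 'abcefg'                      # + concatenates strings...
assert 'abc' + 'efg' != 'efg' + 'abc'                 # ...and is not commutative
```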
| samatman wrote:
| In Lua, the first is 4301 and the second is a runtime
| error. ('1' .. 4301) is "14301"; the equivalent of the weird
| thing Python is fixing would be spelled
| `tonumber(('1'):rep(4301))`, which is obviously wrong.
|
| To my taste operator overloading is fine, but concatenation
| isn't addition, so they shouldn't share an operator,
| because... [gestures vaguely at a half-dozen languages]
| im3w1l wrote:
| Succinct string operations are honestly like half of what I
| use Python for, and the great numeric support, with bignum by
| default and powerful libraries with overloads like numpy and
| tensorflow, is the other half.
| jejones3141 wrote:
| In Algol 68 you can do that; it's part of the standard
| prelude. I think that some people who'd worked on Algol 68 in
| the Netherlands also worked on the ABC language, where it's
| "1" ^^ 4301, and Guido worked on ABC before Python.
| gsliepen wrote:
| Well, in C++, int('1' * 4301) is a perfectly valid expression,
| but it evaluates to 210749, not 4301.
| oldgradstudent wrote:
| Or some other value.
|
| If sizeof(int) == 2, the result is undefined.
| gsliepen wrote:
| Not if CHAR_BIT is 10 or more!
| oldgradstudent wrote:
| I wonder how much software will fail on platforms where
| CHAR_BIT is not 8.
| eMSF wrote:
| Whether evaluating that expression results in undefined
| behaviour also depends on the basic execution character set
| and the bit width of the machine byte.
| dark-star wrote:
| it doesn't evaluate to 4301 in Python either ;-)
| Phil_Latio wrote:
| What's next? A default socket timeout of X seconds for
| security reasons? What a joke, and rather scary that
| apparently everyone, or at least the majority on the internal
| side, agrees with this change.
| linspace wrote:
| I find it completely unpythonic. Python has become too
| important to do the right thing; there is money on the table.
| LtWorf wrote:
| I think Python is now completely owned by a couple of big
| companies that decide everything.
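gsliepen's C++ arithmetic can be double-checked from Python: in C++ the char '1' promotes to its code point (49 on ASCII platforms), so the multiplication is plain integer math, whereas Python's `*` repeats the string. A small sketch:

```python
# C++ view: '1' promotes to int 49, so int('1' * 4301) == 49 * 4301
assert ord('1') == 49
assert ord('1') * 4301 == 210749

# Python view: '1' * n is string repetition, and int() then parses the
# digits -- the very conversion whose length the CVE fix caps at 4300.
assert '1' * 4 == '1111'
assert int('1' * 4) == 1111
```

(The repetition is kept short here on purpose: under the new default limit, `int('1' * 4301)` raises ValueError on patched interpreters.)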
| | By this logic they should also block me from running benchmarks
| on too-big lists, because I'm DoSing myself.
| krick wrote:
| This. I don't really understand the CPython decision-making
| process, but it just seems like common sense that anybody who
| would find this a good idea surely must be a very junior
| developer who shouldn't be allowed to commit directly to the
| master branch of your local corporate project just yet... But
| basically breaking perfectly logical behaviour just like that,
| in a language used by millions of people... To me it's
| absolutely shocking.
| loeg wrote:
| Will Python's relentless campaign to break backwards
| compatibility never end? (80% sarcastic.)
| klyrs wrote:
| Don't worry, it's a minor release. (110% sarcastic)
| tremon wrote:
| It's a patch release, not even minor (100% serious).
| mywittyname wrote:
| What should you use instead if you want the original
| functionality?
| Veedrac wrote:
| https://docs.python.org/3/library/stdtypes.html#configuring-...
| mywittyname wrote:
| If I'm understanding this correctly: the only way to convert
| an extremely large base-10 string to an integer using the
| standard library is to muck with global interpreter settings?
|
| It seems short-sighted not to provide some function that
| mimics the legacy functionality exactly, even something
| like int.parse_string_unlimited(). Especially since a random
| library can just set the cap to 0 and side-step the problem
| entirely.
| Someone wrote:
| > Especially since a random library can just set the cap to
| 0 and side-step the problem entirely.
|
| Until another random library sets it to its preferred value
| (see https://news.ycombinator.com/item?id=32738206 for a
| similar issue with a CPU flag for supporting IEEE
| subnormals).
|
| We might end up with libraries that keep setting that
| global to the value they need on every call into them.
| mywittyname wrote:
| Oh fun. Just what Python needs more of, this...
|         try:
|             value = int(value_to_parse)
|         except ValueError:
|             import sys
|             __old_int_max_str_digits = sys.get_int_max_str_digits()
|             sys.set_int_max_str_digits(0)
|             value = int(value_to_parse)
|             sys.set_int_max_str_digits(__old_int_max_str_digits)
|
| Or maybe just this:
|
|         class UnboundedIntParsing:
|             def __enter__(self):
|                 self.__old_int_max_str_digits = sys.get_int_max_str_digits()
|                 sys.set_int_max_str_digits(0)
|                 return self
|
|             def __exit__(self, *args):
|                 sys.set_int_max_str_digits(self.__old_int_max_str_digits)
|
|         with UnboundedIntParsing() as uip:
|             value = int(str_value)
| dmurray wrote:
| Needs to be made thread safe!
| js2 wrote:
| 4300 digits?
|
| > Chosen such that this isn't wildly slow on modern hardware
| and so that everyone's existing deployed numpy test suite
| passes before https://github.com/numpy/numpy/issues/22098 is
| widely available.
|
| https://github.com/python/cpython/blob/511ca9452033ef95bc7d7...
___________________________________________________________________
(page generated 2022-09-07 23:01 UTC)