[HN Gopher] 5% of 666 Python repos had comma typo bugs (inc V8, ...
___________________________________________________________________
5% of 666 Python repos had comma typo bugs (inc V8, TensorFlow and PyTorch)

Author : rikatee
Score  : 238 points
Date   : 2022-01-07 17:00 UTC (6 hours ago)

(HTM) web link (codereviewdoctor.medium.com)
(TXT) w3m dump (codereviewdoctor.medium.com)

| usrbinbash wrote:
| Literally the second item in the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/):
|
| _Explicit is better than implicit._
|
| And yet, s = ["one", "two" "three"] will implicitly and silently do something, that is probably wrong most of the time.
| dpedu wrote:
| Hmmm, it sounds like you're expecting "two" and "three" to be separate list elements because of some sort of implicit behavior due to being written in a list context. This is the opposite of what "Explicit is better than implicit" means.
|
| This is a list and you must explicitly place a comma when you want to start a new element in the list. Is there ever a time a new element follows a previous one and is NOT separated by a comma? No, this is explicit.
|
| Whereas, strings also always concatenate in this manner be it in a list context or not. It seems like you're assuming behaviors from other languages would be the same in another.
| matsemann wrote:
| No, we don't want it to implicitly be a list item. We want it to fail as invalid syntax. If I wanted the two and three strings to be combined, I would have /explicitly/ used an operator for that. It's the implicit behavior of that which is the problem.
| fantod wrote:
| Not to mention the implicit string concatenation that you get instead.
| ReleaseCandidat wrote:
| > it sounds like you're expecting "two" and "three" to be separate list elements
|
| I'd expect that to be an error.
| lijogdfljk wrote:
| Funny enough, in dynamic languages i expect it to do something unexpected and unwanted.
|
| This is why i like Go/Rust. I detest the implicit warts of these languages.
| wott wrote:
| It's not related to being dynamic or not, it's a syntactical choice: that's also the way to concatenate string literals in C.
| ReleaseCandidat wrote:
| Well, there are dynamic languages and dynamic languages. There are Python and Ruby and there are Elixir, Erlang and Lisps.
| aylmao wrote:
| Ah yes, why would anyone expect lists' main purpose to be listing?
|
| Sarcasm aside, I'd assume people primarily _list_ things in between [ and ], and _sometimes_ concatenate things in there too. The language should err on the side of doing what people expect, unless explicitly told not to.
|
| > It seems like you're assuming behaviors from other languages would be the same in another.
|
| Rather, I think people expect a language, especially one this big and important, to work for them, and not to be designed with unergonomic features instead.
| twobitshifter wrote:
| I could see lisp programmers missing the commas out of muscle memory
| doubleunplussed wrote:
| Your sarcasm is misplaced. I would prefer a SyntaxError to either of the implicit behaviours.
| bokchoi wrote:
| I'm not a python programmer, but the implicit string concatenation seems surprising to me.
| Dudeman112 wrote:
| I'm not a python programmer either, but I would be _seriously_ annoyed at implicit anything instead of syntax error
| kevin_thibedeau wrote:
| It's idiomatic in C.
| rat9988 wrote:
| This is not what implicit is about.
| ianbicking wrote:
| Implicit concatenation sure seems implicit to me
| suifbwish wrote:
| Implicit things are rarely nice in code for production environments. It makes bug tracing and security much more complicated
| oaiey wrote:
| This is indeed the point. Some use cases are amazing and increase quality while others are just pure evil.
| [deleted]
| jstx1 wrote:
| I mean the zen being wrong is kind of a meme at this point.
| The whole "only one obvious way to do it" isn't just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.
| lenkite wrote:
| Python finally ended up following Perl's TMTOWTDI motto! https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to...
| egeozcan wrote:
| > Complex is better than complicated
|
| What? Something being complex is artificial, we try to avoid it. Problems can be complicated, we try to simplify them, and the more complicated the problem is, we tend to develop more complex solutions. So comparing them does not make sense?
|
| Or did I always know them wrong?
| stonemetal12 wrote:
| Complex: consisting of many different and connected parts.
|
| Complicated: consisting of many interconnecting parts or elements; intricate.
|
| Nothing specifically artificial about either one. Software that is well decomposed is Complex (made of many smaller connected parts). Software that is poorly decomposed is Complicated (made of many smaller interconnected parts).
|
| Connected vs interconnected?
|
| Interconnected: connected at multiple points or levels (aka spaghetti code)
| dylan604 wrote:
| Complicated: this mutha is hard all by itself
|
| Complex: we took all of these simple steps, lumped them together, now we have this
| hibrass wrote:
| I first encountered the notion of complex/complicated in Antifragile I believe, and IIRC it's based on the [Cynefin framework](https://en.wikipedia.org/wiki/Cynefin_framework).
|
| My understanding is that:
| * Complex domains lend themselves to experimentation and emergent behavior.
| * Complicated domains lend themselves to analysis, expertise, and rule following.
|
| The Wikipedia article offers the domains as containing "unknown unknowns" and "known unknowns" respectively.
|
| I'm trying to think how this maps to Python -- the language is complicated, while the problems we're solving are expected to be complex? Or, maybe, the language lives at the boundary between complicated and complex. We push complicated procedures into the language, and let the programmers deal with complex issues?
| nighthawk454 wrote:
| It's not particularly well-worded. A lot of dictionaries list complex/complicated as synonyms.
|
| I always took it to mean 'complex' as in having many connected parts, and 'complicated' more as in over-complicated or convoluted - the opposite of 'simple'. In other words, breaking something complicated into a system of intentionally-designed pieces is probably better than a chunk of opaque code to brute-force the current case. A good system is probably also 'simpler', despite having more pieces and interconnects.
| pmarreck wrote:
| Except exit.
|
| I knew Python wasn't for me in my first foray into it when I fired its REPL and then went to exit it with control-C or whatever and it _literally printed out the right way to do it but then didn't do it._ Python was more interested in having me do things a certain way _even when it knew what I intended to do, just to be a twit_.
| jrockway wrote:
| The REPL prints the value of a variable that you type in. exit is a variable, and so the REPL prints its value. If you want to run it as a function, you can do that, and indeed its string value is a message telling you to do that.
|
|     $ python3
|     Python 3.9.2 (default, Feb 28 2021, 17:03:44)
|     [GCC 10.2.1 20210110] on linux
|     Type "help", "copyright", "credits" or "license" for more information.
|     >>> exit
|     Use exit() or Ctrl-D (i.e. EOF) to exit
|     >>> exit.eof
|     'Ctrl-D (i.e. EOF)'
|     >>> exit.name
|     'exit'
|     >>> exit = 42
|     >>> exit
|     42
|     >>> exit()
|     Traceback (most recent call last):
|       File "<stdin>", line 1, in <module>
|     TypeError: 'int' object is not callable
|     >>>
|
| I would have special-cased exit, though.
| animal_spirits wrote:
| Ctrl-c raises a KeyboardInterrupt error, which is useful for programs to catch. If you type
|
|     >>> exit
|     Use exit() or Ctrl-D (i.e. EOF) to exit
|
| You will get that error response. The goal of this is to have the REPL language the exact same as the scripting language. exit() is supposed to be called as a function to make the language more consistent, so just typing `exit` will do nothing
| solox3 wrote:
| Notice that, in the original quote,
|
|     There should be one-- and preferably only one --obvious way to do it.
|
| the author used two different ways of hyphenating (three, if you count the whole PEP 20). PEP 20 is clearly not meant to be taken as law. Nor PEP 8. Nor PEP 257.
|
| People frequently mistake "one obvious way" with "one way". There are lots of ways to iterate through something, for example, but there is really one obvious way. And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing. I think that is the more appropriate takeaway from PEP 20.
| jstx1 wrote:
| > And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing.
|
| I don't get what you mean by this.
|
| When I read someone else's code, what is obvious to me isn't necessarily what was obvious to the author. For an illustration of this, have a look at the day 1 solution thread from this year's Advent of Code - https://www.reddit.com/r/adventofcode/comments/r66vow/2021_d... (you can search for Python solutions) - and see how many different ways there are to solve a fairly straightforward problem.
| srcreigh wrote:
| The author uses "only one" to clarify "one". So obviously "one" means at least one.
|
|     There should be at least one-- preferably only one --obvious way to do it.
|
| Kinda funny meta joke considering everybody conflates "one" and "only one" to mean the same thing. Preferably there would only be one obvious way to describe "one". :p
| bilalq wrote:
| It's not even obvious how to run Python or dependencies in the first place. Even putting aside the 2.7/3.x fiasco (that still causes problems even today), you're left with figuring out wheel vs egg vs easy-install vs setuptools vs poetry vs pip vs pip3 vs pip3.7 vs pip3.8 vs piptools vs conda vs anaconda vs miniconda vs virtualenv vs pyenv vs pipenv vs pyflow.
| dylan604 wrote:
| it's like you read my mind.
| JasonFruit wrote:
| I suspect this comment was an elaborate nerdsnipe.
| dragonwriter wrote:
| > the author used two different ways of hyphenating
|
| No, first, it doesn't use _hyphenating_ at all, it uses hyphens as an ASCII approximation for typographical dashes used to set off a phrase (a distinct function from hyphenation), and, second, in that quote they used one way of doing it: "two dashes set closed on the side of the main sentence and set open on the side of set-off phrase".
|
| It is an _unusual_ way of doing it--just as with actual typographical dashes, setting open or closed symmetrically would be more common--but it's not two ways.
|
| EDIT: And the third use (in the heading and later in the body) is separating parts where neither is a mid-sentence appositive phrase, and uses open-on-both sides. So that's not a different way of doing the _same_ thing, it's a different way of doing a semantically different thing.
|
| Actually, I think the dash use makes a good illustration of how the "it" in "one way to do it" is intended.
| Wowfunhappy wrote:
| > "two dashes set closed on the side of the main sentence and set open on the side of set-off phrase".
|
| Eh, I don't think that's the interpretation the author was going for. The author wanted to show two different ways of approximating a dash, and he had limited options.
|
| If he'd done this-- for example-- he would have been showing one way, not two.
|
| If he'd done this --for example-- you would have called it "two dashes set open on the side of the main sentence and set closed on the side of set-off phrase".
|
| If he'd done this-- for example -- it would have been too obvious (on the same line).
|
| I _suppose_ he could have done this-- for example--but I still think that would have been too obvious. You're not supposed to see it on a first read.
|
| > And the third use (in the heading and later in the body) is separating parts where neither is a mid-sentence appositive phrase, and uses open-on-both sides. So that's not a different way of doing the same thing, it's a different way of doing a semantically different thing.
|
| It's a different use of a dash, but it's still a place where you'd typically use a dash.
|
| -----
|
| Edit: You know what, thinking about it again--perhaps both interpretations are valid. That almost adds to the effectiveness of the whole thing.
| xigoi wrote:
| I can think of at least 2 obvious ways to iterate through something: for loops and comprehensions.
| voltagedivider wrote:
| You're right that both iterate through something but `for` loops and comprehensions aren't used as if they were interchangeable.
|
| For example, you'll sometimes see people do bad stuff like this:
|
|     >>> lst = []
|     >>>
|     >>> [lst.append(i + i) for i in range(10)]
|     [None, None, None, None, None, None, None, None, None, None]
|     >>>
|     >>> lst
|     [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
|     >>>
|
| When they should be doing this:
|
|     >>> lst = []
|     >>>
|     >>> for i in range(10):
|     ...     lst.append(i + i)
|     ...
|     >>> lst
|     [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
|     >>>
|
| Or just this:
|
|     >>> lst = [i + i for i in range(10)]
|     >>>
|     >>> lst
|     [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
|     >>>
| a_t48 wrote:
|     lst = [range(0, 10, 2)]
| orlp wrote:
| That's wrong in multiple ways. You want
|
|     lst = list(range(0, 20, 2))
| a_t48 wrote:
| Ohh, yeah, you're right.
| jstx1 wrote:
| The first append version will more often be in a loop. It's unlikely that someone will know enough to use comprehensions but not enough to still use append.
| voltagedivider wrote:
| Agreed. I've mainly seen the first `append` version in code written by people who've just discovered comprehensions and code golf.
| dragonwriter wrote:
| To _generate a list/dictionary/generator from an input iterable_, you use a comprehension of the appropriate type.
|
| To iterate through it _without_ doing one of those things, you use a for loop.
|
| In "one obvious way to do it", "it" refers to a concrete task; the same is not necessarily intended to be true of arbitrarily broad generalizations of _classes_ of tasks.
| Quekid5 wrote:
| It's sort of like the Unix Philosophy. It sounds good and is probably a good thing to strive for generally, but it's ultimately pointless when it comes to actually evaluating whether approach A is better than approach B.
| fault1 wrote:
| the zen of python was written in the 90s.
|
| from that context it makes sense, because the only goal of python in the 1990s was to be more popular than perl, which was notorious in having many ways of doing the same thing.
|
| but yeah, python has had significant feature creep over the years, it's nowhere near the small clear lang it used to be.
| andi999 wrote:
| And still no expressive switch/case statement, breaking out of loops and ending scripts early (for explorative programming).
| jstx1 wrote:
| > And still no expressive switch/case statement
|
| There's match/case in 3.10 - https://www.python.org/dev/peps/pep-0636/
| gabagool wrote:
| >no expressive switch/case statement
|
| match/case (not a drop in switch statement)
|
| >breaking out of loops
| break
|
| >ending scripts early (for explorative programming)
| exit() or sys.exit()
| [deleted]
| oblvious-earth wrote:
| It was a meme when Zen was written, the spaces around the em dash are handled 3 different ways. Twice in the line you abbreviated, removing the joke.
| savant_penguin wrote:
| Matplotlib is an example of a library with at least two "correct" ways of plotting
| jstx1 wrote:
| But only one of them is recommended - the one that makes less sense.
| jofer wrote:
| How is working with figure and axes objects the one that makes less sense?
|
| Is it really that crazy to set up a figure, axes on that figure, and plot on the axes, returning an artist object for each plotting command?
| fault1 wrote:
| one is more or less based on matlab's plotting procedures, the other is an attempt at a cogent implementation of an OOP implementation. However, the OOP paradigm just doesn't seem very good for plotting.
|
| Personally, I like plotting in R way better than in python. It has a lot better developer UX.
| jstx1 wrote:
| Yes, it is crazy. I guess this isn't really the place for it but ... From the official docs:
|
|     The Figure is the final image that may contain 1 or more Axes.
|     The Axes represent an individual plot (don't confuse this with the word "axis", which refers to the x/y axis of a plot).
|
| This is infuriatingly bad and I firmly believe that it makes sense only to people who already know how it works. There's an image, axes (this word alone is a crime), plot, figure... it's like they took a bunch of synonyms and arranged them randomly to put together an API.
| dylan604 wrote:
| >axes (this word alone is a crime),
|
| why so?
| you prefer something like axiis?
| jstx1 wrote:
| See, that's the thing:
|
| > Axes object is the region of the image with the data space.
|
| In matplotlib axes is not the plural of axis. It has its own meaning specific to the API. And at the same time it's the plural form of another word (axis) which is also relevant in this context and it sounds almost identical when pronounced.
| marcosdumay wrote:
| I dunno. One sets global values everywhere, then collects them all into a plot. The other creates a bunch of apparently disconnected objects, sets a bunch of different attributes on each one, and then gets the plot from one of those objects.
|
| If I was designing something like it, I wouldn't recommend either. The global one has many fewer WTFs per character, but the objects one looks like it works in a multithreaded program or that you can create more than one plot without displaying them (but I've never tested this).
| andi999 wrote:
| Which two ways?
| jstx1 wrote:
| Object-oriented vs Pyplot - https://matplotlib.org/matplotblog/posts/pyplot-vs-object-or...
| webmaven wrote:
| _> I mean the zen being wrong is kind of a meme at this point. The whole "only one obvious way to do it" isn't just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of._
|
| Not in comparison to Perl, which usually has multiple ways to do anything, each 'obvious' to different sets of people (each Perl codebase therefore seems to have a distinct dialect based on which 'obvious' alternatives are chosen).
|
| The other direction languages can take that is being contrasted, is there being one non-obvious way to do something.
|
| Python's 'most obvious way' isn't necessarily the fastest/most concise/most efficient/scalable/etc. way to do something in Python, but it will usually be obvious to most Python developers.
| And although broad styles have certainly developed over time (imperative, functional, OO) as Python has gained power and flexibility, the dictum still largely holds true.
| hnlmorg wrote:
| 10 years ago I'd have agreed with you. But Perl has gone a long way in pulling back from some of that insanity while Python has been giving C++ a run for its money in terms of features.
| onphonenow wrote:
| I'd totally agree - there's been a burst of sort of the perl style stuff (:= ?) to gain relatively small wins.
|
| ie, instead of
|
|     for line in lines: print(line)
|
| we are supposed to be using
|
|     while line := f.readline(): print(line)
|
| I've not been super impressed with this type of thing.
|
| That said, string formatting is better with f strings.
|
| They also rolled back some of the forced breakage from trying to force unicode with 3 which made a big difference. 3.3 added back u''
|
| Lots of good cleanups lstrip vs removeprefix etc.
|
| Underscores in numeric literals (10000000 vs 10_000_000)
|
| So lots of good stuff still landing.
| dragonwriter wrote:
| > ie, instead of
|
| > for line in lines: print(line)
|
| > we are supposed to be using
|
| > while line := f.readline(): print(line)
|
| No, we're not. Walrus, in loops, IME, is more for replacing this pattern:
|
|     while True:
|         myvar = get_it()
|         if not ok(myvar):
|             break
|         # code that uses myvar
|
| with this pattern:
|
|     while ok(myvar := get_it()):
|         # code that uses myvar
| onphonenow wrote:
| False. I have been harshly attacked here on HN for suggesting things like for line in lines - literally been called "stupid".
|
| I'm not the only one who looked at the recommended examples of the use case here and went, huh?
|
| https://news.ycombinator.com/item?id=17450890
|
| Recommended new way:
|
|     if any(len(longline := line) >= 100 for line in lines):
|         print("Extremely long line:", longline)
|
| Old way:
|
|     for line in lines:
|         if len(line) >= 100:
|             print("Extremely long line:", line)
|             break
|
| I prefer the old way. These were examples in the PEP!
|
| In your example get_it() might be better as a generator or iterable. A lot of code looks great if you push that type of thing down a bit, and sometimes memory is helped as well. Then you iterate over it, for values in get_it. This keeps python very natural. You start to get a lot of weird line noise type code with := vs the old python style which while a bit longer was basically pseudo-code.
| oaiey wrote:
| I am a bit in shock. Accidental string concatenation. Python just lost a lot of reputation in my brain.
| atleta wrote:
| Not sure if it's irony or not. After all, this is not really accidental string concatenation but an easy to make type error which can go undetected due to the dynamic typing (and the lack of thorough type annotation in most code).
|
| The string concatenation in itself should not be a problem as it's really just string constants. (But again, it might be irony exactly because of this :) )
| ErikCorry wrote:
| In most languages an array with 3 elements has the same type as an array with 2 elements so the type system isn't going to warn you about the difference between
|
|     ("foo" "bar", "baz")
|
| and
|
|     ("foo", "bar", "baz")
| skitter wrote:
| They still tend to differentiate between 2- and 3 element tuples (but I agree that the implicit concatenation is problematic).
| atleta wrote:
| Fair enough. I was only thinking about the str vs tuple case. So when you have 2 elements in the parenthesis.
| oaiey wrote:
| Unfortunately no irony.
|
| I come from a programming platform (C#) where productivity is a key element of language design.
| I highly doubt that Anders Hejlsberg would have accepted such an error-prone concept like a literal free implicit operator on a key type like strings.
| atleta wrote:
| Well, I guess it's true for most languages that productivity is intended to be a key element of design. (For python, definitely. But I also remember James Gosling saying this about Java.) This implicit concatenation seems to come (inherited?) from C.
|
| I kind of remembered that some languages do support it for breaking strings into multiple lines conveniently. I'm a bit surprised that it works even on one line (I've never used it, because why would I have), but you're likely to make the mistake on multiline statements anyway. I've also checked and it doesn't work in java (which I kind of remembered, though I mostly do python these days).
| amelius wrote:
| > for breaking strings into multiple lines conveniently
|
| What is inconvenient about just adding a + at the end or beginning of the line?
| ErikCorry wrote:
| Misspelling a variable on the lhs of an assignment just causes a new variable to be created with the new name. That's a lot worse in my book.
| version_five wrote:
| I dont think that's the same kind of thing. Your example is a tradeoff that anyone who uses a language that doesn't require explicit variable declaration faces, and it's pretty tough to argue such languages really shouldn't exist.
|
| Missing an operator resulting in explicit behavior is much more subtle and not even obvious behavior. For those who use python, it is worse.
| SomeCallMeTim wrote:
| > ...it's pretty tough to argue such languages really shouldn't exist.
|
| "Shouldn't exist" is too strong.
|
| Dynamic languages that let you create a new variable via assignment _shouldn't be used to create non-trivial software._ How about that?
|
| Scripting languages have a place. That place is 100% in creating quick-and-dirty scripts and tools.
| Or in doing some kind of one-off data transform (as is common in machine learning scenarios). Anything that has a life span of two weeks or less, or a code length of fewer than a hundred lines? Yeah, script languages rock for that.
|
| Explicit/static typing adds vastly more value to large projects than the cost of the overhead. The fact that you _can't_ really gain that value in Python means that Python should be relegated to quick and dirty scripts.
|
| Same for JavaScript, Ruby, and other completely dynamic languages.
|
| You'll note that all of these languages are getting types one way or another, meaning that there are a lot of people who do recognize their value. Though TypeScript is years ahead of the rest in the completeness and sophistication of its type system; bugs like the comma bug detailed by OP, along with simply _every_ JavaScript "wat" bug, simply can't happen in TypeScript in strict mode. And static types enable entire other categories of bugs to be detectable via a linter as well.
| simonw wrote:
| I've been building non-trivial software in dynamic languages for twenty years. They work great.
|
| I'd take a project in a dynamic language with a decent test suite over a project without tests in a statically typed language any day of the week.
| shultays wrote:
|     it's pretty tough to argue such languages really shouldn't exist
|
| Well, I agree with OP so that is at least two people. I really don't see it as a good trade.
| bilkow wrote:
| Explicit variable declaration is just adding a keyword (such as var or let) when you're declaring a new variable instead of modifying one.
|
| The cognitive burden of having to memorize and look for which variables are new vs which are being modified is simply not worth it in my opinion, even for a scripting language. Maybe for esolangs, simple math or first time learning programming.
|
| In any case, it's a shortcoming of the language (IMO) but not a deal breaker. We learn to live with it.
| voltagedivider wrote:
| Isn't that common for all/most languages that don't require explicit typing?
| xigoi wrote:
| JavaScript (strict mode) doesn't have explicit typing, but it still requires variables to be declared.
| wott wrote:
| Same for Perl.
| samhw wrote:
| It would be impossible in any language that requires either explicit typing or some kind of 'let' keyword. (Or, in the fringe case, a language like Go which uses a different operator for initialisation-plus-assignment.)
| voltagedivider wrote:
| Exactly. That's why I asked about languages that _don't_ require explicit typing. My point is that it's a feature of many languages rather than a Python idiosyncrasy.
| dragonwriter wrote:
| Declaration and explicit typing are logically orthogonal, but few if any languages require typing but not declaration. Lots require declaration but not typing.
| ReleaseCandidat wrote:
| I'd say unexpected behavior is always worse than expected one.
|
| Yes, you'll certainly find somebody who doesn't know what 'not statically typed' means, but ... And yes, there are also C(++) users, that expect strings to be concatenated like that.
| obua wrote:
| You seem to also not know what "not statically typed" means. It certainly does not mean "not properly scoped".
| ReleaseCandidat wrote:
| Yes, of course. But you see that no scope keywords exist in Python. But there exists `+` to concatenate strings (too).
| fragmede wrote:
| Keywords like _namespace_, no; but functions and classes and modules provide for a lot of scoping opportunities.
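The two points above (a misspelled name on the left of an assignment silently creates a new variable, and functions still give you a scope) can be seen together in a small sketch. The function names `total_price` and `total_price_fixed` and the typo `totl` are made up for illustration and do not come from the thread.

```python
def total_price(prices):
    """Sum a list of prices; contains a deliberate typo for illustration."""
    total = 0
    for price in prices:
        # Typo: 'totl' instead of 'total'. Python binds a new local name
        # instead of raising an error, so 'total' is never updated.
        totl = total + price
    return total


def total_price_fixed(prices):
    """Same loop with the assignment target spelled correctly."""
    total = 0
    for price in prices:
        total = total + price
    return total


if __name__ == "__main__":
    print(total_price([1, 2, 3]))        # prints 0
    print(total_price_fixed([1, 2, 3]))  # prints 6
```

The misspelled name stays local to the function, so the damage is contained by the scope, but nothing in the language itself flags the slip; a linter that reports unused local variables would likely catch this case.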
| ReleaseCandidat wrote:
| The problem is `fop` should be `foo`:
|
|     foo = 5
|     fop = 6
|
| Keywords like `let` solve this problem:
|
|     let foo = 5
|     fop = 6 # error
| deathanatos wrote:
| Not entirely:
|
|     let foo = a();
|     let foo = b(foo);
|     let fop = c(foo);
|     let foo = d(foo);
|
| (Which is valid, e.g., in Rust.)
| TheEzEzz wrote:
| You do get a warning, though. And most Rust projects I've seen usually adhere to 0 warnings.
| ErikCorry wrote:
| Or := for declaration like Go and Toit
| ReleaseCandidat wrote:
| Yes, or another symbol instead of `=` for assignment, like `<-` (F#)
| oaiey wrote:
| While I agree, this is somehow something I expect. Implicit string concatenation without operator or function around it sounds just like a terrible idea. It breaks the basic syntax concept of `foo X bar`. On the other hand it is probably very handy with DSLs and things like that.
| jstx1 wrote:
| That's a complaint against the entire type system, nothing to do with misspelling.
| samhw wrote:
| It has nothing to do with the type system? It's an issue with implicit declaration. You could very easily require explicit declaration while retaining the selfsame type system.
| jstx1 wrote:
| Huh, you're right. It would be bizarre to see something like this in Python though. I've never even thought of it as being implicit declaration.
| ehsankia wrote:
| C/C++ has the exact same thing, no?
| colpabar wrote:
| I was going to comment something like "who would even use this?" and then I remembered that I have in fact used that feature :) It's a somewhat "nice" way to write long strings and keep the code from getting too wide. I never did it inside an array, but I found breaking up a long string into smaller ones and wrapping them in parens without a comma was convenient, for things like error messages.
|
| But that's just what comes with a hyper flexible language like python.
| You can do lots of things in lots of different ways, but you can also screw things up just as easily, and your IDE won't tell you because technically it's valid code.
| BeetleB wrote:
| Heh. I use it all the time the way you do and didn't realize this is alien to many developers (no one in my team ever complained about it).
|
| It's common in some languages and used the way you use it. I looked in PEP8 and it seems they don't discuss this.
|
| I think it's a perfectly valid use case, but clearly there are two camps to this. If this is so contentious, I would recommend PEP8 be revised to either explicitly endorse it as a way to split long lines or to explicitly discourage it and recommend the + operator instead.
| oaiey wrote:
| I completely get that. That is a very nice feature for building DSL or libraries with special needs. But it makes the overall language very dangerous.
|
| Is this "operator" overloadable on each type in Python?
|
| And that scares me a lot. I think I have to reevaluate my position towards Python.
| housecarpenter wrote:
| It's not really an operator. It's part of the syntax of string literals. "foo" "bar" is an alternative way of writing the string literal "foobar". If foo is not a string literal, foo "bar" is invalid syntax.
| oaiey wrote:
| Okay... So it is not an implicit operator. That is good. Some small reputation points are regained.
|
| Thanks.
| silisili wrote:
| Why not just use plusses? Or perhaps a join func, which would accomplish the same.
|
| I get the use case as you described it, but it just seems like minimal effort to accomplish and have some semblance of explicit/safety.
| dnautics wrote:
| or if that's the use case, require the whitespace to include a \n or \r\n... It's not like python doesn't have significant whitespace already.
| ErikCorry wrote:
| That wouldn't fix most of the cases highlighted by the tool in the article.
| | So strange that Python has completely different syntax | from C, but they chose to copy this obscure syntactic | feature _even though they have the plus operator on | strings_. | TonyRobbins wrote: | But i think Python is best than C, because of its Syntax | you cannot compare Python and C.! Just opinion. | shultays wrote: | You could have the same behavior by enforcing + operation in | between mylongstring = "hello" + | "world" | | No idea if python's way of indentations allows this but | sounds like it should | [deleted] | ReleaseCandidat wrote: | No, it doesn't: mylongstring = ("hello" + | "world") | | or, without `+` mylongstring = ("hello" | "world") | fragmede wrote: | Use \ mylongstring = "hello " \ | "world " \ "my " \ "name " \ | "is"* | BeetleB wrote: | The use of \ is discouraged in Python. From PEP8: | | > The preferred way of wrapping long lines is by using | Python's implied line continuation inside parentheses, | brackets and braces. Long lines can be broken over | multiple lines by wrapping expressions in parentheses. | These should be used in preference to using a backslash | for line continuation. | pmontra wrote: | As a comparison, in Ruby puts "a" "b" == "ab" # | true | | and puts "a" "b" == "ab" | | prints "a" with "b" == "ab" evaluated to false and discarded. | This could create bugs as with Python. However | ["a" "b"] == ["ab"] | | is syntax error at the beginning of the second line. The parser | expects a ] It would evaluate to true if it were on one line. | grey-area wrote: | In Ruby one too many commas can also cause problems: | | # list | | list = "a","b", | | # function | | def foobar | | end | | => ["a", "b", :foobar] | asow92 wrote: | I'm sure the devil is in the details on this bug. | aeturnum wrote: | The high-level goals of python end up creating these little | syntactic landmines that can get even experienced coders. 
My | personal nomination for the worst one of these is that having a | comma after a single value often (depending on the surrounding | syntax) creates a tuple. It's easy to miss and creates maddening | errors where nothing works how you expect. | | I've moved away from working in Python in general, but I think | the #1 feature I want in the core of the language is the ability | to make violating type hints an exception[1]. The core team has | been slowly integrating type information, but it feels like they | have _really_ struggled to articulate a vision about what type | information is "for" in the core ecosystem. I think a little | more opinion from them would go a long way to ecosystem health. | | [1] I know there are libraries that do this, I am not seeking | recommendations. | hsbauauvhabzb wrote: | I'd rather a compile time error over an exception (or both), | which in many cases can occur. I know mypy does this, maybe I | should alias python="mypy&&python" | luhn wrote: | I feel like it's been pretty clear from day one that type hints | are meant for static analysis with tools like mypy. It's not | exclusive to that use and has a lot of other possible | applications, but the primary goal has always been static analysis. | aylmao wrote: | The lack of a static type-system is IMO what makes these one- | character mistakes very annoying. The compiler can't tell you | something is wrong, so you're just left to figure out why | things are broken, just to realize it was the smallest of | typos. | tyingq wrote: | C lets me do this, and doesn't say much about it. | char ch_arr[3][10] = { "uno", "dos" | "tres" }; | aeturnum wrote: | I love how simple and forgiving Python is for small projects. | The "trailing comma creates a tuple" situation comes out of, | as far as I can tell, a desire to create maximally convenient | syntax in the scenarios where tuples are intended. I think | that's great for small code!
| | I just wish that the core team would take that same zeal for | a "pythonic" experience with small code and use it to develop | more scaled-up systems for dealing with larger code bases. My | idea is to enforce strong pre-conditions on function calls | using type hints, but I am sure there are other ways to do | it. | DangitBobby wrote: | For a language that is so incredibly picky about its | whitespace rules, it's a little laissez faire on the | string-concatenation/tuple syntax side. I say this as | someone who loves python and uses it extensively. | trulyme wrote: | The "trailing comma creates a tuple" bug actually comes | from a disconnect between what people think defines a tuple | (parentheses) and what really does (comma). I always put | parentheses around a tuple for clarity. | iooi wrote: | If you use mypy (as anyone should for any non-hobby Python | usage) then Python has one of the strongest type systems | available. Optional types, generics, "Any" escape hatches, | everything you could want. | aeturnum wrote: | mypy is a great project and I agree that basically every | project at scale should use it. However, I think you're | wrong about the strength of the Python type system and | what a good type system can "get" you. I think mypy both | does an amazing job at static checking and that more | powerful type systems go far beyond static checks and | into changing how you structure and write code. The newly | introduced "structural pattern matching"[1] is an example | of the kind of feature that | could be usefully expanded by making type a first-class | part of the Python runtime. | | Again - the dynamism of Python means teams can write | amazing extensions to Python (like mypy), but that isn't | a replacement for the core team having a plan for how | they think typing information should be used at runtime. | Their current answer seems to be "nothing," which | disappoints me.
| | [1] https://www.python.org/dev/peps/pep-0622/ | ehsankia wrote: | A lot of people in this thread are using this to make fun of | Python, but the exact same issue exists in something like c++, | here's some I fixed recently: | | https://github.com/UWQuickstep/quickstep/pull/9 | | https://github.com/tensorflow/tensorflow/pull/51578 | | https://github.com/mono/mono/pull/21197 | | https://github.com/llvm/llvm-project/pull/335 | aeturnum wrote: | I didn't understand anyone to be saying that Python is the | only language to have this flaw. | | Also, I personally don't mind this approach to string | concatenation. I think it's a fine compromise between easy | formatting and clarity. I was whining about a corner case of | tuple construction - which as far as I know is not a feature | of any other language. | macNchz wrote: | I've been writing Python professionally full time for 8 years and | still occasionally make the trailing-comma-tuple mistake. These | days at least I'll recognize and be able to find it quickly | rather than wasting time. Can be caught with a linter, but not | every codebase is readily linted. | kazinator wrote: | Not in Lisp! ("foo" "bar") and ("foobar") are lists of length 2 | and 1, respectively. | | (Python copies some bad ideas from C. Another one is having to | _import_ everything you use. It seems that since Python is | written in C, its designer took it for granted that there will be | something analogous to #include for using libraries, even | standard ones that come with the language.) | | Implicit string literal catenation is tempting to implement | because it solves problems like: printf("long %s | string" "nicely breaks up" "with | indentation and all", arg, arg, ...) | | and if you're working in a language which has comma separation | everywhere, you can get away with it easily. | | There are other ways to solve it. In TXR Lisp, I allow string | literals to go across multiple lines with a backslash newline | sequence.
All contiguous unescaped whitespace adjacent to the | backslash is eaten: This is the TXR Lisp | interactive listener of TXR 273. Quit with :quit or Ctrl-D | on an empty line. Ctrl-X ? for cheatsheet. TXR needs money, | so even abnormal exits now go through the gift shop. 1> | "abcd \ efg" "abcdefg" | | If you want a significant space, you can backslash escape it; the | exact placement is up to you: 2> "abcd\ \ | efg" "abcd efg" 3> "abcd \ \ efg" | "abcd efg" 4> "abcd \ \ efg" | "abcd efg" 5> "abcd \ \ \ efg" | "abcd efg" | edflsafoiewq wrote: | The Python certainly looks nicer though. | [deleted] | rileymat2 wrote: | I like imports, it tells me what files symbols are coming from, | even for built in libraries. | | Maybe it is that through my work I use a half dozen languages, | where it is hard to remember each in detail. | | I have also worked on a javascript project where there were no | imports/requires and the build process created one file. So you | had to inspect the confusing build script to even know what was | what. | kazinator wrote: | You could fairly easily work with a bunch of .js files that | get catenated together by using an editor that can jump to a | definition. | | Build processes creating one file is the seven decade norm in | computing. | | Even if you literally don't catenate the .js files into one, | they get loaded into one running image one way or another. | stevesimmons wrote: | I like the explicit nature of Python's imports. 
| | And especially how I can choose the best way to indicate the | sources of names in my code: import time | t = time.perf_counter() import time, my_module | t1 = time.perf_counter() t2 = my_module.perf_counter() | from time import perf_counter as std_counter from | my_module import perf_counter as my_counter t1 = | std_counter() t2 = my_counter() try: | from my_module import perf_counter except ImportError: | # Fall back to standard implementation from time | import perf_counter t = perf_counter() # | import time as m import my_module as m t = | m.perf_counter() | kazinator wrote: | > import perf_counter as my_counter | | Yikes; you're renaming/aliasing global identifiers! Just | no. | tgv wrote: | The difference is: in C, it's pretty unlikely someone wants to | add strings. I suppose it's even illegal in the later C | versions. | kazinator wrote: | It is positively not illegal in any standard version of C | since ANSI C 89. | | It's an essential feature used in all sorts of everyday code. | | C99 added printf conversion specifiers that are hidden behind | macros, and idiomatic usage of them relies on string | catenation. uint32_t x = 0; | printf("x = %" PRIx32 "\n", x); | | where PRIx32 might expand to "lx" (if uint32_t is the same | as unsigned long in that compiler). | | All sorts of C macrology relies on string catenation. Kernel | print messages: printk(KERN_EMERG "%s: | temperature sensor indicates fire!", dev->name); | ^ must not have comma here | lanstin wrote: | Interesting. Arguably tho this shows how C is aging. I find | that PRIx32 a bit ugly. | | Although I just had a (logging) use case in go where I | missed cpp macros - wanted the log statement to get | something from the file and just had to pass it in as | another parameter. | kazinator wrote: | I have also never used PRI-anything. It's a crime against | readability. | | If I have a uint32_t which needs printing I cast it to | (unsigned long) and use %lu or %lx. This requires more
This requires more | typing in the argument list, but keeps the format string | tidy. It's important for the format string to be tidy, | because that's the reason of its existence: to clearly | and concisely convey the shape of what is being printed. | tgv wrote: | I know that. I meant that "abc" + "def" is most likely | illegal (although "abc" + 'd' is not). | wott wrote: | > I meant that "abc" + "def" is most likely illegal | | That would be adding 2 pointers, and that's indeed | illegal. | | However, you can subtract them: "abc" - "def" . Now, the | result is not a pointer any more, it's a ptrdiff_t (an | integer type), so most compilers will warn if you try to | assign that to a char *. | kazinator wrote: | You started talking about "adding strings" in a thread | about adjacent literals, without mentioning any + | operator.. | | String catenation ("adding") by adjacency (no visible | operator) is a thing; "add" doesn't imply that we are | talking about a + operator: $ awk 'BEGIN | { x = "abc-" 2 + 2 "-def"; print x}' abc-4-def | Spivak wrote: | I'm gonna disagree on the import thing. Compared to Ruby where | requires are magic bags of metaprogramming bullshit, Python is | much much easier to reason about. It takes some getting used to | that require 'json' actually adds methods to existing classes. | kazinator wrote: | "require 'json'" is just another #include in disguise, and if | it monkey patches existing classes, it ... probably should | not exist in any form. | | If the language supports json, it should just do that. | 1> #J[1,2,3] #(1.0 2.0 3.0) 2> (get-json | "[1,2,3,{\"foo\":true}]") #(1.0 2.0 3.0 #H(() ("foo" | t))) 3> (put-json #(1.0 2.0 t)) [1,2,true]t | Spivak wrote: | Welcome to Ruby. 
$ irb | irb(main):001:0> { hello: "world" }.to_json | NoMethodError (undefined method `to_json' for | {:hello=>"world"}:Hash) irb(main):002:0> | require 'json' irb(main):003:0> { hello: "world" | }.to_json => "{\"hello\":\"world\"}" | kazinator wrote: | I mean, I understand that classes which are open to | extension with new methods is useful, and the right way | to do OOP and all. | | If it was CLOS with multiple dispatch, it would be easier | to swallow. Because it would look like: | (to-json { hello: "world" }) ;; error: no such | function! | | Then load the module, and you have a generic to-json | function now, with a method specialized to handle the | dictionary object and all. (I still wouldn't want to be | doing this if it's supposed to be a language built-in). | | I regard the ability to add new methods to a class as | good, but with a valid use case, like extending some | third party piece with new methods in your own | application. And the fact of not having to declare | methods in a class definition, which is cumbersome. Just | write a new method in that class's file, at the bottom, | and there it is. | | I ideally don't want that third-party piece itself to be | divided into three pieces that I have to separately load | to get all of the methods. Or worse, pieces from separate | third parties that add methods to each other. | | I copied a thing or two from Ruby in TXR Lisp. The object | system as a derived hook, and that was inspired by | something in Ruby: 1> (defstruct foo () | (:function derived (super sub) (prinl `derived @super | @sub`))) #<struct-type foo> 2> (defstruct bar | foo) "derived #<struct-type foo> #<struct-type | bar>" #<struct-type bar> 3> (defstruct xyzzy | bar) "derived #<struct-type bar> #<struct-type | xyzzy>" #<struct-type xyzzy> | | The derived hook is inherited (like any other static | slot), so it fires in bar also. The function can | distinguish which class is being derived by the super | argument. 
| Someone wrote: | You mean long %s stringnicely breaks upwith | indentation and all" | | ? In my experience, this always gets ugly when you want to | insert spaces (= about always). Do you put them at the end or | at the start of each string (apart from the first or last | string) | | I think scala's _mkString_ | (https://superruzafa.github.io/visual-scala- | reference/mkStrin...) is the best solution, visually, for such | things, but unfortunately, it would require hacks in the | parser to do the concatenation at compile time, where possible. | | Scala's multiline strings look nice, too, if you want to insert | newlines, except for the _stripMargin_ thing | (https://docs.scala-lang.org/overviews/scala-book/two- | notes-a...) | kazinator wrote: | The spaces aren't the point of the comment; rather that we | can break the literal into pieces and indent those pieces | without affecting the contents. In a non-strawman real | example with real data, of course we include all the | necessary spaces in the literals. However, this bug is easy | to make in C; I've seen it numerous times. | Someone wrote: | That's precisely my point. This looks nice, but it's too | easy to forget one of those spaces and too hard to spot | that. | NoahTheDuke wrote: | > Another one is having to import everything you use. | | The alternative is what exactly? Have the entire standard | library exposed at once? Make all modules create non- | conflicting names for exported objects, so that the json parse | function has to be called json_parse and the csv parse function | has to be called csv_parse? | | Seems less than ideal to me. | kazinator wrote: | That's one way. | | If these things are classes in a plain old single-dispatch | oop system, you can have a json-parser and csv-parser which | have parse methods. | | There could be packages/namespaces. So csv:parse and | json:parse. These packages are standard and so they just | exist; nothing to import.
| | In Python, you cannot use anything without an import! The | top-level modules (which serve as _de facto_ namespaces) | themselves are not visible. | | Say there is a csv module with a parse. You cannot just do: | csv.parse(...) | | you have to first say import csv | | This is jaw-droppingly moronic. | NoahTheDuke wrote: | > This is jaw-droppingly moronic. | | It can be slightly inconvenient but doesn't feel moronic to | me. It means that except for the built-in functions, | everything can be traced to either a definition or an | import. Makes tracking code much easier. | lanstin wrote: | It lets you debug. E.g. if they have made a file called | csv.py in the same directory, then print (csv.__file__) | will show you this. If they have some weirdly screwed up | paths with multiple pythons installed and multiple copies | of the modules etc., same. | | I will note Go lang has the same feature carried forward | from C. It helps a lot in the reading code side of the code | lifecycle. And Go compiler makes you keep the imports up to | date, which is good. | kazinator wrote: | > It lets you debug. | | It lets you debug Python problems which the system | created in the first place. | | > If they have some weirdly screwed up paths with | multiple pythons installed and multiple copies of the | modules etc., same. | | Doesn't happen in a sane language. Or, even not a sanely | defined language/implementation. | | I can easily have multiple different GCC copies (possibly | for different processor targets) on the same machine. | Each one knows where its own files are; an #include | <stdio.h> compiled with your /path/to/arm-linux-eabi-gcc | will positively not use your /usr/include/stdio.h, unless | you explicitly do stupid things, like -I/usr/include on | the command line. | lanstin wrote: | Having everything be imported is what makes the language be | useable. Especially if you never import * you can easily find | the definition and meaning of everything you read on the | screen.
A prime example of explicit is better than implicit. | | And backslash doesn't let you have the literal obey the proper | indenting. Might as well use """ | kazinator wrote: | > _you can easily find the definition and meaning of | everything you read on the screen_ | | I don't want to be finding definitions of things that the | _language_ provides in the code. | | Languages that don't work this way have IDE's, editor plug- | ins or other tools for easily finding the definitions of | things that are in the language, without hunting for them | through intermediate definition steps in the same file. | | "I've spent all my life in and out of jails, so I expect bars | on doors and windows ..." | justsomehnguy wrote: | @" here strings in PS are fine for this purpose and | even allows whitespace anywhere but because | of the latter you can't indent it with your other | code "@ -split "`r`n" | % {'<SOL>{0}<EOL>' -f $_ } | <SOL> here strings in PS are fine for this purpose and <EOL> | <SOL> even allows whitespace anywhere <EOL> | <SOL> but because of the latter you can't indent it | <EOL> <SOL> with your other code <EOL> | kazinator wrote: | I posted a Unix StackExchange answer with some tricks for | doing this in shell programming, very similar to your <SOL> | trick. | | https://unix.stackexchange.com/questions/76481/cant- | indent-h... | wartijn_ wrote: | I like this. It's clearly meant as marketing for their product, | but imo the best kind of marketing. They don't just run their | tool and automatically make tickets, but check for false positive | and (offer to) make pr's. | | It's both good for those projects and for the company that does | the marketing since they reach their exact target group. Plus it | gets them on the front page of HN. | ehsankia wrote: | A great addition to prune a ton of false-positives is to check | the length of the strings.
Almost always, the intentional | implicit concats will have a very long string that reaches the | max line length, whereas the accidental ones are almost always | very short strings. | routerl wrote: | tl;dr: Python concatenates space separated strings, so ['foo' | 'bar'] becomes ['foobar'], leading to silent bugs due to typos. | | I've been bitten by this one at work, and can't help but think it | is an insane behaviour, given that ['foo' + 'bar'] explicitly | concatenates the strings, and ['foo', 'bar'] is the much more | common desired result. | | edit: This also applies to un-separated strings, so ['foo''bar'] | also becomes ['foobar'] | idealmedtech wrote: | It's a holdover from C, where implicit string literal | concatenation is very useful in the preprocessor. | Palomides wrote: | I assume it's based on the C behavior, where it can be handy | with macros | | I don't think it fits well in python | pmontra wrote: | Maybe. We must remember that Python was designed at the very | end of the 80s so what was normal for developers back then | could be unexpected nowadays. An example: the self in | Python's OO is a C pointer to struct of data and function | pointers. It should be perfectly clear to anybody writing OO | code in plain C at the time (raising hand.) Five years later | new OO languages (Java, Ruby) kept self inside the classes | but hid it in method definitions. | wartijn_ wrote: | But Python 3 was designed in the 2000s and had many | breaking changes. Seems like they could have changed this | behavior with that version. | pletnes wrote: | I assumed it was borrowed from shell, where everything can | just be put next to each other since it's all text. | thrdbndndn wrote: | I luckily never accidentally used this space-concatenation thing, | but I've been bitten by the fact a=(1) doesn't create 1-element | tuple multiple times in my early days learning Python. | onphonenow wrote: | I still don't understand why it doesn't! So I still get bit | from time to time.
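A minimal sketch of the two behaviours discussed just above, for anyone who wants to try them in a REPL. The variable names are made up for illustration; the behaviour itself is standard Python: parentheses only group an expression, and it is the comma that builds a tuple.

```python
a = (1)     # parentheses only group: this is the int 1
b = (1,)    # the trailing comma makes a one-element tuple
c = 1,      # the same tuple; the parentheses are optional
d = ()      # the empty tuple is the one form written without a comma

# Adjacent string literals are merged by the parser, with or without a space.
items = ['foo' 'bar']
packed = ['foo''bar']

print(type(a).__name__, type(b).__name__, type(c).__name__, len(d))
# prints: int tuple tuple 0
print(items, packed)
# prints: ['foobar'] ['foobar']
```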
| scbrg wrote: | Presumably because parentheses don't really have anything | to do with tuples, it's commas that do. Parentheses are | there to help the parser group things in case of ambiguity, | and to support expressions spanning multiple lines. | voussoir wrote: | If a person decides to add parentheses to some booleans or | arithmetic, (4 + 5) * (8 + 2) | (this and that) or (theother) | | These elements should not become 1-tuples after the | interior contents are evaluated. I sometimes add | parentheses even around single variables just for visual | clarity. | | Also, this allows you to do dot-access on int / float | literals, if you want to # doesn't work | 4.to_bytes(8, 'little') # works | (4).to_bytes(8, 'little') | shoyer wrote: | Most of the "bugs" caught here (including in TensorFlow and in my | own project, Xarray) seem to actually be typos in the test | suite. This is certainly a good catch (and yes, linters should | check for this!), but seems a little oversold to me. | jiveturkey wrote: | nice ad! | Pensacola wrote: | Why 666? | TonyRobbins wrote: | Thanks for sharing these, because if you had not shared them I would have had to | deal with this bug myself. Thank you again! | micimize wrote: | For those looking to avoid this specific problem, there is a | flake8 rule: https://pypi.org/project/flake8-no-implicit-concat. | | More broadly, the https://codereview.doctors makers are making | the point that their tool caught an easy-to-miss issue that most | wouldn't think to add a rule for. A bit of an open question to me | how many of those there really are at the language level, but | still seems like a neat project. | pfisherman wrote: | Ime, Black will add parentheses to clearly and explicitly | indicate a tuple where there is trailing comma. Figured this | out when I made the trailing comma mistake and wondered why | Black kept reformatting my code. | trulyme wrote: | Black rules.
I love it that I don't need to have a discussion | about style with anyone when Black is used on the project. | oblvious-earth wrote: | Also all but 1 of the issues they found relate to test code, | it seems people are a little less careful compared to | functional code. | | Also in terms of mistakes codereviewdoctor twice linked to the | same issue in their blog | https://github.com/tensorflow/tensorflow/issues/53636 and | raised the PR to the wrong project | https://github.com/tensorflow/tensorflow/pull/53637 (I guess | Tensorflow vendors Keras, easy mistake) | sundarurfriend wrote: | > all but 1 of the issues they found relate to test code, it | seems people are a little less careful compared to functional | code. | | Also a factor that bugs in functional code are more visible, | both during development and to users once shipped. So there | may have been an equal number or more such bugs in the non- | test code, that just didn't remain in the code base for this | long. | thrdbndndn wrote: | https://github.com/tensorflow/tensorflow/tree/0d8705c82c64df. | .. STOP! This folder contains the | legacy Keras code which is stale and about to be deleted. The | current Keras code lives in github/keras-team/keras. | Please do not use the code from this folder. | | Yeah, not the most obvious notice. | | The fact they didn't find the same mistake(s) in keras- | team/keras (I assume they scanned, it's one of the most | popular Python repos) makes me believe these issues have been | fixed/removed in up-to-date Keras repo. | rikatee wrote: | once tensorflow pointed to keras-team this happened | | https://github.com/keras-team/keras/issues/15854 | | resulting in | | https://github.com/keras-team/keras/pull/15876 | tedmiston wrote: | The URL in this comment has an incorrect TLD: it should be | `doctor` (singular).
| | https://codereview.doctor/ | rikatee wrote: | there is also https://pypi.org/project/flake8-tuple/ | | typo in the url (or in HN's markup) btw: it's | https://codereview.doctor | prepend wrote: | This seems like not a big deal. It's a common mistake and is in | 5% of repos but it's not causing major damage. | | And there's no evaluation of importance as to whether these | instances are in test files or non-critical code. Packages are | big and can have hundreds or thousands of files. | | It could be that if these mattered, they would have been detected | and fixed. | | A good example for unit tests and perhaps checking to see if | these bugs are covered or not covered. | | I like these kinds of analyses but don't like them presented as if | they were some significant failure. | rikatee wrote: | yeah the impact varies. the sentry one seems pretty big: | https://codereviewdoctor.medium.com/5-of-666-python-repos-ha... | | test did not work but did not fail either, imagine being that | dev maintaining the code that the test professes to cover. | Imagine being the user relying on the feature that test was | meant to check (if the feature under test actually broke). | jollybean wrote: | 5% of 'released' software is quite a lot, more importantly it's | a class of errors that definitely should not exist. This is | effectively a 'bug' in the language; there just isn't any real | upside. | | Python has a few of these things, which is really sad. | onphonenow wrote: | There were proposals to fix some of these but the unicode | zeal beat out some of the more boring (but I'd say as | important) cleanups. | bcrl wrote: | It's a class of error that would be caught by even the most | basic testing. A better title for the article is that 5% of | 666 Python repos have typos demonstrating that the code containing | them is completely untested. It doesn't matter which language | it is: untested code is untested code in any language.
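To make rikatee's "test did not work but did not fail either" concrete, here is a small sketch of how a missing comma can turn a check into one that can never fail. Everything in it (`serialize`, `FORBIDDEN_KEYS`, `check_no_secrets`) is hypothetical and invented for illustration; it is not the code from Sentry or from any project named in the article.

```python
def serialize(user):
    """Hypothetical function under test; it wrongly leaks the password."""
    return dict(user)


# Intended: ["password", "token"]. The comma is missing, so the parser
# merges the two literals and the list holds one string, "passwordtoken".
FORBIDDEN_KEYS = ["password" "token"]


def check_no_secrets(payload):
    """Hypothetical test helper: no forbidden key may appear in the payload."""
    for key in FORBIDDEN_KEYS:
        assert key not in payload, f"leaked {key}"


payload = serialize({"name": "ann", "password": "hunter2"})

# Passes even though the password leaked: "passwordtoken" is never a key,
# so the assertion inside the loop cannot fail.
check_no_secrets(payload)
print(FORBIDDEN_KEYS)
# prints: ['passwordtoken']
```

One cheap guard, under the same assumptions, is to also assert the expected number of entries (`assert len(FORBIDDEN_KEYS) == 2`), which fails as soon as two literals are merged.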
| rikatee wrote: | unfortunately like 10% of the bugs were in the tests | themselves. e.g., the sentry one | https://codereviewdoctor.medium.com/5-of-666-python-repos- | ha... | | the tests are only as good as the code they're written | with, and as good as the code review process they were | merged under. | wott wrote: | I believe that, whenever possible, tests should be | written in a different language than the one used for the | code under test (even better, in a dedicated, mostly | declarative, testing language). | | It avoids replicating the same category of errors in both | the test and the code under test, especially when some | calculation or some sub-tests generation is made in the | test. | bcrl wrote: | One of the habits I have when writing kernel code is to | intentionally break code in the kernel to verify that my | test is checking what I think it's checking. That's | because of a lesson I learned a long, long time ago after | someone reviewed my code and caught a problem: when your | code has security implications, you need to make sure the | boundary conditions that your tests are supposed to cover | actually get tested. Having implemented a number of | syscalls exposed to untrusted userland over the years, | this habit has saved my bacon several times and avoided | CVEs. | geofft wrote: | The errors were usually in tests themselves. Are you | arguing that tests need their own tests to test that they | are testing the right thing? Usually I think people believe | that tests do not need to be tested and should not be | tested, i.e., that you measure "100% coverage" against non- | test code alone. | samhw wrote: | I don't think anyone could disagree: you could never | exceed 0% code coverage if your definition was recursive | (i.e. included tests, tests-of-tests, tests-of-tests-of- | tests, ...). | Cpoll wrote: | Only if you generate infinite tests, then your coverage | approaches 0%. But 100% covered code + 0% covered tests = | ~50% total coverage.
| | Also, the _obvious_ solution is self-testing code. (Jokes | aside, structures like code contracts attempt something | like this). | jve wrote: | I checked those 11 links to issues for major software. | 10 bugs were actually in tests... | oaiey wrote: | I do not see this from a verification perspective ... But | also from a productivity perspective. | ErikCorry wrote: | This is understandable since many of those projects are not | written in python. So the python code in them is only in | incidental scripts like test harnesses. If V8 was written | in python then performance would probably not be very good. | deathanatos wrote: | 9 out of 10, actually; the Tensorflow links are the same | link. | enchiridion wrote: | I mean, if you're ultimately going to combine the list into a | string anyway it's no big deal. | | Along those lines. I wonder how many of these come from ad-hoc | file path handling instead of using pathlib. | karolkozub wrote: | I really like the idea of automated code review tools that point | out unusual or suspicious solutions and code patterns. Kind of | like an advanced linter that looks deeper into the code | structure. With emerging AI tools like Github Copilot, it seems | like the inevitable future. Programming is very pattern-oriented | and even though these kinds of tools might not necessarily be | able to point out architectural flaws in a codebase, there might | be lots of low-hanging fruits in this area and opportunities to | add automated value. | rak1507 wrote: | Or people could just write it correctly in the first place! | Controversial I know! Seems like people would rather half-ass | things and then let some AI autocorrect fix it up for whatever | reason rather than doing it properly. | lumost wrote: | Consider that you may be describing a compiler. Typos are not | _generally_ a problem in statically typed languages with | notable exceptions such as dictionary key lookups etc.
| | Even without static typing, argument length verification etc. | can be done with a suitable compiler. In python we are left | chasing 100% code coverage in unit tests as it's the only way | to be certain that the code doesn't include a silly mistake. | samhw wrote: | I think 100% code coverage is folly. Spreading tests so | widely near-inevitably means they're also going to be thin. | In any codebase I'm working on, I would focus my attention on | testing functions which are either (a) crucially important or | (b) significantly complex (and I mean real complexity, not | just the cyclomatic complexity of the control flow inside the | function itself). | lumost wrote: | Fully agree, but I _never_ want to see a missed function | argument programming error in customer facing code. In | python you really do need code coverage to achieve this | goal - static languages have some additional flexibility. | lanstin wrote: | Or a rich suite of linters religiously applied. Never | save a file with red lines in flymake or the equivalent. | Ed: actually, I am unsure if my current suite would miss | required parameters. I tend to have defaults for all but | the first parameter or two, so not a big issue for me I | guess. I do like a compile time check on stuff tho, one | of the reasons I am doing more and more tools in Go. | joatmon-snoo wrote: | I actually recently joined a startup working on this problem! 
| | One of our products is a universal linter, which wraps the | standard open-source tools available for different ecosystems, | simplifies the setup/installation process for all of them, and | adds a bunch of other usability things (suppressing existing issues | so that you can introduce new linters with minimal pain, CI | integration, and more): you can read more about it at | http://trunk.io/products/check or try out the VSCode | extension[0] :) | | [0] | https://marketplace.visualstudio.com/items?itemName=Trunk.io | rikatee wrote: | cool product :) is it just linting or do any of the tools do | code transformation to offer the fix for the lint failure? | (code review doctor also offers the fix if you add the GitHub | PR integration) | atleta wrote: | This is basically linting, i.e. code analysis. The techniques | used might be more current (as they have been evolving, as you | say, for pattern matching) but linting is just that: a code | review tool to find usual bugs. (This is what happened in | this blog post. It wasn't looking for unusual solutions but | usual mistakes.) The packaging and form of the feedback also seem | different and that in itself may make a lot of difference in | ease of use and thus adoption. | joatmon-snoo wrote: | Admittedly, the difference here is that codereview.doctor | spent time tuning a custom lint on a variety of repos. In an | org with a sufficiently large monorepo (or enough repos, but | I don't really know how the tooling scales there) it's | possible to justify spending time doing that, but for most | companies it's one of those "one day we'll get around to it" | issues. | rikatee wrote: | yeah something like sonarqube or https://codereview.doctor (if | you use GitHub) | ficklepickle wrote: | Ironically there are a variety of typos in the article. | | A paragraph is repeated and the markdown links at the end are | broken because there is a space between ] and (. | codeptualize wrote: | And then people make fun of JavaScript!
(Just joking, I like | Python, also JS, I guess everything has its quirks, it's a good | thing we have linters) | bilalq wrote: | The whole "666" thing really threw me off. I thought it was some | Python specific term or something at first glance. They open with | a sentence that mentions "5% of the 666 Python open source GitHub | repositories" as though there were only 666 total open source | Python GH repos. Picking a number with other fun connotations or | whatever to use as a sample is fine, but without setting that | context, it was kind of distracting from their main content. | deathanatos wrote: | Did you figure out what the context is, and if you did, would | you mind spelling it out for me? I still haven't figured out | what correction to make to that sentence to get it to make | sense. | rikatee wrote: | in a blog post about the evils of typos there was a typo! | classic https://en.wikipedia.org/wiki/Muphry%27s_law ;) | ffhhj wrote: | Also this classic: | | > Apple I was the first product ever announced by the | company in 1976. The computer was put on sale for $666.66 | at the time. | | https://9to5mac.com/2021/11/25/steve-woz-signs- | rare-1976-app... | bilalq wrote: | They ran their static analyzer over a sample of GH repos. | They chose 666 as the number for their sample size. That's | all. | tyingq wrote: | Seems expected, as linters can't be sure when it's not | intentional. Like this request to pylint: | | https://github.com/PyCQA/pylint/issues/1589 | | Is there usually enough context for a linter to make an educated | guess?
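For reference, here is a minimal sketch of the kind of check being asked about, using only the standard-library `tokenize` module. The function name `find_implicit_concat` is hypothetical (it is not part of pylint or any other linter), and the rule is deliberately crude: it reports any string literal that directly follows another one inside brackets, so it will also report intentional wrapping such as long URLs or multi-line messages.

```python
import io
import tokenize


def find_implicit_concat(source):
    """Return (line, column) positions of string literals that directly
    follow another string literal inside (), [] or {}.

    Hypothetical helper for illustration only; it cannot tell an
    accidental missing comma from deliberate concatenation.
    """
    hits = []
    depth = 0                # current bracket nesting level
    prev_was_string = False  # was the previous significant token a string?
    layout = {tokenize.NL, tokenize.COMMENT}  # may sit between two literals

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in layout:
            continue
        if tok.type == tokenize.OP and tok.string in {"(", "[", "{"}:
            depth += 1
        elif tok.type == tokenize.OP and tok.string in {")", "]", "}"}:
            depth -= 1
        if tok.type == tokenize.STRING and prev_was_string and depth > 0:
            hits.append(tok.start)
        prev_was_string = tok.type == tokenize.STRING
    return hits


# The example from the top of the thread, wrapped over two lines.
sample = 's = ["one", "two"\n     "three"]\n'
print(find_implicit_concat(sample))  # position of "three"
```

Deciding which of those hits are accidental is the part that needs context. One plausible heuristic, again only an assumption, is to report a hit only when the enclosing brackets also contain commas, since a list that mixes both styles is more likely to be a typo.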
| rikatee wrote: | can do a good job at allowing long urls for example, but would | be whack-a-mole trying to cater for "all" purposeful implicit | string concatenations | chrismorgan wrote: | Splitting long URLs onto multiple lines because you have a | hard line length limit is _considerably_ more harmful than | exceeding the length limit in such cases, because you break | the URL up so that tooling (including language-unaware static | analysers) can't conveniently access it. (e.g. if you want to | open the link, you can't just copy it or click on it or | whatever, but must first join the lines, removing the | quotation marks.) Any tool that forcibly splits up such lines | when there is no fundamental hard technical reason why it | must is, I categorically state, a bad tool. | mikepurvis wrote: | I would have thought it would be a no-brainer to just ban it | and insist on an explicit + operator. I'm pretty surprised that | issue was so flippantly closed. | ReleaseCandidat wrote: | The PR has been merged (for lists and tuples and sets only). | | https://github.com/PyCQA/pylint/pull/1655 | thaumasiotes wrote: | > I would have thought it would be a no-brainer to just ban | it and insist on an explicit + operator. | | Maybe as a matter of linting. As a matter of language design, | I think + for string concatenation is a big mistake; using | different symbols for numeric addition and string | concatenation is something Perl got right. | mikepurvis wrote: | Yes, I meant as a matter of linting. I can understand the | arguments being different for the language as a whole, | particularly when legacy compatibility is a consideration. | | But my impression using pylint is that its default settings | are wildly opinionated, hence the surprise that this | wouldn't have fallen under that umbrella. | tus666 wrote: | Alternative title: 5% of Python repos have inadequate test | coverage. | _dain_ wrote: | Most of the errors were in the tests themselves. | ehsankia wrote: | Nice!
Internally we have PCRE support in our code search and I | regularly run a regex to find and fix these. I've also found a | ton in open-source projects which I've been trying to fix: | | https://github.com/YosysHQ/prjtrellis/pull/176 | | https://github.com/UWQuickstep/quickstep/pull/9 | | https://github.com/tensorflow/tensorflow/pull/51578 | | https://github.com/mono/mono/pull/21197 | | https://github.com/llvm/llvm-project/pull/335 | | https://github.com/PyCQA/baron/pull/156 | | https://github.com/dagwieers/pygments/pull/1 | | https://github.com/zhuyifei1999/guppy3/pull/12 | | https://github.com/pyusb/pyusb/pull/277 | | https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull... | | It is indeed a very common mistake in Python, and can be very | hard to debug. It bit me once and wasted a whole day for me, so | I've been finding/fixing them ever since, trying to save others | the same pain I went through. | | EDIT: I will point out that I've found this error in other non-Python | code too, such as C++ (see the 2nd PR for example). | | Here's the regex for anyone curious: | | [([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)* | arusahni wrote: | The removal of implicit string concatenation was proposed for | Py3k[1], but was rejected. | | [1] https://www.python.org/dev/peps/pep-3126/ | Wowfunhappy wrote: | Does Python support the concept of allowing code to opt in to | new safety features? I can understand rejecting something like | this for the sake of legacy compatibility (something Python has | abandoned too readily in the past), but it seems like an option | --or maybe even a default--might be nice. | | I suppose this is also something you could catch with a linter? | cpeterso wrote: | Yes: from __future__ import <feature> | | https://docs.python.org/3/library/__future__.html | Wowfunhappy wrote: | I'd say that's a "kind of", since it implies the feature | will eventually become mandatory.
I was thinking more along | the lines of JavaScript's 'use strict'; | wodenokoto wrote: | The rejection notice seems completely counterintuitive to me. | How is adding a plus "harder" compared to removing a foot gun? | | > This PEP is rejected. There wasn't enough support in favor, | the feature to be removed isn't all that harmful, and there are | some use cases that would become harder. | oa2022 wrote: | This change would break a lot of legacy code for no good | reason. | | The most common way to split a string in lines is using this | concatenation formula. | [deleted] | nojs wrote: | > The most common way to split a string in lines is using | this concatenation formula. | | Is it really? I tend to avoid it in favour of """ or | '\n'.join(<list of lines>), because it looks like a | mistake. | | Triple quotes are kind of annoying if the string is | indented, but you can just not indent the string to avoid | the whitespace. | benesch wrote: | > This change would break a lot of legacy code for no good | reason | | Preventing a bug that occurs in 5% of observed codebases | (and anecdotally, happens to me during development all the | time) seems like about as good as reasons get. | | Swapping a perfectly fine print statement for a function, | on the other hand... that's the breaking change in Py3k | that's never seemed worth it to me. | wodenokoto wrote: | But wasn't this proposal part of the move to Python 3? | Strings were broken left and right anyway. | gsnedders wrote: | Right, there was lots of deliberate breakage, _and_ this | is purely syntactic, hence the sort of thing 2to3 could | trivially deal with. | ehsankia wrote: | > the sort of thing 2to3 could trivially deal with | | 2to3 could also trivially add +, and if anything, that | would actually help surface these kinds of bugs, because | if you randomly see a + in the middle of your list of | strings, it's much easier to spot the bug than if there | was a missing comma.
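To make that last point concrete, here is a small sketch with invented list contents. The first list has the missing-comma bug; the second is what the same value would look like if a converter had rewritten the implicit concatenation with an explicit +. That rewrite is only the hypothetical behaviour suggested in the comment above, not something 2to3 is known to do.

```python
# Hypothetical list with a missing comma after "green":
# Python silently joins the two adjacent literals.
colors_implicit = [
    "red",
    "green"
    "blue",
]

# The same value with the concatenation spelled out, as a converter
# could have emitted it. The stray + is far easier to spot in review.
colors_explicit = [
    "red",
    "green" + "blue",
]

print(colors_implicit)                     # two elements, not three
print(colors_implicit == colors_explicit)  # both lists hold the same value
```

Both lists evaluate to the same two-element value, so the rewrite would not change behaviour; it only makes the accidental join visible to a reader.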
| wirthjason wrote: | Ironic to see this today. I spent an hour debugging this very | same issue this morning. | | I was just doing some simple refactoring, changing a hard-coded | string into a parameterized list of f-strings that's filtered and | joined back into a string. | | I'm glad that I had unit tests that caught the problem! I | couldn't figure out why it was breaking; that comma is very | devilish to spot with the naked eye. I'm surprised my linters | didn't catch it either. Maybe time to revisit them. | Forge36 wrote: | I wonder if any of the found issues will turn out to be important | issues. | titzer wrote: | Just to be clear, the V8 "bug" was in the test runner code and | caused mis-parsing of command line options for testing for non- | SSE hardware. Not exactly a critical bug. ___________________________________________________________________ (page generated 2022-01-07 23:00 UTC)