[HN Gopher] We fixed f-string typos in popular Python repos ___________________________________________________________________ We fixed f-string typos in popular Python repos Author : rikatee Score : 94 points Date : 2022-04-29 18:06 UTC (4 hours ago) (HTM) web link (highertier.com) (TXT) w3m dump (highertier.com) | safwan wrote: | f-string does not work with GNU/gettext! It is also a common | mistake that people make! | hn_saver wrote: | nice | mhils wrote: | We've been one of 666 repos, and I'm not too happy of having our | repo used as advertising space. Some thoughts: | | - I'm happy to receive fix-a-typo PRs from human users. In this | case the other side demonstrated that they care by putting in a | bit of manual effort, and a small PR often paves the way towards | larger contributions. I also know that open source beginners get | really excited about their first small contributions, and I'm | honestly happy to support that. | | - In contrast, the marginal effort for bot PRs is ~0. It's very | easy to generate a small amount of work for a lot of people, and | the nice side effect is that the bot's platform is advertised | everywhere. As a maintainer, I have never given consent to this | and I have no choice to opt out. | | We are very happy users of some GitHub bots, but I feel it needs | to be an active adoption decision by the maintainer. If you want | to pitch me your service you may send me an unsolicited email, | but don't use our public space to advertise your product without | asking. | | Edit: I don't want to be too harsh to OP here - at least they | pointed out a small but valid issue in our case. I very much | appreciate their apology at | https://news.ycombinator.com/item?id=31210245 | IshKebab wrote: | I feel the same way about those bots that tell you about | insignificant security vulnerabilities in some project you | abandoned. It's basically spam. | | That said, this does seem like it is a bit more useful. As long | as they actually read the changes and make sure they aren't | false positives. Which I'm guessing they didn't do for 666 | repos. | b6dybuyv wrote: | > As long as they actually read the changes and make sure | they aren't false positives. Which I'm guessing they didn't | do for 666 repos. | | In the article they say that "really a bot found the problem | and made the PR, but really a human developer at Code Review | Doctor did triage the issue before the PR was raised)". | TameAntelope wrote: | I just... think you should reconsider your stance on this. If | you made a mistake in a public repo and someone else caught it | (via scan of your repo or otherwise), it's a pretty bad look to | be anything but grateful at that point, PR benefits for the bot | aside. | omegalulw wrote: | I understand the sentiment but you should be judging the PR, | not the source. Ask yourself: would you have happily accepted | the same PR that the bot sent if it came from a human? | | By all means, I am not against having bots identify themselves | properly, my point is that "effort from bot PR is ~0", "it | advertises their platform" are simply not the right reasons to | judge this situation by. | memco wrote: | The article links to some docs for the logging module here: | https://docs.python.org/3/howto/logging.html#optimization | asserting that f-strings are less optimal but the docs do not say | that they do not optimize our the expression evaluation of | f-strings: only that the logging module tried to perform | evaluation as late as possible: where is the f-string described | as suboptimal? | | Relatedly the logging optimization suggests setting: | raiseExceptions to false for production logging code: where is | that set? On the logger, handler or something else? | bobbiechen wrote: | I was also confused by the expression evaluation thing. Reading | between the lines, it seems like | logger.debug("hello %s", foo) | | may be better than logger.debug(f"hello | {foo}") | | in the case when loglevel is higher than debug. In the first | version, the final string does not have to be computed, while | in the second version, we might construct the string and then | do nothing since the loglevel is excluded. | jrootabega wrote: | You're also able to add additional log-specific processors to | the log record in the first case. | masklinn wrote: | That's exactly it. | | Although this becomes more complicated because printf-style | string formatting is not free (though it's the cheapest of | all methods save fstrings if I remember correctly), and | because python does not support lazy parameters if `foo` is a | non-trivial expression odds are good it will far outcost | either formatting. | nimish wrote: | The "Use logging's interpolation" warning has always annoyed | me. If logging is in your hot path, that might be an issue. Me | f-string interpolating some info logs that run once for | convenience is not. | | Low value junk like this is not helpful. | aib wrote: | > where is the f-string described as suboptimal? | | I guess it's implicit in that f-strings, as arguments, will be | evaluated before the logging function can even run whereas | `debug("...", heavy_obj)` will avoid a potentially expensive | `str(heavy_obj)` (or whatever the string conversion warrants.) | | As for raiseExceptions, I'm not sure it's for optimization. It | looks like an old sanity check for bad logging configurations. | dewey wrote: | > For science you can see the reactions here. | | That link seems to be broken: | https://github.com/issues?q=is%3Aissue+author%3Acode-review-... | | I was actually surprised to read that people would ignore or be | annoyed by a bot raising a valid PR that can be easily merged | after a quick glance. What would be the reason for that? | VWWHFSfQ wrote: | Because this is basically just PR spam | dekhn wrote: | to me, well-intentioned systems wiht a high true positive | rate and low false positive rate are welcome so long as they | follow reasonable etiquette and norms, which this group seems | to do. | mhils wrote: | In our case OPs bot did not open a PR which could have been | merged quickly, but filed an issue instead. | Waterluvian wrote: | I expect to see the entire gamut of possible reactions with a | sufficient number of bot PRs. But in looking at 10 of them at | random, I didn't find a single "negative response." | | (I don't think ignoring it is invalid or wrong by any means, | given there's so many reasons one might not engage in a timely | manner, or at all, in the issues section or PRs. I don't | monitor my repos issues because I just don't feel interested in | supporting my code. Feel free to fork or ignore!) | vitus wrote: | Some negative reactions: | | https://github.com/mitmproxy/mitmproxy/issues/5285 | | https://github.com/Qiskit/qiskit-terra/issues/7981 | | https://github.com/beetbox/beets/issues/4340 | | I do think those concerns are legitimate. (I also think more | tooling is a good thing!) | Waterluvian wrote: | Thank you for sharing these links! | dekhn wrote: | I looked through all three. The first isn't really a | complaint because the bot acted in good faith and found an | error. In the second one they complained abiout a missing | unsubscribe link (reasonable) and in the third one, the | author should update their code so it doesn't create a | variable named path, then a non-f-string that includes | "{path}". I had to stare at the author's comment that it | was a false positive for quite a bit to convince myself | they were right. | vitus wrote: | I will point out that in the first two issues, the repo | owners also edited the initial report with something | along the lines of "removed ad". | | I disagree that the first isn't a complaint -- the owner | stated that this behavior isn't appreciated but decided | to let it slide because the issue was valid. | | In the third issue, the owner also explicitly stated: "I | don't think bots posting unsolicited static analysis | results are a good idea." I have no opinions on whether | the code should be clearer, but that doesn't change the | validity of the reaction. | klyrs wrote: | The bot-account's (apparently human-written) reply of | "you're very welcome" to the complaint in the third issue | is downright dismissive of the problem and kinda passive | aggressive. While it seems that the bot did good work | overall, the human(s) handling edge cases need work. | mhils wrote: | We've blocked the bot after their script malfunctioned | and they opened a second issue with exactly the same text | (https://github.com/mitmproxy/mitmproxy/issues/5286). | rikatee wrote: | klyrs was right about the reply from me (a dev behdind | Code Review Doctor) being dismissive in the issue. I | apologise for that. | | FWIW my reaction was classic "expectations not meeting | reality": weeks of work to do (what I thought) was a | mutually beneficial helpful thing. I was naively not | expecting non-positive responses and was ill prepared | when you raised valid concerns I had not considered. | | Again, I am working on that and sorry I was passive | aggressive to you. | mhils wrote: | Thank you - that explanation makes sense, apology | accepted. :) | klyrs wrote: | Nice response, thanks. | dekhn wrote: | I do think the question of "how should bots that do | static analysis work" is an important one, but in the | meantime, people are gonna bot and repo managers are | gonna complain. | Forge36 wrote: | What I've found from doing similar types of changes. | | 1. It's hard to explain the impact to the application of the | current problem. Thus it looks like a theoretical issue | | 2. Sometimes people rely on the bug for their code to work | | 3. Surprise work can be poorly received (ie: not the current | priority) | TrickardRixx wrote: | Automated checking of potential bugs in f-strings is hard. | There are lots of false positives. You can see some discussion | around this kind of rule in pylint [0]. At the end of the day, | the choice to run automated linting tools on a repo is up to | the maintainers. Autogenerating PRs like this is incredibly | noisy and comes off to me as a blatant advertisement for their | "code review doctor" product. | | [0] https://github.com/PyCQA/pylint/issues/5039 | zamadatix wrote: | In reactions they conveniently left out "false positives we | still hadn't weeded out". On top of that it can be annoying to | have bots making trivial PRs in their own format when you've | got a well defined process for it. Lastly it was basically | spamming an ad link for the service at the end of the PR | comment - even if the other issues didn't come up it's not | always well received to do that. | | Looking at 1 bot it doesn't sound bad, when you have everyones | bot doing this kind of stuff it can quickly become more of a | nuisance than a help. | cinntaile wrote: | https://github.com/Qiskit/qiskit-terra/pull/7982 | | That guy was not happy. I do agree that it's basically | advertising and that's annoying. | jjoonathan wrote: | You're assuming that the PR is valid, but a maintainer can't | make that assumption. They have to do the thankless work to | figure out the context and handle the fallout if they get it | wrong. Let's look at who wins: * Small | benefit to bot creator * Tiny benefit to project | * Modest cost to maintainer | | Waves of low-effort resume-padding commits are already a thing. | Not a big problem, but bots clearly have the potential to | multiply the small problem into a big problem. | | I'm still open to the idea that bots could be a net win, | because most projects really do have heaps of small simple | mistakes lying around. I'm sympathetic to the maintainers | though. They always seem to get the short end of the stick. | malcolmgreaves wrote: | You can also use flake8 to find this, and even more, errors in | Python code. | rikatee wrote: | flake8 does not currently support this check, as they are | concerned about the false positives from "what if the string it | later used in .format(...)" | | However, Code Review Doctor is more of a "this MIGHT be a | problem. have you considered..." rather than "it wrong" | bvinc wrote: | So they checked 666 python repositories and fixed bugs in 69 of | them. Interesting choice of numbers. | defterGoose wrote: | ...doing the devil's work. | racl101 wrote: | Nice! | InfiniteRand wrote: | Fixing F-strings | readthenotes1 wrote: | I find it ironic that the article points out that relying on | error from humans to find errors is something of a hit or miss | proposition and suggests that automating error finding is an | appropriate course instead of making it less likely to make the | error in the first place. | | For example, I wonder how many errors would have been found if | the definition of a format string was the default? That is, how | many times would people have written something like "hello | {previously-defined-variable}" and not meant to substitute the | value of that previously defined variable at runtime? | rikatee wrote: | what's also ironic is I left an easter egg in the code sample | for how we downloaded the list of repositories and no one has | noticed it yet. | swatcoder wrote: | To be fair, your suggestion might make for a more resilient | default, but it's also a great way to leak data and add | overhead for the default case. There are tradeoffs. | Someone wrote: | Not much overhead, I would think. We're talking about literal | strings in source code, not strings in general. It's not much | work to check those. | | One thing that it would break is that strings read from files | would be treated differently from those in source code, even | those read from files that logically "belong" to the | application (say config file) | | I don't think that's an issue, though. | | Also, in Swift _" \\(foo)"_ does string interpolation. I | haven't seen people complain it leaks data or makes Swift | slow (but then, it's not fast at compiling at all because of | its rather complicated type inference) | JadeNB wrote: | > Also, in Swift "\\(foo)" does string interpolation. I | haven't seen people complain it leaks data or makes Swift | slow (but then, it's not fast at compiling at all because | of its rather complicated type inference) | | I think that the claim is not that this leaks data in an | absolute sense, but rather that changing the behaviour | after people have come to rely on it will leak data from | currently well behaving applications. | boxed wrote: | This is how strings work in swift. It's a much superior system | imo. | MauranKilom wrote: | I don't think this makes sense. Plain strings and format | strings are not interchangeable, and using one where the other | was meant is probably a bug. | | Would you expect that a user input like "{secret} please" is | interpolated? If so, we hopefully agree that this would blow | major security holes into any python script processing | untrusted user input. And if not... Why not? | kgeist wrote: | >Would you expect that a user input like "{secret} please" is | interpolated? | | That's basically what the recent log4j security vulnerability | was all about. "Helpfully" interpolating logs by default. | noobermin wrote: | The assumption I'm thinking they mean is to make formatting | default and unformatted not default, for example, how "raw" | strings were treated, escaped characters are replaced with | the ascii code by default unless the string is raw, signified | by an 'r' prefixed in front. | Sohcahtoa82 wrote: | Adding that behavior would break existing code that uses | str.format, and Python tries to avoid breaking code between | minor releases. | asvitkine wrote: | If you only make it work with string literals (e.g. generate | the underlying formatting logic at parse time), it wouldn't | allow arbitrary inputs to be treated as f strings. | boxed wrote: | Look up how this works in Swift. They only have one string. | No raw strings or f strings. Yet they have all the power of | all three python string types and less syntax. It's very | nice. | HL33tibCe7 wrote: | That's not really a feasible solution in Python because that | change would break a load of existing code. | mschuster91 wrote: | So what? Raise a deprecation notice, treat it as a fatal | error in two or three years and that's it. PHP has been doing | this for years now. | krisoft wrote: | > treat it as a fatal error | | Did you think this through? What would you treat as a fatal | error? How would the compiler know if a particular string | is old style code wanting to print some characters between | curly braces or new style code wanting to string | interpolate a variable? | noobermin wrote: | We've done this too many times and we've had enough pain, | let's please proceed at a pace where we can worry about | delivering our product and not updating formatted strings, | thank you. | bornfreddy wrote: | Also see: Python 2 => 3 hell. Nobody wants to repeat that. | HL33tibCe7 wrote: | We're not talking about deprecating a feature here, we're | talking about the addition of behaviour that will break | existing code, potentially in non-trivial and hard to debug | ways, and in ways that could easily introduce security | vulnerabilities. | capitalsigma wrote: | Just bump the major version number from 3 to 4, right? How | long could that migration take? | dekhn wrote: | Python does not do this. A change like that would require a | major version number increment and the community would | revolt. | | Too bad we can't go back in time to 1996 or so. | swatcoder wrote: | As someone who would like to be working on new, interesting | things in 2-3 years rather than bringing old code into | conformance with breaking changes, this attitude captures a | worrisome trend in development. | | On the one hand, it's great that we have platforms that | innovate and improve and harden over time, but we're also | facing a development culture where more and more time is | spent servicing package/platform/language/OS changes that | have no material impact on our own otherwise-mature | projects. | | It's worth being judicious about where breaking changes are | applied, right? | mulmen wrote: | One person's "new interesting thing" is another person's | "breaking change". | macspoofing wrote: | >...and suggests that automating error finding is an | appropriate course instead of making it less likely to make the | error in the first place. | | You can't fix the syntax and standard lib of the language. It | is what it is. Similarly, how many bugs would you prevent if | Python had compiler support to catch those types of syntax (and | type) errors. | ggm wrote: | Maybe catenation of an fString and a string _should_ yield an | fString by type promotion? String is morally "any" so it feels | to me like a contextual narrowing of type. | zikohh wrote: | Can I use code review doctor with gitlab? If not what options do | I have? | mjs7231 wrote: | This was posted on Reddit earlier this week with similar negative | responses: | https://www.reddit.com/r/Python/comments/ubkvrd/10_of_the_66... | rikatee wrote: | FWIW, HN is much more positive (while also raising valid points | that will be taken into account going forward) | snapetom wrote: | I'd like to add better technical discussion, too. | fareesh wrote: | I like python although I don't use it too often. Would it be | unfairly critical of me to say that this is the outcome of a bad | design choice? Ideally languages should be designed in a way that | a bug like this which is so widespread and easy to create, should | be caught via some mechanism, either linting or some part of the | process. | noobermin wrote: | The f-strings are a recent (may be not so recent now) addition | to the language, so all the errors stem from it being "new" | where people's reflexes / carefulness hasn't adjusted to them | yet. | | I think in addition to the suggestion for linters, updating | IDE/editors to incorporate them would help. Syntax highlighting | is the primary reason not terminating strings isn't that common | of an error anymore, coloring it differently than a normal | string might help (or may be it would make things ugly, I don't | know). | polio wrote: | Most developers will require an arsenal of static analysis | tools to achieve maximum productivity. Linters are an example | of such a tool, but they don't exist as part of the language | spec itself, AFAIK. | f7fg_u-_h wrote: | dattboii wrote: | HL33tibCe7 wrote: | > > We may be looking too deep into this but it seems like many | developers think when string concatenation occurs it's enough to | declare the first string as an f-string and the other strings are | turned into f-strings by osmosis. It doesn't. We're not | suggesting this is the case for all developers that accidentally | did this error, but interesting nonetheless. | | I highly doubt that people believed that f-strings worked this | way. Far more likely is that, for example, the expression started | as one line, then got split onto two, or some such similar | scenario. | badrabbit wrote: | You'd be surprised, people who expect python to be "smart" and | "figure it out" might think that way. | groestl wrote: | Well, it's not completely unlogical. | | 'a' is str | | b'a' is bytes | | f'something' might be a separate f-str-type too? | | 1 is an int | | 1.2 is a float | | (1.2 + 1) is a float | GeorgeTirebiter wrote: | Indeed, if "{value} is bad" can be automatically f-stringed | by an external program automatically --- then why can't | Python do this automatically -- so we can get rid of the | f-string type as a required explicit declaration? After | all, we don't specifically add a type to a number like 42 | or 3.14159 --- those are implicitly 'int' and 'float' | types. | | I would use such a feature, as I always use f-strings when | formatting. | bagels wrote: | Backwards compatibility for one. Code existed before | fstrings that may use curly braces, and you can currently | use curly braces in non fstrings without escaping them. | | Might also be a performance penalty for always having to | run the fstring parser. | dragonwriter wrote: | If all strings with what looks like format specs are | implicitly f-strings, how do you use reusable template | strings, which use the same basic internal syntax, for | formatting? | | f-strings, like r-strings, have uses, but, like | r-strings, I wouldn't want to replace plain strings with | them. | masklinn wrote: | > Indeed, if "{value} is bad" can be automatically | f-stringed by an external program automatically --- then | why can't Python do this automatically -- so we can get | rid of the f-string type as a required explicit | declaration? | | Because it would break existing strings containing | braces, such as those used with `str.format`, or | string.Template, or literal Jinja templates, ... | GeorgeTirebiter wrote: | Yes, my use case was "I always use F-strings" so these | other breakages could not occur, by definition. | | I suppose it's not that much of a problem to run a | program to preprocess source and add in the F | | But.. Python has a history of introducing new features | that break old ones. That seems to me a balance between | backward compatibility and future goodness. | | the "from future import auto-fstring" construct could do | it... | | And, as for having to run the f-string parser: yes, but | only once on static strings, which are most of them. | chrisshroba wrote: | It's not clear to me that folks would _want_ this | behavior globally - I personally would not, because | sometimes I want to be able to include curly braces in my | strings without it being interpolated, and I prefer the | current syntax of having that just work. I 'm very | comfortable with having to add the 'f' to the string | syntax to declare that this _particular_ string should be | interpolated. | | There are other issues you'd have to deal with as well. | Would you interpolate non-literal strings (e.g. in the | code `print(input())`, if the user inputs the string | "{1+1}", what would be printed?)? How would you propose | dealing with jinja and other strings that typically | contain curly braces that should not be interpolated? | This could also be an issue with (potentially long) | strings containing lots of random characters: if a '}' | showed up in a string somewhere after a '{', even if | separated by tens or hundreds of characters, then you'll | run into errors. This could be a problem if you're | dealing with pseudorandom or base-92 encoded strings, or | even just strings representing code (imagine a python | library that generates C++ or Java code which has lots of | hard coded strings with braces). | | I think overall, having to specify that a particular | string should be interpolated is a better solution than | having to specify that a string should _not_ be | interpolated. | GeorgeTirebiter wrote: | Well, you could put a N" your string with{} in it" to | type-specify that you are not F-string. In fact, WHY are | strings not fully specified every time? There is nothing | wrong with F"This is also an f-string". After all, | strings are just number sequences with special meanings | assigned to some values in some circumstances. | | Seems to me the problem is similar to wanting 1 be | interpreted as an int --- if you want float, you change | the syntax (and therefore semantics) to 1. or 1.0 | | It's always possible to conjure corner-case failure | modes; but shouldn't the 'common case' be catered to, | more than some base-92 encoded strings? | | And, by the by, more 'smarts' can be applied to automatic | f-string determination. If "{variable-that-exists} | foobar" is seen it could plausibly be converted to an | f-string. | | This leads into a much longer discussion of how our | compilers/interpreters are too stupid today, and need to | up their game. But probably not here, not now.... and | also, thank you for your observations & comments. | dragonwriter wrote: | > It's always possible to conjure corner-case failure | modes; but shouldn't the 'common case' be catered to | | It is: normal strings are the common case. | ArchOversight wrote: | Because in some cases it is not desired, if for example | the string is used in something that expands it further | down the line (think SQL injection potential). | HL33tibCe7 wrote: | As a beginner, maybe. But this is code in some of the biggest | open-source Python repos in existence. Probably written by | someone with a reasonable level of Python expertise. | groestl wrote: | I think what's happening is that people assume there's type | coercion going on here. | jldugger wrote: | I wonder if an autoformatter like black is at play here. | saila wrote: | Black doesn't split strings, and I doubt they'd choose to use | concatenation if they did. | johnny_castaway wrote: | This may be also caused by confusing syntax highlighting in | some editors, for example in VSCode [1] | | The variable in the second string gets highlighted (with | slightly different color, but still) because it would still | work with `str.format()`. GitHub doesn't seem to do this. | | [1] https://imgur.com/a/9KGWVG0 | totony wrote: | Yeah I got bitten by the same-color syntax highlighting a few | times. | baisq wrote: | The comments on this submission are overwhelmingly negative. Why | are the comments on PVS-Studio submissions, on the other hand, | generally positive? | bayareabadboy wrote: ___________________________________________________________________ (page generated 2022-04-29 23:00 UTC)