[HN Gopher] Why Security Defects Go Unnoticed During Code Review...
___________________________________________________________________

Why Security Defects Go Unnoticed During Code Reviews? [pdf]

Author : chx
Score  : 41 points
Date   : 2021-02-04 19:33 UTC (3 hours ago)

(HTM) web link (amiangshu.com)
(TXT) w3m dump (amiangshu.com)

| jdlyga wrote:
| Is it a bikeshed problem? It's easy to pick out the obvious
| problems, but not enough people know how to spot security
| vulnerabilities.
|
| tptacek wrote:
| I'd be a little cautious about this, as it seems to be plug-and-
| chug from CWE, followed by a breakdown of the "CWEs least and
| most likely to be detected in code review". I don't think CWE is
| a particularly good taxonomy, and, in any case, apart from
| obvious cases (yes, use of banned functions is easy to spot),
| it's pretty much orthogonal to the real factors that make
| vulnerabilities hard to spot in code review.
|
| I haven't, like, done a study, but I have done _a lot_ of
| security code review, and my instinct is that the basic issues
| here are straightforward: bugs are hard to spot when they're
| subtle (ie, they involve a lot of interactions with other
| systems outside the one under review) and when they involve a
| large amount of context (like object lifecycle bugs).
|
| Which, another issue with this study: I get why Chromium OS is
| an interesting data set. But I don't think it's a particularly
| representative case study. Another factor in the effectiveness
| of security review: how much of the whole system you can fit in
| your head at once.
|
| carlmr wrote:
| > Another factor in effectiveness of security review: how much
| of the whole system you can fit in your head at once.
|
| This is why I think using pure functions as much as possible,
| i.e. part of the functional programming mindset, is so
| important for making code reviewable. Yes, you can make nice
| abstractions in OOP, but at least in my experience OOP, with
| its stateful objects interacting, makes you need to know a lot
| more about the system than pure functions do.
|
| And yes, it's not a panacea: large allocations may take too
| long to copy, which is why the next best thing is "mostly
| functional". Most techniques don't work in every case.
|
| WrtCdEvrydy wrote:
| > how much of the whole system you can fit in your head at
| once.
|
| It's a consistent issue that's brought up... the majority of
| bugs are found in large releases. The smaller the release, the
| less likely you are to create a bug (this is where you get some
| methodologies moving away from waterfall).
|
| I wonder if the future of security practices will just be
| "deliver small pieces of functionality / changes".
|
| tomlagier wrote:
| Doesn't that seem like a tautology? I think the metric needs
| to be "bugs per feature" rather than "bugs per release" - it
| seems obvious that more changes will be buggier than fewer
| changes.
|
| hsbauauvhabzb wrote:
| +1. I got asked to focus on the CWE Top 25 in a Node app. The
| problem is, half the Top 25 are not applicable to Node (buffer
| overflows etc., which matter in memory-unsafe languages).
|
| Uncontextualised security is bad security. Frequency of bugs is
| not a good indicator of severity.
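A minimal sketch of the reviewability contrast carlmr describes,
with hypothetical names invented for illustration: the stateful
version can only be judged by also knowing who mutates the object
and when, while the pure version carries everything a reviewer
needs in its signature.

    # Hypothetical Python illustration; Cart and cart_total are
    # invented for this sketch, not taken from the thread.

    class Cart:
        def __init__(self) -> None:
            self.prices: list[float] = []
            self.discount = 0.0  # mutated elsewhere; the reviewer must find out where

        def total(self) -> float:
            # Correct only for certain histories of method calls: the
            # reviewer has to know whether a discount was already applied.
            return sum(self.prices) * (1 - self.discount)

    def cart_total(prices: list[float], discount: float) -> float:
        # Pure: all inputs are explicit, so the function can be reviewed
        # in isolation, without holding the rest of the system in your head.
        return sum(prices) * (1 - discount)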
| Spearchucker wrote:
| Psychologically, the enterprise software industry is actively
| discouraged from creating secure code. The most-heard mantra is
| not to do it because it's difficult, making it a pointless
| exercise. Another is that security testing is too disruptive to
| production systems.
|
| The software industry is good at recommending specific ciphers
| and algorithms while ignoring the symmetric- and public-key
| protocols (Alice and Bob) they are used in.
|
| Also, many attacks today target security protocols rather than,
| for example, cracking passwords.
|
| Another huge impediment to creating secure systems is agile,
| which discourages up-front design. The average sprint duration
| of two weeks is too short for meaningful security testing.
| Testing is not feasible with nightly builds or pre-release
| sanity checks, either.
|
| Product owners or customer representatives are too often non-
| technical stakeholders who have the authority to make technical
| design decisions. While never a good idea, this has bad
| consequences for architecture and particularly for security.
|
| hinkley wrote:
| > According to the model, the likelihood of a security defect
| being identified during code review declines with the increase
| in the number of directories/files involved in that change.
|
| I think most of us who understand code reviews know about this
| problem and push back against it.
|
| > Surprisingly, the likelihood of missing a vulnerability during
| code review increased with a developer's reviewing experience.
|
| Hold up. I need more information about 'experience'.
|
| > reviewer's coding experience and reviewer's reviewing
| experience,
|
| Still no criteria...
|
| > Another threat is the measure we take to calculate the
| developer's experience. We can interpret the term "experience"
| in many ways. And in many ways, measuring of experience will be
| complex. For example, we cannot calculate the amount of
| contribution of a developer to other projects. Although a
| different experience measure may produce different results, we
| believe our interpretation of experience is reasonable as that
| reflects the amount of familiarity with current project.
|
| Wrong "experience," and still no criteria. If I may proceed
| from the notion that they mean the same class of experience
| (even though they didn't _define_ either of them until almost
| the conclusions), experience is here measured as time on Google
| projects.
|
| New people are trying to establish themselves. Finding a bug is
| 'counting coup' and raises their status. They also bring in
| outside notions of good code reviews that may be more diverse
| or rich than those inside the group.
|
| An 'experienced' reviewer may be suffering from any or all of
| the following:
|
| * under-training (outside experience/echo chamber)
|
| * over-training (confirmation bias)
|
| * over-extension (too many reviews, competing priorities)
|
| * lack of familiarity
|
| * lack of motivation
|
| * burnout
|
| * hubris
|
| lmilcin wrote:
| The cause is usually the same, and it is looking for things you
| can spot vs. proving the code is correct.
|
| Looking for things you can spot means you read the code as if
| it were a novel, and you discuss only the things that stand out
| or that seem to be important.
|
| Many people interpret "doing the code review better" as reading
| the novel more slowly and spotting more things that stand out.
|
| Imagine bridge engineering plans were reviewed this way.
| Somebody would look at the numbers and react only if for some
| reason the numbers looked incorrect.
|
| Asked to do the review more thoroughly, they would just stare
| at the plans for longer, but never actually perform the
| calculations, look up the tables, consult the standards, etc.
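A toy example of the spot-vs-prove distinction lmilcin draws above
(the pagination function is invented, not from the thread): read
"like a novel", the first version looks plausible; writing out the
arithmetic forces you to check the boundary case where the total
is an exact multiple of the page size.

    def num_pages(total: int, page_size: int) -> int:
        # Reads fine at a glance, but num_pages(10, 5) == 3, not 2.
        return total // page_size + 1

    def num_pages_proven(total: int, page_size: int) -> int:
        # Proof sketch: write total = q * page_size + r with
        # 0 <= r < page_size. We need q pages when r == 0 and q + 1
        # when r > 0, i.e. ceil(total / page_size), which equals
        # (total + page_size - 1) // page_size for any page_size > 0.
        return (total + page_size - 1) // page_size

    assert num_pages_proven(10, 5) == 2
    assert num_pages_proven(11, 5) == 3
    assert num_pages_proven(0, 5) == 0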
| leetcrew wrote:
| the diff-based code review emphasizes very local changes to
| code. it inherently directs one's attention toward what has
| changed, rather than how the changes affect the larger system.
| I notice this a lot with the (browser-based) tool we use at
| work. if I expand the view to see the whole file, rather than
| just the diff, it tends to make the page lag.
|
| hinkley wrote:
| Something that really cheeses me off about Atlassian
| (Bitbucket) code reviews is that by default they only show
| you the changed lines.
|
| You can't see that the code they changed also lives in
| another function that they missed, or that in fact the whole
| thing should be factored out into a function and fixed once.
|
| lmilcin wrote:
| That's why I never accept code based on a review in the web
| tool unless the code is evidently correct. I usually check out
| the branch and try to figure out how the code affects or is
| affected by other, sometimes distant, code.
|
| Ideally, of course, everything should be hidden behind leak-
| tight abstractions.
|
| Unfortunately, that is rarely the case.
|
| I recently had a developer reuse a function that generates
| IDs for some other type of ID.
|
| That function happened to use the OS entropy pool directly. It
| was previously called once per startup of the application, so
| it was deemed OK.
|
| But now, with high demand for IDs, that is no longer
| acceptable.
|
| I noticed it because I followed the code to see how the
| function _actually_ generates IDs. That would not happen in a
| web-based tool, because there simply isn't a convenient way to
| follow the references from the new code.
|
| That is just a simple example; I get things like that every
| other day.
|
| bluGill wrote:
| I've never been a fan of any review tool I've used, because
| they all assume the changed code stands in isolation. Just
| today I was in a review where a change on line 234 was
| correct, except that it violated a bad assumption on line
| 123 which needed to be fixed. Line 123 is where I wanted to
| comment, because that is where the problem should be fixed,
| but the tool saw that as okay code and didn't even let me
| see it, much less comment on it. It gets worse when the
| bad assumption is in a different file.
|
| And I often get frustrated because I see you call foo(),
| but I need to see how foo works with the given arguments.
|
| Though to be honest, most of the time I catch myself playing
| grammar checker: if the style isn't violated and you didn't
| hit any of the dozens of checklist items, you pass. (I work
| in C++, so I often catch people writing C++98 code when there
| is a C++14 construct that is better.)
|
| philipkglass wrote:
| GitHub's PR review tool has some irritating limitations this
| way. If a changed function could cause problems for an
| unchanged caller, there's no way to add a comment on the
| caller unless it's within a few lines of the changed code.
| You cannot add comments on specific file lines that are "too
| far" from a changed line.
|
| My review flow now is to check out the branch under review,
| read it in an IDE with git annotate on, and do all my actual
| review work there so I can see the full context. Then I add
| line-by-line comments on GitHub where I can. Comments
| relating to older parts of the system that should change in
| tandem with the current PR have to go in as "general"
| comments, with file/line numbers included manually.
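A sketch of the reuse pitfall lmilcin describes a few comments up;
the function names here are hypothetical. Reading /dev/random
directly can block on older Linux kernels when the entropy pool
runs low, which is tolerable once at process startup but not on a
hot path minting IDs per request.

    import uuid

    def startup_instance_id() -> str:
        # Original use: called once at application startup, so an
        # occasional block on /dev/random was deemed acceptable.
        with open("/dev/random", "rb") as f:
            return f.read(16).hex()

    def request_id() -> str:
        # What the reuse should have been steered toward: a
        # non-blocking CSPRNG (os.urandom / uuid4) that never stalls
        # under load.
        return uuid.uuid4().hex

Following the callee's implementation, as lmilcin did, is exactly
what surfaces the blocking read; a diff of the calling site alone
looks harmless.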
| dkarl wrote:
| Approaching from the mentality of doing a proof is valuable for
| establishing anything about code, even when it isn't security-
| sensitive. I think many programmers do this instinctively, and
| some don't do it at all. You can spot the difference by asking
| a question like:
|
| "Why do no records get dropped in this step?"
|
| Some programmers will have an argument resembling a logical
| proof: "No records get dropped because for every record, we
| call a method to write the record into the outgoing artifact,
| and if that call fails, the entire stage aborts. The flow of
| control never reaches the point where the outgoing artifact is
| added to the database for delivery. The lack of a successful
| delivery record for the batch triggers an alert after an hour
| and ensures the system will keep reprocessing the batch until
| it succeeds, or until the batch gets manually flagged as bad."
|
| Other programmers will say something like: "Well, all the
| records get passed in, and we write them all to the artifact.
| We had some bugs in this code, but they haven't been hard to
| fix, and anyway, if something goes wrong, the process is
| retried later."
|
| This has been a big thing on my mind recently because I've been
| asked to do a design review for a system created by some high-
| SLOC-productivity people who built a complex system with a
| sophisticated-looking architecture but who keep giving me
| answers like the second one. There's a bunch of cool tech but
| no coherent explanation of why they believe the system will
| work reliably. It's just, well, there have been some bugs, and
| of course it doesn't work when there are bugs, but we're fixing
| them.
|
| Until you have an argument for why a system works, there's no
| coherent way to understand why it doesn't work, or what the
| effort will be to achieve different aspects of reliability.
| It's just "bugs." With such an argument, you can target
| specific aspects of reliability instead of playing the eternal
| "it'll work when the bugs are gone" game of whack-a-mole.
|
| Granted, this isn't very rigorous, but 99% of the time it's
| the closest you'll get to "proof" in application code.
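A minimal sketch of the shape of the first, proof-style answer
dkarl quotes; all names here are hypothetical. The invariant is
that the delivery record is written only after every record write
has succeeded, so a failure anywhere leaves the batch
unacknowledged and eligible for reprocessing.

    def process_batch(records, artifact, db) -> None:
        for record in records:
            artifact.write(record)  # raises on failure, aborting the stage
        # Control reaches this line only if every record was written.
        db.mark_delivered(artifact.id)  # a missing delivery record fires
                                        # the alert and triggers reprocessing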
| hinkley wrote:
| Code reviews have become a social thing, and bagging on someone
| with a litany of issues in front of a group is never acceptable
| behavior. If they've done ten things wrong, you have to pick 3
| of them and let the rest go, or decide to kick them out of the
| group entirely.
|
| There was a period for me when I 'solved' this code review
| problem the way many of us solved class participation in grade
| school: it's not a competition, it's a learning environment. It
| doesn't matter to the class if you know the answer, but it's
| really awkward if nobody knows the answer. So you say the
| answer in your head, wait to see if anybody else volunteers
| that information, and then put your hand up if they don't. Or
| you count to five. (I suspect, based on hands and grades, that
| at least a couple of my classmates knew all the answers but
| were too shy to say anything, and this also appears to be true
| in code reviews.)
|
| Code reviews are typically, but not always, asynchronous, so
| you can choose to wait for someone else to point out the
| obvious stuff and then just mention the things nobody else saw.
| The danger there is if an argument starts about one of these
| issues and distracts from the others. Like a real conversation,
| you can run out of time before the issue you cared about gets
| sufficient airtime.
|
| Sodman wrote:
| Style nits aside, I don't think it's a good idea to selectively
| pick the worst N things in a PR and let the rest go. If you're
| consistently finding 10+ things to request changes on in
| somebody's pull requests, you probably need to look at your
| engineering process. Was the high-level approach discussed
| before it was fully coded and "ready for review"? This is
| something easily fixed by process changes.
|
| Are you finding a lot of little things in the PR itself? Many
| of the usual suspects can and should be automated away
| (linting, codegen, test coverage reports, static analysis,
| etc.). Others can easily be added to a checklist before PRs are
| submitted for review, e.g. "Have you updated the relevant
| documentation?"
|
| Usually newer employees will get more comments on their PRs, as
| they're not as familiar with the system and might fall into
| more traps that others know about from experience. This can be
| mitigated somewhat by pairing/mentoring.
|
| Sure, some people just aren't going to cut it on your team, but
| in my experience most engineers, when corrected on an
| individual coding mistake, won't repeatedly make the same
| mistake. If you just never mention the mistake, they'll keep
| making it in every PR and be completely unaware. Sometimes it
| helps to communicate out of band (i.e., not in "official"
| comments on the PR itself) if there are some low-hanging
| gotchas you want to make the engineer aware of.
|
| leetcrew wrote:
| > Code reviews have become a social thing, and bagging on
| someone with a litany of issues in front of a group is never
| acceptable behavior. If they've done ten things wrong, you have
| to pick 3 of them and let the rest go, or decide to kick them
| out of the group entirely.
|
| I wonder if we overestimate how sensitive people are in this
| case. if someone raises ten objections on my review and makes
| it clear what I have to do to get their approval, I'm happy.
| I'll get right to work on fixing it. if this happens in front
| of a group (esp. involving skip-levels / managers of other
| teams), I take it as a signal that I should have run the change
| by a teammate or my direct manager before involving so many
| people.
|
| the only thing I actually dislike is when people comment about
| vague concerns that leave it unclear whether I actually have to
| address them in that change to get their approval. sometimes
| this causes a circular waiting deadlock with the other
| reviewers, which is especially frustrating.
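A sketch of Sodman's point that the usual suspects belong in
automation rather than review comments; the tool list and paths
are assumptions, not from the thread. Run before a PR ever reaches
a human reviewer:

    import subprocess
    import sys

    CHECKS = [
        ["flake8", "src/"],                # style nits
        ["mypy", "src/"],                  # static analysis
        ["pytest", "--cov=src", "tests"],  # tests + coverage (needs pytest-cov)
    ]

    def main() -> int:
        for cmd in CHECKS:
            if subprocess.run(cmd).returncode != 0:
                print("pre-review check failed:", " ".join(cmd))
                return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())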
| wccrawford wrote:
| It depends on both the person and the environment.
|
| In an environment that places stress on those who underperform,
| anything that calls attention to performance problems will be
| _hated_. Anyone who points out those problems will be disliked.
| Even the best people will fall to this stress.
|
| In a more relaxed environment that isn't constantly counting
| points towards raises and promotions, people can relax and not
| worry about the problems that are raised. However, not everyone
| can manage this. Some people have just had too many jobs of the
| previous type, or other experiences like that. They just can't
| adjust.
|
| IMO, it also means they have a much harder time fixing their
| own problems, because they can't tolerate other people pointing
| out those problems.
|
| As for code reviews, I always call it like I see it. It's my
| _job_ to make sure the code will work and be secure. I'm not
| going to let things slip by that violate that, no matter how
| much I want that person to be happy. And I'll still point out
| matters of style as well, though I usually won't insist they
| be corrected unless the person has repeatedly had the same
| problem. (My people are generally great, though, and will fix
| even my suggestions. So it's generally not a thing.)
|
| You'd probably hate that last bit of my code reviews. If we
| were on a team, my advice would be to simply discuss that part
| with me in real time. We could decide together whether it was
| worth the effort to make those changes. Async communications
| like PRs and emails simply aren't fit for that kind of
| discussion.
|
| hinkley wrote:
| It also depends on whether everyone else pointed out 2 things
| and you pointed out 10. Letting other people go first means
| you only have to point out 5, which is still lopsided but not
| 'holy shit, man' lopsided.
|
| nicoburns wrote:
| > In an environment that places stress on those who
| underperform, anything that calls attention to performance
| problems will be hated. Anyone who points out those problems
| will be disliked. Even the best people will fall to this
| stress.
|
| Seems to me that this might be the problem rather than the
| code review itself!
|
| > We could decide together whether it was worth the effort to
| make those changes. Async communications like PRs and emails
| simply aren't fit for that kind of discussion.
|
| Indeed. I've definitely found myself falling into a pattern
| of failing the review for serious issues with comments on the
| PR, but just approving the review and slacking the comments
| for small issues.
___________________________________________________________________
(page generated 2021-02-04 23:00 UTC)