[HN Gopher] Why Security Defects Go Unnoticed During Code Review...
       ___________________________________________________________________
        
       Why Security Defects Go Unnoticed During Code Reviews? [pdf]
        
       Author : chx
       Score  : 41 points
       Date   : 2021-02-04 19:33 UTC (3 hours ago)
        
 (HTM) web link (amiangshu.com)
 (TXT) w3m dump (amiangshu.com)
        
       | jdlyga wrote:
       | Is it a bikeshed problem? It's easy to pick out the obvious
        | problems, but not enough people know how to spot security
       | vulnerabilities.
        
       | tptacek wrote:
       | I'd be a little cautious about this, as it seems to be plug-and-
       | chug from CWE, followed by a breakdown of the "CWEs least and
       | most likely to be detected in code review". I don't think CWE is
       | a particularly good taxonomy, and, in any case, apart from
       | obvious cases (yes, use of banned functions is easy to spot),
        | it's pretty much orthogonal to the real factors that make
       | vulnerabilities hard to spot in code review.
       | 
       | I haven't, like, done a study, but I have done _a lot_ of
       | security code review, and my instinct is that the basic issues
        | here are straightforward: bugs are hard to spot when they're
       | subtle (ie, they involve a lot of interactions with other systems
       | outside the one under review) and when they involve a large
       | amount of context (like object lifecycle bugs).
       | 
       | Which, another issue with this study: I get why Chromium OS is an
       | interesting data set. But I don't think it's a particularly
       | representative case study. Another factor in effectiveness of
       | security review: how much of the whole system you can fit in your
       | head at once.
        
         | carlmr wrote:
         | >Another factor in effectiveness of security review: how much
         | of the whole system you can fit in your head at once.
         | 
          | This is why I think using pure functions as much as possible,
          | i.e. part of the functional programming mindset, is so
          | important for making code reviewable. Yes, you can make nice
          | abstractions in OOP, but at least in my experience OOP, with
          | its stateful objects interacting, makes you need to know a lot
          | more about the system than pure functions do.
          | 
          | And yes, it's not a panacea: large allocations may take too
          | long to copy, which is why the next best thing is "mostly
          | functional". Most techniques don't work in every case.
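          | 
          | A rough sketch of the difference (hypothetical names): the
          | pure function can be reviewed from its signature alone, while
          | the stateful version drags in every place that mutates the
          | object:
          | 
          |     // Stateful: to review apply(), you also need to know
          |     // everything that mutates `rate` over the object's
          |     // lifetime.
          |     struct Discounter {
          |         double rate;  // mutated elsewhere in the system
          |         double apply(double price) const {
          |             return price * (1.0 - rate);
          |         }
          |     };
          | 
          |     // Pure: everything the function depends on is in its
          |     // parameter list, so the diff alone is enough context.
          |     double applyDiscount(double price, double rate) {
          |         return price * (1.0 - rate);
          |     }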
        
         | WrtCdEvrydy wrote:
         | > how much of the whole system you can fit in your head at
         | once.
         | 
         | It's a consistent issue that's brought up... the majority of
         | bugs are found in large releases. The smaller the release, the
         | less likely you'll create a bug (this is where you get some
         | methodologies moving away from waterfall).
         | 
         | I wonder if the future of security practices will just be
         | "deliver small pieces of functionality / changes"
        
           | tomlagier wrote:
           | Doesn't that seem like a tautology? I think the metric needs
           | to be "bugs per feature" rather than "bugs per release" - it
           | seems obvious that more changes will be buggier than fewer
           | changes.
        
         | hsbauauvhabzb wrote:
         | +1. I got asked to focus on CWE top 25 in a node app. Problem
         | is, half the top 25 are not applicable to node (buffer
         | overflows etc, which matter in memory unsafe languages).
         | 
          | Uncontextualised security is bad security. Frequency of bugs is
         | not a good indicator of severity.
        
       | Spearchucker wrote:
       | Psychologically the enterprise software industry is actively
       | discouraged from creating secure code. The most-heard mantra is
       | to not do it because it's difficult, making it a pointless
       | exercise. Another is that security testing is too disruptive to
       | production systems. The software industry is good at recommending
        | specific ciphers and algorithms while ignoring symmetric and
        | public-key protocols (Alice and Bob).
       | 
       | Also, many attacks today target security protocols rather than,
       | for example, cracking passwords.
       | 
       | Another huge impediment to creating secure systems is agile,
       | which discourages up-front design. The average sprint duration of
       | two weeks is too short for meaningful security testing. Testing
       | is not feasible with nightly builds or pre-release sanity checks,
       | either.
       | 
        | Product owners or customer representatives are too often non-
        | technical stakeholders who have the authority to make technical
        | design decisions. While never a good idea, this has bad
        | consequences for architecture and particularly for security.
        
       | hinkley wrote:
       | > According to the model, the likelihood of a security defect
       | being identified during code review declines with the increase in
       | the number of directories/files involved in that change.
       | 
       | I think most of us who understand code reviews know about this
       | problem and push back against it.
       | 
       | > Surprisingly, the likelihood of missing a vulnerability during
       | code review increased with a developer's reviewing experience.
       | 
        | Hold up. I need more information about 'experience'.
       | 
       | > reviewer's coding experience and reviewer's reviewing
       | experience,
       | 
       | still no criteria...
       | 
       | > Another threat is the measure we take to calculate the
        | developer's experience. We can interpret the term "experience" in
       | many ways. And in many ways, measuring of experience will be
       | complex. For example, we cannot calculate the amount of
       | contribution of a developer to other projects. Although a
        | different experience measure may produce different results, we
       | believe our interpretation of experience is reasonable as that
       | reflects the amount of familiarity with current project.
       | 
       | Wrong "experience" and still no criteria. If I may proceed from a
       | notion that they mean the same class of experience even though
       | they didn't _define_ either of them until almost the conclusions,
       | experience is here measured as time on Google projects.
       | 
       | New people are trying to establish themselves. Finding a bug is
       | 'counting coup' and raises their status. They also bring in
       | outside notions of good code reviews that may be more diverse or
       | rich than those inside the group.
       | 
       | An 'experienced' reviewer may be suffering from any or all of the
       | following:
       | 
       | * under-training (outside experience/echo chamber)
       | 
       | * over-training (confirmation bias)
       | 
       | * over-extension (too many reviews, competing priorities)
       | 
       | * lack of familiarity
       | 
       | * lack of motivation
       | 
       | * burnout
       | 
       | * hubris
        
       | lmilcin wrote:
        | The cause is usually the same: looking for things you can spot
        | vs. proving the code is correct.
       | 
        | Looking for things you can spot means you read the code as if
        | it were a novel, and you discuss only the things that stand out
        | or that seem to be important.
       | 
       | Many people interpret "doing the code review better" as reading
       | the novel more slowly and spotting more things that stand out.
       | 
       | Imagine bridge engineering plans were reviewed this way. Somebody
       | would look at the numbers and react only if for some reason the
       | numbers looked incorrect.
       | 
        | Asked to do the review more thoroughly, they would just stare
        | at the plans for longer, but never actually perform the
        | calculations, look up the tables, consult standards, etc.
        
         | leetcrew wrote:
         | the diff-based code review emphasizes very local changes to
         | code. it inherently directs one's attention toward what has
         | changed, rather than how the changes affect the larger system.
         | I notice this a lot with the (browser-based) tool we use at
         | work. if I expand the view to see the whole file, rather than
         | just the diff, it tends to make the page lag.
        
           | hinkley wrote:
           | Something that really cheeses me off about Atlassian
           | (bitbucket) code reviews is that by default they only show
           | you the changed lines.
           | 
           | You can't see that the code they changed also lives in
            | another function that they missed, or that in fact the whole
           | thing should be factored out into a function and fixed once.
        
           | lmilcin wrote:
            | That's why I never accept code based on a review in a web tool
           | unless the code is evidently correct. I usually check out the
           | branch and try to figure out how the code affects or is
           | affected by other, sometimes distant code.
           | 
           | Ideally, of course, everything should be hidden behind leak-
           | tight abstractions.
           | 
           | Unfortunately, that is rarely the case.
           | 
           | I recently had a developer reuse a function that generates
           | IDs for some other type of ID.
           | 
            | That function that generates IDs happened to use the OS
            | entropy pool directly. It had been used once per startup of
            | the application, so that was deemed OK.
           | 
           | But now with high demand for IDs that is no longer
           | acceptable.
           | 
           | I noticed it because I followed the code to see how the
            | function _actually_ generates IDs. That would not happen in
            | a web-based tool because there simply isn't a convenient way
            | to follow the references from the new code.
           | 
            | That is just a simple example; I get things like that every
            | other day.
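            | 
            | Roughly the shape of it (simplified, hypothetical names):
            | the original helper hits the OS entropy pool on every call,
            | which is fine once per startup but not when IDs are minted
            | in bulk; seeding a fast PRNG once is the usual fix.
            | 
            |     #include <cstdint>
            |     #include <random>
            | 
            |     // Original shape: reads the OS entropy pool each call.
            |     std::uint64_t newIdFromEntropyPool() {
            |         std::random_device rd;  // may hit /dev/urandom
            |         return (std::uint64_t{rd()} << 32) | rd();
            |     }
            | 
            |     // Seed a fast PRNG once, then reuse it for bulk IDs.
            |     std::uint64_t newIdFast() {
            |         static std::mt19937_64 gen{std::random_device{}()};
            |         return gen();
            |     }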
        
             | bluGill wrote:
             | I've never been a fan of any review tool I've used because
              | they all assume the changed code stands in isolation. Just
             | today I was in a review where a change on line 234 was
             | correct, except that it violated a bad assumption on line
             | 123 which needed to be fixed. Line 123 is where I wanted to
             | comment because that is where the problem should be fixed,
             | but the tool saw that as okay code and didn't even let me
              | see it, much less comment on it. It gets worse when the
             | bad assumption is in a different file.
             | 
             | And I often get frustrated because I see you call foo(),
             | but I need to see how foo works with the given arguments.
             | 
              | Though to be honest, most of the time I catch myself
              | playing grammar checker: if the style isn't violated and
              | you didn't trip any of the dozens of checklist items, you
              | pass. (I work in C++, so I often catch people writing
              | C++98 code when there is a C++14 construct that is better.)
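              | 
              | For instance (an illustrative example, not from a real
              | review), the kind of C++98 habit a C++14 construct
              | replaces:
              | 
              |     #include <cstdio>
              |     #include <map>
              |     #include <memory>
              |     #include <string>
              | 
              |     // C++98 style: spelled-out iterator type, bare new.
              |     void print98(const std::map<std::string, int>& m) {
              |         std::map<std::string, int>::const_iterator it;
              |         for (it = m.begin(); it != m.end(); ++it)
              |             std::printf("%s=%d\n",
              |                         it->first.c_str(), it->second);
              |         int* p = new int(42);
              |         delete p;
              |     }
              | 
              |     // C++11/14 style: range-for with auto, make_unique.
              |     void print14(const std::map<std::string, int>& m) {
              |         for (const auto& kv : m)
              |             std::printf("%s=%d\n",
              |                         kv.first.c_str(), kv.second);
              |         auto p = std::make_unique<int>(42);
              |         (void)p;  // silence unused-variable warning
              |     }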
        
           | philipkglass wrote:
           | Github's PR review tool has some irritating limitations this
           | way. If a changed function could cause problems for an
           | unchanged caller, there's no way to add a comment on the
           | caller unless it's within a few lines of the changed code.
           | You cannot add comments on specific file lines that are "too
           | far" from a changed line.
           | 
           | My review flow now is to check out the branch under review,
            | read it in an IDE with git annotate on, and do all my actual
           | review work there so I can see the full context. Then add
           | line-by-line comments on Github where I can. Comments
           | relating to older parts of the system that should change in
           | tandem with the current PR have to go in as "general"
           | comments, with file/line number included manually.
        
         | dkarl wrote:
         | Approaching from the mentality of doing a proof is valuable for
         | establishing anything about code, even when it isn't security-
         | sensitive. I think many programmers do this instinctively, and
         | some don't do it at all. You can spot the difference by asking
         | a question like,
         | 
         | "Why do no records get dropped in this step?"
         | 
         | Some programmers will have an argument resembling a logical
         | proof: "No records get dropped because for every record, we
         | call a method to write the record into the outgoing artifact,
         | and if that call fails, the entire stage aborts. The flow of
         | control never reaches the point where the outgoing artifact is
         | added to the database for delivery. The lack of a successful
         | delivery record for the batch triggers an alert after an hour
         | and ensures the system will keep reprocessing the batch until
         | it succeeds, or until the batch gets manually flagged as bad."
         | 
         | Other programmers will say something like: "Well, all the
         | records get passed in, and we write them all to the artifact.
         | We had some bugs in this code, but they haven't been hard to
         | fix, and anyway, if something goes wrong, the process is
         | retried later."
         | 
         | This has been a big thing on my mind recently because I've been
         | asked to do a design review for a system created by some high-
         | SLOC-productivity people who built a complex system with a
         | sophisticated-looking architecture but who keep giving me
         | answers like the second one. There's a bunch of cool tech but
          | no coherent explanation of why they believe the system
         | will work reliably. It's just, well, there have been some bugs,
         | and of course it doesn't work when there are bugs, but we're
         | fixing them.
         | 
         | Until you have an argument for why a system works, there's no
         | coherent way to understand why it doesn't work, and what the
         | effort will be to achieve different aspects of reliability.
         | It's just "bugs." With such an argument, you can target
          | specific aspects of reliability instead of playing an eternal
          | game of "it'll work when the bugs are gone" whack-a-mole.
         | 
         | Granted, this isn't very rigorous, but 99% of the time, it's
         | the closest you'll get to "proof" in application code.
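          | 
          | A minimal sketch of the structure behind the first answer
          | (hypothetical names, not anyone's real pipeline): every
          | record is written before the artifact is registered for
          | delivery, and any failure aborts the stage, so a failed
          | batch gets retried rather than silently truncated.
          | 
          |     #include <stdexcept>
          |     #include <vector>
          | 
          |     struct Record {};
          |     struct Artifact { std::vector<Record> rows; };
          | 
          |     bool writeToArtifact(Artifact& a, const Record& r) {
          |         a.rows.push_back(r);
          |         return true;  // stand-in for a real writer
          |     }
          |     void registerForDelivery(const Artifact&) {
          |         // stand-in: mark the batch deliverable
          |     }
          | 
          |     void processBatch(const std::vector<Record>& batch) {
          |         Artifact artifact;
          |         for (const Record& r : batch) {
          |             if (!writeToArtifact(artifact, r))
          |                 throw std::runtime_error("abort stage");
          |         }
          |         // Reached only if every record was written.
          |         registerForDelivery(artifact);
          |     }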
        
       | hinkley wrote:
       | Code reviews have become a social thing, and bagging on someone
       | with a litany of issues in front of a group is never acceptable
       | behavior. If they've done ten things wrong, you have to pick 3 of
       | them and let the rest go, or decide to kick them out of the group
       | entirely.
       | 
       | There was a period for me when I 'solved' this code review
       | problem the way many of us solved class participation in grade
       | school - it's not a competition, it's a learning environment. It
       | doesn't matter to the class if you know the answer, but it's
       | really awkward if nobody knows the answer. So you say the answer
       | in your head, wait to see if anybody else volunteers that
       | information, and then put your hand up if they don't. Or you
       | count to five. (I suspect based on hands and grades, at least a
       | couple of my classmates knew all the answers but were too shy to
       | say anything, and this also appears to be true in code reviews.)
       | 
       | Code reviews are typically but not always asynchronous, so you
       | can choose to wait for someone else to point out the obvious
       | stuff and then just mention the things nobody else saw. The
       | danger there is if an argument starts about one of these issues
       | and distracts from others. Like a real conversation, you can run
       | out of time for it before the issue you cared about gets
       | sufficient airtime.
        
         | Sodman wrote:
         | Style nits aside, I don't think it's a good idea to selectively
         | pick the worst N things in a PR and let the rest go. If you're
         | consistently finding 10+ things to request changes on in
         | somebody's pull requests, you probably need to look at your
         | engineering process. Was the high level approach discussed
         | before it was fully coded and "ready for review"? This is
         | something easily fixed by process changes.
         | 
         | Are you finding a lot of little things in the PR itself? Many
         | of the usual suspects can and should be automated away
         | (linting, codegen, test coverage reports, static analysis,
         | etc). Others can easily be added to a checklist before PRs are
         | submitted for review, eg "Have you updated the relevant
         | documentation".
         | 
         | Usually newer employees will get more comments on their PRs, as
         | they're not as familiar with the system and might fall into
         | more traps that others know about from experience. This can be
         | mitigated somewhat by pairing/mentoring.
         | 
         | Sure some people just aren't going to cut it on your team, but
         | most engineers when corrected on an individual coding mistake
         | won't repeatedly make the same mistake, in my experience. If
         | you just never mention the mistake, they'll keep making it in
         | every PR and be completely unaware. Sometimes it helps to
         | communicate out of band (ie, not in "official" comments on the
         | PR itself) if there are some low hanging gotchas you want to
         | make the engineer aware of.
        
         | leetcrew wrote:
         | > Code reviews have become a social thing, and bagging on
         | someone with a litany of issues in front of a group is never
         | acceptable behavior. If they've done ten things wrong, you have
         | to pick 3 of them and let the rest go, or decide to kick them
         | out of the group entirely.
         | 
         | I wonder if we overestimate how sensitive people are in this
         | case. if someone raises ten objections on my review and makes
         | it clear what I have to do to get their approval, I'm happy.
         | I'll get right to work on fixing it. if this happens in front
         | of a group (esp involving skip-levels / managers of other
         | teams), I take it as a signal that I should have run the change
         | by a teammate or my direct manager before involving so many
         | people.
         | 
         | the only thing I actually dislike is when people comment about
         | vague concerns that leave it unclear whether I actually have to
         | address them in that change to get their approval. sometimes
         | this causes a circular waiting deadlock with the other
         | reviewers, which is especially frustrating.
        
           | wccrawford wrote:
           | It depends on both the person and the environment.
           | 
           | In an environment that places stress on those who
           | underperform, anything that calls attention to performance
           | problems will be _hated_. Anyone that points out those
           | problems will be disliked. Even the best people will fall to
           | this stress.
           | 
           | In a more relaxed environment that isn't constantly counting
           | points towards raises and promotions, people can relax and
           | not worry about the problems that are raised. However, not
           | everyone can manage this. Some people have just had too many
           | jobs of the previous type, or other experiences like that.
           | They just can't adjust.
           | 
           | IMO, it also means they have a much harder time fixing their
           | own problems because they can't tolerate other people
           | pointing out those problems.
           | 
           | As for code reviews, I always call it like I see it. It's my
            | _job_ to make sure the code will work and be secure. I'm not
           | going to let things slip by that violate that, no matter how
           | much I want that person to be happy. And I'll still point out
           | matters of style as well, though I usually won't insist they
           | be corrected, unless the person has repeatedly had the same
           | problem. (My people are generally great, though, and will fix
           | even my suggestions. So it's generally not a thing.)
           | 
           | You'd probably hate that last bit of my code reviews. If we
           | were on a team, my advice would be to simply discuss that
           | part with me in real time. We could decide together if it was
           | worth the effort to make those changes. Async communications
           | like PRs and emails simply aren't fit for that kind of
           | discussion.
        
             | hinkley wrote:
             | It also depends if everyone else pointed out 2 things and
             | you pointed out 10. Letting other people go first means you
             | only have to point out 5, which is still lopsided but not
             | 'holy shit, man' lopsided.
        
             | nicoburns wrote:
             | > In an environment that places stress on those who
             | underperform, anything that calls attention to performance
             | problems will be hated. Anyone that points out those
             | problems will be disliked. Even the best people will fall
             | to this stress.
             | 
             | Seems to me that this might be the problem rather than the
             | code review itself!
             | 
             | > We could decide together if it was worth the effort to
             | make those changes. Async communications like PRs and
             | emails simply aren't fit for that kind of discussion.
             | 
             | Indeed. I've definitely found myself falling into a pattern
             | of failing the review for serious issues with comments on
             | the PR, but just approving the review and slacking the
             | comments for small issues.
        
       ___________________________________________________________________
       (page generated 2021-02-04 23:00 UTC)