[HN Gopher] Static Analysis in GCC 10
       ___________________________________________________________________
        
       Static Analysis in GCC 10
        
       Author : fanf2
       Score  : 279 points
       Date   : 2020-03-28 14:42 UTC (8 hours ago)
        
 (HTM) web link (developers.redhat.com)
 (TXT) w3m dump (developers.redhat.com)
        
       | thekhatribharat wrote:
        | I believe _type systems/type theory_ is likely going to be the
        | most popular method for _formal verification of programs_ (aka
        | _static analysis_). And of course, there's a limit to what you
        | can _prove_ about programs (ref: Rice's theorem).
        
         | jfkebwjsbx wrote:
         | Static analysis is not formal verification.
        
           | thekhatribharat wrote:
           | Sure, things like enforcing style guides, etc. can be seen as
           | _lightweight_ formal verification.
        
             | irundebian wrote:
             | Yes, but you wrote "formal verification of programs (aka
             | static analysis)". Formal verification is not also known as
             | static analysis.
        
           | exdsq wrote:
           | It's part of the field of formal methods.
        
       | brundolf wrote:
       | I have to wonder if Rust is putting pressure on C/++ to have more
       | static analysis (while at the same time blazing trails in what's
       | possible in that space, and what's possible in terms of error
       | message helpfulness). I think it's a great idea to start baking
       | these things into the compiler, even if it will never be 100%
       | free of false-negatives because of the limitations of what the
       | language can express and guarantee. Still seems like a great way
       | to eliminate a lot of common problems, as a default across the
       | ecosystem instead of as an extra step.
        
         | rurban wrote:
         | It's clang, not rust.
         | 
          | And clang's analyzer has a different UI concept via the web,
          | which is far superior. And on screen, valgrind has a far
          | superior solution. I don't see the advantage of gcc's
          | analyzer yet. Far too verbose. And the most important errors,
          | like wrong optimizer decisions based on their interpretation
          | of UB code, are still silenced.
        
           | TwoBit wrote:
           | I'd love to see a compiler warn me that it's doing something
           | potentially unexpected due to UB considerations.
        
             | masklinn wrote:
             | That is intrinsically limited because UB are runtime issues
             | (otherwise they'd be compiler errors) and the compiler
             | simply assumes they don't happen, plus many of the "UB-
             | related" optimisations (in the sense that these
             | optimisations change the behaviour of non-well-formed
             | programs) are surfaced by other optimisations (largely
             | inlining).
        
             | saagarjha wrote:
             | Luckily, compiler authors have gotten the message and are
             | starting to improve on this situation slowly.
        
             | ali_m wrote:
             | clang has UBSan, which adds runtime checks for detecting
             | various kinds of undefined behaviour:
             | https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
        
               | codys wrote:
               | gcc also has ubsan since gcc version 4.9. Many of the
               | sanitizers are shared by the clang & gcc projects. Some
               | functionality, like emitting errors on unsigned overflow,
               | is confined to clang.
        
           | Tempest1981 wrote:
           | Yes, I tried the ReSharper Inspector tool, and it uses clang
           | to find all sorts of code issues, with a friendly UI.
        
         | thekhatribharat wrote:
         | I think you mean "false-positives". Most static analysis tools
         | are "conservative": will throw more warnings than they can
         | prove. But assuming they're "sound" (and don't "give up"
         | because of time/space constraints), there'd be no "false-
         | negatives".
        
           | UncleMeat wrote:
           | Vanishingly few (maybe even zero) industrial static analysis
           | tools are sound. FPs are too common, especially for something
           | like C++ where common code patterns can only be soundly
           | handled by just forcing everything to Top. Good luck with
           | your sound interprocedural pointer analysis as soon as
           | somebody makes an indirect call.
        
             | thekhatribharat wrote:
             | I think you mean industrial static analysis tools aren't
             | _complete_. _Sound_ means no FNs, _Complete_ means no FPs.
        
               | UncleMeat wrote:
               | No I meant what I said. Virtually no tool is sound (zero
               | false negatives, lots of false positives) because in
               | practice false positives just make people stop using
               | tools when they are too common.
        
               | mgerdts wrote:
               | At my previous job, we used an internally developed
               | static analysis tool. While false positives were a
               | problem, the team responsible for the tool took bug
               | reports and fixed these issues. There was a way to
               | silence false positives as well as a mode to only see new
               | (potential) problems. Once code was clean, the false
               | positive rate was low enough to not really be a problem.
               | 
                | The bar of zero false negatives is not useful for me
                | when the alternative is to rely solely on human code
                | analysis and tests.
               | 
                | One thing I learned from using this tool is that static
                | analyzers need to be configurable to know about the
                | side effects of functions that are in libraries. For
               | example, if a library function will free memory, it is
               | important that there's a way to tell the analyzer about
               | this behavior to avoid false positives about leaks.
               | 
               | Proper design, careful coding, static analysis, code
               | review, and various forms of testing all have roles to
               | play in delivering reliable code.
        
               | UncleMeat wrote:
                | But you can't fix all false positives with bug reports.
                | It's _fundamental_ if you want soundness. And for lots of
               | properties (like I said, pointer analysis in C++) the
               | only scalable and sound techniques basically havoc as
               | soon as interesting things start to happen. The community
               | even has a term (soundiness) to describe how everybody
               | throws out certain kinds of soundness (indirect calls
               | through function pointers, reflection in java, tons of
               | features in javascript, etc).
               | 
               | Requiring the user to add piles of annotations still
               | sucks, even if you are smart and do something like
               | abductive inference to identify minimally required
               | annotations. Even something as simple as the checker
               | framework in Java has trouble getting people to write
               | correct annotations and has a few corner cases (what is
               | the nullness property of a field in code reachable from a
               | constructor, for example). Nobody does this stuff
               | correctly for something like dataflow annotations on
               | methods.
               | 
               | I've got a PhD in this and develop a static analysis tool
               | for my job. I've _never_ seen a productionized, fully
               | sound tool for general purpose static analysis of general
                | programs that solves problems more difficult than
               | slightly augmented type checking.
        
               | thekhatribharat wrote:
                | So if industrial static analysis tools aren't _sound_,
                | which means they produce _false negatives_ (i.e. they
                | can pass programs with logical errors they're meant to
                | detect), how do you use them to assure code quality?
        
         | pjmlp wrote:
          | The Internet and ramping project costs due to CVE fixes are
          | what is putting pressure on adopting static analysis and
          | hardware memory tagging, not Rust.
         | 
         | https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...
         | 
         | https://security.googleblog.com/2019/08/adopting-arm-memory-...
         | 
          | Lint has existed since 1979.
        
           | saagarjha wrote:
           | Hardware memory tagging is not a panacea, it just catches
           | egregious memory errors.
        
             | pjmlp wrote:
             | Indeed, yet it does a better job than the mythical
             | developer that never gets pointer arithmetic wrong.
        
       | [deleted]
        
       | ape4 wrote:
        | You don't want that option every time, since it's slower. But I
        | wonder if there would be a smart way to run it occasionally, like
        | an option to run -fanalyzer every 10th time, or when the size of
        | a source file changes a lot, etc.
        
         | brundolf wrote:
         | That seems out of scope for the compiler itself. Dev
         | tools/workflows could easily set up something like this as
         | needed.
        
         | saagarjha wrote:
         | Perhaps as part of your CI?
        
       | [deleted]
        
       | archi42 wrote:
       | Oh, that's really nice. Though, as a user one should remember:
       | The approach described here gives up at some point. So it doesn't
       | prove the absence of a bug class (e.g. double free), but it finds
       | some instances. Which is already a very good thing, and hugely
       | non-trivial.
       | 
       | The problem with "not giving up at some point" is the
       | computational complexity: Analyzing big code bases takes half an
       | eternity (days), while using huge amounts of memory (>128GB). And
       | once you enter the "least-defined state", you either throw lots
       | of false positives (which gives the users a hard time) or you
       | need to "give up" (and hence potentially miss bugs).
       | 
       | Disclaimer: I work for a company that builds static analysis
       | tools. I don't see this as competition, though. Our tools are
        | used in industries where "safety-critical" is _really_ important
        | - so the "giving up" part of the analysis is not an option for
        | us, and relying solely on GCC isn't an option for our customers
        | either ;-)
        
         | s3cur3 wrote:
         | I'd be interested to know which company (if you're free to
         | say). IIUC, this is very different from the way Coverity (the
         | biggest commercial player I'm aware of in this space) operates.
        
           | yitchelle wrote:
            | Interestingly, each industry seems to have its own
            | leading tool even though they all check for the same types
            | of risks or errors. QA-C is extremely prevalent in the
            | automotive industry; Polyspace's Code Prover is also quite
            | popular.
        
         | a1369209993 wrote:
          | Per Rice's theorem, it's _not_ giving up that's not an
          | option; it's just a question of whether you have false
          | positives or false negatives. (To be fair, for safety-
          | critical code, insisting on only false positives (i.e.
          | treating anything you give up on as a positive) is a pretty
          | reasonable choice.)
        
           | nullc wrote:
           | A tool like this could be sound but incomplete. E.g. return
           | true, false, or idunno.
        
           | progval wrote:
            | I'm guessing that by "giving up", GP meant having a false
            | negative.
        
       | saagarjha wrote:
       | Nice, this looks pretty cool! It seems a bit like Clang's static
       | analyzer: https://clang-analyzer.llvm.org/
        
       | mshockwave wrote:
        | Just a side note that the equivalent solution in LLVM/Clang is
        | the Clang Static Analyzer: https://clang-analyzer.llvm.org .
        | But it's an external tool rather than being integrated into
        | clang.
        
       | marta_morena_23 wrote:
       | "My thinking here is that it's best to catch problems as early as
       | possible as the code is written, using the compiler the code is
       | written in as part of the compile-edit-debug cycle, rather than
       | having static analysis as an extra tool "on the side" (perhaps
       | proprietary)."
       | 
       | Really? Your thinking? I mean seriously, why do you have to start
       | off a good article and feature with a sentence like this? This is
       | not your thinking, and if it was, you have been living under a
        | rock for the past decades... Why the need to make it sound as if
        | this decades-old idea was yours?
        
         | raegis wrote:
         | Let's all try to be nice to each other. Everybody's stuck at
         | home and frustrated right now.
        
           | glouwbug wrote:
           | New users typically think this is Reddit. Best we can do is
           | encourage civil discourse
        
         | glouwbug wrote:
          | It probably was their idea, but you are right, others had the
          | idea before them.
          | 
          | The difference is, their idea came to fruition, and it has
          | contributed to what is arguably the world's most important
          | piece of software.
        
         | yjftsjthsd-h wrote:
         | Are they only allowed to think thoughts that nobody else has
         | had?
        
         | dang wrote:
         | Could you please not post in the flamewar style to HN? It's not
         | what this site is for, and if you'll read
         | https://news.ycombinator.com/newsguidelines.html and
         | https://news.ycombinator.com/newswelcome.html, you'll see that
         | we're actively trying for a different sort of discussion.
        
       | google234123 wrote:
       | How far behind LLVM/Clang is this?
        
       | jchw wrote:
       | > As of GCC 10, that option text is now a clickable hyperlink
       | (again, assuming a sufficiently capable terminal)
       | 
       | Seems like mostly only GNOME Terminal and iTerm2. Here's some
       | that apparently won't work:
       | 
       | - Konsole
       | 
       | - Kitty
       | 
       | - LXDE Terminal
       | 
       | - MATE Terminal
       | 
       | - hyper
       | 
       | - Windows Terminal
       | 
       | - ConEmu
       | 
       | - PuTTY
       | 
       | ... so it's kind of weird to suggest this is an accepted
       | standard. Especially since some of the discussions in feature
       | requests suggest they will likely not implement it due to
       | security concerns or otherwise.
        
         | guerrilla wrote:
            | How does one produce such a link in output? (Not that I
            | really want this bloat and increased attack surface.)
        
           | jchw wrote:
           | There's a fair bit of documentation here. I'd try it but I
           | don't have any supported terminals on any of my machines...
           | (usually I use kitty, but I have a couple machines with KDE
           | and Konsole.)
           | 
           | https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3c.
           | ..
        
           | bonzini wrote:
           | Escape sequences. It's similar to the "tell the emulator
           | about the current directory" feature that is used to open new
           | windows on the current directory.
        
             | labawi wrote:
             | Does the CWD feature actually use terminal escape
             | sequences?
             | 
             | I always assumed the emulator accessed the working
             | directory of the child process (as in /proc/$PID/cwd). On
             | my terminal the CWD feature only seems to work for the
             | topmost shell, and symlinks get resolved.
             | 
              | EDIT: The linked bug report mentions OSC 7 (presumably an
              | escape sequence) as one of the ways to track the CWD.
        
         | jbk wrote:
         | Konsole had security concerns about this feature:
         | https://bugs.kde.org/show_bug.cgi?id=379294
        
         | zzo38computer wrote:
          | Another that probably won't work is xterm (which I currently
          | use). (And if I implement my own terminal emulator, I
          | probably won't implement it either, as I would mainly be
          | implementing the features of xterm and of DEC terminals.)
        
         | cowsandmilk wrote:
         | I don't see anything suggesting it is a standard. It seems like
         | it is an enhancement that some terminals support.
         | 
         | It isn't like this feature degrades support for any other
         | terminal, just improves life for some users. Why complain?
        
           | jchw wrote:
           | Because it can _break_ terminals that don't. Nothing forces
           | you to put enough context in your link to be able to recover
           | the URL. Unsuspecting developers who only target iTerm2 will
           | put a link in for "Click here to complete OAuth2 flow" or
           | something and it will be broken...
           | 
            | In this case it isn't the worst problem ever, but I think
           | support for this should probably be removed from terminal
           | emulators unless a quorum of popular terminal emulators
           | support it.
           | 
           | (Addendum: do developers actually do things that only work in
           | virtually a single terminal emulator? Yeah. A lot of Node.JS
           | software printed status using emoji that broke horridly on
           | everything but macOS. It mostly works elsewhere now, and it
           | was purely a visual problem, but I would assume many didn't
           | even realize what their app would've done in most terminal
           | emulators. The same with hyperlinks would be a disaster.)
        
             | pwdisswordfish2 wrote:
             | Someone should create a database of various terminal
             | emulators and the features they support, so that
             | applications would be able to look up what features are
             | available.
             | 
             | We shall call it the 'terminal capabilities library'.
        
               | jchw wrote:
               | I dislike this strategy because then everyone pretends to
               | be xterm and we're back to square one.
        
               | yjftsjthsd-h wrote:
               | Yeah, it feels like we're just doing user agent vs
               | feature detection all over again. And the xterm thing is
               | a real concern, since I've read that lots of terminals
               | have indeed pretended to be xterm, leading to programs
               | having to try to detect the real terminal identity
               | because they set TERM=xterm but don't have 100% of
               | xterm's features...
        
               | jchw wrote:
               | Yep... this is exactly the problem. In fact sometimes you
               | are recommended to go and override $TERM in order to fix
                | a bug. Worse are apps that do "work" but miss features.
                | For example, it is often recommended to set your term to
               | xterm-256color to get high color in terminal apps. It's
               | not a huge deal, but it is simultaneously not a situation
               | I like or think should be encouraged.
               | 
               | FWIW I am quite confident you can do better progressive
               | feature detection in terminals, but until that somehow
                | gains traction we're going to get a lot of "hey, how come
                | I can't click the link?" and the answer is they are using
               | KDE or Windows.
        
               | yjftsjthsd-h wrote:
               | Yup. I use alacritty, but if I SSH I frequently have to
               | force TERM=xterm or something because nothing knows about
               | that. But it's fine when I SSH from in tmux... because I
               | have tmux force TERM=screen-256color to get better color
               | support:)
        
             | saagarjha wrote:
             | Don't those terminals have hyperlink detection anyways?
             | What's the need to use special escape sequences in this
             | case?
        
               | jchw wrote:
               | I dunno, but people are getting quite mad about me
               | suggesting this feature should be removed, so I guess it
               | is truly mandatory that our terminal emulators do this.
               | Shortlinks are not good enough, phishing be damned?
        
         | VadimPR wrote:
         | What telnet text are they using? The only hyperlinking in
         | telnet I know of is MXP, http://www.zuggsoft.com/zmud/mxp.htm.
        
       | wyldfire wrote:
       | Previously submitted as
       | https://news.ycombinator.com/item?id=22708586
        
       | leni536 wrote:
       | It looks great and useful. I suspect that this only works within
       | a single translation unit and can't work between separate
        | translation units. But maybe it could work together with LTO;
        | that would be pretty awesome.
       | 
       | Some of the worst lifetime issues are 3rd party library calls
       | with unclear ownership semantics and static analyzers are just as
       | clueless as you are. The function signature doesn't help you out
       | in this regard (in C). My recent "favorite" is libzip's
       | zip_open_from_source that conditionally takes ownership of its
       | zip_source_t* argument.
       | 
       | https://libzip.org/documentation/zip_open_from_source.html
       | 
       | https://libzip.org/documentation/zip_source_free.html
        
       | 3fe9a03ccd14ca5 wrote:
       | These types of tools go a really long way in improving the
       | reliability and safety of C code.
       | 
       | Hats off to the redhat team for putting in the effort on this.
       | Their blog posts have been really interesting. It's definitely
       | changing my perception of what redhat really does.
        
         | jabedude wrote:
         | I've always had a positive impression of redhat, but I was
         | recently blown away with their dedication to upstreaming
         | contributions across different open source projects. I was
         | investigating a new Linux kernel feature that redhat
         | contributed and saw that the same developer opened pull
         | requests that added support for the new kernel feature in three
         | major open source projects. And one of the projects took over a
         | year to accept the changes, but he was persistent in reaching
         | out, making requested changes, etc. It really shows the passion
         | at the company to share their contributions.
        
           | Vogtinator wrote:
           | It doesn't have to be passion - having something upstreamed
           | has a lot of other benefits as well.
        
       | WalterBright wrote:
        | Double-frees can be tracked by doing data flow analysis on the
       | function. This is how D does it in its nascent implementation of
       | an Ownership/Borrowing system. It can be done without DFA if
       | getting it only 90% right and having numerous false positives is
       | acceptable.
       | 
       | I've used many static checkers in the past, and the rate of false
       | positives was high enough to dissuade me from using them. This is
       | why D uses DFA to catch 100% of the positives with 0% negatives.
       | I knew this could be done because the compilers were using DFA in
       | the optimization pass.
       | 
       | In order to get the tracking to work, one cannot just track
       | things for a function named "free". After all, a common thing to
       | do is write one's own custom storage allocators, and the compiler
       | won't know what they are. Hence, there has to be some mechanism
       | to tell the compiler when a pointer parameter to a function is
       | getting "consumed" by the caller, and when it is just "loaned" to
       | the caller (hence the nomenclature of an Ownership/Borrowing
       | system).
       | 
       | One of the difficulties to overcome with D in doing this is there
       | are several complex semantic constructs that needed to be
       | deconstructed into their component pointer operations. I noticed
       | that Rust simplified this problem by simplifying the language :-)
       | 
       | But once done, it works, and works satisfyingly well.
       | 
       | Note that none of this is a criticism of what GCC 10 does,
       | because the article gives insufficient detail to draw any
       | informed conclusions. But I do see this as part of a general
       | trend that people are sick and tired of memory safety bugs in
       | programming languages, and it's good to see progress on all
       | fronts here.
        
         | chrisseaton wrote:
         | > This is why D uses DFA to catch 100% of the positives with 0%
         | negatives.
         | 
         | For example if we have a graph data structure and we are
         | visiting it and freeing nodes, how can your static analysis
         | know that nodes will not be visited twice by my logic and freed
         | twice?
        
           | WalterBright wrote:
           | Because the DFA follows edges, not logic. If free appears
           | twice along any sequence of edges, the compiler rejects it.
        
             | chrisseaton wrote:
             | I think that's why you're getting the response you are in
              | this thread - if it really did '100% of the positives with
              | 0% negatives', it'd _have_ to consider the logic. There's
              | an impossibility proof for that which has already been
              | cited in the thread. It's usually illustrated using a
             | function that solves the halting problem rather than the
             | Goldbach conjecture.
             | 
             | If the compiler rejects things that have free appear twice
             | on branching control flow paths but aren't actually broken
             | due to dynamic control flow (like your looping example
             | given elsewhere) then that's a false positive, isn't it?
             | 
             | And people thought '0% negatives' was meaning 'no false
             | positives'.
        
               | WalterBright wrote:
               | I see your point, and it's a valid criticism of my
               | statement. I meant it more in the sense that the static
               | checkers I have used would use hackish ad-hoc heuristics
               | which did not work very well, and gets constantly tweaked
               | by the implementor.
               | 
               | The D implementation, in contrast, uses a well-defined
               | rule outlined above, and it works predictably and
               | reliably (absent implementation bugs, rather than ad hoc
               | rule shortcomings).
               | 
                | Another way of putting it is that the _rules of the
                | language_ are changed to make 0%/100% results workable.
                | The compiler never says "there may be a problem here".
                | It passes or is rejected. If it passes, it is memory
                | safe.
        
               | anchpop wrote:
               | That approach is a really good point in the design space
               | IMO. It is also the tradeoff that people have to make
               | when designing type systems. Many programs are correct
               | but rejected by their language's type system, but if the
               | language is well designed it was probably a really weird
               | and unintuitive program to begin with, so it's okay to
               | reject it anyway.
        
         | vanderZwan wrote:
         | > _I noticed that Rust simplified this problem by simplifying
         | the language :-)_
         | 
         | That's a bit of an open-ended qualifier. Would you say that is
         | a positive simplification (an increase in elegance) or one that
         | has a trade-off in reducing its expressive power?
        
           | WalterBright wrote:
           | I mentioned it as another way of solving the problem. Which
           | way you prefer is a matter of professional taste, something
           | I'd really prefer not diverging into in this thread.
        
         | anchpop wrote:
         | > This is why, for D, I was determined to use DFA to catch 100%
         | of the positives with 0% negatives. I knew this could be done
         | because my compilers were using DFA in the optimization pass.
         | 
         | Is this really true? I thought this was impossible due to
         | Rice's theorem
        
           | WalterBright wrote:
           | Fortunately, I am unaware that it was impossible and did it
           | anyway :-)
           | 
           | But it is possible I made a mistake.
           | 
           | It is also true that for it to work, one has to change the
           | way one writes code, like Rust does. This is why D requires
            | an @live attribute be added to functions to enable the
           | checking for just those functions, so it doesn't break every
           | program out there. It will enable incremental use of the
           | checking at the user's option.
        
             | nullc wrote:
             | You're probably using a different definition of 100% than
             | any impossibility proof would use.
             | 
              | Consider some code:
              | 
              |     a = malloc(1);
              |     needfree = true;
              |     if (hashfn(first_factor(huge_static_rsanum1)) & 1) {
              |         needfree = false; free(a);
              |     }
              |     if (hashfn(first_factor(huge_static_rsanum2)) & 1) {
              |         needfree = false; free(a);
              |     }
              |     if (needfree) free(a);
             | 
             | The decision if this has a double free or not depends on
             | the factorizations of two huge difficult to factor
             | constants. It either double-frees or not depending on those
             | constants.
             | 
             | Surely your software cannot decide that...
             | 
             | What you probably mean is something like "100% on real
             | programs rather than contrived cases". Of course, in that
             | case, your definition of 'real programs' is the catch. :P
             | 
             | Sometimes things that seem like they should always work
             | except on contrived junk like the above example actually
             | run into limitations in practice because macros and machine
             | code generation produce ... well ... contrived junk from
             | time to time.
        
               | jjnoakes wrote:
               | I personally would want code like this flagged by static
               | analysis. I would fix the code by not calling free in the
               | second 'if' body if needfree was already false (just like
               | the call at the end of the function is guarded).
        
               | nullc wrote:
               | You might want it flagged (I would), but it would still
               | be a false positive if it did so (assuming the constants
               | didn't result in a double-free). :)
        
               | [deleted]
        
               | WalterBright wrote:
               | > Surely your software cannot decide that...
               | 
               | The D implementation would reject such code. The DFA
               | assumes all control paths are executed. For example,
               | 
               |     if (c) free(p);
               |     *p = 3;
               | 
               | is rejected as use-after-free.
               | 
               |     if (c) free(p);
               |     if (!c) *p = 3;
               | 
               | is also rejected as use-after-free. If the DFA is done
               | properly, you will not be able to trick it.
        
               | UncleMeat wrote:
               | Then that doesn't mean "0% of the negatives".
        
               | Ygg2 wrote:
               | Ok. And if you call into non-D code, like C or assembly?
               | 
               | I think this is just sweeping the potential problem
               | under the rug, which is the best it can do, given Rice's
               | theorem.
        
               | WalterBright wrote:
               | Calling non-D functions means the compiler cannot check
               | them, and will rely on the user having correctly
               | specified and annotated the function signature.
               | 
               | I.e. you can have "unsafe" annotated code which the
               | compiler doesn't check, which is indeed where the dirty
               | deeds get swept. For example, I doubt someone could write
               | a guaranteed safe implementation of malloc().
               | 
               | The idea, naturally, is to specifically mark such
               | sections so that your crack QA team knows what to focus
               | on, and to minimize the amount of such code.
               | 
               | The trick is to make the safe sections expressive enough
               | that it is practical to write the bulk of the code in
               | safe sections, thereby minimizing the "threat surface".
        
             | anchpop wrote:
             | Imagine this pseudocode:
             | 
             |     void main() {
             |         check_goldbach(); // iterates over every integer > 2
             |                           // to see if it is an exception to
             |                           // the goldbach conjecture, and
             |                           // returns if so
             |         use_after_free();
             |     }
             | 
             | This program may be memory safe. It's possible that there
             | is no exception to the goldbach conjecture, so
             | check_goldbach will never halt, so use_after_free will
             | never be called. It also may not be memory safe. If the
             | goldbach conjecture is not true, check_goldbach will
             | eventually return and then we'll call use_after_free.
             | 
             | You say your compiler accepts all programs which are memory
             | safe, and rejects all that aren't. If so, you can use it to
             | prove or disprove the goldbach conjecture. Since proving or
             | disproving that conjecture comes with a one million dollar
             | prize, it is surprising that you can solve it with your
             | method.
             | 
             | edit: I didn't mean to disparage WalterBright's work; I
             | only took issue with the idea that any method of static
             | analysis can accept 100% of correct programs and reject
             | 100% of incorrect ones. This is an unavoidable consequence
             | of the halting problem, and being aware of it allows you
             | to decide where you want the tradeoff to be. It sounds
             | like his implementation has no false negatives, at the
             | cost of a few false positives that are rare enough not to
             | really be an issue, which is still really cool.
        
               | WalterBright wrote:
               | > You say your compiler accepts all programs which are
               | memory safe, and rejects all that aren't.
               | 
               | It does it by implementing ownership/borrowing (O/B)
               | rules, which force changes in the way one writes code.
               | This may sidestep the Goldbach conjecture, which I don't
               | understand.
        
               | funglewest wrote:
               | > the goldbach conjecture, which I don't understand
               | 
               | Clearly.
        
               | dang wrote:
               | Please stop making accounts to break the site guidelines
               | with.
               | 
               | https://news.ycombinator.com/newsguidelines.html
        
               | jjnoakes wrote:
               | It is possible that the compiler simply rejects any
               | definition of a function that has a use after free, which
               | would both uphold the compiler's claim and also not prove
               | or disprove the goldbach conjecture.
        
               | anchpop wrote:
               | If that is the case then his approach has false
               | positives, because a function that is never called
               | cannot introduce memory unsafety but would incorrectly
               | be flagged as unsafe.
        
               | WalterBright wrote:
               | Which it does, if such use is on any control flow path
               | after the free(). For example,
               | 
               |     bool first = true;
               |     while (f()) {
               |         if (first) free(p);
               |         first = false;
               |     }
               | 
               | is rejected as double-free.
        
               | [deleted]
        
               | bjoli wrote:
               | Having your static analyser solve million-dollar
               | problems seems like an unnecessarily high bar.
               | 
               | How is your comment, under any reasonable interpretation
               | of what Walter wrote, not pointless pedantry?
        
               | anchpop wrote:
               | What Walter meant in his comment was that his static
               | analyzer has no false negatives (it never allows a
               | memory-unsafe program to compile). He's provided a few
               | simpler examples in this thread of cases where his
               | analyzer produces false positives. I knew that would be
               | the case, because no program can flag 100% of unsafe
               | programs without also flagging a few safe ones by
               | accident (as a consequence of Rice's theorem). I just
               | meant to demonstrate that, but I don't know anything
               | about D, so I wanted to pick something that I was
               | certain his analyzer would fail one way or the other.
               | 
               | That doesn't mean that his analyzer isn't useful. I find
               | Rust extremely useful because it catches me almost
               | whenever I accidentally write memory-unsafe code,
               | without burdening me too much otherwise. But almost
               | every Rust coder has run into a case where they wrote
               | some code that was safe but not allowed. That's just a
               | fact of life, and it serves no one to make incorrect
               | claims about one's software.
        
       | ufo wrote:
       | I wonder... Has anyone here tried using gcc with the -fanalyzer
       | option? Did it find any bugs that you did not know about?
        
         | [deleted]
        
       | 6gvONxR4sf7o wrote:
       | I'm super happy to see more static tools to prevent or at least
       | find buggy code.
        
       | mynegation wrote:
       | I, at one time, worked on a tool, commercial and external to the
       | compiler, that did this (among other things). Probably the most
       | intellectually challenging job I have ever had. I am happy static
       | analysis makes inroads into mainstream!
       | 
       | A few takeaways from that time. Interprocedural analysis
       | matters: if your function reallocates a pointer passed as an
       | argument, you want to treat that as a 'free' with regard to the
       | argument; conversely, if your function returns newly allocated
       | memory, you want to mark it as such; and so on. There is also a
       | trade-off between the breadth of the analysis and the human
       | ability to comprehend it; the author mentions a 110-node path
       | in the article.
       | 
       | The subject of my unfinished PhD thesis and something I hope also
       | picks up is the combination of static and dynamic analysis, used
       | iteratively. If your static analysis flags a suspicious path but
       | does not have the means to figure out if it is true or not,
       | instrument it and leave it to the dynamic analysis to run through
       | it (the idea here being that total instrumentation a la
       | valgrind is detrimental to performance, so you get some gain
       | from selective instrumentation). Conversely, dynamic analysis may
       | provide some hints as to where static analysis should be applied
       | at a greater depth and provide automatic annotation of functions
       | with regard to their behaviour and - possibly - invariants, that
       | help with the state explosion.
        
         | DyslexicAtheist wrote:
         | ca. 2000-2004 I had the luck to work on a massive C/C++ code
         | base building base station / telecoms infrastructure. We had
         | several hundred engineers contributing code with all kind of
         | different philosophies (many grey beards who did C/C++ for
         | decades).
         | 
         | Running Flexelint was part of our CI chain and also part of the
         | internal coding standard (e.g. definition of done). There was
         | no other time in my life where I learned so much about secure
         | coding as I did back then. Biggest challenge was agreeing about
         | false positives and we had 1 guy in the team who maintained the
         | official wiki document on when a lint warning needed to go on a
         | whitelist with an agreed description of why. The initial
         | overhead to become _lint-clean_ got a lot of push-back but
         | thanks to management support we got there and you could really
         | see how things stayed at that level even after years.
         | 
         | It felt at times bureaucratic or like yak shaving, but in
         | retrospect linting was what kept the code base at a quality I
         | haven't seen ever again since. It also ensured everyone was on
         | the same page. Taking linting seriously involved a small
         | learning curve, though, and led to some discussions here and
         | there. These discussions were really valuable since we got to
         | really learn from one another too.
         | 
         | When I left and went to another project, we were back to
         | chasing the same old bugs caused by bad coding practices, and
         | it felt like a major step back in my career as a dev. I miss
         | those days.
         | 
         | Really love that this is becoming part of gcc.
        
           | neilv wrote:
           | Kudos for lint-clean, and for high-quality C/C++ programming.
           | One of my own stories of that (coincidentally, from when I
           | worked on dev tools for aerospace/datacomm):
           | https://news.ycombinator.com/item?id=21158546
        
       | olivierduval wrote:
       | "using the compiler the code is written in as part of the
       | compile-edit-debug cycle, rather than having static analysis as
       | an extra tool "on the side" (perhaps proprietary)"
       | 
       | Mmmm... why not have an external tool, part of the GCC family
       | but with a proper API, allowing the use of ANY tool, instead of
       | bloating GCC with one more tool that won't be usable with other
       | compilers and will need dedicated maintainers, even though this
       | field is already really complex and needs a lot of different
       | kinds of knowledge?
       | 
       | For example, it could be based on intermediate code, making it
       | better than just source-code or machine-code analysis...
        
         | olivierduval wrote:
         | Just to be more specific: why not use the "UNIX philosophy",
         | with a compiler to compile (translate to an intermediate
         | representation), an optimizer to optimize, an assembler to
         | produce machine code from the IR (with register allocation),
         | and so on...?
        
           | UncleMeat wrote:
           | Because GCC is explicitly designed to be a tangled mess (as
           | opposed to Clang/LLVM), in part because it makes it harder
           | for groups with different beliefs about FLOSS code to
           | repurpose it.
           | 
             | It's a choice that has caused them to cede a lot of
             | territory to Clang/LLVM.
        
             | ndesaulniers wrote:
             | I think the major mistake was the FSF refusing the
             | Objective-C front end from Apple.
        
               | pjmlp wrote:
               | For starters, NeXT wasn't even considering giving
               | anything back to GCC.
        
           | aw1621107 wrote:
           | IIRC in the past GCC (or maybe just some of the maintainers?)
           | has been resistant to that kind of architecture because it
           | makes it easier to write proprietary
           | extensions/plugins/backends/frontends/etc. to GCC. I'm not
           | sure whether that has changed.
           | 
           | LLVM was designed to be a lot more modular, so it fits these
           | use cases a lot better. In your example, you have the front
           | ends (C, C++, Objective-C, Fortran frontends in LLVM proper,
           | Rust, Zig, etc. as third-party front-ends) compiling to LLVM
           | IR, opt for optimization, and different backends to generate
           | machine code from IR.
        
       ___________________________________________________________________
       (page generated 2020-03-28 23:00 UTC)