[HN Gopher] C's Biggest Mistake (2009)
       ___________________________________________________________________
        
       C's Biggest Mistake (2009)
        
       Author : todsacerdoti
       Score  : 46 points
       Date   : 2020-09-12 18:29 UTC (4 hours ago)
        
 (HTM) web link (digitalmars.com)
 (TXT) w3m dump (digitalmars.com)
        
       | david2ndaccount wrote:
       | In C you can declare pointers to arrays; the syntax is just
       | somewhat strange. With C99 you can even declare a pointer to a
       | variable-length array, e.g.:
       | 
       |     #include <assert.h>
       |     #include <stdio.h>
       | 
       |     void foo(size_t length, char (*x)[length]) {
       |         size_t size = sizeof(*x);
       |         assert(size == length);
       |         printf("sizeof(*x): %zu\n", sizeof(*x));
       |     }
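A compilable sketch of the C99 technique above, with the function renamed to the hypothetical `vla_size` so that it returns the size instead of printing it. The caller has to pass the address of the whole array (`&buf`, of type `char (*)[16]`), not `buf` itself, which would decay to `char *`. This assumes a compiler with variable-length array support, which C11 made optional.

```c
#include <assert.h>
#include <stddef.h>

/* `x` points to an array of `length` chars (a variably modified type),
   so sizeof(*x) is evaluated at run time and equals `length`.
   Requires VLA support: mandatory in C99, optional since C11. */
size_t vla_size(size_t length, char (*x)[length]) {
    size_t size = sizeof(*x);
    assert(size == length);
    return size;
}
```

Called as `char buf[16]; vla_size(sizeof buf, &buf);`, it returns 16.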
        
         | dependenttypes wrote:
         | Here is a post from 2014 by someone on the C standards
         | committee:
         | https://gustedt.wordpress.com/2014/09/08/dont-use-fake-matri...
        
       | WalterBright wrote:
       | Author here. I'll be blunt and repeat a prediction I made 3 years
       | ago or so:
       | 
       | C is finished if it doesn't address the buffer overflow problem,
       | and this proposal is a simple, easy, backwards compatible way to
       | do it. It is simply too expensive to deal with buffer overflow
       | bugs anymore.
       | 
       | This one addition will revolutionize C programming like adding
       | function prototypes did.
        
         | enriquto wrote:
         | The real troubles are undefined behavior and aliasing. Buffer
         | overflows are just a well-known gimmick of the language that is
         | more or less controllable with some discipline. Aliasing is
         | hell. You cannot even use a global variable safely!
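A minimal sketch of why aliasing makes even a global variable tricky (the names `g` and `store_then_read` are hypothetical): because `p` may point at `g`, the compiler cannot assume `g` still holds 1 after the store through `p`, and the result depends on what the caller passes.

```c
int g; /* a global that a pointer argument might alias */

/* If p == &g, the store through p overwrites g, so g must be
   reloaded from memory rather than assumed to still be 1. */
int store_then_read(int *p) {
    g = 1;
    *p = 2;
    return g;
}
```

With a pointer to some unrelated local it returns 1; with `&g` it returns 2.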
        
           | xxpor wrote:
           | Isn't that what asan/ubsan is for?
           | 
           | Granted, it's not static analysis, but it should catch most
           | aliasing related errors, no?
        
             | fizixer wrote:
             | I'm fully in the camp of C plus powerful analysis tools,
             | plus a high-level language (Python or Scheme).
        
             | nsajko wrote:
             | No. There has been some effort in that direction, somebody
             | proposed a Clang "type sanitizer" patch, but it wasn't
             | merged.
        
             | Ar-Curunir wrote:
             | only if your tests exercise that code path
        
         | eps wrote:
         | I'm not sure whom this proposal is aimed at exactly.
         | 
         | Any production-quality C code will already use a (pointer +
         | count) combo when passing arrays to a function, which is
         | something that will still be needed under your proposal because
         | the vast majority of arrays are dynamically sized. So unless
         | _all_ arrays in C are given the fat pointer treatment, I don't
         | really see how what you suggest would make much of a
         | difference. That is, if fat pointers are made a first-class
         | language construct, then, yes, that can be useful... though I
         | disagree that if it's not done, it will cause the demise of C.
        
           | leetcrew wrote:
           | pointer + size does not really fix anything, as you are
           | relying on the programmer to correctly keep track of the
           | size. I'm not even sure what alternative this improves upon.
           | even more error-prone null value marking the end? praying the
           | array will be big enough (looking at you, gets!)?
           | 
           | unless you have a team of incredibly diligent coders, people
           | are going to read past the end of bare arrays over and over
           | again. one specific mistake I keep seeing is where people
           | misinterpret the meaning of a variable named `size`. is it
           | the number of elements or the size in bytes? who knows, but
           | it's probably UB either way if you're wrong.
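A small illustration of the `size` ambiguity, assuming the convention of naming the parameter `count` when it means elements (the function `copy_ints` is hypothetical): the conversion to bytes happens exactly once, right where `memcpy` needs it.

```c
#include <stddef.h>
#include <string.h>

/* `count` is a number of elements, not bytes; memcpy takes bytes,
   so the multiplication by the element size lives in one place. */
void copy_ints(int *dst, const int *src, size_t count) {
    memcpy(dst, src, count * sizeof *src);
}
```

Passing `sizeof a` (a byte count) as `count` would make `memcpy` copy `sizeof(int)` times too many bytes, which is exactly the elements-versus-bytes mix-up the comment describes.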
        
             | true_religion wrote:
             | Would you just wrap the pointer and size in a struct, then
             | only iterate the array via a library of functions that
             | check the size first?
             | 
             | I don't code C full time, but it's what I have always done
             | when needing to use c via ffi to get a speed up in a
             | dynamic language.
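A sketch of that approach: a hypothetical `IntSlice` struct that carries the pointer and the element count together, plus checked accessors that refuse out-of-range indices instead of reading or writing past the end. The names and the bool-returning error convention are assumptions, not an established library API.

```c
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical "fat pointer": the base pointer travels with its length. */
typedef struct {
    int *ptr;
    size_t len; /* number of elements, not bytes */
} IntSlice;

/* Build a slice from a true array; on a pointer, sizeof gives a wrong length. */
#define INT_SLICE(arr) ((IntSlice){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Checked read: returns false instead of reading out of bounds. */
bool slice_get(IntSlice s, size_t i, int *out) {
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}

/* Checked write: returns false instead of writing out of bounds. */
bool slice_set(IntSlice s, size_t i, int value) {
    if (i >= s.len)
        return false;
    s.ptr[i] = value;
    return true;
}
```

The slice is passed by value (two words), so the checks cost a comparison per access but no extra allocation.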
        
           | mhh__ wrote:
           | Any production-quality C code?
           | 
           | Any or some? I'm not sure if I've seen that in the wild.
        
         | baby wrote:
         | I don't see a future where C survives, not only because of
         | memory corruption bugs (although that's a pretty big one), but
          | also because of usability: the lack of a package manager, a
          | common build system, good documentation, a good standard
          | library, etc. is just too much of a handicap against any modern
          | systems language.
        
           | edoceo wrote:
            | Those are features that make C flexible on mainstream
            | platforms and also usable on so many other platforms where
            | other languages just don't/won't work.
        
             | pjmlp wrote:
             | Those features are not unique to C, they are just cargo
             | culted as such.
        
           | dependenttypes wrote:
           | > the lack of package manager
           | 
           | Just use nix or even apt. Both of them are MUCH better when
           | compared to trash like npm or cargo which do not even check
           | for signatures.
           | 
           | > common build system
           | 
           | Such as make? There is also Ninja/Meson if you prefer.
        
           | pjmlp wrote:
            | Unfortunately, until we get rid of UNIX/POSIX clones, C will be
           | kept around.
           | 
           | So not in my lifetime.
        
           | fortran77 wrote:
           | C will survive, if just for embedded/systems programming
           | where you need a "portable assembly language" that can run on
           | the simplest CPUs.
        
             | mhh__ wrote:
             | That's because of sunk-cost rather than design.
             | 
             | Thanks to LLVM and GCC you can happily write embedded code
             | in a higher level language, but the vendors don't bother
             | supporting it because a lot of embedded coding isn't really
             | what we would call software (no tests etc.)
        
           | ajsnigrutin wrote:
           | > I don't see a future where C survives
           | 
           | I've been seeing those exact words for decades now, and C is
            | still going strong. Every few years a new language comes
            | along, someone writes something in it that was written in C
            | before, someone might even write a basic OS in it, and after a
            | few years, that language is almost forgotten, a new one is
            | here, and again, someone is writing something in it, but in
            | the end, we still use C for the things we used it for 10, 20,
            | and for some, even 30 years ago.
        
           | blogant wrote:
           | > lack of package manager, common build system, good
           | documentation.
           | 
           | This is where C is superior to virtually every other
           | language. It has K&R to start with [1], a wealth of examples
           | to progress from there, man pages, autotools, cmake, static
           | and shared libraries.
           | 
           | > good standard library.
           | 
           | It should have hash tables at least, but it isn't bad.
           | 
           | [1] Which is still the best language book ever written (yes,
            | it has some anti-patterns, you unlearn them quickly).
        
           | non-entity wrote:
           | As much as I dislike C
           | 
           | > There are only two kinds of languages: the ones people
           | complain about and the ones nobody uses.
           | 
           | This unfortunately seems to mostly hold true.
        
             | unsatchmo wrote:
             | Why not both? There are a few languages that nobody
             | uses and that everybody seems to complain about.
        
           | samatman wrote:
           | SQLite alone has a support contract through 2050.
           | 
           | C survives.
        
           | rumanator wrote:
           | > the lack of package manager
           | 
            | What do you call Linux distros' package managers, then? I
            | mean, in distributions like Debian you can even download a
            | package's source code with apt-get.
        
           | forrestthewoods wrote:
           | > I don't see a future where C survives
           | 
            | If C dies, then what replaces it?
        
             | nicoburns wrote:
             | Perhaps a combination of a language like Zig (a 1:1
             | replacement for situations where you really do want a lot
             | of manual low-level control) and higher-level languages
             | like Rust eating into more and more of the use cases.
        
           | humanrebar wrote:
           | I don't see C being in much worse shape than C++ with respect
           | to build system and package manager. It's slow going, but
           | progress seems to be happening there.
           | 
           | Are you saying both are doomed? Or is there some scenario
           | where C++ survives without C?
        
             | vlovich123 wrote:
             | I think both are, long term (think FORTRAN where it's not
             | particularly popular but a lot of existing code is
             | maintained and not rewritten).
             | 
             | C++ is actually in a slightly better spot ironically
             | because it's harder to integrate with. If you have a C
             | program you can pretty easily start replacing parts with
             | Rust. You can't do the same with C++ which insulates it
             | better in that sense.
        
               | ryl00 wrote:
               | Reports of Fortran's death (latest standard 2018) are
               | greatly exaggerated (much like C). It's receded to a
               | niche, but it's still a very important niche (numerical,
               | HPC). Hopefully, the development of a new Fortran front
               | end for LLVM (from PGI/Nvidia?) pans out, as this would
               | fill a gap in LLVM's offerings, and provide more
               | competition for ifort and gfortran.
        
               | vlovich123 wrote:
               | You're proving my point. FORTRAN is a niche language. C++
               | is still mainstream. It will recede but not completely
               | disappear.
        
             | baby wrote:
             | I don't see a great future for C++ either
        
         | Upvoter33 wrote:
         | "C is finished if it doesn't address the buffer overflow
         | problem"
         | 
         | You should keep making this prediction ... one day you might be
         | right! :)
        
           | WalterBright wrote:
            | I suspect C has been steadily losing ground since I made that
            | prediction.
        
             | rumanator wrote:
             | C has been "losing ground" not because of random pet peeves
             | of those who never wrote a line of code in C but because
             | since C's last standard update there have been other
             | programming languages that offer developers something of
             | value so that the trade-off between using C or any
             | alternative starts to make technical sense.
             | 
             | It also helps that C's standardization proceeds in ways that
             | feel somewhat between sabotage and utter neglect.
             | 
             | Meanwhile, C is still the absolute best binary interop
             | language devised by mankind.
        
           | teej wrote:
           | C code is being replaced by Rust fast. The only limit is how
           | quickly programmers can become good at Rust. It's already
           | happening.
        
             | viraptor wrote:
             | I love the move to d/rust/zig/nim/... but there are other
             | issues too. Ecosystem of libraries, stabilisation of common
             | patterns (futures and Tokio issues are still out there),
             | platform compatibilities, industry support for moving away
             | from known solutions, and many other issues. Even if we all
             | suddenly knew Rust perfectly tomorrow, there are other
             | issues in the way.
        
         | rightbyte wrote:
         | The proposal is just syntactic sugar for a size argument. It
         | doesn't really add or solve anything.
        
           | WalterBright wrote:
           | In my experience with language design, a little bit of
           | syntactic sugar can have transformative results.
           | 
           | C's function prototypes, syntactic sugar added circa 1990,
           | were transformative for C programming.
        
             | Koshkin wrote:
             | I wonder if a better idea (in principle) would be to have
             | some kind of hardware implementation, sort of like a finer-
             | grained memory segmentation.
        
           | mhh__ wrote:
           | It allows automatic bounds checking, and I don't need to point
           | out how many bugs that could fix.
           | 
           | If you're worried about performance, test it and turn it off.
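One way to get bounds checks that can be switched off, assuming assert-based checking (a sketch of the idea, not necessarily how the article's proposal or D implements it): the check compiles away when `NDEBUG` is defined, e.g. with `cc -DNDEBUG`. `CharSlice` and `char_at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical pointer/length pair for a read-only char buffer. */
typedef struct {
    const char *ptr;
    size_t len;
} CharSlice;

/* Bounds-checked element access. The check is an assert, so it is
   removed entirely when NDEBUG is defined, trading safety for speed. */
static inline char char_at(CharSlice s, size_t i) {
    assert(i < s.len && "index out of bounds");
    return s.ptr[i];
}
```

Debug builds abort with a diagnostic on a bad index; release builds with `-DNDEBUG` behave like a raw `s.ptr[i]`.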
        
       | skywhopper wrote:
       | It's not a "mistake". This article is complaining about a
       | misinterpretation of C's functionality. Arrays are not "real"
       | data structures in C: there's no such thing. The array-ish syntax
       | that's available is just some syntactic sugar on top of
       | pointers. You could say that having the sugar at all is a
       | mistake. Or that C is incomplete without first-class array types.
       | This is a cute hack, but at this point (far more than 10 years
       | ago) it's probably better to move on to Rust if you don't like
       | this aspect of C, rather than proposing to hack the language.
        
       | chadcmulligan wrote:
       | Niklaus Wirth would concur - that was similar to his argument for
       | Pascal: strings contain a size.
        
       | SamReidHughes wrote:
       | The biggest mistake to me feels like implicit integer
       | conversions. That's where C feels like it's really out to get
       | you.
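Two common ways implicit integer conversions bite, as a sketch with hypothetical function names: comparing a signed value with an unsigned one converts the signed operand to unsigned, and narrowing to `unsigned char` silently wraps.

```c
#include <stdbool.h>

/* The usual arithmetic conversions turn -1 into UINT_MAX here,
   so this comparison is false even though -1 < 1 mathematically. */
bool signed_lt_unsigned_demo(void) {
    return -1 < 1u;
}

/* Returning an int as unsigned char reduces it modulo 256
   (assuming CHAR_BIT == 8), with no diagnostic by default. */
unsigned char truncate_to_uchar(int v) {
    return v;
}
```

`signed_lt_unsigned_demo()` returns false and `truncate_to_uchar(300)` returns 44; GCC and Clang flag both patterns with `-Wsign-compare` and `-Wconversion`.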
        
         | leetcrew wrote:
         | on a somewhat related note, I've always wished for something
         | like `explicit` that prevents assigning different typedefs for
         | the same underlying type to each other. like suppose I have two
         | types, WorldVec (vector in worldspace) and ViewVec (vector in
         | view/screenspace). under the hood they are both typedefs for
         | float[3], so I can freely assign them back and forth. but any
         | vector operation that mixes the types would almost always be a
         | bug, since they are in different spaces. would be cool to get
         | this functionality out of the humble typedef.
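Plain C has no `explicit` typedef, but a common workaround is to wrap the array in a struct, since every struct declaration is a distinct type. A sketch reusing the `WorldVec`/`ViewVec` names from the comment (the `world_add` function is hypothetical). Bare `float[3]` typedefs decay to `float *` when passed to functions, which is why they mix freely; the struct versions do not.

```c
/* Each struct is a distinct type, even with identical members, so
   mixing WorldVec and ViewVec is a compile-time error. */
typedef struct { float v[3]; } WorldVec;
typedef struct { float v[3]; } ViewVec;

/* Only vectors in the same space can be added. */
WorldVec world_add(WorldVec a, WorldVec b) {
    WorldVec r;
    for (int i = 0; i < 3; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}
```

With these definitions, `ViewVec v = world_add(a, b);` is rejected by the compiler as an initialization with incompatible types. As a bonus, the structs can be passed, returned, and assigned by value.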
        
         | ncmncm wrote:
         | Agree. And they have leaked out to C++ where they have been
         | very hard to fix, and even, to some degree, to Rust.
        
           | forrestthewoods wrote:
           | How have they leaked into Rust? I thought Rust had no
           | implicit conversions?
        
             | steveklabnik wrote:
             | There are a small number of coercions, but we do not do
             | them around numeric types, it's true. Not sure what your
             | parent is referring to.
        
             | ncmncm wrote:
             | It does, however, have integer overflow, in release mode.
             | So if you do code a conversion, you can end up with a value
             | different from the source.
        
         | rowanG077 wrote:
         | I don't feel it's so bad. There's a specific flag that tells
         | the compiler to warn about them (e.g. -Wconversion in GCC and
         | Clang).
        
       | pvg wrote:
       | Previously:
       | 
       | https://news.ycombinator.com/item?id=17585357
       | 
       | https://news.ycombinator.com/item?id=1014533
        
       | bumblebritches5 wrote:
       | The biggest mistake is errno: global state is incompatible with
       | multithreading.
        
       | tus88 wrote:
       | > Conflating pointers with arrays.
       | 
       | AND Strings.
       | 
       | FTFY.
        
         | yarrel wrote:
         | C doesn't have strings. ;-)
        
           | Snarwin wrote:
           | More precisely, it doesn't have a string _type_.
        
       | Something1234 wrote:
       | Stupid question, but how do I access the size of an array using
       | this fancy new declaration if it were to be added? It doesn't
       | seem like any sugar is there to provide "range based for loops."
       | 
       | Wait, would I just use `sizeof`? But then aren't I still doing
       | pointer math?
        
         | WalterBright wrote:
         | A macro can be added to access the length property.
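In today's C, the closest equivalent for an array whose declaration is in scope is a `sizeof`-based macro. A sketch with the hypothetical names `COUNTOF` and `sum_array_demo`; note that the macro gives a wrong answer if handed a pointer, which is the kind of gap the article's proposal targets.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer). Applied to a pointer,
   it silently computes sizeof(pointer) / sizeof(element) instead. */
#define COUNTOF(a) (sizeof(a) / sizeof((a)[0]))

int sum_array_demo(void) {
    int xs[] = {1, 2, 3, 4};
    int total = 0;
    /* A hand-written "range-based" loop using the length macro. */
    for (size_t i = 0; i < COUNTOF(xs); i++)
        total += xs[i];
    return total;
}
```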
        
       | franciscop wrote:
       | This was probably the most confusing thing about C when I first
       | started learning programming back in the day. When you call a
       | function you pass the value, except with arrays, where it gets
       | converted to a pointer. It was explained to me back then that the
       | reason is that copying the whole array was not efficient, so it
       | was better to pass the reference.
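A sketch of what actually happens (the function name `zero_first` is hypothetical): a parameter written as `int a[10]` is adjusted to `int *a`, so the callee writes straight into the caller's array, nothing is copied, and the `10` is not enforced.

```c
/* Despite the array syntax, `a` is an int *: writes through it land
   in the caller's array. */
void zero_first(int a[10]) {
    a[0] = 0;
}
```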
        
         | quelsolaar wrote:
         | I think a better way to think about it is to say that when you
         | type:
         | 
         | int a[10];
         | 
         | you allocate 10 integers and "a" is the pointer to the first
         | one of them.
         | 
         | Arrays are just memory, just like what you get when calling
         | malloc, and memory is accessed using pointers in C.
        
           | rightbyte wrote:
           | That mindset doesn't cover sizeof(a) properly.
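A sketch of the `sizeof` difference (the function name `param_size` is hypothetical): in the declaring scope, `sizeof a` is the size of the whole array, but inside a function with an array parameter it is only the size of a pointer. GCC and Clang warn about this pattern via `-Wsizeof-array-argument`.

```c
#include <stddef.h>

/* The parameter is adjusted to int *, so sizeof yields sizeof(int *),
   not the size of the caller's array. */
size_t param_size(int a[10]) {
    return sizeof a;
}
```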
        
       | ktpsns wrote:
       | I don't think it is a mistake in language design. In the 90s,
       | memory was scarce, and it still is in the microprocessor
       | world, where "only" a few kilobytes of RAM are available. There
       | are performance-critical paths where passing a size_t is just
       | unnecessary.
       | 
       | The actual mistake is the user not passing a size_t. This is one
       | kind of "premature optimization". We can safely say the language
       | design doesn't encourage the user to write safe code, and
       | successor languages do.
       | 
       | Don't get me wrong: I'm just trying to make the point that C
       | itself is not to blame. It's people using computers who write the
       | million-dollar bugs.
        
         | bobbyi_settv wrote:
         | > There are performance critical paths where passing a size_t
         | is just unnecessary
         | 
         | You would still be able to declare your function as taking a
         | pointer (instead of an array, which in this world would be a
         | fat pointer) if you need to.
         | 
         | He's saying to deprecate char[] as a parameter type, not char*.
        
           | WalterBright wrote:
           | That's right, nothing is taken away from the user with my
           | proposal.
        
           | antiquark wrote:
           | An existing alternative is to put an array in a struct:
           | 
           |     struct string123 {
           |         char data[123];
           |     };
           | 
           | Then create functions that use pointers to these string123
           | structs.
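A sketch of that approach with hypothetical helper names (`capacity`, `set_string`): because the callee sees the member's full array type, `sizeof s->data` is 123 inside the function, so bounds are derived from the type rather than passed separately.

```c
#include <stddef.h>
#include <string.h>

struct string123 {
    char data[123];
};

/* The array type survives inside the struct, so sizeof gives 123. */
size_t capacity(const struct string123 *s) {
    return sizeof s->data;
}

/* Bounded copy that always NUL-terminates within the fixed buffer. */
void set_string(struct string123 *s, const char *src) {
    strncpy(s->data, src, sizeof s->data - 1);
    s->data[sizeof s->data - 1] = '\0';
}
```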
        
             | david2ndaccount wrote:
             | If you want a pointer to a fixed-size array, just use one,
             | e.g.:
             | 
             | char (*data)[123]; // syntax is somewhat awkward
        
               | mav3rick wrote:
               | You can typedef that.
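A sketch of that typedef, assuming it is applied to the array type (a typedef for the pointer type itself, `typedef char (*Buf123Ptr)[123];`, would also work); `Buf123` and `buf_size` are hypothetical names.

```c
#include <stddef.h>

/* Buf123 is "array of 123 char"; the typedef hides the awkward declarator. */
typedef char Buf123[123];

/* The parameter points to the whole array, so *data keeps its size. */
size_t buf_size(Buf123 *data) {
    return sizeof *data;
}
```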
        
             | WalterBright wrote:
             | That does work, except for:
             | 
             | 1. variable length buffers
             | 
             | 2. every other piece of code you want to interface with
             | uses `char*`
        
           | tgb wrote:
           | Interestingly, I assume you meant to end your post with
           | "pointer to char", not "char" itself, but the asterisk is the
           | italics formatting character on HN, so it italicized the rest of
           | the post. The funny thing is that it italicized the "reply"
           | button too (as well as leaving an empty i-tag after "char").
        
         | WalterBright wrote:
         | The #1 undetected bug problem with C programs is buffer
         | overflows. Experience shows it is extremely difficult to verify
         | that arbitrary C code doesn't have buffer overflows in it.
         | Assistance from the core language design can improve things a
         | great deal.
         | 
         | D allows passing both raw pointers as parameters and
         | pointer/length pairs. It's up to the user to choose. In
         | practice, people have simply moved away from using raw pointers
         | into buffers.
         | 
         | As for performance, in C to determine the length of a string
         | one uses strlen(). Over and over and over again on the same
         | string. This can be a major performance problem, even not
         | considering the memory cache effects. When I look at speeding
         | up C code, the first nuggets of gold often come from reviewing
         | all the explicit and implicit uses of strlen(). (Implicit uses
         | are inside functions like strcat().) It's also the first place I
         | look for bugs when reviewing C code: any time you see a sequence
         | of strlen, strcat, strcpy, it's often broken (typically by
         | neglecting somewhere to account for the extra 0 byte).
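A sketch of getting that sequence right (`concat` is a hypothetical helper, not a standard function): each `strlen` runs exactly once, the lengths are reused for the copies, and the allocation explicitly reserves the terminating 0 byte.

```c
#include <stdlib.h>
#include <string.h>

/* Concatenate two strings into a new heap buffer; the caller frees it.
   Returns NULL if allocation fails. */
char *concat(const char *a, const char *b) {
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);  /* +1 for the terminating 0 byte */
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);      /* copies b's terminator too */
    return out;
}
```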
        
           | Gibbon1 wrote:
           | All of this I agree with. In a better world, 'arrays' would
           | have been added in the 1980s. The arguments about memory
           | limitations are spurious, since if you're writing good code you
           | always pass a pointer and the length. Always, no exceptions.
           | 
           | Yeah, and all the string functions should have been marked as
           | deprecated with C89 and fully deprecated with C99.
        
           | nmarks100 wrote:
           | I don't agree that in the days of valgrind, asan etc. the #1
           | issue is buffer overflows.
           | 
           | #1 and #2 are integer overflows and aliasing mistakes.
        
             | WalterBright wrote:
             | valgrind is a marvelous tool, but it only detects actual
             | buffer overflows, not vulnerability to buffer overflows.
        
         | mhh__ wrote:
         | You must pass a size_t somewhere, surely? Otherwise you have no
         | idea how long the array is - this is about doing it properly
         | rather than relying on yourself at 9AM to get it right
         | every time.
        
       | quelsolaar wrote:
       | Personally I like it the way it is. If you want to copy an array
       | when making a function call, you can define a struct with an array
       | in it and pass the structure.
       | 
       | If C did pass array lengths, it still wouldn't matter, since C
       | doesn't (and in my opinion shouldn't) check for overflows.
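A sketch of the struct-wrapping suggestion above, with hypothetical names (`struct ints4`, `modify_copy`): unlike a bare array parameter, a struct is passed by value, so the callee works on its own copy of the array and the caller's data is untouched.

```c
/* Wrapping the array in a struct makes it copyable by value. */
struct ints4 {
    int v[4];
};

/* Modifies only the local copy; the caller's struct is unchanged. */
int modify_copy(struct ints4 s) {
    s.v[0] = 99;
    return s.v[0];
}
```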
        
         | dependenttypes wrote:
         | > since C doesn't ... check for overflows
         | 
         | Because it's a language, not an implementation. An
         | implementation is free to do so (and there are such
         | implementations after all).
        
           | quelsolaar wrote:
           | That's correct! C doesn't require checking for overflows, but
           | it also doesn't forbid implementations from doing so. Both
           | are features.
        
             | Koshkin wrote:
             | I don't think it is possible, not without changing some
             | parts of C's specification. At the very least you'd
             | need to be able to somehow encode the length of the buffer
             | in the pointer to it. (There is no semantic difference
             | between a pointer to a simple, fixed-length variable and a
             | pointer to an array.)
        
               | bawolff wrote:
               | Which is what the article proposed.
        
       ___________________________________________________________________
       (page generated 2020-09-12 23:00 UTC)