[HN Gopher] Git's list of banned C functions
       ___________________________________________________________________
        
       Git's list of banned C functions
        
       Author : muds
       Score  : 320 points
       Date   : 2021-03-04 20:33 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | moomin wrote:
       | They should probably add sscanf.
        
         | ed25519FUUU wrote:
         | First thing I looked for. It looks like it _was_ used here:
         | 
         | https://github.com/git/git/blob/master/object-file.c#L1293
         | 
         | And currently used here (at least):
         | 
         | https://github.com/git/git/blob/master/refs.c#L1235
        
       | TheRealSteel wrote:
       | I'm an idiot, I read the headline and thought these were banned
       | from Git entirely. As in, you couldn't commit them to _any_ repo
       | using Git, at all. Thought that seemed a bit harsh.
       | 
       | Turns out you just can't use them when you contribute code to the
       | Git project. That makes sense, and seems reasonable.
        
       | maxk42 wrote:
       | What would be helpful is an explanation of how each function ends
       | up being misused so people can learn from this.
        
         | petters wrote:
         | Git blame is helpful here. See e.g.
         | https://github.com/git/git/commit/1b11b64b815db62f93a04242e4...
        
         | jsmith45 wrote:
         | View the git history for the file. Each commit that adds
         | functions has a detailed explanation of what is wrong with the
         | functions.
        
       | zbendefy wrote:
       | Are there some details on what's wrong with these?
        
         | bvaldivielso wrote:
         | The commit messages that added them explain the reasoning
        
           | ufo wrote:
           | I wish they had put that in comments instead of in the
           | commit messages. It's not the first time that I've seen this
           | particular list of banned functions being shared online and
           | every time it happens someone has to explain that the most
           | interesting info is hidden in the commit messages.
        
         | alexchamberlain wrote:
         | All the string functions have buffer overrun vulnerabilities if
         | not used carefully. I'm not sure about the time functions
         | though.
        
           | trilinearnz wrote:
           | Very much this. I frequently write small games in C, and I
           | have been bitten many times by baffling behaviour because a
           | string somewhere was copied into an array that was too
           | short! Apart from that, I love the simplicity
           | of the language and the stdlib, and it's definitely my
           | preferred hobby programming environment.
           | 
           | It would be good to know what the commonly-accepted
           | alternatives are.
        
           | edflsafoiewq wrote:
           | The time functions are either non-reentrant, or, for the _r
           | versions, have the same problem with buffer overruns.
           | 
           | https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...
           | 
           | https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...
        
         | csours wrote:
         | I'm pretty sure you could google each of these with the word
         | 'dangerous'
         | 
         | For example: https://lgtm.com/rules/2154840805/
        
       | whydoyoucare wrote:
       | I am so thankful git isn't forcefully including this header in
       | every C language project and that we have a choice when using
       | git! :-)
        
       | bvaldivielso wrote:
       | Ah this is a very good idea. I guess you still have to make sure
       | that all your translation units include this header, which isn't
       | completely foolproof.
       | 
       | Static analysis would probably be more robust, but way more
       | involved.
        
         | radus wrote:
         | Best of both worlds: use static analysis to ensure the header
         | is included?
        
         | koenigdavidmj wrote:
         | gcc has a -include option, so this can be done once in the
         | Makefile and get the benefit everywhere (unless you're being
         | clever).
        
         | Athos_vk wrote:
         | I remember visual studio having an option to force include a
         | file, surely something like that would exist for other
         | toolchains
        
         | kccqzy wrote:
         | You don't need fancy static analysis. You can find out whether
         | the banned functions are called just by inspecting the compiled
         | object file. Add it to the build step and done.
        
       | EdSchouten wrote:
       | Funnily enough, strtok() is not listed :)
        
       | kgrimes2 wrote:
       | Can a C guru provide a TL;DR of why these are bad?
        
       | drfuchs wrote:
       | It would be nice if the error messages generated would suggest
       | replacement functions that they deem appropriate. I see that I'm
       | not supposed to use gmtime, localtime, ctime, ctime_r, asctime,
       | and asctime_r; but what do they think I _should_ use?
        
         | dev_tty01 wrote:
         | It would be even nicer if it redefined the call to a safe
         | version and then generated a warning message informing the
         | programmer of the substitution.
        
           | pjc50 wrote:
           | You can't do that because the semantics are different in most
           | cases.
        
         | cle wrote:
         | From the commit messages
         | 
         | > The ctime_r() and asctime_r() functions are reentrant, but
         | have no check that the buffer we pass in is long enough (the
         | manpage says it "should have room for at least 26 bytes").
         | Since this is such an easy-to-get-wrong interface, and since we
         | have the much safer strftime() as well as its more convenient
         | strbuf_addftime() wrapper, let's ban both of those.
         | 
         | (https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...)
         | 
         | > The traditional gmtime(), localtime(), ctime(), and asctime()
         | functions return pointers to shared storage. This means they're
         | not thread-safe, and they also run the risk of somebody holding
         | onto the result across multiple calls (where each call
         | invalidates the previous result). All callers should be using
         | their reentrant counterparts.
         | 
         | (https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...)
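
A minimal sketch of the replacement the quoted commit messages point at: a reentrant conversion followed by strftime(), which takes the buffer size. The helper name `format_utc` and the format string are assumptions for illustration; Git's own strbuf_addftime() wrapper is not shown here.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format `t` as UTC into a caller-supplied buffer.
 * Returns the number of bytes written (excluding the NUL), or 0 on failure.
 */
size_t format_utc(time_t t, char *buf, size_t len)
{
    struct tm tm;

    assert(buf != NULL);
    /* gmtime_r() fills our own struct tm instead of shared static storage. */
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    /* strftime() is told the buffer size and returns 0 if the result won't fit. */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
}
```

Unlike ctime_r() or asctime_r(), a buffer that is too small shows up here as a 0 return value rather than as an overrun.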
        
           | tinus_hn wrote:
           | Strangely there is no mention of strtok which has a similar
           | issue.
        
           | drfuchs wrote:
           | Yes, but every hapless user shouldn't have to go searching
           | through a bunch of commit messages to find the suggested
           | replacement. Bad UX.
        
             | capableweb wrote:
             | The UX of using this list is not by manually searching
             | through the list and seeing the reason behind them. You
             | include the file together with the rest of your sources and
             | now you get compilation errors if you try to use them.
             | Can't think of a better UX for banned functions.
             | 
             | Discovering why the thing is banned you only have to do
             | once, if you care. If you're just modifying something
             | quickly and minor in Git, you might not even care why.
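
For readers wondering how a header can turn a function call into a compile error, here is one common way to do it. The thread doesn't show the header's contents, so treat the macro names and the exact shape as assumptions, and `copy_label` as a purely hypothetical caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expands to an identifier that is never declared, so any use of a banned
 * function fails to compile and the identifier's name appears in the error.
 */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef strcat
#define strcat(dst, src) BANNED(strcat)

/* Hypothetical caller that uses a size-checked alternative instead. */
int copy_label(char *dst, size_t size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL && size > 0);
    n = snprintf(dst, size, "%s", src);
    /* Report truncation instead of silently accepting it. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The macros have to come after the standard headers are included, otherwise the declarations in `<string.h>` would themselves be rewritten. A project could also spell the replacement into the identifier so the error message doubles as a hint.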
        
             | grncdr wrote:
             | It seems pretty safe to assume a developer contributing C
             | code to _git itself_ would know how to use git blame (or
             | the GitHub interface for it).
        
               | orf wrote:
               | Why make it harder, and why make it impossible to update
               | if there are other suggested alternatives that are
               | available since whenever the commit was made?
        
               | masklinn wrote:
               | > Why make it harder
               | 
               | Because there is no way for a commit message to become
               | outdated or detached from what it talks about, both of
               | which are very much issues with comments.
               | 
               | > why make it impossible to update if there are other
               | suggested alternatives that are available since whenever
               | the commit was made?
               | 
               | Because that doesn't really matter.
        
               | cma wrote:
               | > Because there is no way for a commit message to become
               | outdated or detached from what it talks about, both of
               | which are very much issues with comments.
               | 
               | What if they think of another reason why one of the same
               | functions should be disabled?
        
               | underwater wrote:
               | Code is evergreen, whereas a git commit represents a
               | change at a single point in time. It will always be
               | limited by the knowledge the author had available to
               | them.
               | 
               | The commit message from 2020 with suggested alternatives
               | might very well go stale. Does the author go and force a
               | noop commit so they can document new best practice in a
               | new commit message?
        
               | orf wrote:
               | > Because that doesn't really matter.
               | 
               | Ok, so maybe rather than have this file we should run
               | "git log | grep BANNED" and build a list of functions
               | from that? Or maybe we could change all error messages to
               | be "go look at the commit history to work out why this
               | happened".
               | 
               | No? Maybe putting context in source files (or better yet,
               | an error message!) rather than in a side channel like the
               | commit message has value when it comes to understanding
               | and updating, and it won't be lost under the weight of
               | future commits.
        
               | capableweb wrote:
               | Your source code should describe what the program should
               | do today. It should not contain all historical artifacts
               | about your source code, as it'll grow too big and
               | unmanageable then. Instead, use Git to store temporal
               | information, data that is about change and reasoning
               | behind it. Git is basically a timeline, instead of hard
               | facts of today.
               | 
               | That's why it makes sense to describe the background and
               | reasoning behind a change in a Git commit, instead of
               | inside your source files as comments.
        
               | orf wrote:
               | Totally agree, which is why nobody is suggesting adding
               | the background and reasoning behind the change to the
               | source file as a comment.
               | 
               | They are suggesting adding a more informative error,
               | which may include a subset of that background and
               | reasoning. An error message that points you to the
               | functions you should use instead is infinitely more
               | informative than one that says "this is banned. Bye."
        
               | jorl17 wrote:
               | I find it highly backwards that documentation on "what to
               | use instead of X" is in the commit message disabling X.
               | One _might_ do it and might remember to do it, but IMO it
               | makes absolutely no sense for this not to be documented
               | properly in code, as suggested by OP.
               | 
               | By that logic, a non-insignificant amount of (good)
               | comments in code could be removed and people asked to
               | "git blame the code and check out the commit that made it
               | for the documentation". Of course this could be done, but
               | it sounds ridiculous even typing it out.
        
               | blitz_skull wrote:
               | I disagree. Commit messages exist for the very purpose
               | of adding context to your code base. If you added
               | <complex_function> for something that needs context, sure
               | MAYBE add a comment, but I really pray that I'm going to
               | find a few paragraphs disambiguating the problem within a
               | git commit. If I'm _really_ lucky, maybe I find a PR
               | number or Jira ticket reference as well.
               | 
               | If you're truly clueless as to what could be substituted
               | for these commands, then you don't understand why they're
               | banned. So our first step? Figure out why they're banned.
               | And how would we sanely approach this? Probably by
               | checking the commit message for _why that code is there
               | in the first place_. That's a very safe, sane, and not-
               | at-all backwards assumption. After you understand why
               | it's there, a quick google search might help out if the
               | commit message didn't already include information on
               | alternatives.
               | 
               | Lastly, yeah, I totally agree a large amount of GOOD
               | comments should be relegated to the git commits if all
               | they're doing is adding additional context around a
               | complex piece of logic. Comments do not exist to edify
               | a code base in any way other than context. They're too
               | easy to let become stale, whereas a git commit will
               | always reference exactly the code you're blaming.
               | 
               | So, I have to really disagree that it's ridiculous or in
               | any way absurd. In fact, I think a lot of code suffers
               | from NOT using git as a way to extend context around a
               | code base. It's SUPER easy with most development
               | environments to select a block of text and blame it. It's
               | so easy that it's almost always my go-to to increase my
               | context of what's been happening around a particular part
               | of the code base.
        
               | barnaclejive wrote:
               | So, you are tied to Git for eternity to preserve
               | documentation?
               | 
               | Might work in practice for a long time, but Git is a
               | version control system, not a documentation system.
        
               | dpedu wrote:
               | For developer documentation - yes, absolutely!
        
               | capableweb wrote:
               | > By that logic, a non-insignificant amount of (good)
               | comments in code could be removed and people asked to
               | "git blame the code and check out the commit that made it
               | for the documentation". Of course this could be done, but
               | it sounds ridiculous even typing it out.
               | 
               | Yes, exactly. You want to understand how a codebase
               | changed and evolved over time? Git is your friend. If you
               | want the facts of the code today? The source code is your
               | friend. That's why the way Linux and Git store history in
               | their repositories makes sense. See also
               | https://news.ycombinator.com/item?id=26348965
               | 
               | Try navigating the Git codebase with a git-blame sidebar
               | (probably VS Code has that somewhere) so you can see the
               | history of the source files. If you wonder why something
               | is what it is, you can checkout the commit that last
               | modified it. Or go even further backwards and figure out
               | in the context it was first added. If you truly want to
               | understand a change, a git repository with well written
               | git messages is a pleasure to understand and dig into.
        
         | chris_wot wrote:
         | The commits actually do give that info. Take for instance this
         | commit:
         | 
         | https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
         | 
         | It actually gives examples and a lengthy explanation and
         | reasoning behind the ban.
        
           | xorcist wrote:
           | Now _that's_ what a good commit message looks like!
        
             | cesarb wrote:
             | Commit messages like that are common in the Linux kernel
             | project, which is where git came from (though this
             | particular commit message is a bit on the longer side).
             | 
             | It makes more sense if you think of it as an email message
             | justifying why the project maintainer should accept that
             | change, because that's what they were before git even
             | existed. Still today, unless you're one of the Linux kernel
             | subsystem maintainers, you have to convert your changes to
             | emails with git-format-patch/git-send-email and send them
             | to the right mailing list. Even the Linux kernel subsystem
             | maintainers keep writing commits in that style out of habit
             | (and because Linus will rant at them if they don't).
        
           | mamon wrote:
           | But why put that info in commit message instead of a comment
           | in the file itself?
        
             | chris_wot wrote:
             | Because comments can be tedious and get out of sync with
             | the repo. Why not check the git history? I wish more repos
             | could be like this!
        
               | adrianmonk wrote:
               | > _Why not check the git history?_
               | 
               | Because that is effort every person who uses the file has
               | to do over and over again, whereas maintaining the file
               | is effort that has to be done once by one person.
        
               | skeletal88 wrote:
               | Someone here commented to use git blame to find the
               | commit that banned the functions and read the commits.
               | These people making the suggestions.. must hate other
               | people and their time. Also, what if someone.. for
               | example runs a code formatter on the file, making git
               | blame useless? Is it really so difficult to make a manual
               | or explain properly in the comments about what
               | replacements to use?
        
               | chris_wot wrote:
               | It sounds like you want a manual. Personal preference I
               | guess. The maintainers seem to have decided to keep it in
               | the history. It's not like this was ever meant for
               | anything other than git itself.
        
               | aendruk wrote:
               | I really wish tooling like this was more common:
               | 
               | https://github.com/eamodio/vscode-gitlens/tree/v11.2.1#curre... (screenshot)
               | 
               | > Current Line Blame: Adds an unobtrusive, customizable,
               | and themable, blame annotation at the end of the current
               | line
        
             | colordrops wrote:
             | Or even in the compile error message itself.
        
         | colordrops wrote:
         | Also, _why_ the functions are banned.
        
       | lerax wrote:
       | Yes, this is right. Any decent C programmer knows that these
       | functions are cursed.
        
       | Animats wrote:
       | About 20 years too late. Those should have been moved to a
       | "deprecated" header file decades ago.
        
       | xvilka wrote:
       | I hope to see it rewritten in a safer language one day.
        
         | qbasic_forever wrote:
         | There's a nice Go implementation of git:
         | https://github.com/go-git/go-git
        
       | sys_64738 wrote:
       | scanf?
        
       | abetusk wrote:
       | The Git Mailing List Archive on lore.kernel.org (found in the
       | README from the git mirror on GitHub) has more context [0] [1]
       | [2]. From Jeff King on 2018-07-24:
       | 
       |     The strncpy() function is less horrible than strcpy(), but
       |     is still pretty easy to misuse because of its funny
       |     termination semantics. Namely, that if it truncates it omits
       |     the NUL terminator, and you must remember to add it
       |     yourself. Even if you use it correctly, it's sometimes hard
       |     for a reader to verify this without hunting through the
       |     code. If you're thinking about using it, consider instead:
       | 
       |       - strlcpy() if you really just need a truncated but
       |         NUL-terminated string (we provide a compat version, so
       |         it's always available)
       | 
       |       - xsnprintf() if you're sure that what you're copying
       |         should fit
       | 
       |       - strbuf or xstrfmt() if you need to handle
       |         arbitrary-length heap-allocated strings
       | 
       | I just did a search on the keywords 'banned' and 'strncpy' [3]
       | 
       | [0]
       | https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...
       | 
       | [1]
       | https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...
       | 
       | [2]
       | https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...
       | 
       | [3] https://lore.kernel.org/git/?q=banned+strncpy
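
The "funny termination semantics" in the quoted message are easy to see in a few lines. This is only an illustrative sketch; `strncpy_terminated` is a made-up name, not anything from Git.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if the first `size` bytes of `dst` contain a NUL after
 * strncpy(dst, src, size), and 0 if strncpy truncated without terminating.
 */
int strncpy_terminated(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, size);
    /* When src has `size` or more characters, no NUL is ever written. */
    return memchr(dst, '\0', size) != NULL;
}
```

With a 4-byte destination, copying "abc" leaves a terminated string, while copying "abcdef" fills all four bytes with "abcd" and leaves no terminator, so a later strlen() or printf("%s") on it would read past the array.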
        
         | js2 wrote:
         | Psst:
         | 
         | https://github.com/git/git/commits/master/banned.h
         | 
         | (Git development is done by emailing patches. Those patches
         | include the git commit message, which we can see just by
         | looking at the history of the file. Sometimes there's
         | additional discussion on the ML, but the most important details
         | are in the commit message because the git development team is
         | very disciplined about that.)
        
       | captainmuon wrote:
       | It would be interesting to see the rationale behind these bans,
       | and what the suggested alternatives are. Some are obvious, like
       | `strcpy`, but I can't remember what the problem with `sprintf` or
       | the time functions is.
       | 
       | If you are doing something like `sprintf(buffer, "%f, %f", a,
       | b)`, yes it is tricky to choose the size of buffer frugally, but
       | if you replace that by `ftoa` and constructing the string by
       | hand, you are likely to introduce more bugs.
       | 
       | Edit: as pointed out in another post, you can do git blame to see
       | the rationale for each ban, quite interesting.
        
         | monocasa wrote:
         | snprintf will always terminate the string, and won't overflow
         | the buffer.
        
         | Aanok wrote:
         | The trouble with printf-family functions is their variadic
         | nature. If the arguments don't match the format string, you can
         | wreak all sorts of havoc.
         | 
         | A fun exercise you can do is put a "%s" in the format string,
         | omit the string argument and see what happens to the stack.
        
           | anyfoo wrote:
           | That's however relatively easy to verify programmatically,
           | and indeed any recent compiler will complain about that.
           | 
           | I'd say the usual trap is rather the size of the target
           | buffer, because that requires bigger static analysis guns.
           | (I'm ignoring things like "%n", because then you're playing
           | with fire already.)
        
             | Gibbon1 wrote:
             | I think the big three C compilers have pragmas that you
             | can tag printf/scanf with that will cause the compiler to
             | verify the argument list.
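
On GCC and Clang the mechanism for this is a function attribute rather than a pragma. A sketch, assuming a hypothetical wrapper named `format_into` and a hypothetical `PRINTF_LIKE` macro; other compilers use different annotations, which are not shown.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__) || defined(__clang__)
/* Parameters are numbered from 1: format string index, first vararg index. */
#define PRINTF_LIKE(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Hypothetical wrapper; the attribute lets -Wformat check callers' arguments. */
int format_into(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap); /* bounded by `size`, always terminated */
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `format_into(buf, sizeof buf, "%s")` with the string argument missing should draw a format warning at compile time instead of misbehaving at run time.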
        
           | danaliv wrote:
           | There's that, but with sprintf/vsprintf specifically, there's
           | no way to keep it from storing characters past the end of
           | your buffer. For example:
           | 
           |     char buf[2];
           |     sprintf(buf, "%d", n);
           | 
           | This will happily write to buf[2] and beyond if n is negative
           | or greater than 9.
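
For comparison, a sketch of the same formatting done with snprintf(), which is told how big the buffer is. The helper `format_int` and its 0/-1 return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if the number fit, -1 if it was truncated. */
int format_int(char *buf, size_t size, int n)
{
    int needed;

    assert(buf != NULL && size > 0);
    /* snprintf() writes at most `size` bytes, including the terminating NUL. */
    needed = snprintf(buf, size, "%d", n);
    /* Its return value is the length the complete output would have had. */
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}
```

With `char buf[2]`, formatting 7 succeeds, while formatting -1 or 42 stores a single character plus the NUL and reports truncation instead of writing to buf[2].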
        
         | SloopJon wrote:
         | sprintf() warnings have gotten pretty sophisticated these days.
         | I discovered GCC's -Wformat-overflow the other day. It
         | complained that the buffer for a date string wasn't big enough;
         | e.g., sprintf(buf, "%04d-%02u-%02u", year, month, day), where
         | year, month, and day are 16-bit shorts, and buf was probably
         | eleven or twelve bytes.
         | 
         | It may actually be a bug that I got the warning, because the
         | range of each input was checked, and I think the compiler is
         | supposed to be smart enough to remember that.
        
         | dahfizz wrote:
         | This was my reaction as well. Banning strncpy just encourages
         | haphazard manual copying.
        
           | smasher164 wrote:
           | From the commit message:
           | 
           | If you're thinking about using it, consider instead:
           | 
           |   - strlcpy() if you really just need a truncated but
           |     NUL-terminated string (we provide a compat version, so
           |     it's always available)
           | 
           |   - xsnprintf() if you're sure that what you're copying
           |     should fit
           | 
           |   - strbuf or xstrfmt() if you need to handle
           |     arbitrary-length heap-allocated strings
        
             | nwmcsween wrote:
             | strlcpy is safer but effectively running strlen(src) every
             | call is a good wtf
        
           | azurezyq wrote:
           | maybe this https://github.com/git/git/blob/master/strbuf.h ?
        
           | ben_bai wrote:
           | strlcpy is the safe way, that is used by git.
        
           | syncsynchalt wrote:
           | strncpy doesn't do what you think it does (it is not
           | analogous to strncat). strncpy does not terminate strings on
           | overflow. In C terms, it is not actually a string function
           | and shouldn't be named with `str`.
           | 
           | snprintf or nul-plus-strncat do what you want, but snprintf
           | has portability problems on overflow. Most projects I've been
           | on rely on strlcpy (with a polyfill implementation where not
           | available).
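
The "nul-plus-strncat" idiom mentioned above, as a small sketch. The function name `copy_truncating` is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Copy `src` into `dst`, truncating if needed; `dst` is always terminated. */
void copy_truncating(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    dst[0] = '\0'; /* start from an empty string */
    /* strncat() appends at most size - 1 bytes and then always adds a NUL. */
    strncat(dst, src, size - 1);
}
```

This works because strncat(), unlike strncpy(), treats its bound as a character count and appends the terminator on top of it, which is why the bound must be `size - 1` and not `size`.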
        
           | asdfasgasdgasdg wrote:
           | I think you're meant to use snprintf instead. It would be
           | great to see documentation on the alternatives!
        
       | sys_64738 wrote:
       | getc?
        
       | ape4 wrote:
       | Just replace strcpy(a,b) with strcpyn(a,b,INT_MAX)
       | 
       | /joke
        
         | fatnoah wrote:
         | I'm pretty sure I've seen similar logic in my life.
        
       | attractivechaos wrote:
       | I wonder how they copy strings with strcpy and strncpy both
       | banned. strlcpy? But it is not conforming to major standards. Or
       | just memcpy with extra code?
        
         | dgentile wrote:
         | Edited: Looks like they have safe alternatives:
         | 
         |   - strlcpy() if you really just need a truncated but
         |     NUL-terminated string (we provide a compat version, so
         |     it's always available)
         | 
         |   - xsnprintf() if you're sure that what you're copying
         |     should fit
         | 
         |   - strbuf or xstrfmt() if you need to handle
         |     arbitrary-length heap-allocated strings
        
         | lights0123 wrote:
         | https://github.com/git/git/commit/e488b7aba743d23b830d239dcc...
         | Yes:
         | 
         | > we provide a compat version, so it's always available
        
           | attractivechaos wrote:
           | This gets me interested. Link [1] below shows their
           | implementation of strlcpy(). This is a questionable
           | implementation. With strncpy, the source string "src" may not
           | be NUL-terminated IIRC. The git implementation requires
           | "src" to be NUL-terminated. If not, an invalid read. EDIT:
           | according to the strlcpy manpage [2], "src" is required to be
           | NUL-terminated, so strlcpy imposes more restrictions and is
           | not a proper replacement of strncpy.
           | 
           | Furthermore, imagine "src" has 1 MB of characters but we only
           | want to copy the first 3 chars. The git implementation would
           | traverse the entire 1 MB to find the length first, but a
           | proper implementation only needs to look at the first 3
           | chars. So, they banned strncpy and provided a worse solution
           | to that.
           | 
           | [1]: https://github.com/git/git/blob/master/compat/strlcpy.c
           | 
           | [2]: https://linux.die.net/man/3/strlcpy
        
             | alcover wrote:
             | Agreed. It's O(n) inefficient. I guess looping through chars
             | up to `size` would perform better on average.
             | 
             | I see this `strlcpy` recommended everywhere.
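
A sketch of the "loop up to `size`" idea. Note that strlcpy() scans the whole source because its return value is the full length of `src`; the hypothetical `copy_bounded` below gives that up and only reports whether truncation happened, which is an assumption about what the caller needs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most size - 1 characters of `src` into `dst` and terminate it.
 * Reads at most `size` bytes of `src`. Returns 0 if everything fit,
 * -1 if the copy was truncated.
 */
int copy_bounded(char *dst, size_t size, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && size > 0);
    while (i + 1 < size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    /* If we stopped on something other than the NUL, src was too long. */
    return src[i] == '\0' ? 0 : -1;
}
```

For the 1 MB source and a 4-byte destination from the parent comment, this touches four bytes of `src` instead of walking all of it.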
        
             | kzrdude wrote:
             | You have found the answer - strlcpy is not a replacement
             | for strncpy at all (it's arguably a safer version of
             | strcpy), and git people didn't invent this, it's the
             | existing BSD strlcpy interface.
        
               | attractivechaos wrote:
               | Thanks for the confirmation. But my concern remains: they
               | banned strncpy without a proper replacement. In addition,
               | I didn't know the extra restriction of strlcpy until
               | today (I have never used it before because it is not
               | conforming to C99/POSIX). I might have fallen into this
               | trap.
        
               | notaplumber wrote:
               | The problem is actually often the opposite: in the
               | real world many treat strncpy as if it behaves like
               | strlcpy. Note that strlcpy is equivalent to:
               | 
               |     snprintf(buf, sizeof(buf), "%s", string);
               | 
               | strlcpy is on track for future standardization in POSIX,
               | for Issue 8, but even as a de facto standard, it exists
               | in libc on *BSD, macOS, Android, Solaris, QNX, and even
               | Linux using musl.
               | 
               | https://www.austingroupbugs.net/view.php?id=986#c5050
               | 
               | But you're correct in that it is not a replacement for
               | strncpy because no code should be using strncpy.
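
A minimal sketch of the snprintf idiom quoted above, for readers who want to see the truncation check spelled out. The helper name `copy_truncating` is made up for this example and is not part of any library. It only relies on snprintf returning the length the full output would have had.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity dstsize) using the
 * snprintf idiom. dst is always NUL-terminated afterwards.
 * Returns 1 if the whole string fit, 0 if it was truncated or on error. */
static int copy_truncating(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* snprintf returns the length the untruncated output would have. */
    int needed = snprintf(dst, dstsize, "%s", src);

    return needed >= 0 && (size_t)needed < dstsize;
}
```

Whether silent truncation is acceptable still depends on the caller, so the return value should be checked rather than ignored.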
        
             | tedunangst wrote:
             | Take a step back and consider strlcpy isn't supposed to be
             | a drop in replacement for strncpy (a function which already
             | exists).
        
         | [deleted]
        
         | jabl wrote:
         | memccpy? Most platforms have it, and it's being added to C2X.
         | 
         | See https://developers.redhat.com/blog/2019/08/12/efficient-
         | stri...
        
       | paultopia wrote:
       | It's really wild, as a person coming from other languages who has
       | written maybe ten lines of C in his life that the functions that
       | seem to be massive footguns in C are, like, "format a string" or
       | "get time in GMT." That's... really scary.
        
         | Communitivity wrote:
         | I remember an entire lecture about the use and abuse of sprintf
         | and related functions as a means of exploit. Yeah, when you
         | delve into the internals of C you find things that are
         | terrifying if you are concerned about reliability, security, or
         | performance. The same is true though for many languages. The
         | problem is, as is often the case, the Iron Triangle: good,
         | fast, cheap - pick two. Different sections of the language are
         | written by developers under different constraints and
         | pressures, which leads to different choices. In my experience
         | every language implementation has at least one area that was
         | done quickly for expediency or done poorly because no one else
         | was able to (or wanted to) work on it.
        
         | throwaway09223 wrote:
         | Many of C's problems relate to string handling. These are all
         | legacy functions which have been replaced with safe
         | alternatives many decades ago.
         | 
         | strcpy() was replaced with a safer strncpy() and in turn has
         | been replaced with strlcpy().
         | 
         | The list is a ban of the less safe versions, where more modern
         | alternatives exist.
        
           | Kaze404 wrote:
           | Why are these functions deprecated in favor of others but not
           | removed? I know in Javascript this can happen so as to not
           | break older websites, but in a compiled language this
           | shouldn't be a problem right?
        
             | syncsynchalt wrote:
             | There are actually very few _dangerous_ functions in C
             | (gets is the only one that comes to mind). Others have
             | massive caveats (strncpy) but still have their place.
             | Others are just known to have certain gotchas (strcpy,
             | strcat, sprintf).
             | 
             | The reality of C is that if we deprecated every
             | objectionable function in the stdlib we wouldn't have
             | anything left.
        
             | maxlybbert wrote:
             | The C Standard Committee doesn't actually ship a compiler
             | the way the people behind Java, Python, Lua, C#, Go, Rust,
             | etc. do. The best they can do is deprecate particular
             | functions and hope compiler writers and standard library
             | writers follow along. But the compiler writers have vocal
             | customers who insist the deprecations are overly cautious.
        
             | sudomakeup wrote:
             | Why wouldn't it be an issue with a compiled language?
             | 
             | It's nearly the exact same reasoning as "we're not going to
             | break older websites"
        
             | lalaithion wrote:
             | The expectation of a C89 programmer is that a valid C89
             | program can be compiled for any machine that has a C89
             | compiler, and likewise for C95, C99, C11, and C17.
             | Furthermore, it's expected that any C89 program can be
             | compiled unchanged on any future version of C, and the
             | standard library is part of the definition of the language,
             | and therefore functions cannot be removed.
        
               | DaiPlusPlus wrote:
               | At a certain point we have to say that _it's wrong_ for
               | someone to expect C89 should still be the LCD.
               | 
               | And yes: it should all still compile, but none of that
               | prohibits the compiler from issuing flashing red/yellow
               | warning messages to your terminal for using footgun
               | functions, preferably with uncomfortable audible
               | notifications too.
               | 
               | All of this is silly though, because even in a strict C89
               | environment you can still have your own safe wrappers
               | over the unsafe functions. I find that very little of
               | modern programming has a hard dependency on ultramodern
               | compiler features (e.g. you can theoretically build
               | React/Redux using only ES3 (1998ish) if you like.
               | Generics using type-erasure can be implemented with
               | macros. Etc.).
               | 
               | Also, C89 conformance doesn't mean much: you can have a
               | conforming C89 system that doesn't even have a heap - nor
               | a stack for autos! (IBM Z/series uses a linked-list for
               | call-frames, crazy stuff!)
        
             | pjc50 wrote:
             | In a compiled language, when you remove a function it fails
             | to compile. So removing them from the standard library
             | _forces_ code changes - they're not usually drop-in
             | replacements because the semantics were wrong in the first
             | place.
             | 
             | Removing strcpy would make the Python transition look easy.
        
             | badsectoracula wrote:
             | Removing anything breaks existing source code that has been
             | tested to work. After all just because something _may_ lead
             | to issues it doesn't mean it will _always_ lead to issues.
             | 
             | Also in many systems the C library is linked dynamically
             | and shared among all programs so even though a program is
             | compiled it still relies on the underlying system to
             | provide the function.
             | 
             | Finally i'm certain that if a C standard removes something,
             | it'll be treated as the equivalent to that standard not
             | existing. C programmers are already a conservative bunch
             | without such changes.
        
             | gvx wrote:
             | It's not great if you're working on a new release and you
             | realize you also need to change something unrelated because
             | the language changed under you, especially if it's just a
             | bugfix but a high-priority one, or consider the head-aches
             | caused by source-only distributions suddenly breaking for
             | all your new users (or existing users switching to a new
             | computer or spinning up a fresh VM).
        
           | ChrisLomont wrote:
           | These still lead to lots of bugs via off by one errors on
           | lengths or other buffer misuse.
        
           | cestith wrote:
           | Still, unless you're writing something that has to be very
           | low-level all the way through, it's better to use a string-
           | handling library than the stdlib tools for strings.
        
             | stefan_ wrote:
             | The first thing you do is _not use any strings_. You'll be
             | amazed how much you can get done in languages that aren't
             | so obsessively centered around stringified programming.
        
               | cestith wrote:
               | Most of the code I write has a spec of input and output
               | being some form of text. Still, I tend to write that in
               | languages that have safe string handling and drop into C
               | only when the profiler indicates that's useful.
               | 
               | When handling strings in C, it's useful to use the string
               | functions from glib or pull in one of the specifically
               | safe string handling libraries and not use any C stdlib
               | functions for strings at all.
               | 
               | There are a number of C strings libraries safer to use
               | than the standard library, and many of them are simpler,
               | more feature-rich, or both.
               | 
               | * https://github.com/intel/safestringlib (MIT licensed)
               | * https://github.com/rurban/safeclib (MITish)
               | * https://github.com/mpedrero/safeString (MIT licensed)
               | * https://github.com/antirez/sds (BSD 2-clause, and gives
               |   you dynamic strings)
               | * https://github.com/maxim2266/str (BSD 3-clause)
               | * https://github.com/xyproto/egcc (GPL 2.0, includes GC on
               |   strings)
               | * https://github.com/composer927/stringstruct (GPL 3.0)
               | * https://github.com/c-factory/strings (MIT licensed)
               | * https://github.com/cavaliercoder/c-stringbuilder (MIT
               |   licensed, does dynamic)
               | 
               | If one does use the C standard library directly for
               | handling strings, the advisories from CERT, NASA, Github,
               | and others should be welcome advice (CERT's advice, BTW,
               | includes recommending a safer strings library right off).
        
               | derefr wrote:
               | Yes, sure, write Unix CLI plumbing tools without strings.
        
               | pjc50 wrote:
               | Until you want to communicate with the user, filesystem,
               | or web.
        
               | Animats wrote:
               | It was a design decision of QNX that the kernel never
               | uses strings. Everything the kernel handles is fixed
               | length, except messages, and messages go from one user
               | process to another. The kernel does not allocate space
               | for them. I think they got that right.
               | 
               | There's a QNX user process that's always present, called
               | "proc", which handles pathnames and the "resource
               | managers", programs which respond to path names. But
               | that's in user space, and has all the tools of a user-
               | space program.
        
               | cestith wrote:
               | There are absolutely things that can be written without
               | string handling. Then again, there are things that can't.
               | Not handling strings in the kernel probably was a good
               | decision. That userland I'll bet has string handling
               | though, to be useful to users.
        
           | _kst_ wrote:
           | strncpy() is not a "safer" strcpy(). It can avoid some errors
           | involving writing past the end of the target array ( _if_ you
           | tell it the correct length for that array), but it's not a
           | true string function, and it can leave the target
           | unterminated and therefore not a valid string.
           | 
           | http://the-flat-trantor-society.blogspot.com/2012/03/no-
           | strn...
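
A small sketch of the failure mode described above, assuming nothing beyond the standard library. The function name `strncpy_makes_string` is invented for this example. It copies with strncpy and then reports whether the destination actually holds a terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dstsize) with strncpy
 * and report whether dst ends up containing a NUL terminator.
 * Returns 1 if dst is a valid string afterwards, 0 if it is not. */
static int strncpy_makes_string(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);

    /* No terminator is written when strlen(src) >= dstsize. */
    strncpy(dst, src, dstsize);

    /* Look for a terminator without reading past the array. */
    return memchr(dst, '\0', dstsize) != NULL;
}
```

With a 4-byte destination, "abc" fits with its terminator, while "abcd" fills all four bytes and leaves the array unterminated.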
        
             | rrauenza wrote:
             | I never could really understand the point of strncpy()...
             | we always end up wrapping to deal with writing an
             | unterminated string.
             | 
             | Was it intended for fixed length records?
        
               | [deleted]
        
               | tedunangst wrote:
               | It is for fixed length records, which is why it also
               | zeroes the remaining space.
        
               | ironmagma wrote:
               | Arguably naming it with "str" is itself a security
               | vulnerability.
        
               | tedunangst wrote:
               | No argument. At best it is a "string to fixed record"
               | function, hence the name, but it is not a string
               | function.
        
               | Someone wrote:
               | Yes. _strncpy_ was intended for copying file names into a
               | buffer that was only zero terminated when the name was
               | shorter than the maximum length of a file name in Unix
               | (14 bytes. See https://stackoverflow.com/a/1454071,
               | https://devblogs.microsoft.com/oldnewthing/20050107-00/?p=36...)
               | 
               | You can also use it to overwrite part of an existing
               | string, but I think that's a side effect of the above.
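
To make the fixed-length-record reading concrete, here is a hedged sketch. The `struct record` layout and both helper names are made up for illustration; only the 14-byte field width comes from the comment above. The field is zero-padded when the name is short and carries no terminator when the name fills it exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 14   /* field width taken from the comment above */

/* Hypothetical fixed-width record: name is NOT guaranteed to be terminated. */
struct record {
    char name[NAME_LEN];
};

/* Fill the field. strncpy pads short names with zero bytes and writes
 * no terminator when the name is NAME_LEN bytes or longer. */
static void record_set_name(struct record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    strncpy(r->name, name, NAME_LEN);
}

/* Length of the stored name, never reading past the field. */
static size_t record_name_len(const struct record *r)
{
    const char *end = memchr(r->name, '\0', NAME_LEN);
    return end ? (size_t)(end - r->name) : NAME_LEN;
}
```

Readers of such a record have to treat the field as bytes with a maximum width, which is why the length helper uses memchr rather than strlen.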
        
             | throwaway09223 wrote:
             | In the interest of satisfying pedantry I think we can agree
             | that strncpy() is _intended_ to be a safer strcpy().
             | 
             | As you say, it does in fact obviate some errors. A value
             | judgement as to which errors are more or less safe may be
             | subjective, but the intent is not.
        
             | icedchai wrote:
             | This is true, and many people don't realize it. I used to
             | call a wrapper function that would always set the last byte
             | to 0.
        
         | draw_down wrote:
         | Now ponder how many people find that state of affairs
         | acceptable but also think JS is a terrible garbage language
         | that idiots like.
        
         | kazinator wrote:
         | gmtime is just not thread-safe, that's all, since it returns a
         | pointer to a static structure; gmtime_r is not banned.
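
A minimal sketch of the reentrant variant, assuming a POSIX system where gmtime_r is available. The wrapper name `to_utc` is invented for this example. The caller supplies the storage, so two threads never share one result buffer.

```c
#define _POSIX_C_SOURCE 200809L   /* ask <time.h> to declare gmtime_r */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: break t down into UTC calendar fields.
 * The result goes into caller-owned storage instead of the static
 * structure that plain gmtime hands back.
 * Returns 1 on success, 0 on failure. */
static int to_utc(time_t t, struct tm *out)
{
    assert(out != NULL);
    return gmtime_r(&t, out) != NULL;
}
```

Each caller keeps its own `struct tm`, for example on its stack, and passes its address in.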
        
           | syncsynchalt wrote:
           | Thanks, I am now a decade out of the C game and I was
           | wracking my brain on what the problem with gmtime would be.
           | My best guess was dodgy is_dst portability /shrug
        
         | cperciva wrote:
         | A better way of looking at it is that functions which expose
         | very simple operations were among the first ones to be placed
         | into the standard library -- and consequentially are the least
         | well thought out.
        
         | jchw wrote:
         | Unfortunately, much of the pain with C surrounds dealing with
         | strings. It's been a bit of a theme on Hacker News for the past
         | few days, but it's actually a pretty good spotlight on
         | something I feel is not always appreciated - strings in C are
         | actually hard, and even the most safe standard functions like
         | strlcpy and strlcat are still only good if truncation is a safe
         | option in a given circumstance (it isn't always.)
         | 
         | (~~Technically~~ Optionally, C11 has strcpy_s and strcat_s
         | which fail explicitly on truncation. So if C11 is acceptable
         | for you, that might be a reasonable option, provided you
         | always handle the failure case. Apparently, though, it is not
         | usually implemented outside of Microsoft CRT.)
         | 
         | edit: Updated notes regarding C11.
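
For cases where truncation is not a safe option, one possible shape is a copy that refuses instead of truncating. This is only a sketch: `copy_or_fail` is a made-up name, it is not strcpy_s, and it does not try to reproduce the Annex K semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. Returns 0 on success. On failure it returns -1 and leaves
 * dst as an empty string rather than a silently shortened one. */
static int copy_or_fail(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    size_t len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* +1 copies the terminator too */
    return 0;
}
```

As with the functions discussed above, this only helps if the caller actually handles the failure case.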
        
           | masklinn wrote:
           | > Technically C11 has strcpy_s and strcat_s
           | 
           | "Theoretically" is the word you're looking for: they're part
           | of the _optional_ Annex K so technically you can't rely on
           | them being available in a portable program.
           | 
           | And they're basically not implemented by anyone but microsoft
           | (which created them and lobbied for their inclusion).
        
             | jchw wrote:
             | I didn't know that it was Microsoft that lobbied for them;
             | that perplexes me since I thought Microsoft's version of
             | them were a bit different (for example, I think C11's
             | explicitly fail on overlapping inputs where Microsoft
             | specifies undefined behavior) and because Microsoft didn't
             | bother supporting C99 for the longest time. (Probably still
             | don't, since VLA was not optional in C99, IIRC. I think
             | Microsoft was right to avoid VLA, though.)
        
           | InvOfSmallC wrote:
           | I teach at university as external lecturer. Teaching strings
           | in C is the hardest thing I have to do every time. The
           | university decided to explain C to first-year students without
           | previous experience. My feedback was to do a precourse in
           | Python to let them relax a bit with programming as a concept
           | and then teach C in a second course.
        
             | kazinator wrote:
             | > _I teach at university as external lecturer. Teaching
             | strings in C is the hardest thing I have to do every time._
             | 
             | But if you keep up the good work you will one day go from
             | 
             |     extern void *lecturer;
             | 
             | to
             | 
             |     static const lecturer;
        
             | ritmatter wrote:
             | +1, my university's program seemed to work well with
             | "program anything" (Python), "program with objects" (Java),
             | "program some cool lower-level stuff" (C)
        
             | gravypod wrote:
             | Sorry to bug you since this is unrelated. I'm a huge fan of
             | teaching others and I was wondering how you got to be an
             | external lecturer at a college? I'd love to teach classes
             | related to software engineering and data structures. Would
             | you mind emailing me (in my profile) about this?
        
             | _the_inflator wrote:
             | Yep, agree. I used a lot of assembler on C64 and Amiga
             | until I touched so called high level programming languages
             | for the first time. For me thinking in strings was really a
             | weird concept.
             | 
             | Nowadays I find it extremely strange to think of bits and
             | bytes when being confronted with strings.
        
             | austinl wrote:
             | Most of the C I wrote was while in college. I think
             | understanding the question, "why are strings in C hard?" is
             | a good gateway to understanding how programming languages
             | and memory work generally. I agree with you though that
             | teaching C as introductory is probably not the best -- our
             | "Programming in C" course was taken in sophomore year.
             | 
             | I wouldn't want to use it in my day job, but I'm glad that it
             | was taught in university just to give the impression that
             | string manipulation is not quite as straightforward as it's
             | made to appear in other languages.
             | 
             | The early days of Swift also reminded me of this problem -
             | strings get even more challenging when you begin to deal
             | with unicode characters, etc.
        
             | orwin wrote:
             | In my school, we had two days to understand the basics of
             | text editors, git (add, commit, rebase, reset, push) and
             | basic bash functions (ls, cd, cp, mv, diff and patch, find,
             | grep...) + pipes, then a day to understand how while,
             | if/else and function calls work, then a day to understand
             | how pointers work, then a day to understand how malloc(),
             | free() and strings work (we had to remake strlen, strcpy,
             | and protect them). Two days, over the weekend, to do a
             | small project to validate this.
             | 
             | Then on the monday, it was makefiles if i remember
             | correctly, then open(), read(), close() and write(). Then
             | linking (and new libc functions, like strcat) . A day to
             | consolidate everything, including bash and git (a new small
             | project every hour for 24 hours, you could of course wait
             | until the end of the day to execute each of them). And then
             | some recursivity and the 8 queen problem. Then a small
             | weekend project, a sudoku solver (the hard part was to work
             | with people you never met before tbh).
             | 
             | The 3rd week was more of the same: basic struct/enums
             | exercises, then linked lists the next day, maybe static and
             | other keywords in between. I used the Btree day to
             | understand how linked lists worked (and how pointer
             | incrementation and casting really work), and i
             | don't remember the last day (i was probably still on linked
             | lists). Then a big, 5-day project, and either you're in, or
             | you're out.
             | 
             | I assure you, strings were not the hardest part. Not having
             | any leaks was.
        
               | PoignardAzur wrote:
               | Ooh, the Epitech cursus. Nice.
               | 
               | Also, I'd say "not having segfaults" is the hardest thing
               | to get right when you're going through that.
        
           | liuliu wrote:
           | Yeah. I just avoid str manipulations in general in C and when
           | I have to, fuzz it ... (but still, the perf cliff is
           | definitely new to learn in the past few days).
        
           | swlkr wrote:
           | I'm partial to https://github.com/antirez/sds these days
        
           | macjohnmcc wrote:
           | strcpy is a coding challenge where I work for interviews. I
           | typically ask them to write it as the standard version and
           | ask them why they might not want to use it to see if they are
           | aware of the risks. After that I ask them to modify the code
           | to be buffer safe. And for those claiming C++ knowledge ask
           | them to make it work for wchar_t as well to see if they can
           | write a template. Some people really struggle with this.
        
         | IgorPartola wrote:
         | This is a lot like how in JavaScript you have footguns like the
         | with statement or in Python 2 where you have Unicode issues,
         | etc. I am sure we could define a new C standard that
         | excludes these functions as obsolete, but the linked header
         | file is a pretty sensible interim solution. C is an old
         | language and it's kind of amazing that code written 30 years
         | ago can still by and large be compiled by a modern compiler.
         | Ever try to run 3 year old React projects using today's React?
         | :)
        
           | detaro wrote:
           | Because individual libraries choosing to change quickly is
           | comparable to language stability how? The relevant comparison
           | would be "run a 3y old react app (or a 20 year old website
           | using JS) in a modern browser or interpreter"
        
             | _the_inflator wrote:
             | Yes, and it would still run fine I guess. I think only
             | eval() changed over time. APIs and so on are still the same
             | except for some Netscape stuff.
        
           | ggregoire wrote:
           | > in JavaScript you have footguns like the with statement
           | 
           | I've been coding in JS on a daily basis for more than 10
           | years and today I learned there is a `with` statement in JS.
           | 
           | https://developer.mozilla.org/en-
           | US/docs/Web/JavaScript/Refe...
           | 
           | Edit: well, seems like it's been deprecated/forbidden since
           | ES5 (2009), so it makes sense I've never seen it.
        
             | GordonS wrote:
             | And me around 20 years - also never even heard of the
             | `with` statement! I think to qualify as a footgun, people
             | actually need to be using it in the real world.
        
           | viklove wrote:
           | It amuses me that HN hates JS so much, that even a topic
           | about problems with C turns into a JS-bashing thread.
           | 
           | Also, I just want to remind you that JS isn't just React.
           | There are plenty of libraries written in C that introduce
           | breaking changes over the course of 3 years. Nothing will
           | stop people from finding ways to complain about JS though, I
           | know. The hate-boner is very real.
        
             | sadgrip wrote:
             | I think in most cases it's probably not hate but a deep,
             | deep love.
        
               | jrimbault wrote:
               | JavaScript, LISP under C disguise. No wonder it's
               | "popular" on HN.
               | 
               | Assorted musing : Rust, OCaml under C disguise.
        
             | orwin wrote:
             | I think most people on HN like Javascript, or at least its
             | idea? I mean, it's a very C-like functional language,
             | especially since ES6 put Js on the right road (for me at
             | least)?
        
             | lliamander wrote:
             | I appreciate Javascript's LISPy qualities, but it has an
             | inordinate number of footguns and a relative lack of
             | standard, stable libraries. Coming from languages like Java
             | and Erlang that are relatively scrupulous about such things
             | is a bit jarring.
             | 
             | I do like Typescript though, as it adds some really nice
             | ergonomics.
        
         | matheusmoreira wrote:
         | Yeah, because of NUL-terminated strings. They cause so many
         | problems it's not even funny. Even something simple like
         | computing the length of the string is a linear time operation
         | that risks overflowing the buffer. People attempted to fix
         | these problems by creating variations of those functions with
         | added length parameters, thereby negating nearly all benefits
         | of NUL-terminated strings.
         | 
         | Why can't we just have some nice structures instead?
         | 
         |     struct memory {
         |         size_t size;
         |         unsigned char *address;
         |     };
         | 
         |     enum text_encoding { TEXT_ENCODING_UTF8, /* ... */ };
         | 
         |     struct text {
         |         enum text_encoding encoding;
         |         struct memory bytes;
         |     };
         | 
         | All I/O functions should use structures like these. This alone
         | would probably prevent an incredible amount of problems. Every
         | high-level language implements strings like this under the
         | hood. Only reason C can't do it is the enormous amount of
         | legacy code already in existence...
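
A rough sketch of how code built on the structures above might look. The struct definitions are copied from the comment. The three helpers (`text_wrap_utf8`, `text_size`, `text_equal`) are hypothetical names added for illustration and are not part of any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct memory {
    size_t size;
    unsigned char *address;
};

enum text_encoding { TEXT_ENCODING_UTF8 /* ... */ };

struct text {
    enum text_encoding encoding;
    struct memory bytes;
};

/* Hypothetical helper: wrap an existing buffer without copying it. */
static struct text text_wrap_utf8(unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    struct text t = { TEXT_ENCODING_UTF8, { size, buf } };
    return t;
}

/* Hypothetical helper: the length is a field read, not a scan for NUL. */
static size_t text_size(const struct text *t)
{
    return t->bytes.size;
}

/* Hypothetical helper: compare two texts byte for byte. The sizes bound
 * the comparison, so embedded zero bytes are handled like any other. */
static int text_equal(const struct text *a, const struct text *b)
{
    if (a->encoding != b->encoding || a->bytes.size != b->bytes.size)
        return 0;
    if (a->bytes.size == 0)
        return 1;   /* avoid calling memcmp on possibly null addresses */
    return memcmp(a->bytes.address, b->bytes.address, a->bytes.size) == 0;
}
```

Getting the length is constant time here, and no function needs to trust that a terminator exists somewhere in the buffer.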
        
           | guerrilla wrote:
           | That would be nice. You hit on the other hell with C strings:
           | modern encodings where wchar_t and mb* are useless and
           | replacements essentially don't exist yet with char8_t,
           | char32_t etc. Then there's the locale chaotic nonsense [1]. A
           | new libc starting fresh would be nice.
           | 
           | 1. https://github.com/mpv-
           | player/mpv/commit/1e70e82baa9193f6f02...
        
         | Camillo wrote:
         | Many of the problems with C descend from a common root, the
         | decision to use bare pointers (memory addresses) as the basic
         | way to refer to strings, arrays etc.
         | 
         | If they had used a {pointer, size} pair instead, it would have
         | avoided all of these string problems, most buffer overflows,
         | even the GTA Online loading problem that was on HN recently.
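
As an illustration of the {pointer, size} idea, here is a small sketch of a bounds-checked read. The type name `slice` and the function `slice_get` are invented for this example. The point is only that the size travels with the pointer, so the callee can refuse an out-of-range index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (pointer, size) pair; the name "slice" is made up here. */
struct slice {
    const char *ptr;
    size_t size;
};

/* Read element i if it exists. Returns 1 and stores the byte in *out,
 * or returns 0 without reading memory when i is out of range. */
static int slice_get(struct slice s, size_t i, char *out)
{
    assert(out != NULL);

    if (i >= s.size)
        return 0;
    *out = s.ptr[i];
    return 1;
}
```

The cost is the extra size field carried next to every pointer, which is the trade-off discussed in the replies below.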
        
           | cb321 wrote:
           | For what it's worth, while what @Camillo says is both true
           | and important, people usually do not mention the trade offs
           | involved or why that decision was attractive at the time.
           | 
           | These days (ptr,size) is probably 16 bytes -- longer than
           | almost all words in the English language (the scrabble
           | SOWPODS maxes out at 15). A pointer alone is 8B. Back at the
           | dawn of C in 1970, memory was 6..7 orders of magnitude more
           | expensive than today..maybe more inflation adjusted. (Today,
           | cache memory can be almost as precious, but I agree that the
           | benefits of bounded buffers probably outweigh their costs.)
           | 
           | 8B pointers today are considered memory-costly enough "in the
           | large" that even with dozens of GiB machines common, Intel
           | introduced an x32 mode to go back to 32-bit addressing aka 4B
           | pointers. [1] There are obviously more pointers than just
           | char* in most programs, but even so.
           | 
           | Anyway, trade offs are just something people should bear in
           | mind when opining on the "how it should be"s and "What kind
           | of wacky drugs were the designers of language XYZ on?!!?".
           | 
           | [1] https://stackoverflow.com/questions/9233306/32-bit-
           | pointers-...
        
           | Animats wrote:
           | Pascal, which had sized strings, was in wide use before C.
           | Many people, including Bill Atkinson, who wrote many of the
           | original Macintosh applications, thought C was a step
           | backwards.
           | 
           | Pascal, to save one byte, limited strings to length 255. Bad
           | decision.
        
         | [deleted]
        
         | SavantIdiot wrote:
         | If you list the languages you use, I'd be happy to point out
         | the "footguns" in each of them. For all the warts on C, there
         | really is no language that can compete for what it has
         | accomplished over ~50 years.
         | 
         | Recall that during the rise of C, people were writing machine
         | code on punch cards. Assembly -> Machine code has far more
         | footbullets than C, it is a tradeoff between hand holding and
         | tiny fast code.
         | 
         | Wow, this blew up.
         | 
         | To all the people popping off about how great other languages
         | are, tell me: when will we see the Unreal Engine written in
         | Python, or Pascal, or Algol, or Rust, or Go... the next big
         | step is WebASM (or .cu), and that's way more footbullet-y than
         | C. And what is the native language all of your sub-30 year old
         | interpreted languages were written in? Thank you!
        
           | eschaton wrote:
           | This is a grossly inaccurate description of computing at the
           | time of the rise of C. C was competing with Pascal/Modula,
           | BLISS, PL/I, BCPL, and so on, not assembly on punched cards.
           | 
           | The "C competing with assembly" meme was very specific to
           | _microcomputer_ game and operating system development, not
           | more general microcomputer application development, and not
           | to minicomputer or mainframe development.
        
             | JoeAltmaier wrote:
             | Mainframes very quickly were outclassed by minicomputers.
             | They could not respond to technology changes as
             | fast. C was indeed king for decades.
        
           | rodgerd wrote:
           | There's far more critical code in the world running on COBOL
           | and s3[79]0 assembler. COBOL is vastly more important than C.
        
             | varjag wrote:
             | Nearly everything around you runs code that was written in
             | C, and absolutely nothing you can actually see runs COBOL
             | code.
        
             | mkipper wrote:
             | _citation needed_
             | 
             | I'm sure there's a lot of important things that rely on
             | COBOL, but by most definitions of "critical", I think this
             | is way off the mark.
        
               | burnished wrote:
               | COBOL is still used in many banking systems such as ATMs.
               | These are 'critical' systems by most any definition of
               | the word 'critical'.
        
               | varjag wrote:
               | That's a hugely broad definition of critical, enough to
               | encompass most of business and finance software.
        
             | slt2021 wrote:
             | which language z/OS is written in?
        
           | cygx wrote:
           | _Recall that during the rise of C, people were writing
           | machine code on punch cards._
           | 
           | Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ...
        
           | samatman wrote:
           | Your edit really isn't helping your case.
           | 
           | Those of us who have always known about less dangerous
           | 'system' languages (Pascal probably being the most popular)
           | lament the fact that so much code got written in C instead.
           | 
           | It wasn't inevitable. It was preventable! It just didn't
           | happen that way for reasons which are largely historical.
           | 
           | I don't work for the Rust Evangelism Strike Force, my main
           | project is written in (as little) C (as possible), but I beg
           | anyone who has a choice: use something else! Rust is... fine,
           | Zig is promising. Ada still works!
           | 
           | Writing out the set {Python, Pascal, Algol, Rust, Go} tempts
           | me to say uncharitable things about your understanding of the
           | profession, but I accept you were just being snarky so I'll
           | just gesture in the direction of how $redacted that is.
        
           | Gibbon1 wrote:
           | My favorite assembly foot gun was a guy I worked with had a
           | cute routine. You had a call to the routine, followed by a
           | null terminated string after that. The routine would spit the
           | string to the terminal. And then return to the location after
           | the string.
           | 
           | He had some bug where in one place it returned to the start
           | of the string, executed it, and kept going. The end result
           | just happened to be a nop. Had been like that in production
           | for a couple of years.
        
           | atoav wrote:
           | Yeah there are footguns in every language. But this is not a
           | boolean question about the presence of footguns, this is
           | about how much one has to know to be able to handle a
           | language safely.
           | 
           | I know C/C#/Python/Rust/Javascript.
           | 
           | After a decade of using C I am still not totally sure that I
           | didn't dangle a pointer somewhere in precisely the wrong way
           | to create havoc. And yeah, that means I have to get better,
           | etc. But that is not the point. The point is that even with
           | a lot of experience in the language you can still easily
           | shoot yourself in the foot and not even notice it.
           | 
           | Meanwhile, after a month of using Rust I felt confident that
           | I didn't shoot myself in the foot, because I know what the
           | compiler guarantees (e.g. ownership). While in C shooting
           | myself in the foot happens quite often, in Rust I would have
           | to specifically find a way to shoot myself in the foot
           | without the compiler yelling at me, and quite frankly I
           | haven't found such a way yet.
           | 
           | Javascript is odd, because the typesystem has quite a few
           | footguns in it. This is why such things like Elm or
           | Typescript exist: to avoid these footguns.
           | 
           | I don't want to take away from the accomplishments of C, and
           | I still like the language, but to claim it is equally likely
           | in all languages to shoot yourself into the foot is not true.
        
           | maerF0x0 wrote:
           | Not that I don't believe there are any, but I'd love to hear
           | your perspective...
           | 
           | Go (golang)
        
             | SavantIdiot wrote:
             | Well, shit. Got me there.
        
             | boolemancer wrote:
             | defer having function scope instead of, well, scope scope.
             | 
             | Using defer to unlock locks can lead to some fun deadlocks
             | if you don't realize the issue with the scope, and it's
             | completely unintuitive to someone with experience with
             | other implementations of similar concepts.
        
             | crimper wrote:
             | channel programming and the races caused by closing
             | channels. channels seem nice and easy until they don't.
             | 
             | the whole var/:=/= assignment combined with the error
             | handling style and the shorthand is another one
        
               | maerF0x0 wrote:
               | yeah the lack of determinism in selecting a channel can
               | be tricky for causing bugs where order matters. Luckily
               | in smaller cases you're likely to encounter them as
               | flakey tests (eg 1/2 the time)
               | 
               |     select {
               |     case <-ch1:
               |     case <-ch2:
               |     }
        
               | dpatterbee wrote:
               | Only close channels when trying to tell the receiver that
               | you're not sending more data. Otherwise let the garbage
               | collector deal with it. Channels seem easy until they
               | don't until they do again in my experience.
               | 
               | Don't understand your second point.
        
           | badsectoracula wrote:
           | > when will we see the Unreal Engine written in...
           | 
           | Why would a huge C++ (not C, btw) codebase with roots going
           | back to the 90s be rewritten in any other language?
           | 
           | And in fact how is the language Unreal Engine written in
           | relevant to C having footguns?
        
         | ironmagma wrote:
         | Yeah, there is a culture of complacency in C probably owing to
         | the enormous historical baggage of legacy code that has to be
         | supported and the blurred line between stdlib and system call.
        
           | Spivak wrote:
           | I mean on Linux you're not encumbered by this because the
           | syscall api is stable but in practice most GNU/Linux distros
           | assume glibc. You can't correctly resolve a hostname on Linux
           | without farming out to glibc -- hell even the kernel punts to
           | userspace for dns names but you can technically ignore it if
           | you want.
           | 
           | On BSDs and macOS you're always SOL because the syscall api
           | isn't stable and only the C wrappers are.
        
           | dangerbird2 wrote:
           | c standard library doesn't really relate directly to system
           | calls (at least in modern os'es). In particular, the stdio.h
           | functions are buffered by default, while their system call
           | analogues are not. For unixes, system call wrappers are
           | typically found in <unistd.h>, not the "official" c standard
           | library
        
           | freedomben wrote:
           | I disagree completely. Devs who use C are the least
           | complacent about security in my experience. The problems are
           | from previous eras before they knew about many of these
           | things. A ton of people in modern languages couldn't name a
           | single dangerous function, though they do exist in every
           | language. You'd be amazed at how many race condition vulns
           | result from TOCTOU errors just in authentication, or checking
           | for the existence of a file before opening it, etc.
           | 
           | It's absolutely true that decades ago the C community was
           | complacent, but it's not true now. Source: I taught secure
           | coding in C/C++ in the 00s.
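           | 
           | (A minimal sketch of the check-then-open race mentioned
           | above, assuming a POSIX system. The helper name
           | create_exclusive is hypothetical. Instead of testing for
           | the file and then opening it, the check and the creation
           | are handed to the kernel as one step.)

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Racy pattern to avoid: access(path, F_OK) and then open(path, ...).
 * Another process can create or swap the file between the two calls.
 *
 * With O_CREAT | O_EXCL the kernel performs the existence check and the
 * creation as a single step. Returns a file descriptor, or -1 with errno
 * set (EEXIST when the path already exists).
 */
int create_exclusive(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

           | A caller that gets -1 with errno == EEXIST knows the file
           | was already there, with no window in between for someone
           | else to interfere.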
        
             | IgorPartola wrote:
             | What you said. Nobody is complacent. Anyone who thinks the
             | Linux or OpenBSD (etc.) kernel developers take the lazy way
             | out is talking about a thing they know little about. I do
             | think better languages than C exist and maybe could even be
             | used as a basis for new systems. But I have yet to see a
             | mature OS that's as secure and as performant as these.
             | Closest might be the chips I've seen that have an embedded
             | Java byte code interpreter.
        
               | ironmagma wrote:
               | I agree in principle but think these security-focused C
               | developers are focusing on the trees for the forest.
               | Every developer having the responsibility of cultivating
               | their own pet list of banned functions is, frankly, NOT
               | the way to achieve security. Those things need to be
               | enforced at the widest level possible (OS, or language)
               | to have the needed effect.
        
           | dangerbird2 wrote:
           | It's not really complacency: it's that the standard library
           | is intentionally minimalistic to maintain portability and
           | backwards compatibility. If you want sensible string
           | handling, it's usually best to use a high level utility
           | library like GLib(https://developer.gnome.org/glib/stable/)
           | or Apache Portable Runtime(http://apr.apache.org/), or roll
           | your own safe string type (preferably non-null terminating)
        
             | yxhuvud wrote:
             | No, if you want sensible string handling, the sane choice
             | is usually to choose to use a language that is not C. Not
             | always, but definitely usually.
        
               | IgorPartola wrote:
               | It's not hard to have strings like you do in other
               | languages in C. It is hard when you treat _char foo[]_ as
               | if it was a string object like you have in JavaScript or
               | Java or Python. C strings are just chunks of memory
               | terminated by \0. They can still be mildly useful that
               | way but if you actually want to do string operations you
               | need to use a library designed for the problem (variable
               | length, storing length with the object, Unicode support,
               | etc.). Problem is that most people don't start with such
               | a library so they end up doing the hard work themselves
               | in an ad hoc manner.
               | 
               | You can't fuck up _String("Hello ") + String("world")_
               | but you can definitely fuck up _strcat(buf, "Hello ");
               | strcat(buf, "world");_.
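               | 
               | (A minimal sketch of what "storing length with the
               | object" can look like in C. The struct str type and the
               | str_* functions are hypothetical names for illustration,
               | not any particular library's API.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: length and capacity travel with the data. */
struct str {
    char *data;   /* always NUL-terminated so it can be passed to C APIs */
    size_t len;   /* bytes in use, not counting the terminator */
    size_t cap;   /* bytes allocated, including room for the terminator */
};

/* Initialize to the empty string. Returns 0 on success, -1 on failure. */
int str_init(struct str *s)
{
    assert(s != NULL);
    s->data = malloc(1);
    if (!s->data)
        return -1;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 1;
    return 0;
}

/*
 * Append a C string, growing the buffer when needed.
 * tail must not point into s->data. Returns 0 on success, -1 on failure.
 */
int str_append(struct str *s, const char *tail)
{
    size_t n = strlen(tail);
    size_t need;

    if (n > SIZE_MAX - s->len - 1)
        return -1;                      /* length would overflow size_t */
    need = s->len + n + 1;
    if (need > s->cap) {
        size_t new_cap = need > s->cap * 2 ? need : s->cap * 2;
        char *p = realloc(s->data, new_cap);
        if (!p)
            return -1;                  /* old buffer is still valid */
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, tail, n + 1);  /* copies the terminator too */
    s->len += n;
    return 0;
}

/* Release the buffer and reset the object. */
void str_free(struct str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

               | With this, appending "Hello " and then "world" cannot
               | overrun the buffer, because the append routine is the
               | only code that knows or needs the capacity.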
        
             | Ar-Curunir wrote:
             | there's nothing inherently unportable about strings though.
        
             | ironmagma wrote:
             | Why do you need backward compatibility with a compiled
             | language? Other languages like Rust and JavaScript (even)
             | avoid that with a pragma tag on the source.
        
               | hctaw wrote:
               | Because not everything is recompiled from source. That's
               | why stable ABIs need to exist.
        
               | ironmagma wrote:
               | Good point, thanks. Could the headers contain the
               | pragmas?
        
               | hctaw wrote:
               | That assumes you have a header, which only exists at
               | compile time for the developer. The running program knows
               | nothing about it.
        
               | ironmagma wrote:
               | Why would a program need to know (e.g.) the details of
               | what system calls or stdlib functions that a procedure it
               | invokes uses? Aren't C functions pretty well separated
               | from each other except for the odd signal handler and
               | assuming a stable ABI? In my view most of the issues with
               | C are semantics within the function blocks.
        
               | rightbyte wrote:
               | The parameters and return values are not in the object
               | files.
        
         | oleganza wrote:
         | Notice that it's a giant PITA to work with any variable-length
         | data. Because language lacks adequate means to abstract away
         | safe fast memory access with generic types, RAII and borrow
         | checkers. Comparing to C, both C++ and Rust (very different
         | beasts) feel like pals of JavaScript: basic operations with
         | dynamic strings and arrays just work(tm).
        
         | frob wrote:
         | As someone who learned C as their first language, strings in
         | every single language after that have felt like cheating.
         | 
         | "What? You mean I can type an arbitrary string and it works? I
         | don't need to worry about terminators or the amount of memory
         | I've allocated? You can concatenate two strings with +?!? What
         | is this magic?"
        
           | macintux wrote:
           | Yeah, every time I decide to play with C for nostalgia's
           | sake, I immediately get hung up on just how painful
           | everything is, especially strings.
           | 
           | I still love C, but I'd do my best not to have to write
           | anything serious with it again.
        
         | munchbunny wrote:
         | The decision to make C strings null terminated with implied
         | length instead of length + blob continues to trip us up, 30+
         | years later. There's a good reason the "safe" versions of those
         | functions all take length parameters. But way back when this
         | approach was chosen, I don't think the state of the art could
         | fully predict this outcome.
         | 
         | But also, "strings" and "time" are actually very complex
         | concepts, and these functions operate on often outdated
         | assumptions about those underlying abstractions.
        
           | jrimbault wrote:
           | 30+ years -> 50+ years
           | 
           | Funny mind thing to forget to increment counters each year.
        
             | segf4ult wrote:
             | C89 was 32 years ago, so I think saying 30+ years is fair.
        
               | lamontcg wrote:
               | Some of us learned C off of the original K&R book.
        
           | coliveira wrote:
           | Null terminated strings are remnants of an era when computers
           | had little memory available. So, at the time it seemed smart
           | to discard the length field and use a single byte-sized
           | terminator (null). If you are writing an operating system for
           | a machine with little memory to spare, this seems like a good
           | decision. Of course things are very different now when memory
           | is not a problem and the goal is safety.
        
           | Blikkentrekker wrote:
           | > _But also, "strings" and "time" are actually very complex
           | concepts, and these functions operate on often outdated
           | assumptions about those underlying abstractions._
           | 
           | Even in safer languages such as _Rust_, there are often
           | questions as to why certain string operations are either
           | impossible, or need to be quite complicated for a rather
           | simple operation, and these are then met with responses
           | such as "Did you know that the length of a string can grow
           | from a capitalization operation depending on locale
           | settings of environment variables?"
           | 
           |  _P.s._ : In fact, I would argue that strings are not
           | necessarily all that complicated, but simply that many assume
           | that they are simpler than they are, and that code that
           | handles them is thus written on such assumptions that the
           | length of a string remain the same after capitalization, or
           | that the result not be under influence of environment
           | variables.
        
             | munchbunny wrote:
             | > locale settings of environment variables
             | 
             | Also known as "why does my code that parses floats fail in
             | Turkey?"
             | 
             | Also also known as the discrepancy between a string's
             | length-as-in-bytes, its length-as-in-code-points, and its
             | length-as-in-how-humans-count-glyphs.
             | 
             | Strings are hard.
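             | 
             | (A small sketch of the bytes versus code points gap,
             | assuming valid UTF-8 input. The function name
             | utf8_code_points is hypothetical.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count Unicode code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8.
 */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

             | For "e" followed by a combining acute accent (bytes
             | 0x65 0xCC 0x81), strlen reports 3, this function reports
             | 2, and a human sees one glyph. Counting glyphs needs
             | grapheme segmentation, which this sketch does not attempt.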
        
               | kazinator wrote:
               | > _Why does my code that parses floats fail in Turkey_
               | 
               | Because you, or someone, called
               | 
               |     fuck_my_program();
               | 
               | which is defined in "idiot.h" as
               | 
               |     #define fuck_my_program() setlocale(LC_ALL, "")
               | 
               | and the project is missing:
               | 
               |     #define setlocale(x, y) BANNED(setlocale)
               | 
               | Hope that helps!
        
           | retrac wrote:
           | For reasons that were never clearly articulated, the prefix
           | approach was considered odd, backwards, and to have numerous
           | downsides, at least where I learned C. In hindsight, I can
           | only cringe at that attitude. Strings as added in later
           | Pascal, about 40 years ago now, were memory safe in a way
           | that C strings still are not.
        
             | lordgroff wrote:
             | Oh Pascal, why couldn't we have had you instead.
        
             | kazinator wrote:
             | Pascal strings are not inherently memory safe:
             | 
             |     cat_pascal_strings(pascalstr *uninited_memory,
             |                        pascalstr *left,
             |                        pascalstr *right);
             | 
             | how big is uninited_memory? Can left and right fit into it?
             | 
             | You need to design language constructs around Pascal strings
             | to make them actually safe. Such as, oh, make it impossible
             | to have an uninitialized such object. The object has to know
             | both its allocation size and the actual size of the string
             | stored in it.
             | 
             | What is unsafe is constructing new objects in an anonymous
             | block of memory that knows nothing about its size.
             | 
             | C programs run aground there not just with strings!
             | 
             |     struct foo *ptr = malloc(sizeof ptr);  // should be sizeof *ptr!!
             |     if (ptr) {
             |       ptr->name = name;
             |       ptr->frobosity = fr;
             | 
             | Oops! The wrong sizeof allocated only the size of a
             | pointer: 4 or 8 bytes, typically nowadays, but the
             | structure is 48 bytes wide.
             | 
             | "struct foo" itself isn't inferior to a Pascal RECORD; the
             | problem is coming from the wild and loose allocation side
             | of things.
             | 
             | Working with strings in Pascal is relatively safe, but
             | painfully limiting. It's a dead end. You can't build
             | anything on top of it. Can you imagine trying to make a
             | run-time for a high level language in Pascal? You need to
             | be in the driver's seat regarding how strings work.
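             | 
             | (For reference, a sketch of the corrected allocation. The
             | layout of struct foo and the constructor name foo_new are
             | assumptions made up for this example.)

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; the padding just makes it larger than a pointer. */
struct foo {
    const char *name;
    int frobosity;
    char padding[32];
};

/*
 * Allocate one struct foo. `sizeof *ptr` names the pointed-to object, so
 * the size stays right even if ptr's type changes later. `sizeof ptr`
 * would be the size of the pointer itself.
 */
struct foo *foo_new(const char *name, int fr)
{
    struct foo *ptr;

    assert(name != NULL);
    ptr = malloc(sizeof *ptr);
    if (ptr) {
        ptr->name = name;
        ptr->frobosity = fr;
    }
    return ptr;
}
```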
        
             | munchbunny wrote:
             | The prefix approach turns the neat "strings are just
             | character arrays are just pointers" pattern into something
             | a lot more clunky, because now you've got this really basic
             | data type that is actually a struct and now you have to
             | have an opinion on how wide the length value is and short
             | strings get a lot of memory overhead in just lengths, and
             | so on.
             | 
             | In hindsight, I think the complexity is worth the safety,
             | but I could see why it felt more elegant to use null-
             | terminated strings at the time.
        
               | jdlshore wrote:
               | It's a classic case of moving the complexity from one
               | part of the system to another. "Strings are just
               | character arrays" seems simple and elegant, but in
               | reality is a giant mess, because strings are not just
               | character arrays, any more than dates are just an offset
               | from an epoch.
               | 
               | Human concepts are inherently messy. "Elegant" solutions
               | just shove the mess down the road.
        
             | JoeAltmaier wrote:
             | Hey, languages used length,blob even when C was invented.
             | HP Access BASIC used that kind.
             | 
             | It was a limitation, because they chose a byte length (to
             | save space). So strings up to 255 characters only. It was
             | decades before folks were comfortable with 32-bit length
             | fields. And that still limited you to 4GB strings. In the
             | bad old days, memory usage was king.
        
               | selfhoster11 wrote:
               | The funny thing is that you can just use the topmost bit
               | of the length to indicate that the string length is >127,
               | and chain as many length bytes as you want before you
               | begin the string proper (to save space). It would be
               | still a better encoding than a null at the end.
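               | 
               | (A sketch of that chained length prefix: 7 bits of
               | length per byte, with the top bit meaning "another
               | length byte follows". The names len_encode and
               | len_decode are hypothetical, and the byte order, least
               | significant group first, is an assumption of this
               | example.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write len as a chain of bytes, 7 bits per byte, least significant
 * group first. out must have room for 10 bytes (enough for 64 bits).
 * Returns the number of bytes written.
 */
size_t len_encode(uint64_t len, unsigned char *out)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        unsigned char b = (unsigned char)(len & 0x7F);
        len >>= 7;
        if (len)
            b |= 0x80;          /* more length bytes follow */
        out[n++] = b;
    } while (len);
    return n;
}

/*
 * Decode a length written by len_encode, reading at most avail bytes.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the chain is longer than 10 bytes.
 */
size_t len_decode(const unsigned char *in, size_t avail, uint64_t *len)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < avail && i < 10; i++) {
        v |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if (!(in[i] & 0x80)) {
            *len = v;
            return i + 1;
        }
    }
    return 0;
}
```

               | Strings shorter than 128 bytes pay one byte of
               | overhead, the same as a null terminator, and longer
               | ones pay one extra byte per additional 7 bits of length.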
        
               | [deleted]
        
           | kazinator wrote:
           | The reason that the safe functions take length parameters is
           | that they produce a new object in uninitialized memory, a
           | pointer to which is specified by the caller.
           | 
           | It has nothing to do with null termination.
           | 
           | And _that_ uninitialized memory is not self-describing in any
           | way in the C language. Which is that way in machine language
           | also.
           | 
           | This is a problem you have to bootstrap yourself somehow if
           | you are to have any higher level language.
           | 
           | The machine just gives you a way to carve out blocks of
           | memory that don't know their own type or size. C doesn't
           | improve on that, but it is not the root cause of the
           | situation. Without C, you still have to somehow go from that
           | chaos to order.
           | 
           | Copying two null terminated strings _into an existing null-
           | terminated string_ can be perfectly safe without any size
           | parameters.
           | 
           |     void replace_str(char *dest_str,
           |                      const char *src_left, const char *src_right);
           | 
           | If dest_str is a string of 17 characters, we know we have 18
           | bytes in which to catenate src_left and src_right.
           | 
           | This is not very useful though.
           | 
           | Now what might be a bit more useful would be if dest_str had
           | two sizes: the length of string currently stored in it, and
           | the size of the underlying storage. This particular operation
           | would ignore the former, and use the latter. It could replace
           | a string of three characters with a 27 character one.
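           | 
           | (A sketch of that two-size variant. The struct sized_str
           | type is hypothetical, and this replace_str takes it
           | instead of a bare char pointer. It assumes the sources do
           | not overlap the destination buffer.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object carrying both sizes. */
struct sized_str {
    char *buf;    /* storage provided by the caller */
    size_t cap;   /* size of the underlying storage in bytes */
    size_t len;   /* length of the string currently stored */
};

/*
 * Replace dest's contents with left followed by right. The current
 * length is ignored; only the storage size decides whether it fits.
 * Returns 0 on success, or -1 with dest unchanged if it does not fit.
 */
int replace_str(struct sized_str *dest, const char *left, const char *right)
{
    size_t l = strlen(left);
    size_t r = strlen(right);

    assert(dest != NULL && dest->buf != NULL);
    if (l >= dest->cap || r >= dest->cap - l)
        return -1;              /* need l + r + 1 bytes for the result */
    memcpy(dest->buf, left, l);
    memcpy(dest->buf + l, right, r);
    dest->buf[l + r] = '\0';
    dest->len = l + r;
    return 0;
}
```

           | With 28 bytes of storage currently holding "abc", this
           | can replace the three characters with a 27 character
           | result, and it rejects anything longer.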
        
       | amir734jj wrote:
       | Maybe instead of just writing a banned message, it should be the
       | name of alternative function to use.
        
       | [deleted]
        
       | 1337_d00dZ wrote:
       | In compilers that implement GCC extensions (such as Clang), you
       | can use the "poison" directive to achieve the same effect (but
       | with a better error message):
       | 
       | #pragma GCC poison printf sprintf fprintf
       | 
       | [0] https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Pragmas.html
        
       | shadowgovt wrote:
       | To its credit, it's convenient that the C pre-processor is so
       | powerful that it facilitates baking a "C the good parts" concept
       | directly into the compilation process.
        
       | rcgorton wrote:
       | But it isn't even April 1 yet! This is truly a BAAAD joke. So GIT
       | is not implemented in C? Or C++?
        
       | snvsn wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=20792938
        
       | StillBored wrote:
       | These functions are one of the many reasons why I tend to have a
       | C with some C++ classes dialect I use in my own projects.
       | 
       | std::string needs some tweaks, but it can mostly be treated as a
       | built in and it wipes out a huge set of C string issues.
        
       | jancsika wrote:
       | I love seeing "strncpy" right after "strcpy."
       | 
       | If someone wants some fun, try this:
       | 
       | 1. Slurp up all the FOSS projects that extend back to 90s or
       | early 2000s.
       | 
       | 2. Filter by starting at earliest snapshot and finding
       | occurrences of strcpy and friends who don't have the "n" in the
       | middle.
       | 
       | 3. For those occurrences, see which ones were "fixed" by changing
       | them to strncpy and friends in a later commit somewhere.
       | 
       | 4. See if you can isolate that part of the code that has the
       | strncpy/etc. and run gcc on it. Gcc-- for certain cases (string
       | literals, I think)-- can report a warning if "n" has been set to
       | a value that could cause an overflow.
       | 
       | I'm going to speculate that there was a period where C
       | programmers were furiously committing a large number of errors to
       | their codebases because the "n" stands for "safety."
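       | 
       | (For context: strncpy writes no terminator when the source
       | is at least n bytes long, so passing the buffer size as n
       | can leave an unterminated buffer. A minimal sketch of a copy
       | that always terminates and reports truncation follows. The
       | name copy_str is hypothetical; it is similar in spirit to
       | BSD strlcpy but not the same interface.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, which has room for dst_size bytes, and always
 * NUL-terminate the result.
 * Returns 0 if everything fit, -1 if src had to be truncated.
 */
int copy_str(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);

    assert(dst != NULL && dst_size > 0);
    if (n >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);    /* includes the terminator */
    return 0;
}
```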
        
         | commandlinefan wrote:
         | Ok, memcpy(dst, src, strlen(src)) it is then!
        
         | gilbetron wrote:
         | Meh, most of us understood the sharp edges of strings pretty
         | well. Before, we'd check the len of strings before strcpy,
         | strncpy let us do it without doing that, and just slap a 0 in
         | if needed. Safe? No. Better? A bit. Do I ever want to do string
         | manipulation again with C? Nope.
        
           | tomjakubowski wrote:
           | Understanding the sharp edges is one thing. Being able to
           | avoid them in practice is another. The history of memory
           | safety problems in C string handling, especially involving
           | strcpy/strncpy, strongly suggests to me that they're
           | unavoidable even for skilled, knowledgeable, and experienced
           | C programmers.
        
       | Luyt wrote:
       | It would be great if the BANNED() macro could suggest the correct
       | function to use.
        
         | tinus_hn wrote:
         | You could send a pull request, it doesn't seem too complicated
         | to implement
        
       | lmilcin wrote:
       | To respond to some of the comments.
       | 
       | It is not that there is anything intrinsically wrong with these
       | functions. You can technically use all of them and I have been
       | using all of them, safely, for decades.
       | 
       | The issue is they are huge traps to the point that in a larger
       | piece of software one can say "well, it's just not worth it".
       | 
       | You can go much, much, much further than that.
       | 
       | In a couple of embedded projects I worked on, some of the rules were:
       | 
       | * dynamic allocation after application has started is banned --
       | any heap buffers and data structures must be allocated at the
       | start of the application and after that any allocation is a
       | compile time error,
       | 
       | * any constructs that would prevent statically calculating stack
       | usage were banned (for example any form of recursion except when
       | exact recursion depth is ensured statically),
       | 
       | * any locks were banned,
       | 
       | * absolutely every data structure must have size ensured, in a
       | simple way, beyond any reasonable doubt,
       | 
       | etc.
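       | 
       | (A sketch of the first rule, startup-only allocation from a
       | fixed buffer. All names here are hypothetical. Note that
       | this version refuses late allocations at run time, whereas
       | the rule above describes a compile time error, which would
       | need build tooling that is not shown.)

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SLOTS 256

/* One slot is sized and aligned for the common scalar types. */
typedef union {
    long long ll;
    long double ld;
    void *p;
} arena_slot;

static arena_slot arena[ARENA_SLOTS];   /* fixed storage, no heap */
static size_t slots_used;
static int sealed;

/*
 * Hand out memory from the fixed arena. Only works before arena_seal().
 * Returns NULL when sealed, when size is 0, or when out of space.
 */
void *arena_alloc(size_t size)
{
    size_t slots;
    void *p;

    assert(slots_used <= ARENA_SLOTS);
    if (sealed || size == 0 || size > sizeof arena)
        return NULL;
    slots = (size + sizeof(arena_slot) - 1) / sizeof(arena_slot);
    if (slots > ARENA_SLOTS - slots_used)
        return NULL;
    p = &arena[slots_used];
    slots_used += slots;
    return p;
}

/* Call once initialization is done; every later arena_alloc fails. */
void arena_seal(void)
{
    sealed = 1;
}
```

       | Because nothing is ever freed, worst case memory use is
       | known as soon as startup finishes.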
        
         | whatisthiseven wrote:
         | It is interesting to read the rules you came up with to limit
         | memory usage, and then to think of the criticisms one gets in
         | Java for limiting memory usage. In Java we try to limit new as
         | much as possible to prevent the GC from pausing too much, or
         | inconveniently, or for too long. And basically all the rules
         | you say are what we also use in Java.
         | 
         | Except when you have these rules in Java, the ironic counter-
         | point is "if you are doing this much memory control yourself,
         | you should just use C or C++ or something".
         | 
         | I'll keep your comment in mind next time I see that rebuttal.
         | Thank you.
        
         | zwieback wrote:
         | The stack thing was always the big worry for me. Without a
         | comprehensive static code analysis tool that's hard to do. And
         | runtime stack checking adds quite a bit of overhead, especially
         | if you also have to worry about running on the interrupt stack
         | and possibly switching.
        
         | xondono wrote:
         | Anything enforcing MISRA has essentially (almost) no way of
         | allocating memory at runtime.
        
           | fsociety wrote:
           | It's funny, I worked exclusively with MISRA at the start of
           | my career. Eventually I started a job at a FAANG and received
           | quizzical comments on why I implemented a memory arena.
           | 
           | The argument was to allocate memory freely and let it pool
           | memory as necessary. Fair enough, it was simpler and fit the
           | standard expectation of development.
           | 
           | The issue is that if you talk with the allocator team they
           | complain of not being able to fix performance issues fast
           | enough due to allocations firing off left and right in the
           | middle of a request.
           | 
           | I never realized that my view of C programming is heavily
           | influenced by MISRA until your comment.
           | 
           | I know game engine programming follows a similar, perhaps
           | unspoken, convention.
        
             | munchbunny wrote:
             | The lack of runtime allocations in game engine programming
             | comes from a different motivation: allocations are
             | expensive, garbage collection is expensive, cache
             | locality matters, and you're chucking around a lot of very
             | similar looking objects, so... object pools!
        
               | orwin wrote:
               | Yeah, the first time we coded a scroller shooting game
               | with my friend (at school), we were baffled that our
               | terminal-based scroller lagged more than the raycaster we
               | did two weeks prior. Was it a C vs C++ thing?
               | 
               | Turns out, creating then destroying every single
               | missile/enemy was extremely costly.
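
The object-pool idea from this sub-thread can be sketched roughly as follows, assuming a fixed upper bound on live objects. `missile_t`, `missile_pool_t`, `POOL_SIZE`, and the `pool_*` functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4  /* hypothetical cap on simultaneously live missiles */

typedef struct missile {
    float x, y;
    struct missile *next_free;  /* links unused slots into a free list */
} missile_t;

typedef struct {
    missile_t slots[POOL_SIZE]; /* all storage is reserved up front */
    missile_t *free_list;
} missile_pool_t;

/* Threads every slot onto the free list. */
static void pool_init(missile_pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pops a slot off the free list; returns NULL when the pool is empty. */
static missile_t *pool_acquire(missile_pool_t *p)
{
    missile_t *m = p->free_list;
    if (m != NULL)
        p->free_list = m->next_free;
    return m;
}

/* Pushes a slot back onto the free list instead of freeing it. */
static void pool_release(missile_pool_t *p, missile_t *m)
{
    assert(m >= p->slots && m < p->slots + POOL_SIZE);
    m->next_free = p->free_list;
    p->free_list = m;
}
```

Spawning and destroying a missile then costs a couple of pointer writes, and the objects stay contiguous in memory.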
        
             | hctaw wrote:
             | Custom allocators are quite common; it's not an arcane
             | convention. I think the rule of thumb is preallocate until
             | it gets questionable in complexity, then write your custom
             | allocator - and really it's only applicable to code with a
             | real-time deadline (hard or soft). Otherwise the system
             | allocator is going to be a lot smarter than yours once it
             | leaves microbenchmarks.
        
         | closeparen wrote:
         | How often does the dynamic allocation rule lead to an ad-hoc
         | allocator appearing inside the program?
         | 
         | Also, doesn't the OS lie? I thought the memory wasn't really
         | physically assigned until first use.
        
           | syncsynchalt wrote:
           | In my experience, dynamic allocation is banned in either (a)
           | small embedded environments or (b) high scrutiny environments
           | (soft realtime, safety critical, etc.).
           | 
           | In both cases the project size is small enough, or the
           | scrutiny is high enough that the ad-hoc allocator doesn't
           | develop. The environment is also simple enough that the
           | memory cheats you're thinking of don't exist (or you can
           | squash them by touching all allocated memory up front).
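
"Touching all allocated memory up front" could look roughly like the sketch below. `touch_pages` and `alloc_prefaulted` are hypothetical helpers, and the page size is passed in as an assumption (on POSIX systems it could be obtained from `sysconf(_SC_PAGESIZE)`). Whether touched pages stay resident afterwards depends on the OS; this sketch only forces the first write to each page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Writes one byte to every page spanned by [mem, mem + len) so that the
 * OS has to back the pages at startup rather than on first real use.
 * The volatile pointer keeps the compiler from dropping the stores. */
static void touch_pages(void *mem, size_t len, size_t page_size)
{
    assert(page_size != 0);
    volatile unsigned char *p = (volatile unsigned char *)mem;

    for (size_t off = 0; off < len; off += page_size)
        p[off] = 0;

    /* mem may not be page-aligned, so the last byte can sit on a page
     * the loop above did not reach. */
    if (len != 0)
        p[len - 1] = 0;
}

/* Allocates len bytes once, at startup, and touches every page. */
static void *alloc_prefaulted(size_t len, size_t page_size)
{
    void *mem = malloc(len);
    if (mem != NULL)
        touch_pages(mem, len, page_size);
    return mem;
}
```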
        
       | at_a_remove wrote:
       | I have only ever dabbled in C, just to look at other people's
       | code and occasionally when I really needed speed, so I am at what
       | I would call a "Pretty Pathetic" level, able to recognize that I
       | am looking _at_ C.
       | 
       | However, I look at old books on C, and then I look at this list,
       | and I wonder whether it would have been helpful, after marking a
       | function as banned, to suggest its replacement, even as a
       | comment.
        
       ___________________________________________________________________
       (page generated 2021-03-04 23:00 UTC)