[HN Gopher] Git's list of banned C functions
___________________________________________________________________
Author : muds
Score  : 320 points
Date   : 2021-03-04 20:33 UTC (2 hours ago)
web link (github.com)
| moomin wrote:
| They should probably add sscanf.
| ed25519FUUU wrote:
| First thing I looked for. It looks like it _was_ used here:
|
| https://github.com/git/git/blob/master/object-file.c#L1293
|
| And currently used here (at least):
|
| https://github.com/git/git/blob/master/refs.c#L1235
| TheRealSteel wrote:
| I'm an idiot, I read the headline and thought these were banned from Git entirely. As in, you couldn't commit them to _any_ repo using Git, at all. Thought that seemed a bit harsh.
|
| Turns out you just can't use them when you contribute code to the Git project. That makes sense, and seems reasonable.
| [deleted]
| maxk42 wrote:
| What would be helpful is an explanation of how each function ends up being misused so people can learn from this.
| petters wrote:
| Git blame is helpful here. See e.g. https://github.com/git/git/commit/1b11b64b815db62f93a04242e4...
| jsmith45 wrote:
| View the git history for the file. Each commit that adds functions has a detailed explanation of what is wrong with the functions.
| zbendefy wrote:
| Are there some details on what's wrong with these?
| bvaldivielso wrote:
| The commit messages that added them explain the reasoning.
| ufo wrote:
| I wish they had put that in comments instead of in the commit messages. It's not the first time that I've seen this particular list of banned functions being shared online, and every time it happens someone has to explain that the most interesting info is hidden in the commit messages.
| alexchamberlain wrote:
| All the string functions have buffer overrun vulnerabilities if not used carefully. I'm not sure about the time functions though.
| trilinearnz wrote:
| Very much this.
I frequently write small games in C, and the | number of times I have been bitten by baffling behaviour | because a string somewhere was copied into an array that was | too short, are many! Apart from that, I love the simplicity | of the language and the stdlib, and it's definitely my | preferred hobby programming environment. | | It would be good to know what the commonly-accepted | alternatives are. | edflsafoiewq wrote: | The time functions are either non-reentrant, or, for the _r | versions, have the same problem with buffer overruns. | | https://github.com/git/git/commit/1fbfdf556f2abc708183caca53. | .. | | https://github.com/git/git/commit/91aef030152d121f6b4bc3b933. | .. | [deleted] | csours wrote: | I'm pretty sure you could google each of these with the word | 'dangerous' | | For example: https://lgtm.com/rules/2154840805/ | whydoyoucare wrote: | I am so thankful git isn't forcefully including this header in | every C language project and that we have a choice when using | git! :-) | bvaldivielso wrote: | Ah this is a very good idea. I guess you still have to make sure | that all your translation units include this header, which isn't | completely foolproof. | | Static analysis would probably be more robust, but way more | involved. | radus wrote: | Best of both worlds: use static analysis to ensure the header | is included? | koenigdavidmj wrote: | gcc has a -include option, so this can be done once in the | Makefile and get the benefit everywhere (unless you're being | clever). | Athos_vk wrote: | I remember visual studio having an option to force include a | file, surely something like that would exist for other | toolchains | kccqzy wrote: | You don't need fancy static analysis. You can find out whether | the banned functions are called just by inspecting the compiled | object file. Add it to the build step and done. | EdSchouten wrote: | Funnily enough, strtok() is not listed :) | kgrimes2 wrote: | Can a C guru provide a TL;DR of why these are bad? 
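The time-function complaint above (shared static storage for gmtime() and friends, an unchecked fixed-size buffer for the _r string variants) can be sidestepped by pairing gmtime_r() with strftime(), which takes the buffer size explicitly. A minimal sketch, assuming a POSIX system; format_utc is a hypothetical helper name, not something taken from git:

```c
/* gmtime_r() is POSIX rather than ISO C; ask for it explicitly. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats t as "YYYY-MM-DD HH:MM:SS" in UTC.
 * Returns 0 on success, -1 on conversion failure or if dst is too small. */
static int format_utc(char *dst, size_t dstsize, time_t t)
{
    struct tm tm;

    /* The broken-down time lands in our own struct, not in shared storage. */
    if (gmtime_r(&t, &tm) == NULL)
        return -1;

    /* strftime() returns 0 when the result (plus NUL) does not fit. */
    if (strftime(dst, dstsize, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;

    return 0;
}
```

The caller has to check the return value; on failure the contents of dst should not be relied on.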
| drfuchs wrote: | It would be nice if the error messages generated would suggest | replacement functions that they deem appropriate. I see that I'm | not supposed to use gmtime, localtime, ctime, ctime_r, asctime, | and asctime_r; but what do they think I _should_ use? | dev_tty01 wrote: | It would be even nicer if it redefined the call to a safe | version and then generated a warning message informing the | programmer of the substitution. | pjc50 wrote: | You can't do that because the semantics are different in most | cases. | [deleted] | cle wrote: | From the commit messages | | > The ctime_r() and asctime_r() functions are reentrant, but | have no check that the buffer we pass in is long enough (the | manpage says it "should have room for at least 26 bytes"). | Since this is such an easy-to-get-wrong interface, and since we | have the much safer strftime() as well as its more convenient | strbuf_addftime() wrapper, let's ban both of those. | | (https://github.com/git/git/commit/91aef030152d121f6b4bc3b933.. | .) | | > The traditional gmtime(), localtime(), ctime(), and asctime() | functions return pointers to shared storage. This means they're | not thread-safe, and they also run the risk of somebody holding | onto the result across multiple calls (where each call | invalidates the previous result). All callers should be using | their reentrant counterparts. | | (https://github.com/git/git/commit/1fbfdf556f2abc708183caca53.. | .) | tinus_hn wrote: | Strangely there is no mention of strtok which has a similar | issue. | drfuchs wrote: | Yes, but every hapless user shouldn't have to go searching | through a bunch of commit messages to find the suggested | replacement. Bad UX. | capableweb wrote: | The UX of using this list is not by manually searching | through the list and seeing the reason behind them. You | include the file together with the rest of your sources and | now you get compilation errors if you try to use them. 
| Can't think of a better UX for banned functions. | | Discovering why the thing is banned you only have to do | once, if you care. If you're just modifying something | quickly and minor in Git, you might not even care why. | grncdr wrote: | It seems pretty safe to assume a developer contributing C | code to _git itself_ would know how to use git blame (or | the GitHub interface for it). | orf wrote: | Why make it harder, and why make it impossible to update | if there are other suggested alternatives that are | available since whenever the commit was made? | masklinn wrote: | > Why make it harder | | Because there is no way for a commit message to become | outdated or detached from what it talks about, both of | which are very much issues with comments. | | > why make it impossible to update if there are other | suggested alternatives that are available since whenever | the commit was made? | | Because that doesn't really matter. | cma wrote: | > Because there is no way for a commit message to become | outdated or detached from what it talks about, both of | which are very much issues with comments. | | What if they think of another reason why one of the same | functions should be disabled? | underwater wrote: | Code is evergreen, whereas a git commit represents a | change at a single point in time. It will always be | limited by the knowledge the author had available to | them. | | The commit message from 2020 with suggested alternatives | might very well go stale. Does the author go and force a | noop commit so they can document new best practice in a | new commit message? | orf wrote: | > Because that doesn't really matter. | | Ok, so maybe rather than have this file we should run | "git log | grep BANNED" and build a list of functions | from that? Or maybe we could change all error messages to | be "go look at the commit history to work out why this | happened". | | No? Maybe putting context in source files (or better yet, | an error message!) 
rather than in a side channel like the | commit message has value when it comes to understanding | and updating, and it won't be lost under the weight of | future commits. | [deleted] | capableweb wrote: | Your source code should describe what the program should | do today. It should not contain all historical artifacts | about your source code, as it'll grow to big and | unmanageable then. Instead, use Git to store temporal | information, data that is about change and reasoning | behind it. Git is basically a timeline, instead of hard | facts of today. | | That's why it makes sense to describe the background and | reasoning behind a change in a Git commit, instead of | inside your source files as comments. | orf wrote: | Totally agree, which is why nobody is suggesting adding | the background and reasoning behind the change to the | source file as a comment. | | They are suggesting adding a more informative error, | which may include a subset of that background and | reasoning. An error message that points you to the | functions you should use instead is infinitely more | informative than one that says "this is banned. Bye." | jorl17 wrote: | I find it highly backwards that documentation on "what to | use instead of X" is in the commit message disabling X. | One _might_ do it and might remember to do it, but IMO it | makes absolutely no sense for this not to be documented | properly in code, as suggested by OP. | | By that logic, a non-insignificant amount of (good) | comments in code could be removed and people asked to | "git blame the code and check out the commit that made it | for the documentation". Of course this could be done, but | it sounds ridiculous even typing it out. | blitz_skull wrote: | I disagree. Commits messages exist for the very purpose | of adding context to your code base. 
If you added | <complex_function> for something that needs context, sure | MAYBE add a comment, but I really pray that I'm going to | find a few paragraphs disambiguating the problem within a | git commit. If I'm _really_ lucky, maybe I find a PR | number or Jira ticket reference as well. | | If you're truly clueless as to what could be substituted | for these commands, then you don't understand why they're | banned. So our first step? Figure out why they're banned. | And how would we sanely approach this? Probably by | checking the commit message for _why that code is there | in the first place_. That's a very safe, sane, and not- | at-all backwards assumption. After you understand why | it's there, a quick google search might help out if the | commit message didn't already include information on | alternatives. | | Lastly, yeah, I totally agree a large amount of GOOD | comments should be relegated to the git commits if all | they're doing is adding additional context around a | complex piece of logic. Comments do not exist to edifying | a code base in any way other than context. They're too | easy to let become stale, whereas a git commit will | always reference exactly the code you're blaming. | | So, I have to really disagree that it's ridiculous or in | any way absurd. In fact, I think a lot of code suffers | from NOT using git as a way to extend context around a | code base. It's SUPER easy with most development | environments to select a block of text and blame it. It's | so easy that it's almost always my go-to to increase my | context of what's been happening around a particular part | of the code base. | barnaclejive wrote: | So, you are tied to Git for eternity to preserve | documentation? | | Might work in practice for a long time, but Git is a | version control system, not a documentation system. | dpedu wrote: | For developer documentation - yes, absolutely! 
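One way the "tell me what to use instead" request in this subthread could be met without relying on comments or commit messages is to put the hint into the identifier the compiler trips over. A rough sketch, assuming the header works by expanding banned calls to an undeclared name (which is what the compile errors described above suggest). BANNED_USE, the identifier spelling, and copy_label are invented for illustration; the suggested replacements are the ones quoted from the commit messages:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variation: a banned call expands to an undeclared identifier
 * whose name carries both the banned function and a suggested replacement. */
#define BANNED_USE(func, alt) sorry_##func##_is_banned_use_##alt##_instead

#undef strncpy
#define strncpy(dst, src, n) BANNED_USE(strncpy, strlcpy)
#undef asctime_r
#define asctime_r(tm, buf) BANNED_USE(asctime_r, strftime)

/* Code that avoids the banned names compiles as usual.
 * Returns 0 on success, -1 if src did not fit into dst. */
static int copy_label(char *dst, size_t dstsize, const char *src)
{
    /* strncpy(dst, src, dstsize);  <- would stop the build, with
     *   sorry_strncpy_is_banned_use_strlcpy_instead in the diagnostic */
    int n = snprintf(dst, dstsize, "%s", src);
    return (n >= 0 && (size_t)n < dstsize) ? 0 : -1;
}
```

An identifier can only carry a short hint, so this complements the longer rationale rather than replacing it.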
| capableweb wrote: | > By that logic, a non-insignificant amount of (good) | comments in code could be removed and people asked to | "git blame the code and check out the commit that made it | for the documentation". Of course this could be done, but | it sounds ridiculous even typing it out. | | Yes, exactly. You want to understand how a codebase | changed and evolved over time? Git is your friend. If you | want the facts of the code today? The source code is your | friend. That's why the way Linux and Gits Git repository | method of storing history makes sense. See also | https://news.ycombinator.com/item?id=26348965 | | Try navigating the Git codebase with a git-blame sidebar | (probably VS Code has that somewhere) so you can see the | history of the source files. If you wonder why something | is what it is, you can checkout the commit that last | modified it. Or go even further backwards and figure out | in the context it was first added. If you truly want to | understand a change, a git repository with well written | git messages is a pleasure to understand and dig into. | [deleted] | chris_wot wrote: | The commits actually do give that info. Take for instance this | commit: | | https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e... | | It actually gives examples and a lengthy explanation and | reasoning behind the ban. | xorcist wrote: | Now _that 's_ what a good commit message looks like! | cesarb wrote: | Commit messages like that are common in the Linux kernel | project, which is where git came from (though this | particular commit message is a bit on the longer side). | | It makes more sense if you think of it as an email message | justifying why the project maintainer should accept that | change, because that's what they were before git even | existed. 
Still today, unless you're one of the Linux kernel | subsystem maintainers, you have to convert your changes to | emails with git-format-patch/git-send-email and send them | to the right mailing list. Even the Linux kernel subsystem | maintainers keep writing commits in that style out of habit | (and because Linus will rant at them if they don't). | mamon wrote: | But why put that info in commit message instead of a comment | in the file itself? | chris_wot wrote: | Because comments can be tedious and get out of sync with | the repo. Why not check the git history? I wish more repos | could be like this! | adrianmonk wrote: | > _Why not check the git history?_ | | Because that is effort every person who uses the file has | to do over and over again, whereas maintaining the file | is effort that has to be done once by one person. | skeletal88 wrote: | Someone here commented to use git blame to find the | commit that banned the functions and read the commits. | These people making the suggestions.. must hate other | people and their time. Also, what if someone.. for | example runs a code formatter on the file, making git | blame useless? Is it really so difficult to make a manual | or explain properly in the comments about what | replacements to use? | chris_wot wrote: | It sounds like you want a manual. Personal preference I | guess. The maintainers seem to have decided to keep it in | the history. It's not like this was ever meant for | anything other than git itself. | aendruk wrote: | I really wish tooling like this was more common: | | https://github.com/eamodio/vscode- | gitlens/tree/v11.2.1#curre... (screenshot) | | > Current Line Blame: Adds an unobtrusive, customizable, | and themable, blame annotation at the end of the current | line | colordrops wrote: | Or even in the compile error message itself. | [deleted] | colordrops wrote: | Also, _why_ the functions are banned. | lerax wrote: | Yes, this is right. 
Any decent C programmer knows those functions are cursed.
| Animats wrote:
| About 20 years too late. Those should have been moved to a "deprecated" header file decades ago.
| xvilka wrote:
| I hope one day to see it rewritten in a safer language.
| qbasic_forever wrote:
| There's a nice Go implementation of git: https://github.com/go-git/go-git
| sys_64738 wrote:
| scanf?
| abetusk wrote:
| The Git Mailing List Archive on lore.kernel.org (found in the README from the git mirror on GitHub) has more context [0] [1] [2]. From Jeff King on 2018-07-24:
|
|     The strncpy() function is less horrible than strcpy(), but is still pretty easy to misuse because of its funny termination semantics. Namely, that if it truncates it omits the NUL terminator, and you must remember to add it yourself. Even if you use it correctly, it's sometimes hard for a reader to verify this without hunting through the code. If you're thinking about using it, consider instead:
|
|       - strlcpy() if you really just need a truncated but NUL-terminated string (we provide a compat version, so it's always available)
|       - xsnprintf() if you're sure that what you're copying should fit
|       - strbuf or xstrfmt() if you need to handle arbitrary-length heap-allocated strings
|
| I just did a search on the keywords 'banned' and 'strncpy' [3].
|
| [0] https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...
| [1] https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...
| [2] https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...
| [3] https://lore.kernel.org/git/?q=banned+strncpy
| js2 wrote:
| Psst:
|
| https://github.com/git/git/commits/master/banned.h
|
| (Git development is done by emailing patches. Those patches include the git commit message, which we can see just by looking at the history of the file. Sometimes there's
Sometimes there's | additional discussion on the ML, but the most important details | are in the commit message because the git development team is | very disciplined about that.) | captainmuon wrote: | It would be interesting to see the rationale behind these bans, | and what the suggested alternatives are. Some are obvious, like | `strcpy`, but I can't remember what the problem with `sprintf` or | the time functions are. | | If you are doing something like `sprintf(buffer, "%f, %f", a, | b)`, yes it is tricky to choose the size of buffer frugally, but | if you replace that by `ftoa` and constructing the string by | hand, you are likely to introduce more bugs. | | Edit: as pointed out in another post, you can do git blame to see | the rationale for each ban, quite interesing. | monocasa wrote: | snprintf will always terminate the string, and won't overflow | the buffer. | Aanok wrote: | The trouble with printf-family functions is their variadic | nature. If the arguments don't match the format string, you can | wreak all sorts of havoc. | | A fun exercise you can do is put a "%s" in the format string, | omit the string argument and see what happens to the stack. | anyfoo wrote: | That's however relatively easy to verify programmatically, | and indeed any recent compiler will complain about that. | | I'd say the usual trap is rather the size of the target | buffer, because that requires bigger static analysis guns. | (I'm ignoring things like "%n", because then you're playing | with fire already.) | Gibbon1 wrote: | I think the big three C compilers have pragma's that you | can tag printf/scanf with that will cause the compiler to | verify the argument list. | danaliv wrote: | There's that, but with sprintf/vsprintf specifically, there's | no way to keep it from storing characters past the end of | your buffer. For example: char buf[2]; | sprintf(buf, "%d", n); | | This will happily write to buf[2] and beyond if n is negative | or greater than 9. 
| SloopJon wrote: | sprintf() warnings have gotten pretty sophisticated these days. | I discovered GCC's -Wformat-overflow the other day. It | complained that the buffer for a date string wasn't big enough; | e.g., sprintf(buf, "%04d-%02u-%02u", year, month, day), where | year, month, and day are 16-bit shorts, and buf was probably | eleven or twelve bytes. | | It may actually be a bug that I got the warning, because the | range of each input was checked, and I think the compiler is | supposed to be smart enough to remember that. | dahfizz wrote: | This was my reaction as well. Banning strncpy just encourages | haphazard manual copying. | smasher164 wrote: | From the commit message: | | If you're thinking about using it, consider instead: | - strlcpy() if you really just need a truncated but | NUL-terminated string (we provide a compat version, so | it's always available) - xsnprintf() if you're | sure that what you're copying should fit - | strbuf or xstrfmt() if you need to handle arbitrary- | length heap-allocated strings | nwmcsween wrote: | strlcpy is safer but effectively running strlen(src) every | call is a good wtf | azurezyq wrote: | maybe this https://github.com/git/git/blob/master/strbuf.h ? | ben_bai wrote: | strlcpy is the safe way, that is used by git. | [deleted] | syncsynchalt wrote: | strncpy doesn't do what you think it does (it is not | analogous to strncat). strncpy does not terminate strings on | overflow. In C terms, it is not actually a string function | and shouldn't be named with `str`. | | snprintf or nul-plus-strncat do what you want, but snprintf | has portability problems on overflow. Most projects I've been | on rely on strlcpy (with a polyfill implementation where not | available). | asdfasgasdgasdg wrote: | I think you're meant to use snprintf instead. It would be | great to see documentation on the alternatives! | sys_64738 wrote: | getc? 
| ape4 wrote:
| Just replace strcpy(a,b) with strcpyn(a,b,INT_MAX)
|
| /joke
| fatnoah wrote:
| I'm pretty sure I've seen similar logic in my life.
| attractivechaos wrote:
| I wonder how they copy strings with strcpy and strncpy both banned. strlcpy? But it is not conforming to major standards. Or just memcpy with extra code?
| dgentile wrote:
| Edited: Looks like they have safe alternatives:
|
|     - strlcpy() if you really just need a truncated but NUL-terminated string (we provide a compat version, so it's always available)
|     - xsnprintf() if you're sure that what you're copying should fit
|     - strbuf or xstrfmt() if you need to handle arbitrary-length heap-allocated strings
| lights0123 wrote:
| https://github.com/git/git/commit/e488b7aba743d23b830d239dcc... Yes:
|
| > we provide a compat version, so it's always available
| [deleted]
| attractivechaos wrote:
| This gets me interested. Link [1] below shows their implementation of strlcpy(). This is a questionable implementation. With strncpy, the source string "src" may not be NUL-terminated IIRC. The git implementation requires "src" to be NUL-terminated. If not, an invalid read. EDIT: according to the strlcpy manpage [2], "src" is required to be NUL-terminated, so strlcpy imposes more restrictions and is not a proper replacement for strncpy.
|
| Furthermore, imagine "src" has 1 MB of characters but we only want to copy the first 3 chars. The git implementation would traverse the entire 1 MB to find the length first, but a proper implementation only needs to look at the first 3 chars. So, they banned strncpy and provided a worse solution to that.
|
| [1]: https://github.com/git/git/blob/master/compat/strlcpy.c
|
| [2]: https://linux.die.net/man/3/strlcpy
| alcover wrote:
| Agreed. It's inefficient: O(n) in the length of `src`. I guess looping through chars up to `size` would perform better on average.
|
| I see this `strlcpy` recommended everywhere.
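The "only look at the first `size` chars" idea can be sketched with memchr(), which stops at the first NUL or after the given number of bytes, whichever comes first. This is an illustration, not git's code and not a drop-in strlcpy(): bounded_copy is a hypothetical name, and it gives up strlcpy()'s return value (the full source length), which is exactly what forces strlcpy() to scan all of src.

```c
#include <string.h>

/* Hypothetical helper: copies at most dstsize - 1 bytes of src into dst and
 * always NUL-terminates. Reads at most dstsize bytes of src, so src must be
 * NUL-terminated or have at least dstsize readable bytes.
 * Returns 0 if src fit, 1 if it was truncated, -1 if dstsize is 0. */
static int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return -1;

    /* Look for the terminator within the first dstsize bytes only. */
    const char *nul = memchr(src, '\0', dstsize);
    size_t len = nul ? (size_t)(nul - src) : dstsize - 1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return nul ? 0 : 1;
}
```

The cost is bounded by the destination size rather than the source length, at the price of telling the caller only that truncation happened, not by how much.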
| kzrdude wrote:
| You have found the answer - strlcpy is not a replacement for strncpy at all (it's arguably a safer version of strcpy), and git people didn't invent this, it's the existing BSD strlcpy interface.
| attractivechaos wrote:
| Thanks for the confirmation. But my concern remains: they banned strncpy without a proper replacement. In addition, I didn't know the extra restriction of strlcpy until today (I have never used it before because it is not conforming to C99/POSIX). I might have fallen into this trap.
| notaplumber wrote:
| The problem is actually often the opposite: in the real world many treat strncpy as if it behaves like strlcpy. Note that strlcpy is equivalent to:
|
|     snprintf(buf, sizeof(buf), "%s", string);
|
| strlcpy is on track for future standardization in POSIX, for Issue 8, but even as a de facto standard, it exists in libc on *BSD, macOS, Android, Solaris, QNX, and even Linux using musl.
|
| https://www.austingroupbugs.net/view.php?id=986#c5050
|
| But you're correct in that it is not a replacement for strncpy because no code should be using strncpy.
| tedunangst wrote:
| Take a step back and consider strlcpy isn't supposed to be a drop in replacement for strncpy (a function which already exists).
| [deleted]
| jabl wrote:
| memccpy? Most platforms have it, and it's being added to C2X.
|
| See https://developers.redhat.com/blog/2019/08/12/efficient-stri...
| paultopia wrote:
| It's really wild, as a person coming from other languages who has written maybe ten lines of C in his life, that the functions that seem to be massive footguns in C are, like, "format a string" or "get time in GMT." That's... really scary.
| Communitivity wrote:
| I remember an entire lecture about the use and abuse of sprintf and related functions as a means of exploit.
Yeah, when you | delve into the internals of C you find things that are | terrifying if you are concerned about reliability, security, or | performance. The same is true though for many languages. The | problem is, as is often the case, the Iron Triangle: good, | fast, cheap - pick two. Different sections of the language are | written by developers under different constraints and | pressures, which leads to different choices. In my experience | every language implementation has at least one area that was | done quickly for expediency or done poorly because no one else | was able to (or wanted to) work on it. | throwaway09223 wrote: | Many of C's problems relate to string handling. These are all | legacy functions which have been replaced with safe | alternatives many decades ago. | | strcpy() was replaced with a safer strncpy() and in turn has | been replaced with strlcpy(). | | The list is a ban of the less safe versions, where more modern | alternatives exist. | Kaze404 wrote: | Why are these functions deprecated in favor of others but not | removed? I know in Javascript this can happen so as to not | break older websites, but in a compiled language this | shouldn't be a problem right? | syncsynchalt wrote: | There are actually very few _dangerous_ functions in C | (gets is the only one that comes to mind). Others have | massive caveats (strncpy) but still have their place. | Others are just known to have certain gotchas (strcpy, | strcat, sprintf). | | The reality of C is that if we deprecated every | objectionable function in the stdlib we wouldn't have | anything left. | maxlybbert wrote: | The C Standard Committee doesn't actually ship a compiler | the way the people behind Java, Python, Lua, C#, Go, Rust, | etc. do. The best they can do is deprecate particular | functions and hope compiler writers and standard library | writers follow along. But the compiler writers have vocal | customers who insist the depreciations are overly-cautious. 
| sudomakeup wrote: | Why wouldn't it be an issue with a compiled language? | | Its nearly the exact same reasoning as "we're not going to | break older websites" | lalaithion wrote: | The expectation of a C89 programmer is that a valid C89 | program can be compiled for any machine that has a C89 | compiler, and likewise for C95, C99, C11, and C17. | Furthermore, it's expected that any C89 program can be | compiled unchanged on any future version of C, and the | standard library is part of the definition of the language, | and therefore functions cannot be removed. | DaiPlusPlus wrote: | At a certain point we have to say that _it's wrong_ for | someone to expect C89 should still be the LCD. | | And yes: it should all still compile, but none of that | prohibits the compiler from issuing flashing red/yellow | warning messages to your terminal for using footgun | functions, preferably with uncomfortable audible | notifications too. | | All of this is silly though, because even in a strict C89 | environment you can still have your own safe wrappers | over the unsafe functions. I find that very little of | modern programming has a hard dependency on ultramodern | compiler features (e.g. you can theoretically build | React/Redux using only ES3 (1998ish) if you like. | Generics using type-erasure can be implemented with | macros. Etc.). | | Also, C89 conformance doesn't mean much: you can have a | confirming C89 system that doesn't even have a heap - nor | a stack for autos! (IBM Z/series uses a linked-list for | call-frames, crazy stuff!) | pjc50 wrote: | In a compiled language, when you remove a function it fails | to compile. So removing them from the standard library | _forces_ code changes - they 're not usually drop in | replacements because the semantics were wrong in the first | place. | | Removing strcpy would make the Python transition look easy. | badsectoracula wrote: | Removing anything breaks existing source code that has been | tested to work. 
After all just because something _may_ lead | to issues it doesn 't mean it will _always_ lead to issues. | | Also in many systems the C library is linked dynamically | and shared among all programs so even though a program is | compiled it still relies on the underlying system to | provide the function. | | Finally i'm certain that if a C standard removes something, | it'll be treated as the equivalent to that standard not | existing. C programmers are already a conservative bunch | without such changes. | gvx wrote: | It's not great if you're working on a new release and you | realize you also need to change something unrelated because | the language changed under you, especially if it's just a | bugfix but a high-priority one, or consider the head-aches | caused by source-only distributions suddenly breaking for | all your new users (or existing users switching to a new | computer or spinning up a fresh VM). | ChrisLomont wrote: | These still lead to lots of bugs via off by one errors on | lengths or other buffer misuse. | cestith wrote: | Still, unless you're writing something that has to be very | low-level all the way through, it's better to use a string- | handling library than the stdlib tools for strings. | stefan_ wrote: | The first thing you do is _not use any strings_. You 'll be | amazed how much you can get done in languages that aren't | so obsessively centered around stringified programming. | cestith wrote: | Most of the code I write has a spec of input and output | being some form of text. Still, I tend to write that in | languages that have safe string handling and drop into C | only when the profiler indicates that's useful. | | When handling strings in C, it's useful to use the string | functions from glib or pull in one of the specifically | safe string handling libraries and not use any C stdlib | functions for strings at all. 
|
| There are a number of C strings libraries safer to use than the standard library, and many of them are simpler, more feature-rich, or both.
|
| * https://github.com/intel/safestringlib (MIT licensed)
| * https://github.com/rurban/safeclib (MITish)
| * https://github.com/mpedrero/safeString (MIT licensed)
| * https://github.com/antirez/sds (BSD 2-clause, and gives you dynamic strings)
| * https://github.com/maxim2266/str (BSD 3-clause)
| * https://github.com/xyproto/egcc (GPL 2.0, includes GC on strings)
| * https://github.com/composer927/stringstruct (GPL 3.0)
| * https://github.com/c-factory/strings (MIT licensed)
| * https://github.com/cavaliercoder/c-stringbuilder (MIT licensed, does dynamic)
|
| If one does use the C standard library directly for handling strings, the advisories from CERT, NASA, GitHub, and others should be welcome advice (CERT's advice, BTW, includes recommending a safer strings library right off).
| derefr wrote:
| Yes, sure, write Unix CLI plumbing tools without strings.
| pjc50 wrote:
| Until you want to communicate with the user, filesystem, or web.
| Animats wrote:
| It was a design decision of QNX that the kernel never uses strings. Everything the kernel handles is fixed length, except messages, and messages go from one user process to another. The kernel does not allocate space for them. I think they got that right.
|
| There's a QNX user process that's always present, called "proc", which handles pathnames and the "resource managers", programs which respond to path names. But that's in user space, and has all the tools of a user-space program.
| cestith wrote:
| There are absolutely things that can be written without string handling. Then again, there are things that can't. Not handling strings in the kernel probably was a good decision. That userland I'll bet has string handling though, to be useful to users.
| _kst_ wrote:
| strncpy() is not a "safer" strcpy().
It can avoid some errors | involving writing past the end of the target array ( _if_ you | tell it the correct length for that array), but it 's not a | true string function, and it can leave the target | unterminated and therefore not a valid string. | | http://the-flat-trantor-society.blogspot.com/2012/03/no- | strn... | rrauenza wrote: | I never could really understand the point of strncpy()... | we always end up wrapping to deal with writing an | unterminated string. | | Was it intended for fixed length records? | [deleted] | tedunangst wrote: | It is for fixed length records, which is why it also | zeroes the remaining space. | ironmagma wrote: | Arguably naming it with "str" is itself a security | vulnerability. | tedunangst wrote: | No argument. At best it is a "string to fixed record" | function, hence the name, but it is not a string | function. | Someone wrote: | Yes. _strncpy_ was intended for copying file names into a | buffer that was only zero terminated when the name was | shorter than the maximum length of a file name in Unix | (14 bytes. See https://stackoverflow.com/a/1454071, https | ://devblogs.microsoft.com/oldnewthing/20050107-00/?p=36.. | .) | | You can also use it to overwrite part of an existing | string, but I think that's a side effect of the above. | throwaway09223 wrote: | In the interest of satisfying pedantry I think we can agree | that strncpy() is _intended_ to be a safer strcpy(). | | As you say, it does in fact obviate some errors. A value | judgement as to which errors are more or less safe may be | subjective, but the intent is not. | icedchai wrote: | This is true, and many people don't realize it. I used to | call a wrapper function that would always set the last byte | to 0. | draw_down wrote: | Now ponder how many people find that state of affairs | acceptable but also think JS is a terrible garbage language | that idiots like. 
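The strncpy() behaviour discussed in this subthread can be sketched in C. This is only a minimal illustration, not code from Git; the names strncpy_left_unterminated and copy_terminated are hypothetical, the latter in the spirit of the "always set the last byte to 0" wrapper mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Demonstration only: copy src into the first n bytes of a scratch
 * buffer with plain strncpy() and report whether those n bytes ended
 * up without any NUL terminator (1 = unterminated, 0 = terminated).
 */
int strncpy_left_unterminated(const char *src, size_t n)
{
    char buf[16];

    assert(src != NULL && n <= sizeof buf);
    memset(buf, 'x', sizeof buf);      /* make stale bytes visible */
    strncpy(buf, src, n);              /* pads with zeros only if src is short */
    return memchr(buf, '\0', n) == NULL;
}

/*
 * Hypothetical wrapper: copy src into dst (capacity dst_size bytes) and
 * always leave dst NUL-terminated. Returns 1 if src fit completely,
 * 0 if it had to be truncated.
 */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);   /* leave room for the terminator */
    dst[dst_size - 1] = '\0';          /* force termination */
    return strlen(src) < dst_size;
}
```

Note that the wrapper still truncates silently unless the caller checks the return value, which is the other half of the complaint about this family of functions.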
| kazinator wrote: | gmtime is just not thread-safe that's all, since it returns a | static structure; gmtime_r is not banned. | syncsynchalt wrote: | Thanks, I am now a decade out of the C game and I was | wracking my brain on what the problem with gmtime would be. | My best guess was dodgy is_dst portability /shrug | cperciva wrote: | A better way of looking at it is that functions which expose | very simple operations were among the first ones to be placed | into the standard library -- and consequentially are the least | well thought out. | jchw wrote: | Unfortunately, much of the pain with C surrounds dealing with | strings. It's been a bit of a theme on Hacker News for the past | few days, but it's actually a pretty good spotlight on | something I feel is not always appreciated - strings in C are | actually hard, and even the most safe standard functions like | strlcpy and strlcat are still only good if truncation is a safe | option in a given circumstance (it isn't always.) | | (~~Technically~~ Optionally, C11 has strcpy_s and strcat_s | which fail explicitly on truncation. So if C11 is acceptable | for you, that might be a reasonable option, provided you | always handle the failure case. Apparently, though, it is not | usually implemented outside of Microsoft CRT.) | | edit: Updated notes regarding C11. | masklinn wrote: | > Technically C11 has strcpy_s and strcat_s | | "Theoretically" is the word you're looking for: they're part | of the _optional_ Annex K so technically you can't rely on | them being available in a portable program. | | And they're basically not implemented by anyone but microsoft | (which created them and lobbied for their inclusion). 
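Since Annex K cannot be assumed in portable code, one way to get the "fail explicitly on truncation" behaviour with only standard C99 is to check the return value of snprintf(), which reports the length the full output would have had. A minimal sketch, assuming a caller that wants all-or-nothing copies; the helper name copy_or_fail is hypothetical and this is not how strcpy_s itself is specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (capacity dst_size bytes).
 * Returns 0 on success. If src does not fit, returns -1 and leaves
 * dst as an empty string instead of a silently truncated one.
 */
int copy_or_fail(char *dst, size_t dst_size, const char *src)
{
    int needed;

    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf() never writes more than dst_size bytes and returns the
     * number of characters the complete output would have needed. */
    needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    return 0;
}
```

The design choice here is that a failed copy is visible both in the return value and in the destination, so a caller that forgets to check ends up with an empty string rather than a plausible-looking prefix.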
| jchw wrote: | I didn't know that it was Microsoft that lobbied for them; | that perplexes me since I thought Microsoft's version of | them were a bit different (for example, I think C11's | explicitly fail on overlapping inputs where Microsoft | specifies undefined behavior) and because Microsoft didn't | bother supporting C99 for the longest time. (Probably still | don't, since VLA was not optional in C99, IIRC. I think | Microsoft was right to avoid VLA, though.) | InvOfSmallC wrote: | I teach at university as external lecturer. Teaching strings | in C is the hardest thing I have to do every time. The | university decided to explain C to first year student without | previous experience. My feedback was to do a precourse in | Python to let them relax a bit with programming as a concept | and then teach C in a second course. | kazinator wrote: | > _I teach at university as external lecturer. Teaching | strings in C is the hardest thing I have to do every time._ | | But if you keep up the good work you will one day go from | extern void *lecturer; | | to static const lecturer; | ritmatter wrote: | +1, my university's program seemed to work well with | "program anything" (Python), "program with objects" (Java), | "program some cool lower-level stuff" (C) | gravypod wrote: | Sorry to bug you since this is unrelated. I'm a huge fan of | teaching others and I was wondering how you got to be an | external lecturer at a college? I'd love to teach classes | related to software engineering and data structures. Would | you mind emailing me (in my profile) about this? | _the_inflator wrote: | Yep, agree. I used a lot of assembler on C64 and Amiga | until I touched so called high level programming languages | for the first time. For me thinking in strings was really a | weird concept. | | Nowadays I find it extremely strange to think of bits and | bytes when being confronted with strings. | austinl wrote: | Most of the C I wrote was while in college. 
I think | understanding the question, "why are strings in C hard?" is | a good gateway to understanding how programming languages | and memory work generally. I agree with you though that | teaching C as introductory is probably not the best -- our | "Programming in C" course was taken in sophomore year. | | I wouldn't want to use it my day job, but I'm glad that it | was taught in university just to give the impression that | string manipulation is not quite as straightforward as it's | made to appear in other languages. | | The early days of Swift also reminded me of this problem - | strings get even more challenging when you begin to deal | with unicode characters, etc. | orwin wrote: | In my school, we had two days to understand the basics of | text editors, git (add, commit, rebase, reset, push) and | basic bash functions (ls, cd, cp, mv, diff and patch, find, | grep...) + pipes, then a day to understand how while, | if/else and function calls work, then a day to understand | how pointer work, then a day to understand how malloc(), | free() and string works (we had to remake strlen, strcpy, | and protect them). Two days, over the weekend, to do a | small project to validate this. | | Then on the monday, it was makefiles if i remember | correctly, then open(), read(), close() and write(). Then | linking (and new libc functions, like strcat) . A day to | consolidate everything, including bash and git (a new small | project every hour for 24 hours, you could of course wait | until the end of the day to execute each of them). And then | some recursivity and the 8 queen problem. Then a small | weekend project, a sudoku solver (the hard part was to work | with people you never met before tbh). | | The 3rd week was more of the same: basic struct/enums | exercises, then linked list the next day, maybe static and | other keyword in-between. 
I used the Btree day to | understand how linked lists worked (and how | pointer incrementation and casting really work), and I | don't remember the last day (I was probably still on linked | lists). Then a big, 5-day project, and either you're in, or | you're out. | | I assure you, strings were not the hardest part. Not having | any leaks was. | PoignardAzur wrote: | Ooh, the Epitech cursus. Nice. | | Also, I'd say "not having segfaults" is the hardest thing | to get right when you're going through that. | liuliu wrote: | Yeah. I just avoid str manipulations in general in C and when | I have to, fuzz it ... (but still, the perf cliff is | definitely new to learn in the past few days). | swlkr wrote: | I'm partial to https://github.com/antirez/sds these days | macjohnmcc wrote: | strcpy is a coding challenge where I work for interviews. I | typically ask them to write it as the standard version and | ask them why they might not want to use it to see if they are | aware of the risks. After that I ask them to modify the code | to be buffer safe. And for those claiming C++ knowledge ask | them to make it work for wchar_t as well to see if they can | write a template. Some people really struggle with this. | IgorPartola wrote: | This is a lot like how in JavaScript you have footguns like the | with statement or in Python 2 where you have Unicode issues, | etc. I am sure we could define a new C standard that | excludes these functions as obsolete, but the linked header | file is a pretty sensible interim solution. C is an old | language and it's kind of amazing that code written 30 years | ago can still by and large be compiled by a modern compiler. | Ever try to run 3 year old React projects using today's React? | :) | detaro wrote: | Because individual libraries choosing to change quickly is | comparable to language stability how?
The relevant comparison | would be "run a 3y old react app (or a 20 year old website | using JS) in a modern browser or interpreter" | _the_inflator wrote: | Yes, and it would still run fine I guess. I think only | eval() changed over time. APIs and so on are still the same | except for some Netscape stuff. | ggregoire wrote: | > in JavaScript you have footguns like the with statement | | I've been coding in JS on a daily basis for more than 10 | years and today I learned there is a `with` statement in JS. | | https://developer.mozilla.org/en- | US/docs/Web/JavaScript/Refe... | | Edit: well, seems like it's been deprecated/forbidden since | ES5 (2009), so it makes sense I've never seen it. | GordonS wrote: | And me around 20 years - also never even heard of the | `with` statement! I think to qualify as a footgun, people | actually need to be using it in the real world. | viklove wrote: | It amuses me that HN hates JS so much, that even a topic | about problems with C turns into a JS-bashing thread. | | Also, I just want to remind you that JS isn't just React. | There are plenty of libraries written in C that introduce | breaking changes over the course of 3 years. Nothing will | stop people from finding ways to complain about JS though, I | know. The hate-boner is very real. | sadgrip wrote: | I think in most cases it's probably not hate but a deep, | deep love. | jrimbault wrote: | JavaScript, LISP under C disguise. No wonder it's | "popular" on HN. | | Assorted musing : Rust, OCaml under C disguise. | orwin wrote: | I think most people on HN like Javascript, or at least its | idea? I mean, its a very C-like functionnal language, | especially since ES6 put Js on the right road (for me at | least)? | lliamander wrote: | I appreciate Javascript's LISPy qualities, but it has an | inordinate number of footguns and a relative lack of | standard, stable libraries. 
Coming from languages like Java | and Erlang that are relatively scrupulous about such things | is a bit jarring. | | I do like Typescript though, as it adds some really nice | ergonomics. | matheusmoreira wrote: | Yeah, because of NUL-terminated strings. They cause so many | problems it's not even funny. Even something simple like | computing the length of the string is a linear time operation | that risks overflowing the buffer. People attempted to fix | these problems by creating variations of those functions with | added length parameters, thereby negating nearly all benefits | of NUL-terminated strings. | | Why can't we just have some nice structures instead? | struct memory { size_t size; unsigned char | *address; }; enum text_encoding { | TEXT_ENCODING_UTF8, /* ... */ }; struct text { | enum text_encoding encoding; struct memory bytes; | }; | | All I/O functions should use structures like these. This alone | would probably prevent an incredible amount of problems. Every | high-level language implements strings like this under the | hood. Only reason C can't do it is the enormous amount of | legacy code already in existence... | guerrilla wrote: | That would be nice. You hit on the other hell with C strings: | modern encodings where wchar_t and mb* are useless and | replacements essentially don't exist yet with char8_t, | char32_t etc. Then there's the locale chaotic nonsense [1]. A | new libc starting fresh would be nice. | | 1. https://github.com/mpv- | player/mpv/commit/1e70e82baa9193f6f02... | Camillo wrote: | Many of the problems with C descend from a common root, the | decision to use bare pointers (memory addresses) as the basic | way to refer to strings, arrays etc. | | If they had used a {pointer, size} pair instead, it would have | avoided all of these string problems, most buffer overflows, | even the GTA Online loading problem that was on HN recently. 
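The {pointer, size} idea raised in the two comments above can be sketched in plain C. This is a minimal illustration under the assumption of a read-only, non-owning view; the names str_view, sv_from_cstr, sv_slice and sv_equal are hypothetical and do not come from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A non-owning string view: the length travels with the pointer, so no
 * function ever has to scan for a terminator to find the end. */
struct str_view {
    const char *data;
    size_t      len;
};

/* Build a view from a NUL-terminated string (the only place strlen runs). */
struct str_view sv_from_cstr(const char *s)
{
    struct str_view v;

    assert(s != NULL);
    v.data = s;
    v.len = strlen(s);
    return v;
}

/* Sub-view of up to count bytes starting at start, clamped to the bounds
 * of v, so an out-of-range request yields a shorter (possibly empty) view. */
struct str_view sv_slice(struct str_view v, size_t start, size_t count)
{
    struct str_view out;

    if (start > v.len)
        start = v.len;
    if (count > v.len - start)
        count = v.len - start;
    out.data = v.data + start;
    out.len = count;
    return out;
}

/* Compare two views by length and content; embedded NULs are fine. */
int sv_equal(struct str_view a, struct str_view b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

Slicing costs nothing and never writes a terminator into the source, which is the kind of operation that is awkward with bare char pointers.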
| cb321 wrote: | For what it's worth, while what @Camillo says is both true | and important, people usually do not mention the trade offs | involved or why that decision was attractive at the time. | | These days (ptr,size) is probably 16 bytes -- longer than | almost all words in the English language (the scrabble | SOWPODS maxes out at 15). A pointer alone is 8B. Back at the | dawn of C in 1970, memory was 6..7 orders of magnitude more | expensive than today..maybe more inflation adjusted. (Today, | cache memory can be almost as precious, but I agree that the | benefits of bounded buffers probably outweigh their costs.) | | 8B pointers today are considered memory-costly enough "in the | large" that even with dozens of GiB machines common, Intel | introduced an x32 mode to go back to 32-bit addressing aka 4B | pointers. [1] There are obviously more pointers than just | char* in most programs, but even so. | | Anyway, trade offs are just something people should bear in | mind when opining on the "how it should be"s and "What kind | of wacky drugs were the designers of language XYZ on?!!?". | | [1] https://stackoverflow.com/questions/9233306/32-bit- | pointers-... | Animats wrote: | Pascal, which had sized strings, was in wide use before C. | Many people, including Bill Atkinson, who wrote many of the | original Macintosh applications, thought C was a step | backwards. | | Pascal, to save one byte, limited strings to length 255. Bad | decision. | [deleted] | SavantIdiot wrote: | If you list the languages you use, I'd be happy to point out | the "footguns" in each of them. For all the warts on C, there | really is no language that can compete for what it has | accomplished over ~50 years. | | Recall that during the rise of C, people were writing machine | code on punch cards. Assembly -> Machine code has far more | footbullets than C, it is a tradeoff between hand holding and | tiny fast code. | | Wow, this blew up. 
| | To all the people popping off about how great other languages | are, tell me: when will we see the Unreal Engine written in | Python, or Pascal, or Algol, or Rust, or Go... the next big | step is WebASM (or .cu), and that's way more footbullet-y than | C. And what is the native language all of your sub-30 year old | interpreted languages were written in? Thank you! | eschaton wrote: | This is a grossly inaccurate description of computing at the | time of the rise of C. C was competing with Pascal/Modula, | BLISS, PL/I, BCPL, and so on, not assembly on punched cards. | | The "C competing with assembly" meme was very specific to | _microcomputer_ game and operating system development, not | more general microcomputer application development, and not | to minicomputer or mainframe development. | JoeAltmaier wrote: | Mainframes very quickly were outclassed by minicomputers. | They could not respond quickly to technology changes as | fast. C was indeed king for decades. | rodgerd wrote: | There's far more critical code in the world running on COBOL | and s3[79]0 assembler. COBOL is vastly more important than C. | varjag wrote: | Nearly everything around you runs code that was written in | C, and absolutely nothing you can actually see runs COBOL | code. | mkipper wrote: | _citation needed_ | | I'm sure there's a lot of important things that rely on | COBOL, but by most definitions of "critical", I think this | is way off the mark. | burnished wrote: | COBOL is still used in many banking systems such as ATMs. | These are 'critical' systems by most any definition of | the word 'critical'. | varjag wrote: | That's a hugely broad definition of critical, enough to | encompass most of business and finance software. | slt2021 wrote: | which language z/OS is written in? | cygx wrote: | _Recall that during the rise of C, people were writing | machine code on punch cards._ | | Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ... 
| samatman wrote: | Your edit really isn't helping your case. | | Those of us who have always known about less dangerous | 'system' languages (Pascal probably being the most popular) | lament the fact that so much code got written in C instead. | | It wasn't inevitable. It was preventable! It just didn't | happen that way for reasons which are largely historical. | | I don't work for the Rust Evangelism Strike Force, my main | project is written in (as little) C (as possible), but I beg | anyone who has a choice: use something else! Rust is... fine, | Zig is promising. Ada still works! | | Writing out the set {Python, Pascal, Algol, Rust, Go} tempts | me to say uncharitable things about your understanding of the | profession, but I accept you were just being snarky so I'll | just gesture in the direction of how $redacted that is. | Gibbon1 wrote: | My favorite assembly foot gun was a guy I worked with had a | cute routine. You had a call to the routine, followed by a | null terminated string after that. The routine would spit the | string to the terminal. And then return to the location after | the string. | | He had some bug where in one place it returned to the start | of the string, executed it, and kept going. The end result | just happened to be a nop. Had been like that in production | for a couple of years. | atoav wrote: | Yeah there are footguns in every language. But this is not a | boolean question about the presence of footguns, this is | about how much one has to know to be able to handle a | language safely. | | I know C/C#/Python/Rust/Javascript. | | After a decade of using C I am still not totally sure if I | didn't dangle a pointer somwhere in precisely the wrong way | to create havoc. And yeah, that means I have to get better, | etc. But that is not the point. The point is, that even with | a lot of experience in the language you can still easily | shoot yourself into the foot and don't even notice it. 
| | Meanwhile after a month of using Rust I felt confident that I | didn't shoot myself in the foot, because I know what the | compiler's ownership rules, for example, guarantee. While in C shooting | myself in the foot happens quite often, in Rust I would have | to specifically find a way to shoot myself in the foot | without the compiler yelling at me, and quite frankly I | haven't found such a way yet. | | Javascript is odd, because the typesystem has quite a few | footguns in it. This is why such things like Elm or | Typescript exist: to avoid these footguns. | | I don't want to take away from the accomplishments of C, and | I still like the language, but to claim it is equally likely | in all languages to shoot yourself in the foot is not true. | maerF0x0 wrote: | Not that I don't believe there are any, but I'd love to hear | your perspective... | | Go (golang) | SavantIdiot wrote: | Well, shit. Got me there. | boolemancer wrote: | defer having function scope instead of, well, scope scope. | | Using defer to unlock locks can lead to some fun deadlocks | if you don't realize the issue with the scope, and it's | completely unintuitive to someone with experience with | other implementations of similar concepts. | crimper wrote: | channel programming and the races caused by closing | channels. channels seem nice and easy until they don't. | | the whole var/:=/= assignment combined with the error | handling style and the shorthand is another one | maerF0x0 wrote: | yeah the lack of determinism in selecting a channel can | be tricky for causing bugs where order matters. Luckily | in smaller cases you're likely to encounter them as | flakey tests (eg 1/2 the time) select { | case <-ch1: case <-ch2: } | dpatterbee wrote: | Only close channels when trying to tell the receiver that | you're not sending more data. Otherwise let the garbage | collector deal with it. Channels seem easy until they | don't until they do again in my experience. | | Don't understand your second point. 
| badsectoracula wrote: | > when will we see the Unreal Engine written in... | | Why would a huge C++ (not C, btw) codebase with roots going | back to the 90s be rewritten in any other language? | | And in fact how is the language Unreal Engine written in | relevant to C having footguns? | ironmagma wrote: | Yeah, there is a culture of complacency in C probably owing to | the enormous historical baggage of legacy code that has to be | supported and the blurred line between stdlib and system call. | Spivak wrote: | I mean on Linux you're not encumbered by this because the | syscall api is stable but in practice most GNU/Linux distros | assume glibc. You can't correctly resolve a hostname on Linux | without farming out to glibc -- hell even the kernel punts to | userspace for dns names but you can technically ignore it if | you want. | | On BSDs and macOS you're always SOL because the syscall api | isn't stable and only the C wrappers are. | dangerbird2 wrote: | c standard library doesn't really relate directly to system | calls (at least in modern os'es). In particular, the stdio.h | functions are buffered by default, while their system call | analogues are not. For unixes, system call wrappers are | typically found in <unistd.h>, not the "official" c standard | library | freedomben wrote: | I disagree completely. Devs who use C are the least | complacent about security in my experience. The problems are | from previous eras before they knew about many of these | things. A ton of people in modern languages couldn't name a | single dangerous function, though they do exist in every | language. You'd be amazed at how many race condition vulns | result from TOCTOU errors just in authentication, or checking | for the existence of a file before opening it, etc. | | It's absolutely true that decades ago the C community was | complacent, but it's not true now. Source: I taught secure | coding in C/C++ in the 00s. | IgorPartola wrote: | What you said. Nobody is complacent. 
Anyone who thinks the | Linux or OpenBSD (etc.) kernel developers take the lazy way | out is talking about a thing they know little about. I do | think better languages than C exist and maybe could even be | used as a basis for new systems. But I have yet to see a | mature OS that's as secure and as performant as these. | Closest might be the chips I've seen that have an embedded | Java byte code interpreter. | ironmagma wrote: | I agree in principle but think these security-focused C | developers are focusing on the trees for the forest. | Every developer having the responsibility of cultivating | their own pet list of banned functions is, frankly, NOT | the way to achieve security. Those things need to be | enforced at the widest level possible (OS, or language) | to have the needed effect. | dangerbird2 wrote: | It's not really complacency: it's that the standard library | is intentionally minimalistic to maintain portability and | backwards compatibility. If you want sensible string | handling, it's usually best to use a high level utility | library like GLib(https://developer.gnome.org/glib/stable/) | or Apache Portable Runtime(http://apr.apache.org/), or roll | your own safe string type (preferably non-null terminating) | yxhuvud wrote: | No, if you want sensible string handling, the sane choice | is usually to choose to use a language that is not C. Not | always, but definitely usually. | IgorPartola wrote: | It's not hard to have strings like you do in other | languages in C. It is hard when you treat _char foo[]_ as | if it was a string object like you have in JavaScript or | Java or Python. C strings are just chunks of memory | terminated by \0. They can still be mildly useful that | way but if you actually want to do string operations you | need to use a library designed for the problem (variable | length, storing length with the object, Unicode support, | etc.). 
Problem is that most people don't start with such | a library so they end up doing the hard work themselves | in an ad hoc manner. | | You can't fuck up _String("Hello ") + String("world")_ | but you can definitely fuck up _strcat(buf, "Hello "); | strcat(buf, "world");_. | Ar-Curunir wrote: | there's nothing inherently unportable about strings though. | ironmagma wrote: | Why do you need backward compatibility with a compiled | language? Other languages like Rust and JavaScript (even) | avoid that with a pragma tag on the source. | hctaw wrote: | Because not everything is recompiled from source. That's | why stable ABIs need to exist. | ironmagma wrote: | Good point, thanks. Could the headers contain the | pragmas? | hctaw wrote: | That assumes you have a header, which only exists at | compile time for the developer. The running program knows | nothing about it. | ironmagma wrote: | Why would a program need to know (e.g.) the details of | what system calls or stdlib functions that a procedure it | invokes uses? Aren't C functions pretty well separated | from each other except for the odd signal handler and | assuming a stable ABI? In my view most of the issues with | C are semantics within the function blocks. | rightbyte wrote: | The parameters and return value is not in the object | files. | oleganza wrote: | Notice that it's a giant PITA to work with any variable-length | data. Because language lacks adequate means to abstract away | safe fast memory access with generic types, RAII and borrow | checkers. Comparing to C, both C++ and Rust (very different | beasts) feel like pals of JavaScript: basic operations with | dynamic strings and arrays just work(tm). | frob wrote: | As someone who learned C as their first language, strings in | every single language after that have felt like cheating. | | "What? You mean I can type an arbitrary string and it works? I | don't need to worry about terminators or the amount of memory | I've allocated? 
You can concatenate two strings with +?!? What | is this magic?" | macintux wrote: | Yeah, every time I decide to play with C for nostalgia's | sake, I immediately get hung up on just how painful | everything is, especially strings. | | I still love C, but I'd do my best not to have to write | anything serious with it again. | munchbunny wrote: | The decision to make C strings null terminated with implied | length instead of length + blob continues to trip us up, 30+ | years later. There's a good reason the "safe" versions of those | functions all take length parameters. But way back when this | approach was chosen, I don't think the state of the art could | fully predict this outcome. | | But also, "strings" and "time" are actually very complex | concepts, and these functions operate on often outdated | assumptions about those underlying abstractions. | jrimbault wrote: | 30+ years -> 50+ years | | Funny mind thing to forget to increment counters each year. | segf4ult wrote: | C89 was 32 years ago, so I think saying 30+ years is fair. | lamontcg wrote: | Some of us learned C off of the original K&R book. | coliveira wrote: | Null terminated strings are remnants of an era when computers | had little memory available. So, at the time it seemed smart | to discard the length field and use a single byte-sized | terminator (null). If you are writing an operating system for | a machine with little memory to spare, this seems like a good | decision. Of course things are very different now when memory | is not a problem and the goal is safety. 
| Blikkentrekker wrote: | > _But also, "strings" and "time" are actually very complex | concepts, and these functions operate on often outdated | assumptions about those underlying abstractions._ | | Even in safer languages such as _Rust_, there are often | questions as to why certain string operations are either | impossible, or need to be quite complicated for a rather | simple operation and are then met with responses such as | "Did you know that the length of a string can grow from a | capitalization operation depending on locale settings of | environment variables?" | | _P.s._ : In fact, I would argue that strings are not | necessarily all that complicated, but simply that many assume | that they are simpler than they are, and that code that | handles them is thus written on such assumptions that the | length of a string remain the same after capitalization, or | that the result not be under influence of environment | variables. | munchbunny wrote: | > locale settings of environment variables | | Also known as "why does my code that parses floats fail in | Turkey?" | | Also also known as the discrepancy between a string's | length-as-in-bytes, its length-as-in-code-points, and its | length-as-in-how-humans-count-glyphs. | | Strings are hard. | kazinator wrote: | > _Why does my code that parses floats fail in Turkey_ | | Because you, or someone, called | fuck_my_program(); | | which is defined in "idiot.h" as #define | fuck_my_program() setlocale(LC_ALL, "") | | and the project is missing: #define | setlocale(x, y) BANNED(setlocale) | | Hope that helps! | retrac wrote: | For reasons that were never clearly articulated, the prefix | approach was considered odd, backwards, and to have numerous | downsides, at least where I learned C. In hindsight, I can | only cringe at that attitude. Strings as added in later | Pascal, about 40 years ago now, were memory safe in a way | that C strings still are not. 
| lordgroff wrote: | Oh Pascal, why couldn't we have had you instead. | kazinator wrote: | Pascal strings are not inherently memory safe: | cat_pascal_strings(pascalstr *uninited_memory, | pascalstr *left, pascalstr | *right); | | How big is uninited_memory? Can left and right fit into it? | | You need to design language constructs around Pascal strings | to make them actually safe. Such as, oh, make it impossible | to have an uninitialized such object. The object has to know | both its allocation size and the actual size of the string | stored in it. | | What is unsafe is constructing new objects in an anonymous | block of memory that knows nothing about its size. | | C programs run aground there not just with strings! | struct foo *ptr = malloc(sizeof ptr); // should be sizeof | *ptr!! if (ptr) { ptr->name = name; | ptr->frobosity = fr; | | Oops! The wrong sizeof allocated only the size of a | pointer: 4 or 8 bytes, typically nowadays, but the | structure is 48 bytes wide. | | "struct foo" itself isn't inferior to a Pascal RECORD; the | problem is coming from the wild and loose allocation side | of things. | | Working with strings in Pascal is relatively safe, but | painfully limiting. It's a dead end. You can't build | anything on top of it. Can you imagine trying to make a | run-time for a high level language in Pascal? You need to | be in the driver's seat regarding how strings work. | munchbunny wrote: | The prefix approach turns the neat "strings are just | character arrays are just pointers" pattern into something | a lot more clunky, because now you've got this really basic | data type that is actually a struct and now you have to | have an opinion on how wide the length value is and short | strings get a lot of memory overhead in just lengths, and | so on. | | In hindsight, I think the complexity is worth the safety, | but I could see why it felt more elegant to use null- | terminated strings at the time. 
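The point made above, that a safe string object has to know both its allocation size and the length currently stored in it, can be sketched as follows. This is a minimal illustration over caller-provided storage; the names bounded_str, bs_init and bs_append are hypothetical and not taken from Pascal, Git, or any library mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A string that carries its capacity and its current length. */
struct bounded_str {
    char  *data;  /* caller-provided storage */
    size_t cap;   /* total bytes available in data */
    size_t len;   /* bytes in use, not counting the terminator */
};

/* Attach storage and start with an empty, terminated string, so the
 * object is never observed in an uninitialized state. */
void bs_init(struct bounded_str *s, char *storage, size_t cap)
{
    assert(s != NULL && storage != NULL && cap > 0);
    s->data = storage;
    s->cap = cap;
    s->len = 0;
    s->data[0] = '\0';
}

/* Append src if it fits together with its terminator. Returns 0 on
 * success, or -1 with the string left unchanged when there is no room. */
int bs_append(struct bounded_str *s, const char *src)
{
    size_t n;

    assert(s != NULL && src != NULL);
    n = strlen(src);
    if (n >= s->cap - s->len)           /* need n bytes plus the NUL */
        return -1;
    memcpy(s->data + s->len, src, n + 1);
    s->len += n;
    return 0;
}
```

Because the bounds check lives inside bs_append, the caller cannot pass a wrong size the way it can with a separate length argument.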
| jdlshore wrote: | It's a classic case of moving the complexity from one | part of the system to another. "Strings are just | character arrays" seems simple and elegant, but in | reality is a giant mess, because strings are not just | character arrays, any more than dates are just an offset | from an epoch. | | Human concepts are inherently messy. "Elegant" solutions | just shove the mess down the road. | JoeAltmaier wrote: | Hey, languages used length,blob even when C was invented. | HP Access BASIC used that kind. | | It was a limitation, because they chose a byte length (to | save space). So strings up to 255 characters only. It was | decades before folks were comfortable with 32-bit length | fields. And that still limited you to 4GB strings. In the | bad old days, memory usage was king. | selfhoster11 wrote: | The funny thing is that you can just use the topmost bit | of the length to indicate that the string length is >127, | and chain as many length bytes as you want before you | begin the string proper (to save space). It would be | still a better encoding than a null at the end. | [deleted] | kazinator wrote: | The reason that the safe functions take length parameters is | that they produce a new object in uninitialized memory, a | pointer to which is specified by the caller. | | It has nothing to do with null termination. | | And _that_ uninitialized memory is not self-describing in any | way in the C language. Which is that way in machine language | also. | | This is a problem you have to bootstrap yourself somehow if | you are to have any higher level language. | | The machine just gives you a way to carve out blocks of | memory that don't know their own type or size. C doesn't | improve on that, but it is not the root cause of the | situation. Without C, you still have to somehow go from that | chaos to order. | | Copying two null terminated strings _into an existing null- | terminated string_ can be perfectly safe without any size | parameters. 
void replace_str(char *dest_str, | const char *src_left, const char *src_right); | | If dest_str is a string of 17 characters, we know we have 18 | bytes in which to catenate src_left and src_right. | | This is not very useful though. | | Now what might be a bit more useful would be if dest_str had | two sizes: the length of string currently stored in it, and | the size of the underlying storage. This particular operation | would ignore the former, and use the latter. It could replace | a string of three characters with a 27 character one. | amir734jj wrote: | Maybe instead of just writing a banned message, it should be the | name of alternative function to use. | [deleted] | 1337_d00dZ wrote: | In compilers that implement GCC extensions (such as Clang), you | can use the "poison" directive to achieve the same effect (but | with a better error message): | | #pragma GCC poison printf sprintf fprintf | | [0] https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Pragmas.html | shadowgovt wrote: | To its credit, it's convenient that the C pre-processor is so | powerful that it facilitates baking a "C the good parts" concept | directly into the compilation process. | rcgorton wrote: | But it isn't even April 1 yet! This is truly a BAAAD joke. So GIT | is not implemented in C? Or C++? | snvsn wrote: | Previous discussion: | https://news.ycombinator.com/item?id=20792938 | StillBored wrote: | These functions are one of the many reasons why I tend to have a | C with some C++ classes dialect I use in my own projects. | | std::string needs some tweaks, but it can mostly be treated as a | built in and it wipes out a huge set of C string issues. | jancsika wrote: | I love seeing "strncpy" right after "strcpy." | | If someone wants some fun, try this: | | 1. Slurp up all the FOSS projects that extend back to 90s or | early 2000s. | | 2. Filter by starting at earliest snapshot and finding | occurrences of strcpy and friends who don't have the "n" in the | middle. | | 3. 
For those occurrences, see which ones were "fixed" by changing | them to strncpy and friends in a later commit somewhere. | | 4. See if you can isolate that part of the code that has the | strncpy/etc. and run gcc on it. Gcc-- for certain cases (string | literals, I think)-- can report a warning if "n" has been set to | a value that could cause an overflow. | | I'm going to speculate that there was a period where C | programmers were furiously committing a large number of errors to | their codebases because the "n" stands for "safety." | commandlinefan wrote: | Ok, memcpy(dst, src, strlen(src)) it is then! | gilbetron wrote: | Meh, most of us understood the sharp edges of strings pretty | well. Before, we'd check the len of strings before strcpy, | strncpy let us do it without doing that, and just slap a 0 in | if needed. Safe? No. Better? A bit. Do I ever want to do string | manipulation again with C? Nope. | tomjakubowski wrote: | Understanding the sharp edges is one thing. Being able to | avoid them in practice is another. The history of memory | safety problems in C string handling, especially involving | strcpy/strncpy, strongly suggests to me that they're | unavoidable even for skilled, knowledgeable, and experienced | C programmers. | Luyt wrote: | It would be great if the BANNED() macro could suggest the correct | function to use. | tinus_hn wrote: | You could send a pull request, it doesn't seem too complicated | to implement | lmilcin wrote: | To respond to some of the comments. | | It is not that there is anything intrinsically wrong with these | functions. You can technically use all of them and I have been | using all of them, safely, for decades. | | The issue is they are huge traps to the point that in a larger | piece of software one can say "well, it's just not worth it". | | You can go much, much, much further than that. 
| | In a couple of embedded projects I worked on, some of the rules were: | | * dynamic allocation after application has started is banned -- | any heap buffers and data structures must be allocated at the | start of the application and after that any allocation is a | compile time error, | | * any constructs that would prevent statically calculating stack | usage were banned (for example any form of recursion except when | exact recursion depth is ensured statically), | | * any locks were banned, | | * absolutely every data structure must have size ensured, in a | simple way, beyond any reasonable doubt, | | etc. | whatisthiseven wrote: | It is interesting to read the rules you came up with to limit | memory usage, and then to think of the criticisms one gets in | Java for limiting memory usage. In Java we try to limit new as | much as possible to prevent the GC from pausing too much, or | inconveniently, or for too long. And basically all the rules | you list are what we also use in Java. | | Except when you have these rules in Java, the ironic counter- | point is "if you are doing this much memory control yourself, | you should just use C or C++ or something". | | I'll keep your comment in mind next time I see that rebuttal. | Thank you. | zwieback wrote: | The stack thing was always the big worry for me. Without a | comprehensive static code analysis tool that's hard to do. And | runtime stack checking adds quite a bit of overhead, especially | if you also have to worry about running on the interrupt stack | and possibly switching. | xondono wrote: | Anything enforcing MISRA has essentially (almost) no way of | allocating memory at runtime. | fsociety wrote: | It's funny, I worked exclusively with MISRA at the start of | my career. Eventually I started a job at a FAANG and received | quizzical comments on why I implemented a memory arena. | | The argument was to allocate memory freely and let it pool | memory as necessary.
Fair enough, it was simpler and fit the | standard expectation of development. | | The issue is that if you talk with the allocator team they | complain of not being able to fix performance issues fast | enough due to allocations firing off left and right in the | middle of a request. | | I never realized that my view of C programming is heavily | influenced by MISRA until your comment. | | I know game engine programming follows a similar, perhaps | unspoken, convention. | munchbunny wrote: | The lack of runtime allocations in game engine programming | comes from a different motivation: allocations are | expensive, garbage collections are expensive, cache | coherency matters, and you're chucking around a lot of very | similar looking objects, so... object pools! | orwin wrote: | Yeah, the first time we coded a scroller shooting game | with my friend (at school), we were baffled that our | terminal-based scroller lagged more than the raycaster we | did two weeks prior. Was it a C vs C++ thing? | | Turns out, creating then destroying every single | missile/enemy was extremely costly | hctaw wrote: | Custom allocators are quite common, it's not an arcane | convention. I think the rule of thumb is preallocate until | it gets questionable in complexity, then write your custom | allocator - and really it's only applicable to code with a | real-time deadline (hard or soft). Otherwise the system | allocator is going to be a lot smarter than yours once it | leaves microbenchmarks. | closeparen wrote: | How often does the dynamic allocation rule lead to an ad-hoc | allocator appearing inside the program? | | Also doesn't the OS lie? I thought the memory wasn't really | physically assigned until first use. | syncsynchalt wrote: | In my experience dynamic allocation is banned in either (a) | small embedded environments or (b) high scrutiny environments | (soft realtime, safety critical, etc). 
| | In both cases the project size is small enough, or the | scrutiny is high enough that the ad-hoc allocator doesn't | develop. The environment is also simple enough that the | memory cheats you're thinking of don't exist (or you can | squash them by touching all allocated memory up front). | at_a_remove wrote: | I have only ever dabbled in C, just to look at other people's | code and occasionally when I really needed speed, so I am at what | I would call a "Pretty Pathetic" level, able to recognize that I | am looking _at_ C. | | However, I look at old books on C, and then I look at this list, | and I wonder if it would not have been helpful to, after | mentioning that a function was banned, suggest what the | replacement is, even as a comment. ___________________________________________________________________ (page generated 2021-03-04 23:00 UTC)