[HN Gopher] Tell HN: C Experts Panel - Ask us anything about C ___________________________________________________________________ Hi HN, We are members of the C Standard Committee and associated C experts, who have collaborated on a new book called Effective C, which was discussed recently here: https://news.ycombinator.com/item?id=22716068. After that thread, dang invited me to do an AMA and I invited my colleagues so we upgraded it to an AUA. Ask us about C programming, the C Standard or C standardization, undefined behavior, and anything C-related! The book is still forthcoming, but it's available for pre-order and early access from No Starch Press: https://nostarch.com/Effective_C. Here's who we are: rseacord - Robert C. Seacord is a Technical Director at NCC Group, and author of the new book by No Starch Press "Effective C: An Introduction to Professional C Programming" and C Standards Committee (WG14) Expert. AaronBallman - Aaron Ballman is a compiler frontend engineer for GrammaTech, Inc. and works primarily on the static analysis tool, CodeSonar. He is also a frontend maintainer for Clang, a popular open source compiler for C, C++, and other languages. Aaron is an expert for the JTC1/SC22/WG14 C programming language and JTC1/SC22/WG21 C++ programming language standards committees and is a chapter author for Effective C. msebor - Martin Sebor is Principal Engineer at Red Hat and expert for the JTC1/SC22/WG14 C programming language and JTC1/SC22/WG21 C++ programming language standards committees and the official Technical Reviewer for Effective C. DougGwyn - Douglas Gwyn is Emeritus at US Army Research Laboratory and Member Emeritus for the JTC1/SC22/WG14 C programming language and a major contributor to Effective C. pascal_cuoq - Pascal Cuoq is the Chief Scientist at TrustInSoft and co-inventor of the Frama-C technology. Pascal was a reviewer for Effective C and author of a foreword part. 
NickDunn - Nick Dunn is a Principal Security Consultant at NCC Group, ethical hacker, software security tester, code reviewer, and major contributor to Effective C. Fire away with your questions and comments about C! Author : rseacord Score : 604 points Date : 2020-04-14 13:03 UTC (9 hours ago) | papermachete wrote: | How and why will C combat Rust? | pascal_cuoq wrote: | In my opinion, the two languages are going to co-exist for a | long time. C has billions of lines of legacy software written | in it... In recent news, COBOL developers were sought after in | order to update existing COBOL software, so the same thing will | happen with C, perhaps to the end of humanity (I have become | pessimistic as to humanity's future). | | There are pieces of software that should be given priority for | a rewrite in Rust, but most of C software is never going to be | rewritten, because there is simply too much of it. | | Therefore, even if C did not have any advantage of its own over | Rust, there would still be legacy software to maintain and to | extend. | | The advantages of C include that sometimes, an embedded | processor with a proprietary instruction set is provided by the | chipmaker with its own C compiler, which is the only compiler | supporting the instruction set; that C is still currently used | to write the runtimes of higher-level languages (I'm familiar | with OCaml, but it isn't too much of a stretch to imagine that | the runtimes of Python, Haskell,... are also written in C). | cesarb wrote: | > In my opinion, the two languages are going to co-exist for | a long time. | | It goes deeper than that, in a couple of places Rust depends | on the C standard: the fixed-layout `#[repr(C)]` structs | (without that attribute, the compiler is free to reorder the | struct fields; with that attribute, it's laid out the way C | would do it), and the `extern "C"` function call ABI. 
The way | to call any other language from Rust, or Rust from any other | language, is to go through `extern "C"` functions passing | `#[repr(C)]` structs. So even if the C language dies one day, | parts of it will live in Rust forever (or as long as the Rust | language lives). | sramsay wrote: | There's tons of legacy C around, we have to maintain it, it's | not ideal unless you're on some niche platform, lots of stuff | should probably be written in a better language . . . | | I sincerely hope this is not the general attitude of the | standards committee. Some of us actually _prefer_ C, and | would like to see the language continue to flourish. | pascal_cuoq wrote: | Note that among the C experts participating in this AMA, I | am not one who is in the standardization committee. At | 14:59 EDT, just before the AMA was posted, we were joking | between ourselves about me having to post this disclaimer | but I guess there was a hidden truth in the joke. | rseacord wrote: | C is a pretty well established language, so this question | should probably be asked the other way around. C was primarily | designed to complete with FORTRAN. | ape4 wrote: | Rust has a package manager while C and C++ don't (as far as I | now). This alone make Rust more attractive for some projects. I | hope C and C++ get one. | clarry wrote: | Open up WG14 mailing list for non-members? | | It's hard to appreciate what's going on at WG14 (or take part) | when you can see the results only from afar, with none of the | surrounding discussion. | | I recently read Jens Gustedt's blog on C2x where he casually | recommended this as a way to get involved: "The best is to get | involved in the standard's process by adhering to your national | standards body, come to the WG14 meetings and/or subscribing to | the committee's mailing list." | | Afaict (from browsing the wg14 site), the mailing list and its | archives are not open to access. | | https://webcache.googleusercontent.com/search?q=cache:TnEGL4... 
| | EDIT: In general, how is one supposed to approach wg14 with ideas | or need for clarification on the standard's wording / | interpretation? | AaronBallman wrote: | > In general, how is one supposed to approach wg14 with ideas | or need for clarification on the standard's wording / | interpretation? | | I'm currently working on an update to the committee website to | clarify exactly this sort of thing! Unfortunately, the update | is not live yet, but it should hopefully be up Soon(tm). | | Currently, the approach for clarifications and ideas both | require you to find someone on the committee to ask the | question or champion your proposal for you. We hope to improve | this process as part of this website update to make it easier | for community collaboration. | DougGwyn wrote: | In general, the committee accepts what we used to call "defect | reports" (now something like "requests for improvement"), | assigns them "WG14 series" sequence numbers, and upon requests | for "floor time" schedules meeting discussions. Occasional | votes are taken, which might trigger modifications to the draft | standard. At some point, the committee decides that the updated | draft standard is ready for public review, and the various | national representatives deal with review comments. All this | starts with proposal documents in "WG14 series" form. | ori_b wrote: | Agreed. I would like to get involved, but I don't see any | reasonable way for me to do that as an individual. | aray wrote: | When do you think we will get an update to C11 or more recent | version of C to MISRA? Do you all have any influence on "Safety | Critical C" standards? | AaronBallman wrote: | The MISRA committee is a separate organization from the C | standards committee, but there is overlap between the two | groups and an official liaison process for the committees to | collaborate. So there's a bit of bidirectional influence | between the two groups. 
| | I am not on the MISRA committee, but I believe they talk a bit | about their public roadmap in this video: | https://vimeo.com/190304951 | shric wrote: | Is there a rule that any new proposals must already be a feature | in an existing major implementation? | beefhash wrote: | (Not one of the OPs:) Wasn't C11 Annex K, the notoriously | failed bounds-checking interfaces, an example of not having an | existing implementation? | AaronBallman wrote: | Annex K had an existing implementation from Microsoft. It | wasn't a fully conforming implementation when C11 shipped, | however (the specification drifted apart from the initial | implementation). | AaronBallman wrote: | Yes, the C2x charter has this requirement: | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm | shric wrote: | Thanks, so from "Only those features that have a history and | are in common use by a commercial implementation should be | considered", this precludes stuff that may only exist in | clang, gcc, glibc, etc.? If so, why? | parenthesis wrote: | You could interpret that as "in common use by a | commercial[ly used] implementation". | AaronBallman wrote: | I wouldn't read into "commercial" there, I think we meant | "production-quality" instead. (We should fix that!) | | Basically, we prefer seeing features that real users have | used as opposed to an experimental branch of a compiler | that doesn't have usage experience. Knowing it can be | implemented is one thing, but knowing users want to use it | is more compelling. | hellofunk wrote: | I really like the relative simplicity of C compared to C++ and | recently wrote a project in C, but eventually rewrote it in C++ | for just a few seemingly trivial reasons that nonetheless were | important time savers. 
I'd love to know if the C standard, as can | run on GPUs also, will ever evolve to offer: | | 1) namespaces, so function names don't need to be 30 characters | to avoid naming collision | | 2) guaranteed copy elision or RVO -- provides greater confidence | for common idioms and expressivity compared to passing out | parameters | AvImd wrote: | Is there a possibility there will be introduced a new rule saying | "if the compiler detects an UB it should abort the compilation | instead of breaking the code in the most incomprehensible way | possible"? | | Right now it's just scary to start a new project in C. It would | be really great if there was more emphasis on correctness of the | produced code instead of the insane optimizations. | DougGwyn wrote: | Try using "lint" or other code checkers. | Someone wrote: | int i; [...] i += 1; | | potentially is undefined behavior; _i_ could overflow. | | Compilers nowadays are fairly good at warning about _definite_ | undefined behavior. | | I don't think anybody would be happy with a compiler that | aborted on all _potential_ undefined behavior. That would | (almost) be equivalent to banning the use of all signed ints. | AaronBallman wrote: | It would be _wonderful_ (IMO) if we could get to that point, | but that would leave implementations with too great of a burden | because many forms of UB can only be caught at runtime (without | a considerable number of false positives). Generally, the C | committee makes things a "constraint violation" (aka, we would | like implementations to err) whenever something can be caught | at compile time, and we leave the undefined behavior hammer for | scenarios where there is not a reasonable alternative. | | Thankfully, there are a lot of tools to help developers catch | UB these days (UBSan, static analyzers, valgrind, etc). I would | recommend using those tools whenever starting a new project in | C (or C++, for that matter). 
| klodolph wrote: | This can only be done at compile time in very specific cases. | The huge problem here is the compiler has no way of knowing | which cases of undefined behavior are _bugs in the program_ and | which cases of undefined behavior are just examples of | unreachable code. If the compiler aborted compilation when it | detected undefined behavior, you'd be getting a lot of false | positives for unreachable code, and you'd need to solve that | problem (figuring out how to generate sensible errors and | suppress them). This is not even remotely easy. | | If you are concerned about safety there are ways to achieve | that, like using MISRA C, formally verifying your C, or by | writing another language like Rust. | kzrdude wrote: | Good point, but could it not be required that the unreachable | code would be annotated to be unreachable? It could even have | a (development only) assertion in the location. | klodolph wrote: | That would be an immense undertaking. It's not really just | that some statement or expression is unreachable (we have | __builtin_unreachable() in GCC for stuff like that) but | that certain states are unreachable. | | For example, int buffer_len(struct buffer | *buf) { return buf->end - buf->start; } | | There are at least three states that trigger undefined | behavior: buf is not a valid pointer, buf->end - buf->start | doesn't fit in int, and buf->end and buf->start don't point | to the same object. | | I'm not sure how you would annotate this. At the function | call site, you would somehow need to show that buf is a | valid pointer, and that start/end point to same object and | the difference fits in an int. It would start looking more | like Coq or Agda than C. | | Honestly, I think if you really want this kind of safety, | your options are to use formal methods or switch to a | different language. | | There's also this weird assumption here that the compiler | detects undefined behavior in your program and then mangles | it. 
It's really the opposite--the compiler assumes that | there is no undefined behavior in your program, and | optimizes accordingly. In practice you can turn | optimizations off and get something much closer to the | "machine model" of C (which doesn't really exist anyway) | but most people hate it because their code is too slow. | kzrdude wrote: | Thanks, so it's definitely easier said than done! Good | explanation. | AvImd wrote: | > If the compiler aborted compilation when it detected | undefined behavior, you'd be getting a lot of false positives | for unreachable code | | Could you please provide an example of this? | toasted_flakes wrote: | Overflow of signed integers is undefined. | int add(int a, int b) { return a + b; } | | Unless the compiler can prove that `add` is never called | with a and b values resulting in an overflow, this code can | lead to UB, and, under your rules, the compilation aborts. | msebor wrote: | Some implementations have been making a lot of effort to do | just that. GCC in particular has been adding these types of | checks (either as warnings or sanitizers) in recent years and | although there is still much to improve I'd like to think we | have made good progress. | | Adding a rule requiring implementations to error out in cases | of undefined behavior would be hard to specify in the standard. | It could (and in my view should) be done by providing | non-normative encouragement as "Recommended Practice." | mcguire wrote: | Any chance of getting something like Frama-C officially blessed? | Jahak wrote: | Tell me where I can get the C89 standard for free (pdf or other | formats) | pascal_cuoq wrote: | The last time I needed it, archive.org had a link to a PDF of | it. | | I couldn't find that again in one minute, but here is the text | version: | http://web.archive.org/web/20030222051144/http://home.earthl... 
| Jahak wrote: | Thanks | grok22 wrote: | Things I would like C to have: | | - stricter type-checks on typedef types (useful when passing | function parameters) - gcc's 'warn_unused_result' attribute for | functions (ensure error returns are checked) - on-entry/on-exit | qualifiers for functions (to do things like make sure you | lock/unlock semaphores for instance before entry/exit of | function) - D language's 'scope' feature (better handling of | error path) - loops in the c pre-processor! (better code-gen) | | Any chance any of this is on the radar for the next-gen C | standard? Some of these are just ergonomics, but the first two | might have saved me some grief a few times. | _kst_ wrote: | typedef, in spite of the name, doesn't create a new type. It | only creates a new name for an existing type. Changing that | would break existing code. | | I wouldn't mind seeing a new feature that _does_ define a new | type (one that's identical to, but incompatible with, an | existing type), but we can't call it "typedef". | | In a sense that feature already exists. You can define a | structure with a single member of an existing type. But you | have to refer to the member by name to do anything with it. | arunc wrote: | What do you think about D language's mode to work as a better C | alternative[0]? It seems to even do printf format validation. Can | this be the future of C? | | [0] https://dlang.org/spec/betterc.html | begriffs wrote: | Has there been a survey to determine what percentage of known | compilers support each C version, like C89, C99, C11? I've been | sticking to C99 because I assumed later versions won't be widely | adopted for a long time to come. Is this accurate? | DougGwyn wrote: | There is a Web page I saw a few days ago that does that, | probably findable by grepping Wikipedia. Unfortunately I forget | its URL. | emilfihlman wrote: | Will you ever add / have you considered adding sane formatting | options for fixed length variables in printf? 
Say %u32 or %s64 ? | | Have you considered adding access to structure members by index | or by string name? Have you considered dynamic structures? | rmind wrote: | Just FYI -- there are macros for the fixed-length types, e.g.: | printf("U32: %" PRIu23 ", U64: " PRId64, (uint32_t)1, | (int64_t)2); | | Perhaps not as handy as %u32 or %s64, but it's here. | emilfihlman wrote: | Yeah, and the issue is with those macros exactly. It makes | writing code on them really damn annoying and it relies on C | constant string concatenation, breaks the flow quite a lot. | _kst_ wrote: | Which is why I usually convert to intmax_t or uintmax_t, or | to some type that I know is wide enough: | uint64_t foo = ...; printf("foo = %ju\n", | (uintmax_t)foo); /* OR */ printf("foo = | %llu\n", (unsigned long long)foo); | michaelt wrote: | I think what emilfihlman means is those macros are hard to | remember and clumsy to use - which you might agree with when | I point out you made two mistakes in two usages :-p | [deleted] | AaronBallman wrote: | > Will you ever add / have you considered adding sane | formatting options for fixed length variables in printf? Say | %u32 or %s64 ? | | I'm not certain about the historical answer to this, but I do | know that we're currently considering a proposal to introduce | an exact bit-width integer type '_ExtInt(N)' to the language, | and how to handle format specifiers for it is part of those | discussions, so we are considering some changes in this area. | | > Have you considered adding access to structure members by | index or by string name? Have you considered dynamic | structures? | | I don't recall seeing any such proposals. I'm not familiar with | the term "dynamic structures", what do you have in mind there? | emilfihlman wrote: | >and how to handle format specifiers for it is part of those | discussions, so we are considering some changes in this area. 
| | Please, please, please pick short and descriptive format | specifiers, like %[suf]\d+, ie s64 | v=somenumber; printf("%s64\n", v); | | _ExtInt(N) and PRIx64 etc look absolutely horrid. u?int\d+_t | are also really bad, it would be great to have just [suf]\d+ | as types, where \d+ is 8, 16, 32, 64 for [us] and 32 and 64 | for f. | | >what do you have in mind there? | | Say like VLAs but structures with members that are | dynamically defined and used. | AaronBallman wrote: | > Please, please, please pick short and descriptive format | specifiers, like %[su]\d+, ie | | That's my personal preference as well. Using the PRI macros | always makes me feel sad. | | > Say like VLAs but structures with members that are | dynamically defined and used. | | Ah, no, I don't recall any proposals along those lines. | It's an interesting idea, and I'd be curious what the | runtime performance characteristics would be vs what kind | of new coding patterns would emerge that you couldn't do | previously though! | bumblebritches5 wrote: | Hey guys, | | How likely would the standard be to accept a proposal to add | compile time reflection to the preprocessor, or even adopt C++'s | constexpr? | | My use case is creating a global array in a header from static | compound literals in multiple source files at compile time, and | outside of some crazy clang-tblgen type solution, or very | platform specific linker hacks, it's completely unsupported by C. | dhhwrongagain wrote: | Is memset(malloc(0), 0, 0) undefined behavior? | DougGwyn wrote: | Let's assume the types have been corrected. malloc((size_t)0) | behavior is defined by the implementation; there are two | choices: (a) always returns a null pointer; or (b) acts like | malloc((size_t)1) which can allocate or fail, and if it | allocates then the program shall not try to reference anything | through the returned non-null pointer. 
Now, memset itself is | required (among other things) to be given as its first argument | a valid pointer to a byte array. In particular, it shall not be | a null pointer. Tracking through the conformance requirements, | if the malloc call returns a null pointer then the behavior is | undefined. Thus, you should not program like this. | hsivonen wrote: | Does the committee have plans to deprecate (as in: give compiler | license to complain such that compiler developers can appeal to | the standard when users complain back) locale-sensitive functions | like isdigit, which is useless for processing protocol syntax, | because it is locale-sensitive, and useless for processing | natural-language text, because it examines only one UTF-8 code | unit? | DougGwyn wrote: | isdigit is likely to remain, because much existing code does | use it (perhaps in different contexts from the one you cited). | If you need a different function specification to do something | different, it could be added in a future release, but that | doesn't mean that we need to force programmers to change their | existing code. | hsivonen wrote: | Does there exist a use case in portable code such that use of | isdigit is not a bug? | | How does the committee view non-portable existing code | generally when considering changes? | DougGwyn wrote: | Code can be non-portable for various reasons, not all of | them bad. I just grepped a recent release of DWB and found | about 100 uses of isdigit, most of which were not input | from random text but rather were used internally, such as | "register" names (limited to a specified range). Other | packages are likely to have similar usage patterns. I | really don't want to have to edit that code just for | aesthetics. | _kst_ wrote: | What about giving isdigit and friends defined behavior for | any argument value that's within the range of any of char, | signed char, or unsigned char? 
| | The background (I know Doug knows this): isdigit() takes an | argument of type int, which is required to be either within | the range of unsigned char, or have the value EOF (required | to be negative, typically -1). | | The problem: plain char is often signed, typically with a | range of -128..+127. You might have a negative char value in | a string -- but passing any negative value other than EOF to | isdigit() has undefined behavior. Thus to use isdigit() | safely on arbitrary data, you have to cast the argument to | unsigned char: if (isdigit((unsigned | char)s[i])) ... | | A lot of C programmers aren't aware of this and will pass | arbitrary char values to isdigit() and friends -- which works | fine most of the time, but risks going kaboom. | | Changing this could raise issues if -1 is a valid character | value and also the value of EOF, but practically speaking -1 | or 0xff will almost never be a digit in any real-world | character set. (It's ÿ in Unicode and Latin-1, which might | cause problems for islower and isalnum.) | bonzini wrote: | Is there any reason to keep the undefined behavior for shifts of | negative numbers, instead of making it implementation defined? | Most compilers (for twos-complement architectures at least) are | not using that latitude, and I would also guess that most | programs that are written for twos-complement arithmetic are | likewise not expecting undefined behavior for non-overflowing | left shifts of negative numbers. Thanks! | DougGwyn wrote: | "Implementation-defined" is a nuisance, because then you need | to add code for all the variations, which also requires a set | of standard macros, etc. It is easier and less trouble-prone to | just avoid using the currently undefined behavior. | hyc_symas wrote: | The standard string library is still pretty bad. This would have | been a much better addition for safe strcpy. 
| | Safe strcpy char *stecpy(char *d, const char | *s, const char *e) { while (d < e && *s) | *d++ = *s++; if (d < e) *d = '\0'; | return d; } int main(void) { char buf[64]; | char *ptr, *end = buf+sizeof(buf) ; ptr = | stecpy(buf, "hello", end); ptr = stecpy(ptr, " world", | end); } | | Existing solutions are still error-prone, requiring continual | recalculation of buffer len after each use in a long sequence, | when the only thing that matters is where the buffer ends, which | is effectively a constant across multiple calls. | | What are the chances of getting something like this added to the | standard library? | pascal_cuoq wrote: | For what it's worth, I personally like this approach, because | there are some cases in which it requires less arithmetic in | order to be used correctly. And it lends itself better to some | forms of static analysis, for similar reasons, in the following | sense: | | There is the problem of detecting that the function overflows | despite being a "safe" function. And there is the problem of | precisely predicting what happens after the call, because there | might be an undefined behavior in that part of the execution. | When writing to, say, a member of a struct, you pass the | address of the next member and the analyzer can safely assume | that that member and the following ones are not modified. With | a function that receives a length, the analyzer has to detect | that if the pointer passed points 5 bytes before the end of the | destination, the accompanying size is 5, if the pointer points | 4 bytes before the end the accompanying size is 4, etc. | | This is a much more difficult problem, and as soon as the | analyzer fails to capture this information, it appears that the | safe function a) might not be called safely and b) might | overwrite the following members of the struct. | | a) is a false positive, and b) generally implies tons of false | positives in the remainder of the analysis. 
| | (In this discussion I assume that you want to allow a call to a | memory function to access several members of a struct. You can | also choose to forbid this, but then you run into a different | problem, which is that C programs do this on purpose more often | than you'd think.) | doublesCs wrote: | What's wrong with: *p += sprintf(*p, | "hello"); *p += sprintf(*p, "world"); | ftvy wrote: | It looks like you'd be dereferencing the pointer p, but you'd | also need to make sure that what p points to has enough | memory. | pjscott wrote: | That could lead to buffer overflow. | doublesCs wrote: | When I wrote that, I had in mind the observation about | continued recalculation of buffer len. My suggestion has no | such thing. It looks so good that I imagine this was | probably how it was intended to be used. With that in mind, | isn't it the user's job to know the size of the buffers | he's using? Doesn't expecting that the function know about | buffer size go against the single responsibility principle? | | I'm new to C, in case you couldn't tell. | clarry wrote: | > With that in mind, isn't it the user's job to know the | size of the buffers he's using? | | Yes. The user knows the size of his buffer, and then | passes that knowledge on to the string constructing | functions so that they do not overflow the buffer. | | > Doesn't expecting that the function know about buffer | size go against the single responsibility principle? | | What's single responsibility again? "Execute this one | assembly instruction"? | | What you want from standard library functions is, | usually, "construct a string into this buffer (whose size | is N)." | pascal_cuoq wrote: | The problem in practice is that you do not write "hello" | and "world" to the destination buffer. You write data | that is computed more or less directly from user inputs. | Often a malicious user. | | So the user only needs to find a way to make the data | longer than the developer expected. 
This may be very | simple: the developer may have written a screensaver to | accept 20 characters for a password, because who has a | longer password than this? Everyone knows that only the | first 8 characters matter anyway. (This may have been | literally true a long time ago, I think, although it's | terrible design. Anyway only 8 characters of hash were | stored, so in a sense characters after the first 8 did | not buy you as much security as the first 8, even if it | was not literally true.) | | And this is how there were screensavers that, when you | input ~500 characters into the password field, would | simply crash and leave the applications they were hiding | visible and ready for user input. This is an actual | security bug that has happened in actual Unix | screensavers. The screensavers were written in C. | | And long story short, we have been having the exact same | problem approximately once a week for the last 25 years. | Many people agree that it is urgent to finally fix this, | especially as the consequences are getting worse and | worse as computers are more connected. | | One solution that some favor is functions that make it | easier not to overflow buffers because you tell them the | size of the buffer instead of trying to guess in advance | how much is enough for all possible data that may be | written in the buffer. This is the thing being discussed | in this thread. The function sprintf is not a contender | in this discussion. The function snprintf could be, if | used wisely, but it is a bit unwieldy and the OP's | proposal has a specific advantage: you compute the end | pointer only once, because this is the invariant. | wahern wrote: | Perhaps you meant snprintf. But snprintf can fail on | allocation failure, fail if the buffer size is > INT_MAX, and | in general isn't very light weight--last time I checked | glibc, snprintf was a thin wrapper around the printf | machinery and is not for the faint of heart--e.g. 
| initializing a proxy FILE object, lots of malloc interspersed | with attempts to avoid malloc by using alloca. | | It can also fail on bad format specifiers--not directly | relevant here except that it forces snprintf to have a | signed return value, and mixing signed (the return value) and | unsigned (the size limit parameter) types is usually bad | hygiene, especially in interfaces intended to obviate buffer | overflows. | spc476 wrote: | Well, that should be `snprintf()` to start with, but even | with that, there are issues. The return type of `snprintf()` | is `int`, so it can return a negative value if there was some | error, so you have to check for that case. That out of the | way, a positive return value is (and I'm quoting from the man | page on my system) "[i]f the output was truncated due to this | limit then the return value is the number of characters which | would have been written to the final string if enough space | had been available." So to safely use `snprintf()` the code | would look something like: int size = | snprintf(NULL,0,"some format string blah blah ..."); | if (size < 0) error(); if (size == INT_MAX) | error(); // because we need one more byte to store the NUL | byte size++; char *p = malloc(size); | if (p == NULL) error(); int newsize = | snprintf(p,size,"some format string blah blabh ... "); | if (newsize < 0) error(); if (newsize > size) | { // ... um ... we still got truncated? } | | Yes, using NULL with `snprintf()` if the size is 0 is allowed | by C99 (I just checked the spec). | | One thing I've noticed about the C standard library is that | it seems averse to functions allocating memory (outside of | `malloc()`, `calloc()` and `realloc()`). I wonder if this has | something to do with embedded systems? 
| msebor wrote: | There are many improved versions of string APIs out there, too | many in fact to choose from, and most suffer from one flaw or | another, depending on one's point of view. 
Most of my recent | proposals to incorporate some that do solve some of the most | glaring problems and that have been widely available for a | decade or more and are even parts of other standards (POSIX) | have been rejected by the committee. I think only memccpy and | strdup and strndup were added for C2X. (See | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2349.htm for an overview.) | AceJohnny2 wrote: | > _Most of my recent proposals [...] have been rejected by | the committee._ | | Does anyone have insight on why? | [deleted] | clarry wrote: | 1. Are there any plans for standardizing empty initializer lists? | struct foo { int a; void *p; }; struct foo f = {0}; | /* legal C, f.p initialized like a static variable */ | struct foo f = {}; /* not legal but supported by gcc */ | | To me it would make sense that there is no need to specify a | value for any of the members that are intended to be initialized | exactly like static variables (and the first member is not | special so I shouldn't have to explicitly assign a zero?). | However the syntax currently demands at least one initializer. | | -- | | 2. I recall seeing a proposal for allowing declarations after | case labels: switch (foo) { case 1: | int var; /* ... */ } | | This is currently not allowed and you'd have to wrap the lines | after case in braces, or insert a semicolon after the case label. | Is this making it to c2x? | | -- | | 3. I've run into some recent controversy w.r.t. having multiple | functions called main (and this has come up in production code). | In particular, I ran into a program that has a static | main() function (with parameters that are not void or int and | char *[]), which is not intended to be *the* main function that | is the program's entry point. | | gcc warns about this because the parameters disagree with what's | prescribed for the program entry point. It's not clear to me | whether this is intended to be legal or not. | | -- | | 4. 
Looking at the requirements for main brings up another | question: it says how main should be defined (no static or extern | keyword). However, the definition could be preceded by a static | declaration, which then affects the definition that follows: | | _If the declaration of an identifier for a function has no | storage-class specifier, its linkage is determined exactly as if | it were declared with the storage-class specifier extern._ | | _For an identifier declared with the storage-class specifier | extern in a scope in which a prior declaration of that identifier | is visible, if the prior declaration specifies internal or | external linkage, the linkage of the identifier at the later | declaration is the same as the linkage specified at the prior | declaration._ | | Therefore, it is possible to have a main function with internal | linkage and a definition that exactly matches the one given in | the spec: static int main(int, char *[]); | int main(int argc, char *argv[]) { /* ... */ } | | As one might guess, this program doesn't make it through the | linker when compiled with gcc. Is this supposed to be legal? | Should the spec perhaps require main to have external linkage, | and then allow other functions called main with internal linkage | (and parameters that do not match what is required of the | external one)? | | EDIT: --- | | Are the fixes w.r.t. reserved identifiers going to make it in | c2x? Can I finally have a function called toilet() without | undefined behavior? | potiuper wrote: | Any plans to add semantics for exceptional situations such as | divide by zero and dereferencing a null pointer? | https://blog.regehr.org/archives/232 | | Or incorporating features from this 14 item list? | https://blog.regehr.org/archives/1180 | | As it appears these have failed: | https://blog.regehr.org/archives/1287 | DougGwyn wrote: | The problem is that if the checks are always performed, the | object code is significantly slowed down. 
If all computers | supported the checking in hardware, then we could do it. You | don't really want the current C approach (signal) to trigger | except in an emergency, because there is no way to insert | cleanup/retry/etc. recovery code via a signal handler. | rseacord wrote: | I don't know of any plans to add semantics for divide-by-zero | or dereferencing a null pointer. I'm guessing this is not | viable because there are no agreed-upon semantics among | different implementations. | | Making C friendlier is always a good idea, and I think the | committee is (slowly) working towards this goal. I would have | to examine these papers by John Regehr in more detail. Looking | quickly at his proposals I can see why he couldn't find | consensus for these ideas as some of them do appear | controversial. | | An example of a friendly dialect of C is C0 | (C-naught) from CMU. I don't think I'm exaggerating when I say | that this language has not "caught on". | rurban wrote: | 1. When will we get proper strings in the stdlib? | | 2. When will we get the Secure Annex K extensions? | | 3. When will we get mandatory warnings when the compiler decides | to throw away statements it thinks it doesn't need? Like memset | or assignments. Compilers are getting worse and worse, and | certainly not better. | | ad 1) Strings are Unicode nowadays, not ASCII. Nobody uses wchar | but Microsoft. Everybody else is using utf8, but there's nothing | in the standard. Not even search functions with proper casing | rules and normalization. Searching for strings should be | basic enough. | | ad 2) The usual glibc answer is just bollocks. You either do | compile-time bounds checks or you don't. But when you don't, you | have to do it at runtime. So it's either the compiler's job, or | the stdlib's job. But certainly not the user's. | rseacord wrote: | For (2) I guess it depends. Annex K is obviously already a part | of the standard so it depends on the implementation. 
There is a | push to eliminate Annex K altogether from the C Standard. If | this push fails, it may be the case that more libraries will | add support for this optional feature of the language. In the | meantime, there are the Open Watcom compiler implementation | [1], the Safe C Library [2], and Slibc [3]. | | [1] Watcom C Library Reference Version 1.8. Open Watcom. 2008. | ftp://ftp.openwatcom.org/manuals/current/clib.pdf | | [2] Safe C Library -- A full implementation of Annex K | https://github.com/rurban/safeclib/ | | [3] slibc https://code.google.com/archive/p/slibc/ | rseacord wrote: | For (3) mandatory warnings the closest thing is probably | ISO/IEC TS 17961:2013. The purpose of ISO/IEC TS 17961 is to | establish a baseline set of requirements for analyzers, | including static analysis tools and C language compilers, to be | applied by vendors that wish to diagnose insecure code beyond | the requirements of the language standard. All rules are meant | to be enforceable by static analysis. The criterion for | selecting these rules is that analyzers that implement these | rules must be able to effectively discover secure coding errors | without generating excessive false positives. | rseacord wrote: | Going to try to answer these separately. For (1) if you mean | strings that are primitive types my guess is never. We had an | hour-long discussion on this topic at a London meeting where we were | discussing new features for C11 and my takeaway was that this | would never happen because it would require a significant | change to the memory model for the language. | rurban wrote: | For the u8 type sure. Nobody needs a new type. | | But at least wcsnorm and wcsfc, as I implemented them in | safeclib, are required. Not even coreutils, grep, awk, ... | can search unicode strings. | | And u8 library variants of str* and wcs* are definitely | needed, maybe just with uchar* not char*. | DougGwyn wrote: | Why would the utilities not handle unicode searching? 
| Unicode characters match properly, the null terminator | works the same, and non-ANSI codes are just one or more | random 8-bit values which can be compared, copied, etc. | [deleted] | dboon wrote: | What are two or three C codebases that are elegantly and cleanly | written, and that every mid-level C programmer should read for | the sake of knowledge? | pascal_cuoq wrote: | I would recommend musl, although the style is a bit | idiosyncratic in places: https://www.musl-libc.org | | Mbed TLS, since I have it in mind from another thread, is also | a pretty clean C library for the problem it tries to solve; | it's a testament to its design that we (TrustInSoft, who had | not participated in its development) were able to verify that | some uses of the library were free of Undefined Behavior: | https://tls.mbed.org | uasm wrote: | > "I would recommend musl, although the style is a bit | idiosyncratic in places: https://www.musl-libc.org" | | Opened a random part of musl out of sheer boredom. Here's | what I see: | | https://git.musl-libc.org/cgit/musl/tree/include/aio.h | | A bunch of return codes #defined like so (see | https://git.musl-libc.org/cgit/musl/tree/src/aio/aio.c): | | #define AIO_CANCELED 0 | #define AIO_NOTCANCELED 1 | #define AIO_ALLDONE 2 | | #define LIO_READ 0 | #define LIO_WRITE 1 | #define LIO_NOP 2 | | #define LIO_WAIT 0 | #define LIO_NOWAIT 1 | | Why weren't they using an enum instead? I wouldn't sign off | on this code (and I don't think it lives up to best | practices). | pdw wrote: | musl is implementing POSIX. POSIX requires those constants | to be preprocessor defines. (Generally, musl assumes the | reader is quite familiar with the C and POSIX standards, | which makes sense since it's a libc implementation.) | rvp-x wrote: | A lot of you seem to be working on commercial solutions to C's | insecurity. Does this feel like a conflict of interest to you? 
| pascal_cuoq wrote: | I have been told in this very AMA that I lacked enthusiasm | about C (and the gratuitous insecurity of the language when we | know that a well-designed type system and a few runtime checks | solve the problem entirely is indeed the reason for my | perceived lack of enthusiasm): | https://news.ycombinator.com/item?id=22865912 | | I hope that this perceived lack of enthusiasm means I am | handling the conflict of interest honorably. | rseacord wrote: | Good question, but not at all! I've been working as hard as I | can for the past 15 years to improve C Language security as | have other security-minded members of the committee. Generally | speaking, we are in the minority as performance is still the | major driver for the language. Any security solution that | introduces > 5% overhead, for example, is a nonstarter. I think | we all understand that our jobs are completely safe no matter | what security improvements we can get adopted. | | The committee works a lot like lobbying. A minority of people with a | large financial interest in the technology (such as compiler | writers) have undue influence because they participate in the | process. I always encourage C language users to take a more | active role, but they usually don't. Cisco is an example of | a user community that actively takes part in C Standardization. | pjmlp wrote: | I guess this is why vendors like Apple, Oracle, ARM and | Google end up going the hardware memory tagging route | instead. | WalterBright wrote: | I wrote about a simple addition to C that could eliminate most | buffer overflows: | | https://www.digitalmars.com/articles/C-biggest-mistake.html | | I.e. offering a way that arrays won't automatically decay to | pointers when passed as a function parameter. | quelsolaar wrote: | Arrays are pointers. If they aren't pointers then you need to | copy the data when you are giving an array as a function | parameter. That's a lot slower. 
Being able to prepare a set of | data in an array and then giving a pointer to a function is | very useful. You could add a second type of array on top of | what you have in C that includes more stuff, but if that's what | you want you can implement that yourself with a struct. | napsy wrote: | An array is not a pointer. These are completely different | data types. For example, you can't apply pointer arithmetic | to arrays without casting them to pointers. | WalterBright wrote: | That's right. They are converted to pointers when passed to | a function, even if the function declares the parameter as | an array. | napsy wrote: | They're not converted but can be implicitly cast to | pointer types. | _kst_ wrote: | No, they're converted. There is no such thing as an | "implicit cast". And it's not specific to arguments in | function calls. | | Array types and pointer types are distinct. | | An expression of array type is, in most but not all | contexts, implicitly converted (really more of a | compile-time adjustment) to an expression of pointer type that | yields the address of the 0th element of the array | object. The exceptions are when the array expression is | the operand of a unary & (address-of) or sizeof operator, | or when it's a string literal in an initializer used to | initialize an array (sub)object. (The N1570 draft | incorrectly lists _Alignof as another exception. In fact, | _Alignof can only take a parenthesized type name as its | operand.) | | If you do: int arr[10]; | some_func(arr); | | then arr is "converted" to the equivalent of &arr[0] -- | not because it's an argument in a function call, but | because it's not in one of the three contexts listed | above in which the conversion doesn't take place. | | Another rule that causes confusion here is that if you | define a function parameter with an array type, it's | treated as a pointer parameter. 
For example, these | declarations are exactly equivalent: | void func(int arr[]); | void func(int arr[42]); /* the 42 is quietly ignored */ | void func(int *arr); | | Suggested reading: http://www.c-faq.com/, particularly | section 6, "Arrays and Pointers". | | A conversion converts a value of one type to another type | (possibly the same one). The term "cast" refers only to | an explicit conversion, one specified by a cast operator | (a parenthesized type name preceding the expression to be | converted, like "(double)42"). An implicit conversion is | one that isn't specified by a cast operator. | JoeAltmaier wrote: | Sure you can. int aFoo[]; has many legal array operations | possible: *(aFoo+3) should work fine and | return the 4th int in the array. | quelsolaar wrote: | They are accessed using pointer arithmetic. If you wanted | them to contain length data, you would need a different | access pattern. I think one of the great features of C is | that it doesn't do anything under the hood; it's all | explicit. If you want to bounds check, then do it. | WalterBright wrote: | > they are accessed using pointer arithmetic | | Not always. Consider: int a[3]; | a[1] = 2; | | This is not using pointer arithmetic. Dump the generated | code if you don't believe me :-) | quelsolaar wrote: | It's still pointer arithmetic, it's just done at compile time | rather than at execution. Still, you deserve style points | :-) | seamyb88 wrote: | Thoughts on Gnome glib, gobject, vala etc? | | I tend to use glib for my (academic) code for pretending C is a | high-level language. It also seems to make up for | implementation-dependent functions in C and many portability issues. Also, IMO, | vala > C++. | | My question is, really, are there any other tools for high-level | C programming and do you know of any disadvantages of the Gnome | stack? | tayistay wrote: | To what extent does compiler complexity factor into your thinking | about the evolution of C? | | Thanks for this! 
| AaronBallman wrote: | When the committee considers proposals, we do consider the | implementation burden of the proposal as part of the feature. | If parts of the proposal would be an undue burden for an | implementation, the committee may request modifications to the | proposal, or justification as to why the burden is necessary. | tayistay wrote: | Thanks. Do you have an example of a proposal that the | committee considered an undue burden for an implementation | but was otherwise sound? | AaronBallman wrote: | Not off the top of my head, but as an example along similar | lines, when talking about whether we could realistically | specify twos complement integer representations for C2x, we | had to determine whether this would require an | implementation to emulate twos complement in order to | continue to support C. Such emulation might have been too | much of a burden for an implementation's users to bear for | performance reasons and could have been a reason to not | progress the proposal. | floatms wrote: | 1. How likely are named constants of any types to be included in | C2x? I'm referring to the idea of making register const values be | usable in constant expressions. | | 2. Is there, or was there ever a proposal to make struct types | without a tag be structurally typed? This would not break | backwards compatibility as far as I can see, and would make these | types much more useful as ad-hoc bags of data. Small example: | struct {size_t size; void *data;} data = get_data(); int | hash = hash_data(data); | | I believe there was at least one proposal about error handling | that more or less relied on the above to be valid semantically. | | 3. Is there any interest in making the variadic function | interface a bit nicer to use? I would like to bring back an old | feature and have an intrinsic to extract a pointer from the | variadic parameter list, so that we can iterate over it ourselves | (or even index directly). 
void *arg_ptr = | va_ptr(last); | | More out there would be a parameter that would be implicitly | passed to a variadic function to indicate the number of | arguments. void variadic(..., va_size count) { | } variadic(10, 20, 30); // count would be three | pascal_cuoq wrote: | 3. would have to be a new mechanism for variadic functions, | that would have to be distinguished in header files from the | old mechanism with which it is incompatible. So this proposal | would imply some new keyword or syntax. I am not in the | committee, but I don't think this is going to happen. The | improvement is way too incremental to force a new syntax. | | (The committee is fine with incremental improvements, but new | syntax need to have strong motivation behind it, much stronger | than this.) | floatms wrote: | Yes, I know that this is the most disruptive out of the | three. The implicit parameter more so than the va_ptr() | intrinsic (in my opinion), but I understand that changes like | these are not very well motivated (except for a slightly | nicer developer experience). | uecker wrote: | (disclaimer: also a WG14 member) | | 1. I want this too. | | 2. Here is my proposal: http://www.open- | std.org/jtc1/sc22/wg14/www/docs/n2366.pdf | | 3. Yes, variadic functions should be improved. | msebor wrote: | I'd expect a proposal for (1) to be well received. The only | proposal I recall that deals with (2) is http://www.open- | std.org/jtc1/sc22/wg14/www/docs/n2067.pdf. I think it's still | being discussed. (3) is highly unlikely if it involved ABI | changes. Even if it could be done without such changes unless | there is a precedent for it in an existing compiler (and | preferably more), it would likely be a tough sell. | floatms wrote: | Is the linked proposal really dealing with unnamed struct | types? I skimmed it and it seems like it is dealing with | named constants. Also, is there a proposal for (1) currently, | or is someone planning on writing one? 
Regarding (3), yes, | this one was mostly wishful thinking. | oldiob wrote: | Is the committee planning on working on the preprocessor? I don't | see any reason for not boosting it. It's time for C to have real | meta-programming. Would be nice to have local macros that are | scoped. | | On another note: | | - Official support for __attribute__ | | - void pointers should offset the same size as char pointers. | | - typeof (can't stress this one enough) | | - __VA_OPT__ | | - inline assembly | | - range designated initializer for arrays | | - some GCC/Clang builtins | | - for-loop once (Same as for loop, but doesn't loop) | | Finally, stop putting C++ crap into C. | jparkie wrote: | +1 for Modern Metaprogramming. | | I know some people are against metaprogramming because they | believe the abstractions hide the intrinsics of how the | underlying code will execute, but I would love to write | substantial tests in C without relying on FFI to Python or C++ | to perform property-based testing, complex fuzzing, and | whatever. I feel metaprogramming would be a huge boon for C | tooling and developer productivity. | oldiob wrote: | From my point of view, there's a difference between abstraction | created by the language, e.g. lambdas or virtual table in | C++, and abstraction created by the programmers via the CPP. | | The former is compiler dependent and you cannot know how it's | implemented. The latter is simple text substitution and | you're the one implementing it. I often find myself creating | small embedded languages in CPP for making abstraction, and I | know exactly what C code it's going to generate and thus the | penalty if there's any. | | People that are afraid of the preprocessor simply don't | understand how powerful it is in good hands. | blocks_plz wrote: | Thanks for the AMA | | 1. Will the Apple's Blocks extension, which allows creation of | Closures and Lambda functions, be included in C2X? | | 2. 
Are there any plans to improve the _Generic interface (to make | it easy to switch on multiple arguments, etc.)? | AaronBallman wrote: | > 1. Will the Apple's Blocks extension, which allows creation | of Closures and Lambda functions, be included in C2X? | | We haven't seen a proposal to add them to C2x, yet. However, | there has been some interest within the committee regarding the | idea, so I think such a proposal could have some support. | | > 2. Are there any plans to improve the _Generic interface (to | make it easy to switch on multiple arguments, etc.)? | | I haven't seen any such plans, but there is some awareness that | _Generic can be hard to use, especially as you try to compose | generic operations together. | yvdriess wrote: | +1 for the first point. Every major compiler can do the lambda- | lifting transformation, either because of C++ lambda or OpenMP | support. It's frustrating doing this manually while knowing the | compiler supports it internally, but does not expose it | natively. | beefhash wrote: | C has been making strides towards complete Unicode support. I've | been having trouble following along though: Am I correct in | assuming that there's no _actual_ multi-byte UTF-8 to UTF-32 Rune | function and the best approximation depends on whatever wchar_t | is? How would I best handle pure Unicode input and output | scenarios on a "hostile" OS whose native character encoding is | some EBCDIC abomination or a Windows codepage? | loeg wrote: | Probably link libicu rather than rely on libc. | rurban wrote: | libicu is a 40MB mess where you need only 5Kb of it. Only | case folding and one normalization is needed, with tiny | tables. | | Additionally the used UNICODE_MAJOR and _MINOR are needed. | They are always years behind, and you never know which table | versions are implemented. | moonchild wrote: | Converting arrays of utf8-encoded char to arrays of | utf32-encoded 'rune' would probably not do what you want. That | still leaves e.g. 
combining diacritical marks as separate from | the characters they modify. If you care about breaking up text | into codepoints, you probably also care about that sort of | thing. The base unit of unicode is the extended grapheme | cluster. In order to actually convert text into extended | grapheme clusters, however, you need to have a database that | tells you what kind of codepoint each codepoint is. Since c is | standardized less frequently than unicode, any kind of unicode | or utf support from the specification would quickly get out of | date. | jasonhansel wrote: | Can/should the C language be extended to better support vector | processors and GPGPU? | hedora wrote: | I frequently rely on reading and writing uninitialized struct | padding in code that compare and swaps the underlying struct | representation with some (up to 128bit) integer. | | I could use a union type, but that adds extra memory operations, | and is finicky. | | Is there a better way? | parenthesis wrote: | Could we have variadic macros with zero arguments in the | standard? I'm not using any compiler that doesn't allow it. | pascal_cuoq wrote: | The C standard description does not allow a function that does | not have at least one normal argument before the variadic | arguments. | | Conceptually, something must indicate to the function how many | arguments it is supposed to request next, and with what types. | Yes, you could write a function where this information is | passed through a static-lifetime variable, but in practice the | first mandatory argument is almost always used for that anyway. | david2ndaccount wrote: | You're replying to a comment about macros, not about | functions. | emilfihlman wrote: | Have you considered adding multiplexing capability to the | standard? It would be great to have a directly portable one. | DougGwyn wrote: | We would need a specific proposal and assurance that nearly all | computers can efficiently provide that service. 
It is more | likely in the POSIX standard. | emilfihlman wrote: | Though it's interesting that threads were added to the | standard. Perhaps though they filled a niche that wasn't as | well filled as select/poll/epoll/kqueue/etc had already since | pthread api is perhaps harder. | DougGwyn wrote: | I thought it would be best to standardize just a single | thread, which should be the basic unit to be embedded in a | good parallel-processing model. However, others prevailed. | [deleted] | [deleted] | om42 wrote: | Not particular to the C language, but what are your opinions on | build systems, particularly for the embedded space? There's a | couple vendor specific embedded IDEs and toolchains and having to | glue together make/cmake files to support all of them can be a | pain. | msebor wrote: | Robert's upcoming book has a survey of a few popular IDEs. | asimpletune wrote: | Why is shifting by a negative amount undefined? | kps wrote: | Because people want `c = a << b` to compile into `shl c, a, b` | and C89 made the giant mistake of calling it 'undefined' | instead of 'implementation-defined, possibly fatal'. | hawski wrote: | What do you think about Zig language [0] and if you have any | opinions on it, what distinguishing features would you like to | see adopted in the C world? | | [0] https://ziglang.org/ | radford-neal wrote: | The syntax used in the following function definition is said to | be obsolescent in C11: | | int f (a, n) int n; int a[n][n]; { return a[n-1][n-1]; } | | How could one define this function without using the obsolete | syntax? | AaronBallman wrote: | You couldn't in that parameter order. However, you could do | this: int f(size_t n, int a[n][n]) { return a[n-1][n-1]; } | | (https://godbolt.org/z/DV9c-C) | | Btw, that definition was obsolescent in C89 too. | radford-neal wrote: | Well, yes. But putting the array argument(s) first is the | more natural order, in my opinion. 
And it is surely odd that | only one order is allowed in this context, when otherwise C | is happy with changing the order of parameters to be whatever | you like. | | Plus, of course, there may be existing code using such | functions, with parameters in the order that would become | impossible if this syntax were disallowed. | Daemon404 wrote: | What has been the rationale or hindrance for not adding | locale-independent versions of various stdlib functions? | | Practically every second C codebase on earth has their own | implementations of these at some point, and it remains a huge | problem for e.g. writers of libraries, where you don't know | how/where your library will be used. | msebor wrote: | First, there needs to be a proposal for adding a feature (I'm | not aware of one having been submitted recently). Second, any | non-trivial proposed feature needs to have some existing user | experience behind it. For libraries that typically means | implementations shipping with operating systems or compilers | (but successful third party libraries might also be | considered). Finally, it also needs to appeal to people on the | committee; that can be quite challenging as well. Many | proposals that meet the first two criteria die because they | simply don't get enough support within the committee. | Daemon404 wrote: | Sounds mostly like the issue is nobody has bothered to submit | a proposal for it then? (There is _so_ much in-the-wild | experience and code dealing with this issue, I cannot imagine | the second point being problematic.) | | On the third point, I have trouble thinking of any technical | objections to such a proposal. | rwmj wrote: | To clarify, do you mean functions like c_isalpha (part of | Gnulib) which is like isalpha but only matches 7 bit ASCII | characters? | Daemon404 wrote: | An easy (and problematic) example is decimal separators | (radix characters) being parsed or written differently based | on locale. 
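
The radix-character problem Daemon404 describes is commonly worked around in user code. The following is a minimal sketch, not a standard or proposed API: format_double_c is a hypothetical helper name, and the approach (format with snprintf, then rewrite the current locale's radix string reported by localeconv() to '.') is only one of several ways to do this. It assumes a C99 snprintf and that the radix string occurs at most once in the formatted output.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any standard): write `value` into `buf`
   with `precision` fractional digits, always using '.' as the radix
   character regardless of the current LC_NUMERIC locale.
   Returns the length written, or -1 on error or truncation. */
static int format_double_c(char *buf, size_t size, double value, int precision)
{
    const char *radix;
    size_t radix_len;
    int n;

    assert(buf != NULL && size > 0);

    n = snprintf(buf, size, "%.*f", precision, value);
    if (n < 0 || (size_t)n >= size)
        return -1;

    /* decimal_point is a string; it may be longer than one byte. */
    radix = localeconv()->decimal_point;
    radix_len = strlen(radix);
    if (radix_len > 0 && strcmp(radix, ".") != 0) {
        char *p = strstr(buf, radix);
        if (p != NULL) {
            *p = '.';
            /* Close the gap left by a multi-byte radix, including the NUL. */
            memmove(p + 1, p + radix_len, strlen(p + radix_len) + 1);
        }
    }
    return (int)strlen(buf);
}
```

In the default "C" locale the radix is already '.', so a call such as format_double_c(buf, sizeof buf, 3.5, 2) leaves "3.50" in buf untouched; under a locale whose radix is ',' the comma would be rewritten. Parsing with strtod has the mirror-image problem and needs the reverse substitution.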
| loeg wrote: | Have any of you looked at the CHERI hardware architecture and fat | capability pointers, broadly? | Uptrenda wrote: | What would you say to people who claim that writing "secure C | code" is impossible [not me but I'm curious what you all think]? | AaronBallman wrote: | I'd ask them if they really meant "impossible" or just "harder | than I wish it was". | | I've typically found that the tradeoffs between security, | performance, and implementation efforts are usually more to | blame for why writing secure C code is a challenge. There are a | ton of tools out there to help with writing secure code | (compiler diagnostics, secure coding standards, static | analyzers, fuzzers, sanitizers, etc), but you need to use all | the tools at your disposal (instead of only a single source of | security) which adds implementation cost and sometimes runtime | overhead that needs to be balanced against shipping a product. | | This isn't to suggest that the language itself doesn't have | sharp edges that would be nice to smooth over, though! | mesaframe wrote: | How to become a compiler engineer if you don't have a degree in | CS? | axelf4 wrote: | In C89 is there a portable way to figure out the alignment | requirement for a struct, to be able to, say, store it after the | NUL terminator in the same allocation as a C string? | DougGwyn wrote: | I'm not sure what your requirement is. Usually things work out | if you're careful not to assume any specific value for | alignment etc. It may mean a few unused bytes here and there, | but keeping things simple and portable often pays off. | quelsolaar wrote: | Being able to know your alignments is VERY important for a | lot of network implementations. They are all defined by the | ABIs, but it's very annoying that the standard keeps thinking | that alignment is unknowable, when in fact it's impossible to | implement an ABI without defining it. One of the reasons I | stick to C89. 
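
For axelf4's C89 question, one long-standing idiom is to measure the offset of a member placed right after a single char. The sketch below is an illustration under stated assumptions, not something the standard spells out as "the" answer: struct item, struct item_probe, ITEM_ALIGN, round_up, and alloc_string_and_item are all hypothetical names, and the measured offset is the padding the compiler chose, which in practice matches the alignment requirement. It also relies on malloc returning storage suitably aligned for any object type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct item { int id; double weight; };   /* example payload (hypothetical) */

/* Probe struct: the offset of `it` is the alignment the compiler applies. */
struct item_probe { char c; struct item it; };
#define ITEM_ALIGN offsetof(struct item_probe, it)

/* Round n up to the next multiple of align. Overflow is not checked. */
static size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Allocate one block holding a copy of s, its NUL, padding, and a
   struct item. Returns the block (which is also the string); *out points
   at the struct. Release everything with a single free(block). */
static char *alloc_string_and_item(const char *s, struct item **out)
{
    size_t offset = round_up(strlen(s) + 1, ITEM_ALIGN);
    char *block = malloc(offset + sizeof(struct item));

    if (block == NULL)
        return NULL;
    strcpy(block, s);
    /* block is maximally aligned and offset is a multiple of ITEM_ALIGN. */
    *out = (struct item *)(void *)(block + offset);
    return block;
}
```

A caller would write something like name = alloc_string_and_item("hello", &it), use name as an ordinary string and it->id as an ordinary struct member, and free both with one free(name). The cost is at most ITEM_ALIGN - 1 bytes of padding between the NUL and the struct.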
| DougGwyn wrote: | Note that the ABIs cover endianness as well as value range | and/or object widths. In general, one needs to have | explicit marshaling and unmarshaling functions to map from | network octet array and C internal data representation. | Failure to get this right is (or used to be) a common bug | for code developed and tested on too few architectures. | quelsolaar wrote: | Sure, it won't be portable between any architectures, but | a lot of times you know you will be on a little endian | platform where types are aligned to their sizeofs. That | covers a lot of ground and the performance gains you get | from optimizing with this in mind are significant. There | is value in C being able to be portable, but there is | also a huge value in being able to write non-portable | code that takes advantage of what you know about the | platform. C needs to acknowledge that that is a | legitimate use case. | sgawlik wrote: | When you're looking at an unfamiliar C code base for the first | time, how do you approach it? Which files do you look for? Which | tools do you open up immediately? | rseacord wrote: | This depends a bunch on what your goals are. There are no | specially named files, so looking for a particular filename is | not particularly useful. It is sometimes informative to find | the file containing the main, but not always. | | My job at NCC Group involves a lot of code reviews, so | frequently the files that are of interest to me are the ones | that contain the most defects. I typically identify these by | compiling with compiler warnings turned up and warning | suppression turned down. I'll frequently also make use of | static and dynamic analysis, including the GCC and Clang | sanitizers. | jhallenworld wrote: | cscope can help | clarry wrote: | Is there a vim-style cscope interface for emacs? I hate that | xcscope brings up its own persistent buffers (replacing other | buffers that I had deliberately placed on the screen). 
Vim, | conveniently, just pops up the cscope interface when I need | to enter some input, and then hides it away. Also I don't | think xcscope works with evil's tag stack whereas in vim, I | believe, you can just return to where you were with ^T, | whether using ctags or cscope. | DougGwyn wrote: | Yes, I have found it helpful. One nice feature is that it | uses a character-terminal interface, not a platform-specific | GUI. | DougGwyn wrote: | It all depends on how organized previous workers were, and what | your goal is for a modification of the source text. Often, | headers (dot-h files) document the data structures and | interfaces. | loeg wrote: | I start with generating tags. exctags | --exclude=TAGS --exclude=TAGS.NEW --append -R -f TAGS.NEW | --sort=yes && mv TAGS.NEW TAGS | | My editor (vim) has native support for quickly jumping from a | use to definition via this TAGS index. History is preserved | (i.e., there is a "back" button), so you can quickly dive | through 5 layers of API and back out to understand where a | value went. It is quite useful for starting with what you know | and following it to the surprising behavior, without executing | the code. | mey wrote: | This is a subjective question. From the array of tools in your | belt, when do you personally/professionally reach for C, or maybe | more interestingly, when do you _not_ reach for C? | DougGwyn wrote: | Since I do almost all my software development in a Unix | environment, usually I check the toolbox to see if there is | already a program that has nearly the functionality I want, and | if so then I cobble together a shell script. Sometimes (as with | the Sudoku solver) it will be necessary to build a new | component, and for that I usually use C since I am comfortable | and experienced with it. (Also, if coded in Standard C, odds | are that I can install it on whatever platform I need, with | little or no adaptation.) 
| [deleted] | RandNOx wrote: | - Which differences between the C abstract machine and actual | modern CPUs/hardware have proven most difficult to deal with in | the language? | | - Are you planning any addition regarding modeling of how modern | CPUs work (e.g. pipelines, branches, speculative execution, cache | lines, etc)? | | PS: Thank you for doing this! | AaronBallman wrote: | > - Which differences between the C abstract machine and actual | modern CPUs/hardware have proven most difficult to deal with in | the language? | | For me, I think it's 'volatile' because, by its nature, you | can't describe what it means in the abstract machine very well. | For instance, consider a proposal to add something like a | "secure clear" function for clearing out sensitive data. The | natural inclination is to pretend that data is volatile so the | optimizer won't dead-code strip your secure clear function | call, but that leaves questions about things like cache lines, | distributed memory, etc. | | > - Are you planning any addition regarding modeling of how | modern CPUs work (e.g. pipelines, branches, speculative | execution, cache lines, etc)? | | Maybe? ;-) We tend to talk about features at a higher level of | abstraction than the hardware because hardware changes at such | a rapid pace compared to the standards process. So we largely | leave hardware-specific considerations as a matter of QoI for | implementers. | | However, that doesn't mean we wouldn't consider proposals for | more concrete things like a defensive attribute to help | mitigate speculative execution attacks. | 7532yahoogmail wrote: | pascal_cuoq - Pascal Cuoq is the Chief Scientist at TrustInSoft | and co-inventor of the Frama-C technology | | This looks to be a hell'va' good tool chain. I'm playing with as | of yesterday. | jpizza wrote: | Hello, | | First off thank you so much for taking the time to answer | questions. 
| | As a new programmer starting with C I am trying to learn how to | go from a beginner to an intermediate. Any recommendations of | projects to help learn C? | | It is difficult for me to find projects that I see as "valuable", | for lack of a better term. | | Thank you! | DougGwyn wrote: | One possibility is to modify some existing program to include | an additional new feature. You should soon develop a sense for | what works well versus what causes problems. | emreiyican wrote: | I know this opinion is unpopular and contradicts a core value | of the C standardization committee, but I personally think at some | point the C standard should abandon supporting the legacy codebase. | I think bool and stdint definitions should be available as part | of the standard feature set and shouldn't need including their | respective headers. These and some other features are available | at the core of every modern language but C, and C has to provide | them via other means. Is the sentiment of discontinuing legacy | support shared within the committee, by any proportion? | xyzzy2020 wrote: | Can't upvote enough. I think these changes could also be made | in a way that is mechanically translatable. | | For example: removing the register keyword, always requiring a | return statement, etc etc. | | A lot of changes can be made that will make static analysis | easier. | | There will always be people with 50 year old code bases that | will never change (and some c89 compiler will always be there | for them), but the language is pervasive enough that it | deserves progressive changes to make it (even) simpler and | safer and slightly more high level. | loeg wrote: | I'd love it if we could do away with all the headers. | | Just #include <stdc.h> and be done with it. No need to remember | stdio, stdint, stdbool, limits, assert, signal.h, etc, etc. | | This new header comes with a guarantee that use of identifiers | in the standard-reserved namespace will break your code.
| Perhaps compilers could even enforce this preemptively. | DougGwyn wrote: | You can easily create your own stdc.h include file. Something | similar was done on Plan 9. | | Note that by including the content of all the headers, you're | increasing the chance for collisions with application | identifiers. You might consider that more of a benefit than a | drawback. | AaronBallman wrote: | We've started doing some things in this area, but I don't think | the committee would abandon legacy code bases entirely. | Instead, we try to make a migration path for code bases. | | For instance, we added the '_Bool' data type and require you to | include <stdbool.h> to spell it 'bool' instead and to get | 'true' and 'false' identifiers. This was done to not impact | existing code bases that had their own bool/true/false | implementation with those spellings. Now that "enough" time has | passed for legacy code bases to update, we're looking into | making these "first-class" features of the language and not | requiring <stdbool.h> to be included to use them. We're doing | the same for things like _Static_assert vs static_assert, etc | for the same reason. | tayistay wrote: | I'm no C expert, but my two wishes for C would be: | | - Basic type inference to reduce keystrokes, and prevent ripples | when changing types. (like auto in C++) | | - Equality operators defined for structs. Perhaps even | lexicographical comparison, if I'm dreaming. | | Any thoughts on either of those? | cyber1 wrote: | Ken Thompson, Rob Pike, Brian Kernighan, Russ Cox, Robert | Griesemer are guys who created Unix, B, C, Go, Utf-8, etc. Maybe | it will be useful to invite these guys(one of them) in the C | Standards Committee for help to improve and design new language | features? | rseacord wrote: | I think a lot of these dudes are retired. A lot of good C | people like P.J. Plauger, John Benito, and Clark Nelson have | all retired recently. Anyway, they are all invited back. 
As an | incentive, we typically have free coffee and snacks at most of | the meetings. :) | Tronic2 wrote: | char effectively behaves as a signed type, making it unsuitable | for binary operations (e.g. UTF-8 manipulation). I/O functions | deal with char pointers, so using an unsigned type like uint8_t | requires casting back and forth. Is there any way out of this | problem, and am I already breaking the aliasing rules with that | cast? | emilfihlman wrote: | There are no aliasing differences between uint8_t and char as | far as I know. | hsivonen wrote: | In practice not. In theory, it's implementation-defined | whether there are differences. | emilfihlman wrote: | At least from what I've heard that's because stdint values | are optional. | | 6.2.5p17 The three types char, signed char, and unsigned | char are collectively called the character types. The | implementation shall define char to have the same range, | representation, and behavior as either signed char or | unsigned char. | | and | | 5.2.4.2.1 says that the width of char, signed char and unsigned | char are the same (8). | radford-neal wrote: | I don't think it's anything to do with uint8_t being | optional. It's because a char might have more than 8 | bits. | msebor wrote: | Casting between the three character types is safe and doesn't | violate aliasing rules. In addition, objects of all types can | be accessed by lvalues of any of the three character types | (though unsigned char is recommended), so there's no problem | there either. | | I/O functions that take a plain char* are designed to | interoperate with char arrays and strings, so passing in | unsigned or signed char is a sign that they aren't being used | as intended. (Functions that traffic in binary data like | fread/fwrite should take void*.) | packetlost wrote: | I'm about a mid-level experienced developer, and have been | attempting to learn C via a few side projects.
I come from mostly | Python and Go, which both have very robust standard libraries, so | I was quite surprised to find that string parsing is _very_ | poorly supported in C. Is there a reason that very common string | parsing cases are missing from the C stdlib? | WFHRenaissance wrote: | A bit off topic, but what are your views on Golang? I'm leaving | this pretty open-ended, but I'm curious how you see it | interacting with the C/C++ ecosystem in the future. | tridentboy wrote: | Know it's not exactly related to what you do. But do you have | some recommendations of books/online classes to learn C? | Koshkin wrote: | Why not keep C a simple little language with fast compile times | and delegate all "enhancements" (such as 'cleanup') to C++? | nchelluri wrote: | Hello, just a quick note; I wanted to buy the book so I went to | the website and when I picked my country as Canada it started | giving me a strange list of provinces (definitely not Canadian) | so I abandoned the process for now. | billpollock wrote: | I've asked our Operations Manager to look into this issue. | Thanks for bringing this to our attention. We'll get it sorted | out. Please email info@nostarch.com so that they can help | troubleshoot. | rseacord wrote: | I'll pass this on to the publisher.... | jfmc wrote: | C is a great low-level language to write the engines/runtimes of | other languages. | rudchenkos wrote: | Are any concurrency primitives planned for introduction in future | C revisions? | AaronBallman wrote: | We currently have not seen papers proposing to add new | concurrency primitives for C2x, but we have been actively | working on the concurrency object model and would welcome | proposals for new primitives or concurrency-related fixes. | | One goal is to re-unify C with the concurrency object model | used by C++ to make std::atomic<T> and _Atomic(T) be ABI | compatible as intended in C11. 
Some small fixes in this area | are the removal of ATOMIC_VAR_INIT, clarifying whether library | functions can use thread_local storage for internal state, and | things along those lines. However, we expect there to be more | efforts in this area as we progress the standard. | jpfr wrote: | C11 has seen new features, such as Generic Selection. Is the | current language standardization converging (just adding | clarifications, removing the surface for undefined behavior, | etc.) or is C still growing with new features? | | In other words, will the C standard be effectively "done" at some | time in the future? | msebor wrote: | Fixing minor bugs or inconsistencies and reducing the number | and kinds of instances of undefined behavior are some of the | efforts keeping the C committee busy. | | Reviewing proposals to incorporate features supported by common | implementations is another. | | Aligning with other standards (e.g., floating point) and | improving compatibility with others (C++) is yet another. | | In general, when an ISO standard is done it essentially becomes | dead. So for the C standard to continue to be active (on ISO's | books) it needs to evolve. | ken wrote: | It's interesting to hear the standardization perspective, | because it's pretty much the opposite of my perspective as a | user. | | I see the classic path of any programming language -- | regardless of standardization -- is to continuously add | features until it's so big and complex that nobody wants to | deal with it any more. Then it's replaced by a newer, simpler | language that takes the important bits and drops the | unnecessary complexities. At that point, everybody sees that | the older language was barking up the wrong tree, and they | stop wasting time on it. | | It's not the cessation of language change that _causes_ | language death -- that's merely a symptom. You can't keep a | language alive simply by changing it every year. Some people | sure have tried.
| | Alternatively, until it's evolved so much that there is so | much diversity of implementation that simply knowing a | library is written in "language X" doesn't tell me much about | how it's written, or whether I can use it in my program which | is also written in "language X". | | Then again, C is the exception to every rule, so maybe we can | keep piling on features indefinitely, and people will have to | use it (even if they don't like it), for the same reason they | started using it decades ago (even if they didn't like it). | rseacord wrote: | I would say no, that we are still adding new features. Aaron | Ballman was responsible for adding attributes to C2x (he | can tell you more). We're also looking at an #embed feature to | incorporate binaries the way that #include incorporates text. | rseacord wrote: | A full list of proposals to WG14 can be found here: | | http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log... | | These papers are usually quite interesting. | ancarda wrote: | As a C newbie, will there ever be "safe" C, i.e. no undefined | behavior and help with writing code that has fewer memory-related | crashes/bugs? For comparison, Rust has the `unsafe { }' block | which lets you mark regions of code as being able to do funky | stuff. Could we get the opposite for C, i.e. `safe { }' and for | an entire file, `#pragma safe'? | | I have a love-hate relationship with C - I like it for small | projects, but for anything serious I really need to write it in a | safer language. I think GCC has some flags that can help, and | I've been using tools like splint, but something baked into the | standard would be amazing. | sramsay wrote: | I'm pretty happy with C as it is, but I will admit to being | surprised that a "minimalistic Rust" hasn't risen to | prominence.
| | I guess what I mean by that is a language that has Rust's | hyperactive, strongly opinionated compiler, borrow checker, no | NULL, immutable by default, etc, but in a language that is no | more syntactically ambitious than C89. I would be way more into | a language like that than Rust. | | A language that sort of _feels_ like Go, but can actually be | used for low-level systems programming. | Leherenn wrote: | I think it's going to arrive, but some time is needed to see | what works in Rust or not. D is going this way as well, so | should provide another data point. | modeless wrote: | Can you do anything to push Microsoft to implement recent C | standards? Their failure to fully implement even C99 in Visual | Studio is holding the language back. | AaronBallman wrote: | Not really -- vendors are free to ignore newer releases of the | standard that do not meet their customers' needs and the | committee can't do much about it. | | However, as a user, you can help apply pressure on the vendor | to support newer standards. For instance, with Microsoft, you | could support this feedback request: | https://developercommunity.visualstudio.com/idea/387315/add-... | DougGwyn wrote: | There is little that the C Standards group can do about it. One | idea is to write C Standard conformance into contracts. When | I was in the government we often did that, but it still wasn't | enough clout. | overfl0w wrote: | Can memory safety be ensured in the C programming language? By | static analysis at compile time for example? | [deleted] | pascal_cuoq wrote: | It is possible to guarantee that a C program does not have any | undefined behavior, which includes all the memory errors that | are often also security vulnerabilities.
| | "Static analysis" may be the wrong name to classify the tools | that work in that area, because "static analysis" is usually | used for purely automatic tools, whereas the tools used to | guarantee the absence of undefined behaviors are not entirely | automatic except for the simplest of programs. | | Results of a static analyzer are often characterized in terms | of "false positives" and "false negatives". It is a possible | design choice to make an analyzer with no false negatives. It | is absolutely not impossible! (Some people think it is | fundamentally impossible because it sounds like a computer | science theorem, but it isn't one. The theorem would apply if | one intended to make an analyzer with no false positives and no | false negatives--and if computers were Turing machines.) | | Analyzers designed to have no false negatives are called | "sound". In practice, this kind of analyzer may prove that a | simple program is free of Undefined Behavior if the program is | a simple example of 100 lines, but for a more realistic | software component of at least a few thousand lines, the result | will be obtained after a collaborative human-analyzer process | (in which the analyzer catches reasoning errors made by the human, | so the result is still better than what you can get with code | reviews alone). | | Here is what the result of this collaborative human-analyzer | process may look like for a library as cleanly designed and | self-contained as Mbed TLS (formerly PolarSSL): | https://trust-in-soft.com/polarSSL_demo.pdf? | emilfihlman wrote: | Why isn't there a binary prefix in the standard? Like 0b0111010? | [deleted] | ocithrowaway wrote: | A couple of (I hope easy) requests - 1. Can we add separators in | constants? (C++ does 0xFFFF'FFFF'FFFF'FFFF; any other reasonable | scheme is fine too.) | | 2. I think many compilers already do this, but can the static | initialization rules be relaxed a bit?
static | const int a = 0; static const int b = a; /* This is not | standard C afaik. */ | | Thank you, CodeandC | rightbyte wrote: | A binary literal would be nice too. Doing masks for embedded | systems makes my head hurt sometimes. "Cpp compatibility" etc | etc could be the excuse to implement it. | msebor wrote: | WG14 in general looks favorably at proposals to align C more | closely with C++ (within the overall spirit of the language) | and I'd expect (1) would be viewed in that light. | | I'd also say there is consensus that (2) would be beneficial. | There are some good ideas in | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2067.pdf although I don't | think repurposing the register keyword for it was very popular. | Not just because it wouldn't be compatible with C++ which | deprecated register some time ago, but also because it's novel | with no implementation or user experience behind it. My | impression is that this is waiting for a new proposal. | MaxBarraclough wrote: | Does the following code fragment cause undefined behaviour? | unsigned int x; x -= x; | | There's a lengthy StackOverflow thread where various C | language-lawyers disagree on what the spec has to say about trap values, | and under what circumstances reading an uninitialised variable | causes UB. I'd appreciate an authoritative answer. Thanks for | dropping by on HN! | | https://stackoverflow.com/q/11962457/ | pascal_cuoq wrote: | This example is clearly UB. | | You could argue that it suddenly becomes less UB if you take | the address of x: unsigned int x; &x; | x -= x; | | I'm not sure if this will add anything to the discussion on SO, | but if you allow programs to do this, then after applying | modern optimizing C compilers, you may end up with multiplications | by 2 that produce odd results, or uninitialized char variables | that contain 500: | http://blog.frama-c.com/index.php?post/2013/03/13/indetermin...
| | So the short answer is that, for all intents and purposes, you | should consider use of uninitialized variables as UB, because C | compilers already do. (There exists somewhere a document | clarifying what C compilers can and cannot do with | indeterminate values. A search for "wobbly values" might turn | it up. Anyway, you do not want to have wobbly values in your C | programs any more than you want them to have undefined behavior.) | MaxBarraclough wrote: | Interesting link, thanks. So then: | | * Under C90, reading an uninitialized local was explicitly | listed as UB. | | * Under C99, if you weren't using a character type, it was | still essentially UB, by way of trap values. (I don't think | the particulars of the target hardware platform are | relevant.) | | * C11 reintroduced UB even for some cases involving character | types. We were already invoking UB under C99, so we know | we're still invoking UB under C11. | | > You could argue that it suddenly becomes less UB if you | take the address of x | | I don't think so. As we're not using a character type, I | don't think taking its address would change anything. This | aligns with what msebor said. | | Lastly, from the article: > No, GCC is | still acting as if j *= 2; was undefined. | | I think GCC's behaviour is legal here. The target platform | may have no trap values, but I don't see that GCC is | prohibited from behaving as if there _are_. It would be legal | (albeit bizarre) for it to generate code for a completely | different ISA, and to bundle an emulator. If the spec says | you've opened the door to UB, then unless your compiler | documentation says otherwise, it's permitted to generate code | that goes haywire, no? | msebor wrote: | Yes, it's undefined. It involves a read of an uninitialized | local variable. Except for the special case of unsigned char, | any uninitialized read is undefined.
| emilfihlman wrote: | >Except for the special case of unsigned char, any | uninitialized read is undefined. | | Could you expand on this? | loeg wrote: | I'm guessing you were asking about this part rather than UB | in general: | | > Except for the special case of unsigned char, | | The SO article makes the bizarre claim that because | | (1) an unsigned char, per the standard, cannot have any | padding bits, it therefore cannot have a trap | representation. And | | (2) if it cannot have a trap representation, the use of an | uninitialized value isn't undefined. | | I'm willing to buy (1) but I don't remember (2) being | required for UB. I think (2) is the step that is harder to | follow intuitively. Admittedly, I have not read that part | of the standard closely in some time. | msebor wrote: | An object of any type, initialized or not, can be read by | an lvalue of unsigned char (or any character type). That | lets functions like memcpy (either the standard one or a | hand-rolled loop) copy arbitrary chunks of memory. | | There's some debate about the effects of reading an | uninitialized local variable of unsigned char (like whether | the same value must be read each time, or whether it's okay | for each read to yield a different value). | | This special exemption doesn't extend to any other types, | regardless of whether or not they have padding bits or trap | representations that could cause the read to trap. Few | types do, yet the behavior of uninitialized reads in | existing implementations is demonstrably undefined | (inconsistent or contradictory to invariants expressed in | the code of a test case), so any subtleties one might | derive from the text of the standard must be viewed in that | light. | MaxBarraclough wrote: | Thanks for your answers. A related question: this article | [0] appears to single out _memcpy_ and _memmove_ as being | special regarding effective type. Is it accurate? 
It | seems to be at odds with your suggestion that there's | nothing stopping me writing my own memcpy provided I'm | careful to use the right types. | | [0] https://en.cppreference.com/w/c/language/object#Effective_ty... | msebor wrote: | memcpy and memmove aren't special. The part that | discusses the copying of allocated objects is 6.5, p6, | quoted below: | | The effective type of an object for an access to its | stored value is the declared type of the object, if any. | If a value is stored into an object having no declared | type through an lvalue having a type that is not a | character type, then the type of the lvalue becomes the | effective type of the object for that access and for | subsequent accesses that do not modify the stored value. | If a value is copied into an object having no declared | type using memcpy or memmove, or is copied as an array of | character type, then the effective type of the modified | object for that access and for subsequent accesses that | do not modify the value is the effective type of the | object from which the value is copied, if it has one. For | all other accesses to an object having no declared type, | the effective type of the object is simply the type of | the lvalue used for the access. | MaxBarraclough wrote: | I see, so in short the article is failing to reflect this | excerpt: _or is copied as an array of character type_. | Thanks again. | AaronBallman wrote: | I think that may be inaccurate -- IIRC, in C, you can do | type punning via a union but not memcpy, and in C++ you | can do type punning via memcpy but not a union and this | incompatibility drives me nuts because it makes inline | functions in a header file shared between C and C++ | really messy. (Moral of the story: don't pun types.)
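A minimal sketch of the hand-rolled copy MaxBarraclough asks about, relying on the "copied as an array of character type" wording in the 6.5p6 text msebor quotes. The names copy_bytes, struct point and clone_point are hypothetical, and the example only copies between two declared objects of the same type, so it takes no position on type punning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcpy: copies n bytes through unsigned char
   lvalues, which may access the representation of an object of any type.
   As with memcpy, the two regions must not overlap. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical example type. */
struct point {
    int x;
    int y;
};

/* Copies one declared struct into another of the same type, byte by byte. */
static struct point clone_point(const struct point *p)
{
    struct point q;

    copy_bytes(&q, p, sizeof q);
    return q;
}
```

Using unsigned char rather than plain char follows msebor's earlier note that unsigned char is the recommended character type for accessing object representations.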
| pascal_cuoq wrote: | The C standard also allows using memcpy to do type | punning: "If a value is copied into an | object having no declared type using memcpy or memmove, | or is copied as an array of character type, then the | effective type of the modified object for that | access and for subsequent accesses that do not modify the | value is the effective type of the object from | which the value is copied, if it has one." | | Simply memcpy into a variable (as opposed to dynamically | allocated memory). | | https://port70.net/~nsz/c/c11/n1570.html#6.5p6 | AaronBallman wrote: | I must be remembering incorrectly then, thank you! | rseacord wrote: | Uninitialized Reads | https://queue.acm.org/detail.cfm?id=3041020 | [deleted] | ux wrote: | Is there any plan to deal with the locale fiasco at some point? | | Some hints on what I'm referring to can be found here: | https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02... | | Unrelated, but I also miss a binary constant notation (such as | 0b10101) | eqvinox wrote: | I haven't read most of that rant, but a thread-local | setlocale() would be a godsend. Not sure if that's ISO C or | POSIX though. | wahern wrote: | POSIX has added _l variants taking a locale_t argument to all | the relevant string functions. I can see how per-thread state | would be convenient, but it's not a comprehensive solution. | With the _l variants you can write your own wrappers that | pass a per-thread locale_t object. | r12477 wrote: | For binary constant notation, I have incorporated the following | macro into my projects: | https://gist.github.com/61131/009961b781f387ed1474ffaf19e375... | nickysielicki wrote: | In the same vein, I really like being able to use underscores | in binary and hex literals to denote subfields in hardware | registers. | | 0xDEADB_EEF | | 0b1_010_110111001001 | | etc.
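Without binary literals or digit separators in the standard discussed here, register subfields are usually spelled out with shifts and masks. A minimal sketch, assuming a made-up 16-bit control register whose layout mirrors the grouping in nickysielicki's 0b1_010_110111001001 example; every name below is hypothetical and not taken from any real device.

```c
#include <assert.h>

/* Hypothetical 16-bit control register layout, for illustration only:
   bit  15      enable
   bits 14..12  mode    (3 bits)
   bits 11..0   divisor (12 bits) */
#define CTRL_ENABLE_SHIFT 15
#define CTRL_MODE_SHIFT   12
#define CTRL_MODE_MASK    0x7u
#define CTRL_DIV_MASK     0xFFFu

/* Builds a register value from its three subfields. */
static unsigned ctrl_pack(unsigned enable, unsigned mode, unsigned divisor)
{
    assert(enable <= 1u);
    assert(mode <= CTRL_MODE_MASK);
    assert(divisor <= CTRL_DIV_MASK);
    return (enable << CTRL_ENABLE_SHIFT) | (mode << CTRL_MODE_SHIFT) | divisor;
}

/* Extracts the 3-bit mode subfield. */
static unsigned ctrl_mode(unsigned reg)
{
    return (reg >> CTRL_MODE_SHIFT) & CTRL_MODE_MASK;
}

/* Extracts the 12-bit divisor subfield. */
static unsigned ctrl_divisor(unsigned reg)
{
    return reg & CTRL_DIV_MASK;
}
```

Naming each field once in a macro keeps the layout readable even when the literal itself cannot show the grouping.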
| jhallenworld wrote: | Should take Verilog binary construction syntax, like { | 12'd12, 16'hffee, 3'b101 } (or something similar that would | fit with C's syntax). | dirtydroog wrote: | Maybe not. | jhallenworld wrote: | Why not? If you have to combine bit fields now, it's a | mess of shifting and masking. | pascal_cuoq wrote: | Many C compilers offer, as an extension, the very binary | constant notation that you miss, as anyone who has worked on | the front-end of a C static analyzer would tell you. | ux wrote: | Yes I'm aware. But we can agree it would be welcome in the | standard, isn't it? | pascal_cuoq wrote: | Yes, if only so that we (as a category) do not have to | discover it exists when already facing C programs that use | it. | OnACoffeeBreak wrote: | I know that we're not voting, but I miss a binary literal very | much. I would also like a literal digit separator to improve | readability. Verilog Hardware Description Language does that | with an underscore [1]. For example, 0xad_beef to improve | readability of a hex literal, and 0b011_1010 to improve | readability of a binary literal. | | 1: http://verilog.renerta.com/mobile/source/vrg00020.htm | jfkebwjsbx wrote: | If they pick this up, they will likely use C++'s | syntax/rules. | magicbanana wrote: | Is there a chance to ever see C++-template-like features appear | in C? | | For instance, a lot of redundant code (or ugly macro business) | could be neatly replaced by function templates. Even just | template functions with only POD values allowed would be a great | readability improvement. | MiKom wrote: | It's already there. It's called C++ templates | pantalaimon wrote: | Will C eventually get something like C++' constexpr? | AaronBallman wrote: | C has some basic support for constant expressions already, but | there has not yet been a proposal to bring 'constexpr' over | from C++. Personally, I would _love_ this feature to be in C! | loeg wrote: | You and me both! 
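To illustrate AaronBallman's remark that C already has basic support for constant expressions, here is a minimal sketch of what can be computed at compile time without constexpr. The names TABLE_SIZE, TABLE_SCALE, table, TABLE_BYTES and table_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Enumeration constants are integer constant expressions, so they can
   size arrays and appear in initializers of static objects. */
enum { TABLE_SIZE = 4, TABLE_SCALE = 10 };

/* Every initializer here is a constant expression evaluated by the
   compiler, not at program startup. */
static const int table[TABLE_SIZE] = {
    0 * TABLE_SCALE, 1 * TABLE_SCALE, 2 * TABLE_SCALE, 3 * TABLE_SCALE
};

/* sizeof applied to a fixed-size array is also a constant expression. */
enum { TABLE_BYTES = sizeof table };

/* Not portable: a const-qualified object is not itself a constant
   expression in standard C, so a compiler may reject this pair:
       static const int a = 0;
       static const int b = a;
*/

/* Bounds-checked read of the precomputed table. */
static int table_lookup(size_t i)
{
    assert(i < TABLE_SIZE);
    return table[i];
}
```

Anything beyond this kind of arithmetic, such as filling a table with a loop, still needs a code generator or macros, which is the gap a constexpr-style feature would close.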
| dktoao wrote: | This is really the only thing that I really want from C++. It | would be amazing if this could make the cut for a future spec. | | EDIT: I work on embedded systems, where C is king, and it seems | like I spend an inordinate amount of time working with code | generators that build simple tables. All of which could go away | with this feature. | pornel wrote: | Would you consider adding a built-in way to safely multiply two | numbers? | | Numeric overflows in things like calculation of buffer sizes can | lead to vulnerabilities. | | Signed overflow is UB, and due to integer promotion signedness creeps | into unexpected places. | | It's not trivial to check if overflow happened due to UB rules. A | naive check can make things even worse by "proving" the opposite | to the optimizer. | | And all of that is to read one bit that CPUs have readily | available. | DougGwyn wrote: | There are a lot of arithmetic conditions for which C could | generate special code. There are div_t-related functions for | the other direction. I for one would like a good way to obtain, | using some Standard C coding pattern, fast "carry" for | multiple-precision integer arithmetic. | | In several places in support functions, I have coded unusually to | avoid wrap-around etc. I bet you could devise something like | that for (unsigned) multiplication. | freemind wrote: | 1. What is the easiest way to build a cross-platform (native) GUI | with C? | | 2. Why is it harder to find LGPL-licensed libraries to access | Windows directories over the network, like jcifs or pysmb (and libraries | overall), when you need to keep most of the software source closed to | sell small software products to businesses? | | 3. If you needed to combo C with another language to do | everything you need to do forever and never look back what other | language would that be? | DougGwyn wrote: | Back from lunch. Any West Coasters?
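On pornel's buffer-size question and DougGwyn's suggestion that something could be devised for unsigned multiplication, one common pattern is to test the operands with a division before multiplying. This is a minimal sketch, not a committee-endorsed idiom; mul_size_checked and alloc_array are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores a * b in *out and returns 1, or returns 0 and leaves *out
   untouched if the product would not fit in size_t. The check uses only
   unsigned division and comparison, so it involves no undefined behavior
   and gives the optimizer nothing to "prove" away. */
static int mul_size_checked(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (a != 0 && b > (size_t)-1 / a)
        return 0;
    *out = a * b;
    return 1;
}

/* Hypothetical helper: allocates count elements of elem_size bytes each,
   refusing requests whose total size would wrap around. */
static void *alloc_array(size_t count, size_t elem_size)
{
    size_t total;

    if (!mul_size_checked(count, elem_size, &total))
        return NULL;
    return malloc(total);
}
```

The division costs more than reading the CPU's overflow flag, which is pornel's point. GCC and Clang offer __builtin_mul_overflow as a non-standard extension that typically compiles down to that flag check.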
| hsivonen wrote: | Does the committee have any plans to document the rationale for | each kind of Undefined Behavior? | | Does the committee have any plans to make NULL pointer arguments | to memcpy non-UB when the size argument is 0? | AaronBallman wrote: | > Does the committee have any plans to document the rationale | for each kind of Undefined Behavior? | | In the C99 timeframe, we had a rationale document that was | separately maintained. My understanding (this predates my | joining the committee) is that this was prohibitively labor- | intensive and so we stopped doing it for C11. I don't know of | any plans to start doing this again, even in a limited sense | for justifying UB. That said, we do spend time considering | whether an aspect of a proposal requires UB or not, so the | rationale exists in the proposals and committee minutes. | | > Does the committee have any plans to make NULL pointer | arguments to memcpy non-UB when the size argument is 0? | | I have not seen such a proposal, and suspect that | implementations may be concerned about losing their | optimization opportunities from such a change. (Personally, I'd | be okay losing those optimization opportunities as this does | not seem like a situation where UB is necessary.) | natch wrote: | Can you please repeat this AMA at a later date and at a time of | day when people on the west coast of the USA are awake? | Alternatively, please keep it going for a few hours if you would | be able to be so generous with your time! Thank you for doing | this! | | Do you also answer questions about the standard libraries? This | is not so much a C question as a library question: | | I'm wondering if Apple's Grand Central Dispatch ever made it into | a more integrated role in C's libraries, or if it will forever | remain an outside add-on. 
And whether there is anything else at | that level (level in the sense of high versus low level) in the | standard libraries that plays such a role, that I should read up | on instead of GCD. | AaronBallman wrote: | > Alternatively, please keep it going for a few hours if you | would be able to be so generous with your time! | | We're remaining active while there are still people asking | questions, so the west coast folks should hopefully have the | chance to ask what they'd like. | | > Do you also answer questions about the standard libraries? | | Sure! | | > I'm wondering if Apple's Grand Central Dispatch ever made it | into a more integrated role in C's libraries, or if it will | forever remain an outside add-on. | | GCD has not been adopted into C yet, and I don't believe it's | even been proposed to do so by anyone (or an alternative to | GCD, either). | | It would be an interesting proposal to see fleshed out for the | committee, and there is a lot of implementation experience with | the feature, so I think the committee would consider it more | carefully than an inventive proposal with no real-world field | experience. | wahern wrote: | GCD relies on Blocks (closures) for ergonomics, and Blocks | have been proposed to WG14, for example N1451: | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1451.pdf | DougGwyn wrote: | Some simple instructions about how to use a thread for | conversation would be appreciated. Thanks! | rseacord wrote: | Nothing to it! Just hit the reply button on comments you want | to respond to. You can also upvote anything you like by | clicking on the up arrow to the left of the comment. | DougGwyn wrote: | Okay, is there a starting thread for today's C Experts panel? | I miss the old net newsgroups. | dang wrote: | The thread is | https://news.ycombinator.com/item?id=22865357, which is the | page you've been posting to. 
It's now listed on the front | page of the forum, https://news.ycombinator.com/, which is | a list of the stories people have upvoted today. | | You're not the only person who misses the old newsgroups! | The format that Hacker News uses is one that became sort of | standard on the web in the early 2000s. It works | differently than Usenet did, but you get threaded comments | in the sense that replies are nested under the posts | they're replying to. | pascal_cuoq wrote: | There are very few formatting options when writing posts, | for better or for worse: https://news.ycombinator.com/formatdoc | stwcx wrote: | One feature of C which I do not use often is enums. Support for | constants beyond the range of an int is not portable. I also | try to avoid putting enums inside structs, because there is no | portable way to enforce the size or the alignment of the enum's | base type. | | Will this be addressed in future revisions of the C standard? | iamed2 wrote: | What's an example of a codebase where _Generic has had a notable | positive impact? | AaronBallman wrote: | Not necessarily a code base, but _Generic is what makes | <tgmath.h> implementable for the type-generic math functions. | teleonorax wrote: | What's up with `strlcpy` and `strlcat`? Are they getting | standardized? | AaronBallman wrote: | We've been considering proposals to add common POSIX APIs into | C, but I don't believe we've seen a proposal for strlcpy or | strlcat yet. I recall we agreed to add strdup to C given its | wide availability and usage. | DougGwyn wrote: | There are deficiencies in almost all proposals. Two new | functions which avoid the problems are supposed to be | published in C202x: strcasecmp and strncasecmp, added in | header strings.h (note: not string.h). | sramsay wrote: | strdup seems like a perfect example of "standardizing | existing practice." And it has never struck me as running | against the spirit of C. 
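Returning to the `_Generic` question above: the following is a small sketch of the kind of type-based dispatch that makes `<tgmath.h>`-style macros possible, assuming a C11 compiler. The functions `max_int`, `max_long`, `max_double` and the macro `max_of` are hypothetical names, not part of any standard header.

```c
/* Three ordinary functions, one per supported argument type. */
static inline int    max_int(int a, int b)          { return a > b ? a : b; }
static inline long   max_long(long a, long b)       { return a > b ? a : b; }
static inline double max_double(double a, double b) { return a > b ? a : b; }

/* The controlling expression (a) + (b) is never evaluated; only its
   type, after the usual arithmetic conversions, selects a function. */
#define max_of(a, b) _Generic((a) + (b), \
    int:    max_int,                     \
    long:   max_long,                    \
    double: max_double)(a, b)
```

With this, `max_of(3, 7)` calls `max_int`, `max_of(1, 2L)` calls `max_long`, and `max_of(1.5, 0.25)` calls `max_double`. Because there is no `default:` association, any other operand type is rejected at compile time, which is a design choice rather than a requirement.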
| DougGwyn wrote: | In fact I proposed strdup on a few occasions, but it wasn't | adopted. It seems that they didn't like for standard | library functions to use malloc. POSIX.1 specifies strdup. | rseacord wrote: | No one has proposed making these standard. I doubt they would | gain much support as they are similar to the Annex K Bounds | Checked Interface functions strcpy_s and strcat_s but not quite | as good IMHO. | teleonorax wrote: | > similar to the Annex K Bounds Checked Interface functions | strcpy_s and strcat_s but not quite as good IMHO. | | Err... I thought Annex K is deprecated and dead? Whereas | strl* seem very much alive, some compilers even give a | "strcpy/strncpy is unsafe, use strlcpy instead" warning. | AaronBallman wrote: | FWIW, Annex K is not currently deprecated. | eqvinox wrote: | It's not commonly available though, e.g. on Linux/BSD | systems... | AaronBallman wrote: | Correct -- it would be nice if the glibc maintainers | would reconsider their opinion of supporting the optional | Annex K functionality. There is definitely user demand | for the feature. | eqvinox wrote: | > _rseacord 22 minutes ago [-]_ | | > _The C Committee has taken two votes on this, and in | each case, the committee has been equally divided. | Without a consensus to change the standard, the status | quo wins._ | | The fact that it has only survived on status quo is a | pretty crass hint that things aren't well with Annex K. | beefhash wrote: | _And_ every BSD out there. _And_ whatever it is that | macOS does. Microsoft looks to be the outlier to me. | loeg wrote: | Microsoft does not even implement Annex K. | | > Microsoft Visual Studio implements an early version of | the APIs. However, the implementation is incomplete and | conforms neither to C11 nor to the original TR 24731-1. | | > As a result of the numerous deviations from the | specification the Microsoft implementation cannot be | considered conforming or portable. 
| | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm | loeg wrote: | It should be. | loeg wrote: | Well, I am informally proposing making those standard :-). | | IMO they're a lot more ergonomic than the Annex K functions, | and do the thing most programmers think the strncat/strncpy | functions do (admittedly, not part of ISO C). | | Annex K should be forgotten as the mistake it is and we can | move on with existing real-world interfaces instead of | inventing features from whole cloth. I thought that was | generally the C standard operating practice. | rseacord wrote: | There were a number of recent proposals by Martin Sebor to adopt various | POSIX functions into C, including:
|
|     N2353 2019/03/17 Sebor, Add strdup and strndup to C2X
|     N2352 2019/03/17 Sebor, Add stpcpy, and stpncpy to C2X
|     N2351 2019/03/17 Sebor, Add strnlen to C2X
|
| He is lurking on this thread as well. These proposals can all | be found in the document log at http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log... | rseacord wrote: | The results (from the minutes http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2377.pdf) | | 6.33 Sebor, Add strnlen to C2X [N 2351] Result: No | consensus on putting N2351 into C2X. | | 6.34 Sebor, Add stpcpy, and stpncpy to C2X [N 2352] Result: | No consensus to put N2352 into C2X. | | 6.35 Sebor, Add strdup and strndup to C2X [N 2353] Result: | N2353 be put into C2X. The committee wants a proposal for | the wide character versions of any POSIX functions voted in | this meeting. | rmind wrote: | There have been some disagreements on strlcpy/strlcat (BSD | vs glibc crowd), although by now the debate has died off | and these functions are pretty widely used. Also, while | here, it would be lovely to have strchrnul() included. | loeg wrote: | glibc still refuses to add the functions because they are | not required by a standard. | orsenthil wrote: | Why is the learning curve for C still so high? 
| | * Why can't the learning curve be solved using tools? * Why don't | we actively promote more higher level languages which are | implemented in C (by fewer people)? | NickDunn wrote: | I think that C provides fewer layers of abstraction than other | languages. This requires the programmer to deal with memory | management, treat strings as an array of characters, and other | things that the majority of high-level languages conceptualise, | so that the human mind deals with it more easily. This provides | advantages and disadvantages, as it requires more thought and | understanding to write the code but also allows the use of low- | level features. The lack of tools to solve any learning issues | is probably down to the programmer needing the right conceptual | understanding and the requirements placed on anyone using the | various features of the language. | Tronic2 wrote: | Syntax of pointers. Easy to use high level languages make | extensive use of pointers (i.e. all their variables are | actually pointers) but beginners cope with them because no | stars or ampersands are required, with the help of GC. Of | course they'll get bitten soon and often because it is too easy | to create copies of pointers rather than copies of full data | structures, and without understanding pointers it's hard to | grasp why that happens. | bumblebritches5 wrote: | I taught myself C from just reading code and trying to | contribute to a few projects right out of high school, no | books, no school. | | So I don't think C has a very high learning curve, C++ on the | other hand... | pantalaimon wrote: | Do you find the learning curve for C to be high? I find it | quite the opposite. It's a simple language with only a few | concepts to learn, once you got those, that's it. There might | be some preprocessor tricks you'll pick up later, but the base | language and library is pretty comprehensive IMHO. 
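Tronic2's point above about copying a pointer versus copying the data it refers to can be made concrete with a short sketch. The type `int_list` and the function `int_list_clone` are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;
    int *items;   /* points at storage owned elsewhere or by this list */
} int_list;

/* Deep copy: gives dst its own storage holding the same values.
   Returns 0 on success, -1 if the allocation fails. A plain struct
   assignment (dst = src) would copy only the pointer, so both lists
   would share one array. */
int int_list_clone(const int_list *src, int_list *dst) {
    int *copy = NULL;
    if (src->len != 0) {
        copy = malloc(src->len * sizeof *copy);
        if (copy == NULL) {
            return -1;
        }
        /* memcpy is only called with valid, non-null pointers. */
        memcpy(copy, src->items, src->len * sizeof *copy);
    }
    dst->len = src->len;
    dst->items = copy;
    return 0;
}
```

After `int_list shallow = original;`, a write through `original.items` is visible through `shallow.items`, while a list produced by `int_list_clone` is unaffected. The caller is responsible for calling `free` on the clone's `items`.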
| throw_m239339 wrote: | > It's a simple language with only a few concepts to learn | | I mean by that logic, Assembly could be deem even simpler, | yet writing OR reading programs in Assembly is absolutely not | simple at all. | | At the end of day, one has to write programs that solve | (complicated) problems, and learning how to do that in C is | difficult, thus the learning curve deemed higher when it | comes to writing professional C. | | I can guarantee you that writing professional Go or Java and | writing correct programs in both takes way less effort than | with C, for use cases that would make Go or Java viable. | quelsolaar wrote: | Modern assembly language has a huge set of instructions, | that make them hard to learn, but the concept is still easy | to learn. | DougGwyn wrote: | Many antique computers are simulated by SIMH. If you have | the corresponding software, you can operate on your | desktop a simulated computer's software development | system. For example, DEC VAX (VMS or Unix) has a | relatively simple and sane assembly language. | quelsolaar wrote: | I think, learning a tiny bit of assembler, even if in an | emulator, is very valuable to teach the basics. | orsenthil wrote: | C is indeed a very small language. But the expressive power | of C for real-world problems brings a huge learning curve in | terms of organization, tracking, and understanding. | darepublic wrote: | Coming from python/js I found it to be high. Mostly because | of the memory management/ making sure I call free correctly | etc. In many cases where I would plow ahead in programming, | with C I had to stop and would feel dread. A lifesaver for me | was using C/C++ Repl environments where I could quickly | prototype or sanity check things I was doing. 
| pantalaimon wrote: | The trick is to just not use `malloc()` and `free()` unless | absolutely necessary ;) | throw_m239339 wrote: | > The trick is to just not use `malloc()` and `free()` | unless absolutely necessary ;) | | The problem is that often C programmers have to deal with | APIs and libraries they didn't write themselves to solve | their problems, thus are forced to use constructors and | destructors even when they don't want to. | rwmj wrote: | Not a question, a request: Please make __attribute__((cleanup)) | or the equivalent feature part of the next C standard. | | It's used by a lot of current software in Linux, notably systemd | and glib2. It solves a major headache with C error handling | elegantly. Most compilers already support it internally (since | it's required by C++). It has predictable effects, and no impact | on performance when not used. It cannot be implemented without | help from the compiler. | rseacord wrote: | My idea was to add something like the GoLang defer statement to | C (as a function with some special compiler magic). The | following is an example of how such a function could be used to | clean up allocated resources regardless of how a function | returned:
|
|     int do_something(void) {
|       FILE *file1, *file2;
|       object_t *obj;
|
|       file1 = fopen("a_file", "w");
|       if (file1 == NULL) {
|         return -1;
|       }
|       defer(fclose, file1);
|
|       file2 = fopen("another_file", "w");
|       if (file2 == NULL) {
|         return -1;
|       }
|       defer(fclose, file2);
|
|       obj = malloc(sizeof(object_t));
|       if (obj == NULL) {
|         return -1;
|       }
|
|       // Operate on allocated resources
|
|       // Clean up everything
|       free(obj); // this could be deferred too, I suppose, for symmetry
|       return 0;
|     }
|
| eqvinox wrote: | Cleanup on function return is not enough, it needs to be | scope exit. We're using this for privilege raising/dropping | (example posted above) and also mutex acquisition/release. | Both of these really "want" it on the scope level. | rwmj wrote: | Golang gets this wrong. 
It should be scope-level, not | function-level (or perhaps there should be two different | types, but I have never personally had a need for a function-level cleanup). | | Edit: Also please review how attribute cleanup is used by | existing C code before jumping into proposals. If something | is added to C2x which is inconsistent with what existing code | is already doing widely, then it's no help to anyone. | rseacord wrote: | Yes, we have discussed adding this feature at scope level. | A not entirely serious proposal was to implement it as | follows:
|
|     #define DEFER(a, b, c) \
|       for (bool _flag = true; _flag; _flag = false) \
|         for (a; _flag && (b); c, _flag = false)
|
|     int fun() {
|       DEFER(FILE *f1 = fopen(...), (NULL != f1), mfclose(f1)) {
|         DEFER(FILE *f2 = fopen(...), (NULL != f2), mfclose(f2)) {
|           DEFER(FILE *f3 = fopen(...), (NULL != f3), mfclose(f3)) {
|             ... do something ...
|           }
|         }
|       }
|     }
|
| We are also looking at the attribute cleanup. Sounds like | you should be involved in developing this proposal? | rwmj wrote: | Yes, I'll ask around in Red Hat too, see if we can get | some help with this. | AaronBallman wrote: | Funny you should mention that, as that feature has come up | recently in mailing list discussions. We have not seen an | actual proposal for adopting it yet, but features with similar | semantics are being discussed as a possible idea (no promises). | | FWIW, I don't think it would wind up being spelled with | attribute syntax because we would likely want programmers to | have a guarantee that the cleanup will happen (and attributes | can be ignored by the implementation). | rwmj wrote: | I believe the last proposal was in 2008 (ignore the | try..finally stuff here): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1298.pdf | | So I guess it needs someone to take that and update it, also | to pull up a full list of current Linux software which is | using this feature (which as I say these days is a surprising | amount). 
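To make the scope-level behavior discussed above concrete, here is a small sketch using `__attribute__((cleanup))` as it exists today in GCC and Clang. It is a compiler extension, not ISO C. The names `SCOPED`, `on_scope_exit`, `record`, `demo`, and `run_demo` are hypothetical; the event log exists only so the order of cleanup can be observed.

```c
#include <string.h>

/* Tiny event log so the cleanup order can be observed. */
static char events[16];
static size_t n_events;

static void record(char c) {
    if (n_events < sizeof events - 1) {
        events[n_events++] = c;
    }
}

/* A cleanup handler receives a pointer to the variable leaving scope. */
static void on_scope_exit(const char **tag) {
    record(**tag);
}

/* Hypothetical shorthand for the GCC/Clang extension; not ISO C. */
#define SCOPED __attribute__((unused, cleanup(on_scope_exit)))

static void demo(int leave_early) {
    SCOPED const char *outer = "o";
    {
        SCOPED const char *inner = "i";
        if (leave_early) {
            return;      /* handlers run for inner, then for outer */
        }
    }                    /* handler runs for inner here */
    record('b');         /* body continues after the inner block */
}                        /* handler runs for outer here */

/* Clears the log, runs the demo, and returns the recorded events. */
const char *run_demo(int leave_early) {
    memset(events, 0, sizeof events);
    n_events = 0;
    demo(leave_early);
    return events;
}
```

The handlers run in reverse order of declaration and on every exit path, including the early `return`, which is the property the lock and privilege macros mentioned in this thread rely on.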
| eqvinox wrote: | Here's our usage: https://github.com/FRRouting/frr/blob/master/lib/privs.h#L14...
|
|     #define frr_with_privs(privs)                               \
|       for (struct zebra_privs_t *_once = NULL,                  \
|            *_privs __attribute__(                               \
|                (unused, cleanup(_zprivs_lower))) =              \
|                    _zprivs_raise(privs, __func__);              \
|            _once == NULL; _once = (void *)1)
|
| This gives us a block construct that guarantees elevated | privileges are dropped when the block is done:
|
|     frr_with_privs(privs) {
|       ... whatever ...
|       break;  /* exit block, drop privileges */
|       return; /* return, drop privileges */
|     }
|
| rwmj wrote: | We have a nice macro for acquiring locks that only | applies to the scope: | | https://github.com/libguestfs/nbdkit/blob/e58d28d65bfea3af36... | | You end up with code like this: | | https://github.com/libguestfs/nbdkit/blob/e58d28d65bfea3af36... | | It's so useful to be able to be sure the lock is released | on all return paths. Also because it's scope-level you | can scope your locks tightly to where they are needed. | loeg wrote: | We use it extensively in our proprietary codebases as well, | FWIW. Not real open data for me to point to, but: a few | million lines of C, and a handful of billion USD in | revenue. If that helps weigh in on "yes, please standardize | this common practice." | eqvinox wrote: | Hopefully it'd at least be syntactically similar, so we can | have an
|
|     #ifdef __STDC_CLEANUP__
|     #define my_cleanup(func) stdc_cleanup(func)
|     #else
|     #define my_cleanup(func) __attribute__((cleanup(func)))
|     #endif
|
| i.e. it would require that it at least goes in the same | places as an attribute. | neop1x wrote: | In my opinion C is good as it is. C++ is a terribly complicated | mess, always has been, and adding more and more "modern" | functionality isn't helping it much. There are great standard | functions, e.g. for strings in C, whereas it is often very | inconvenient or complicated to do simple things like uppercasing a | string in C++. 
I always ended up basically using C with just | basic OOP functionality from C++. But I am not writing in C/C++ | daily so my opinion is not very important... | eska wrote: | There's a compiler attribute in GCC to promise that a function is | pure, i.e. free from side effects and only uses its inputs. | | This is useful for parallel computations, optimizations and | readability, e.g. sum += f(2); sum += | f(2); | | can be optimized to x = f(2); sum += x; | sum += x; | | Would the current motto of the consortium forbid adding a feature | such as marking a function as pure, that would not just promise, | but also enforce that no side effects are caused (only local | reads/writes, only pure functions may be called), and no inputs | except for the function arguments are used? | oh_sigh wrote: | sum = 2*f(2) seems nicer than having sum= twice. | | If you were enforcing this with the compiler, you would also | need something that would suppress the enforcing, because the | millions of pre-existing functions would probably not get an | updated attribute marking it as pure. And once you do that, the | compiler can't really trust anything that function does, | because it may actually be calling a non-pure function. | pascal_cuoq wrote: | If you wrote down your proposal, which the C committee member | Robert Seacord is encouraging you to do here: | https://news.ycombinator.com/item?id=22870210 , you would have | to think carefully about functions that are pure according to | your definition (free from side effects and only uses its | inputs) but do not terminate for some inputs. | | There is at least one incorrect optimization present in Clang | because of this (function that has no side-effects detected as | pure, and call to that function omitted from a caller on this | basis, when in fact the function may not terminate). 
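As a sketch of the GCC attribute eska describes above (also accepted by Clang): `__attribute__((pure))` is a promise from the programmer that the compiler does not verify. The function names `f` and `twice` are illustrative only.

```c
/* GCC/Clang extension: the result depends only on the arguments (and
   on memory the function may read), and there are no side effects.
   The compiler trusts this promise; it does not check it. */
int f(int x) __attribute__((pure));

int f(int x) {
    return x * x + 1;
}

int twice(void) {
    int sum = 0;
    sum += f(2);
    sum += f(2);   /* the compiler may reuse the result of the first call */
    return sum;
}
```

Whether the second call is actually removed depends on the compiler and optimization level; the attribute only permits it. As pascal_cuoq notes above, a careful definition would also have to say what happens when such a function does not terminate.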
| temac wrote: | I thought the compiler was free to pretend loops without side | effects always terminate, and in that sense it is already a | "correct" optimization? Or is it only for C++, I'm not sure? | pascal_cuoq wrote: | That may be the case in C++, but in C infinite loops are | allowed as long as the controlling condition is a constant | expression (making it clear that the developer intends an | infinite loop). These infinite loops without side-effects | are even useful from time to time in embedded software, so | it was natural for the committee to allow them: | https://port70.net/~nsz/c/c11/n1570.html#6.8.5p6 | | And you now have all the details of the Clang bug, by the | way: write an infinite loop without side-effects in a C | function, then call the function from another C function, | without using its result. | kazinator wrote: | No enforcing! This is useful even when it's, strictly speaking, | a lie. | | Suppose I want to add some debug tracing into f():
|
|     f.c: 42: f entered
|     f.c: 43: returning 2
|
| that's a side effect, right? But now the pure attribute tells a | lie. Never mind though; I don't care that some calls to f are | "wrongly" optimized away; I want the tracing for the ones that | aren't. | | In C++ there are similar situations involving temporary | objects: there is a freedom to elide temporary objects even if | the constructors and destructors have effects. | | Even a perfectly pure function can have a side effect, namely | this one: triggering a debugger to stop on a breakpoint set in | that function! | | If a call to f(2) is elided from some code, then that code will | no longer hit the breakpoint set on f. | | Side effect is all P.O.V. based: to declare something to be | effect-free in a conventional digital machine, you have to | first categorize certain effects as not counting. | gbear605 wrote: | Just offer a -Wpure flag for checking if functions are pure. 
| That way production/test releases can check while you can | still use it for debugging. | | Also, the problem with eliding breakpoints already exists | afaik, since the compilers already check for pure functions. | BeeOnRope wrote: | When deciding on the behavior of some operation that maps to | hardware [1], how do you weigh the existing hardware behaviors? | | For example, if all past, current and contemplated hardware | behaves in the same way, I assume that the standard will simply | enshrine this behavior. | | However, what if 99% of hardware behaves one way and 1% another? | Do you set the behavior to "undefined" to accommodate the 1%? At | what point do you decide that the minority is too small and | you'll enshrine the majority behavior even though it | disadvantages minority hardware? | | --- | | [1] Famous examples include things like bit shift and integer | overflow behavior. | rseacord wrote: | I would say that the committee does pay attention to hardware | variations, even when there are no examples of existing | hardware that implement a feature (for example, a trap | representation for integers other than _Bool). Some of the | thinking is that "if it was ever implemented in hardware, it | could be again." I'm not crazy about this thinking, and I | largely think that language features for which there are no | existing hardware implementations should be eliminated and then | brought back if needed. However, the C Committee is much | smaller than the C++ committee so there is a labor shortage. | More people getting involved would certainly help. | | We have dropped support for sign and magnitude and one's | complement architectures from C2x (a decision Doug Gwyn does | not agree with). There was some concern that Unisys may still | use a one's complement architecture, but that this may only be | in emulation nowadays. 
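For the bit shift case in BeeOnRope's footnote above: C leaves a shift undefined when the count is negative or is greater than or equal to the width of the promoted left operand. A program that needs one fixed answer regardless of hardware can state it explicitly. This is a minimal sketch; the helper name `shl32` and the choice of 0 as the out-of-range result are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: left shift of a 32-bit value with a defined
   result for every count. Counts of 32 or more yield 0 instead of
   relying on whatever the target's shift instruction happens to do. */
uint32_t shl32(uint32_t value, unsigned count) {
    return count < 32 ? value << count : 0;
}
```

The price is one comparison per shift, which is the kind of per-operation overhead the standard avoids imposing on every program by leaving the out-of-range case undefined.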
| rseacord wrote: | Some examples of hardware variation (since you mentioned | shifting and overflow):
|
| - when signed integer overflow or division by zero occurs, a division instruction traps on x86, while it silently produces an undefined result on PowerPC;
| - left-shifting a 32-bit one by 32 bits yields 0 on ARM and PowerPC, but 1 on x86;
| - left-shifting a 32-bit one by 64 bits yields 0 on ARM, but 1 on x86 and PowerPC.
|
| BeeOnRope wrote: | On x86 it's actually mixed: scalar shifts behave as you | describe, but vectorised logical shifts flush to zero when | the shift amount is greater than the element size! | | So x86 actually has both behaviors in one box (three | behaviors if you count the 32-bit and 64-bit scalar things | you mentioned separately). | | This is an example of where UB for simple operations | actually helps even on a single hardware platform: it | allows efficient vectorization. | loeg wrote: | A good example might be 1's complement signed integers. They | were dead weight in the standard for a long time. | BeeOnRope wrote: | Yes, but that is a slightly different question: how long | do you keep something in the standard after all the relevant | hardware has disappeared, e.g., is there a framework for | periodically re-evaluating decisions in light of the changing | hardware landscape. | | My question was more about when behavior is being defined for | the first time, which admittedly doesn't happen that often | (but it could apply e.g., when things like fixed-width integer | types, uintX_t and friends, were introduced). | DougGwyn wrote: | Original standard feature specifications were not meant to | obtain a 1-to-1 map from C onto hardware, but we used | practical experience to judge what overhead was acceptable | for the kinds of processors we had seen or thought were | reasonable choices that the architects might make in the | not too distant future. 
If a frequently-executed action had | to (for example) check for a special condition every time, | the overhead might increase by several percent, depending | on the instruction set architecture. So quite often we | argued that "if the programmer wants to test for that | condition, he can do so, but typically it is a waste of | cycles". There are a lot of such trade-offs; maybe we | should write a paper or book on this topic. | oreally wrote: | About time someone advocated for code in lower level styles of | programming. Hope it goes well! | | Anyway, here's some questions: | | - What kind of programs would you say C is a good fit for? | | - There is some catching up to do for C. Is there a roadmap for C | improvement, or even a recommendation of C++ things that fit | somewhat in the style/philosophy of C? For example, I'd recommend | not using the C++ smart pointers stuff, while still using C++ | threads and lambdas. | | Also, you should include programmers from other fields in your | committee. Game (engine) developers, HFT programmers are used to | lower level styles of coding and align with your perspective. | 0xDEEPFAC wrote: | Dear god, is the precedence of the "&" operator ever going to be | fixed? | rseacord wrote: | I can't imagine it will ever be changed, since this would be a | breaking change to the language. | 0xDEEPFAC wrote: | I disagree that this would be a "breaking" change as many | people have already resorted to using extra () and such a | change might actually "fix" broken code which makes the | reasonable assumption that & binds more tightly than ==. | | https://ericlippert.com/2020/02/27/hundred-year-mistakes/
|
|     int x = 0, y = 1, z = 0;
|     int r = (x & y) == z; // 1
|     int s = x & (y == z); // 0
|     int t = x & y == z;   // 0 UGH
|
| DougGwyn wrote: | If you're using parentheses, as has been recommended for | decades, there is no problem. 
Otherwise, it is likely that | such a change would adversely impact previously working | code. There just isn't a pressing need to change it. | 0xDEEPFAC wrote: | Besides the fact that it's unintuitive and could lead to | low-level or hard-to-find bugs? | | It seems to me that C would benefit greatly from ironing out | its many inconsistencies, and this is exactly the kind of thing | people expect in new revisions of the language. | | Also, I don't see how it would impact previously working | code when compilers already do things like allow | selections between versions of languages a la C99, C2x, | etc. Users could just avoid the new version if they don't | feel like changing. | DougGwyn wrote: | I don't think most users of C want things changing | underfoot. Keeping track of all the version combinations | is infeasible, especially when you consider that an app | and its library packages are likely to have been | developed and tested for a variety of environments. To | the extent that existing correct code has to be scanned | and revised when a new compiler release comes out, one of | the primary goals of standardization has failed. | 0xDEEPFAC wrote: | I disagree with your view of standardization - as | restricting changes to be additions to the runtime seems | pointless as users could easily use other (often more | optimized) libraries. | | But, I do see the benefit of having a language "frozen in | time" which never really changes and can be mastered | painlessly without having to refresh on new versions. | Perhaps C is special/sacred in this regard. | [deleted] | watergatorman wrote: | Some random thoughts: | | I appreciate the original simplicity of K & R, "The C Programming | Language", 2nd Edition, and the relatively simple semantics of | ANSI C89/ISO C90 compared to C99 and later. | | You don't need complex parsing methods for ANSI C89/ISO C90 and | you do not need the "lexer hack" to handle the typedef-name | versus other "ordinary identifier" ambiguity. 
| | A surprising number of colleges still teach K & R 2nd Edition C. | | Whenever someone brags about using recursive-descent parsing | methods, I always ask, are they using predictive, top-down | parsing, or back-tracking? | | I hope C never loses sight of it's roots nor morphs into C++ | under the guise of creating a common subset, but which is really | a disguised superset of C and C++ | | Please prevent the ever increasing demand for new features from | overwhelming C's simplicity so it can no longer be parsed with | simple methods. | zabana wrote: | Is it worth it to learn C in 2020 ? Will it still be a prominent | language for systems programming in the future ? | rseacord wrote: | C also has renewed interest around IoT programming and mobile | devices | wolf550e wrote: | I believe C will continue to be used as lingua franca after no | one uses it to write software, and we're decades from even that | point. | | You need to know enough C to interface with the OS, and enough | C to talk about memory layout, memory management, dynamic | libraries, ABI, etc. | | Most higher language runtimes need C, even with a self hosting | compiler. Not being able to work on the C parts is limiting. | | You also need to know enough assembly to be able to understand | what the compiler did with your own code, even if you never | write assembly yourself. Not being able to compare the | disassembly to the high level language to understand why it | doesn't work (or is order of magnitude slower than expected) is | limiting. | stephencanon wrote: | Yes. | | - Languages like Rust will gain more mindshare over the next | decade, and be used in more and more new projects, but there | are billions of lines of existing code in C, and those aren't | going away. | | - Hardware architects, for better or worse, largely think about | software in terms of [a somewhat dated and idealized mental | model of] C. 
So if you want to be able to converse with | architects (which anyone doing systems programming should want | to do), you need to have some basic fluency with C. | wcarey wrote: | I'm teaching C to high schoolers as their first language, which | is quite the adventure. Do you have any good advice or resources | on how to introduce the way C treats the function stack and heap | allocated memory? Most of my students struggle (naturally) with | making sense of function scoped identifiers and pass-by-value | semantics. | pascal_cuoq wrote: | This service has been designed to try out small self-contained | C examples online (in a manner reminiscent of Compiler | Explorer): | | https://taas.trust-in-soft.com/tsnippet/ | | One advantage is that it identifies a LOT of undefined | behaviors during execution for which traditional compilation | and execution only give puzzling results. | | One drawback is that some of the undefined behaviors it | identifies are obscure, and for others the message may be | unusual. For instance, using a standard function without | including the appropriate header may result in a warning about | the mismatch between the type in the header and the type of the | arguments the standard function was applied to after arguments | promotions. | | Overall, you may still find it useful for teaching. | wcarey wrote: | Thanks! Definitely an interesting tool. Two of my students | are fascinated by the idea of undefined behavior right now | (having run into it in practice; the idea that off-by-one | errors sometimes crash their program and sometimes behave | "normally" is really odd to them), so I'll point them at this | to play with. | imglorp wrote: | Curious what were the requirements to select C as a first high | school language over many other choices? I imagine there's a | balance of practicality (after the class), and then the usual | questions about tooling, sharp edges, and ease of learning. 
| wcarey wrote: | It's a three-year rotation: Python, C (Unix), C (Arduino). My | goal with the class is to teach ideas that will stand the | test of time. C (and Unix) certainly fit that bill. | | Happily, the tooling is the easiest part. Every student has a | Raspberry Pi running Debian, no mouse, no window server, and | no extraneous software. You can spool kids up on a nano-based | C toolchain in one class period with remarkably few sharp | edges. There's even some fun accidental learning the first | time they nano their executable file. | jayp1418 wrote: | Have you given the Ada language a thought? Also, there are a lot of | competitions your students can take part in: | https://www.makewithada.org/ | wcarey wrote: | I haven't - is there a good tool chain you'd recommend I | check out? What's the enduring idea in Ada? | DougGwyn wrote: | Everybody seems to draw pictures of the raw memory (word- | oriented) data. | wcarey wrote: | I've been doing the same! It certainly helps for strings. | Pointer block diagrams (like K&R use) seem to help too. | Mostly what melts their brains is the idea that an identifier | can be "in two places at once" - for example, you can have a | variable x declared in some scope and a function one of whose | arguments is named x, and those are two different things. | DougGwyn wrote: | Try explaining the concept of "scope", starting with nested | blocks. It does require some practice. I suggest not | unnecessarily reusing identifiers associated with different | objects. | wcarey wrote: | Thanks! | quelsolaar wrote: | A few proposals: | | Why not mandate a warning every time the compiler detects and | makes use of UB? It would solve SO many issues. If you are | looking to improve security of C programs, then letting the user | know what the compiler does should be number one. | | Converting as many UBs as possible to platform-specific behavior | would also be a big help. | | I would love to see native vector types. It's time.
Vector types | are now more common in hardware than float was when it was | included in the C spec. Time to make it a native type. Hoping the | compiler does the vectorization for you is not good enough. | | Allow for more than one break. | | for(i = 0; i < n; i++) for(j = 0; j < n; j++) if(array[i][j] == | x) break break; | | is equal to: | | for(i = 0; i < n; i++) for(j = 0; j < n; j++) if(array[i][j] == | x) goto found; found: | clarry wrote: | > Why not mandate a warning every time the compiler detects and | makes use of UB? It would solve SO many issues. | | Because that's hardly ever what happens, except when it | actually does, and compilers do an increasingly good job of | issuing diagnostics in that case. If you actually mandated it, | no compiler today would come close to being standards | compliant. This comes close to making the language | unimplementable. | | The most common issue with UB and optimizations is not that | "compiler detects UB and does something with it," it's that the | compiler analyzes and optimizes code _with the assumption that | UB doesn't actually happen._ It doesn't know whether it does | (and in general, it is impossible to tell whether it would | happen -- it's something that might or might not happen at run | time, and proving it one way or another amounts to solving the | halting problem), it just assumes it doesn't. | | And if one mandated compilers to report every time they make an | optimization that is valid under the assumption that the | program is well behaved, then you would never finish reading | compiler output. Or you would turn off optimizations. | quelsolaar wrote: | They need to do better than remove NULL checks silently. You | can read all about Linus's rants on this. Every time the | compiler breaks things they blame the C standard for letting | them do whatever. That's what's wrong with C today. The C | standard hasn't put its foot down.
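The NULL-check removal debated above can be illustrated with a minimal sketch. The function names are hypothetical and do not come from the thread, and whether a given compiler actually deletes the test depends on the compiler, its version, and the optimization flags. In the first function the pointer is dereferenced before it is tested, so an optimizer is allowed to assume it is non-null and may treat the later test as dead code. In the second function the test comes first, so no path has undefined behavior and the check has to stay.

```c
#include <stddef.h>

/* Hypothetical example: the dereference comes BEFORE the NULL test.
 * Calling this with p == NULL is undefined behavior, so an optimizer
 * may assume p != NULL and may delete the test below. */
int read_then_check(const int *p)
{
    int value = *p;      /* undefined behavior if p is NULL */
    if (p == NULL)       /* may be treated as unreachable */
        return -1;
    return value;
}

/* Same intent, but the test comes first: no undefined behavior on
 * any path, so the check must be preserved. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Only check_then_read may safely be called with a null pointer. GCC also offers -fno-delete-null-pointer-checks, which the Linux kernel uses, for code bases that prefer to switch this class of optimization off.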
| clarry wrote: | I want my compiler to remove redundant checks (without any | noise), and that is why I pass it an optimization flag. If | you don't want such optimizations, then maybe you should | not ask the compiler to make them. | quelsolaar wrote: | This attitude is terrible! It's an attitude that says that | unless you know exactly every pitfall in the language by | heart you have no place writing code. I guess you don't | use a debugger either because you never write bugs, right? | And you think that every piece of software that helps the user is | for noobs, right? | | There is an endless list of bugs that have been produced | by very competent C programmers, because the compiler has | silently removed things for some very shaky reasons. | clarry wrote: | Huh? I just want performant code. That's why I write C, | and that's why I use an optimizing compiler, and that's | why I ask my compiler to optimize. | | I also want to write code that is reasonably generic. | Thus, it will have checks and branches that cover | important corner cases; they are required for | completeness and correctness. But very often, all of | these checks turn out to be redundant in a specific | context, and an optimizing compiler can figure it out, | and eliminate these checks for me. | | So I don't manually need to go and write two or three | versions of each function like do_foo and | assume_x_is_not_null_and_do_foo and | assume_y_is_less_than_int_max_minus_sizeof_z_and_do_foo | and make damn sure not to call the wrong one. | | I just write one version, with the right checks in place, | and if after macro expansion, inlining, range analysis, | common subexpression elimination, and other inference | from context, with C's semantics at hand, the compiler | can figure out that some of these checks are redundant, | then it will optimize them out. | | I ask for it, and I'm glad compiler developers deliver | it. You don't need to ask for it.
Just turn off these | optimizations (or, rather, don't enable them) if you | prefer slow and redundant code. | [deleted] | faehnrich wrote: | I've been waiting for a book on C from No Starch Press, so I'm | really excited for this one. | | This might not be too deep a question on the C language in | regards to this book, but I've been wondering, why did you decide | to have an eldritch horror as the book's cover? | rseacord wrote: | It's a longish story, but people do seem to like the cover. We | started equating the idea of C == Sea, so we had some early | drawings of the robot riding various undersea creatures | including a giant squid. I thought that looked overly phallic, | so I suggested the robot ride Cthulhu instead, an unofficial | mascot of NCC Group. | faehnrich wrote: | I like how Cthulhu is shown as kind of a guide for the robot. | | The C==Sea brings to mind the book Expert C Programming: Deep | C Secrets. | beardedwizard wrote: | Deep c secrets, a classic. | cptnapalm wrote: | Wait, they put Cthulhu on the cover of a programming book? I'm | buying it. | douglascorrea wrote: | I'm trying to learn C during this quarantine times. I'm looking | for good beginner-friendly opensource projects to learn from. Can | you please suggest some repositories to look into? | woodrowbarlow wrote: | (obviously, i'm not one of the panel members, just chiming in.) | | if you're interested in looking at how C can be used in | embedded realtime operating systems, i recommend diving into: | | https://github.com/ARMmbed/littlefs | | (i'm not affiliated.) | | it's a lean, logging flash filesystem implementation and i | recommend it because the research, rationales, documentation, | organization, codebase, test harness, and public API ergonomics | all impressed me a lot. it was written for the mbed OS, but it | is so well designed that i could integrate it into any realtime | OS without too much trouble. 
and the documentation is thorough | enough that after skimming the wikipedia article for | filesystems, and maybe an article on how flash chips read and | write data, you'll be able to work your way through it. i | learned a lot by reading through that repository. | begriffs wrote: | What does the presence or absence of __STDC_ISO_10646__ indicate | exactly? I found this part of the C99 spec obscure. | | For instance, the macOS clang environment does not define this | symbol. Is their implementation of wchar_t or <wctype.h> lacking | some aspect of Unicode support? | AaronBallman wrote: | If that macro is defined, then wchar_t is able to represent | every character from the Unicode required character set with | the same value as the short code for that character. Which | version of Unicode is supported is determined by the date value | the macro expands to. | | Clang defines that macro for some targets (like the Cloud ABI | target), but not others. I'm not certain why the macro is not | defined for macOS though (it might be worth a bug report to | LLVM, as this could be a simple oversight). | begriffs wrote: | Would the following be a correct way to determine whether | there's a problem? | | * First call setlocale(LC_CTYPE, "en_US.UTF-8") | | * Next feed the UTF-8 string representation of every Unicode | codepoint one at a time to mbstowcs() and ensure that the | output for each is a wchar_t string of length one | | * If all input codepoints numerically match the output | wchar_t UTF-32 code units, then the implementation is | officially good, and should define __STDC_ISO_10646__? | AaronBallman wrote: | I think this is correct, assuming that locale is supported | by the implementation and wchar_t is wide enough, but I am | by no means an expert on character encodings. | rseacord wrote: | Should work provided your wchar_t type is at least 21-bits | wide. | SaxonRobber wrote: | can we get compile time constant variables? 
something cleaner | than enums and defines | jcranmer wrote: | Are there any plans to add support for multiple register return | values to C? | rseacord wrote: | None that I'm aware of. | Javantea_ wrote: | Do you think that static analysis is a valuable tool for security | research? Do you recommend static analysis software to a single | developer with a limited budget or an amateur? | msebor wrote: | Yes, both :) There are a few in public domain that might be | helpful to experiment with. Clang has had a static analyzer for | a while and GCC 10 adds one as well (and the maintainer is | looking for help with implementing checkers so that's a good | way to gain experience with writing one). | knz42 wrote: | What is the story behind the removal of VLAs from C99 in later | revisions? | AaronBallman wrote: | VLAs are still present in C17 and have not been removed. They | are, however, an optional feature with a truly weird (IMHO) | feature testing macro. If '__STDC_NO_VLA__' is defined to 1, | then the implementation does not support VLAs. | | IIRC, this macro was added to C11 along with a batch of other | "these are optional" macros for atomics, complex, threads, etc. | However, I don't recall whether C99 adopted the features as | optional features and missed the feature testing macro, or if | they were required features in C99 that we made optional in | C11. | stephencanon wrote: | Complex and VLA were required by C99, but made optional in | C11. The others were new in C11. | wuxb wrote: | I can't live without it. | rseacord wrote: | So I spend a possibly unreasonable amount of time and page | space discussing VLAs in the Effective C book. I understand | there are some problems with them, but for what it is worth, I | really like the feature, particularly when used in function | prototype scope. 
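As a rough illustration of the use mentioned above (variably modified array types in a function prototype), here is a minimal sketch. The function name sum_matrix is hypothetical, and the preprocessor guard assumes a C11 or C17 implementation, where __STDC_NO_VLA__ signals that the feature is unavailable. No variable length array is allocated here: the bounds in the parameter list only describe the shape of the caller's array, so the compiler can do the row and column index arithmetic.

```c
#include <stddef.h>

/* Assumption: a C11/C17 implementation. If __STDC_NO_VLA__ is defined
 * to 1, variably modified parameter types are not available. */
#if defined(__STDC_NO_VLA__) && __STDC_NO_VLA__
#error "this sketch needs variably modified array types"
#endif

/* Prototype: the bounds of m refer to the earlier parameters. */
double sum_matrix(size_t rows, size_t cols, double m[rows][cols]);

/* Definition: m is really a pointer to rows of cols doubles, so
 * m[i][j] is indexed without any hand-written offset computation. */
double sum_matrix(size_t rows, size_t cols, double m[rows][cols])
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[i][j];
    return total;
}
```

A caller with an ordinary array such as double m[2][3] would pass it as sum_matrix(2, 3, m); the size parameters have to come before the array parameter so that they are in scope when its bounds are declared.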
| jedbrown wrote: | I usually don't let them leak into public interfaces, and | don't allocate VLAs, but really like VLA pointers for multi- | dimensional array processing such as [*]: | double (*a)[N][P] = (double (*)[N][P])a_flat; for (i=0; | i<M; i++) for (j=0; j<N; j++) for (k=0; | k<P; k++) a[i][j][k] = f(i, j, k); | | The alternative would be | a_flat[(i*N+j)*P+k] = f(i, j, k); | | which is a lot more error-prone. I understand that some | implementations (MSVC) declined to implement VLAs, but I | really wish that at least VLA-pointers could have remained a | mandatory part of C11 and later standards. | | [*] Has there been any discussion of adding GCC's "typeof" to | the standard? | alerighi wrote: | They did not remove them, but made them optional. | | It is a controversial feature that can produce bugs, and it is | banned in a lot of projects (a famous one: the Linux kernel). | DougGwyn wrote: | What removal? C11 section 6.7.6.2 specifies the semantics. | cesarb wrote: | What the parent comment probably meant is that support for | VLAs was required in C99, but is no longer required in C11, so | while code written for C99 could use VLAs without any special | consideration, code written for C11 cannot depend on VLAs | since they might not be present in all compilers. | cperciva wrote: | When will C gain a mechanism for "do not leave this sensitive | information lying around after this function returns"? We have | memset_s but that doesn't help when the compiler copies data into | registers or onto the stack. | pascal_cuoq wrote: | This is an entire language extension, as you note.
The last | time various people interested in this were in the same room | (it was in January 2020 in a workgroup called HACS), what | emerged was that the Rust people would try to add the "secret" | keyword to the language first, since their language is still | more agile than C, while the LLVM people would prepare LLVM for | the arrival of at least one front-end that understands secret | data. | | Is this enough to answer your question? I can look up the names | of the people that were involved and communicate them privately | if you are further interested. | loeg wrote: | (Not OP) I would appreciate any references you can provide. | An LLVM __attribute__((secret)) would be a great place to | start. | pascal_cuoq wrote: | Unfortunately I am out of useful information: | | https://news.ycombinator.com/item?id=22868999 | | I hope someone will provide the next link. | stephencanon wrote: | Also worth noting that a language extension may not be | sufficient for all cases. E.g. the OS stores register state | on a context switch; do you also need a flag for the system | to zero any memory used for this purpose following the state | restore, or is it OK to trust that it won't leak through some | mechanism? For some applications, there may be contractual or | regulatory requirements to have an erasing mechanism for | copies like this as well. | cperciva wrote: | I want to use this in the OS kernel too. ;-) | cperciva wrote: | Thanks for the update. I was encouraging some of the people | who were going to be at HACS to address this but I hadn't | heard the latest progress. Unfortunately I couldn't be there | myself. | pascal_cuoq wrote: | If I remember correctly, Chandler was the one writing down | the draft for LLVM developers to comment on LLVM-side. | Unfortunately, if you Google his name and the relevant | keywords, the results are full of his work on speculative | load hardening.
| | Someone who reads the LLVM mailing list attentively should | have seen it and may have a link. | dpipemazo wrote: | One of my favorite features recently while developing C for | embedded systems has been the --wrap linker flag that allows me | to effectively test code that interacts with hardware without | modifying the source. | | By passing -Wl,--wrap=some_function at link time with test code | we can then define __wrap_some_function | | that will be called instead of some_function. Within | __wrap_some_function one can also call __real_some_function which | will resolve to the original version if you still want to call | the original one. This is especially useful if trying to observe | certain function calls in tests that interact with hardware. | | Do you have any other recommendations/preferences to help with | unit-testing C code? | freemind wrote: | Do you think object oriented languages are better than C to | develop GUI-based cross-platform programs? | | The licenses of the majority of third-party libraries available | for C are GPL. Do you think this makes it harder to reuse code to | sell software? | oscoder wrote: | A more chill question for you - What's your favourite string | library? | ebg13 wrote: | How accurate, relevant, and useful today is http://c-faq.com ? | lemaudit wrote: | Hi, | | Do you think Annex K of C11 will be widely adopted by programmers | or unused? Why aren't people adopting it? | | Do you see the use of any analysis tools that are particularly | effective for finding memory safety issues? | | C++ added in smart pointers to its specification. Are there any | plans to do something similar in future C specifications? | | Thanks! | AaronBallman wrote: | > Do you think Annex K of C11 will be widely adopted by | programmers or unused? Why aren't people adopting it? | | So far, it's not been widely adopted.
Part of the issue is that | there are specification issues relating to threads and the | constraint handlers, and part of the issue is that popular libc | implementations have actively resisted implementing the annex. | | That said, I field questions about Annex K on a regular basis | and there are a few implementations in the wild, so there is | user interest in the functionality. | | > Do you see the use of any analysis tools that are | particularly effective for finding memory safety issues? | | <biased opinion>I think CodeSonar does a great job at finding | memory safety issues, but I work for the company that makes | this tool.</biased opinion> | | I've also had good luck with the memory and address sanitizers | (https://github.com/google/sanitizers) and tools like valgrind. | | > C++ added in smart pointers to its specification. Are there | any plans to do something similar in future C specifications? | | We currently don't have any proposals for adding smart pointers | to C. Given that C does not have constructors or destructors, | we would have to devise some new mechanism to implement or | replace RAII in C, which would be one major hurdle to overcome | for smart pointers. | hedora wrote: | I've had good luck (in C++) replacing the underlying memory | allocator with one that tracks leaks by allocation type | (which is fast enough for production use). | | This can be done in C, but the calling code has to spell | malloc and free differently. | | In debug mode, configuring malloc to poison (and add fences) | on allocation and free finds most of the remaining things. | | These techniques tend to have much lower runtime overhead | than valgrind (2-digit percentages vs 5-10x), so they can be | left on throughout testing and partially enabled in | production. | | They find >90% of the memory bugs that I write (assuming | valgrind finds 100%). YMMV. | jcelerier wrote: | > We currently don't have any proposals for adding smart | pointers to C. 
Given that C does not have constructors or | destructors, we would have to devise some new mechanism to | implement or replace RAII in C, which would be one major | hurdle to overcome for smart pointers. | | why would you _have_ to devise a new mechanism and not borrow | one from one of the thousand other mechanisms already | existing in PL literature for this? | loeg wrote: | Annex K isn't being adopted because it's unergonomic and | doesn't solve the problem it purports to. Even the proposer | (Microsoft) does not actually implement Annex K as specified in | the ISO standard. | rseacord wrote: | Microsoft originally implemented the Annex K bounds-checked | interfaces (e.g., the *_s functions) back in the 1990s in | response to well-publicized vulnerabilities. They proposed | standardization to the C Standards committee. The committee | made many changes to the proposal, possibly going too far | away from the original implementation. During this time, I | would say that Microsoft was very deferential to the wishes | of the committee. | | By the time the ISO/IEC TR 24731-1:2007 was released, and | then later Annex K added to the C Standard, Microsoft had to | decide if they wanted to change the interfaces to conform to | the changed standard and re-implement their code bases. They | presumably decided that they did not, which I think is a | defensible decision. | | As to unergonomic, examples please? | rurban wrote: | Wrong. Many implemented them, Microsoft first, followed by | Cisco, Watcom, Embarcadero, Huawei and Android. Widely used | in Windows, Embedded and phones. | | Microsoft just changed one bit of the proposal, but no one | followed them there. Currently it's the most widely used and | worst implemented. I tested all of them.
| | It solves the bounds checking problem better than | _FORTIFY_SOURCE, ASAN and valgrind, because it does the | checks always, if compile-time or run-time, independent on | the optimizer, the used intrinsics, where valgrind fails, and | is much faster than ASAN. Also faster than glibc btw. | jmckinley wrote: | It is 2020. You are looking at a series of projects your company | has teed up. All are greenfield efforts - no legacy. What would | be the attributes of a project that would have you recommend C as | the programming language? | quelsolaar wrote: | Anything high performance: game engine, scientific computation, | deep packet inspection, image analysis, machine learning, | rendering engines, high frequency trading.... The list is long! | joefourier wrote: | For anything embedded you have practically no choice but to use | C (or assembly). Same goes for a lot of systems programming, | e.g. writing Linux drivers. | commandersaki wrote: | A few years ago I came across this article Pointers Are More | Abstract Than You Might Expect In C [1]. | | I followed the article which attempted to interpret the C | standard and come to a conclusion. The conclusion is: | | > The takeaway message is that pointer arithmetic is only defined | for pointers pointing into array objects or one past the last | element. Comparing pointers for equality is defined if both | pointers are derived from the same (multidimensional) array | object. Thus, if two pointers point to different array objects, | then these array objects must be subaggregates of the same | multidimensional array object in order to compare them. Otherwise | this leads to undefined behavior. | | Based on the above, I arrived at the conclusion after reading | this that comparing two distinct malloc()'d pointers for equality | itself is undefined behaviour since malloc() is likely to return | pointers to distinct objects that are not part of a sub-aggregate | object. 
| | I know this is incorrect, but I don't know why I'm wrong. | | [1]: https://stefansf.de/post/pointers-are-more-abstract-than- | you... | pascal_cuoq wrote: | The only thing that is not defined is comparing a pointer one- | past-the-end to a pointer to the very beginning of a toplevel | object. Apart from this rule, pointers of course do not need to | be derived from the same object in order to be compared with == | and !=. | | &a + 1 == &b is unspecified: it may produce 0 or 1, and it may | not produce the same result if you evaluate it several times. | | Similarly, if both the char pointers p and q were obtained with | malloc(10), after they have been tested for NULL, all these | operations are valid: p == q (false) p + | 1 == q (false) p + 1 == q + 1 (false) p + 10 == q + | 1 (false) | | Only p+10 == q and p == q+10 are unspecified (of the | comparisons that can be built without invoking UB during the | pointer arithmetic itself). | | I have no idea what led that person to (apparently) write that | &a==&b is undefined. This is plain wrong. I do not see any | ambiguity in the relevant clause | (https://port70.net/~nsz/c/c11/n1570.html#6.5.9p6 ). Yes, the | standard is in English and natural languages are ambiguous, but | you might as well claim that a+b is undefined because the | standard does not define what the word "sum" means | (https://port70.net/~nsz/c/c11/n1570.html#6.5.6p5 ). | azinman2 wrote: | Why is this undefined if it's all just pointers to addresses | in memory, regardless if the memory is valid for that object | or not? | pascal_cuoq wrote: | Here is an example I have at hand that shows that when you | are using an optimizing compiler, there is no such thing as | "just pointers to addresses in memory". There are plenty | more examples, but I do not have the other ones at hand. | | https://gcc.godbolt.org/z/Budx3n | joosters wrote: | I would guess that it is because it gives some freedom to | the compiler. e.g. 
If you have two pointers 'foo' and 'bar' | that point to two separate structures (e.g. two arrays of | ints), the compiler can always assume that the pointers, | even with some adds/subtracts, will never 'collide', i.e. | foo will never == bar, regardless of their relative memory | positions. | cormacrelf wrote: | That's quite precise, can you give a sense of why it's useful | to have? Does it translate as "you can never know whether two | mallocs are adjacent, so don't even try merging them"? | pascal_cuoq wrote: | One concrete reason why "unspecified" means "anything and | not always the same thing" is to enable the maximum of | optimizations. | | Write a function c that compares pointers in a compilation | unit, and in another compilation unit, define: | int a, b; X1 = (&a == &b + 1); X2 = c(&a, | &b + 1); | | The compiler can optimize the computation of X1 on the | basis that comparing an offset of &a to an offset of &b | will always: - be false - or invoke | undefined behavior - or be unspecified | | But the optimization will not apply to the computation of | X2, so the two variables X1 and X2 can receive different | values when you execute this example, although they appear | to compute the same thing. | cormacrelf wrote: | I get why unspecified means that and it's good to know | what the limit is for applying an optimisation, but I was | asking about why the specific comparison of "one past the | end" with the beginning of another being unspecified | would be useful. It's cool you can optimise it out, but | what does a compiler gain from being able to do that? | | Imagine a standard stated that > and < character | comparisons involving '%' were unspecified. Why would | this be good? It wouldn't, so it's not in any standard. | But specifically it wouldn't because (a) nobody writes ch | < '%', and (b) if they did, compilers couldn't make | programs any faster, more portable, etc, because of its | inclusion.
| | I guessed above that this is kinda like having hashmaps | iterate in a random order: compilers do spooky things | when you try to check whether two allocas/mallocs are | adjacent, so don't do it. Is that accurate? Or does it | mean that compilers can move things around on the stack | if they want, without worrying about updating the | registers or locations that store the pointers, i.e. this | is mainly to make compilers easier to write? If it's | that, I imagine I would want some other pointer | comparisons on the list. The reason it's in there is what | I wanted you to shed some light on. | pascal_cuoq wrote: | Oh, that was your question. In this case, the reason why | &a + 1 == &b is unspecified is that: | | - it's generally false--there is no reason for b to be | just after a in memory, so these two addresses compare | different. | | - it is sometimes true: when addresses are implemented as | integers, and compilers use exactly sizeof(T) bytes to | represent an object of type T, and do not waste precious | integers by leaving gaps between objects, and == between | pointers is implemented as the assembly instruction that | compares integers, sometimes that instruction produces | true for &a + 1 == &b, because b was placed just after a | in memory. | | In short, &a + 1 == &b was made unspecified so that | compilers could implement pointer == by the integer | equality instruction, and could place objects in memory | without having to leave gaps between them. Anything more | specific (such as "&a + 1 == &b is always false") would | have forced compilers to take additional measures against | providing the wrong answer. | _kst_ wrote: | Pointer _equality_ (the == and != operators) is well defined | for any pointers (of the same type) to any objects. | | Relational operators (< <= > >=) on pointers have undefined | behavior unless both pointers point to elements of the same | array object or just past the end of it. 
A single non-array | object is treated as a 1-element array for this purpose. | | (That's for object pointers. Function pointers can be compared | for equality, but relational operators on function pointers are | invalid.) | graycat wrote: | (1) Explain just how malloc() and free() work _under the covers_ | and the implications for multi-threading, _memory leaks_, | virtual memory paging, etc. | | Maybe also cover some means, algorithms, and code for reporting | on the _state_, status, etc. of the memory use by malloc() and | free(). | | By the way, I know and have known well for longer than most C | programmers have lived JUST what the _heap_ data structure, as | used in "heap sort", is. But what is the meaning of "the heap" | in C programming language documentation? | | (2) Cover in overwhelmingly fine detail the "stack" and the | chuckhole in the road, _stack overflow_. | | (3) Where to get a reliable package of | code for handling character strings -- what I saw and worked with | in C is not reasonable. | | (4) From the C programming I did, it looks like a large C program | for significant work involves some hundreds, maybe tens of | thousands, of _includes_, _inserts_, whatever, and what a | linkage editor would call _external references_. There must | somewhere be some tools to help a programmer make sense of all | those includes and references, the resulting memory maps, issues | of locality of reference, _word boundary alignment_, etc. | | (5) How can C exploit a processor with 64 bit addressing and main | memory in the tens of gigabytes and maybe terabytes? | | (6) How can C support, i.e., exploit, integers and IEEE floating | point in 64 and/or 128 bit lengths? | | (7) How to handle exceptional conditions with, say, non-local | gotos and without danger of memory leaks?
| | (8) Sorry, but far and away my favorite programming language long | has been and remains PL/I, especially for its scope of names | rules, handling of aggregates with external scope, its _data | structures_ , and its exceptional condition handling with non- | local gotos and freeing _automatic_ storage and, thus, avoiding | _memory leaks_. Of course I can't use PL/I now, but the problems | PL/I solved are still with us, also when writing C code. So, how | to solve these problems with C code? | | (9) For C++, please explain how that works _under the covers_. | E.g., some years ago it appeared that C++ was defined as only a | source code pre-processor to C. Is this still the case? If so, | then explaining C++ _under the covers_ should be feasible and | valuable. | zokier wrote: | > But what is the meaning of "the heap" in C programming | language documentation? | | The C language standard does not contain the word "heap" | anywhere; as far as C is concerned, there is no "heap" in | particular. | DougGwyn wrote: | It has been many years since a C++-to-C preprocessor has been | commonplace. There's just too much new stuff in recent C++ to | map it all easily into straight C. | mesarvagya wrote: | (7) Exactly. Please add how to free memory in a standard way if | there's an exception, and how not to use GoTo in such cases. | aw1621107 wrote: | > Explain just how malloc() and free() work under the covers | and the implications for multi-threading, memory leaks, virtual | memory paging, etc. > > Maybe also cover some means, | algorithms, and code for reporting on the state, status, etc. | of the memory use by malloc() and free(). | | Strictly speaking, these are implementation details that the C | standard leaves unspecified. If you want to know how the memory | allocation functions work or methods for inspecting the state | of the heap you'll need to look at a specific implementation | (e.g., glibc, musl, jemalloc, etc.)
since the details can vary | wildly between implementations. | | > Cover in overwhelmingly fine detail the "stack" and the | chuckhole in the road, stack overflow. | | Both these are not really specific to C, and there should be a | lot of resources you can find that explain these concepts ([0], | [1] for some example general explanations). Did you have more | specific questions in mind? | | > How can C exploit a processor with 64 bit addressing and main | memory in the tens of gigabytes and maybe terabytes? > How can | C support, i.e., exploit, integers and IEEE floating point in | 64 and/or 128 bit lengths? | | I think pointer/integer sizes are implementation details. C | specifies pointer behavior and minimum integer sizes (and | optional fixed-width types), but the precise widths are chosen | by the implementation. In the case of floating-point, the sizes | are specified by IEEE 754 widths. | | In other words, you don't really need to do anything special as | long as you pick the appropriate types as defined by your | implementation. | | > For C++, please explain how that works under the covers. | E.g., some years ago it appeared the C++ was defined as only a | source code pre-processor to C. Is this still the case? | | As far as I know no (production-quality?) C++ compiler has been | implemented as a source-level preprocessor for basically the | entirety of C++'s existence [2]. The very first "compiler" for | C++ was Cpre, back when C++ was still the C dialect "C with | classes" (around October 1979), and that was indeed a | preprocessor. That was replaced by the Cfront front end around | 1982-1983, about when "C with classes" started gaining new | features and got a new name. Cfront is a proper compiler front | end that output C code, and I think from that point on C++ | compilers used "standard" compiler tech. | | [0]: https://stackoverflow.com/questions/79923/what-and-where- | are... 
[1]: https://en.wikipedia.org/wiki/Stack_overflow [2]: | http://www.stroustrup.com/hopl2.pdf | graycat wrote: | Thanks. | | > Did you have more specific questions in mind? | | On stack overflow, my understanding was that one could encounter | that fatal condition from a suddenly too-deep _call stack_ , | that is, too many calls without a return. So, if the "stack" | is a, say, finite resource, then the programmer should know | in the code how much of that resource is being used and act | accordingly. | | For a preprocessor for C++, IIRC at one point the | definition of C++ was in terms of a preprocessor -- I was | just thinking of the definition, that is, get a more explicit | definition of C++. I've always understood that always or | nearly so C++ implementation was usual _compilation_. The | issue is that at least at one time it seemed difficult to be | precise about C++ semantics, that is, what the code would do | and how it would do it. Maybe now C++ is beautifully | documented. | DougGwyn wrote: | (1) There are several implementations; most are based on | Knuth's "boundary tag" algorithms. As to "heap", a stack has | one accessible end, a heap is essentially random-accessible. | Nothing to do with the heap data structure. (2) Stack overflow | can occur even early within a program. I've campaigned for a | requirement that such overflows be caught and integrated into a | standard exception handler, to no avail. (3) Why not code your | own, so there won't be arguments about it? (4) There are lots | of tools for program development, but they are not standardized by | WG14. (5) Use wider integer types. (6) Use wider floating | representations. (7) Standard C doesn't specify such a | facility, but it has occasionally been suggested. (8) There were | a lot of books, e.g. on structured system analysis, during the | 1970s trying to apply lessons learned. C isn't special in that | regard, as many of the big problems don't involve syntax.
(9) | C++ is now a big language and it takes a lot of work to master | its internals. | jfim wrote: | Out of curiosity, if there was anything you could change about C, | and not have to worry about breaking existing code or any other | practical concern, what would it be, and why? | mlvljr wrote: | How much UB does your own code contain, folks (and what practices | do you follow to avoid it)? | | Cheers from the shadowland :) | AvImd wrote: | What is your vision of C, its future and its past? What was it | supposed to become and did it become that thing? What is it now? | What will it evolve into in the near and far future? | msebor wrote: | The C charter and the C committee's job is to standardize | existing practice. That means codifying features that emerge as | successful in multiple implementations (compilers or | libraries), and that are in the overall spirit of the language. | kazinator wrote: | CAN I HAZ UNNAMED UNUSED PARAM void callback(int | x, void *) // VOID STAR UNUZED, SO ANON { | foo(x); } | DougGwyn wrote: | Why is there a second argument which is not used? | wnoise wrote: | To match an API where it is sometimes used. | networkimprov wrote: | Has there been consideration of async/await semantics? | rand0mstring wrote: | Will we ever see compile-time programming in C like constexpr in | C++? | gautamcgoel wrote: | Curious what the committee members think of the new competitors | to C, e.g. Go, Rust, and Zig. Any comments? | VWWHFSfQ wrote: | Go isn't a competitor to C | pjmlp wrote: | F-Secure apparently thinks otherwise, | | https://www.f-secure.com/en/consulting/foundry/usb-armory | | As does Google, | | https://github.com/google/gvisor | | https://github.com/google/gapid | | Naturally if one is talking about specific use cases like | IoT with a couple of KBs, MISRA-C, or UNIX kernels, then yes | Go is not a competitor.
| rmind wrote: | A lot of C programmers prefer to keep structures within the C source | file ("module"), as a poor man's encapsulation. For example: | | component.h: struct obj; typedef struct | obj obj_t; obj_t *obj_create(void); // .. | the rest of the API | | component.c: struct obj { int | status; // .. whatever else }; | obj_t * obj_create(void) { return | calloc(1, sizeof(obj_t)); } | | However, as the component grows in complexity, it often becomes | necessary to separate out some of the functionality (in order to | re-abstract and reduce the complexity) into another file or | files, which also operate on "struct obj". So, we move the | structure into a header file under #ifdef __COMPONENT_PRIVATE | (and/or component_impl.h) and sprinkle #define | __COMPONENT_PRIVATE in the component source files. It's a poor | man's "namespaces". | | Basically, this boils down to the lack of | namespaces/packages/modules in C. Are you aware of any existing | compiler extensions (as a precedent or work in that direction) | which could provide a better solution and, perhaps, one day end | up in the C standard? | | P.S. And if C ever grows such a feature, I really hope it will | _NOT_ be the C++ 'namespace' (amongst many other depressing | things in C++). :) | msebor wrote: | The ELF visibility attributes solve part of the problem at | the binary level (by hiding private library APIs from the | application). The rest should be doable by structuring the | project sources and headers in a suitable way. | loeg wrote: | ELF is very much not part of the C standard. | pascal_cuoq wrote: | I am sorry I do not have an answer to your question. It's a | very valid one and I would be interested in any pointer to an | answer. | | What I _can_ say while we are on the subject, is that I have | seen C code (most often C code that started its life in the | 1990s, to be fair) that instead of showing an abstract struct | in the public interface, showed a different struct definition.
| | Please don't do this. Yes, when compiling nowadays, eventually | every compilation unit ends up as an object file passed to a | linker that doesn't know about types, but this is undefined | behavior. It makes it difficult to find undefined behavior in | the rest of the code because there is a big undefined behavior | right in the middle of it. | rmind wrote: | I assume you mean something like this: | struct obj_impl { // real members ... | }; In public API header: struct obj | { unsigned char _private[N]; // -- where N is the | size of obj_impl }; | | I have seen such code too. It is also potentially error- | prone. Certainly not advocating for it. | beefhash wrote: | Wait, doesn't this mean that the BSD sockets API is | inherently dependent on UB, casting different socket types to | each other and sometimes only using the first few members, or | am I misunderstanding you? | pascal_cuoq wrote: | Yes and no. | | The thing I am describing is when you link a compilation | unit using: struct internal_state { int | dummy; } state; | | with another compilation unit that defined the same state | differently: struct internal_state { | int actual_meaningful_member_1; unsigned long | actual_meaningful_member_2; } state; | | As far as I know, BSD sockets do not do this. Zlib was doing | this (https://github.com/pascal-cuoq/zlib- | fork/blob/a52f0241f72433... ), but I have had the privilege | of discussing this with Mark Adler, and I think the no- | longer-necessary hack was removed from Zlib. | | BSD sockets probably have a different kind of UB, related | to so-called "strict aliasing" rules, unless they have been | carefully audited and revised since the carefree times in | which they were written. I am going to have to let you read | this article for details (example st1, page 5): | https://trust-in-soft.com/wp- | content/uploads/2017/01/vmcai.p...
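A minimal sketch of the common-prefix situation discussed above, under stated assumptions: the struct names `generic_addr` and `inet_like_addr` and the helper `addr_family` are hypothetical and only loosely imitate the sockaddr family. It shows one way to read a shared leading member without dereferencing a pointer to a struct type the object does not have, by copying the bytes with memcpy() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address types that share a leading "family" member.
   They imitate the shape of the sockaddr family; they are not a real API. */
struct generic_addr {
    unsigned short family;
    char data[14];
};

struct inet_like_addr {
    unsigned short family;
    unsigned short port;
    unsigned int host;
};

/* Reads the leading family tag from any of the address types above.
   The first member of a struct lives at offset 0, and memcpy copies the
   object representation, so no lvalue of a mismatched struct type is
   ever used for the access. */
static unsigned short addr_family(const void *addr)
{
    unsigned short family;

    assert(addr != NULL);
    memcpy(&family, addr, sizeof family);
    return family;
}
```

Called as `addr_family(&some_inet_like_addr)`, this returns the tag regardless of which of the two struct types the object really has. Casting the pointer to `struct generic_addr *` and reading `->family` through it is the pattern the strict aliasing discussion is worried about.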
| loeg wrote: | BSD sockets are weird in that the first struct's | (sockaddr) size wasn't big enough, so APIs all take a | nominal pointer to sockaddr but may require larger | storage (sockaddr_storage) depending on the actual | address. /* * Structure used by | kernel to store most * addresses. */ | struct sockaddr { unsigned char sa_len; | /* total length */ sa_family_t | sa_family; /* address family */ char | sa_data[14]; /* actually longer; address value */ | }; /* * RFC 2553: protocol- | independent placeholder for socket addresses */ | #define _SS_MAXSIZE 128U #define _SS_ALIGNSIZE | (sizeof(__int64_t)) #define _SS_PAD1SIZE | (_SS_ALIGNSIZE - sizeof(unsigned char) - \ | sizeof(sa_family_t)) #define _SS_PAD2SIZE | (_SS_MAXSIZE - sizeof(unsigned char) - \ | sizeof(sa_family_t) - _SS_PAD1SIZE - _SS_ALIGNSIZE) | struct sockaddr_storage { unsigned char | ss_len; /* address length */ | sa_family_t ss_family; /* address family */ | char __ss_pad1[_SS_PAD1SIZE]; | __int64_t __ss_align; /* force desired struct | alignment */ char | __ss_pad2[_SS_PAD2SIZE]; }; | wahern wrote: | struct sockaddr_storage is insufficient as well. A Unix | domain socket path can be longer than `sizeof ((struct | sockaddr_un){ 0}).sun_path`. That's a major reason why | all the socket APIs take a separate socklen_t argument. | Most people just assume that a domain socket path is | limited to a relatively short string, but it's not | (except possibly Minix, IIRC). | asveikau wrote: | > A Unix domain socket path can be longer than `sizeof | ((struct sockaddr_un){ 0}).sun_path` | | Hm, I didn't realize this, or if I knew this I had | forgotten. It makes sense because sun_path is usually | pretty small, I believe 108 chars is the most common | choice, and typically file paths are allowed to be much | longer. | | Do you have a citation for this behavior? I can't seem to | find it, though I'm not looking very hard. 
| | I guess you are right that any syscall taking a struct | sockaddr * also has a length passed to it... Some systems | have sa_len inside struct sockaddr to indicate length, | but IIRC linux does not. I've often thought that length | parameter was sort of redundant, because (1) some | platforms have sa_len, and (2) even without that, you | should be able to derive length from family. But your | Unix domain socket example breaks (2). Without being able | to do that, I start to imagine that the kernel would need | to probe for NUL chars terminating the C string anytime | it inspects a struct sockaddr_un, rather than block- | copying the expected size of the structure -- that would | be needlessly complicated. | wahern wrote: | So I just reran some tests on my existing VMs and it | turns out I remembered wrong. Here's the actual break | down: | | * Solaris 11.4: .sun_path: 108; bind/connect path | maximum: 1023. Length seems to be same as open. | Interestingly, open path maximum seems to be 1023 (judged | by trying ls -l /path/to/sock), although I always thought | it was unbounded on Solaris. | | * MacOS 10.14: .sun_path: 104, bind/connect path maximum: | 253. Length can be bigger than .sun_path but less than | open path limit. | | * NetBSD 8.0: .sun_path: 104, bind/connect path maximum: | 253. Same as MacOS. | | * FreeBSD 12.0: .sun_path: 104, bind/connect path | maximum: 104. | | * OpenBSD 6.6: .sun_path: 104, bind/connect path maximum: | 103 (104 - 1). | | * Linux 5.4: .sun_path: 108, bind/connect path maximum: | 108. | | * AIX 7.1: .sun_path: 1023, bind/connect path maximum: | 1023. Yes, .sun_path is statically sized to 1023! And | like Solaris, open path maximum seems to be 1023 (as | judged by trying ls -l /path/to/socket). Thanks to Polar | Home, polarhome.com, for the free AIX shell account. | | Note that all the above lengths are _exclusive_ of NUL, | and the passed socklen_t argument did not include a NUL | terminator. 
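A sketch of how the socklen_t argument in these measurements can be computed, assuming a POSIX system with <sys/un.h>. The helper name `unix_addr_init` is hypothetical, and whether a path that fills sun_path completely (leaving no room for a NUL) is accepted depends on the platform, as the list above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills *addr for a Unix domain socket path and
   returns the length to pass to bind()/connect(), or 0 if the path does
   not fit in sun_path. The returned length excludes the terminating NUL. */
static socklen_t unix_addr_init(struct sockaddr_un *addr, const char *path)
{
    size_t len;

    assert(addr != NULL && path != NULL);
    len = strlen(path);
    if (len > sizeof addr->sun_path)
        return 0;                       /* would not fit, even unterminated */

    memset(addr, 0, sizeof *addr);      /* also NUL-terminates shorter paths */
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
}
```

The result would then be passed as the third argument of `bind(fd, (struct sockaddr *)&addr, len)`. Systems that accept paths longer than sun_path need a larger, separately allocated buffer, which this sketch does not attempt.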
| | For posterity: on all these systems you can still create | sockets with long paths, you just have to chdir or use | bindat/connectat if available. My test code confirmed as | much. And AFAICT getsockname/getpeername will only return | the .sun_path path (if anything) used to bind or connect, | but that's a more complex topic (see https://github.com/w | ahern/cqueues/blob/e3af1f63/PORTING.md#g...) | asveikau wrote: | Linux also has the unusual extension of: if sun_path[0] | is NUL, the path is not a filesystem path and the rest of | the name buffer is an ID. I don't remember if that ID can | have embedded NULs. I believe so. | haberman wrote: | I'm curious what exactly makes this undefined behavior. | | And in particular, what about something like this? | struct Foo { #ifdef __cplusplus int | bar() const { return bar_; } private: | #endif int bar_; }; | | Or, taking this a step further: struct | _Foo; typedef struct _Foo Foo; // In | C "struct _Foo" is never defined. int | Foo_bar(const Foo* foo) { return *(const int*)foo; } | void Foo_setbar(Foo* foo, int bar) { *(int*)foo = bar; } Foo* | Foo_new() { return malloc(sizeof(int)); } | #ifdef __cplusplus struct _Foo { void | set_bar(int bar) { bar_ = bar; } int bar() const { | return bar_; } private: int bar_; | }; #endif | | The above isn't ideal but it does provide encapsulation | in a way that doesn't seem to violate strict aliasing | (the memory location is consistently read/written as | "int"). | pascal_cuoq wrote: | I think this is plenty ok. For one thing, if a struct has | a member of type T, it's ok to access it through a | pointer to T (and also the address of the struct is | guaranteed to be identical to the address of the first | member). For another, you are using dynamically allocated | memory, so the only thing that matters is the type of the | pointer when the access is finally made. It doesn't | matter that it was a Foo* before, if what you dereference | is an int*.
| | This is different from pretending that the address of a | struct s { int a; double b; } is the address of a struct | t { int a; long long c; } and accessing it through a | pointer to that. If you do that, C compilers will (given | the opportunity) assume that the write-through-a-pointer- | to-struct-t does not modify any object of type "struct | s". This is what the example st1 in the article | illustrates. | | The latter is what I suspect plenty of socket | implementations still do (because there are several types | of sockets, represented by different struct types with a | common prefix). It is possible to revise them carefully | so that they do not break the rules, but I doubt this | work has been done. | loeg wrote: | Yeah, the BSD socket API is kind of terrible like that. You | could consider it an unspecified union type, or use | memcpy() exclusively to access it safely. | emilfihlman wrote: | Yeah, it depends on a well-agreed convention, which is nevertheless UB | according to the standard. | cyber1 wrote: | How closely does the C Standard Committee work with Linux kernel | developers? Does Linux kernel development influence the C standard? | AaronBallman wrote: | There's not an official collaboration between the committee and | the kernel developers (that I'm aware of), but we do have | people on the committee who need to support Linux kernel | development (such as GCC maintainers), so there is some level | of indirect influence there. | clarry wrote: | Why can't I have flexible array members in a union? Consider this: | struct foo { enum { t_char, t_int, t_ptr, /* .. */ } | type; int count; union { | char c[]; int i[]; void *p[]; | /* .. */ }; }; | | This isn't allowed, since flexible array members are only allowed | in structs (but the union here is exactly where you'd put a | flexible array member if you had only one type to deal with).
| | Furthermore, you can't work around this by wrapping the union's | members in a struct because they must have more than one named | member: struct foo { enum { t_char, | t_int, t_ptr } type; int count; | union { /* not allowed! */ struct { char c[]; }; | struct { int i[]; }; struct { void *p[]; }; | }; }; | | But it's all fine if we either add a useless dummy variable or | move some prior member (such as _count_ ) into these structs: | struct foo { enum { t_char, t_int, t_ptr } type; | int count; union { /* this works but is silly | and redundant */ struct { int dumb1; char c[]; }; | struct { int dumb2; int i[]; }; struct { int | dumb3; void *p[]; }; }; }; | | Of course, you could have the last member be | union { char c; int i; void *p; } u[]; | | but then each element of u is as large as the largest possible | member which is wasteful, and u can't be passed to any function | that expects to get a normal, tightly packed array of one | specific type. | psherbet wrote: | I love how small of a language C is and get concerned when people | recommend adding feature x,y and z. | | What's the plan for C over the next 5 - 10 years? | DougGwyn wrote: | There is no grand goal that I know of. I wish more importance | were being placed on keeping existing well-written code | working, which includes continued support for what might be | considered near-obsolete. If one wanted to design a new (not | fully compatible) language, that could have lofty goals; just | don't call it "C". | ativzzz wrote: | Other than these experts, what kind of companies do C developers | work at? How does the compensation look like compared to doing | web development? | pascal_cuoq wrote: | I do not actually develop in C (other than short examples to | feed the C analyzer that I work on, which is not written in C) | but our customers do employ plenty of C developers. 
These | customers are developing embedded software that reads inputs | from sensors, processes them, and sends the final results of the | computations to actuators, in fields such as IoT, aeronautics, | rail, space, nuclear energy production, autonomous | transportation, ... | | The list is very much biased by the sort of analyzer we | provide. There are certainly plenty of non-embedded codebases | in C and of developers paid to maintain and extend them, it's | just that we currently do not work with them as much. | | I do not know whether the compensation is better or worse | than for other technologies. | baybal2 wrote: | Hello, I coded in C as a high schooler. Now, 16 years later, I | have to code C again semiprofessionally after a very long break. | | Big question: how does somebody self-schooled in C start programming | in it at a high professional level? Is there a | way to cut corners, without having to go through 10+ years of | trial and error to gain experience? | | Anything for somebody ready to sit, study, and practice for a few | hours a day? | Nemerie wrote: | There was a nice discussion recently | https://news.ycombinator.com/item?id=22519876 | rseacord wrote: | Many of your remaining questions have devolved into "When will I | see my favorite feature xyz appear in the C Standard?" The answer | in most cases is "that depends on how long it takes you to submit | a proposal". Take a look at http://www.open- | std.org/jtc1/sc22/wg14/www/wg14_document_log... for previous | proposals and review the minutes to see which proposals have been | adopted. In general, the committee is not going to adopt | proposals for which there is insufficient existing practice or | that haven't been fully thought out. There are cases where people have | come to a single meeting with a well-considered proposal that was | adopted into the C Standard. I wrote about one such case here: | https://www.linkedin.com/pulse/alignment-requirements-memory...
| Alternatively, you can approach someone on the committee and ask | us to champion a proposal for you. It is likely that we'll agree | or at least provide you with feedback on your proposal. | billfruit wrote: | I do find that C is difficult to use for large programs. Are there | any thoughts about introducing features like namespaces? | | Another thing that is very cumbersome to do in C is object creation; | creating instantiable objects is possible but very cumbersome. Is | there some feature in the thought process to deal with it? To | make it clear, in C we can create a data structure like a stack | or a queue easily. But if the program needs 10 stacks then there is | presently no simple way of achieving it. | DougGwyn wrote: | In BRL's MUVES project, we used a 2-character prefix indicating | category. E.g., all the external identifiers for our fancy | memory allocator began with "Mm", where Mm.h documented the | interface for the Mm package only. | | To minimize the external identifiers, one could make just the | name of a container structure the sole entry access handle, | with structure members pointing to the functions. Then use it | like: #include <Mm.h> if ((new = | Mm.allo(size)) == NULL) Er.abort("out of memory"); | jfkebwjsbx wrote: | Tip: you can use four leading spaces to write code. | Like this | [deleted] | steveklabnik wrote: | You only need two! like this | sgt wrote: | I did not know that, after spending years on HN. | while(1) fork(); | steveklabnik wrote: | https://news.ycombinator.com/formatdoc | DougGwyn wrote: | I tried, but two spaces yielded what you saw. | dang wrote: | Huh, it also needed an extra line break before the first | line of code. I didn't realize that! I've fixed it now. | steveklabnik wrote: | I didn't realize that either, but it's described in | formatdoc as such. So if you changed that behavior, | probably should change the docs too. | dang wrote: | I didn't change the behavior - I just added a newline. | Sorry that wasn't clear.
| DougGwyn wrote: | You should be commended for the fast customer service! | beefhash wrote: | Now that C2x plans to make two's complement the only sign | representation, is there any reason why signed overflow has to | continue being undefined behavior? | | On a slightly more personal note: What are some undefined | behaviors that you would like to turn into defined behavior, but | can't change for whatever reasons that be? | msebor wrote: | Some instances of undefined behavior at translation time can | effectively be avoided in practice by tightening up | requirements on implementations to diagnose them. But strictly | speaking, because the standard allows compilers to continue to | chug along even after an error and emit object code with | arbitrary semantics, turning even such straightforward | instances into constraint violations (i.e., diagnosable errors) | doesn't prevent UB. | | It might seem like defining the semantics for signed overflow | would be helpful but it turns out it's not, either from a | security view or for efficiency. In general, defining the | behavior in cases that commonly harbor bugs is not necessarily | a good way to fix them. | klodolph wrote: | Just going to inject that this impacts a bunch of random | optimizations and benchmarks. Just to fabricate an example: | for (int i = 0; i < N; i += 2) { // } | | Reasonably common idea but the compiler is allowed to assume | the loop terminates precisely because signed overflow is | undefined. | | I'm not trying to argue that signed overflow is the right tool | for the job here for expressing ideas like "this loop will | terminate", but making signed overflow defined behavior will | impact the performance of numerics libraries that are currently | written in C. | | From my personal experience, having numbers wrap around is not | necessarily "better" than having the behavior undefined, and | I've had to chase down all sorts of bugs with wraparound in the | past. 
What I'd personally like is four different ways to use | integers: wrap on overflow, undefined overflow, error on | overflow, and saturating arithmetic. They all have their places | and it's unfortunate that it's not really explicit which one | you are using at a given site. | nickodell wrote: | Under C11, the compiler is still allowed to assume | termination of a loop if the controlling expression is non- | constant and a few other conditions are met. | | https://stackoverflow.com/a/16436479/530160 | alerighi wrote: | The compiler assumes that the loop will always terminate and | that assumption is wrong, because in reality there is the | possibility that the loop will not terminate, since the | hardware WILL overflow. | | So it's not the best solution. If we want to keep this | behaviour for the sake of optimizations (which to me are not worth it, | given the risk of potentially critical bugs) we must make | that behavior explicit, not implicit: thus it is the programmer | who has to say to the compiler, I guarantee you that this | operation will never overflow, if it does it's my fault. | | We can agree that having a number that wraps around is not a | particularly good choice. But unless we convince Intel in | some way that this is bad and make the CPU trap on an | overflow, so we can catch that bug, this is the behaviour | that we have because it is the behaviour of the hardware. | coliveira wrote: | > I guarantee you that this operation will never overflow, | if it does it's my fault. | | This is exactly what every C programmer does, all the time. | klodolph wrote: | > The compiler assumes that the loop will always terminate | and that assumption is wrong, because in reality there is | the possibility that the loop will not terminate, since the | hardware WILL overflow. | | The language is not a model of hardware, nor should it be. | If you want to write to the hardware, the only option | continues to be assembly.
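The four overflow behaviours wished for earlier in this subthread can be made explicit at each call site with small helpers. This is only a sketch in portable C with hypothetical names (`add_wrap`, `add_checked`, `add_sat`). The fourth behaviour, undefined on overflow, is what a plain `a + b` on int already gives you.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap on overflow: do the addition in unsigned arithmetic, where
   wraparound is defined. Converting an out-of-range result back to int
   is implementation-defined before C23 (modular on mainstream compilers). */
static int add_wrap(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}

/* Error on overflow: test the operands first so the signed addition is
   only performed when its result is representable. */
static bool add_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Saturating: clamp to the nearest representable value. */
static int add_sat(int a, int b)
{
    int result;

    if (add_checked(a, b, &result))
        return result;
    return b > 0 ? INT_MAX : INT_MIN;
}
```

The point of the sketch is readability rather than speed: a reader can tell from the function name which behaviour was intended at each site.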
| iainmerrick wrote: | _the compiler is allowed to assume the loop terminates | precisely because signed overflow is undefined._ | | Just to be sure I understand the fine details of this -- what | would the impact be if the compiler assumed (correctly) that | the loop might not terminate? What optimization would that | prevent? | klodolph wrote: | > ...what would the impact be if the compiler assumed | (correctly) that the loop might not terminate? | | Loaded question--the compiler is absolutely correct here. | There are two viewpoints where the compiler is correct. | First, from the C standard perspective, the compiler | implements the standard correctly. Second, if we have a | real human look at this code and interpret the programmer's | "intent", it is most reasonable to assume that overflow | does not happen (or is not intentional). | | The only case which fails is where N = INT_MAX. No other | case invokes undefined behavior. | | Here is an example you can compile for yourself to see the | different optimizations which occur: | typedef int length; int sum_diff(int *arr, length | n) { int sum = 0; for (length i = | 0; i < n; i++) { sum += arr[2*i+1] - | arr[2*i]; } return sum; } | | At -O2, GCC 9.2 (the compiler I happened to use for | testing) will use pointer arithmetic, compiling it as | something like the following: int | sum_diff(int *arr, length n) { int sum = 0; | int *ptr = arr; int *end = arr + 2 * n; | while (ptr < end) { sum += ptr[1] - ptr[0]; | ptr += 2; } return sum; } | | At -O3, GCC 9.2 will emit SSE instructions. You can see | this yourself with Godbolt. | | Now, try replacing "int" with "unsigned". Neither of these | optimizations happens any more. You get neither | autovectorization nor pointer arithmetic. You get the | original loop, compiled in the most dumb way possible. | | I wouldn't read into the exact example here too closely. It | is true that you can often figure out a way to get the | optimizations back and still use unsigned types.
However, | it is a bit easier if you work with signed types in the | first place. | | Speaking as someone who does some numerics work in C, there | is something of a "black art" to getting good numerics | performance. One easy trick is to switch to Fortran. No | joke! Fortran is actually really good at this stuff. If you | are going to stick with C, you want to figure out how to | communicate to the compiler some facts about your program | that are obvious to you, but not obvious to the compiler. | This requires a combination of understanding the compiler | builtins (like __builtin_assume_aligned, or | __builtin_unreachable), knowledge of aliasing (like use of | the "restrict" keyword), and knowledge of undefined | behavior. | | If you _need_ good performance out of some tight inner | loop, the easiest way to get there is to communicate to the | compiler the "obvious" facts about the state of your | program and check to see if the compiler did the right | thing. If the compiler did the right thing, then you're | done, and you don't need to use vector intrinsics, rewrite | your code in a less readable way, or switch to assembly. | | (Sometimes the compiler can't do the right thing, so go | ahead and use intrinsics or write assembly. But the | compiler is pretty good and you can get it to do the right | thing _most_ of the time.) | joosters wrote: | If the compiler knows that the loop will terminate in 'x' | iterations, it can do things like hoist some arithmetic out | of the loop. The simplest example would be if the code | inside the loop contained a line like 'counter++'. Instead | of executing 'x' ADD instructions, the binary can just do | one 'counter += x' add at the end. | iainmerrick wrote: | What I'm driving at is, if the loop really doesn't | terminate, it would still be safe to do that optimization | because the incorrectly-optimized code would never be | executed. 
| | I guess that doesn't necessarily help in the "+=2" case, | where you probably want the optimizer to do a "result += | x/2". | | In general, I'd greatly prefer to work with a compiler | that detected the potential infinite loop and flagged it | as an error. | tlb wrote: | Another approach would be a standard library of arithmetic | routines that signal overflow. | | If people used them while parsing binary inputs that would | prevent a lot of security bugs. | | The fact that this question exists and is full of wrong answers | suggests a language solution is needed: | https://stackoverflow.com/questions/1815367/catch-and-comput... | colanderman wrote: | You can enable this in GCC on a compilation unit basis with | `-fsanitize=signed-integer-overflow`. In combination with | `-fsanitize-undefined-trap-on-error`, the checks are quite | cheap (on x86, usually just a `jo` to a `ud2` instruction). | | (Note that while `-ftrapv` would seem equivalent, I've found | it to be less reliable, particularly with compile-time | checking.) | corndoge wrote: | And clang! | asveikau wrote: | Microsoft in particular has a simple approach to this with | things like DWordMult(): | if (FAILED(DWordMult(a, b, &product))) { /* handle error */ } | stephencanon wrote: | Clang and GCC's approach for these operations is even nicer | FWIW (__builtin_[add/sub/mul]_overflow(a, b, &c)), which | allow arbitrary heterogeneous integer types for a, b, and c | and do the right thing. | | I know there's recently been some movement towards | standardizing something in this direction, but I don't know | what the status of that work is. Probably one of the folks | doing the AUA can update. | AaronBallman wrote: | We've been discussing a paper on this (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2466.pdf) at recent | meetings and it's been fairly well-received each time, | but not adopted for C2x as of yet.
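A minimal sketch of the heterogeneous checked builtins stephencanon describes, assuming GCC or Clang (they are compiler extensions, not standard C). The helper names `add_int` and `alloc_size` are hypothetical and exist only for illustration. Each builtin returns true when the mathematically exact result does not fit in the destination, and the overflowing operation is never evaluated as an ordinary signed expression, so no undefined behavior is involved.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: signed addition that reports overflow instead of
   invoking undefined behavior. Returns true on success. */
bool add_int(int a, int b, int *out)
{
    /* The builtin returns true if a + b does not fit in *out. */
    return !__builtin_add_overflow(a, b, out);
}

/* Hypothetical helper: computes count * elem_size for an allocation.
   The operands are 32-bit but the result is a size_t; the generic
   builtin accepts mixed integer types and checks against the type of
   the destination. Returns true on success. */
bool alloc_size(uint32_t count, uint32_t elem_size, size_t *out)
{
    return !__builtin_mul_overflow(count, elem_size, out);
}
```

A caller would typically write `if (!add_int(x, y, &sum)) { /* handle error */ }`, which mirrors the DWordMult() pattern shown above.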
| stephencanon wrote: | It feels like it would be a real shame to standardize | something that gives up the power of the Clang/GCC | heterogeneous checked operations. We added them in Clang | precisely because the original homogeneous operations | (__builtin_smull_overflow, etc) led to very substantial | correctness bugs when users had to pick a single common | type for the operation and add conversions. Standardizing | homogeneous operations would be worse than not addressing | the problem at all, IMO. There's a better solution, and | it's already implemented in two compilers, so why | wouldn't we use it? | | The generic heterogeneous operations also avoid the | identifier blowup. The only real argument against them | that I see is that they are not easily implementable in C | itself, but that's nothing new for the standard library | (and should be a non-goal, in my not-a-committee-member | opinion). | | Obviously, I'm not privy to the committee discussions | around this, so there may be good reasons for the choice, | but it worries me a lot to see that document. | wklieber wrote: | >the original homogeneous operations | (__builtin_smull_overflow, etc) led to very substantial | correctness bugs when users had to pick a single common | type for the operation and add conversions. | | Hi Stephen, thank you for bringing this to our attention. | David Svoboda and I are now working to revise the | proposal to add a supplemental proposal to support | operations on heterogeneous types. We are leaning toward | proposing a three-argument syntax, where the 3rd argument | specifies the return type, like: | ckd_add(a, b, T) | | where _a_ and _b_ are integer values and _T_ is an | integer type, in addition to the two-argument form | ckd_add(a, b) | | (Or maybe the two-argument and three-argument forms | should have different names, to make it easier to | implement.) | stephencanon wrote: | Glad to hear it, looking forward to seeing what you come | up with! 
The question becomes, once you have the | heterogeneous operations, is there any reason to keep the | others around (my experience is that they simply become a | distraction / attractive nuisance, and we're better off | without them, but there may be use cases I haven't | thought of that justify their inclusion). | wklieber wrote: | When David and I are done revising the proposal, we would | like to send you a copy. If you would be interested in | reviewing, can you please let us know how to get in touch | with you? David and I can be reached at | {svoboda,weklieber} @ cert.org. | | >once you have the heterogeneous operations, is there any | reason to keep the others around | | The two-argument form is shorter, but perhaps that isn't | a strong enough reason to keep it. Also, requiring a | redundant 3rd argument can provide an opportunity for | mistakes to happen if it gets out of sync with the type | of first two arguments. | | As for the non-generic functions (e.g., ckd_int_add, | ckd_ulong_add, etc.), we are considering removing them in | favor of having only the generic function-like macros. | rseacord wrote: | Take a look at N2466 2020/02/09 Svoboda, Towards Integer | Safety which has some support in the committee: | | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2466.pdf | | (signal is a strong word... maybe indicate?) | cataphract wrote: | Signed overflow being undefined behavior allows optimizations | that wouldn't otherwise be possible | | Quoting http://blog.llvm.org/2011/05/what-every-c-programmer- | should-... | | > This behavior enables certain classes of optimizations that | are important for some code. For example, knowing that | INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true". | Knowing the multiplication "cannot" overflow (because doing so | would be undefined) allows optimizing "X*2/2" to "X". While | these may seem trivial, these sorts of things are commonly | exposed by inlining and macro expansion. 
A more important | optimization that this allows is for "<=" loops like this: | | > for (i = 0; i <= N; ++i) { ... } | | > In this loop, the compiler can assume that the loop will | iterate exactly N+1 times if "i" is undefined on overflow, | which allows a broad range of loop optimizations to kick in. On | the other hand, if the variable is defined to wrap around on | overflow, then the compiler must assume that the loop is | possibly infinite (which happens if N is INT_MAX) - which then | disables these important loop optimizations. This particularly | affects 64-bit platforms since so much code uses "int" as | induction variables. | rbultje wrote: | > for (i = 0; i <= N; ++i) { ... } | | The worst thing is that people take it as acceptable that | this loop is going to operate differently upon overflow (e.g. | assume N is TYPE_MAX) depending on whether i or N are signed | vs. unsigned. | JoeAltmaier wrote: | Is this a real concern, beyond 'experts panel' esoteric | discussion? Do folks really put a number into an int, that | is sometimes going to need to be exactly TYPE_MAX but no | larger? | | I've gone a lifetime programming, and this kind of stuff | never, ever matters one iota. | sdegutis wrote: | The very few times I've ever put in a check like that, I | always do something like i < MAX_INT - 5 just to be sure, | because I'm never confident that I intuitively understand | off-by-one errors. | btilly wrote: | Yes, people really do care about overflow. Because it | gets used in security checks, and if they don't | understand the behavior then their security checks don't | do what they expected. | | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475 shows | someone going hyperbolic over the issue. The technical | arguments favor the GCC maintainers. However I prefer the | position of the person going hyperbolic. | JoeAltmaier wrote: | That example was not 'overflow'. It was 'off by one'? 
| That seems uninteresting, outside, as you say, the security | issue where somebody might take advantage of it. | btilly wrote: | That example absolutely was overflow. The bug is, | "assert(int+100 > int) optimized away". | | GCC has the behavior that overflowing a signed integer | gives you a negative one. But an if test that TESTS for | that is optimized away! | | The reason is that overflow is undefined behavior, and | therefore they are within their rights to do anything | that they want. So they actually overflow in the fastest | way possible, and optimize code on the assumption that | overflow can't happen. | | The fact that almost no programmers have a mental model | of the language that reconciles these two facts is an | excellent reason to say that very few programmers should | write in C. Because the compiler really is out to get | you. | JoeAltmaier wrote: | Sure. Sorry, I was ambiguous. The earlier example of ++i | in a loop I was thinking of. Anyway, yes, overflow for | small ints is a real thing. | gwd wrote: | So in a corner case where you have a loop that iterates over | all integer values (when does this ever happen?) you can | optimize your loop. As a consequence, signed integer | arithmetic is very difficult to write while avoiding UB, even | for skilled practitioners. Do you think that's a useful | trade-off, and do you think anything can be done for those of | us who think it's not? | zodiac wrote: | No, the optimizations referred to include those that will | make the program faster when N=100. | buckminster wrote: | N is a variable. It might be INT_MAX so the compiler cannot | optimise the loop for _any_ value of N. Unless you make | this UB. | andrepd wrote: | No, it's exactly the opposite. Without UB the compiler must | assume that the corner case may arise at any time. Knowing | it is UB we can assert `n+1 > n`, which without UB would be | true for all `n` except INT_MAX.
Standardising wrap-on- | overflow would mean you can now handle that corner case | safely, at the cost of missed optimisations on everything | else. | vermilingua wrote: | I hadn't understood the utility of undefined behaviour | until reading this, thank you. | rbultje wrote: | I/we understand the optimization, and I'm sure you | understand the problem it brings to common procedures | such as DSP routines that multiply signed coefficients | from e.g. video or audio bitstreams: | | for (int i = 0; i < 64; i++) result[i] = inputA[i] * | inputB[i]; | | If inputA[i] * inputB[i] overflowed, why are my credit | card details at risk? The question is: can we come up | with an alternate behaviour that incorporates both | advantages of the i<=N optimization, as well as leave my | credit card details safe if the multiplication in the | inner loop overflowed? Is there a middle road? | qppo wrote: | Another problem is that there's no way to define it, | because in that example the "proper" way to overflow is | with saturating arithmetic, and in other cases the | "proper" overflow is to wrap. Even on CPUs/DSPs that | support saturating integer arithmetic in hardware, you | either need to use vendor intrinsics or control the | status registers yourself. | jononor wrote: | One could allow the overflow behavior to be specified, | for example on the scope level. Idk, with a #pragma ? | #pragma integer-overflow-saturate | gwd wrote: | I'd almost rather have a separate "ubsigned" type which | has undefined behavior on overflow. By default, integers | behave predictably. When people really need that extra 1% | performance boost, they can use ubsigned just in the | cases where it matters. | qppo wrote: | I don't know if I agree. Overflow is like uninitialized | memory, it's a bug almost 100% of the time, and cases | where it is tolerated or intended to occur are the | exception. | | I'd rather have a special type with defined behavior. 
| That's actually what a lot of shops do anyways, and there | are some niche compilers that support types with defined | overflow (ADI's fractional types on their Blackfin tool | chain, for example). It's just annoying to do in C, this | is one of those cases where operator overloading in C++ | is really beneficial. | gwd wrote: | > I don't know if I agree. Overflow is like uninitialized | memory, it's a bug almost 100% of the time, and cases | where it is tolerated or intended to occur are the | exception. | | Right, but I think the problem is that UB means | _literally anything_ can happen and be conformant to the | spec. If you do an integer overflow, and as a result the | program formats your hard drive, then it is acting within | the C spec. | | Now compiler writers don't usually format your hard drive | when you trigger UB, but they often do things like remove | input sanitation or other sorts of safety checks. It's | one thing if as a result of overflow, the number in your | variable isn't what you thought it was going to be. It's | completely different if suddenly safety checks get tossed | out the window. | | When you handle unsanitized input in C on a security | boundary, you must literally treat the compiler as a | "lawful evil" accomplice to the attackers: you must | assume that the compiler will follow the spec to the | letter, but will look for any excuse to open up a gaping | security hole. It's incredibly stressful if you know that | fact, and incredibly dangerous if you don't. | thayne wrote: | Have you considered adding intrinsic functions for | arithmetic operations that _do_ have defined behavior on | overflow. Such as the overflowing_* functions in rust? | _kst_ wrote: | > Now that C2x plans to make two's complement the only sign | representation, is there any reason why signed overflow has to | continue being undefined behavior? | | I presume you'd want signed overflow to have the usual | 2's-complement wraparound behavior. 
| | One problem with that is that a compiler (probably) couldn't | warn about overflows that are actually errors. | | For example: | int n = INT_MAX; /* ... */ n++; | | With integer overflow having undefined behavior, if the | compiler can determine that the value of n is INT_MAX it can | warn about the overflow. If it were defined to yield INT_MIN, | then the compiler would have to assume that the wraparound was | what the programmer intended. | | A compiler _could_ have an option to warn about detected | overflow/wraparound even if it's well defined. But really, how | often do you _want_ wraparound for signed types? In the code | above, is there any sense in which INT_MIN is the "right" | answer for any typical problem domain? | enriquto wrote: | > In the code above, is there any sense in which INT_MIN is | the "right" answer for any typical problem domain? | | There is no answer other than INT_MIN that would be right | and make sense, i.e. for which the natural properties of the + operator | (associativity, commutativity) are respected. Thus, for want | of another possibility, INT_MIN is precisely _the_ right | answer to your code. | | I read your code and it seems to me very clear that INT_MIN | is exactly what the programmer intended. | colanderman wrote: | Besides optimization (as others have pointed out), disallowing | wrapping of signed values has the important safety benefit that | it permits run-time (and compile-time) _detection_ of | arithmetic overflow (e.g. via -fsanitize=signed-integer-overflow). If signed arithmetic were defined to wrap, you could | not enable such checks without potentially breaking existing | correct code. | wyldfire wrote: | Could we instead just have standard-defined integer types which | saturate or trap on overflow? | | Sometimes you're writing code where it really, really matters | and you're more than willing to spend the extra cycles for | every add/mul/etc. Having these new types as a portable idiom | would help.
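Until something like that is standardized, saturating and trapping addition can be written in portable C by checking the operands before adding, so the overflowing `a + b` is never evaluated. This is only a sketch of the idea wyldfire asks about; the function names `sat_add` and `trap_add` are hypothetical, not part of any standard or proposal.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: signed addition that clamps to INT_MIN/INT_MAX
   instead of overflowing. */
int sat_add(int a, int b)
{
    /* INT_MAX - b cannot overflow when b > 0. */
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    /* INT_MIN - b cannot overflow when b < 0. */
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b; /* safe: the checks above rule out overflow */
}

/* Hypothetical helper: signed addition that aborts the program on
   overflow, i.e. a software "trap". */
int trap_add(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        abort();
    return a + b;
}
```

The cost is a couple of comparisons per addition, which is the "extra cycles" trade-off mentioned above.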
| rseacord wrote: | There was a proposal for a checked integer type that you | might want to look at: | | N2466 2020/02/09 Svoboda, Towards Integer Safety | | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2466.pdf | | The committee asked the proposers for further work on this | effort. | | Integer types that saturate are an interesting idea. Because | signed integer overflow is undefined behavior, | implementations are not prohibited from implementing | saturation or trapping on overflow. | hyc_symas wrote: | Eh? I thought that would only be "legal" if it was | specified to be implementation-defined behavior. Which | would, frankly, be perfectly good. But since it is | specified as _undefined_ behavior, programmers are | forbidden to use it, and compilers assume it doesn't | happen/doesn't exist. | | The entire notion that "since this is undefined behavior it | does not exist" is the biggest fallacy in modern compilers. | DougGwyn wrote: | The rule is: If you want your program to conform to the C | Standard, then (among other things) your program must not | cause any case of undefined behavior. Thus, if you can | arrange so that instances of UB will not occur, it | doesn't matter that identical code under different | circumstances could fail to conform. The safest thing is | to make sure that UB cannot be triggered _under any | circumstances_; that is, defensive programming. | rseacord wrote: | Maybe someone else can respond to this as well, but I feel like | the primary reason signed overflow is still undefined behavior | is because so many optimizations depend upon the undefined | nature of signed integer overflow. My advice has always been to | use unsigned integer types when possible. | | Personally, I would like to get rid of many of the trap | representations (e.g., for integers) because there is no | existing hardware in many cases that supports them and it gives | implementers the idea that uninitialized reads are undefined | behavior.
| | On the other hand, I just wrote a proposal to WG14 to make | zero-byte reallocations undefined behavior that was unanimously | accepted for C2x. | [deleted] | BeeOnRope wrote: | When deciding on standardized behavior for C operations or data | representation that may favor some hardware over others [1], who | argues the side of the various hardware vendors, if they have no | members on the standardization committee? | | Is it fair to assume that hardware-related decisions occur in an | environment where members who are sponsored by vendors argue | their employers' case, rather than a neutral one? | | --- | | [1] E.g., because some hardware's behavior may more naturally | implement the operation. | AaronBallman wrote: | > When deciding on standardized behavior for C operations or | data representation that may favor some hardware over others | [1], who argues the side of the various hardware vendors, if | they have no members on the standardization committee? | | The C committee has a number of implementation vendors on it | (GCC, Clang, IBM, Intel, sdcc, etc) and these folks do a good | job of speaking up about the hardware they have to support (and | in some cases, they're also the hardware vendor). If needed, we | will also research hardware from vendors who have no active | representation on the committee, but this is usually for more | broad changes like "can we require 2's complement?". | | > Is it fair to assume that hardware-related decisions occur in | an environment where members who are sponsored by vendors argue | their employers' case, rather than a neutral one? | | In my experience, the committee members typically do a good job | of differentiating between "this is my opinion" and "this is my | employer's opinion" during discussions where that matters. | However, at the end of the day, each committee member is there | representing some constituency (whether it's themselves or | their company) and votes their own conscience.
| BeeOnRope wrote: | Thanks for your quick and honest answer. | ori_b wrote: | What do you think of a variant on this? | | https://blog.regehr.org/archives/1180 | dang wrote: | pascal_cuoq cowrote it. Maybe we should ask him if his views | have changed since then. | | Btw, there was a thread about it at the time: | https://news.ycombinator.com/item?id=8233484. | pascal_cuoq wrote: | Thanks Dan, I missed this question in the heat of the moment. | pascal_cuoq wrote: | I still want to write at least one sequel to that post, on the | theme "Alright, can we make a Friendly C Compiler by disabling | the annoying optimizations, then?". | | Obviously the people who want a Friendly C Compiler do not want | to disable _all_ optimizations. This would be easy to do, but | these users do not want the stupid 1+2+16 expressions in their | C programs, generated through macro-expansion, to be compiled | to two additions with each intermediate result making a round- | trip through memory. | | So the question is: can we get a Friendly C Compiler by | enabling only the Friendly optimizations in an unfriendly | compiler? | | And for the answer to that, I had to write an entire other blog | post as preparation, to show that there are some assumptions an | optimizing compiler can do: | | - that may be used in one or several optimizations, but the | compiler authors did not really keep track of where they were | used, | | - that cannot be disabled and that the compiler maintainers | will not consider having an option to disable, | | - and that are definitely unfriendly. | | Here is the URL of the blog post that I had to write in | preparation for the upcoming blog post about getting ourselves | a Friendly C Compiler: https://trust-in- | soft.com/blog/2020/04/06/gcc-always-assumes... . I recommend | you take a look, I think it is interesting in itself. | | You will have guessed that I'm not optimistic about the | approach. 
We can try to maintain a list of friendly | optimizations for ourselves, though, even if the compiler | developers are not helping. This might still be less work than | maintaining a C compiler. | ori_b wrote: | > _Here is the URL of the blog post that I had to write in | preparation for the upcoming blog post about getting | ourselves a Friendly C Compiler: https://trust-in-soft.com/blog/2020/04/06/gcc-always-assumes.... . I recommend | you take a look, I think it is interesting in itself._ | | So, it's definitely interesting -- I think a lot of odd stuff | you can do should probably be undefined. Eliminating pointer | accesses after a null check sounds A-ok to me, because your | program should never dereference null. | | Another interesting thought is requiring more of these things | that lead to miscompilation to produce compile time | diagnostics. | 0x09 wrote: | Not about the language exactly, so maybe not fair game, but: how | did you all find yourselves joining ISO? And maybe more | generally, what's the path for someone like a regular old | software engineer to come to participate in the standardization | process for something as significant and ubiquitous as the C | programming language? | AaronBallman wrote: | Great question! | | Joining the committee requires you to be a member of your | country's national body group (in the US, that's INCITS) and | attend at least some percentage of the official committee | meetings, and that's about it. So membership is not difficult, | but it can be expensive. Many committee members are sponsored | by their employers for this reason, but there's no requirement | that you represent a company.
| | I joined the committees because I have a personal desire to | reduce the amount of time it takes developers to find the bugs | in their code, and one great way to reduce that is to design | features that make it harder to write the bugs in the first place, | or to turn unbounded undefined behavior into something more | manageable. Others join because they have specific features | they want to see adopted or want to lend their domain expertise | in some area to the committee. | johannes1234321 wrote: | Related to that: the C++ standards body seems to be quite open, | allowing non-members to participate (outside official votes, | while respecting them when looking for consensus). Is it just | due to my limited observation, or is the C group less open? | Any plans in that regard? | msebor wrote: | Most of us on the committee would like to see more | participation from other experts. The committee's mailing | list should be open even to non-members. Attendance by non- | members at meetings might require an informal invitation (I | imagine a heads up to the convener should do it). | DougGwyn wrote: | I think that's right. These days, much of the discussion | occurs through study subgroups (like the floating-point | guys) and the committee e-mailing list. | AaronBallman wrote: | I would love to see more open interactions between the | broader C community and the WG14 committee. One of the | plans I am currently working on is an update to the | committee's webpage to at least make it more obvious as to | how you can get involved. The page isn't ready to go live | yet, but will hopefully be in shape soon. | Lucasoato wrote: | Is there any new programming language that you particularly love? | Do you like the way programming is evolving?
| pascal_cuoq wrote: | As a member of the development team for a C static analyzer, I | use OCaml, which is also my favorite programming language, but | that is because I'm from the generation in which it was the new | thing (I learnt it when it had the same level of maturity as | Rust, at a time when Rust didn't exist). It helps that it's | perfect for writing compilers and static analyzers. | | There are a lot of problems that seem a good match for Rust, | and Rust is first in my list of programming languages I will | never find the time to learn but wish I could. | artursapek wrote: | Why won't you ever find time? It should only take a good 20 | hours of reading and playing with code before you start to | grok it. | rseacord wrote: | I spent the early part of my career bragging about how many | programming languages I knew, and the later part of my | career complaining about how I don't know any of them well | enough. | artursapek wrote: | I certainly wouldn't go for quantity there, but if you | really want to learn Rust you should. It brings some | groundbreaking new ideas to programming and is more than | "just another language". | hsivonen wrote: | What's the current committee thinking on providing locale-independent conversions from potentially-invalid UTF-8 to valid | UTF-8, from potentially-invalid UTF-8 to valid UTF-16, and from | potentially-invalid UTF-16 to valid UTF-8 (i.e. replacing ill-formed sequences with the REPLACEMENT CHARACTER)? | DougGwyn wrote: | If you changed UTF-16 to UTF-32 or UCS-4 I'd support it. I | think there are already implementations that use the | replacement character for all "impossible" codes. | brainzap wrote: | How do you join three float values into a comma separated string, | and then split it again? | emilfihlman wrote: | Not sure what you mean but would | s8 buf[enoughspace]; snprintf(buf, sizeof(buf), "%f,%f,%f", your, three, values); sscanf(buf, "%f,%f,%f", &your, &three, &values); | | do the job?
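A self-contained sketch of the snprintf/sscanf round trip suggested above. The helper names `join3` and `split3` are hypothetical. Two assumptions are baked in: `float` is IEEE 754 binary32 (so 9 significant digits, via `%.9g`, are enough to round-trip exactly), and the program runs in the default "C" locale, where the decimal separator is a period rather than a comma.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "a,b,c" into buf.
   Returns 0 on success, -1 if the text did not fit or formatting failed. */
int join3(char *buf, size_t size, float a, float b, float c)
{
    /* snprintf returns the length it wanted to write, so a value
       >= size means the output was truncated. */
    int n = snprintf(buf, size, "%.9g,%.9g,%.9g", a, b, c);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: parses "a,b,c" back into three floats.
   Returns 0 on success, -1 if fewer than three values were converted. */
int split3(const char *s, float *a, float *b, float *c)
{
    return sscanf(s, "%f,%f,%f", a, b, c) == 3 ? 0 : -1;
}
```

Checking the return values matters here: snprintf silently truncates when the buffer is too small, and sscanf stops at the first field it cannot parse.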
| jpfr wrote: | Quite a few new languages generate C code for the "backend" of | their compiler. For example ATS and the ZZ language. | | This helps bring these languages to embedded targets with | closed toolchains (with an existing C compiler). | | Will there be developments to use a subset of C as a "portable | assembly" in a standard way? Like there is WebAssembly for | JavaScript. | msebor wrote: | That doesn't seem likely. There have been no proposals for | anything like it and there is a general resistance to | subsetting either C or C++ (the exception being making support | for new features optional). | Tronic2 wrote: | Why is the struct tm* returned by localtime() not thread-local | like errno and other similar variables are (at least in | implementations)? Do you have any plans to improve calendar | support for practical uses? | pascal_cuoq wrote: | Both questions would get better answers if they were put to a | panel of experts on POSIX (which could include members of the | POSIX standardization committee). | | For the first one, I can attempt a guess: maybe it was feared | that making the result of localtime thread-local would break | some programs? You could build such a program on purpose, | although I am not clear how frequently one would write one by | accident. | | Anyway, localtime_r is the function that one should use if one | is concerned about thread-safety. A more likely answer is that no | Unix implementation bothered to fix localtime because the | proper fix was for programs to call localtime_r. | emilfihlman wrote: | Thank you for taking time to take questions! | | Have you ever considered or will you consider deprecating char, | int, long, (s)size_t, float, double, etc. in favour of specific | length types? | | Will you ever add / have you considered adding [su]\d+ and f\d+ | as synonyms for those mentioned in stdint.h?
| | Since char is signed on most platforms, arm eabi being an | exception and even there it's really just a matter of compile | time flags, will you ever just drop char from being able to be | either and just say it's signed, as int is also signed? | | Will you ever define / have you considered defining signed | overflow behaviour? | rseacord wrote: | I don't think we'll ever deprecate char, int, long, float, | double, or size_t. ssize_t is not part of the C Standard, and | hopefully never will be as it is a bit of an abomination. The | main driver behind the evolution of the C Standard is not to | break existing code written in C, because the world largely | runs on C programs. | | C does provide fixed width types like uint8_t, uint16_t, | uint32_t, and uint64_t. These are optional types because they | can't be implemented on implementations that don't have the | appropriate word sizes. We also have required types such as | uint_least8_t, uint_least16_t, uint_least32_t, and uint_least64_t. | emilfihlman wrote: | >The main driver behind the evolution of the C Standard is | not to break existing code written in C, because the world | largely runs on C programs. | | If not deprecate, then at least make fixed width types | equivalent members to them, i.e. all char based apis should | accept s8 (typedef signed char s8) and all int based apis | should accept s32. | rseacord wrote: | Well, there are a number of problems with this proposal. For | example, if your implementation defines int as a 16-bit | type (which is permitted by the standard) and you pass | an int32_t, the value you pass may be truncated if it is | outside of the range of the narrower type. When | programming, it is best to match the type of the API of the | function you are calling for portability. | [deleted] | nicoburns wrote: | Are there any plans to "clean up C"?
A lot of effort has been put | into alternative languages, which are great, but there is still a | lot of momentum with C, and it seems that a lot of improvements | could be made in a backwards compatible way and without | introducing much in the way of complexity. For example: | | - Locking down some categories of "undefined behaviour" to be | "implementation defined" instead. | | - Proper array support (which passes around the length along with | the data pointer). | | - Some kind of module system, that allows code to be imported | without the possibility of name collisions. | metalforever wrote: | What does clean up c mean? | ori_b wrote: | > - Some kind of module system, that allows code to be imported | without the possibility of name collisions. | | That doesn't particularly need modules -- just some form of | namespace foo { } | dooglius wrote: | You can very easily make a struct consisting of a pointer and | length; is adding such a thing to the standard really a big | deal? Personally, I don't see a problem with passing two | arguments. | Someone1234 wrote: | - In your example there's no guarantee that the length will | be accurate, or that the data hasn't been modified | independently elsewhere in the program.
| | And having a special data-and-length type would make these | guarantees... how? You're ultimately going to need to be | able to create these objects from bare data and length | somehow, so it's a case of garbage-in-garbage-out. | gbear605 wrote: | Declaring it with a custom struct: int | raw_arr[4] = {0,0,0,0}; struct SmartArray arr; | arr.length = 4; arr.val = raw_arr; | some_function(arr); | | Smart declaration with custom type: (assume that they'll | come up with a good syntax) | smart_int_arr arr[4] = {0,0,0,0}; | some_function(arr); | | With the custom struct, it requires the number `4` to be | typed twice manually, while in the second it only needs a | single input. | FpUser wrote: | In Delphi/FreePascal there are dynamic arrays (strings | included) that are in fact fat pointers that hide inside | more info than just length. They are all opaque types and work just | fine with automatic lifecycle control and COW and whatnot. | msebor wrote: | There are "projects" underway to clean up the spec where it's | viewed as either buggy, inconsistent, or underspecified. The | atomics and threads sections are a couple of examples. | | There are efforts to define the behavior in cases where | implementations have converged or died out (e.g., twos | complement, shifting into the sign bit). | | There have been no proposals to add new array types and it | doesn't seem likely at the core language level. C's charter is | to standardize existing practice (as opposed to invent new | features), and no such feature has emerged in practice. Same | for modules. (C++ takes a very different approach.) | bear8642 wrote: | >clean up the spec | | Would this involve further specification of bitfields? I feel the | implementation-defined nature of bitfields limits their potential. | DougGwyn wrote: | Actually there was no need to disenfranchise non-twos- | complement architectures.
Now that SIMH has a CDC-1700 | emulation, I had planned on producing a C system for it as an | example for students who have never seen such a model. | rkangel wrote: | > C's charter is to standardize existing practice (as opposed | to invent new features) | | Passing a pair of arguments (pointer and a length) is surely | one of the more universal conventions among C programmers? | cperciva wrote: | When they say "existing practice" they mean things already | implemented in compilers -- not existing practice among | developers. | apotheon wrote: | This seems like a poor way to establish criteria for | standardization. It essentially encourages non-standard | practice and discourages portable code by saying that to | improve the language standard we have to have mutually | incompatible implementations. | | It has been said that design patterns (not just in the | GOF sense of the term) are language design smells, | implying that when very common patterns emerge it is a de | facto popular-uprising call for reform. That, to me, is a | more ideal criterion for updating a language standard, | but practiced conservatively to avoid too much movement | too fast or too much language growth. | | On the other hand, I think you might be close to what | they meant by "existing practice". I'm just disappointed | to find that seems like the probable case (though I think | it might also include some convergent evolutionary | library innovations by OS devs as well as language | features by compiler devs). | cperciva wrote: | One of the principles for the C language is that you | should be able to use C on pretty much any platform out | there. This is one of the reasons that other languages | are often written in C. | | In order to uphold that principle, it's important that | the standard consider not just "is this useful" but "is | this going to be reasonably straightforward for compiler | authors to add". 
Seeing that people have already | implemented a feature helps C to avoid landing in the | "useful feature which nobody can use because it's not | widely available" trap. (For example, C99 made the | mistake of adding floating-point complex types in | <complex.h> -- but these ended up not being widely | implemented, so C11 backed that out and made them an | optional feature.) | jschwartzi wrote: | What is your definition of "portable"? Are you using that | term to mean "code I write for one platform can run | without modification on other platforms" or "the language | I use for one platform works on other platforms"? | | I think when you get down to the level of C you're | looking at the latter much more than the former. C is | really more of a platform-agnostic assembler. It's not a | design smell to have conventions within the group of | language users that are de-facto language rules. For | reference, see all the PEP rules about whitespace around | different language constructs. These are not enforced. | | The whole point of writing a C program is to be close to | the addressable resources of the platform, so you'd | probably want to expose those low-level constructs unless | there's a compelling reason not to. Eliminating an | argument from a function by hiding it in a data structure | is not that compelling to me since I can just do that on | my own. And then I can also pass other information such | as the platforms mutex or semaphore representation in the | same data structure if I need to. | | By the way, that convenient length+pointer array requires | new language constructs for looping that are effectively | syntactic sugar around the for loop. Or you need a way to | access the members of the structure. And syntactic sugar | constrains how you can use the construct. So I'm not sure | that it adds anything to the language that isn't already | there. 
And the fact that length+pointer is such a common | construct indicates that most people don't have any | issues with it at all once they learn the language. | nabla9 wrote: | > no such feature has emerged in practice | | Arrays with length constantly emerge among C users and | libraries. They are just all incompatible because without | standardization there is no convergence. | rseacord wrote: | Sounds like a good use of standardization. If there is | existing implementation practice, please go ahead and | submit a proposal. I would be happy to champion such a | proposal if you can't attend in person. | nabla9 wrote: | It was an observation, not a suggestion. | | When the language standardization body has not managed to | add arrays with length in 48 years, I don't think it | should be added at this point. The culture is backward | looking and incompatible with modern needs and people | involved are old and incompatible with the future (no | offense, so am I). | | C standardization effort should focus on finishing the | language, not developing it to match the modern world. I have | programmed in C for over 20 years, since I was a teenager. | It has long been the system programming language I'm | most familiar with. For the last 10 years I have never | written an executable. Just short callable functions from | other languages. Python, Java, Common Lisp, Matlab, and | 'horror of horrors' C++. | | I think Standard C can live for the next 50 years in gradual | decline as portable assembler called from other languages | and compilation target. | | If I would propose new extension to C language, I would | propose completely new language that can be optionally | compiled into C and works side by side with old C code. | apotheon wrote: | > If I would propose new extension to C language, I would | propose completely new language that can be optionally | compiled into C and works side by side with old C code.
| | There are a few somewhat popular languages that fit that | description already, and none of them are suitable | replacements for C (as far as I've seen). That's not to | say there couldn't be a suitable replacement -- just that | nobody in a position to do something about it wants the | suitable replacement enough for it to have emerged, | apparently. | | I suspect the first really suitable complete replacement | for C would be something like what Checked C [1] tried to | be, but a little more ambitious and willing to include | wholly new (but perhaps backward-compatible) features | (like some of those you've proposed) implemented in an | interestingly new enough way to warrant a whole new | compile-to-C implementation. Something like that could | greatly improve the use cases where a true C replacement | would be most appreciated, and still fit "naturally" into | environments where C is already the implementation | language of choice via a piecemeal replacement strategy | where the first step is just using the new language's | compiler as the project compiler front end's drop-in | replacement (without having to make any changes to the | code at all for this first step). | | 1: https://www.microsoft.com/en- | us/research/project/checked-c/ | xtian wrote: | Sounds like you are describing Zig. https://ziglang.org | simias wrote: | I think the problem is that C is simply ill-suited for | these "high level" constructs. The best you're likely to | get is an ad-hoc special library like for wchar_t and | wcslen and friends. Do we really want that? | | I'd argue that linked list might make a better candidate | for inclusions, because I've seen the kernel's list.h or | similar implementations in many projects and that's stuff | is trickier to get right than stuffing a pointer and a | size_t in a struct. | ATsch wrote: | typedef struct {uint8_t *data; size_t len;} ByteBuf; is the | first line of code I write in a C project. 
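A minimal sketch of the pointer-plus-length struct from the comment above, extended with two helpers to show how bounds checks and sub-views follow from keeping the length next to the pointer. Only the ByteBuf typedef comes from the thread; the helper names bytebuf_slice and bytebuf_get are hypothetical and belong to no library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-plus-length view; the layout matches the typedef quoted above. */
typedef struct {
    uint8_t *data;
    size_t len;
} ByteBuf;

/* Hypothetical helper: a view of `count` bytes starting at `offset`.
   Returns an empty view {NULL, 0} when the range does not fit in `b`
   or when `count` is zero. */
ByteBuf bytebuf_slice(ByteBuf b, size_t offset, size_t count)
{
    ByteBuf empty = { NULL, 0 };
    assert(b.data != NULL || b.len == 0);  /* no length without data */
    if (offset > b.len || count > b.len - offset)
        return empty;
    if (count == 0)
        return empty;
    ByteBuf view = { b.data + offset, count };
    return view;
}

/* Hypothetical helper: bounds-checked read.
   Returns false and leaves *out untouched when `i` is out of range. */
bool bytebuf_get(ByteBuf b, size_t i, uint8_t *out)
{
    if (i >= b.len)
        return false;
    *out = b.data[i];
    return true;
}
```

Because the view is two words wide it can be passed by value, and several views can refer to sub-ranges of one allocation. Nothing here stops a caller from building a ByteBuf by hand with the wrong length, which is the garbage-in-garbage-out concern raised earlier in the thread.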
| kkdwivedi wrote: | Another option is a struct with a FAM (flexible array member) at the end. | typedef struct { size_t len; uint8_t | data[]; } ByteBuf; | | Then, allocation becomes ByteBuf *b = | malloc(sizeof(*b) + sizeof(uint8_t) * array_size); | b->len = array_size; | | and data is no longer a pointer. | ATsch wrote: | Well, your ByteBuf is still a pointer. You also now need | to dereference it to get the length. It also can't be | passed by value, since it's very big. You can also not | have multiple ByteBufs pointing at subsections of the | same region of memory. | | Thing is, you rarely want to share just a buffer anyway. | You probably have additional state, locks, etc. So what I | do is embed my ByteBuf directly into another structure, | which then owns it completely: typedef | struct { ... ByteBuf mybuffer; | ... } SomeThing; | | So we end up with the same number of pointers (1), but | with some unique advantages. | saagarjha wrote: | sizeof(ByteBuf) == sizeof(size_t), and you _can_ pass it | by value; I just don't think you can do anything useful | with it because it'll chop off the data. | kkdwivedi wrote: | Right, totally depends on what you're doing. My example | is not a good fit for intrusive use cases. | kevin_thibedeau wrote: | This will be an alignment problem on any platform with data | types larger than size_t. You'd need an | alignas(max_align_t) on the struct. At which point some | people are going to be unhappy about the wasteful padding | on a memory constrained target. | enriquto wrote: | That's a really bizarre layout for your struct. Why don't | you put the length first? | ATsch wrote: | I'm not sure if it matters. It might be better for some | technical reason, such as speeding up double | dereferences, because you don't need to add anything to | get to the pointer. But to be honest I just copied it out | of existing code. | saagarjha wrote: | Most platforms have instructions for dereferencing with a | displacement. | twic wrote: | Why would it matter?
The bytes aren't inline, this is | just a struct with two word-sized fields. | | A possible tiny advantage for this layout is that a | pointer to this struct can be used as a pointer to a | pointer-to-bytes, without having to adjust it. Although | i'm not sure that's not undefined behaviour. | scythe wrote: | >There have been no proposals to add new array types and it | doesn't seem likely at the core language level. | | One alternative to adding types is to allow enforcing | consistency in some structs with the trailing array: | struct my_obj { const size_t n; //other | variables char text[n]; }; | | where for simplicity you might only allow the first member to | act as a length (and it must of course be constant). The | point is that then the initializer: struct | my_obj b = {.n = 5}; | | should produce an object of the right size. For heap | allocation you could use something like: | void * vmalloc(size_t base, size_t var, size_t cnt) { | void *ret = malloc(base + var * cnt); if (!ret) | return ret; * (size_t *) ret = cnt; | return ret; } | jschwartzi wrote: | I would love this. | 7532yahoogmail wrote: | I agree which brought me into looking at Zig. A future version | of C might disallow macros, preprocessor, disallow circular | libraries, include a module system, but allow importing legacy | libs like Zig. Also something like llvm so we can automatically | do static analysis, transforms would be great. | rseacord wrote: | I think we are always looking at ways to "clean up C" but that | this has to be done very carefully not to break existing code. | For example, the committee recently voted to remove support for | function definitions with identifier lists from C2x | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf At | least one vendor was not very happy with this decision. | | Undefined behaviors tend to be undefined for a reason and | shouldn't be thought of as defects in the standard. 
In my years | on the committee, I have always argued to define as much | behavior as possible and to define undefined | behaviors as narrowly as possible. | | We also had a recent discussion about adding additional name | spaces (when discussing reserved identifiers), but it didn't | gain much traction. | bumblebritches5 wrote: | Looks like that proposal is dropping support for K&R function | declarations, is that right? | rseacord wrote: | Yes, that is correct. | tropo wrote: | C has strayed very far from the original intent because | compiler authors prioritized benchmark results at the expense | of real-world use cases. This bad trend needs to be reversed. | | Consider signed integer overflow. | | The intent wasn't that the compiler could generate nonsense | code if the programmer overflowed an integer. The intent was | that the programmer could determine what would happen by | reading the hardware manual. You'd wrap around if the | hardware naturally would do so. On some other hardware you | might get saturation or an exception. | | In other words, all modern computers should wrap. That | includes x86, ARM, Power, Alpha, Itanium, SPARC, and just | about everything else. I don't believe you can even buy non- | wrapping hardware with a C99 or newer compiler. Since this is | likely to remain true, there is no longer any justification | for retaining undefined behavior that is getting abused to | the detriment of C users. | 3JPLW wrote: | Does it concern you how aggressively compiler teams are | exploiting UB? | Spivak wrote: | You do have to understand that compiler teams aren't saying | something like "this triggers UB, quick, just replace it | with a no-op." It's just something that naturally happens when | you need to reason about code. | | For example, consider a very simple statement. | let array[10]; let i = some_function(); | print(array[i]); | | The function might not even be known to the compiler at | compilation time if it was from a DLL or something.
| | But the compiler is like "hey! you used the result of this | function as an index for this array! i must be in the range | [0, 10)! I can use that information!" | gwd wrote: | > But the compiler is like "hey! you used the result of | this function as an index for this array! i must be in | the range [0, 10)! I can use that information!" | | As a developer who has seen lots of developers (including | himself) make really dumb mistakes, this seems like a | very strange statement. | | Imagine if you hired a security guard to stand outside | your house. One day, he sees you leave the house and | forget to lock the door. So he reasons, "Oh, nothing | important inside the house today -- guess I can take the | day off", and walks off. That's what a lot of these "I | can infer X must be true" reasonings sounds like to me: | they assume that developers don't make mistakes; and that | all unwanted behavior is exactly the same. | | So suppose we have code that does this: | int array[10]; int i = some_function(); | /* Lots of stuff */ if ( i > 10 ) { return | -EINVAL; } array[i] = newval; | | And then someone decides to add some optional debug | logging, and forgets that `i` hasn't been sanitized yet: | int array[10]; int i = some_function(); | logf("old value: %d\n", array[i]); /* Lots of | stuff */ if ( i > 10 ) { return | -EINVAL; } array[i] = newval; | | Now _reading_ `array[i]` if `i` > 10 is certainly UB; | but in a lot of cases, it will be harmless; and in the | worst case it will crash with a segfault. | | But suppose a clever compiler says, "We've accessed | array[i], so I can infer that i < 10, and get rid of the | check entirely!" Now we've changed an out-of-bounds | _read_ into an out-of-bounds _write_ , which has changed | worst-case a DoS into a privilege escalation! | | I don't know whether anything like this has ever | happened, but 1) it's certainly the kind of thing allowed | by the spec, 2) it makes C a much more dangerous language | to deal with. 
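A sketch of the ordering issue described above, written so that the range check precedes every access to the array. The function name checked_store and its old out-parameter are hypothetical, chosen only for illustration. The sketch tests i >= ARRAY_LEN rather than i > 10, since the valid indices of a ten-element array are 0 through 9, and it follows the thread in returning -EINVAL, which assumes a POSIX-style <errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_LEN 10

static int array[ARRAY_LEN];

/* Hypothetical helper: stores newval at index i and reports the value that
   was there before. The range check comes before both the read and the
   write, so neither access can happen with an unvalidated index, and the
   compiler has no earlier access from which to infer that i is in range. */
int checked_store(int i, int newval, int *old)
{
    assert(old != NULL);
    if (i < 0 || i >= ARRAY_LEN)
        return -EINVAL;
    *old = array[i];      /* the "debug logging" read, now after the check */
    array[i] = newval;
    return 0;
}
```

The design point is only the ordering: any read added later for logging belongs below the check, not above it.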
| asveikau wrote: | > in a lot of cases, it will be harmless; and in the | worst case it will crash with a segfault. | | I am not sure if a segfault is always the worst case. It | could be by some coincidence that array[i] contains some | confidential information [maybe part of a private key? 32 | bits of the user's password?] and you've now written it | to a log file. | | I know it's hard to imagine a mis-read of ~32 bits would | have bad consequences of that sort, but it's not out of | the question. | btilly wrote: | Per https://lwn.net/Articles/575563/, Debian at one point | found that 40% of the C/C++ programs that they have are | vulnerable to known categories of undefined behavior like | this which can open up a variety of security holes. | | This has been accepted as what to expect from C. All | compiler authors think it is OK. People who are aware of | the problem are overwhelmed at the size of it and there | is no chance of fixing it any time soon. | | The fact that this has come to be seen as normal and | OK is an example of _Normalization of Deviance_. See | http://lmcontheline.blogspot.com/2013/01/the- | normalization-o... for a description of what I mean. And | deviance will continue to be normalized right until | someone writes an automated program that walks through | projects, finds the surprising undefined behavior, and | tries to come up with exploits. After project after | project gets security holes, perhaps the C language | committee will realize that this really __ISN'T__ okay. | | And the people who already migrated to Rust will be | laughing their asses off in the corner. | msebor wrote: | This is a good example.
Let me flesh it out a bit more to | illustrate a specific instance of this problem: | int a[2][2]; int f (int i, int j) { | int t = a[1][j]; a[0][i] = 0; /* cannot change a[1] */ | return a[1][j] - t; /* can be folded to zero */ } | | The language says that elements of the matrix a must only | be accessed by indices that are valid for each bound, so | compilers can and some do optimize code based on that | requirement (see https://godbolt.org/z/spSF8e). | | But when a program breaks that requirement (say, by | calling f(2, 0)) the function will likely return an | unexpected value. | Spivak wrote: | But I don't know what you want to happen in this case? If | you actually call f(2,0) then the program makes no sense. | How can you have an expected value for a function call | that violates its preconditions? | rseacord wrote: | I would say that there is a lot of concern in the committee | about how compilers are optimizing based on pointer | providence. There has been a study group looking at this. | It now appears that they are likely to publish their | proposal as a Technical Report. | _kst_ wrote: | "based on pointer providence" | | I think you meant "provenance" (mentioning it for the | sake of anyone who wants to search for it). | rseacord wrote: | Yes, my mistake--I was thinking of Rhode Island. I wrote | a short bit about this at | https://www.nccgroup.trust/us/about-us/newsroom-and-events/b... if anyone is interested. | revertts wrote: | What's the best way to keep an eye out for that TR? | Periodically checking http://www.open-std.org/jtc1/sc22/wg14/ ? | | I can't ever tell if I'm looking in the right place. :) | AaronBallman wrote: | If you're interested in the final TR, I would imagine | we'd list it on that page you linked. If you're | interested in following the drafts before it becomes | published, you'd find them on http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log...
(A draft | has yet to be posted, though, so you won't find one there | yet.) | msebor wrote: | This is a common misconception (or poor way of phrasing it, | sorry). Compiler implementers don't go looking for | instances of undefined behavior in a program with the goal | of optimizing it in some way. There is little value in | optimizing invalid code. The opposite is the case. | | But we must write code that relies on the same rules and | requirements that programs are held to (and vice versa). | When either party breaks those rules, either accidentally | or deliberately, bad things happen. | | What sometimes happens is that code written years or | decades ago that relies on the absence of an explicit guarantee | in the language suddenly stops working because a compiler | change depends on the assumption that code doesn't rely on | the absence of the guarantee. That can happen as a result | of improving optimizations, which is often but not | necessarily always motivated by improving the efficiency of | programs. Better analysis can also help find bugs in code | or avoid issuing warnings for safe code. | ori_b wrote: | There are rules and requirements documented in the spec, | and there are de-facto rules and requirements that | programs expect. Not only that, but when they _do_ | exploit these rules, often the code generated is | obviously incorrect, and could have been flagged at | compile time. | | Right now, it seems like compiler vendors are playing a | game of chicken with their users. | cwzwarich wrote: | > This is a common misconception (or poor way of phrasing | it, sorry). Compiler implementers don't go looking for | instances of undefined behavior in a program with the | goal of optimizing it in some way. There is little value | in optimizing invalid code. The opposite is the case. | | Compilers do deliberately look to optimize loops with | signed counters by exploiting UB to assume that they will | never wrap.
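The distinction behind this exchange is that unsigned arithmetic is defined to wrap modulo 2^N, while signed overflow is undefined, which is what permits a compiler to assume a signed counter never wraps. A minimal sketch, with hypothetical function names, of detecting signed overflow before it happens instead of testing the result afterwards:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes a + b only when the result is representable.
   The comparisons are arranged so that no intermediate expression can
   itself overflow. Returns false and leaves *sum untouched on overflow. */
bool add_checked(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* For contrast: unsigned addition is well defined for all inputs and
   wraps modulo UINT_MAX + 1. */
unsigned add_wrapping(unsigned a, unsigned b)
{
    return a + b;
}
```

A post-hoc test such as checking whether a + b came out smaller than a is exactly the kind of code an optimizer may remove for signed operands, because the overflow it tries to detect is assumed not to occur.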
| Leherenn wrote: | Well yes, they assume they never wrap because that is not | allowed by the language, by definition. UB are the | results of broken preconditions at the language level. | qznc wrote: | I'd say both statements are correct. | | Compiler implementers are happy when they don't have to | care about some edge case because then the code is | simpler. Thus, only for unsigned counters there is the | extra logic to compile them correctly. | | That is my interpretation of "The opposite is the case". | Writing a compiler is easier with lots of undefined | behavior. | ximeng wrote: | Why would a vendor be unhappy about that? They have a large | library using this deprecated syntax? Or many customers? It | seems like a relatively easy fix to existing code. | AaronBallman wrote: | The usual argument is: once you've verified some piece of | code is correct, changing it (even when there should be no | functional change in the semantics) carries risk. Some | customers have C89-era code that compiles in C17 mode and | they don't want to change that code because of these risks | (perhaps the cost of testing is prohibitively expensive, | there may be contractual obligations that kick in when | changing that code, etc). | rmind wrote: | Well, one argument is that the vendors should not compile | C89 code as C17. If you write C89, then stick with | -std=c89 (or upgrade to the latest officially compatible | revision). | | It makes sense to preserve language compatibility within | several language revisions, gradually sunsetting some | features, but why do that for the eternity? Gradual de- | supporting would push the problem to the compilers, but | while it is no fun supporting, let's say, C89 and a | hypothetical incompatible language C3X, this is where the | effort should go (after all, companies with the old | codebases can stick with older compilers). There is a | great value in paving a way for a more fundamental C | language simplifications and clean ups. 
| apotheon wrote: | These are all good points, and I don't see a legitimate, | technical reason to avoid deprecating and eliminating | identifier list syntax in new C standards (but then, I'm | not as much of an expert as some people, so I might be | missing something important). | | That having been said, a compiler _vendor_ has, almost | _by definition_ as its _first priority_ , an undeniable | interest in keeping customers happy while, at the same | time, ensuring strong reasons to see value in a version | upgrade. When dealing with corporate enterprise | customers, that often means offering new features without | deprecating old features, because the customers want the | new features but don't want to have to rewrite _anything_ | just because of a compiler upgrade. | | They'll want C17 (and C32, for that matter) hot new | features, but they will not want to pay a developer to | "rewrite code that already works" (in the view of middle | managers). | | That's why I think they'd most likely complain. Their | concerns about removing identifier lists likely have | _nothing at all_ to do with good technical sense. | Ideally, if you don 't want to rewrite your rickety old | bit-rotting shit code, you should just continue compiling | it with an old compiler, and if you want new language | features you should use them in new language standard | code, period, but business (for pathological, perhaps, | but not really upstream-curable reasons) doesn't | generally work that way. | ximeng wrote: | One alternative at that point is to just ignore the fact | that the deprecated feature is now removed and continue | supporting it in your compiler. Maybe you hide standards | compliance behind a flag. Annoying and more overhead, but | saves your clients from spending dollars on upgrading | their obsolete code. | pcr910303 wrote: | Or... deprecating unsafe or not-well-designed (but this is a | bit subjective) ideas. Like... deprecating locales. 
(For why | locales aren't well-designed ideas: https://github.com/mpv- | player/mpv/commit/1e70e82baa9193f6f02...) | cesarb wrote: | > Proper array support (which passes around the length along | with the data pointer). | | I second this one. One of the best things from Rust is its "fat | pointers", which combine a (pointer, length) or a (pointer, | vtable) pair as a single unit. When you pass an array or string | slice to a function, under the covers the Rust compiler passes | a pair of arguments, but to the programmer they act as if they | were a single thing (so there's no risk of mixing up lengths | from different slices). | loeg wrote: | Fat pointers in C would involve an ABI break for existing | code, in that uintptr_t and uintmax_t would probably need to | double in size. | rkangel wrote: | It would presumably involve a new type that didn't exist in | the current ABI. Those pointers would stay the same, and | the new (twice as big) pointers would be used for the array | feature. | professoretc wrote: | The point of uintptr_t is that it's an integer type to | which _any_ pointer type can be cast. If you introduce a | new class of pointers which are not compatible with | uintptr_t, then suddenly you have pointers which are not | pointers. | loeg wrote: | Ditto uintmax_t. We do not want a uintmax2_t. | _kst_ wrote: | No, uintptr_t is an integer type to which any _object_ | pointer type can be converted without loss of | information. (Strictly speaking, the guarantee is for | conversion to and from void*.) And if an implementation | doesn 't have a sufficiently wide integer type, it won't | define uintptr_t. (Likewise for intptr_t the signed | equivalent.) | | There's no guarantee that a function pointer type can be | converted to uintptr_t without loss of information. | | C currently has two kinds of pointer types: object | pointer types and function pointer types. "Fat pointers" | could be a third. 
And since a fat pointer would | internally be similar to a structure, converting it to or | from an integer doesn't make a whole lot of sense. (If | you want to examine the representation, you can use | memcpy to copy it to an array of unsigned char.) | kazinator wrote: | You would be shocked by this language called C++ which is | highly compatible with C and has "pointer to member" | types that don't fit into a uintptr_t. | | (Spoiler: no, there is no uintptr2_t). | kazinator wrote: | On a given platform, the fat pointer type could have an | easily defined ABI expressible in C90 declarations (whose | ABI is then deducible accordingly). | | For instance, complex double numbers can have an ABI | which says that they look like struct { double re, im; }; | cesarb wrote: | Existing code would be using normal pointers, not fat | pointers, so there would be no ABI break. New code using | fat pointers would know that they fit into a _pair_ of | uintptr_t, so the size of uintptr_t would not need to | change either. | loeg wrote: | I don't think we want a uintptr_t and uintptr2_t. | monocasa wrote: | IDK, it's not like it'd be an auto_ptr situation where | you just don't use uintptr_t anymore and call the other | one uintptr2_t. There are different enough semantics that | they both still make sense. | | Like, as someone who does real, real dirty stuff in Rust, | usize as a uintptr equivalent gets used still even though | fat pointers are about as well supported as you can | imagine. | kazinator wrote: | The C family already evolved in this direction decades | ago. Have you heard of C++ (Cee Plus Plus)? | | It is production-ready; if you want a dialect of C with | arrays that know their length, you can use C++. If you wanted | a dialect of C in 1993 with arrays that know their length for | use in a production app you could also have used C++ then. | | The problem with all these "can we add X to C" is that there | is always an implicit "...
but please let us not add Y, Z and | W, because that would start to turn C into C++, which we all | agree that we definitely don't want or need." | | The kicker is that _everyone wants a different X_. | | Elsewhere in this thread, I noticed someone is asking for | _namespace { }_ and so it goes. | | C++ _is_ the result --- is that version of the C language --- | where most of the crazy "can you add this to C" proposals | have converged and materialized. "Yes" was said to a lot of | proposals over the years. C++ users had to accept features | they don't like that other people wanted, and had to learn | them so they could understand C++ programs in the wild, not | just their own programs. | apotheon wrote: | C++ introduces a shit-ton of stuff that one often doesn't | want, and even Bjarne Stroustrup (who many contend has | never seen a language feature he didn't want) has been a | little alarmed at the sheer mass of cruft being crammed | into recent updates to the standard. I know many C++ people | think C++ is pure improvement over C in all contexts and | manners, but it's not. It's different, and there are | features implemented in C++ and not in C that could be | added to C without damaging C's particular areas of | greatest value, and many other features in C++ that would | be pretty bad for some of C's most important use cases. | | C shouldn't turn into C++, or even C++ Lite(tm), but it | shouldn't remain strictly unchanging for all eternity, | either. It should just always strive to be a better C, | conservatively, because its niche is one where conservative | advancement is important. | | Some way to adopt programming practices that guarantee | consistent management of array and pointer length -- not | just write code to check it, but actually _guarantee_ it -- | would, I think, perfectly fit the needs of conservative | advancement suitable to C's most important niche(s). It | may not take the form of a Rust-like "fat pointer".
It may | just be the ability to tell the compiler to enforce a | particular constraint for relationships between specific | struct fields/members (as someone else in this discussion | suggested), in a backward-compatible manner such that the | exact same code would compile in an older-standard compiler | -- a _very_ conservative approach that should, in fact, | solve the problem as well as "fat pointers". | | There are ways to get the actually important upgrades | without recreating C++. | kazinator wrote: | > _C++ introduces a shit-ton of stuff that one often | doesn't want_ | | The point in my comment is that every single item in C++ | was wanted and championed by _someone_, exactly like all | the talk about adding this and that to C. | | > _C shouldn't turn into C++_ | | Well, C _did_ turn into C++. The entity that gave forth | C++ is C. | | Analogy: when we say "apes turned into humans", we don't | mean that apes don't exist any more or are not continuing | to evolve. | | Since C++ is here, there is no need for C to turn into | another C++ _again_. | | A good way to have a C++ with fewer features would be to | trim from C++ rather than add to C. | twic wrote: | > if you want a dialect of C with arrays that know their | length, you can use C++ | | C++ doesn't have arrays which know their length. | zokier wrote: | What's std::array then? | | > combines the performance and accessibility of a C-style | array with the benefits of a standard container, such as | knowing its own size | | https://en.cppreference.com/w/cpp/container/array | kevin_thibedeau wrote: | They're objects that mostly behave like arrays. You can't | index element two of std::array foo as 1[foo] since it | isn't an actual C array. | kazinator wrote: | C++ has features in its syntax so that you can write | objects that behave like arrays: support [] indexing via | operator [], and can be passed around (according to | whatever ownership discipline you want: duplication, | reference counting). 
C++ provides such objects in its | standard library, such as: std::basic_string<T> and | std::vector<T>. There is a newer std::array also. | pjmlp wrote: | Microsoft's "Checked C" seems to be the last attempt to fix C | security flaws. | | From the outside, after Annex K adoption failure, WG14 doesn't | seem to be willing to make C safer in any way. | | Are there any plans to take efforts like Checked C in | consideration regarding the future of ISO C? | Bambo wrote: | What is your favourite design pattern? | wpietri wrote: | As experts, where do you see C going? In particular, given the | many languages now out there built on decades of learnings from | C, where will C have unique strengths? What projects starting | today and hoping to run for 20 years should definitely pick C? | rseacord wrote: | I don't really see C going anywhere. It's not going away, and | it's not going to evolve into Java. It's going to remain | especially useful for memory constrained and performance | critical applications such as IoT and embedded. | wpietri wrote: | That sounds reasonable, but the resource-constrained space | seems to me to be an ever-shrinking share of the field. So is | it fair to say you see C becoming a specialist niche language | going forward? | rand0mstring wrote: | is there no way to make C "memory-safe" during compilation? | zzzcpan wrote: | There are a bunch of research projects that did just that. And | even just compiling with address sanitizer makes it "memory- | safe" to a significant degree. | rand0mstring wrote: | can you link any to check out? | rbultje wrote: | I'd love your opinion on the abundance of "undefined behaviour" | (as opposed to implementation-defined, or some new incantation | such as "unknown result in variable but system is safe") for | relatively trivial things such as signed (but not unsigned) | integer overflows. I've heard that this is to allow for non-twos- | complement implementations. 
However, in practice, you notice that | most people use ugly workarounds which lead to ugly code that | (because of e.g. casting to unsigned and allowing the same | overflow to happen anyway) only work correctly on twos-complement | anyway. Is this intended to be addressed in the future in some | way? | stephencanon wrote: | > (because of e.g. casting to unsigned and allowing the same | overflow to happen anyway) only work correctly on twos- | complement anyway | | Unsigned arithmetic never overflows, and guarantees | two's-complement behavior, because unsigned arithmetic is | always carried out modulo 2^n: | | > A computation involving unsigned operands can never overflow, | because a result that cannot be represented by the resulting | unsigned integer type is reduced modulo the number that is one | greater than the largest value that can be represented by the | resulting type. (6.2.5, Types) | | Doing the computation in unsigned always does the "right | thing"; the thing that one needs to be careful of with this | approach is the conversion of the final result back to the | desired signed type (which is very easy to get subtly wrong). | jimktrains2 wrote: | Interesting. I guess most/many arch's overflow flag is set | when the sign bit changes and the carry flag when the result | rolls over the word size. | | I think most people colloquially call going A + 1 = B where B | < A an overflow. Interesting. I knew they're different | things, but never really thought about my word choice. | rini17 wrote: | And are there standard primitives to do this correctly | (signed-unsigned-signed conversion) that never invoke | undefined behavior? 
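The modulo-2^n behaviour of unsigned arithmetic quoted above can be sketched in a few lines of C. This is only an illustration, not something posted in the thread: the helper names wrapping_add and wraps_on_add are hypothetical, and the sketch assumes a two's-complement int whose unsigned counterpart has exactly one more value bit (both assumptions are checked at compile time). The addition is carried out in unsigned int, where wraparound is defined, and the result is mapped back to int without relying on an out-of-range conversion.

```c
#include <assert.h>
#include <limits.h>

/* Assumptions of this sketch, checked at compile time. */
static_assert(INT_MIN + INT_MAX == -1, "assumes two's-complement int");
static_assert(UINT_MAX / 2 == INT_MAX, "assumes unsigned int has one more value bit than int");

/* Hypothetical helper: signed addition that wraps instead of overflowing. */
int wrapping_add(int a, int b)
{
    /* Signed-to-unsigned conversion and unsigned addition are fully
       defined: both work modulo UINT_MAX + 1. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    /* Values that fit in int convert back directly. */
    if (sum <= INT_MAX)
        return (int)sum;

    /* Otherwise shift the value into int's range while still unsigned
       (INT_MIN converts to 2^(n-1) here), then undo the shift in signed
       arithmetic, which cannot overflow at this point. */
    return (int)(sum + INT_MIN) + INT_MIN;
}

/* Hypothetical helper: returns 1 if x + y wrapped around, 0 otherwise. */
int wraps_on_add(unsigned int x, unsigned int y)
{
    return x + y < x;
}
```

With these definitions, wrapping_add(INT_MAX, 1) yields INT_MIN and wraps_on_add(UINT_MAX, 1u) yields 1, without any undefined or implementation-defined step along the way.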
| stephencanon wrote: | Signed to unsigned conversion is fully defined (and does | the two's complement thing): | | > Otherwise, if the new type is unsigned, the value is | converted by repeatedly adding or subtracting one more than | the maximum value that can be represented in the new type | until the value is in the range of the new type (6.3.1.3 | Signed and unsigned integers) | | Unsigned to signed is the hard direction. If the result | would be positive (i.e. in range for the signed type), then | it just works, but if it would be negative, the result is | implementation-defined (but note: _not_ undefined). You can | further work around this with various constructs that are | ugly and verbose, but fully defined and compilers are able | to optimize away. For example, `x <= INT_MAX ? (int)x : | (int)(x + INT_MIN) + INT_MIN` works if int has a twos- | complement representation (finally guaranteed in C2x, and | already guaranteed well before then for the intN_t types), | and is optimized away entirely by most compilers. | _kst_ wrote: | A quibble on wording: Unsigned overflow is not "twos- | complement". It gives you the same bit patterns that typical | two's-complement overflow gives you, but strictly speaking | two's-complement is a representation for _signed_ values. | shawnz wrote: | Wrapping around the modulus to me is an "overflow", although | maybe the spec doesn't use the word that way | GuB-42 wrote: | There is also a difference in x86 assembly, and probably | others. | | For unsigned operations the carry flag is used, and for | signed operations, the overflow flag is used. | kwillets wrote: | Most compilers will translate unsigned (x + y < x) to CF | usage. | _kst_ wrote: | Right, there are (at least) two ways to describe this. | | One is that unsigned arithmetic can overflow, and the | behavior on overflow is defined to wrap around. | | Another is to say that unsigned arithmetic cannot overflow | because the result wraps around. 
| | Both correctly describe the way it works; they just use the | word "overflow" in different ways. | | The C standard chooses the second way of describing it. | a-bit-of-code wrote: | Any chance that we could have an STL equivalent in C? Of course, | templating and other features being absent it won't be as generic | as CPP. However, having even something close to STL will help in | the long run. Thanks! | rseacord wrote: | There is always a chance. We would need to see a proposal based | on experience with an existing implementation. | polishdude20 wrote: | What is your favorite language other than C and why? | pascal_cuoq wrote: | I answered a similar question in another thread: | https://news.ycombinator.com/item?id=22866242 | ender1235 wrote: | Hi I took an amazing course in college that focused heavily on C. | Do you have any recent examples of small side projects you've | worked on using C? | DougGwyn wrote: | How about a Sudoku solver? Send me a request via e-mail. | dang wrote: | Doug, the email address in your account is private by | default, but you can make it public by putting it in the | About field of your profile at | https://news.ycombinator.com/user?id=DougGwyn. | | ender1235, if you don't see an email address there, email | hn@ycombinator.com and I'll put you in touch. | DougGwyn wrote: | Okay, check my About text. I'll soon remove it, to avoid | getting a lot of spam. | tzs wrote: | If an old timer who used to be good with C wanted to use C again, | would they have to learn a whole bunch of weird new stuff or | could they pretty much use it like they did back in the stone age | (i.e., the 20th century)? | | Back in the '80s and '90s I was pretty good at C. I don't think | there was anything about the language or the compilers then that | I did not understand. I used C to write real time multitasking | kernels for embedded systems, device drivers and kernel | extensions for Unix, Windows, Mac, Netware, and OS/2. 
I did a | Unix port from swapping hardware to paging hardware, rewriting | the processes and memory subsystems. I tricked a friend into | writing a C compiler. I could hold my own with the language | lawyers on comp.lang.c. | | Somewhere in there I started using C++, but only as a C with more | flexible strings, constructors, destructors, and "for (int i = | ...)", and later added STL containers to that. | | Sometime in the 2000s, I ended up spending more and more time on | smaller programs that were mostly processing text, and Perl | became my main tool. Also I ended up spending a lot of time helping | out less experienced people at work who were doing things in PHP, | or JavaScript, or Java. My C and C++ trickled to nothing. | | I've occasionally looked at modern C++, but it is so different | from what I was doing back in '90s or even early '00s I sometimes | have to double check that I'm actually looking at C++ code. | | Is modern C like that, or is it still at its core the same | language I used to know well? | AaronBallman wrote: | I'd put it this way -- as someone who writes both C and C++ and | has for a long while, I find that the difference between "best | practice" C89 and C17 code is not as wide as the difference | between "best practice" C++98 and C++17 code. However, this is | subjective and may be specific to what kinds of projects I work | on, so YMMV. | msebor wrote: | C17 doesn't look much different than C89. If you are used to | K&R C there may be some adjustment but I would expect it to be | manageable. | | What might perhaps be more challenging is adjusting to the | changes in compilers. They tend to optimize code more | aggressively and so writing code that closely follows the rules | of the language (rather than making assumptions about the | underlying hardware, even valid ones) is more important today | than it was back in the 80's. 
| rmind wrote: | Given the above, it is worth pointing out that the compilers | are also much, much better at verification and useful | warnings/errors. Back in the (very old) days, there was a | motivation to cut down PCC (Portable C Compiler) and give | birth to Lint as a separate application (because cutting the | compilation time was a greater priority). The current trends | are completely the opposite: compilers are getting | increasingly more powerful built-in static analyzers and | sanitizers by default. | | I think the lack of powerful tools in 1990s-2000s contributed | to the thought by some that C is 'difficult' in terms of | safety. However, things have moved on. | pjmlp wrote: | As additional info, | | > Although the first edition of K&R described most of the | rules that brought C's type structure to its present form, | many programs written in the older, more relaxed style | persisted, and so did compilers that tolerated it. To | encourage people to pay more attention to the official | language rules, to detect legal but suspicious | constructions, and to help find interface mismatches | undetectable with simple mechanisms for separate | compilation, Steve Johnson adapted his pcc compiler to | produce lint [Johnson 79b], which scanned a set of files | and remarked on dubious constructions. | | -- https://www.bell-labs.com/usr/dmr/www/chist.html | x0re4x wrote: | Take a look at "Modern C" :) | | https://gforge.inria.fr/frs/download.php/latestfile/5298/Mod... | | (Homepage: https://modernc.gforge.inria.fr/ ) | DougGwyn wrote: | The main editing needed to bring "old C" source code up to | snuff using a "modern C" compiler is to make sure that the | standard header-defined types are used. No more assuming that a | lot of things are, by default, int type. A second, related | editing pass is to make sure all functions are declared as | prototypes, no longer K&R style; K&R style is slated to be | deprecated by the next version of the Standard. 
(There are some | rare uses for non-prototyped functions, but evidently the | committee thinks there is more benefit in forcing prototypes.) | adrianmonk wrote: | I'm sort of in the same boat, although I didn't do as much C. | (And my interest in getting back into it is more hypothetical.) | | Aside from understanding how the language itself has changed, | maybe something else to put on the list is how to apply more | modern programming practices in C. | | In the 90s, I don't think I ever saw C code with unit tests. | Any kind of automated testing was pretty rare. I've become | convinced that testing in some form is a good thing. If I were | going back to C, I'd want to understand the best way to go | about that. | | People also didn't care (or know) much about security back | then. C has some obvious pitfalls (buffer overflows, etc.), and | it is pretty important to know good ways to minimize risk. I'd | want to understand best practices and techniques for this. | | Also, back then build tools were very simple, and some of them | were not my favorite things to use (Imake, I'm looking at you). | Build tools have advanced a lot since then. Features like | reliable, deterministic incremental builds exist now. Some | things could be less tedious to configure and maintain. There | are probably best practices and preferred choices in build | tools, but what exactly they are is another thing I'd want to | know. | | These are probably not questions that necessarily need an | answer from people whose expertise is the language itself, | though, so I guess this is a tangent. | pcr910303 wrote: | As there are a lot of C-masters lurking in this thread: | | How can one process unicode (UTF-8) properly in C? As a CJK | person, I wish there was a robust solution. Are there any | standardized ways or proposals? (Using wchar doesn't count.) | begriffs wrote: | Your best bet is probably to use a library like ICU. 
| | Here are examples of working with unicode in C: | https://begriffs.com/posts/2019-05-23-unicode-icu.html | pascal_cuoq wrote: | As a reviewer for Robert's upcoming C book "Effective C", I | thought that this aspect was better covered than in existing | manuals for learning C. | | However, the book only describes the available standard | functions, so even doing better than other manuals, everything | it has to say on this subject fits in one chapter and feels | underpowered. | Tronic2 wrote: | Ignore all character support in the standard library and handle | UTF-8 as opaque binary buffers. If you need complex string | algorithms, decode into UCS-4 (UTF-32). You'll find short | encoding and decoding functions on StackOverflow. For case- | insensitive comparisons and sorting, use an external library | that knows the latest Unicode standard. | barbegal wrote: | Except that not all binary data is valid UTF-8 so you also | need functions that check if a binary buffer is valid UTF-8. | Tronic2 wrote: | The decoding phase will do that, if needed. Also note that | in many cases you must process it as opaque binary, even | though it _should be_ valid UTF-8. This is in particular | with filenames on POSIX systems because otherwise you could | not access any files that happen to have invalid UTF-8 in | their names. | ori_b wrote: | https://bitbucket.org/knight666/utf8rewind/src/default/ | DougGwyn wrote: | UTF-8 encoding works "as is" based on byte strings (char[]). | The latest versions of the draft standard provide somewhat more | support. | | I recommend heading toward a future where only UTF-8 encoding | is used for multibyte characters and UCS-2 or similar for | wchar_t. There is no need to support several different | encodings. 
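The "short decoding functions" and the validity check mentioned above could look roughly like the following. This is a minimal sketch, not code from any of the linked libraries: the names utf8_decode and utf8_is_valid are hypothetical, and the decoder handles one code point at a time, rejecting truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from s[0..len).
   Returns the number of bytes consumed (1 to 4), or 0 if the bytes at s
   are truncated or are not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    unsigned char lead = s[0];
    size_t need;      /* total length of the sequence in bytes */
    uint32_t cp;      /* code point being assembled */
    uint32_t min;     /* smallest code point allowed for this length */

    if (lead < 0x80) {
        *out = lead;  /* plain ASCII */
        return 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2; cp = lead & 0x1F; min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3; cp = lead & 0x0F; min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4; cp = lead & 0x07; min = 0x10000;
    } else {
        return 0;     /* stray continuation byte or invalid lead byte */
    }

    if (len < need)
        return 0;     /* truncated sequence */

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0; /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min)
        return 0;     /* overlong encoding */
    if (cp > 0x10FFFF)
        return 0;     /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;     /* UTF-16 surrogate, not a valid scalar value */

    *out = cp;
    return need;
}

/* Hypothetical helper: returns 1 if the whole buffer is well-formed UTF-8. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    while (len > 0) {
        uint32_t cp;
        size_t n = utf8_decode(s, len, &cp);
        if (n == 0)
            return 0;
        s += n;
        len -= n;
    }
    return 1;
}
```

Anything beyond decoding and validation (normalization, case folding, collation, grapheme boundaries) depends on Unicode data tables, which is where a library such as ICU comes in.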
| [deleted] | rseacord wrote: | Aaron Ballman even got a u8 character prefix added to C2x: | | N2198 2018/01/02 Ballman, Adding the u8 character prefix | | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2198.pdf | ori_b wrote: | UCS-2 is a bad choice -- it fails to represent most unicode | characters. If you meant UTF-16, that's also a bad choice, | because UTF-16 is _also_ a variable width encoding, forcing | programmers to use some form of "extra-wide char". | | I'm of the opinion that wchar_t should become an alias for | char32_t. | DougGwyn wrote: | Yes, I meant the 31-bit code point value (more than 16, | anyway). It is the most useful width for doing things with | wide characters. | loeg wrote: | What sort of processing do you want to do? | oreally wrote: | Check this: http://utf8everywhere.org/ | | Basically store the text as char arrays, and convert them when | needed. Meanwhile, you could use this single file header: | https://github.com/RandyGaul/cute_headers/blob/master/cute_u... | loeg wrote: | Has Annex K been axed yet, and if not, why not? | rseacord wrote: | It has not. The C Committee has taken two votes on this, and in | each case, the committee has been equally divided. Without a | consensus to change the standard, the status quo wins. | | Sounds like you don't care for Annex K. What don't you like | about it? | loeg wrote: | I think my complaints are summed up nicely in some of your | coauthors' report: | | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm | | (1) runtime constraint handler callbacks are a terrible API. | | (2) The additional boilerplate doesn't buy us anything -- the | user can still specify the wrong size. | | (3) The Annex invents a feature out of whole cloth, rather | than standardizing existing practices. There are no | performant real-world implementations that anyone uses. | Microsoft's similar functionality is non-standard. 
| commandersaki wrote: | Will Effective C cover the strict aliasing rule and also why the | BSD sockets API seems to get away with it (e.g. (sockaddr *) | &sockaddr_in)? | DougGwyn wrote: | I thought we had fixed the BSD socket aliasing a long time ago? | AaronBallman wrote: | I don't think the book covers strict aliasing, at least not in | detail. | cyber1 wrote: | I think C has been an exceptionally good language for a long time, but | the world is changing and maybe C must evolve with new trends, | new research in programming languages. | | In my view C and C++ are now almost different languages with a | different philosophy of programming, different future, and | different language design. | | It will be sad if "modern" C++ almost replaces C. Many C++ | developers use "Orthodoxy C++" | https://gist.github.com/bkaradzic/2e39896bc7d8c34e042b, and this | shows that people will be more comfortable with C plus some | really useful features (namespaces, generics, etc), but not modern | C++. What I very often hear from my job colleagues and from many other | people who work with C++ is how terrible modern C++ is | (https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/, | https://www.youtube.com/watch?v=9-_TLTdLGtc) and how good it would be | to see and use a new C but with some extra features. Maybe it is time to | start thinking about evolving C, for example: - | Generics. Something like generics in Zig, Odin, Rust, etc. | - AST Macros. For example Rust or Lisp macros, etc. - | Lambda - Defer statement - Namespaces | | What do you think? | | https://ziglang.org/documentation/master/#Generic-Data-Struc... | | https://odin-lang.org/docs/overview/#parametric-polymorphism | | https://doc.rust-lang.org/rust-by-example/generics.html | eqvinox wrote: | What's the best way to deal with "transitive const-ness", i.e. | utility functions that operate on pointers and where the return | type should technically get const from the argument? 
| | (strchr is the most obvious, but in general most search/lookup | type functions are like this...) | | Add to clarify: the current prototype for strchr is | char *strchr(const char *s, int c); | | Which just drops the "const", so you might end up writing to | read-only memory without any warning. Ideally there'd be | something like: maybe_const_out char | *strchr(maybe_const_in char *s, int c); | | So the return value gets const from the input argument. Maybe | this can be done with _Generic? That kinda seems like the | "cannonball at sparrows" approach though :/ (Also you'd need to | change the official strchr() definition...) | DSMan195276 wrote: | The straight-forward approach is just two functions, one with | `const` and one without (You can make one of them `static | inline` around the other and do some casting to avoid | implementing the same thing twice). | | With that, selecting the correct function via `_Generic` should | be possible (`_Generic` is a bit fiddly, but matching on `const | char * ` and `char * ` should work just fine for this), and for | the most part this is actually an/the intended use case for | `_Generic` - it's basically the same as the type-generic math | functions, more or less. | msebor wrote: | The committee has reviewed a proposal (document N2360) for | const-correct string functions. | | But making function signatures const-correct solves only a | small part of the problem. A new API can only be used in new | code, and casts can remove the constness from pointers leaving | open the possibility that poorly written code will | inadvertently change the const object. An attempt to change a | global variable declared const will in all likelihood crash, | but changing a local const can cause much more subtle bugs. | | In my view, a more complete solution must include improving the | detection of these types of bugs in compilers and other static and | even dynamic analyzers even without requiring code changes. 
It's not any more difficult to do than detecting out of bounds | accesses. (In full generality it cannot be done just by relying | on const; some other annotation is necessary to specify that a | function that takes a const pointer doesn't cast the constness | away and modify the object regardless.) | DougGwyn wrote: | Many uses of strchr do write via a pointer derived from a non- | const declaration. When we introduced const qualifier it was | noted that they were actually declaring read-only access, not | unchangeability. The alternative was tried experimentally and | the consequent "const poisoning" got in the way. | coliveira wrote: | I believe C is doing the right thing. Const as immutability | is a kludge to force the language to operate at the level of | data structure/API design, something that it cannot do | properly. | moonchild wrote: | Have you ever used a high-level statically-typed language, | e.g. haskell? | pascal_cuoq wrote: | Speaking as someone who is not in the committee but has | observed trends since 2003 or so, I would say that solving this | problem is way beyond the scope of evolutions that will make it | in C2a or even the next one. | | There are plenty of programming languages that distinguish | strongly between mutable and immutable references, and that | have the parametric polymorphism to let functions that can use | both kinds return the same thing you passed to them, though. C | will simply just never be one of them. | pwdisswordfish2 wrote: | One proposal solved this by doing exactly that: | | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2068.pdf | _kst_ wrote: | strchr() is one of several C library functions that have this | issue. | | C++ solved this by overloading strchr(): | const char *strchr(const char *s, int c); char | *strchr(char *s, int c); | | C of course doesn't have overloading. | | One solution could have been to define two functions with | different names, perhaps "strchr" and "strcchr". 
The time to do | that would have been 1989, when the original ANSI C standard | was published. | | I suppose a future C standard could leave strchr() as it is | (necessary to avoid breaking existing code) and add two new | functions. | JoshTriplett wrote: | What are the chances of typeof, or statement expressions, finding | their way into the C standard? They're already widely | implemented. | msebor wrote: | Several of us discussed typeof and I'd expect a proposal for a | feature along these lines to be well received. (I recall | someone even saying they're working on one but that shouldn't | stop anyone from submitting one of their own.) | JoshTriplett wrote: | I'm glad to hear that. | | What about statement expressions? They're quite useful, and | supported by multiple independent compilers. | msebor wrote: | I'm not aware of recent proposals for those but we have | discussed ideas along those lines (closures: N2030, C++ | lambdas, Apple Blocks: N1451, and I think there was one | from Cilk). I think there was interest but not enough | support for the details and likely also concerns from | implementers. | rseacord wrote: | So what do people think about having a feature in the C language | akin to the defer statement in GoLang? | | The GoLang defer statement defers the execution of a function | until the surrounding function returns. The deferred call's | arguments are evaluated immediately, but the function call is not | executed until the surrounding function returns. It looks like an | interesting mechanism for cleaning up resources. | [deleted] | bokwoon wrote: | How about deferring until the surrounding block scope ends? In | Go you can get around the limitation of defer only executing at | the end of a function by wrapping any arbitrary section of code | inside an immediately executed anonymous function. But in C I'm | not sure that's possible so maybe one could declare a new block | scope instead to control when defer kicks in. 
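A block-scoped form of deferral, as suggested above, can already be approximated today with a compiler extension. The sketch below is not standard C and is not a proposal from the thread: it relies on the `cleanup` variable attribute provided by GCC and Clang, which calls a given function with a pointer to the variable when that variable goes out of scope. The names release_buffer, sum_scratch, and live_buffers are hypothetical, and the counter exists only to make the cleanup observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter, present only so the cleanup can be observed. */
static int live_buffers = 0;

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void release_buffer(char **p)
{
    if (*p != NULL) {
        free(*p);
        live_buffers--;
    }
}

/* Fills a scratch buffer of n bytes with 1s and returns their sum,
   or -1 if the allocation fails. */
int sum_scratch(size_t n)
{
    assert(n > 0);
    int total = 0;
    {
        /* GCC/Clang extension: release_buffer(&buf) runs when buf
           leaves this block, whichever way the block is exited. */
        char *buf __attribute__((cleanup(release_buffer))) = malloc(n);
        if (buf == NULL)
            return -1;          /* the handler still runs; it sees NULL */
        live_buffers++;

        memset(buf, 1, n);
        for (size_t i = 0; i < n; i++)
            total += buf[i];
    }                           /* buf is released here */
    assert(live_buffers == 0);
    return total;
}
```

Unlike Go's defer, the handler here is tied to a variable's scope rather than to the enclosing function, and it is registered at the declaration, so the release point is fixed at the closing brace of the block.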
| NickDunn wrote: | It could be very useful for cleaning resources. I've never used | GoLang, but can see how that could be useful in various | circumstances. As we're talking about C, I suspect a feature | like that, with the potential to make things safer, would also | enable the unwary to shoot themselves in the foot more easily. | majke wrote: | I personally don't like golang's defer. For me it obscures the | flow of the program. For example when I acquire a lock, I like | to see where exactly it's released. | | For me "defer" only makes sense in the context of exceptions, | basically as an equivalent to "finally". This is a slippery | slope though, since golang's exceptions are, for a reason, | rudimentary. | smasher164 wrote: | I would love to see defer in the language. It helps keep | cleanup code close to the resource that is acquired. | | Would the proposed defer statement apply to loops as well? How | would one implement such defers without dynamic allocation? | pascal_cuoq wrote: | It sounds like the __attribute__((cleanup(...))) already | offered by GCC is similar to this. I probably won't have time | to investigate the differences while the AMA is ongoing though. | LHopital wrote: | I'll pass! Thanks though. | dang wrote: | Please don't do this here. | [deleted] | jboschpons wrote: | Hola! | gnachman wrote: | Since 1999, a lot of undefined behavior has been added to the | language to improve compilers' ability to optimize. For example, | pointer aliasing rules. How have you measured the benefit? ___________________________________________________________________ (page generated 2020-04-14 23:00 UTC)