[HN Gopher] GCC 13 Supports New C2x Features, Including Nullptr,...
___________________________________________________________________

GCC 13 Supports New C2x Features, Including Nullptr, Enhanced Enumerations

Author : mikece
Score  : 86 points
Date   : 2023-05-15 15:06 UTC (7 hours ago)

(HTM) web link (www.infoq.com)
(TXT) w3m dump (www.infoq.com)

| zabzonk wrote:
| nullptr has existed since c++11
|
| https://en.cppreference.com/w/cpp/language/nullptr
|
| sorry - thought the post was re c++ - my bad
| mananaysiempre wrote:
| In C++. C only copied it in C23 (not yet ratified).
| zabzonk wrote:
| sorry, misread the post
| whatever1 wrote:
| I wonder whether LLMs could help with smarter code optimizations,
| especially since they can be context aware.
| Gibbon1 wrote:
| We should have a ten-year moratorium on optimizations and force
| the compiler maintainers to work on other things.
| cryptonector wrote:
| > This proposal also recommends adoption of Unicode normalization
| form C (NFC) for identifiers to ensure that when compared,
| identifiers intended to be the same will compare as equal. Legacy
| encodings are generally naturally in NFC when converted to
| Unicode. Most tools will, by default, produce NFC text.
|
| Er, a much better approach is to allow unnormalized Unicode in
| source code and use form-insensitive matching of symbol names so
| that all forms of a symbol are equivalent. This can be done by
| normalizing during the parse, or by implementing form-insensitive
| string comparison and hashing functions that normalize glyph by
| glyph as needed -- the latter can be very fast for all-ASCII and
| mostly-ASCII symbols!
|
| The reason this is a better way is that there are too many places
| that don't produce NFC. For example, HFS+ uses NFD, so if you
| cut-n-paste a file name from HFS+ into other contexts, you'll be
| pasting NFD unless the cut-n-paste system normalizes to NFC.
| Also, while it's true that input modes typically produce NFC,
| it's more that they produce NFC for a small subset of Unicode,
| not that they will normalize other forms seen on input. Using
| form-insensitive string comparison/hashing/matching yields a
| better user experience at not that much implementation cost:
| you're gonna need a Unicode library, and that library will need
| to have normalization support, so you can implement
| form-insensitivity.
| wahern wrote:
| > Er, a much better approach is to allow unnormalized Unicode
| in source code and use form-insensitive matching of symbol
| names so that all forms of a symbol are equivalent.
|
| Linkers will often be blissfully unaware of Unicode or any form
| of localization. This was the impetus for UTF-8, so that the
| bulk of software which is 8-bit clean or which operates on
| opaque, NUL-terminated strings can continue working as-is. This
| can't be changed without breaking backwards ABI compatibility;
| therefore, it's very unlikely to change.
|
| There are countless half-measures that could be taken, but few
| if any are suitable for standardization. If the history of
| software localization is any guide, in the face of strict,
| forward-looking specifications, various vendors and ecosystems
| will likely go their own way, with the one sure thing being a
| failure to fully adopt or properly implement the specification.
| cryptonector wrote:
| Yes, the compiler should normalize symbols before writing
| object files, no doubt. I'm talking about the inputs, though
| (the source files), which should not have to be normalized.
| quesomaster9000 wrote:
| Nobody else is admiring typed enumerations?
|
| Particularly when using structs, this removes a lot of ambiguity:
| no more indirection to find out the underlying type of the enum
| (or encoding it in the name, Hungarian-style).
|
|     enum D : uint8_t { A = 0, B = 1, C = 2 };
|     typedef struct { enum D f; } __attribute__((packed)) E;
|     assert(sizeof(E) == 1);
|
| Code like this could make grokking protocol declarations with
| enums less onerous, requiring one less level of indirection.
| maccard wrote:
| I'm a C++ programmer and finding it hard to be excited about
| things we added to the language 12 years ago.
| quesomaster9000 wrote:
| As a D programmer, why haven't you caught up yet?
| maccard wrote:
| I write a reasonable amount of Kotlin these days and it's
| night and day.
| ishvanl wrote:
| As a Rust programmer... etc etc.
| loeg wrote:
| As a sneering C++ programmer, why are you even reading /
| commenting on a new C standard? This is basically an "if you
| don't have anything nice to say, don't say it" situation.
| tom_ wrote:
| The same goes for C programmers with a chip on their
| shoulder!
|
| There is a downvote button.
| david2ndaccount wrote:
| Clang has had it as an extension for a long time.
| jwilk wrote:
| Related:
|
| https://news.ycombinator.com/item?id=35813821 ("New C features in
| GCC 13", 11 days ago, >260 comments)
| twic wrote:
| Who is the audience for new features in C? And who is driving
| stuff through the standardisation process? Is this stuff likely
| to make its way through to embedded toolchains? Or is this for
| people who are maintaining existing codebases?
| nitrix wrote:
| Changes to the Standard usually happen as a result of defect
| reports (confusing details that implementation writers want
| clarity on) or vast enough general adoption (unifying how
| implementations were differently achieving the same thing).
|
| You can read #13 of the Charter:
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm
|
| As for the audience, it's all the C developers, the open-source
| and commercial compiler implementations, vendors of libraries,
| tooling, services, learning material and everything else built
| in C, which is just innumerable.
|
| Each Standard version released supersedes and obsoletes the
| previous versions. Intentionally, the versions are meant to be
| as backwards compatible as possible so that one can mix and
| match C89/C99/C11 codebases with minimum effort.
|
| C has gained only a handful of features in the last 40 years,
| compared to the great many things that have been improved w.r.t.
| undefined/implementation-specific/unspecified behaviors, or
| removed to keep up with modern times (e.g. trigraphs, ones'
| complement and sign-magnitude integer representations, etc.).
|
| I'd say: (1) upgrading is not the spooky thing people make it
| out to be. Go, Rust, they all move much faster than this and
| have very ambitious big design ideas on their mind. (2) It's
| necessary to take good care of C as it, and the things built in
| it, will realistically outlive many of us.
| cozzyd wrote:
| There is plenty of new C code.
| pantalaimon wrote:
| Most embedded toolchains are ARM or RISC-V GCC these days; they
| get all the features.
| hgs3 wrote:
| > Who is the audience for new features in C?
|
| Folks like myself who use C to write system software.
|
| > Is this stuff likely to make its way through to embedded
| toolchains?
|
| Embedded toolchains based on GCC or Clang will presumably see
| these features one day.
| quesomaster9000 wrote:
| The early adopters are usually transpilers (or code
| generators), which can quickly take advantage of new features
| without the effort of rewriting an entire codebase.
|
| It's the same way that Rust used underlying `const` attributes
| in LLVM (and found all the weird edge cases), and that Nim, like
| many other Lisp or object-ish languages, used C as an
| intermediate.
| nrclark wrote:
| Yes, I'd expect they will. Most embedded toolchains these days
| are built around GCC. So as GCC grows new features, embedded
| toolchains will get them too.
| cjensen wrote:
| nullptr has been such a Godsend for C++. Good to see it coming to
| C.
|
| If you ever see the macro NULL in code, be afraid. There are two
| valid ways of defining the macro, and that causes weird issues
| when porting code. For example, in the statement
| printf("%p %s\n", NULL, "Hello world!"), one of the definitions
| leads to NULL being interpreted as the null pointer, and the other
| leads to NULL being interpreted as an integer. The latter may crash
| if integer and pointer are different sizes.
|
| It also causes problems with C++ overloading if one overload
| takes a pointer and another takes an integer.
| LordShredda wrote:
| But does C need a nullptr keyword? If you're programming in C,
| you usually define 0 as an invalid value, or a null value. C
| doesn't have the insane type system C++ has and doesn't have a
| very strong need to make a distinction between a pointer or an
| integer, since they're all in the end numbers.
|
| The printf example you gave is an example of garbage in,
| garbage out. If NULL is a macro not defined as a pointer sized
| integer, then you're at fault here.
| mananaysiempre wrote:
| > If NULL is a macro not defined as a pointer sized integer,
| then you're at fault here.
|
| If it was you who wrote stdlib.h, sure; otherwise, if you're
| on a platform where NULL is traditionally defined as 0 and
| not (void *)0, you're stuck. A conformant implementation is
| free to use either definition.
|
| If you want to language-lawyer more heavily, C does not
| require there to be pointer-sized integers (uintptr_t is
| optional), does not require that all zero bytes represent a
| null pointer in memory (unlike for integers), does not
| require that the implementation choose to store an integer
| with value zero as all zero bytes (there may be other valid
| representations), and in any case does not require an
| implementation to do anything reasonable at all if the caller
| passes an integer but a vararg callee looks for a pointer
| (think separate integer and pointer registers).
|
| [I'm not entirely sure if (void *)(void *)0 is a null pointer
| constant (though it's certainly an expression that evaluates
| to a null pointer)--does it count as a zero-valued integer
| constant expression cast to a pointer to void? So you might
| not even be able to use (void *)NULL as a hedge against bad
| platform headers.]
| kazinator wrote:
| You're allowed to do:
|
|     #undef NULL
|     #define NULL ((void *) 0)
|
| Just don't do it prior to the inclusion of any standard
| header (C or POSIX).
| mananaysiempre wrote:
| I don't think you are? Redefining a reserved identifier
| is UB per ISO C (any version) 7.1.3p2, and per 7.1.3p1,
|
| > Each macro name in [the standard library] is reserved
| for use as specified if any of its associated headers is
| included; unless [you're #undef'ing a function also
| provided as a macro].
|
| The general idea seems to be that standard headers are
| allowed to use macros they define, _even in other macros
| they define_, and because macro names are late-bound
| (ugh), even if the user only redefines the name
| afterwards, every macro that uses it will then be
| affected.
|
| As a silly example, a valid part of stdlib.h could be
|
|     #define NULL 0
|     #define EXIT_FAILURE (NULL)
|     #define EXIT_SUCCESS (EXIT_FAILURE+1)
|
| and now after your redefinitions EXIT_SUCCESS becomes a
| constraint violation.
|
| (For an implementor to actually do this would of course
| be dumb, but you did say "allowed", and that's what the
| standard says here.)
|
| Or did I misunderstand "use as specified"?
| kevin_thibedeau wrote:
| 0 never should have been overloaded in C to refer to the NULL
| pointer. With pointer assignment and comparison it transforms
| to the platform's encoding for NULL, which isn't necessarily
| all zeros. No other literal has this sort of magic.
| kazinator wrote:
| The C language existed before there was a preprocessor
| which made it possible to define NULL.
| kevin_thibedeau wrote:
| This has nothing to do with the preprocessor.
| The concept of NULL existed before the macro was standardized.
| Literal zeros were the way to refer to it, which was a
| design mistake.
| WalterBright wrote:
| This works in Standard C:
|
|     int *p = 3;
| Gibbon1 wrote:
| I like to point out that on AVR micros, reading and writing
| address 0 is legit.
|
| On the ARM Cortex I think address 0 is the initial stack
| pointer value.
|
| My opinion is that NULL being something special in the
| language is mathy CS academics trying to turn C into a
| mathy abstract language. Which it ain't.
| jcelerier wrote:
| C has officially been a mathy abstract language since 1989; the
| compilers just took some time to catch up.
|
| > 2.1.2.3 Program execution
|
| > The semantic descriptions in this Standard describe the
| behavior of an abstract machine [...]
| kevin_thibedeau wrote:
| The address is still 3, which has valid applications. C is
| permissive enough to run on platforms that don't use
| address 0 for NULL. With pointer operations the compiler
| will change the encoding from 0 to that platform's NULL
| address.
|
|     int *p = 0;
|     intptr_t i = (intptr_t)p;
|     if (i == 0) ... // Isn't always true
| LegionMammal978 wrote:
| That's a compiler extension. In C17, 6.5.16.1 (Simple
| assignment) implies that the RHS of an assignment to a
| pointer must either have pointer type or be a null
| pointer constant (i.e., an integer constant equal to 0,
| or such a constant cast to pointer type), and 6.7.9
| (Initialization) states that "the same type constraints
| and conversions as for simple assignment apply" to
| expressions used as initializers.
| bluGill wrote:
| Assuming 0 is an invalid value is not always correct. 0 is a
| perfectly valid pointer, and making it impossible to refer to
| that location is bad.
| Of course, if you are not writing an OS
| or embedded system, you won't ever have a pointer value of 0
| anyway, as the OS can put things elsewhere with no problem (if
| you are, you need to see your CPU docs: on some CPUs 0 is
| invalid, on some it is not).
| kazinator wrote:
| Umm, no. 0 is the null pointer constant, same as nullptr.
| It is not a location, but an abstraction. If a platform's
| null pointer happens to be the address 0xFFFFFFFF, then 0
| will produce that.
|
| There is no difference between
|
|     char *p = nullptr;
|
| and
|
|     char *q = 0;
|
| other than the variable name; the two have to compare
| equal: (p == q).
|
| What's wrong with 0 is that when it's not in an expression
| where it's being converted to pointer type, it's just an
| integer.
| bluGill wrote:
| The problem is if 0 is a valid pointer and I write
|
|     volatile int *x = 0;
|     *x = 0x1234;
|
| Did I just dereference the null pointer or make a valid
| write to that memory location? There is no way to know
| for sure; you can only apply heuristics to make a guess.
|
| Of course if the lines are that closely spaced you can
| guess, but in real code they can be in different
| translation units.
| roqi wrote:
| > But does C need a nullptr keyword?
|
| Yes, it does.
|
| > If you're programming in C, you usually define 0 as an
| invalid value, or a null value.
|
| That was also the usual pattern in C++ when there was no
| alternative. Once nullptr was introduced in C++, NULL or 0
| quickly became a code smell.
|
| > C doesn't have the insane type system C++ has and doesn't
| have a very strong need to make a distinction between a
| pointer or an integer, since they're all in the end numbers.
|
| C++'s type system is far from insane. It's actually one of
| its killer features.
|
| You're both entirely oblivious to the need not to conflate
| pointers with integers and failing to present any case in
| favour of the legacy and broken use of NULL, and in the
| process failing to address the whole family of known error
| patterns involving it.
|
| > The printf example you gave is an example of garbage in,
| garbage out. If NULL is a macro not defined as a pointer
| sized integer, then you're at fault here.
|
| Again, you seem to be completely oblivious to the problem
| domain. NULL is not a macro as far as C or C++ compilers are
| concerned. NULL is a magic constant that's resolved at
| preprocessing time. Replacing NULL with nullptr means a magic
| constant is replaced by a concrete type, and thus a whole
| family of errors can be avoided with compile-time checks.
| Claiming that the developers who wrote the bugs are at fault
| for inadvertently adding bugs makes no sense, because it does
| not solve any problem at all, and instead is just cynical
| finger-pointing. I take compile-time checks over unhelpful
| finger-pointing all day every day.
| colonwqbang wrote:
| NULL is a macro.
|
| The original mistake by the standards committee was
| allowing implicit conversions from integer to pointer, i.e.
| allowing NULL to be defined as simply 0.
|
| If NULL had been defined always as ((void *) 0), then I
| don't see that we would have had a problem.
|
| But that's all history now, and in this situation I can see
| that adding nullptr becomes a reasonable way out.
|
| It's ironic, though, that the fix for the different ways to
| write null is to add yet another way.
| roqi wrote:
| > NULL is a macro.
|
| You're missing the whole point.
|
| As per the C standard, NULL is an implementation-defined
| null pointer constant.
|
| Macros are resolved in the preprocessing step. The
| compiler does not know what a macro is. What the compiler
| knows is whatever the preprocessor passes off in place of
| the macro.
| This means the compiler only sees a constant,
| and has no way to tell what that constant means.
|
| If instead of passing random pointer constants you pass
| an actual type, now the compiler can tell more things.
|
| > If NULL had been (...)
|
| Irrelevant. The whole point is that it wasn't: the
| committee looked at the problem and determined that
| using a dedicated type is safer, more powerful, and more
| elegant than passing magic numbers around.
| LegionMammal978 wrote:
| Well, the fault depends on who "you" are: the NULL macro
| generally comes from one's libc, and allegedly some libc
| maintainers have been very obstinately against changing their
| NULL macros to have pointer type.
| throwway120385 wrote:
| Aren't there platforms where pointers have additional type or
| space information encoded that is orthogonal to the numeric
| address? It's only by convention that NULL == 0 because on
| platforms like Intel & ARM you would typically not use the
| first page. But that's only a convention, and you could just
| as easily put a null page at the top of your address space,
| especially in systems with an MMU where mappings can be
| added, removed, or remapped as needed.
| mananaysiempre wrote:
| > It's only by convention that NULL == 0 [...] and you
| could just as easily put a null page at the top of your
| address space [...].
|
| Technically NULL == 0 always, because the standard
| special-cases zero-valued integer constant expressions;
| (uintptr_t)NULL == 0 or
| NULL == *(void **)calloc(1, sizeof(void *)) is another matter :)
|
| Language lawyering aside, a non-all-zeroes representation
| of NULL will probably blow up most C programs [e.g.
| static-storage-duration initialization is now not the same as
| calloc or memset(,0,) and is even type-specific]. Like
| CHAR_BIT, that's a joint that technically exists but has
| been rusted for decades (pun not intended).
| kazinator wrote:
| There is no problem with static initializations with a
| null pointer that is not all zero bits, or a floating-point
| 0.0 that is not all zero bits.
|
| Those values just cannot participate in the "BSS" trick,
| whereby everything that is zero-initialized is put into a
| special section that doesn't actually exist in the
| program image, and is only provided on startup.
|
| Those values would go into the initialized data section.
|
| The problem with 0.0 or null pointers not being all zero
| bits is all the code that uses calloc or memset to zero.
|
| If this is on some specialized platform (e.g. a DSP chip),
| it might not matter that vast quantities of C code are
| not portable.
|
| In general, compiler (and to a great extent instruction
| set architecture!) designers are quite hamstrung by the
| expectations of C programmers and programs; that has been
| the situation for some thirty years now.
|
| Today, you could not successfully introduce a system in
| which pointers to bytes (void *, char *) have a different
| representation from other pointers (let alone different
| size, lord forbid).
| pjmlp wrote:
| If we ignore ongoing efforts on hardware memory
| tagging since SPARC ADI.
| kazinator wrote:
| * * *
| LegionMammal978 wrote:
| In C, the integer 0 is explicitly defined to convert to a
| null pointer for all assignments, casts, comparisons, etc.,
| regardless of what the pointer's "actual" value is. The
| only time where you can see that a null pointer doesn't
| have numeric value 0 is when you manipulate its object
| representation with memset, memcpy, etc. The compiler is
| also at liberty to return whatever it wants when you
| convert a null pointer to an integer, except that
| converting it back must produce a null pointer (if it's at
| least as wide as intptr_t).
| kazinator wrote:
| The problems are that:
|
| - NULL is idiomatic: using NULL is entrenched in C programming
| and it is not going away.
|
| - In spite of nullptr existing now, NULL is _still_ (quite
| stupidly) not required to just expand to nullptr, but to an
| implementation-defined null pointer constant, rather than
| #define NULL nullptr. (According to the N2596 draft.)
|
| - They had over 30 years to tighten the requirements on how
| NULL can be defined; what's the matter? C99 could already have
| required NULL to be ((void *) X) where X is an integer-typed
| constant expression evaluating to zero.
|
| I'm not going to start using nullptr. It's not idiomatic C. I'm
| going to hold out hope that NULL will be fixed so that it
| expands to nullptr.
|
| --
|
| Also, it's possible for a compiler to diagnose when a constant,
| zero-valued expression is used as the argument of a variadic
| function. The diagnostic can be confined to cases when such a
| constant expression is the result of macro expansion:
|
|     printf_like_function("fmt ...", ... 0, ...);         // OK
|     #define FOO 0
|     printf_like_function("fmt ...", ... FOO, ...);       // compiler diagnostic
|     printf_like_function("fmt ...", ... (int) FOO, ...); // OK
| cogman10 wrote:
| > In spite of nullptr existing now, NULL is still (quite
| stupidly) not required to just expand to nullptr, but to an
| implementation-defined null pointer constant, rather than
| #define NULL nullptr. (According to the N2596 draft.)
|
| This is so silly. I sort of get why not (can't break the dork
| that decided to do
|
|     int i = NULL;
|     i++;
|
| )
|
| But, at the same time... I almost feel like this is a "you
| are being a dork, go fix your code." moment. This isn't the
| sort of break where someone would see it and go "Oh yeah,
| assuming NULL is anything other than nullptr is dumb!"
| kazinator wrote:
| > _can't break the dork that decided to_
|
| Why not? We've broken the dork who used undeclared
| functions, void main, gets ...
|
| (It's the same funking dork anyway. You know who you are,
| I'm looking at you!)
|
| Note that
|
|     int x = ((void *) 0);
|
| will actually work in GCC and get you a zero into x, just
| with a conversion warning. The dork is unaffected; their
| code works and they don't read warnings.
| bigbillheck wrote:
| > NULL is idiomatic
|
| It is today, but idioms are a human concept, and who knows how
| things will be in 2033?
| kllrnohj wrote:
| Well, hopefully at some point we'll stop writing C at all.
| pjmlp wrote:
| Once upon a time K&R C function declarations were idiomatic;
| in C23 they are out.
| ori_b wrote:
| > _If you ever see the macro NULL in code, be afraid. There are
| two valid ways of defining the macro_
|
| Not on a Posix system, where the only valid definition of it is
| `(void*)0`. C could have adopted this definition.
|
| nullptr is needed in C++ because `0` is the only definition of
| `NULL` that works with the type system, due to the lack of
| implicit `void*` conversions.
|
| C doesn't have this problem.
|
| Adopting the Posix definition of NULL in the standard would
| have been sufficient -- and unlike `nullptr`, would have solved
| bugs in existing programs.
| blackpill0w wrote:
| > C doesn't have this problem
|
| I don't think a strong type system is a `problem`; implicit
| conversions can lead to so many annoying and hard-to-find
| bugs.
| ori_b wrote:
| That's a fine general sentiment. However, in this context
| it's a problem if you want to assign NULL to a pointer
| without a cast, which is why C++ added the magically
| convertible nullptr in addition to the magically
| convertible `0` constant.
|
|     char *x = 0;        // ok in C and C++
|     char *y = (void*)0; // ok in C, error in C++
|     char *z = nullptr;  // ok in C++
|
| therefore:
|
|     #define NULL ((void*)0) // Required by Posix C, invalid C++
|     #define NULL 0          // Pre-nullptr, the only valid C++ definition
|
| C++ can't define NULL the safe way that Posix C does.
|
| I don't understand why it's more acceptable to allow magic
| `0` conversions than magic `(void*)0` conversions, given
| that the latter is far less likely to happen by accident --
| but here we are.
| rightbyte wrote:
| > I don't understand why it's more acceptable to allow
| magic `0` conversions than magic `(void*)0` conversions
|
| In the end you don't have to choose between '0' and
| 'nullptr' anyways.
|
|     char *x = (decltype(nullptr)) 0;
| kzrdude wrote:
| Case in point, integer arithmetic in C. Reasoning about
| types there is just tiring.
| Dylan16807 wrote:
| They didn't say the type system is a problem; they said it
| caused a problem.
| josefx wrote:
| > The latter may crash if integer and pointer are different
| sizes.
|
| Apparently some compilers provided a special __null extension
| to handle that case before nullptr was a thing.
| kazinator wrote:
| Since NULL expands to an implementation-defined null pointer
| constant, it is valid for those implementations to go as far
| as #define NULL __null.
| anonymousDan wrote:
| Are there any good resources on writing compiler
| optimization/instrumentation passes in GCC (as opposed to LLVM)?
___________________________________________________________________
(page generated 2023-05-15 23:00 UTC)