[HN Gopher] Plain C API design, the real world Kobayashi Maru test
___________________________________________________________________
Author : jmillikin
Score  : 96 points
Date   : 2023-04-16 15:17 UTC (7 hours ago)

(HTM) web link (nibblestew.blogspot.com)

| nanofortnight wrote:
| This seems perfect for _Generic.
|
| https://en.cppreference.com/w/c/language/generic
| jesse__ wrote:
| I've done a few APIs similar in spirit to this one (and one
| similar in functionality, too), and I've found using a blend of a
| few of the mentioned methods to be pretty effective.
|
| I start with the basics and implement everything as explicitly
| (read: type safe) as possible:
|
|     struct bob_params {...};
|     struct line_params {...};
|
|     void pdf_page_cmd_bob(struct bob_params *params) { ... }
|     void pdf_page_cmd_line(struct line_params *params) { ... }
|
| Then, when that works and I want to add patterns, which are a
| superset, I'd add something that is literally a superset of those
| functions.
|
|     enum pattern_type {
|       patterntype_Bob,
|       patterntype_Line,
|     };
|
|     struct pattern {
|       enum pattern_type type;
|       union {
|         struct bob_params Bob;
|         struct line_params Line;
|       };
|     };
|
|     void pdf_pattern_cmd(struct pattern *pattern) {
|       switch (pattern->type) {
|         case patterntype_Bob:
|           /* do bob drawing a bunch of times, or whatever */
|           break;
|       }
|     }
|
|     // The pattern struct has a lot of different names (sum type,
|     // discriminated union, algebraic datatype, tagged union ..
|     // probably more), but the idea is the 'type' tag tells you
|     // which one of the union values to use.
|
| I've tried everything and, as far as I can tell, this gets you
| the best of all worlds. It's pretty much as type-safe as things
| get in C. It's extremely flexible; you just mix the fundamentals
| together as you like when adding higher-order functions. It's
| fast; the compiler can see exactly what's going on, so you pay
| minimal runtime cost.
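The tagged-union dispatch described above can be sketched as compilable C. This is only an illustration under assumptions: the fields of `bob_params` and `line_params`, the `int` return values, and the union member name `u` are invented here, because the comment elides them.

```c
#include <assert.h>

/* Hypothetical parameter structs; the original comment elides their fields. */
struct bob_params  { int count; };
struct line_params { int x, y; };

enum pattern_type { patterntype_Bob, patterntype_Line };

/* Tagged union: `type` says which member of `u` is currently valid. */
struct pattern {
    enum pattern_type type;
    union {
        struct bob_params  Bob;
        struct line_params Line;
    } u;
};

/* Explicit, type-safe entry points. The return values are assumptions,
   chosen only so that the dispatch can be observed from the outside. */
static int pdf_page_cmd_bob(const struct bob_params *p)   { return p->count; }
static int pdf_page_cmd_line(const struct line_params *p) { return p->x + p->y; }

/* The "superset" function switches on the tag and forwards the call. */
static int pdf_pattern_cmd(const struct pattern *pat)
{
    switch (pat->type) {
    case patterntype_Bob:  return pdf_page_cmd_bob(&pat->u.Bob);
    case patterntype_Line: return pdf_page_cmd_line(&pat->u.Line);
    }
    assert(0 && "unknown pattern tag");
    return -1;
}
```

A caller fills in the tag and the matching member, for example `struct pattern p = { .type = patterntype_Line, .u.Line = { 10, 20 } };`, and passes `&p` to `pdf_pattern_cmd`. Nothing stops a caller from setting a tag that does not match the member it filled in, which is the usual caveat with this pattern.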
|
| Yes, it's a bunch of typing to get the functions all spelled out,
| but it's really not that bad considering how easy/obvious the
| code actually is.
|
| I use this pattern so much I actually wrote a little
| metaprogramming language that is capable of generating a lot of
| the boilerplate for you. Link in my bio, if anyone's interested
| in looking at it.
| twisteriffic wrote:
| That style speaks to me. Thanks!
| cpeterso wrote:
| If you're designing a stable API for external users, you might
| want to lock down your API even more by forward declaring the
| struct types as opaque in the public header file and only
| defining the struct members in a private library header file.
| This prevents users from messing with your library's private
| state and allows you to change implementation details later
| without breaking binary compatibility. The disadvantage is that
| users can't control how the structs are allocated or embed them
| in their own structs.
|
|     /* foo.h */
|     struct foo_object;
|     struct foo_object* foo_create(int flags, ...);
|
|     /* foo.c */
|     #include "foo.h"
|     struct foo_object { int flags, ... };
| lelanthran wrote:
| Funnily enough that's the most common pattern I see in my
| personal C code (for example, my little lisp interpreter -
| https://github.com/lelanthran/csl/blob/master/src/parser/ato...)
| but I still recommend using the `_Generic` keyword in C.
|
| For the next time I do a pattern like this, I'll be using the
| `_Generic` keyword to make the dispatch a compile-time match,
| not a runtime check.
| jesse__ wrote:
| Can you elaborate slightly on how you're planning on using
| _Generic to turn runtime dispatches into compile time ones?
| I'm not quite putting together how you can do that.
| lelanthran wrote:
| The example from the wikipedia page for C11
| (https://en.wikipedia.org/wiki/C11_(C_standard_revision)#Chan...)
| is compile-time determination of the function to call:
|
|     #define cbrt(x) _Generic((x), long double: cbrtl, \
|                                   default: cbrt,      \
|                                   float: cbrtf)(x)
|
| In code you'll write `cbrt(foo)` and the correct function
| will be called for the type of foo. As I understand it, the
| `_Generic` selection is performed at compile time.
| [deleted]
| JonChesterfield wrote:
| Recommend `__attribute__((overloadable))` instead of
| _Generic. The former opts into C++ style name mangling, which
| is a bit of a mess in C, but interacts well with `static` as
| forwarding wrappers in a header. The latter is a ridiculous
| mess invented by the C committee.
| KerrAvon wrote:
| I think you may be disappointed. Every single time I've tried
| to use `_Generic`, I've found that it's more trouble than
| it's worth. They seem to have made it useful for a very
| narrow case -- tgmath.h -- and not bothered to make it
| general enough to be applicable to a wide variety of things
| that you might like to use it with.
| gabereiser wrote:
| can we just agree that PDF sucks? Sometimes the API must
| operate on a spec that is unruly to begin with. I like this
| approach. It's about the best you can get when dealing with
| this without completely reinventing the storage format to
| support the API.
| jesse__ wrote:
| yea PDF has got to be one of the worst specs ever
| created. It's amazing PDF viewers work at all.
| mintplant wrote:
| Composition?
|
|     typedef struct { /* ... */ } context;
|     typedef struct { context ctx; /* ... */ } foo_context;
|     typedef struct { context ctx; /* ... */ } bar_context;
|
|     void some_general_method(context* ctx, int a, int b);
|     void some_foo_specific_method(foo_context* foo_ctx, int c, int d);
|
|     foo_context* foo_ctx;
|     some_foo_specific_method(foo_ctx, 0, 0);
|     some_general_method(&foo_ctx->ctx, 1, 1);
|
| For checked downcasts, you could include an enum type tag inside
| `context` and have something like `foo_context*
| as_foo_context(context* ctx)` for downcasts, which, if `context`
| is at the beginning of `foo_context`, could just check the tag
| and cast the `ctx` pointer to `foo_context*` (or do a little
| pointer arithmetic if `context` is somewhere else in the
| containing struct). Return `NULL` (or assert) if the tag doesn't
| match.
| uecker wrote:
| I use this technique a lot and it is very powerful.
| dromtrund wrote:
| +1, this approach will also highlight cases where you're trying
| to generalize something that isn't as generic as you thought.
|
| I'd argue that the need for downcasting in a method working on
| the inner context would also be a code smell, and that you
| might want to reconsider the context split. In some cases,
| there's no way around it (like async events), but it might be
| more appropriate to pass additional context or a callback
| instead, to avoid a circular dependency.
| joeatwork wrote:
| I know it isn't really appropriate to the spirit of the article,
| but it seems like in this case there is a right answer, and it's
| "Fully separate object types" - it's explicit, prevents errors,
| is complete, and while it requires a lot of typing to implement
| it doesn't require much complexity.
| ablob wrote:
| In which of the listed requirements of the article is this
| approach better, and why?
| cozzyd wrote:
| Yeah I agree. Macros can be used to avoid some of the typing in
| defining the interface when the implementation really is
| common, but unfortunately that makes it harder to document the
| generated interface.
| I wish doxygen had some way of supporting
| comments for macro-generated interfaces.
|
| In defining the implementation, it's easy enough to do
| something like this (if the implementation really is common):
|
|     static int pdf_ll_foo_impl(pdf_ll_ctx_t c, pdf_ll_type_t t, ...) {
|       // real implementation goes here, perhaps switching on t
|     }
|
|     int pdf_page_foo(pdf_page_ctx_t c, ..) {
|       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PAGE_TYPE, ..);
|     }
|
|     int pdf_pattern_foo(pdf_pattern_ctx_t c, ..) {
|       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PATTERN_TYPE, ..);
|     }
|
| which you can also use macros to help generate if you want to.
| CyberDildonics wrote:
| This is a few paragraphs about the C API of cairo, postscript and
| pdf and it doesn't seem all that insightful.
|
| I wish blog posts wouldn't use some random movie reference as
| clickbait when they could explain what their post is about
| instead.
| fpoling wrote:
| What is not considered in the article is to replace pointers to
| pages and patterns with handles that are tagged indexes into
| internal arrays.
|
| The big plus is that the index tag makes it possible to detect
| use-after-free and other memory safety bugs in the vast majority
| of cases, greatly improving memory safety.
|
| If one then exposes this handle as a generic typedef over some
| integer type, the API will not be type-safe, but a type
| mismatch will be detected very early.
|
| Another option is to wrap the handle into separate structs for
| type safety. Then the caller will need to convert from the
| specific handle for page, pattern etc. to the base handle when
| calling common draw operations. But that will be a simple
| operation like page.base or pattern.base, not
| MYLIB_PATTERN_TO_BASE(pattern). The drawback is that the caller
| will be able to construct wrong handles via struct
| initialization, but that is a big abuse and the type mismatch
| will still be detected at runtime.
| Dwedit wrote:
| COM (Component Object Model) is compatible with C. It's still
| obviously C++ code being shoehorned into C by explicitly
| declaring the layout of the object (VTable member), but it does
| work. You get inheritance that way, and inter-module compatible
| cleanup. The downside is so much boilerplate code to declare the
| object, and the performance cost of virtual calls.
|
| You don't necessarily need the complete COM system. For example,
| you could remove use of the Windows Registry, use of IDL files,
| remove the `QueryInterface` method, remove the use of GUIDs,
| remove the class factories (when just a simple 'create' function
| would do), remove Windows API functions related to cross-module
| memory management (not using `CoTaskMemAlloc`). Then it would be
| portable to systems that aren't Windows. One thing you can't
| remove is specifying a calling convention for the class methods,
| because C++ `__thiscall` is not compatible with C code on
| Windows.
| kelnos wrote:
| This is more or less what GObject is, which the author mentions
| in passing. It's an OO system for C, but it does require quite
| a lot of boilerplate, and you need to manually initialize
| vtables when creating new subclasses. You also need to manually
| chain up to the superclass in virtual methods, in many places
| where it's easy to forget. It's a decent system, all things
| considered, but it's just a reminder that C's type system is
| very weak and implementing advanced features all but requires
| abuse of the preprocessor to avoid unreadable code.
| Dwedit wrote:
| While you do need to manually initialize a vtable pointer,
| that pointer can simply point to a const struct which lives
| in the read-only data section. You don't need to allocate a
| new vtable with each object or anything like that.
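A rough sketch of the shared const vtable idea, in plain C. Every name here (`shape`, `shape_vtbl`, `rect`, `rect_init`) is hypothetical and only loosely COM-like; there is no `QueryInterface`, no reference counting, and no calling-convention annotation.

```c
#include <assert.h>

struct shape;

/* Table of function pointers shared by all objects of one class. */
struct shape_vtbl {
    int  (*area)(const struct shape *self);
    void (*release)(struct shape *self);
};

/* Every object starts with a pointer to its class's read-only table. */
struct shape {
    const struct shape_vtbl *vtbl;
};

struct rect {
    struct shape base;   /* first member, so rect* and shape* coincide */
    int w, h;
    int released;        /* stands in for real cleanup in this sketch */
};

static int rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static void rect_release(struct shape *self)
{
    struct rect *r = (struct rect *)self;
    r->released = 1;
}

/* One const table for every rect; compilers typically place it in
   read-only data. Nothing is allocated per object. */
static const struct shape_vtbl rect_vtbl = { rect_area, rect_release };

static void rect_init(struct rect *r, int w, int h)
{
    assert(w >= 0 && h >= 0);
    r->base.vtbl = &rect_vtbl;   /* the manual step: a single pointer store */
    r->w = w;
    r->h = h;
    r->released = 0;
}
```

Callers go through the table, for example `s->vtbl->area(s)` for a `struct shape *s`, which is the virtual-call cost mentioned earlier in the thread.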
|
| And you can do COM objects in C without using the
| preprocessor at all (outside of the "are we C++ or not"
| condition, then you could use real classes instead)
| morelisp wrote:
| This is basically what the author is referring to with GObject.
| asveikau wrote:
| I think Mozilla has historically had a lot of COM outside of
| Windows. But there's been a goal to remove it:
| https://wiki.mozilla.org/Gecko:DeCOMtamination
|
| I believe VirtualBox is another project with COM on other
| platforms. I see lots of GUIDs and HRESULTs in their error
| messages.
|
| I actually really like the COM style when done well. HRESULT,
| the idea of somewhat standardized error handling that
| subdivides the space of a 32-bit integer into various
| subsystem-specific error codes, is one of my favorite ideas
| from there.
|
| Some things are not so nice. For example, everything being a
| virtual call is not good for performance. Reference counting is
| great, but overuse of it is also not great for performance
| (for example, in C++ it's considered poor form to make
| everything std::shared_ptr<> when you can get away with less).
| Dwedit wrote:
| The goal to remove it applied to internal use, for things
| which aren't exported. Which makes perfect sense, COM-like
| interfaces are only needed when you cross module boundaries,
| and need a fixed ABI.
| pavlov wrote:
| _> "Then it would be portable to systems that aren't Windows"_
|
| I've seen this type of "COM Lite" used for cross-platform
| plugin and driver APIs. For example Blackmagic Design, a
| manufacturer of pro video capture hardware, provides an SDK
| that is essentially identical on Windows, Linux and Mac using
| this design.
| spacechild1 wrote:
| Another example would be the VST3 SDK.
| tedunangst wrote:
| I think the answer is function pointers.
| zabzonk wrote:
| > you can have functions like pdf_page_cmd_l(page, x, y)
|
| oh, no, please don't.
|
| just use c++ and you can have namespaces or classes to control
| visibility - there is no need to take on all the other c++ stuff
| if you don't want it.
| cozzyd wrote:
| Aside from the interlanguage interop issues, c++-like
| visibility control (via private) makes ABI compatibility
| essentially impossible.
| lelanthran wrote:
| But then it stops being a general tool that is used by python,
| PHP, java, lisp, C++, rust, nim, zig and becomes a
| specific tool for C++ programs.
| synergy20 wrote:
| C is truly the only universal common denominator for all
| other languages, C++ indeed is only for itself.
| wruza wrote:
| How would you use that in this case? These cmd_l ops are just
| similar ops to different objects, like pen on paper vs. brush
| on canvas. They don't operate on a single type of object.
|
| I'd say that the C++ way is a bad idea here, because it usually
| begs for some compoinherimorphism with operator overloads that
| makes things 10x worse for the cost of one additional
| signature.
| zabzonk wrote:
|     namespace pdf_page { dunno cmd_l (whatever); }
| wruza wrote:
| This is purely cosmetic and doesn't save you anything.
| Also, the Cairo API (and most C APIs in general) already uses
| a <lib>_<class>_<method> naming scheme. E.g.
| cairo_svg_surface_create(), gtk_container_get_children().
| [deleted]
| zabzonk wrote:
| it is not cosmetic or a naming scheme - the language and
| compiler enforce it.
| morelisp wrote:
| None of the concerns are about the name.
| qsort wrote:
| Even if you're using C++ internally, you're likely exposing a C
| API behind extern "C", so you don't have access to those
| features at the API boundary.
| ar-nelson wrote:
| I've had a lot of ideas for cross-language libraries that would
| need a C API, and this issue always comes up. The idea I had
| several years ago (but never implemented, because most of the
| projects I'd use this for are in limbo because I never seem to
| finish anything) is an API with only one function, which takes
| JSON and returns JSON, possibly via JSON-RPC. Basically a library
| that pretends it's a remote service. Slow, yes, but not as slow
| as some alternatives, and it makes FFI setup with other languages
| easy.
| CyberDildonics wrote:
| Software that takes in text and outputs text is literally every
| command line program.
| ar-nelson wrote:
| Yes, but it's rare to see linked libraries that use this as
| their API, even though it would greatly simplify FFI.
| CyberDildonics wrote:
| That's the exact point. Why would someone use a linked
| library if speed doesn't matter and they are passing text
| back and forth to be parsed?
|
| That's a terrible way to use an FFI and you would still deal
| with all the tricky parts.
|
| If that's what you need, you would write a separate
| standalone program and call that.
| lelanthran wrote:
| > is an API with only one function, which takes JSON and
| returns JSON, possibly via JSON-RPC.
|
| I've done this one, and once only. I wouldn't do it again
| because the pain point is the lack of typing.
|
| Yeah, yeah, I know, you've read all these blogs everywhere
| about how C is not type-safe, how C is weakly-typed, etc, but
| it's a damn sight better than runtime errors because something
| emitted valid JSON that missed a field, or has the field in the
| wrong child, or the field is of the incorrect type, etc.
|
| If you're sending and receiving messages to another part of the
| program, using an untyped interface with runtime type-checking
| is the worst way to do it; the errors will not stop coming.
|
| Every single time your FFI functions are entered, the function
| must religiously type-check every parameter, which means that
| every FFI call made has to now handle an extra possible error
| that may be returned - invalid params.
|
| Every single time your FFI function returns, the caller must
| religiously type-check the response, which means that _the
| caller itself_ may return an extra possible error - bad
| response.
|
| Having the compiler check the types is so much better. C
| enforces types on everything[1], almost everywhere. Take the
| type enforcement.
|
| [1] Unless the type check is explicitly and intentionally
| disabled by the programmer
| twic wrote:
| This is a nice concrete example of a situation where inheritance
| is useful for program design.
|
| I think I'd go for the "object oriented" approach, but with
| convenience functions to avoid explicit upcasts. Start with three
| types:
|
|     cairo_t /* a generic context, could be a page or a pattern */
|     cairo_page_t
|     cairo_pattern_t
|
| Functions only defined on pages take a page:
|
|     void pdf_page_cmd_bob(cairo_page_t* ctx);
|
| Functions defined on both take a generic context:
|
|     void pdf_ctx_cmd_l(cairo_t* ctx, int x, int y);
|
| Then you need some way to upcast from the child types to the
| parent (which would be implicit in C++ etc):
|
|     cairo_t* pdf_page_to_ctx(cairo_page_t* ctx);
|     cairo_t* pdf_pattern_to_ctx(cairo_pattern_t* ctx);
|
| So a call looks like:
|
|     pdf_ctx_cmd_l(pdf_page_to_ctx(page), 10, 20);
|
| But we can generate this:
|
|     void pdf_page_cmd_l(cairo_page_t* ctx, int x, int y) {
|       pdf_ctx_cmd_l(pdf_page_to_ctx(ctx), x, y);
|     }
|
| Which lets users write this:
|
|     pdf_page_cmd_l(page, 10, 20);
|
| The convenience functions could even be macros. There would be no
| loss of type safety from using macros that way. There would need
| to be a lot of convenience functions or macros, but they are
| trivial, and so could be generated by a script (or another
| macro!).
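The "generated by another macro" idea above might look roughly like the following. The struct layouts and the `DEFINE_CMD_L` macro name are assumptions made for this sketch; the comment only names the types and the function signatures.

```c
#include <assert.h>

/* Hypothetical layouts; the comment above only names these types. */
typedef struct { int last_x, last_y; } cairo_t;          /* generic context */
typedef struct { cairo_t base; int page_no; } cairo_page_t;
typedef struct { cairo_t base; int tile_w;  } cairo_pattern_t;

/* Upcasts: take the address of the embedded base, so no cast is needed. */
static cairo_t *pdf_page_to_ctx(cairo_page_t *p)       { return &p->base; }
static cairo_t *pdf_pattern_to_ctx(cairo_pattern_t *p) { return &p->base; }

/* The operation itself is written once, against the generic context. */
static void pdf_ctx_cmd_l(cairo_t *ctx, int x, int y)
{
    assert(ctx != 0);
    ctx->last_x = x;
    ctx->last_y = y;
}

/* Expands to one typed wrapper per child type.
   DEFINE_CMD_L is an illustrative name, not taken from the thread. */
#define DEFINE_CMD_L(kind)                                              \
    static void pdf_##kind##_cmd_l(cairo_##kind##_t *ctx, int x, int y) \
    {                                                                   \
        pdf_ctx_cmd_l(pdf_##kind##_to_ctx(ctx), x, y);                  \
    }

DEFINE_CMD_L(page)      /* defines pdf_page_cmd_l(cairo_page_t*, int, int) */
DEFINE_CMD_L(pattern)   /* defines pdf_pattern_cmd_l(cairo_pattern_t*, ...) */
```

Because each wrapper keeps its own parameter type, passing a `cairo_pattern_t*` to `pdf_page_cmd_l` is diagnosed by the compiler as an incompatible pointer type, so the generated convenience layer does not give up type safety. The cost is that the wrapper names no longer appear literally in the source, which is the documentation problem raised earlier in the thread.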
| fanf2 wrote:
| I have implemented this using the gcc/clang "transparent union"
| extension, which eliminates the need for explicit casting or
| helpers.
|
| https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.ht...
___________________________________________________________________
(page generated 2023-04-16 23:00 UTC)