[HN Gopher] Plain C API design, the real world Kobayashi Maru test
       ___________________________________________________________________
        
       Plain C API design, the real world Kobayashi Maru test
        
       Author : jmillikin
       Score  : 96 points
       Date   : 2023-04-16 15:17 UTC (7 hours ago)
        
 (HTM) web link (nibblestew.blogspot.com)
 (TXT) w3m dump (nibblestew.blogspot.com)
        
       | nanofortnight wrote:
       | This seems perfect for _Generic.
       | 
       | https://en.cppreference.com/w/c/language/generic
        
       | jesse__ wrote:
       | I've done a few APIs similar in spirit to this one (and one
       | similar in functionality, too), and I've found using a blend of a
       | few of the mentioned methods to be pretty effective.
       | 
       | I start with the basics and implement everything as explicitly
       | (read: type safe) as possible:
       | 
       |     struct bob_params {...}
       |     struct line_params {...}
       |     pdf_page_cmd_bob(bob_params *params) { ... }
       |     pdf_page_cmd_line(line_params *params) { ... }
       | 
       | Then, when that works and I want to add patterns, which are a
       | superset, I'd add something that is literally a superset of those
       | functions.
       | 
       |     enum pattern_type {
       |       patterntype_Bob,
       |       patterntype_Line,
       |     }
       | 
       |     struct pattern {
       |       pattern_type type;
       |       union {
       |         bob_params Bob;
       |         line_params Line;
       |       }
       |     }
       | 
       |     pdf_pattern_cmd(pattern *pattern) {
       |       switch (pattern->type) {
       |         case patterntype_Bob:
       |           /* do bob drawing a bunch of times, or whatever */
       |       }
       |     }
       | 
       |     // The pattern struct has a lot of different names (sum type,
       |     // discriminated union, algebraic datatype, tagged union ..
       |     // probably more), but the idea is the 'type' tag tells you
       |     // which one of the union values to use.
       | 
       | I've tried everything and, as far as I can tell, this gets you
       | the best of all worlds. It's pretty much as type-safe as things
       | get in C. It's extremely flexible; you just mix the fundamentals
       | together as you like when adding higher-order functions. It's
       | fast; the compiler can see exactly what's going on, so you pay
       | minimal runtime cost.
       | 
       | Yes, it's a bunch of typing to get the functions all spelled out,
       | but it's really not that bad considering how easy/obvious the
       | code actually is.
       | 
       | I use this pattern so much I actually wrote a little
       | metaprogramming language that is capable of generating a lot of
       | the boilerplate for you. Link in my bio, if anyone's interested
       | in looking at it.
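
A minimal, compilable sketch of the tagged-union dispatch described in this comment. The struct fields (`radius`, `x`, `y`), the union member name `u`, and the integer return values are assumptions added only so the dispatch is observable; the comment elides the real parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter structs; the real fields are elided in the comment. */
struct bob_params  { int radius; };
struct line_params { int x, y; };

enum pattern_type { patterntype_Bob, patterntype_Line };

struct pattern {
    enum pattern_type type;        /* tag: says which union member is live */
    union {
        struct bob_params  Bob;
        struct line_params Line;
    } u;
};

/* Explicit, type-safe primitives. Each returns a made-up value so the
   caller can see which one ran. */
static int pdf_page_cmd_bob(const struct bob_params *p)
{
    assert(p != NULL);
    return p->radius * 2;
}

static int pdf_page_cmd_line(const struct line_params *p)
{
    assert(p != NULL);
    return p->x + p->y;
}

/* The superset function switches on the tag and forwards to a primitive. */
static int pdf_pattern_cmd(const struct pattern *pat)
{
    switch (pat->type) {
    case patterntype_Bob:  return pdf_page_cmd_bob(&pat->u.Bob);
    case patterntype_Line: return pdf_page_cmd_line(&pat->u.Line);
    }
    return -1;                     /* unknown tag */
}
```

A caller fills in `type` and the matching union member, then passes the `struct pattern` to `pdf_pattern_cmd`. Many compilers can warn when the `switch` misses an enumerator, which is part of what makes this style easy to extend.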
        
         | twisteriffic wrote:
         | That style speaks to me. Thanks!
        
         | cpeterso wrote:
         | If you're designing a stable API for external users, you might
         | want to lock down your API even more by forward declaring the
         | struct types as opaque in the public header file and only
         | defining the struct members in a private library header file.
         | This prevents users from messing with your library's private
         | state and allows you to change implementation details later
         | without breaking binary compatibility. The disadvantage is that
         | users can't control how the structs are allocated or embed them
         | in their own structs.
         | 
         |     /* foo.h */
         |     struct foo_object;
         |     struct foo_object* foo_create(int flags, ...);
         | 
         |     /* foo.c */
         |     #include "foo.h"
         |     struct foo_object {
         |       int flags;
         |       ...
         |     };
        
         | lelanthran wrote:
         | Funnily enough that's the most common pattern I see in my
         | personal C code (for example, my little lisp interpreter -
         | https://github.com/lelanthran/csl/blob/master/src/parser/ato...)
         | but I still recommend using the `_Generic` keyword in C.
         | 
         | For the next time I do a pattern like this, I'll be using the
         | `_Generic` keyword to make the dispatch a compile-time match,
         | not a check at runtime.
        
           | jesse__ wrote:
           | Can you elaborate slightly on how you're planning on using
           | _Generic to turn runtime dispatches into compile time ones?
           | I'm not quite putting together how you can do that.
        
             | lelanthran wrote:
             | The example from the wikipedia page for C11
             | (https://en.wikipedia.org/wiki/C11_(C_standard_revision)#Chan...)
             | is compile time determination of the function to call:
             | 
             |     #define cbrt(x) _Generic((x), long double: cbrtl, \
             |                                   default: cbrt, \
             |                                   float: cbrtf)(x)
             | 
             | In code you'll write `cbrt(foo)` and the correct function
             | will be called for the type of foo. As I understand it, the
             | `_Generic` selection is performed at compile time.
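
The same mechanism applied to the article's page/pattern problem might look like the sketch below. The two struct types, their `ops` counters, and the return values are hypothetical stand-ins; only the `_Generic` selection itself is standard C11.

```c
#include <assert.h>

/* Hypothetical context types standing in for the article's page and pattern. */
struct pdf_page    { int ops; };
struct pdf_pattern { int ops; };

static int pdf_page_cmd_l(struct pdf_page *p, int x, int y)
{
    (void)x; (void)y;
    return ++p->ops;               /* 1, 2, 3, ... */
}

static int pdf_pattern_cmd_l(struct pdf_pattern *p, int x, int y)
{
    (void)x; (void)y;
    return 100 + ++p->ops;         /* 101, 102, ... so callers can tell them apart */
}

/* The function is chosen from the static type of ctx at compile time.
   There is no default branch, so any other type is a compile error. */
#define pdf_cmd_l(ctx, x, y)                               \
    _Generic((ctx),                                        \
        struct pdf_page *:    pdf_page_cmd_l,              \
        struct pdf_pattern *: pdf_pattern_cmd_l)((ctx), (x), (y))
```

This only helps when the caller's static type is known. It does not replace the runtime tag in a tagged union, where the variant is only known at run time.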
        
           | [deleted]
        
           | JonChesterfield wrote:
           | Recommend `__attribute__((overloadable))` instead of
           | _Generic. The former opts into C++ style name mangling, which
           | is a bit of a mess in C, but interacts well with `static` as
           | forwarding wrappers in a header. The latter is a ridiculous
           | mess invented by the C committee.
        
           | KerrAvon wrote:
           | I think you may be disappointed. Every single time I've tried
           | to use `_Generic`, I've found that it's more trouble than
           | it's worth. They seem to have made it be useful for a very
           | narrow case -- tgmath.h -- and not bothered to make it
           | general enough to be applicable to a wide variety of things
           | that you might like to use it with.
        
         | gabereiser wrote:
         | can we just agree that PDF sucks? Sometimes the API must
         | operate on a spec that is unruly to begin with. I like this
         | approach. It's about the best you can get when dealing with
         | this without completely reinventing the storage format to
         | support the API.
        
           | jesse__ wrote:
           | Yea, PDF has got to be one of the worst specs ever
           | created. It's amazing PDF viewers work at all.
        
       | mintplant wrote:
       | Composition?
       | 
       |     typedef struct { /* ... */ } context;
       | 
       |     typedef struct {
       |       context ctx;
       |       /* ... */
       |     } foo_context;
       | 
       |     typedef struct {
       |       context ctx;
       |       /* ... */
       |     } bar_context;
       | 
       |     void some_general_method(context* ctx, int a, int b);
       |     void some_foo_specific_method(foo_context* foo_ctx, int c, int d);
       | 
       |     foo_context* foo_ctx;
       |     some_foo_specific_method(foo_ctx, 0, 0);
       |     some_general_method(&foo_ctx->ctx, 1, 1);
       | 
       | For checked downcasts, you could include an enum type tag inside
       | `context` and have something like `foo_context*
       | as_foo_context(context* ctx)` for downcasts, which, if `context`
       | is at the beginning of `foo_context`, could just check the tag
       | and cast the `ctx` pointer to `foo_context*` (or do a little
       | pointer arithmetic if `context` is somewhere else in the
       | containing struct). Return `NULL` (or assert) if the tag doesn't
       | match.
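
A sketch of the checked downcast described above, assuming `context` is the first member of each containing struct. The `context_kind` enum and the `calls`, `foo_state`, and `bar_state` fields are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum context_kind { CONTEXT_FOO, CONTEXT_BAR };

typedef struct {
    enum context_kind kind;        /* type tag used by the checked downcast */
    int calls;
} context;

typedef struct {
    context ctx;                   /* first member: same address as the whole struct */
    int foo_state;
} foo_context;

typedef struct {
    context ctx;
    int bar_state;
} bar_context;

/* The plain cast below is only valid while ctx stays at offset 0. */
static_assert(offsetof(foo_context, ctx) == 0, "ctx must be the first member");

static void some_general_method(context *ctx, int a, int b)
{
    (void)a; (void)b;
    ctx->calls++;
}

/* Checked downcast: returns NULL when the tag does not match. */
static foo_context *as_foo_context(context *ctx)
{
    if (ctx == NULL || ctx->kind != CONTEXT_FOO)
        return NULL;
    return (foo_context *)ctx;
}
```

If `ctx` were placed elsewhere in `foo_context`, the cast would have to subtract `offsetof(foo_context, ctx)` from the pointer, which is the "little pointer arithmetic" the comment refers to.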
        
         | uecker wrote:
         | I use this technique a lot and it is very powerful.
        
         | dromtrund wrote:
         | +1, this approach will also highlight cases where you're trying
         | to generalize something that isn't as generic as you thought.
         | 
         | I'd argue that the need for downcasting in a method working on
         | the inner context would also be a code smell, and that you
         | might want to reconsider the context split. In some cases,
         | there's no way around it (like async events), but it might be
         | more appropriate to pass additional context or a callback
         | instead, to avoid a circular dependency.
        
       | joeatwork wrote:
       | I know it isn't really appropriate to the spirit of the article,
       | but it seems like in this case there is a right answer, and it's
       | "Fully separate object types" - it's explicit, prevents errors,
       | is complete, and while it requires a lot of typing to implement
       | it doesn't require much complexity.
        
         | ablob wrote:
         | In which of the listed requirements of the article is this
         | approach better, and why?
        
         | cozzyd wrote:
         | Yeah I agree. Macros can be used to avoid some of the typing in
         | defining the interface when the implementation really is
         | common, but unfortunately that makes it harder to document the
         | generated interface. I wish doxygen had some way of supporting
         | comments for macro-generated interfaces.
         | 
         | In defining the implementation, it's easy enough to do
         | something like this (if the implementation really is common):
         | 
         |     static int pdf_ll_foo_impl(pdf_ll_ctx_t c, pdf_ll_type_t t, ...)
         |     {
         |       // real implementation goes here, perhaps switching on t
         |     }
         | 
         |     int pdf_page_foo(pdf_page_ctx_t c, ..) {
         |       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PAGE_TYPE, ..);
         |     }
         | 
         |     int pdf_pattern_foo(pdf_pattern_ctx_t c, ..) {
         |       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PATTERN_TYPE, ..);
         |     }
         | 
         | which you can also use macros to help generate if you want to.
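
One way the macro generation could look, as a sketch. The struct layouts, the single `int x` argument, and the `PDF_DEFINE_FOO` macro name are all assumptions; the comment leaves the argument list and context types unspecified. The casts rely on each typed context starting with the low-level context.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PDF_PAGE_TYPE, PDF_PATTERN_TYPE } pdf_ll_type_t;

/* Hypothetical contexts: each typed context begins with the shared one. */
typedef struct pdf_ll_ctx { int last_x; pdf_ll_type_t last_type; } *pdf_ll_ctx_t;
typedef struct pdf_page_ctx    { struct pdf_ll_ctx ll; } *pdf_page_ctx_t;
typedef struct pdf_pattern_ctx { struct pdf_ll_ctx ll; } *pdf_pattern_ctx_t;

/* The one real implementation; it records its inputs so the effect is visible. */
static int pdf_ll_foo_impl(pdf_ll_ctx_t c, pdf_ll_type_t t, int x)
{
    assert(c != NULL);
    c->last_x = x;
    c->last_type = t;
    return 0;
}

/* Stamps out one typed public wrapper per context type. */
#define PDF_DEFINE_FOO(name, ctx_type, type_tag)                  \
    int name(ctx_type c, int x)                                   \
    {                                                             \
        return pdf_ll_foo_impl((pdf_ll_ctx_t)c, type_tag, x);     \
    }

PDF_DEFINE_FOO(pdf_page_foo,    pdf_page_ctx_t,    PDF_PAGE_TYPE)
PDF_DEFINE_FOO(pdf_pattern_foo, pdf_pattern_ctx_t, PDF_PATTERN_TYPE)
```

Callers get a compiler diagnostic if they pass a pattern context to `pdf_page_foo`, while the library still maintains a single implementation. The documentation problem the comment raises remains: tools that parse the source see the macro invocations, not the generated functions.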
        
       | CyberDildonics wrote:
       | This is a few paragraphs about the C API of cairo, postscript and
       | pdf and it doesn't seem all that insightful.
       | 
       | I wish blog posts wouldn't use some random movie reference as
       | clickbait when they could explain what their post is about
       | instead.
        
       | fpoling wrote:
       | What is not considered in the article is to replace pointers to
       | pages and patterns with handles that are tagged indexes into
       | internal arrays.
       | 
       | The big plus is that the index tag makes it possible to detect
       | use-after-free and other memory safety bugs in the vast majority
       | of cases, greatly improving memory safety.
       | 
       | If one then exposes this handle as a generic typedef over some
       | integer type, then the API will not be type-safe, but the type
       | mismatch will be detected very early.
       | 
       | Another option is to wrap the handle into separate structs for
       | type safety. Then the caller will need to convert from specific
       | handle for page, pattern etc. to the base handle when calling
       | common draw operations. But that will be a simple operation like
       | page.base or pattern.base, not MYLIB_PATTERN_TO_BASE(pattern).
       | The drawback is that the caller will be able to construct wrong
       | handles via struct initialization, but that is a big abuse and
       | the type mismatch will still be detected at runtime.
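
A minimal sketch of a tagged handle, assuming a 32-bit handle split into a 16-bit slot index and a 16-bit generation counter. The table size, the field names, and the `page_*` functions are hypothetical; the comment does not specify a layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES 16u

/* Low 16 bits: slot index. High 16 bits: generation tag. */
typedef uint32_t handle_t;

struct page_slot {
    uint16_t generation;           /* bumped every time the slot is freed */
    int in_use;
    int width;
};

static struct page_slot pages[MAX_PAGES];

static handle_t page_create(int width)
{
    for (uint16_t i = 0; i < MAX_PAGES; i++) {
        if (!pages[i].in_use) {
            pages[i].in_use = 1;
            pages[i].width = width;
            return ((uint32_t)pages[i].generation << 16) | i;
        }
    }
    return UINT32_MAX;             /* table full; index part is out of range */
}

/* Returns NULL for out-of-range, freed, or stale handles. */
static struct page_slot *page_lookup(handle_t h)
{
    uint16_t idx = (uint16_t)(h & 0xFFFFu);
    uint16_t gen = (uint16_t)(h >> 16);
    if (idx >= MAX_PAGES)
        return NULL;
    if (!pages[idx].in_use || pages[idx].generation != gen)
        return NULL;
    return &pages[idx];
}

static int page_destroy(handle_t h)
{
    struct page_slot *s = page_lookup(h);
    if (s == NULL)
        return -1;                 /* stale or forged handle */
    s->in_use = 0;
    s->generation++;               /* invalidates every outstanding copy of h */
    return 0;
}
```

A handle kept after `page_destroy` fails the generation check even when its slot is reused. With a 16-bit counter the generation wraps after 65536 reuses of one slot, which is why detection covers most cases and not all of them.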
        
       | Dwedit wrote:
       | COM (Component Object Model) is compatible with C. It's still
       | obviously C++ code being shoehorned into C by explicitly
       | declaring the layout of the object (VTable member), but it does
       | work. You get inheritance that way, and inter-module compatible
       | cleanup. The downside is so much boilerplate code to declare the
       | object, and the performance cost of virtual calls.
       | 
       | You don't necessarily need the complete COM system. For example,
       | you could remove use of the Windows Registry, use of IDL files,
       | remove the `QueryInterface` method, remove the use of GUIDs,
       | remove the class factories (when just a simple 'create' function
       | would do), remove Windows API functions related to cross-module
       | memory management (not using `CoTaskMemAlloc`). Then it would be
       | portable to systems that aren't Windows. One thing you can't
       | remove is specifying a calling convention for the class methods,
       | because C++ `__thiscall` is not compatible with C code on
       | Windows.
        
         | kelnos wrote:
         | This is more or less what GObject is, which the author mentions
         | in passing. It's an OO system for C, but it does require quite
         | a lot of boilerplate, and you need to manually initialize
         | vtables when creating new subclasses. You also need to manually
         | chain up to the superclass in virtual methods, in many places
         | where it's easy to forget. It's a decent system, all things
         | considered, but it's just a reminder that C's type system is
         | very weak and implementing advanced features all but requires
         | abuse of the preprocessor to avoid unreadable code.
        
           | Dwedit wrote:
           | While you do need to manually initialize a vtable pointer,
           | that pointer can simply point to a const struct which lives
           | in the read-only data section. You don't need to allocate a
           | new vtable with each object or anything like that.
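
A stripped-down sketch of the "COM without the COM system" layout discussed here: an object whose first member points at a shared `const` vtable. The `shape` and `rect` names are hypothetical, and reference counting, `QueryInterface`, and calling-convention annotations are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct shape;                      /* hypothetical interface type */

struct shape_vtbl {
    int  (*area)(const struct shape *self);
    void (*destroy)(struct shape *self);
};

struct shape {
    const struct shape_vtbl *vtbl; /* the only thing callers rely on */
};

struct rect {
    struct shape base;             /* must be first so the casts below hold */
    int w, h;
};

static_assert(offsetof(struct rect, base) == 0, "base must be the first member");

static int rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static void rect_destroy(struct shape *self)
{
    free(self);                    /* same address as the allocated rect */
}

/* One read-only vtable shared by every rect instance. */
static const struct shape_vtbl rect_vtbl = { rect_area, rect_destroy };

struct shape *rect_create(int w, int h)
{
    struct rect *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->base.vtbl = &rect_vtbl;     /* the manual vtable initialization */
    r->w = w;
    r->h = h;
    return &r->base;
}
```

Callers invoke methods as `s->vtbl->area(s)`. Since destruction also goes through the vtable, the object is freed by the module that allocated it, which is the inter-module cleanup property mentioned earlier in the thread.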
           | 
           | And you can do COM objects in C without using the
           | preprocessor at all (outside of the "are we C++ or not"
           | condition, then you could use real classes instead)
        
         | morelisp wrote:
         | This is basically what the author is referring to with GObject.
        
         | asveikau wrote:
         | I think Mozilla has historically had a lot of COM outside of
         | Windows. But there's been a goal to remove it:
         | https://wiki.mozilla.org/Gecko:DeCOMtamination
         | 
         | I believe VirtualBox is another project with COM on other
         | platforms. I see lots of GUIDs and HRESULTs in their error
         | messages.
         | 
         | I actually really like the COM style when done well. HRESULT,
         | the idea of somewhat standardized error handling that sub-
         | divides the space of a 32-bit integer into various subsystem-
         | specific error codes, is one of my favorite ideas from there.
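
The idea of carving a 32-bit result into fields can be sketched as below. This layout (bit 31 as a failure flag, bits 16 to 30 for the subsystem, bits 0 to 15 for the code) is a simplified assumption loosely modeled on HRESULT and is not the actual Windows definition; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t result_t;

/* Hypothetical subsystem identifiers. */
enum subsystem { SUBSYS_CORE = 1, SUBSYS_PDF = 2 };

#define RESULT_OK        ((result_t)0)
#define RESULT_FAIL_BIT  UINT32_C(0x80000000)

/* Packs a failure flag, a 15-bit subsystem id, and a 16-bit code. */
static result_t make_error(enum subsystem sys, uint16_t code)
{
    return RESULT_FAIL_BIT | (((uint32_t)sys & 0x7FFFu) << 16) | code;
}

static int      result_failed(result_t r)    { return (r & RESULT_FAIL_BIT) != 0; }
static unsigned result_subsystem(result_t r) { return (r >> 16) & 0x7FFFu; }
static unsigned result_code(result_t r)      { return r & 0xFFFFu; }
```

Every function in the library can then return `result_t`, and callers can test for failure with one check while still being able to tell which subsystem produced the error.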
         | 
         | Some things are not so nice. For example, everything being a
         | virtual call is not good for performance. Reference counting is
         | also great but over-use of it is also not great for performance
         | (for example, in C++ it's considered poor form to make
         | everything std::shared_ptr<> when you can get away with less).
        
           | Dwedit wrote:
           | The goal to remove it applied to internal use, for things
           | which aren't exported. Which makes perfect sense, COM-like
           | interfaces are only needed when you cross module boundaries,
           | and need a fixed ABI.
        
         | pavlov wrote:
         | > "Then it would be portable to systems that aren't Windows"
         | 
         | I've seen this type of "COM Lite" used for cross-platform
         | plugin and driver APIs. For example Blackmagic Design, a
         | manufacturer of pro video capture hardware, provides an SDK
         | that is essentially identical on Windows, Linux and Mac using
         | this design.
        
           | spacechild1 wrote:
           | Another example would be the VST3 SDK.
        
       | tedunangst wrote:
       | I think the answer is function pointers.
        
       | zabzonk wrote:
       | > you can have functions like pdf_page_cmd_l(page, x, y)
       | 
       | oh, no, please don't.
       | 
       | just use c++ and you can have namespaces or classes to control
       | visibility - there is no need to take on all the other c++ stuff
       | if you don't want it.
        
         | cozzyd wrote:
         | Aside from the interlanguage interop issues, c++-like
         | visibility control (via private) makes ABI compatibility
         | essentially impossible.
        
         | lelanthran wrote:
         | But then it stops being a general tool that is used by python,
         | PHP, java, lisp, C++, rust, nim, zig and becomes a
         | specific tool for C++ programs.
        
           | synergy20 wrote:
           | C is truly the only universal common denominator for all
           | other languages, C++ indeed is only for itself.
        
         | wruza wrote:
         | How would you use that in this case? These cmd_l ops are just
         | similar ops to different objects, like pen on paper vs. brush
         | on canvas. They don't operate on a single type of object.
         | 
         | I'd say that C++ way is a bad idea here, because it usually
         | begs for some compoinherimorphism with operator overloads that
         | makes things 10x worse for the cost of one additional
         | signature.
        
           | zabzonk wrote:
           | namespace pdf_page { dunno cmd_l (whatever); }
        
             | wruza wrote:
             | This is purely cosmetic and doesn't save you anything.
             | Also, Cairo API (and most C APIs in general) already use
             | <lib>_<class>_<method> naming scheme. E.g.
             | cairo_svg_surface_create(), gtk_container_get_children().
        
               | [deleted]
        
               | zabzonk wrote:
               | it is not cosmetic or a naming scheme - the language and
               | compiler enforce it.
        
               | morelisp wrote:
               | None of the concerns are about the name.
        
         | qsort wrote:
         | Even if you're using C++ internally, you're likely exposing a C
         | API behind extern C, so you don't have access to those features
         | at the API boundary.
        
       | ar-nelson wrote:
       | I've had a lot of ideas for cross-language libraries that would
       | need a C API, and this issue always comes up. The idea I had
       | several years ago---but never implemented, because most of the
       | projects I'd use this for are in limbo because I never seem to
       | finish anything---is an API with only one function, which takes
       | JSON and returns JSON, possibly via JSON-RPC. Basically a library
       | that pretends it's a remote service. Slow, yes, but not as slow
       | as some alternatives, and it makes FFI setup with other languages
       | easy.
        
         | CyberDildonics wrote:
         | Software that takes in text and outputs text is literally every
         | command line program.
        
           | ar-nelson wrote:
           | Yes, but it's rare to see linked libraries that use this as
           | their API, even though it would greatly simplify FFI.
        
             | CyberDildonics wrote:
             | That's the exact point. Why would someone use a linked
             | library if speed doesn't matter and they are passing text
             | back and forth to be parsed?
             | 
             | That's a terrible way to use a FFI and you would still deal
             | with all the tricky parts.
             | 
             | If that's what you need, you would write a separate stand
             | alone program and call that.
        
         | lelanthran wrote:
         | > is an API with only one function, which takes JSON and
         | returns JSON, possibly via JSON-RPC.
         | 
         | I've done this one, and once only. I wouldn't do it again
         | because the pain point is the lack of typing.
         | 
         | Yeah, yeah, I know, you've read all these blogs everywhere
         | about how C is not type-safe, how C is weakly-typed, etc, but
         | it's a damn sight better than runtime errors because something
         | emitted valid JSON that missed a field, or has the field in the
         | wrong child, or the field is of the incorrect type, etc.
         | 
         | If you're sending and receiving messages to another part of the
         | program, using an untyped interface with runtime type-checking
         | is the worst way to do it; the errors will not stop coming.
         | 
         | Every single time your FFI functions are entered, the function
         | must religiously type-check every parameter, which means that
         | every FFI call made has to now handle an extra possible error
         | that may be returned - invalid params.
         | 
         | Every single time your FFI function returns, the caller must
         | religiously type-check the response, which means that _the
         | caller itself_ may return an extra possible error - bad
         | response.
         | 
         | Having the compiler check the types is so much better. C
         | enforces types on everything[1], almost everywhere. Take the
         | type enforcement.
         | 
         | [1] Unless the type check is explicitly and intentionally
         | disabled by the programmer
        
       | twic wrote:
       | This is a nice concrete example of a situation where inheritance
       | is useful for program design.
       | 
       | I think I'd go for the "object oriented" approach, but with
       | convenience functions to avoid explicit upcasts. Start with three
       | types:
       | 
       |     cairo_t /* a generic context, could be a page or a pattern */
       |     cairo_page_t
       |     cairo_pattern_t
       | 
       | Functions only defined on pages take a page:
       | 
       |     void pdf_page_cmd_bob(cairo_page_t* ctx);
       | 
       | Functions defined on both take a generic context:
       | 
       |     void pdf_ctx_cmd_l(cairo_t* ctx, int x, int y);
       | 
       | Then you need some way to upcast from the child types to the
       | parent (which would be implicit in C++ etc):
       | 
       |     cairo_t* pdf_page_to_ctx(cairo_page_t* ctx);
       |     cairo_t* pdf_pattern_to_ctx(cairo_pattern_t* ctx);
       | 
       | So a call looks like:
       | 
       |     pdf_ctx_cmd_l(pdf_page_to_ctx(page), 10, 20);
       | 
       | But we can generate this:
       | 
       |     void pdf_page_cmd_l(cairo_page_t* ctx, int x, int y) {
       |       pdf_ctx_cmd_l(pdf_page_to_ctx(ctx), x, y);
       |     }
       | 
       | Which lets users write this:
       | 
       |     pdf_page_cmd_l(page, 10, 20);
       | 
       | The convenience functions could even be macros. There would be no
       | loss of type safety from using macros that way. There would need
       | to be a lot of convenience functions or macros, but they are
       | trivial, and so could be generated by a script (or another
       | macro!).
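
A sketch of generating those convenience functions with a macro that lists the shared commands once. The struct fields, the extra `pdf_ctx_cmd_m` command, and the `PDF_WRAP_*` macro names are assumptions added for illustration; the type and function names otherwise follow the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: each child type embeds the generic context. */
typedef struct { int last_x, last_y; char last_op; } cairo_t;
typedef struct { cairo_t ctx; int page_no; } cairo_page_t;
typedef struct { cairo_t ctx; int tile; }    cairo_pattern_t;

/* Commands defined once on the generic context. */
static void pdf_ctx_cmd_l(cairo_t *ctx, int x, int y)
{
    assert(ctx != NULL);
    ctx->last_x = x; ctx->last_y = y; ctx->last_op = 'l';
}

static void pdf_ctx_cmd_m(cairo_t *ctx, int x, int y)
{
    assert(ctx != NULL);
    ctx->last_x = x; ctx->last_y = y; ctx->last_op = 'm';
}

/* Explicit upcasts; no cast operator needed, so no type safety is lost. */
static cairo_t *pdf_page_to_ctx(cairo_page_t *p)       { return &p->ctx; }
static cairo_t *pdf_pattern_to_ctx(cairo_pattern_t *p) { return &p->ctx; }

/* Generates pdf_<kind>_cmd_<cmd>(cairo_<kind>_t *, int, int). */
#define PDF_WRAP_XY(kind, cmd)                                               \
    static void pdf_##kind##_cmd_##cmd(cairo_##kind##_t *ctx, int x, int y)  \
    {                                                                        \
        pdf_ctx_cmd_##cmd(pdf_##kind##_to_ctx(ctx), x, y);                   \
    }

/* The list of shared commands lives in one place. */
#define PDF_WRAP_ALL(kind) PDF_WRAP_XY(kind, l) PDF_WRAP_XY(kind, m)

PDF_WRAP_ALL(page)
PDF_WRAP_ALL(pattern)
```

Adding a new shared command means writing `pdf_ctx_cmd_<name>` once and adding one entry to `PDF_WRAP_ALL`. Commands with a different argument list would need their own wrapper macro.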
        
         | fanf2 wrote:
         | I have implemented this using the gcc/clang "transparent union"
         | extension, which eliminates the need for explicit casting or
         | helpers.
         | 
         | https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.ht...
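
A sketch of how the transparent-union extension might be used for this, assuming GCC or Clang compiling as C. The struct names and fields are hypothetical, and the implementation relies on both structs starting with the same `struct ctx` member.

```c
#include <assert.h>
#include <stddef.h>

struct ctx     { int last_x, last_y; };
struct page    { struct ctx base; int page_no; };
struct pattern { struct ctx base; int tile; };

/* GCC/Clang extension (not standard C): a function parameter of this type
   accepts an argument of any member type without a cast. */
typedef union {
    struct page    *page;
    struct pattern *pattern;
} ctx_arg __attribute__((__transparent_union__));

static void pdf_cmd_l(ctx_arg arg, int x, int y)
{
    /* Both structs begin with struct ctx, so either pointer also
       addresses the shared base. */
    struct ctx *c = (struct ctx *)arg.page;
    assert(c != NULL);
    c->last_x = x;
    c->last_y = y;
}
```

Calls such as `pdf_cmd_l(&some_page, 10, 20)` and `pdf_cmd_l(&some_pattern, 3, 4)` both compile, while passing an unrelated pointer type is rejected. The cost is portability: compilers without the extension will not accept the attribute.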
        
       ___________________________________________________________________
       (page generated 2023-04-16 23:00 UTC)