[HN Gopher] Typed Config Languages
       ___________________________________________________________________
        
       Typed Config Languages
        
       Author : nalgeon
       Score  : 52 points
       Date   : 2022-01-20 06:36 UTC (1 days ago)
        
 (HTM) web link (kevincox.ca)
 (TXT) w3m dump (kevincox.ca)
        
       | milliams wrote:
       | Can anyone else read that white text on a light grey background?
        
         | kevincox wrote:
         | Author: Oops, I fixed the inline code snippets but accidentally
         | broke the blocks. I'll push out a fix shortly.
         | 
         | Edit: Should be fixed (might need a hard reload). It turns out
         | inverting something twice doesn't really do much.
        
         | Zababa wrote:
         | The white and orange/brown are hard to read for me. Both have
         | less than 1.5 contrast in the chrome dev tools. To continue on
         | the nitpicks, I think there's a typo, the procedural macros
         | have "drive" instead of "derive". And the boxes of code taking
         | all the page compared to the "terminal" text make it a bit hard
         | to "get" the flow of the page.
         | 
         | Other than that it was a nice read, and the grey background is
         | nice and easy on the eyes. I really like the breadcrumbs too,
         | they act as a minimalist menu and easily clickable URL.
        
           | kevincox wrote:
           | Thanks for the feedback. I think now that the bug with the
           | syntax highlighting is fixed the accessibility should be ok.
           | I need to find a better way to test though because I am using
           | the "invert" hack for the light theme and the built-in
           | accessibility checkers in Chromium and Firefox don't appear
           | to take this into consideration. I'll need to find a basic
           | calculator and do the math manually when I have time.
           | 
           | Typo fixed.
           | 
           | Thanks for the flow feedback, I thought the full-width code
           | was cool and it can help for wide code without making the
           | page text too wide (it allows the code to grow to the right)
           | but maybe it is more confusing than it is worth. And thanks
           | for the feedback on the background and breadcrumbs, it is
           | good to hear both positive and negative thoughts.
        
             | Zababa wrote:
             | Thanks, it's way easier to read now. The flow feedback is
             | very personal, feel free to ignore it. After rereading the
             | article, I like how unique it is.
        
       | Cyberdog wrote:
       | Looks like nobody has mentioned TOML yet, so I will. It's
       | basically INI syntax with stricter rules and can be used as such
       | if you wish, but also supports some extra features like arrays,
       | dictionaries/tables, timestamp values, nested sections, and so
       | on. It's a joy to use in the rare times that I'm able to use it.
       | 
       | https://toml.io/en/
        
         | kstrauser wrote:
         | TOML's great, although I'm not in love with the syntax for
         | nested sections.
        
         | jer0me wrote:
         | Tom's Obvious, Minimal Language, as in Tom Preston-Werner of
         | GitHub
        
       | AaronFriel wrote:
       | I've increasingly found myself drawn to writing declarations as
       | programs, not as config in any plain textual format.
       | 
       | Whether it's going all the way in the direction of supporting
       | general purpose languages like Pulumi does, or with a niche but
       | still Turing-complete language like Nix's expression language, or
       | even Dhall, which other commentators have mentioned. That isn't
       | to say there is no place for simple, human readable schema. I
       | think these tools need a fallback to something simpler.
       | 
       | Nothing these tools do couldn't be emulated by manual processes
       | of hand-writing complex makefiles or YAML or whatever, but what
       | is striking to me is the use of general purpose languages,
       | usually with tooling for type-checking and IDE assistance when
       | writing these things, which lowers the barrier to entry and
       | empowers someone who "just knows (Python|JavaScript|Go|...)" to
       | contribute in a familiar environment.
       | 
       | A couple more examples:
       | 
       | Envoy: I have had the displeasure, recently, of writing by-hand a
       | configuration for the Envoy load balancer. Envoy uses a "typed
       | config language", which is often represented as YAML or JSON, but
       | it's very painful to write by hand. On the other hand, if you
       | have the protocol buffers/gRPC schema available in code, it's
       | vastly less painful to use any programming language to build the
       | typed objects and then export to plain text. The xDS protocol is
       | designed for being interacted with via programs, not plaintext.
       | 
       | GitLab CI: GitLab supports writing a program, which runs as part
       | of the CI/CD job, to generate the configuration for a subsequent
       | pipeline. This makes writing complex jobs, or repetitive monorepo
       | tasks much simpler. A 10 line Python program that effectively
       | does "for each folder in `python`, emit this block of YAML" is
       | incredibly powerful.
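
A generator along the lines of the GitLab CI example above might look like the following Python sketch. The `python` folder comes from the comment; the job names, the `test` stage, and the pytest command are assumptions made for illustration. It prints JSON, which YAML 1.2 is designed to accept, so no YAML library is needed.

```python
import json
from pathlib import Path


def build_pipeline(folders):
    """Return a pipeline mapping with one test job per folder name."""
    pipeline = {"stages": ["test"]}
    for name in sorted(folders):
        # Job name and script are illustrative, not taken from the thread.
        pipeline[f"test-{name}"] = {
            "stage": "test",
            "script": [f"cd python/{name}", "python -m pytest"],
        }
    return pipeline


def discover_folders(root):
    """List the immediate subdirectories of root (empty if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return [p.name for p in root.iterdir() if p.is_dir()]


if __name__ == "__main__":
    # Emit the generated pipeline on stdout for a later job to pick up.
    print(json.dumps(build_pipeline(discover_folders("python")), indent=2))
```

Building a mapping and serializing it once, rather than concatenating strings, keeps quoting and indentation out of the generator's hands.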
       | 
       | That last example is salient to me: markup languages are really
       | easy to parse, but challenging to read when they're dynamic.
       | Wouldn't it be nice to be able to mix and match? YAML/JSON where
       | it makes sense and the intent and meaning is self-evident, and to
       | write code where you need dynamism?
       | 
       | Full disclosure: I work for Pulumi, opinions are my own, etc.
        
         | jandrese wrote:
         | In the past it was fairly common to write configuration files
         | in Tcl and just run the script to parse them.
         | 
         | This was generally considered to be a mistake. It opens up a
         | huge threat surface for your program and few people ended up
         | using the more advanced capabilities it created. If your config
         | format supports simple globbing, that covers the 95% use case
         | that would otherwise need a Turing-complete language for your
         | configuration files.
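
For illustration, globbing in a config lookup could be as small as this Python sketch built on the standard library's `fnmatch` module. The rule list, the host names, and the `resolve` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve(rules, name):
    """Return the settings of the first rule whose glob pattern matches name."""
    for pattern, settings in rules:
        if fnmatchcase(name, pattern):
            return settings
    return None


# Hypothetical rules: the first match wins, so the catch-all goes last.
RULES = [
    ("web-*", {"port": 80}),
    ("db-?", {"port": 5432}),
    ("*", {"port": 22}),
]
```

Here `resolve(RULES, "web-01")` picks the first rule, while `resolve(RULES, "db-12")` falls through to the catch-all because `?` matches exactly one character.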
        
           | blacksqr wrote:
           | > This was generally considered to be a mistake.
           | 
           | Pity. Tcl allows you to create safe interpreters within which
           | you can disable any commands you want in order to have a
           | trustworthy environment for running configuration scripts.
           | 
           | Tcl itself uses them to build its internal list of available
           | package modules.
        
           | AaronFriel wrote:
           | I think that speaks to having that flexibility when needed.
           | The other 5% of use cases are significant too.
        
             | jandrese wrote:
             | The other 5% is generally solved by people who write
             | scripts to generate the config files. It's not like they're
             | up a creek.
        
           | kaba0 wrote:
           | That's why I think Dhall is really interesting, it being
           | purposefully not Turing-complete, yet still very flexible.
        
       | tromp wrote:
       | Dhall [1] [2] looks promising to me.
       | 
       | [1] http://dhall-lang.org/
       | 
       | [2] https://github.com/dhall-lang/dhall-lang
        
         | corysama wrote:
         | Does anyone know if using Dhall to generate JSON that is then
         | validated using Cue is a sensible idea? I don't know enough
         | about them to be sure.
        
           | kevinmgranger wrote:
           | dhall serves as the validation layer itself.
           | 
           | Unless you're already consuming some other API that publishes
           | Cue validation, in which case dhall is just the templating
           | language.
        
       | vzaliva wrote:
       | I am using Cerberus schema to validate my YAML configs:
       | 
       | https://docs.python-cerberus.org/en/stable/schemas.html
        
       | msoad wrote:
       | Like it or not YAML configuration files are everywhere. I've had
       | a lot of luck using JSON Schema with YAML config files. Luckily
       | VS Code and possibly many other editors can be configured to
       | provide type hints for completion. Using any available JSON Schema
       | checker you can validate your config files in CI and elsewhere.
       | 
       | Most of the time, what's missing is an accurate JSON Schema for a
       | configuration. I usually encourage owners of those configurations
       | to sit down and write it for everyone to benefit from.
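
To make the schema-checking idea concrete, here is a toy Python checker for a small subset of JSON Schema keywords (`type`, `required`, `properties`, `items`). It is a hand-rolled sketch, not a real JSON Schema implementation, and it assumes the YAML has already been parsed into plain dicts and lists. In practice an existing validator library would be the better choice.

```python
TYPE_MAP = {
    "string": str,
    "boolean": bool,
    "integer": int,
    "array": list,
    "object": dict,
}


def check(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


# Hypothetical schema: a required list of country codes, all strings.
SCHEMA = {
    "type": "object",
    "required": ["countries"],
    "properties": {"countries": {"type": "array", "items": {"type": "string"}}},
}
```

Run in CI, a check like this would report `$.countries[1]: expected string, got bool` for a list where an unquoted `no` was parsed as a boolean.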
        
       | pietroppeter wrote:
       | I like the approach of strictyaml: a parser that concentrates on
       | a restricted subset of YAML and lets you use a schema to get a
       | type-safe validator.
       | 
       | https://github.com/crdoconnor/strictyaml
        
       | nicoburns wrote:
       | If we're on the topic of config languages, I'd like to plug Gura
       | (https://github.com/gura-conf/gura). It's not too well-known, but
       | it probably has the best design I've seen, and seems to have a
       | good coverage of languages with an available library.
        
       | milliams wrote:
       | YAML's parsing of `no` as `False` has not been part of the spec
       | for 13 years now. It was changed in YAML 1.2 in 2009 to only be
       | `true` and `false` (with variations in case allowed I think).
        
       | kbd wrote:
       | As has come up in this thread already, any discussion of typed
       | config languages nowadays that doesn't mention Cue
       | (https://cuelang.org/) seems incomplete. They really seem to be
       | tackling the problem in a thorough way. I hope it catches on.
       | 
       | For anyone who knows more about Cue: right now you can go from
       | Cue<->yaml (in fact, their docs on yaml also use the "no" case as
       | an example: https://cuelang.org/docs/integrations/yaml/) to
       | integrate with existing systems, but I suppose eventually the
       | goal would be to have direct support in libraries like Serde?
        
         | kevincox wrote:
         | (disclaimer: author)
         | 
         | Cue is a very cool language, but it is quite different than the
         | "typed config language" that I have described here. Maybe I
         | picked a poor title but in the post I am talking about using
         | the type information to "improve" parsing. IIUC Cue does not
         | do this, it parses in a "dynamically typed" manner, then uses
         | the type system to evaluate the Turing-complete (or close to
         | it) expression language.
        
           | kbd wrote:
           | Yeah that's why I was asking about Cue's eventual goals with
           | libraries like Serde. I assume eventually they'd like to be
           | able to auto-generate type definitions for a target language,
           | but I don't know.
           | 
           | > Cue does not do this, it parses in a "dynamically typed"
           | manner, then uses the type system to evaluate the
           | Turing-complete (or close to it) expression language.
           | 
           | As I understand it Cue would help in two ways currently. 1.
           | It would be able to type-check existing yaml files to catch
           | things like the "no" case. 2. if you write your config in
           | Cue, it would output properly-typed yaml to avoid things like
           | "no".
        
             | kevincox wrote:
             | Yes. I agree. It would "prevent" the "no case" by returning
             | an error on parse/evaluation. However the solution
             | described here can do better. It can correctly parse the no
             | case. Basically by knowing it is parsing a string the
             | grammar can be simpler, it doesn't have to decide if it is
             | an int/bool/string anymore.
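
The difference between guessing a scalar's type and parsing it against a known type can be sketched in Python as follows. Both functions are simplified illustrations: `guess_scalar` only loosely imitates YAML 1.1 style resolution, and `parse_scalar` is a hypothetical helper, not the article's implementation.

```python
def guess_scalar(text):
    """Untyped parsing: guess the type from the text alone."""
    lowered = text.lower()
    if lowered in ("yes", "true", "on"):
        return True
    if lowered in ("no", "false", "off"):
        return False
    try:
        return int(text)
    except ValueError:
        return text


def parse_scalar(text, expected_type):
    """Typed parsing: the expected type decides how the text is read."""
    if expected_type is str:
        return text  # No guessing: "no" stays the string "no".
    if expected_type is bool:
        lowered = text.lower()
        if lowered in ("yes", "true", "on"):
            return True
        if lowered in ("no", "false", "off"):
            return False
        raise ValueError(f"expected a boolean, got {text!r}")
    if expected_type is int:
        return int(text)
    raise TypeError(f"unsupported type: {expected_type!r}")
```

With the guessing parser, `no` in a list of country codes becomes `False`; with the typed one, the same text parses as the string `"no"` because the schema already says a string is expected.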
        
       | yegle wrote:
       | Re the first note in the post: a good serialization format is
       | both easy for machines to read and easy for humans to read and
       | write. I think the text protobuf format is one such example. A
       | (human-readable and -writable) config language needs to be
       | consumed by a program anyway, so in a sense a config language is
       | a human-to-computer serialization format.
        
       | [deleted]
        
       | usrbinbash wrote:
       | > Statically typed programming languages are catching on so why
       | don't we extend this typing to our config files?
       | 
       | Because I simply don't want to expend the same amount of
       | cognitive load to read config files as I do for code.
       | 
       | Yes, yaml has some minor ambiguities. These are easily solved. To
       | use the example from the article:
       | 
       |     countries:
       |       - ca
       |       - "no"
       |       - us
       | 
       | There, done. The problem was solved with 2 extra characters and
       | remembering the fact that `no` is special in yaml. Comparing that
       | to the amount of typing I have to do to define a schema, the
       | syntax of which I have to learn, which I also have to read or
       | remember and keep in mind every time I read the config, I'll take
       | the 2 extra double-quotes.
       | 
       | And, speaking of statically typed languages: This problem would
       | be caught immediately anyway if the config is read into static
       | types.
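
As a rough Python analogue of reading the config into static types, the sketch below validates a parsed document while building a typed object. The `Config` class is hypothetical, and the input dict stands in for what a YAML 1.1 parser would produce from an unquoted `no`.

```python
from dataclasses import dataclass


@dataclass
class Config:
    countries: list

    def __post_init__(self):
        # Reject anything that is not a string, naming the offending index.
        for index, item in enumerate(self.countries):
            if not isinstance(item, str):
                raise TypeError(
                    f"countries[{index}] should be a string, "
                    f"got {type(item).__name__}: {item!r}"
                )


# Stand-in for parser output where an unquoted `no` became a boolean.
PARSED = {"countries": ["ca", False, "us"]}
```

Calling `Config(**PARSED)` raises a `TypeError` that points at `countries[1]`, so the mistake surfaces at load time rather than somewhere later in the program.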
        
         | andrewzah wrote:
         | "There, done."
         | 
         | Except, we're not done.
         | 
         | YAML has multiple footguns like this, which I have to remember,
         | forever, as does anyone else who works with YAML. It's
         | unintuitive and confusing, and costs space in my brain that I
         | really should be using for more important things.
         | 
         | Not to mention that if you -don't- know about these ahead of
         | time, debugging them can be confusing.
         | 
         | A type system is marginally more work for decreased cognitive
         | load and eliminating stupid, idiotic bugs that nobody should
         | have to waste their time tracking down.
         | 
         | With IDE integration, the cost is pretty much negligible other
         | than learning the syntax, which, c'mon, is not difficult and
         | we're being paid to do it.
         | 
         | There are even tools like Dhall [0] that auto-generate yaml for
         | us.
         | 
         | [0]: https://github.com/dhall-lang/dhall-lang
        
           | TrainedMonkey wrote:
           | Would most of the footguns be solved by quoting all of the
           | strings? e.g.:
           | 
           |     "countries":
           |       - "ca"
           |       - "no"
           |       - "us"
        
             | kevincox wrote:
             | Yes, but now you are losing a lot of the clean syntax that
             | causes most people to use YAML in the first place. There is
             | a reason that most people don't write YAML like JSON with
             | trailing commas and comments, it is nice to cut most of
             | this noise.
        
               | meowface wrote:
               | You can also use a stricter subset of YAML that removes
               | things like the "no" footgun. Plenty of such strict
               | parsers exist across languages. Maybe it's no longer
               | technically YAML at that point, but you get all the nice
               | parts of YAML without having to revamp everything with
               | static typing.
        
           | Spivak wrote:
           | So yes it's a footgun but it makes some sense. Most people
           | wouldn't really complain about true not being equivalent to
           | "true" or 100 not being kept as "100". People just aren't used
           | to yes/no being reserved words. Ruby's klass is a funny
           | workaround to this.
           | 
           |     enable_feature: yes
           | 
           | Is totally natural. Nobody reads the spec though. If you're
           | outputting YAML documents with string builders you're headed
           | for ruin no matter what. You don't need Dhall, you need
           | yaml.dump which handles the types too.
        
         | CBLT wrote:
         | I agree that the problem is one of quicker feedback - the dev
         | cycle should involve a program checking the yaml correctness
         | (using types or otherwise) straight away and giving a useful
         | error message. Too often I've seen incorrect yaml checked into
         | git that fails with a cryptic error when deploying the
         | application.
         | 
         | The strength of types, in my opinion, is composability. Most
         | config files I've seen have ultimately pulled in input from
         | another source and used that to create their output. Types
         | would allow the configuration to be checked for correctness
         | even in the face of unknowns.
        
         | kevincox wrote:
         | > Comparing that to the amount of typing I have to do to define
         | a schema
         | 
         | From my use case of config files the code that is reading them
         | knows the type anyways. So for a setup like Rust+serde there is
         | no overhead to set this up.
         | 
         | > This problem would be caught immediately anyway if the config
         | is read into static types.
         | 
         | That is true, but it still breaks you out of your flow. You get
         | a confusing error, it probably doesn't tell you the exact line
         | number and you need to look over your changes. If you changed a
         | lot of places in the file it may be easy to miss that adding
         | `no` to a list was the mistake. Because problems like that are
         | easy to understand in retrospect, but if you keep reading "no"
         | as "Norway" it is easy to look straight at this mistake and
         | think it is fine before hunting elsewhere in the file.
         | 
         | I think you are right. It is still unclear if the cognitive
         | overhead when writing the file is worth it, but from my point
         | of view the upsides are much more valuable than you make them
         | appear to be.
        
       | simplify wrote:
       | Funny enough, I've implemented a config language that fits
       | exactly this bill https://github.com/gilbert/zaml
       | 
       | An example (also see it in the online editor[0]):
       | 
       |     users {
       |       andy
       |       beth {
       |         admin true
       |       }
       |       carl
       |     }
       | 
       | The author is right that you gain syntax benefits when you define
       | a schema. For those who say this adds cognitive overhead, it
       | actually doesn't; the schema and compiler are able to _reduce_
       | that overhead, because if you make a mistake, you get a nice,
       | accurate error message.
       | 
       | [0]
       | https://gilbert.github.io/zaml/editor.html#s=N4IgzgxgFgpgtgQ...
        
       | dlrush wrote:
       | Just use ruby for advanced config file capabilities:
       | 
       | https://darrenrush.medium.com/ruby-is-the-ultimate-config-fi...
        
       | unwind wrote:
       | Pretty cool, I'm thinking about config file formats at the moment
       | so this was timely.
       | 
       | A minor note since the author seems to be around (and nobody has
       | mentioned it that I could see). There's a typo in the example:
       | 
       |     allowed-countires
       | 
       | should of course spell "countries" more like I just did. The same
       | error occurs twice.
        
       | alaties wrote:
       | I'm kind of shocked no one brought up protobufs yet. protobuf
       | libraries are available in pretty much all mainstream languages
       | and the textproto format is pretty mature.
       | 
       | It's clunkier and less freeform than YAML, though. And if you
       | only ever plan on using Rust, the proposed solution here is
       | probably cleaner.
       | 
       | Having portability over multiple languages maintained by large
       | organizations can be useful in some cases though.
        
       | dqpb wrote:
       | > Statically typed programming languages are catching on so why
       | don't we extend this typing to our config files?
       | 
       | What you really need is Cuelang. Cuelang does graph unification
       | over a type-value lattice. This allows the user to do progressive
       | type -> value refinement (e.g. type->range->value).
       | 
       | For configuration, this is both better than regular type systems,
       | and better than inheritance.
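
The type -> range -> value refinement described above can be illustrated with a drastically simplified Python model in which every constraint is an integer interval and unification is interval intersection. This is only a sketch of the idea; it is not CUE's implementation or syntax, and `IntRange`, `unify`, and `exactly` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRange:
    """A constraint low <= value <= high; None means unbounded on that side."""
    low: Optional[int] = None
    high: Optional[int] = None


def unify(a, b):
    """Combine two constraints; the result satisfies both or raises on conflict."""
    lows = [x for x in (a.low, b.low) if x is not None]
    highs = [x for x in (a.high, b.high) if x is not None]
    low = max(lows) if lows else None
    high = min(highs) if highs else None
    if low is not None and high is not None and low > high:
        raise ValueError(f"conflicting constraints: {a} and {b}")
    return IntRange(low, high)


def exactly(n):
    """A concrete value is the narrowest possible range."""
    return IntRange(n, n)


ANY_INT = IntRange()       # the bare type
PORT = IntRange(1, 65535)  # a range that refines the type
```

In this model `unify(ANY_INT, PORT)` gives the range, `unify(PORT, exactly(8080))` gives the value, and the order of unification does not matter. Conflicting constraints such as `unify(PORT, exactly(0))` fail instead of one side silently overriding the other.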
        
       | verdverm wrote:
       | Have you seen CUE? Just so happens v0.4.1 was released today
        
       | vmchale wrote:
       | Is the author aware of Dhall? He might be interested.
       | 
       | I think it's better in general but in any case it gets away from
       | the "everything has to be a keyed map" silliness and you get
       | sums/products.
        
       | brundolf wrote:
       | This is why, in the statically-typed programming language I'm
       | working on, the project manifest is just a file written in the
       | language itself which can export a special (typed) const to
       | configure things like the linter. It gets to piggyback off of all
       | the existing tooling for the language, particularly type checks,
       | and can even be constructed using functions, etc if desired.
        
       ___________________________________________________________________
       (page generated 2022-01-21 23:00 UTC)