= Config files and their formats

There are thousands of configuration files on your computer. You may never directly interact with the bulk of them, but they're scattered throughout your `/etc` folder, and in `~/.config` and `~/.local` and `/usr`. There are probably some in `/var` and possibly even in `/opt`. If you've ever opened one, either by accident or to make a change, you may have wondered why some configuration files look one way while others look completely different.

Storing configurations is a flexible task, because as long as a developer knows how their code puts data into a file, they can easily write code to extract that data as needed. However, the tech industry graciously favours well-documented standardization, and so several well-known formats have developed over the years to make configuration easy.

== Why we need configuration

Configuration files ("config files" for short) are important to modern computing. They're what allow you to customize how you interact with an application, or how an application interacts with the rest of your system. It's thanks to config files that any time you launch an application, it has "memories" of how you like to use it.

Configuration files can be, and often are, very simple in structure. For instance, if you were to write an application and the only thing it ever needed to know was the preferred name of its user, then its one and only config file could contain exactly one word: the name of the user.

[source,text]
----
Tux
----

Usually, though, there's more than just one piece of information an application needs to keep track of, so configuration generally uses a key and a value:

[source,bash]
----
NAME='Tux'
SPECIES='Penguin'
----

Even without programming experience, you can imagine how code parses that data.
Here's a very simple example using the https://opensource.com/article/20/9/awk-ebook[`awk` command] to focus in on just the line containing the "key" of `NAME`, and then to return the "value" appearing after the equal sign (`=`):

[source,bash]
----
$ awk -F'=' '/NAME/ { print $2; }' myconfig.ini
----

The same principle applies for any programming language and any configuration file. As long as you have a consistent data structure, you can write simple code to extract and parse it when necessary.

== Choose a format

For configuration files to be broadly effective, the most important thing is that they are consistent and predictable. The last thing you want to do is dump information into a file under the auspices of saving user preferences, and then spend days writing code to reverse engineer the random bits of information that have ended up in the file.

There are several popular formats for configuration files, each with its own strengths.

=== INI

INI files take the format of key and value pairs:

[source,ini]
----
[example]
name=Tux
style=widgety,fidgety
enabled=1
----

This simple style of configuration can be intuitive, with the only point of confusion being poor key names (for example, cryptic names like `unampref` instead of `name`). INI files are easy to parse and easy to edit.

The INI format features sections in addition to keys and values. In this sample code, `[example]` and `[demo]` are configuration sections:

[source,ini]
----
[example]
name=Tux
style=widgety,fidgety
enabled=1

[demo]
name=Beastie
fullscreen=1
----

This is a little more complex to parse, because in this case there are _two_ `name` keys. You can imagine a careless programmer querying this config file for `name` and always getting back `Beastie` because that's the last name defined by the file. When parsing such a file, a developer must be careful to search within sections for keys, which can be tricky depending on the language being used to parse the file.
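
As a rough illustration of section-aware lookup, the sketch below uses Python's standard `configparser` module on the sample data from this section. The `SAMPLE` and `load_config` names are made up for the example, and treating `style` as a comma-separated list is an assumption about how an application might interpret that value, not something the INI format defines.

```python
import configparser

# Sample data copied from the INI example in this section.
SAMPLE = """
[example]
name=Tux
style=widgety,fidgety
enabled=1

[demo]
name=Beastie
fullscreen=1
"""


def load_config(text):
    """Parse INI text into a ConfigParser object (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_config(SAMPLE)

# Look up `name` inside a specific section, so the two keys never collide.
example_name = config["example"]["name"]
demo_name = config["demo"]["name"]

# Values come back as plain strings; splitting and type conversion
# are conventions the application has to apply itself.
styles = config["example"]["style"].split(",")
enabled = config["example"].getboolean("enabled")

print(example_name, demo_name, styles, enabled)
```

Because every lookup names its section, the `name` under `[demo]` can't shadow the one under `[example]`.
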
It's a popular enough format, however, that most languages have an existing library to help programmers parse INI files.

=== YAML

YAML files are structured lists that can contain values or key and value pairs:

[source,yaml]
----
---
Example:
  Name: 'Tux'
  Style:
    - 'widgety'
    - 'fidgety'
  Enabled: 1
----

YAML is popular partly because it looks clean. That is, it doesn't have much of a syntax aside from where you place the data in relation to previous data. What's a feature for some, though, is a bug for others, and many developers avoid YAML because of the significance it places on what is essentially _not there_. If you get indentation wrong in YAML, parsers may reject your file as invalid, or they may tolerate it and return incorrect data. Most languages have YAML parsers, and there are https://yamllint.readthedocs.io/en/stable/quickstart.html[good open source YAML linters] (applications to validate syntax) available to help you ensure the integrity of a YAML file.

=== JSON

JSON is, for practical purposes, a subset of YAML, so the two share the same underlying data structures even though the syntax looks quite different:

[source,json]
----
{
  "Example": {
    "Name": [
      "Tux"
    ],
    "Style": [
      "widgety",
      "fidgety"
    ],
    "Enabled": 1
  }
}
----

JSON is popular among JavaScript programmers, which isn't surprising, given that JSON stands for JavaScript Object Notation. As a result of being strongly associated with web development, JSON is a common output format for web APIs. Most programming languages have libraries to parse JSON.

=== XML

XML uses tags as keys that surround a configuration value:

[source,xml]
----
<example>
  <name>Tux</name>
  <enabled>1</enabled>
</example>
----

XML is often used by Java programmers, and Java accordingly has a rich set of XML parsers. While it has a reputation for being quite strict, XML is simultaneously very flexible. Unlike HTML, which has a set of tags you're allowed to use, you can arbitrarily invent your own XML tags.
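
To make the idea of tags acting as keys concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `<example>`, `<name>`, and `<enabled>` tags are invented for this sketch, as is the `read_setting` helper, and interpreting `1` as "true" is an application-level assumption rather than anything XML prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are invented for this sketch.
SAMPLE = """
<example>
  <name>Tux</name>
  <enabled>1</enabled>
</example>
"""

root = ET.fromstring(SAMPLE)


def read_setting(element, tag, default=None):
    """Return the text of the first child named `tag`, or `default` if absent."""
    value = element.findtext(tag)
    return default if value is None else value.strip()


name = read_setting(root, "name")
# XML stores text only, so the "1 means on" rule lives in the application.
enabled = read_setting(root, "enabled", "0") == "1"
# A tag that isn't present falls back to the supplied default.
style = read_setting(root, "style", "none")

print(name, enabled, style)
```
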
As long as you structure it consistently and have a good library to parse it, you can extract your data with precision and ease. There's a great http://www.xmlsoft.org/[xmllint] application to help you validate XML files, and most programming languages have a library to parse XML.

== Binary formats

Linux prides itself on plain text configuration. The advantage here is that you can see configuration data using basic tools like https://opensource.com/article/19/2/getting-started-cat-command[cat], and you can even edit a configuration with your https://opensource.com/article/21/2/open-source-text-editors[favourite text editor].

Some applications use binary formats, though, meaning that the data is encoded in some format that is not a natural language. These files usually require a special application (usually the application they're meant to configure) to interpret their data. You can't view these files, or at least not in a way that makes any sense, and you can't edit them outside of their host application. Some reasons for resorting to binary formats are:

* Speed: A programmer can register specific bits of information at certain points within a binary config file using custom notation. When the data is extracted, there's no searching involved because everything is already indexed.
* Size: Text files can get big, and should you choose to compress a text file, you're functionally turning it into a binary format. Binary files can be made smaller through tricks of encoding. (The same is actually true of text files, but at some point your optimizations make your data so obscure that it may as well be binary.)
* Obfuscation: Some programmers don't want people looking at even their configuration files, so they encode them as binary data. This usually succeeds only in frustrating users, and it is not a good reason to use binary formats.

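
The "Speed" point in the list above can be sketched with Python's standard `struct` module. This is only an illustration of reading a value from a known offset, not a recommended format: the record layout (a 16-byte name followed by two one-byte flags) and all function names are hypothetical.

```python
import struct

# Hypothetical layout, invented for this sketch:
#   bytes 0-15  name (UTF-8, padded with NUL bytes)
#   byte  16    enabled flag
#   byte  17    fullscreen flag
RECORD = struct.Struct("<16s??")


def pack_config(name, enabled, fullscreen):
    """Encode the settings into a fixed-size binary record."""
    # Names longer than 16 bytes would be silently truncated by struct.
    return RECORD.pack(name.encode("utf-8"), enabled, fullscreen)


def unpack_config(blob):
    """Decode the whole record back into Python values."""
    raw_name, enabled, fullscreen = RECORD.unpack(blob)
    return raw_name.rstrip(b"\x00").decode("utf-8"), enabled, fullscreen


def read_enabled(blob):
    """Read only the enabled flag, straight from its known offset."""
    return struct.unpack_from("<?", blob, 16)[0]


blob = pack_config("Tux", True, False)
print(len(blob), unpack_config(blob), read_enabled(blob))
```

Nothing has to be searched for, because every field sits at a fixed position. The cost is that the file is unreadable without knowing the layout, which is exactly the trade-off described above.
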

If you must use a binary format for configuration, use one that already exists as an open standard, such as https://www.unidata.ucar.edu/software/netcdf/[NetCDF].

== Find what works

Configuration formats are meant to help developers store the data their applications need, and to help users store preferences for how they want applications to act. There's probably no "wrong" answer to the question of what format you should use, as long as you feel well supported by the language you're using. When developing your application, look at the formats available, model some sample data, review and evaluate the libraries and utilities your programming language provides, and choose the one you feel the most confident about.
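
One way to "model some sample data" is to write the same settings out in two candidate formats and see how much glue code each one needs. The sketch below does that with Python's standard `json` and `configparser` modules. The `settings` dictionary, the helper names, and the conventions for storing a list and a boolean in INI are all assumptions made for the example.

```python
import configparser
import io
import json

# Sample settings borrowed from the examples in this article.
settings = {"name": "Tux", "style": ["widgety", "fidgety"], "enabled": True}


def to_json(data):
    """Serialize the settings under an `example` key."""
    return json.dumps({"example": data}, indent=2)


def to_ini(data):
    """Serialize the settings into an `[example]` section."""
    parser = configparser.ConfigParser()
    # INI values are strings, so lists and booleans need a convention.
    parser["example"] = {
        "name": data["name"],
        "style": ",".join(data["style"]),
        "enabled": "1" if data["enabled"] else "0",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


json_text = to_json(settings)
ini_text = to_ini(settings)

# JSON round-trips the types; INI needs the conventions reversed by hand.
from_json = json.loads(json_text)["example"]
reparsed = configparser.ConfigParser()
reparsed.read_string(ini_text)
from_ini = {
    "name": reparsed["example"]["name"],
    "style": reparsed["example"]["style"].split(","),
    "enabled": reparsed["example"].getboolean("enabled"),
}

print(from_json == settings, from_ini == settings)
```

Both formats can carry the data, but comparing the two loaders side by side shows where each one pushes work onto your own code.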