Posted on May 31, 2023 • Originally published at brettops.io on May 29, 2023

Loading config files in Python

Config files are everywhere. There are lots of reasons your app might need to have one:

You have configuration that you want to persist beyond a reboot.
Your configuration represents a physical state; for example, it contains the settings for peripheral devices, a stored procedure for accomplishing a task, or maybe it expresses the layout of the live user interface.
Your app's configuration cannot be easily expressed as a series of variables. CI pipelines, workflows, etc. feature a lot of complex nesting, repeated blocks, and even internal linking.
You want the app to be able to persist its own changes to configuration, like changes to window sizes, menu settings, or credentials. In this case, the config file functions more as a database than as something the user writes.

In all of these cases, the structure of the config is very important and likely long-lived. Mistakes in your config syntax will be hard to undo, so it pays to have a plan upfront, and design for it to be extended and documented.

In this article, we'll learn how to load YAML config files in a way that is clean, easy to support, and easy to extend. We'll do this by creating our own YAML task automation syntax, which we'll call taskbook files:

    # taskbook.yml
    group: # name of group
    tasks: # list of tasks
      - name: # name of task
        module: # module to use
        options: # key / value options
      # ...

We'll write a program to read them, which we'll call Taskable*. When finished, it will be easy to see which fields are supported, validate config values safely, add more fields for future needs, and even access config values within our program as properties.

*Any similarity to Ansible playbook syntax, real or imagined, is purely coincidental. 😂

Create a command line tool

Let's create a file called taskable.py, to contain our implementation of Taskable:

    # taskable.py
    import argparse


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()


    if __name__ == "__main__":
        main()

This provides the scaffolding for an argparse command line interface (for more info, see our article on Python CLIs).

You can run the script as follows:

    $ python3 taskable.py
    usage: taskable.py [-h] file
    taskable.py: error: the following arguments are required: file
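One detail of the scaffold worth knowing: argparse.FileType opens the file while the arguments are being parsed and leaves closing it to the interpreter. As a hedged alternative, the sketch below accepts the argument as a pathlib.Path and opens it in a with-block. The build_parser and read_text helpers are names made up for this illustration and are not part of Taskable:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Path only converts the argument; nothing is opened during parsing.
    parser.add_argument("file", type=Path)
    return parser


def read_text(argv: list[str]) -> str:
    """Parse argv and return the contents of the named file."""
    args = build_parser().parse_args(argv)
    # The with-block closes the file as soon as its contents are read.
    with args.file.open("r", encoding="utf-8") as handle:
        return handle.read()
```

For a short-lived script either approach works. The rest of this article sticks with FileType to keep the code small.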

To be able to read in a file, we need to create the file first, which we'll do next.

Create a taskbook file

I'll be using YAML for the config files because it's easy to read and I'm comfortable with it, but you can easily support JSON or TOML, as they offer similar APIs.
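To illustrate how similar those APIs are, here is a rough sketch of a parser that picks a library based on the file suffix. The parse_config function is a hypothetical helper, not something the rest of this article uses. It assumes Python 3.11 or newer for tomllib and the pyyaml package for the YAML branch, and imports both lazily so the JSON branch works without them:

```python
import json
from typing import Any


def parse_config(text: str, suffix: str) -> Any:
    """Parse config text with the parser that matches the file suffix."""
    if suffix == ".json":
        return json.loads(text)
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11 onward

        return tomllib.loads(text)
    if suffix in {".yml", ".yaml"}:
        import yaml  # the pyyaml package

        return yaml.safe_load(text)
    raise ValueError(f"unsupported config format: {suffix}")
```

All three branches return plain dictionaries and lists, so whatever consumes the parsed data does not need to know which format it came from.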

Create a taskbook.yml file and add the following:

    # taskbook.yml
    group: localhost
    tasks:
      - name: copy file.txt to the place
        module: saucy.copy
        options:
          source: file.txt
          dest: /etc/file.txt
      - name: install a package
        module: cheesy.package
        options:
          name:
            - fzf
            - tree
          upgrade: true
      - name: enable the service
        module: lettuce.service
        options:
          enable: true
          start: true

At this point, we'll be able to run the following:

    python3 taskable.py taskbook.yml

However, nothing will happen because our app doesn't print anything yet.

Read in the YAML file

YAML files are easy to read with Python. There are multiple libraries available, but pyyaml is the de facto standard and is often already installed on your system.

If you don't have pyyaml (or you're using a virtual environment because you're awesome), install it now:

    pip install pyyaml

Then, in your taskable.py file, import the yaml package and read in the YAML file:

    import yaml

    ...

    data = yaml.safe_load(args.file)

Our taskable.py file so far:

    # taskable.py
    import argparse

    import yaml


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()

        data = yaml.safe_load(args.file)


    if __name__ == "__main__":
        main()

At this point, you will be able to read in the YAML file, but there's still no output just yet. We could stop here and access its values as nested dictionaries and arrays, like so:

    data["tasks"][0]["module"]

...but there are a couple of problems with this.

First, there's no validation at all, so a malformed config file has unpredictable results. Second, dictionary keys are just strings, so IDE auto-completion won't work, renaming a field means manually searching through the code, and a misspelled field name goes unnoticed until the line that uses it actually runs.
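To make the second problem concrete, here is a small sketch. The dictionary is a hand-written stand-in for what yaml.safe_load might return for a taskbook in which "module" was misspelled as "modul"; no YAML parsing is involved:

```python
# Stand-in for parsed YAML in which "module" was misspelled as "modul".
data = {
    "group": "localhost",
    "tasks": [{"name": "install a package", "modul": "cheesy.package"}],
}

# Nothing complains while the data is loaded or passed around.
first_task = data["tasks"][0]

# .get() hides the typo completely by returning None.
silent = first_task.get("module")

# Plain indexing does fail, but only when this line finally runs.
try:
    module = first_task["module"]
except KeyError as exc:
    error = f"missing key: {exc}"

print(silent)
print(error)
```

In a long-running program the failing lookup might sit far away from the code that loaded the file, which makes this kind of mistake tedious to trace.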

No, we can do a lot better, and we will, starting by building a model of our data in the next section.

Create the data model

We need a way to express our data format so that it's functional. For this purpose, I prefer to use attrs, which gives us data validation, makes our classes more performant, allows us to access our fields as properties with dramatically less boilerplate, and more.

Let's install attrs:

    pip install attrs

Then add the following to your taskable.py file:

    from typing import Any

    ...

    from attrs import define, field


    @define
    class Task:
        name: str
        module: str
        options: dict[str, Any] = field(factory=dict)


    @define
    class Taskbook:
        group: str
        tasks: list[Task]

Our taskable.py file so far:

    # taskable.py
    import argparse
    from typing import Any

    import yaml
    from attrs import define, field


    @define
    class Task:
        name: str
        module: str
        options: dict[str, Any] = field(factory=dict)


    @define
    class Taskbook:
        group: str
        tasks: list[Task]


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()

        data = yaml.safe_load(args.file)


    if __name__ == "__main__":
        main()

These two classes—Task and Taskbook—fully express the taskbook format. We won't instantiate them ourselves though, because we'll learn a method to do so automagically in the next section.

Structurize into models

"Structurize" is a $6 word (that I may have made up) that translates to, "load all your data into fancy model classes." I'm using it because "de-serialize" sounds awful and is harder to type. 😝

The easiest way to structurize your YAML data into attrs classes is by using the cattrs package. The simplest usage looks like this:

    import cattrs

    taskbook = cattrs.structure(data, Taskbook)

Let's add it to our taskable.py file:

    # taskable.py
    import argparse
    from typing import Any

    import cattrs
    import yaml
    from attrs import define, field


    @define
    class Task:
        name: str
        module: str
        options: dict[str, Any] = field(factory=dict)


    @define
    class Taskbook:
        group: str
        tasks: list[Task]


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()

        data = yaml.safe_load(args.file)
        taskbook = cattrs.structure(data, Taskbook)


    if __name__ == "__main__":
        main()

That's all you need! cattrs will load the data into attrs classes after only being given the expected top-level class, which is Taskbook here.

If you need to tweak the behavior, cattrs provides a hook mechanism. It's a bit cumbersome, but it's easier than writing all the structurization code from scratch.

In the next section, we'll work on doing something useful with our data.

Use the data

At this point, we've fully structurized our data into classes, which means we can access our config data like this:

    taskbook.tasks[0].module

This makes our code much easier to read and work with. Now we'll try using it to do stuff.

"Run" tasks

What good is our script if it can't run tasks? Let's simulate "running" our hypothetical tasks by adding the following to our taskable.py file:

    ...

    print("group", taskbook.group)
    for task in taskbook.tasks:
        print(f"run {task.module}: {task.name}")

    ...

Our taskable.py file so far:

    # taskable.py
    import argparse
    from typing import Any

    import cattrs
    import yaml
    from attrs import define, field


    @define
    class Task:
        name: str
        module: str
        options: dict[str, Any] = field(factory=dict)


    @define
    class Taskbook:
        group: str
        tasks: list[Task]


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()

        data = yaml.safe_load(args.file)
        taskbook = cattrs.structure(data, Taskbook)

        print("group", taskbook.group)
        for task in taskbook.tasks:
            print(f"run {task.module}: {task.name}")


    if __name__ == "__main__":
        main()

Running our hypothetical tasks will output the following:

    $ python3 taskable.py taskbook.yml
    group localhost
    run saucy.copy: copy file.txt to the place
    run cheesy.package: install a package
    run lettuce.service: enable the service

It's not hard to imagine connecting this skeleton to real module implementations to drive real task execution.
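One way that connection might look is a registry that maps module names to plain functions. Everything in this sketch is hypothetical: the handler functions only describe what they would do, and the HANDLERS table and run_module helper are names invented for the illustration:

```python
from typing import Any, Callable

# A handler takes a task's options and returns a description of its work.
Handler = Callable[[dict[str, Any]], str]


def copy_file(options: dict[str, Any]) -> str:
    return f"would copy {options['source']} to {options['dest']}"


def install_package(options: dict[str, Any]) -> str:
    return "would install " + ", ".join(options["name"])


# Hypothetical registry: module name from the taskbook -> implementation.
HANDLERS: dict[str, Handler] = {
    "saucy.copy": copy_file,
    "cheesy.package": install_package,
}


def run_module(module: str, options: dict[str, Any]) -> str:
    """Look up the handler for a module name and call it."""
    try:
        handler = HANDLERS[module]
    except KeyError:
        raise ValueError(f"unknown module: {module}") from None
    return handler(options)


print(run_module("saucy.copy", {"source": "file.txt", "dest": "/etc/file.txt"}))
```

Inside the task loop, the call would be something like run_module(task.module, task.options), so the structured Task objects feed the registry directly.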

List used modules

Maybe we'd like to inspect our taskbook to find out what modules it uses. This would be useful, for example, to install necessary modules before running our tasks.

Let's add a -l / --list option to list used modules and exit without running the tasks:

    ...

    parser.add_argument("-l", "--list", action="store_true")

    ...

    if args.list:
        used_modules = sorted(list(set(task.module for task in taskbook.tasks)))
        for module in used_modules:
            print(module)
        return

    ...

Our taskable.py file so far:

    # taskable.py
    import argparse
    from typing import Any

    import cattrs
    import yaml
    from attrs import define, field


    @define
    class Task:
        name: str
        module: str
        options: dict[str, Any] = field(factory=dict)


    @define
    class Taskbook:
        group: str
        tasks: list[Task]


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("-l", "--list", action="store_true")
        parser.add_argument("file", type=argparse.FileType("r"))
        args = parser.parse_args()

        data = yaml.safe_load(args.file)
        taskbook = cattrs.structure(data, Taskbook)

        if args.list:
            used_modules = sorted(list(set(task.module for task in taskbook.tasks)))
            for module in used_modules:
                print(module)
            return

        print("group", taskbook.group)
        for task in taskbook.tasks:
            print(f"run {task.module}: {task.name}")


    if __name__ == "__main__":
        main()

Running taskable.py with the list mode enabled:

    $ python3 taskable.py -l taskbook.yml
    cheesy.package
    lettuce.service
    saucy.copy

Woot! Static analysis! And it was easy to implement because our data model is so well-defined.
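The same list could feed a pre-flight check that reports modules a taskbook needs but the system lacks. This is only a sketch: the missing_modules helper and the installed set are made up, since Taskable has no real notion of installed modules:

```python
def missing_modules(used: list[str], installed: set[str]) -> list[str]:
    """Return the used modules that are not installed, sorted by name."""
    return sorted(set(used) - installed)


# Module names as they appear in taskbook.yml.
used = ["saucy.copy", "cheesy.package", "lettuce.service"]

# Hypothetical: pretend only one module is available locally.
installed = {"saucy.copy"}

missing = missing_modules(used, installed)
print(missing)
```

A check like this can run before any task does, so a missing module is reported up front instead of halfway through a run.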

Summary

In this tutorial, we've built up a versatile config loading mechanism.

This setup works as well for tiny command line utilities as it does for large and complex file formats like task workflows, specifications, and so on. You can continue growing your application by adding new fields and new data models, and avoid the malignant technical debt that springs from a muddled early config implementation.

The best part? Your configuration will be stable and serve as the bedrock of your application, now and in the future. In the words of Eric S. Raymond:

Smart data structures and dumb code works a lot better than the other way around.

— Eric S. Raymond

Stay smart, people! 😄
