Features

In this section, we’ll talk briefly about the features of CFG and how they meet the requirements listed in the section entitled “What, another new configuration format?”.

High-Level Description

CFG is a superset of JSON, which means that a legal JSON configuration can serve unchanged as a CFG configuration. This meets the following requirements:

  • Supports a hierarchical structure with no arbitrary limit on depth of nesting.
  • Legal JSON is accepted, aiding transition from JSON to CFG.
  • The full range of Unicode characters can be used.
  • Order independence, except in lists.

The file naming convention is to use the .cfg extension for files containing CFG.

More details are provided in the sections below.

Elements

A CFG configuration consists of a number of types of elements:

  • Mappings: these map strings to other elements, like a Python dictionary. The root (starting point) of a configuration is always a mapping.
  • Lists: these hold arbitrary elements and are heterogeneous (i.e. there is no need for all elements in a list to be of the same type).
  • Scalar values: these fall into one or more of the following categories:
    • Strings
    • Identifiers
    • Integers
    • Floating-point numbers
    • Complex numbers
    • Boolean values
    • Null
  • Includes: these allow configurations to contain other configurations.
  • References: these allow referring to parts of a configuration from another part.
  • Special values: these represent values available to the application using the configuration, such as environment variables and internal program values.
  • Expressions: these allow combining parts of a configuration using other parts of it.
  • Comments: these allow a configuration to be documented.

These elements are described in separate sections below.

Example

The following example illustrates a simple configuration, showing instances of all the elements described above:

# You can have comments anywhere in a configuration. Only line comments are
# supported, as you can easily comment and uncomment multiple lines using
# a modern editor or IDE.
{
  # You can have standard JSON-like key-value mapping.
  "writer": "Oscar Fingal O'Flahertie Wills Wilde",
  # But also use single-quotes for keys and values.
  'a dimension': 'length: 5"',
  # You can use identifiers for the keys.
  string_value: 'a string value',
  integer_value: 3,
  # you can use = instead of : as a key-value separator
  float_value = 2.71828,
  # these values are just like in JSON
  boolean_value: true,
  opposite_boolean_value: false,
  null_value: null
  list_value: [
    123,
    4.5     # note the absence of a comma - a newline acts as a separator, too.
    2j,     # a complex number with just an imaginary part
    1 + 3j  # another one with both real and imaginary parts
    [
      1,
      'A',
      2,
      'b',  # note the trailing comma - doesn't cause errors
    ]
  ]  # a comma isn't needed here.
  nested_mapping: {
    integer_as_hex: 0x123
    float_value: .14159,  # note the trailing comma - doesn't cause errors
  } # no comma needed here either.
  # You can use escape sequences in strings ...
  snowman_escaped: '\u2603'
  # or not, and use e.g. utf-8 encoding.
  snowman_unescaped: '☃'
  # You can refer to code points outside the Basic Multilingual Plane
  face_with_tears_of_joy: '\U0001F602'
  unescaped_face_with_tears_of_joy: '😂'
  # Include sub-configurations.
  logging: @'logging.cfg',
  # Refer to other values in this configuration.
  refer_1: ${string_value},                  # -> 'a string value'
  refer_2: ${list_value[1]},                 # -> 4.5
  refer_3: ${nested_mapping.float_value},    # -> 0.14159
  # Special values are implementation-dependent. On Python, for example:
  s_val_1: `sys:stderr`,                     # -> module attribute sys.stderr
  s_val_2: `$LANG|en_GB.UTF-8`               # -> environment var with default
  s_val_3: `2019-03-28T23:27:04.314159`      # -> date/time value
  # Expressions.
  # N.B. backslash immediately followed by newline is seen as a continuation:
  pi_approx: ${integer_value} + \
             ${nested_mapping.float_value}   # -> 3.14159
  sept_et_demi: ${integer_value} + \
                ${list_value[1]}             # -> 7.5
}

The individual elements in the example are discussed in more detail below.

Mappings

Mappings are like Python dictionaries or JavaScript objects – they map keys to values. Keys must be string values, but they can be given either as string literals or as identifiers (starting with a letter or underscore, and followed by any number of alphanumeric characters or underscores). Values can be any element type.

A configuration loaded by a program must be a mapping or a mapping body – i.e. a mapping without the { and } characters which normally bracket it. Thus, a configuration could be expressed as either

{
  foo: 'bar',
  bar: 'baz',
}

or equivalently as

foo = 'bar'
bar = 'baz'

Note that you can use either a colon : or an equals sign = to separate key from value. This can help when migrating from other configuration formats which use = as a key/value separator.

Either commas or newlines can be used to separate elements in a mapping. You can use multiple newlines between two elements, but not multiple commas. A trailing comma is allowed, and ignored.

Keys and Values

Mapping keys are always considered as literal strings, whether they are present in the source as identifiers or string literals. Thus, the following:

foo = 'bar'
bar = 'baz'

is the same as:

'foo' = 'bar'
'bar' = 'baz'

However, identifiers seen in mapping values are not necessarily treated as literal strings. In the following:

foo = 'bar'
bar = baz

the value of bar could be interpreted as either the string value baz, or the result of looking up a value in some context using baz as the key. This allows us to consider passing in a context mapping where certain specific items will be looked up based on keys. This makes baz effectively a variable in the above configuration, where the variable’s value is determined by the software that uses the configuration.

Here’s an example. Let the configuration be:

test0b.cfg
foo: fizz
bar: buzz

You can then pass in a variables context when the Config instance is initialized:

>>> variables = {'fizz': 'Fizz Fizz', 'buzz': 'Buzz Buzz'}
>>> with open('test0b.cfg') as f: cfg = config.Config(f, context=variables)
...
>>> cfg['foo']
'Fizz Fizz'
>>> cfg['bar']
'Buzz Buzz'

In cfg['foo'], the configuration was queried to get the identifier fizz, which, as it’s not a literal string and a context was provided, was used as a lookup key in the context to get the final result, 'Fizz Fizz'.

Variables can be used to provide application-specific information which is then usable in the configuration. An example of such information would be the current user’s home directory. The following configuration, for example:

test0c.cfg
bin: home + '/bin'
lib: home + '/lib'

could be used like this:

>>> import os
>>> variables = {'home': os.path.expanduser('~')}
>>> with open('test0c.cfg') as f: cfg = config.Config(f, context=variables)
...
>>> cfg['bin'] == os.path.expanduser('~/bin')
True
>>> cfg['lib'] == os.path.expanduser('~/lib')
True

Of course, identifiers in values can be configured to be treated as literal strings, but this is less useful than the recommended convention – which is to use identifiers for keys wherever possible, and use literal strings for string values.

Paths

Along with mappings, you have the concept of a path which is used to access something in the mapping. The simplest path is just a key which allows you to access the corresponding value, but you can also extend the path to refer to elements in nested mappings and/or lists.

A path consists of a sequence of segments, starting with an identifier. The following are all syntactically valid paths:

  • first_part: this path has just one segment; it’s the key into the mapping.
  • first_part.second_part: this is semantically valid if the first_part key’s value is a mapping, and second_part is a key in that mapping. The result is the value associated with the second_part key.
  • first_part['second_part']: this is equivalent to the path just above.
  • first_part[2]: this is semantically valid if first_part refers to a list which has at least three elements, and the result is the third element.
  • first_part[0].second_part['foo'].third_path: this would traverse to first_part, which should be a list, then to its first element (i.e. at index 0), which should be a mapping, then get the value there at key 'second_part', which should also be a mapping, then get the value there at key 'foo', which should be a mapping, and then fetch the value at key 'third_path' as the final result.
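
The traversal described in the last bullet can be sketched in Python. This is an illustration only, not the parser used by any CFG implementation: resolve_path and its regular expressions are hypothetical names, and the sketch handles only identifier, quoted-key and integer-index segments (no slices).

```python
import re

# One path segment after the leading identifier (hypothetical grammar).
_SEGMENT = re.compile(
    r"""\.(?P<attr>[A-Za-z_]\w*)
      | \[(?P<q>['"])(?P<key>.*?)(?P=q)\]
      | \[(?P<index>-?\d+)\]
    """,
    re.VERBOSE,
)
_IDENT = re.compile(r"[A-Za-z_]\w*")


def resolve_path(root, path):
    """Walk `root` (nested dicts and lists) following `path`."""
    m = _IDENT.match(path)
    if m is None:
        raise ValueError("a path must start with an identifier: %r" % path)
    current = root[m.group()]
    pos = m.end()
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError("invalid path at offset %d: %r" % (pos, path))
        if m.group("attr") is not None:      # .second_part
            current = current[m.group("attr")]
        elif m.group("key") is not None:     # ['second_part']
            current = current[m.group("key")]
        else:                                # [2]
            current = current[int(m.group("index"))]
        pos = m.end()
    return current


config = {
    "first_part": [
        {"second_part": {"foo": {"third_path": "found it"}}},
    ],
}
print(resolve_path(config, "first_part[0].second_part['foo'].third_path"))
```

A path that is syntactically valid but semantically wrong for the data (a missing key, or an index past the end of a list) surfaces here as an ordinary KeyError or IndexError, while a malformed path such as one ending in a dot raises ValueError.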

Note

Paths start with identifiers, which means that to use them, you generally need to arrange your root configurations with keys that are identifiers, rather than, say, keys which can’t be represented by identifiers. In the following example

identifier_key: {
  sub_key: {
    sub_sub_key: 'foo'
  }
},
'hyphenated-key': {
  sub_key: {
    sub_sub_key: 'bar'
  }
}

you can access the 'foo' value via the path identifier_key.sub_key.sub_sub_key (and other equivalent forms), but you can’t do that for the 'bar' value – you have to use a slightly more verbose form:

>>> cfg['identifier_key.sub_key.sub_sub_key']
'foo'
>>> cfg['hyphenated-key']['sub_key']['sub_sub_key']
'bar'

Slices

As well as integer indices into lists, CFG also supports slices. A slice index is written first_part[start:stop:step], where first_part must be a list and each of start, stop and step must either be absent or be an integer value (Python developers should be familiar with slices). Applied to a list, a slice produces another list (the result): its first element is the element at index start of the source list, every step-th element after that is picked, and selection ends just before the element at index stop.

If start is omitted, it’s taken to mean the start of the source list. If stop is omitted, it’s taken to mean the end of the source list. If step is omitted, it is taken as 1, meaning every element between start and stop is taken. Negative values for start and stop are computed from the end of the source list. A negative value for step means count backwards: in this case, start should be greater than stop, whereas normally start is expected to be smaller than stop. With a negative step, an omitted start means the end of the source list and an omitted stop means its beginning, which is why foo[::-1] yields the list reversed.
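
Because these rules appear to mirror Python’s own list slicing, a few lines of Python can be used to check what a given slice should produce. The sketch below only exercises Python’s slice semantics on an assumed sample list called nums; it does not run a CFG parser.

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Omitted start and stop: the whole list, taking every third element.
print(nums[::3])
# Negative start: counted from the end of the list.
print(nums[-3:])
# Negative step: walk backwards, so start is greater than stop.
print(nums[8:2:-2])
# Out-of-range bounds are clamped to the ends of the list.
print(nums[-20:4])
```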

Examples of slices are as follows, assuming foo is the list ['a', 'b', 'c', 'd', 'e', 'f', 'g']:

Path expression   Result
foo[:]            ['a', 'b', 'c', 'd', 'e', 'f', 'g']
foo[::]           ['a', 'b', 'c', 'd', 'e', 'f', 'g']
foo[:20]          ['a', 'b', 'c', 'd', 'e', 'f', 'g']
foo[-20:4]        ['a', 'b', 'c', 'd']
foo[2:]           ['c', 'd', 'e', 'f', 'g']
foo[-3:]          ['e', 'f', 'g']
foo[-2:2:-1]      ['f', 'e', 'd']
foo[::-1]         ['g', 'f', 'e', 'd', 'c', 'b', 'a']
foo[2:-2:2]       ['c', 'e']
foo[::2]          ['a', 'c', 'e', 'g']
foo[::3]          ['a', 'd', 'g']

Invalid paths

The following are examples of paths which would be invalid:

  • foo[]: this path does not specify an index into foo, so it can’t be used.
  • foo[1, 2]: since 2-dimensional arrays aren’t supported, 1, 2 is not a usable index into foo.
  • foo.: since there is nothing following the dot, it’s not clear how to access any value beyond foo.
  • foo.123: since there is not an identifier that follows the dot, it’s not clear how to access any value beyond foo.
  • foo[1] bar: There is extraneous text bar following a valid path, which makes the path as a whole invalid.
  • foo[:::]: There is an extraneous : in the slice.

Paths are used both in the Application Programming Interfaces and in the configuration itself, via References.

Lists

CFG lists are like Python lists or JavaScript arrays. They are inherently ordered, and also heterogeneous (i.e. elements in a list don’t all have to be of the same type).

Either commas or newlines can be used to separate elements in a list. You can use multiple newlines between two elements, but not multiple commas. A trailing comma is allowed, and ignored. A comma followed by newlines is accepted.

In paths, lists can be indexed by integers or by slices. See the section on slices for more information.

Scalar Values

If a configuration is like a tree, then scalar values are leaves of the tree. The following subsections describe the different kinds of scalar values you can have.

Strings

Strings can be single- or double-quoted, and can span multiple lines using triple-quoted forms, as in Python:

{
  strings: [
    "Oscar Fingal O'Flahertie Wills Wilde"
    'size: 5"'
    """Triple quoted form
  can span
  'multiple' lines"""
   '''with "either"
  kind of 'quote' embedded within'''
  ]
}

In the triple-quoted forms, newlines and other whitespace are preserved exactly as they appear in the source.

Unicode can be entered using escape sequences, just as in JSON:

snowman_escaped: '\u2603'

but also using literal Unicode, as in the following example:

snowman_unescaped: '☃'

If you do this, you should encode the configuration file using UTF-8, as this is the default encoding used to read .cfg files.

You can include characters outside the Basic Multilingual Plane:

face_with_tears_of_joy: '\U0001F602'
unescaped_face_with_tears_of_joy: '😂'

Again, if you use the latter form, make sure to encode the configuration file using UTF-8.

Note

There is currently no provision, as there is in some formats, for raw strings – i.e. strings which don’t use escapes. In such strings, backslashes are treated literally: for example, 'c:\Users\Me' would be valid for a Windows path. In CFG, you would need to write this as 'c:\\Users\\Me'. The need for unescaped strings generally arises in two areas – Windows paths and regular expressions.

Identifiers

The CFG format allows you to specify identifiers as values. How they are interpreted is implementation-specific:

  • They could be interpreted as literal strings – the identifier foo is interpreted as the literal string 'foo'.
  • They could raise errors if used.
  • They could be used as keys to look up values in some context.

Identifiers can be used in paths:

some_stuff: {
  foo: 'a value',
  bar: 'another value'
}
ref_foo_1: ${some_stuff.foo}     # -> 'a value'
ref_foo_2: ${some_stuff['foo']}  # behaves the same as the line above
ref_foo_3: ${some_stuff[foo]}    # behaviour is implementation-dependent.

When the reference at ref_foo_3 comes across the identifier foo, then what happens is determined by which of the three methods above is used to resolve identifiers:

  • In the first case, the value of the reference is the same as the values of ref_foo_1 and ref_foo_2.
  • In the second case, an error will be raised.
  • In the third case, the identifier foo is used as a lookup key in some implementation-defined context, and the resulting value is the value of the reference.
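
The three outcomes can be made concrete with a small Python sketch. The function resolve_identifier and its strategy names ('literal', 'error', 'lookup') are hypothetical and exist only for this illustration; they are not part of any CFG implementation’s API.

```python
def resolve_identifier(name, strategy, context=None):
    """Resolve an identifier used as a value under one of three strategies."""
    if strategy == 'literal':
        # First case: the identifier stands for the string of the same name.
        return name
    if strategy == 'error':
        # Second case: identifiers are not allowed as values.
        raise ValueError('identifier not allowed as a value: %r' % name)
    if strategy == 'lookup':
        # Third case: the identifier is a key into a caller-supplied context.
        if context is None or name not in context:
            raise KeyError(name)
        return context[name]
    raise ValueError('unknown strategy: %r' % strategy)


some_stuff = {'foo': 'a value', 'bar': 'another value'}

# literal: ${some_stuff[foo]} behaves like ${some_stuff['foo']}
print(some_stuff[resolve_identifier('foo', 'literal')])
# lookup: the context decides which key is actually used
print(some_stuff[resolve_identifier('foo', 'lookup', {'foo': 'bar'})])
```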

See the section entitled Application Programming Interfaces for more information on how different implementations deal with identifiers.

Integers

You can specify integer values using a range of notations:

decimal_integer = 123
hexadecimal_integer = 0x123
octal_integer = 0o123  # Python-style. C-style octal literals not supported!
binary_integer = 0b000100100011

You can also use underscores in numbers to improve readability. Numbers may contain single underscores as separators between digits. They should not end in an underscore (they can’t start with one, as that would be interpreted as an identifier). Thus, the following forms are allowed:

decimal_integer = 1234_5678
hexadecimal_integer = 0x789A_BCDE_F012
octal_integer = 0o123_321
binary_integer = 0b0001_0010_0011
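
Python’s own integer literals use the same prefixes and underscore rules, so the built-in int() with base 0 is a convenient way to check which value a given literal denotes. This is only a cross-check using Python’s parser, on the assumption that the two notations agree; it is not how a CFG implementation necessarily reads numbers.

```python
literals = {
    'decimal_integer': '1234_5678',
    'hexadecimal_integer': '0x789A_BCDE_F012',
    'octal_integer': '0o123_321',
    'binary_integer': '0b0001_0010_0011',
}

# Base 0 tells int() to infer the base from the prefix, as in source code.
values = {name: int(text, 0) for name, text in literals.items()}
for name, value in values.items():
    print(name, '=', value)

# A C-style octal literal such as 0123 is rejected rather than misread.
try:
    int('0123', 0)
except ValueError as exc:
    print('rejected:', exc)
```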

Floating-point numbers

Floating-point values can be expressed in a number of ways:

common_or_garden = 123.456
leading_zero_not_needed = .123
trailing_zero_not_needed = 123.
scientific_large = 1.e6
scientific_small = .1e-6
negated = -.1e-6

The precision with which floating-point values are stored internally is implementation-specific. Current implementations support 64-bit precision (sometimes called double precision).

You can also use single underscores as digit separators for readability, as for integers. For example:

common_or_garden = 123_456.78_90
leading_zero_not_needed = .12_3_4
trailing_zero_not_needed = 1_2_3.
scientific_large = 1_0.e6_2
scientific_small = .1_0e-6_0

Complex numbers

Complex numbers are represented by an integer or floating-point value immediately followed by a j. There must be no space between the number value and the j. This represents just the imaginary part of the number – you can make complex values with both real and imaginary parts using a form such as 1 + 2j (see the section on Expressions, below).

Boolean values

The Boolean values are represented in CFG using false and true, just as in JSON.

Null

The null value is represented in CFG using null, just as in JSON. Note that in implementations which don’t support null (such as Rust), the value will be a specific one representing a null value. In Python, the corresponding value is None.

Includes

Includes are a mechanism for breaking up a configuration into smaller sub-configurations. While not needed at all for small configurations, they can be very useful when a configuration gets large, or for sharing common elements of configuration across related projects. This has some other benefits:

  • The responsibility for maintaining the configuration as a whole could be shared between different people.
  • Parts of the configuration could be put under change control and digitally signed to check against changes/tampering.

An include is described by the unary @ operator: the operand should resolve to a literal string, which is interpreted as the location of the included configuration. The literal string can be interpreted in a way which is implementation-specific:

  • It could be parsed as a URL and fetched from a local or remote location in a standardised way.
  • If not a URL, it could be interpreted as a filename, and if a relative filename, it could be looked for relative to the directory of the including configuration (if known).
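
One possible reading of those two rules is sketched below in Python. The function classify_include and the set of schemes it treats as URLs are assumptions made for this example, and POSIX-style paths are used so that the output is the same on every platform; a real implementation may decide differently.

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of schemes, chosen for illustration only.
URL_SCHEMES = {'http', 'https', 'file'}


def classify_include(location, including_dir=None):
    """Return ('url', location) or ('file', path) for an include operand."""
    if urlparse(location).scheme in URL_SCHEMES:
        return ('url', location)
    # Relative filenames are looked for next to the including configuration,
    # when its directory is known.
    if including_dir is not None and not posixpath.isabs(location):
        location = posixpath.join(including_dir, location)
    return ('file', location)


print(classify_include('logging.cfg', '/etc/myapp'))
print(classify_include('/opt/shared/base.cfg', '/etc/myapp'))
print(classify_include('https://example.com/conf/base.cfg'))
```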

See the section on Application Programming Interfaces for more information.

References

References allow a part of the configuration to refer to another part. This is very useful to avoid unnecessary repetition.

They have the syntax ${ ... } where the thing between the curly brackets is a path (see the Paths section).

References can refer to things in sub-configurations that they include, but they cannot refer to anything in “parent” configurations that include them. That’s because multiple places might point to a particular sub-configuration.

So for example, if we have

# webapp.cfg
# ... stuff
routes: @'routes.cfg',
# ... more stuff

and

# routes.cfg
# ... stuff
admin_routes: [
  # ...
],
# ... more stuff

then elements in webapp.cfg could refer to e.g. ${routes.admin_routes[0]}. However, if there is a main.cfg which contains a

webapp: @'webapp.cfg'

then nothing in webapp.cfg can refer to anything in main.cfg. Of course, elements in main.cfg can refer to elements in webapp.cfg using a path starting with webapp. as well as to elements in routes.cfg using a path starting with webapp.routes. – and so on for further levels of nesting.

See also the section on Expressions, which often use references.

Special values

Special values allow a configuration to specify values which are available to the program using the configuration, but are not necessarily stored in the configuration itself. The most common example of this is probably environment variables.

Special values are indicated using a special type of string notation: instead of using single or double quotes, backtick characters (`) are used to delimit special values. A special value can contain any character other than a backtick or non-printable character, and the interpretation of special values is entirely up to the program using the configuration.

Examples:

  • Programs in multiple languages could interpret a special value string such as `$VARNAME|default_value` to be the value of the environment variable VARNAME, but which returns the default_value if VARNAME isn’t set in the environment.
  • A Python program could interpret a special value string such as `logging.handlers:DEFAULT_TCP_LOGGING_PORT` by using the part before the : as a module to import and the part after the : as an attribute of the module, and resolve to the actual port number as defined in that module.
  • A .NET program could interpret a special value string such as `A.B.C,D.E.F:G` as the value of the static field or property G in a class with fully-qualified name D.E.F found in an assembly with name A.B.C.
  • A Kotlin or Java program could interpret a special value string such as `A.B.C:d` as the value of the static field d in a class on the classpath with fully-qualified name A.B.C, or the return value of a static method d in that class which takes no arguments.

Platforms which don’t offer powerful run-time reflection facilities (such as Go, Rust and D) can’t take advantage of special strings in the way that Python, Kotlin/Java or .NET can. However, there is a set of special values which are available across platforms, as described below.

Special values – cross-platform

The following special string formats are used by convention across current implementations:

  • `YYYY-mm-ddTHH:mm:ss.NNNNNN+HH:mm:ss.nnnnnn` – this is an ISO-like format which allows precise description of a datetime including a timezone offset. Points to note:

    • The date/time separator can be a space instead of a T.
    • The sign of the timezone offset can be - instead of +.
    • The .NNNNNN and .nnnnnn values are microseconds (expressed as fractions of a second) and can be omitted or less precise, but not more precise than six digits.
    • The time must be provided to a precision of at least seconds, though the timezone offset need only be precise up to hours and minutes.

    This format is converted to a date-time object appropriate to the platform.

    Note

    Some platforms do not support timezone offsets to high resolution:

    • Kotlin/Java, Rust, Go – in the respective standard libraries, timezone offsets in date/times are only accurate to the nearest second – fractional seconds aren’t available, and if they are present in the CFG source, any fractional value is truncated to zero.
    • D – in the D standard library, timezone offsets are only accurate to the nearest minute – fractional minutes aren’t available, and if they are present in the CFG source, any such fractional value is truncated to zero.
  • `$VARNAME|default` – this is a format to access environment variables with an optional default value specifiable if the environment variable doesn’t exist. Points to note:

    • The pipe character and default can be omitted, or the pipe character provided by itself, making the default value the empty string.
    • If no default is provided and the environment variable doesn’t exist, a suitable platform-specific value is provided (e.g. None for Python, null for Kotlin / Java, .NET and D, nil for Go and a std::Option None value for Rust).
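
The two conventions above can be sketched in Python as follows. The function convert_special is hypothetical, the environment is passed in as a plain dictionary so the example is repeatable, and datetime.fromisoformat (Python 3.7 or later) stands in for the date/time parsing; it may be stricter than the CFG format about how many fractional digits it accepts. Returning unrecognised text unchanged is also an assumption of this sketch.

```python
from datetime import datetime


def convert_special(text, environ):
    """Convert the text found between backticks (illustrative only)."""
    if text.startswith('$'):
        name, pipe, default = text[1:].partition('|')
        # No pipe at all: fall back to None; a bare pipe gives ''.
        return environ.get(name, default if pipe else None)
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return text  # assumption: unrecognised values are left as strings


env = {'LANG': 'fr_FR.UTF-8'}
print(convert_special('$LANG|en_GB.UTF-8', env))   # variable is set
print(convert_special('$EDITOR|vi', env))          # default is used
print(repr(convert_special('$EDITOR|', env)))      # empty-string default
print(convert_special('$EDITOR', env))             # no default at all
print(convert_special('2019-03-28T23:27:04.314159', env))
```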

Special values – Python

In addition to the formats that work across implementations, described above, below are the formats specific to the Python implementation:

  • `logging.handlers:` – this is an example of a format to access a Python module. It’s a specific case of the more general format which follows.
  • `logging.handlers:SysLogHandler.LOG_EMERG` – this is an example of a format to access a value inside a Python module.
  • `logging.handlers.SysLogHandler.LOG_EMERG` – an older variant of the above, which is only provided for backward compatibility. The form with a colon is preferred for new projects, as it is more computationally efficient – there’s no need to guess where the module stops and an object within it starts.
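
A minimal sketch of how the colon form might be resolved is shown below. The function resolve_python_special is a hypothetical name, and the code is not taken from the Python implementation; it simply imports the part before the colon and walks the dotted attribute path after it.

```python
import importlib


def resolve_python_special(text):
    """Resolve 'module:attr.path'; an empty attribute path gives the module."""
    module_name, _, attr_path = text.partition(':')
    result = importlib.import_module(module_name)
    if attr_path:
        for name in attr_path.split('.'):
            result = getattr(result, name)
    return result


print(resolve_python_special('logging.handlers:SysLogHandler.LOG_EMERG'))
print(resolve_python_special('logging.handlers:').__name__)
```

With the colon present, the module name is known up front, which is the efficiency advantage mentioned above: the dotted-only form would require trying successive prefixes as module names until an import succeeds.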

Special values – Kotlin / Java

In addition to the formats that work across implementations, described above, below are the formats specific to the Kotlin/Java implementation:

  • `A.B.C.D` where A.B.C.D is a fully-qualified class name for some class found on the classpath. The returned value is the class. An example of this would be `java.io.File`.
  • `A.B.C.D:e` where A.B.C.D is a fully-qualified class name for some class found on the classpath, and e is either a public static field of that class, or a public static method which takes no parameters. Examples of these are `java.lang.System:out` (public static field) or `java.time.LocalDate:now` (public static method with no parameters). The returned value is either the value of the field or the return value of the method.

Special values – .NET

In addition to the formats that work across implementations, described above, below are the formats specific to the .NET implementation:

  • `asmname,A.B.C.D` where asmname is the name of a .NET assembly available on the path, and A.B.C.D is a fully-qualified class name for some class found in that assembly. The returned value is the class. An example of this would be `mscorlib,System.IO.FileAccess`.
  • `asmname,A.B.C.D:e` where asmname is the name of a .NET assembly available on the path, and A.B.C.D is a fully-qualified class name for some class found in that assembly, and e is either a public static field or property of that class, or a public static method of that class which takes no parameters. Examples of these would be `mscorlib,System.IO.FileAccess:ReadWrite` (public field of an enum) or `mscorlib,System.DateTime:Today` (public static property). The returned value is the value of the field or property, or the return value of the method.

Special values – JavaScript

In addition to the formats that work across implementations, described above, below are the formats specific to the JavaScript implementation:

  • `A.B.C.D` where A.B.C.D is a path to an object accessed via the JavaScript global object (globalThis). That object is searched for A, which is then searched for B, etc. If at any point the search fails, the special value conversion fails as a whole.

Expressions

Expressions are used to compose values from other values. In the following example,

base_path: '/var/www/foo'
html_path: ${base_path} + '/static/html'
css_path: ${base_path} + '/static/css'
js_path: ${base_path} + '/static/js'

if the base_path changes, the change only needs to be made in one place.

Operators

Expressions make use of a number of operators applied to AST nodes:

  • A + B is used to:
    • Add numbers
    • Concatenate strings and lists
    • Deep-merge mappings (in which case, values in B override those in A, with nested mappings being recursively merged).
  • A - B is used to:
    • Subtract numbers
    • Make a copy of a mapping A excluding the keys in B.
  • A * B is used to multiply numbers.
  • A / B is used to divide numbers.
  • A % B is used to compute the modulo of one number with respect to another.
  • A | B is used to compute a bitwise-OR of integers.
  • A & B is used to compute a bitwise-AND of integers.
  • A ^ B is used to compute a bitwise-XOR of integers.
  • A << B is used to left-shift one integer by another.
  • A >> B is used to right-shift one integer by another.
  • A ** B is used to raise A to the power of B.
  • A or B is used to do short-circuit Boolean evaluation of A or B. You can also use A || B for this.
  • A and B is used to do short-circuit Boolean evaluation of A and B. You can also use A && B for this.

The above are binary operators, but there are also some unary operators:

  • - A is used to negate numbers in a numerical sense.
  • ~ A is used to compute a bitwise complement of an integer.
  • not A is used to do logical negation of A. You can also use the !A notation for this.

Operators follow standard precedence rules. You can use parentheses to force a particular order of evaluation.
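
The two mapping operations are the least obvious of these, so here is a hedged Python sketch of what they could look like on plain dictionaries. The names merge and without_keys are hypothetical, and treating A - B as a shallow removal of B’s keys is an assumption of this sketch rather than a statement of how any implementation behaves.

```python
def merge(a, b):
    """A + B for mappings: values in b win; nested mappings merge recursively."""
    result = dict(a)
    for key, value in b.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def without_keys(a, b):
    """A - B for mappings: a copy of a without the keys present in b."""
    return {key: value for key, value in a.items() if key not in b}


base = {'host': 'localhost', 'db': {'name': 'app', 'port': 5432}}
override = {'db': {'port': 6543}, 'debug': True}

print(merge(base, override))
print(without_keys(base, {'host': None}))
```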

Continuation lines

Because newlines are used to delineate elements in mappings and lists, they cannot normally be used to break up long lines in the middle of a list or mapping, where newlines aren’t expected – e.g. in an expression value. Consider the lines

key1 = ${ref1} + ${ref2}
key2 = ${ref3} + ${ref4} + ${ref5} + ${ref6} ... + ${ref10}

where the second line is quite long. If you want to break the line after, say, the ${ref4}, the parser wouldn’t automatically know that the line needed to be continued unless there’s a way of indicating this. There is such a way – you indicate a continuation by specifying a backslash immediately followed by a newline (i.e. with no intervening whitespace). When the tokenizer sees such a combination, it swallows both characters and acts as if neither had been there – treating the following line as a continuation of the previous line. Thus the key2 line could be written as

key2 = ${ref3} + ${ref4} \
       + ${ref5} + ${ref6} + \
       ... + ${ref10}

thereby making the configuration more readable.
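
The effect of the continuation rule can be approximated in Python with a single string replacement. The function join_continuations is a hypothetical helper, and unlike a real tokenizer it pays no attention to context such as string literals or comments.

```python
def join_continuations(source):
    """Drop every backslash that is immediately followed by a newline."""
    return source.replace('\\\n', '')


text = "key2 = ${ref3} + ${ref4} \\\n       + ${ref5}"
print(join_continuations(text))

# A backslash followed by a space and then a newline is not a continuation.
print(join_continuations("a \\ \nb") == "a \\ \nb")
```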