Core elements

Ocelot workflow

Running Ocelot usually involves the following steps:

  • Get the path of an undefined database in ecospold2 format on your local computer.
  • Decide on a system model configuration to transform the undefined datasets to a linked database. This configuration could be a list of Python functions, or could be the default Ocelot system model.
  • Call the system_model function, either directly or through the command line application. system_model takes the directory path from step one and the configuration from step two as inputs.
  • Look through the HTML report generated by the system model function, and either accept the given linked database, or make changes in your configuration definitions or transformation functions.

Configuration & system_model

An Ocelot system model configuration is essentially a list of transformation functions which, when applied in order, produce a realization of a linked database. Configurations are currently specified in Python code; in the future it should also be possible to define them in other formats such as Excel.

We are actively exploring various ways of defining these configurations. The built-in configurations will be provided as a list of transformation functions already in Ocelot, perhaps wrapped in a configuration object. Another simple configuration format would be a text file, where each line is the name of a transformation function that can be imported from ocelot.transformations. However, this doesn’t work well for user-defined functions, nor if you need to prepare functions by e.g. currying them. We are also looking at several configuration libraries, but haven’t found anything that seems to fit our mental models or use cases well.

So far no final decisions have been made, and things here will evolve along with the Ocelot codebase.
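
As an illustration of the currying mentioned above, here is a minimal sketch using functools.partial from the standard library. The function drop_small_exchanges, its threshold parameter, and the dictionary layout of the datasets are all hypothetical and are not part of Ocelot; the point is only that a function needing extra arguments can be prepared ahead of time so that it accepts the data alone.

```python
from functools import partial


def drop_small_exchanges(data, threshold):
    """Hypothetical transformation: drop exchanges whose absolute amount is below ``threshold``."""
    return [
        {**ds, "exchanges": [exc for exc in ds["exchanges"] if abs(exc["amount"]) >= threshold]}
        for ds in data
    ]


# Fix the extra argument now, so the prepared function only needs ``data``
drop_tiny = partial(drop_small_exchanges, threshold=1e-6)

# A configuration written as a plain list of callables
config = [drop_tiny]

# Assumed dataset layout, for illustration only
data = [{"name": "example", "exchanges": [{"amount": 1.0}, {"amount": 1e-9}]}]
for func in config:
    data = func(data)
```

A text file of function names cannot express the threshold=1e-6 preparation step directly, which is the limitation noted above.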

Running Ocelot without specifying a configuration will use the default configuration, which is the cutoff system model.

A typical system model may have many transformation functions, as each function should do exactly one specific change. To make configurations more readable, you can use a Collection object to group transformation functions that are commonly used together, or that form one unit of work.

class ocelot.Collection(name, *functions)

A collection of transformation functions that system_model unwraps correctly.

Useful to quickly specify a list of commonly-grouped functions (e.g. ecospold common data cleanup, economic allocation).

Instantiate a Collection with a name, and the desired transformation functions: Collection("some name", do_something, do_something_else).
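
To make the idea of "unwrapping" concrete, here is a small stand-in written from scratch. SketchCollection and unwrap are hypothetical names that only mimic the Collection(name, *functions) signature shown above; Ocelot's own implementation may differ.

```python
class SketchCollection:
    """Stand-in for a named group of transformation functions (not Ocelot's class)."""

    def __init__(self, name, *functions):
        self.name = name
        self.functions = functions

    def __iter__(self):
        return iter(self.functions)


def unwrap(config):
    """Yield plain functions from a configuration that may contain nested collections."""
    for item in config:
        if isinstance(item, SketchCollection):
            yield from unwrap(item)
        else:
            yield item


# Placeholder transformations that return the data unchanged
def do_something(data):
    return data


def do_something_else(data):
    return data


def allocate(data):
    return data


cleanup = SketchCollection("common cleanup", do_something, do_something_else)
flat = list(unwrap([cleanup, allocate]))
```

Here flat holds the three functions in the order they would be applied, while the configuration itself stays short and readable.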

The system_model function is actually quite simple:

model.system_model(data_path, config=None, show=False, use_cache=True, save_strategy=None)

A system model is a set of assumptions and modeling choices that define how to take a list of unlinked and unallocated datasets, and transform these datasets into a new list of datasets which are linked and each have a single reference product.

The system model itself is a list of functions. The contents of this list (which functions are included, and in which order) are set by the input parameter config, which can be a list of functions or a configuration object (see Configuration & system_model). The system_model does the following:

  • Extract data from the input data sources
  • Initialize a Logging object
  • Then, for each transformation function in the provided configuration:
    • Log the transformation function start
    • Apply the transformation function
    • Save the intermediate data state
    • Log the transformation function end
  • Finally, write a report.

Can be interrupted with CTRL-C. Interrupting will delete the partially completed report.

Returns:

  • An OutputDir object which tells you where the report was generated
  • The final version of the data in a list
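
The loop described in the steps above can be sketched in a few lines of plain Python. This is not Ocelot's implementation: run_sketch is a hypothetical name, data extraction and report writing are left out, and the log and the saved intermediate states are kept in in-memory lists rather than written to disk.

```python
import copy


def run_sketch(data, config):
    """Apply each function in ``config`` in order, recording log events and snapshots."""
    log, snapshots = [], []
    for index, func in enumerate(config):
        log.append(("function_start", func.__name__))
        data = func(data)
        # Stand-in for saving the intermediate data state
        snapshots.append((index, copy.deepcopy(data)))
        log.append(("function_end", func.__name__))
    return log, snapshots, data


# Two hypothetical transformations on an assumed list-of-dicts layout
def drop_unnamed(data):
    return [ds for ds in data if ds.get("name")]


def mark_linked(data):
    return [dict(ds, linked=True) for ds in data]


log, snapshots, final = run_sketch(
    [{"name": "a"}, {"name": ""}],
    [drop_unnamed, mark_linked],
)
```

Keeping a snapshot after every function is what makes it possible to inspect where in the chain a dataset changed.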

Transformation functions

Transformation functions are the heart of Ocelot: each one performs one distinct change to the collection of datasets. Transformation functions can be any callable, but most are simple functions.

The report generator will use information about each transformation function when creating the report. Specifically, the report generator will look at the function name, its docstring (a text description of what the function does, included in the function code), and any additional tabular data you provide during the function call.
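
A minimal sketch of a transformation function, and of how its name and docstring can be read with standard Python introspection. The function and the dataset layout are hypothetical, and this shows only the general mechanism, not how Ocelot's report generator is actually written.

```python
import inspect


def remove_zero_amount_exchanges(data):
    """Remove exchanges whose amount is zero.

    Hypothetical example; assumes each dataset is a dict with an
    ``exchanges`` list of dicts that carry an ``amount`` key.
    """
    for ds in data:
        ds["exchanges"] = [exc for exc in ds.get("exchanges", []) if exc["amount"] != 0]
    return data


# The two pieces of metadata a report could display for this function
name = remove_zero_amount_exchanges.__name__
summary = inspect.getdoc(remove_zero_amount_exchanges).splitlines()[0]

cleaned = remove_zero_amount_exchanges(
    [{"name": "example", "exchanges": [{"amount": 0}, {"amount": 2.5}]}]
)
```

Because the docstring ends up in front of report readers, a one-line summary that states the single change the function makes is worth writing carefully.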

Logging

Ocelot uses standard Python logging, with a custom formatter that encodes log messages as JSON dictionaries. Because of this custom formatter, the ocelot logger must be retrieved in each file that uses logging:

import logging

logger = logging.getLogger('ocelot')

def my_transformation(data):
    logger.info({"message": "something", "count": len(data)})

Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The message format for the log written to disk (i.e. with each line JSON encoded) is documented in Logging format.

Note

time is added automatically to each log message.
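
For readers unfamiliar with custom formatters, the sketch below shows one way a logging.Formatter subclass could encode dictionary messages as JSON lines and add a time field. JsonSketchFormatter and the logger name ocelot_sketch are hypothetical; Ocelot's own formatter and its field names are described in Logging format and may differ from this.

```python
import io
import json
import logging


class JsonSketchFormatter(logging.Formatter):
    """Encode each record as one JSON object, adding a ``time`` field."""

    def format(self, record):
        if isinstance(record.msg, dict):
            message = record.msg
        else:
            message = {"message": record.getMessage()}
        # record.created is the record's creation time in seconds since the epoch
        return json.dumps({"time": record.created, **message})


# Write to an in-memory stream so the example has no side effects on disk
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonSketchFormatter())

logger = logging.getLogger("ocelot_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info({"message": "something", "count": 3})
line = json.loads(stream.getvalue())
```

The calling code passes only message and count; the time key is supplied by the formatter, which is the behaviour the note above describes.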

Reports

In the last step of the workflow, the log data from the model run is formatted into an HTML report.

class ocelot.HTMLReport(filepath, show=False)

Generate an HTML report from a logfile.

Reports are generated in the same directory as the logfile.

Takes the log filepath as the input variable filepath. A second, optional input, show, will open the generated report in a new web browser tab if True.