Uncertainty distributions, parameters, and the Pedigree Matrix

Variables and formulas

Ecospold 2 supports parameterized datasets, where numeric values for exchanges and production volumes can be calculated using a chain of formulas and variables with uncertainty distributions. Formulas and variables can be present in four different places (see also the Internal data format):

  • An exchange in the list dataset['exchanges'].
  • A property of an exchange in the list dataset['exchanges'][some_index]['properties'][another_index]. Not all exchanges have properties.
  • A parameter in the list dataset['parameters']. Again, not all datasets have parameters.
  • A technosphere exchange production volume dataset['exchanges'][some_index]['production volume']. Only production exchanges (type reference product or byproduct) have production volumes.
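As a rough illustration of these four locations, the sketch below builds a minimal dataset dictionary by hand. Only the keys named in this section (exchanges, properties, parameters, production volume, variable, formula) come from the text; every other key and value (name, type, amount, the variable names themselves) is assumed for illustration and may not match the real internal data format.

```python
# Hypothetical minimal dataset. Keys other than 'exchanges', 'properties',
# 'parameters', 'production volume', 'variable' and 'formula' are assumptions.
dataset = {
    'exchanges': [{
        'name': 'clinker',
        'type': 'reference product',
        'amount': 1.0,
        'variable': 'clinker_amount',          # 1. on the exchange itself
        'properties': [{
            'name': 'water content',
            'amount': 0.1,
            'variable': 'water_content',       # 2. on a property of an exchange
        }],
        'production volume': {
            'amount': 2000.0,
            'formula': 'annual_output * 2',    # 4. on the production volume
        },
    }],
    'parameters': [{
        'name': 'annual output',
        'amount': 1000.0,
        'variable': 'annual_output',           # 3. on a dataset parameter
    }],
}

exchange = dataset['exchanges'][0]
locations = [
    exchange,
    exchange['properties'][0],
    dataset['parameters'][0],
    exchange['production volume'],
]
# Collect whichever of the two keys each location carries.
found = [obj.get('variable') or obj.get('formula') for obj in locations]
```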

Conventions and standards

A variable in an exchange, property, parameter, or production volume is defined with the dictionary key variable. The value for this key will be the string name of the variable, e.g. {'variable': 'some_name'}. Variable names must be valid Python identifiers, so some_name instead of some name.

A formula in an exchange, property, parameter, or production volume is defined with the dictionary key formula. The value for this key will be the formula as a string, e.g. {'formula': 'some_name * 2'}.
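A quick way to check the identifier rule is Python's built-in str.isidentifier. The helper name below is hypothetical and is not part of Ocelot; it only shows what "valid Python identifier" means in practice.

```python
def is_valid_variable_name(name):
    """Return True if ``name`` could be used as a Python identifier."""
    return isinstance(name, str) and name.isidentifier()


examples = {
    name: is_valid_variable_name(name)
    for name in ('some_name', 'some name', '2nd_value')
}
```

Note that reserved words such as lambda also pass str.isidentifier; those are handled separately by replace_reserved_words, described further down.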

Variables can be uncertain. If an uncertainty distribution is present in the same object as a variable, and no formula is present, then the given uncertainty is the uncertainty distribution for the variable. If both a formula and an uncertainty dictionary are present, the behaviour is not defined: the uncertainty distribution could be interpreted in several ways, and ecoinvent, for example, is not consistent about which one it uses.

The Ecospold standard places no real limits on which variables can depend on other variables, so arbitrarily complex relationships are possible.

Formulas that have division by zero errors are evaluated to be zero. However, most of these cases will be rewritten during the data cleaning step.
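The division-by-zero convention can be sketched as follows. This is not how Ocelot evaluates formulas (that goes through bw2parameters and asteval); plain eval is used here only to keep the example short, and the function name is hypothetical. Don't use eval on untrusted formula strings.

```python
def evaluate_formula(formula, namespace):
    """Evaluate ``formula`` using names from ``namespace``.

    Division by zero yields 0, mirroring the convention described above.
    """
    try:
        # Builtins are disabled so that only the supplied names are visible.
        return eval(formula, {'__builtins__': {}}, dict(namespace))
    except ZeroDivisionError:
        return 0


values = {'total': 10.0, 'count': 0.0}
result_zero = evaluate_formula('total / count', values)
result_ok = evaluate_formula('total / 4', values)
```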

Evaluation of parameterized datasets

Evaluation of parameterized datasets is done with the bw2parameters library, which in turn relies on asteval.

After making changes in a parameterized dataset, you can use the following utility function to reevaluate all formula and variable values:

ocelot.transformations.parameterization.recalculation.recalculate(dataset)

Recalculate parameterized relationships within a dataset.

Modifies values in place.

Creates a TolerantParameterSet, populates it with the named parameters from the dataset, and then gets the evaluation order from the graph of parameter relationships. After reevaluating all named parameters, creates an Interpreter with the named parameters and all of numpy in its namespace. This interpreter is used to evaluate all other formulas in the dataset.

Formulas that divide by zero are evaluated to zero.

Returns the modified dataset.
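The idea of "getting the evaluation order from the graph of parameter relationships" can be illustrated with the standard library alone. The sketch below is not the bw2parameters implementation: it uses graphlib.TopologicalSorter (Python 3.9+) and plain eval as stand-ins, and the function names are made up for this example.

```python
import ast
from graphlib import TopologicalSorter  # Python 3.9+


def names_in(formula):
    """Return the set of variable names referenced in a formula string."""
    tree = ast.parse(formula, mode='eval')
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluate_in_order(parameters):
    """Evaluate {'name': {'amount': ..., 'formula': ...}} in dependency order."""
    known = set(parameters)
    graph = {
        name: (names_in(p['formula']) & known) if p.get('formula') else set()
        for name, p in parameters.items()
    }
    values = {}
    # static_order() yields each name only after everything it depends on.
    for name in TopologicalSorter(graph).static_order():
        formula = parameters[name].get('formula')
        if formula:
            try:
                values[name] = eval(formula, {'__builtins__': {}}, values)
            except ZeroDivisionError:
                values[name] = 0  # same convention as above
        else:
            values[name] = parameters[name]['amount']
    return values


params = {
    'c': {'formula': 'a + b'},
    'b': {'formula': 'a * 2'},
    'a': {'amount': 3},
}
resolved = evaluate_in_order(params)
```

In this sketch a circular reference (a depends on b, b depends on a) makes TopologicalSorter raise graphlib.CycleError rather than loop forever.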

You may also be interested in this utility function for extracting parameters:

ocelot.transformations.parameterization.recalculation.extract_named_parameters(dataset)

Extract named parameters from dataset.

Each named parameter must have a name, and should have either a numeric value (amount) or a formula string. Parameters without names (variable) are not extracted, as they don’t contribute to dataset recalculation; they only get updated afterwards.

Returns a dictionary with form: {'name': {'amount': number, 'formula': string}}.
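To make the returned shape concrete, here is a hand-written sketch of the same idea. It is not the Ocelot source: the function names are hypothetical, and the traversal simply assumes the four locations listed at the top of this page.

```python
def iter_parameterized(dataset):
    """Yield every object that may carry a variable or formula."""
    for exc in dataset.get('exchanges', []):
        yield exc
        yield from exc.get('properties', [])
        if 'production volume' in exc:
            yield exc['production volume']
    yield from dataset.get('parameters', [])


def collect_named(dataset):
    """Build {'name': {'amount': ..., 'formula': ...}} for named objects only."""
    named = {}
    for obj in iter_parameterized(dataset):
        if not obj.get('variable'):
            continue  # objects without a name are skipped
        entry = {}
        if 'formula' in obj:
            entry['formula'] = obj['formula']
        if 'amount' in obj:
            entry['amount'] = obj['amount']
        named[obj['variable']] = entry
    return named


dataset = {
    'exchanges': [{
        'amount': 2.0,
        'variable': 'fuel_use',
        # This property has a formula but no name, so it is not extracted.
        'properties': [{'amount': 0.5, 'formula': 'fuel_use / 4'}],
        'production volume': {
            'amount': 100.0,
            'variable': 'pv',
            'formula': 'fuel_use * 50',
        },
    }],
    'parameters': [{'amount': 7.0, 'variable': 'efficiency'}],
}
named = collect_named(dataset)
```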

Implicit references

To make things extra spicy, some variables can be implicit, and instead of being given a name, they are referred to by the id of their containing reference element. So, the formula Ref('aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee') means get the numeric value (amount) of the exchange whose id is aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee, and substitute in that amount. Datasets with these implicit variables only occur four times in ecoinvent 3.2 and three times in ecoinvent 3.3. Implicit variables can have three forms:

  • Ref('some id'): Get amount value for exchange or parameter with id some id.
  • Ref('some id', 'ProductionVolume'): Get production volume for exchange with id some id.
  • Ref('some id', 'some other id'): Get amount for property with id some other id in exchange some id. This isn’t used in ecoinvent 3.2 or 3.3, and isn’t supported in the current version of Ocelot.
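The three forms can be told apart with a small regular expression. This is only a sketch under the assumption that ids are always wrapped in single quotes, as in the examples above; the pattern and function name are not taken from Ocelot.

```python
import re

# Matches Ref('id') and Ref('id', 'second argument'); single quotes assumed.
REF_PATTERN = re.compile(r"Ref\(\s*'([^']+)'\s*(?:,\s*'([^']+)'\s*)?\)")


def parse_refs(formula):
    """Return a list of (kind, target id, second argument or None)."""
    found = []
    for target, field in REF_PATTERN.findall(formula):
        if not field:
            kind = 'amount'
        elif field == 'ProductionVolume':
            kind = 'production volume'
        else:
            kind = 'property'
        found.append((kind, target, field or None))
    return found


refs = parse_refs("Ref('abc') * 2 + Ref('def', 'ProductionVolume')")
```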

A cleanup function will replace these implicit relationships with named variables.

ocelot.transformations.parameterization.implicit_references.replace_implicit_references(data)

Replace Ref( with actual variables.

Uses existing variables if possible, or else creates new variables in the elements that are referred to.

Generic transformations for parameters and formulas

After replacing implicit references (see above), we manually fix a couple of known problems in certain formula strings, such as numbers with leading zeros that are not understood by Python.

ocelot.transformations.parameterization.known_ecoinvent_issues.fix_known_bad_formula_strings(data)

Change certain known bad text elements in formulas.
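To see why leading zeros are a problem, the sketch below shows Python's parser rejecting such a literal, together with one possible regex-based repair. The regex is an assumption made for this example; the text above only says that known bad strings are fixed manually, not how.

```python
import ast
import re

# A run of zeros that starts a number and is followed by another digit.
# The lookbehind skips zeros inside identifiers (x01) and decimals (1.05).
LEADING_ZEROS = re.compile(r'(?<![\w.])0+(?=\d)')


def strip_leading_zeros(formula):
    """Remove leading zeros from integer literals in a formula string."""
    return LEADING_ZEROS.sub('', formula)


def parses(formula):
    """Return True if Python can parse the formula as an expression."""
    try:
        ast.parse(formula, mode='eval')
        return True
    except SyntaxError:
        return False


before = parses('007 * x')
fixed = strip_leading_zeros('007 * x + 0.5 + x01')
after = parses(fixed)
```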

The Ecospold 2 formula syntax is similar to Python in some ways, but we still need to use several functions to get formulas that Python can understand. Ocelot is still not 100% compatible with the entire Ecospold 2 formula spec.

ocelot.transformations.parameterization.python_compatibility.lowercase_all_parameters(data)

Convert all formulas and parameters to lower case.

Ecoinvent formula and variable names are case-insensitive, and are often provided in several variants, e.g. clinker_PV and clinker_pv. There are too many of these to fix manually, so we use a sledgehammer approach to guarantee consistency within datasets.
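The sledgehammer is easy to picture on a single object. The helper below is a hypothetical stand-in, not the Ocelot function, and only touches the variable and formula keys.

```python
def lowercase_names(obj):
    """Lower-case the 'variable' and 'formula' strings of one object in place."""
    for key in ('variable', 'formula'):
        if isinstance(obj.get(key), str):
            obj[key] = obj[key].lower()
    return obj


named = lowercase_names({'variable': 'clinker_PV', 'amount': 1.0})
dependent = lowercase_names({'formula': 'Clinker_pv * 2', 'amount': 2.0})
```

After this step both spellings refer to the same name, clinker_pv, so the formula can be resolved.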

ocelot.transformations.parameterization.python_compatibility.fix_math_formulas(data)

Fix some special cases in formulas needed for correct parsing.

ocelot.transformations.parameterization.python_compatibility.replace_reserved_words(data)

Replace Python reserved words in variable names and formulas.

For variable names, this is relatively simple - we just check whether the variable name is a Python reserved word. For formulas, we use the check_and_fix_formula function.

Changes datasets in place.
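A minimal sketch of the reserved-word check, using the standard keyword module. The renaming scheme (appending an underscore) and both function names are assumptions for illustration; they are not what check_and_fix_formula necessarily does.

```python
import keyword
import re

IDENTIFIER = re.compile(r'\b[A-Za-z_]\w*\b')


def safe_name(name):
    """Rename a reserved word; hypothetical scheme: append an underscore."""
    return name + '_' if keyword.iskeyword(name) else name


def fix_formula(formula):
    """Apply safe_name to every identifier-like token in a formula string."""
    return IDENTIFIER.sub(lambda match: safe_name(match.group(0)), formula)


renamed = safe_name('lambda')
untouched = safe_name('yield_rate')
fixed_formula = fix_formula('lambda * in + mass')
```

This sketch treats every keyword as a variable name, so a formula that really uses a conditional expression (x if y else z) would be mangled by it. It is only meant to show the idea.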

Finally, in cases where we can’t fix problems with formulas, we remove them from the dataset.

ocelot.transformations.parameterization.python_compatibility.delete_unparsable_formulas(data)

Uses the AST parser to find unparsable formulas, which are deleted.
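The check itself can be sketched with the standard ast module. The function below is a hypothetical per-object version, assuming that a formula counts as unparsable when ast.parse raises SyntaxError.

```python
import ast


def drop_unparsable_formula(obj):
    """Delete obj['formula'] in place if Python cannot parse it."""
    formula = obj.get('formula')
    if formula is None:
        return obj
    try:
        ast.parse(formula, mode='eval')
    except SyntaxError:
        del obj['formula']
    return obj


broken = drop_unparsable_formula({'amount': 1.0, 'formula': 'a * (b + 2'})
intact = drop_unparsable_formula({'amount': 1.0, 'formula': 'a * (b + 2)'})
```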

Production volumes

Production volumes are specified for exchanges which produce reference product and allocatable byproduct flows. These volumes are used only to calculate the contribution of different transforming activities to markets. As such, production volumes are fixed during the evaluation of a system model in Ocelot. In order to stop an evaluation of the dataset's formulas and variables from changing the value of the production volume, we move all such parameterization information to a new parameter, outside of the production volume definition.

ocelot.transformations.parameterization.production_volumes.create_pv_parameters(dataset)

Remove all production volume parameterization.

Production volumes are fixed, like reference production exchange amounts. This function will do one of three things:

  1. If there is no formula or variable in the production volume, do nothing.
  2. If there is only a formula, delete the formula.
  3. If there is a variable, move the variable to a new parameter.
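The three cases can be sketched as below. This is not the Ocelot source: the function name is hypothetical, and the layout of the new parameter (which keys it gets, and that an accompanying formula moves along with the variable) is an assumption made for this example.

```python
def detach_pv_parameterization(dataset):
    """Strip variables and formulas from production volumes, in place."""
    for exc in dataset.get('exchanges', []):
        pv = exc.get('production volume')
        if not pv:
            continue
        variable = pv.pop('variable', None)
        formula = pv.pop('formula', None)
        if variable is None:
            # Case 1: nothing to do. Case 2: the formula was just deleted.
            continue
        # Case 3: keep the named variable alive as a separate parameter.
        parameter = {'variable': variable, 'amount': pv.get('amount')}
        if formula is not None:
            parameter['formula'] = formula  # assumption: formula moves too
        dataset.setdefault('parameters', []).append(parameter)
    return dataset


dataset = detach_pv_parameterization({
    'exchanges': [
        {'production volume': {'amount': 10.0}},
        {'production volume': {'amount': 20.0, 'formula': 'x * 2'}},
        {'production volume': {'amount': 30.0, 'variable': 'pv_c',
                               'formula': 'x * 3'}},
    ],
})
```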

Uncertainty distributions

Each uncertainty distribution in Ocelot is parsed and manipulated using a specific class. However, most of the time it is more convenient to use one of the following generic functions which are not distribution-specific:

ocelot.transformations.uncertainty.scale_exchange(exchange, factor)

Scale an exchange and its uncertainty by a constant numeric factor.

Modifies the exchange in place. Returns the modified exchange.

ocelot.transformations.uncertainty.adjust_pedigree_matrix_time(ds, exchange, year)

As each uncertainty distribution class provides the same API, you can also use the get_uncertainty_class function to get the correct distribution for an exchange, and then call a class method, e.g. for any exchange exc:

exc = get_uncertainty_class(exc).repair(exc)

Note that this also works on exchanges which don’t have an uncertainty dictionary - the NoUncertainty class will still do the right thing (which is normally nothing :).

Uncertainty distribution classes

class ocelot.transformations.uncertainty.distributions.NoUncertainty
static recalculate(obj)

Adjusting pedigree matrix values for no uncertainty has no effect

static repair(obj)

No-op for no uncertainty

static rescale(obj, factor)

Rescale uncertainty distribution by a numeric factor

classmethod sample(obj, size=1)

Draw size samples from this uncertainty distribution

classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

class ocelot.transformations.uncertainty.distributions.Undefined

Undefined uncertainty distribution.

This distribution has an uncertainty dictionary, including minimum and maximum values. However, as there is no given way to understand these values, they are not checked or used in Ocelot.

distribution

alias of UndefinedUncertainty

static rescale(obj, factor)

Rescale uncertainty distribution by a numeric factor

classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

class ocelot.transformations.uncertainty.distributions.Lognormal

Lognormal distribution, defined by the mean (\(\mu\), called mu) and variance (\(\sigma^{2}\), called variance) of the distribution’s natural logarithm.

static recalculate(obj)

Recalculate uncertainty values based on new pedigree matrix values

static repair(obj, fix_extremes=True)

Fix some common failures in lognormal distributions.

obj is an object with a lognormal uncertainty distribution.

If fix_extremes, will adjust variance values which are almost physically impossible.

  • If mean is negative, set to positive, and add negative = True.
  • Make mean the same as amount, and set mu to log(amount)
  • Resolve any conflicts between variance and variance with pedigree matrix by preferring values in variance with pedigree uncertainty and pedigree matrix.
  • If fix_extremes, adjust clearly wrong uncertainties, using arbitrary rules I just made up:
    • If 1 <= variance <= e, then the variance is set to ln(variance).
    • If the variance is greater than e, then the variance is set to 0.25.
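The two fix_extremes rules translate directly into code. The function name is hypothetical; the thresholds are the ones listed above.

```python
import math


def fix_extreme_variance(variance):
    """Apply the two fix_extremes rules to a lognormal variance."""
    if 1 <= variance <= math.e:
        return math.log(variance)
    if variance > math.e:
        return 0.25
    return variance  # values below 1 are left alone


unchanged = fix_extreme_variance(0.04)
logged = fix_extreme_variance(2.0)
capped = fix_extreme_variance(10.0)
```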
static rescale(obj, factor)

Rescale uncertainty distribution by a numeric factor

classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

As negative lognormal distributions are not defined using the normal distribution functions, this method sets a negative flag. stats_arrays will adjust any results to have the correct sign.

Uses the standard deviation instead of the variance for compatibility with scipy and numpy.

class ocelot.transformations.uncertainty.distributions.Normal

Normal distribution, defined by mean and variance.

static recalculate(obj)

TODO: This is currently not functioning correctly.

Use new pedigree matrix values to adjust the variance based on The application of the pedigree approach to the distributions foreseen in ecoinvent v3 by Müller, et al.

Adjusting the pedigree matrix for the normal distribution should lead to the same change in the coefficient of variation as it would for the lognormal distribution.

For the lognormal distribution, the coefficient of variation is defined by:

\[CV = \sqrt{e^{\sigma^{2}} - 1}\]

For the normal distribution, the coefficient of variation is simply \(\sigma / \mu\). Additionally, we note that:

  • Recalculating the pedigree matrix should not change the mean, i.e. \(\mu\).
  • The pedigree matrix factors operate directly on the variance of the lognormal, so no manipulation is needed on that score.

So, our calculation algorithm is:

  1. Find the difference in variance if the recalculation was applied to the lognormal distribution
  2. Find the relative change in the coefficient of variation
  3. Calculate the new variance with pedigree matrix
\[CV_{ratio} = \frac{\sqrt{e^{\sigma_{pm}^{2}} - 1}}{\sqrt{e^{\sigma_{without-pm}^{2}} - 1}}\]
\[\sigma_{without-pm} = 0\]
\[CV_{ratio} = \sqrt{e^{\sigma_{pm}^{2}} - 1}\]
\[\frac{\sigma_{new}}{\mu_{new}} = \frac{\sigma_{old}}{\mu_{old}} CV_{ratio}\]
\[\mu_{new} = \mu_{old}\]
\[\sigma_{new}^{2} = \sigma_{old}^{2} ( e^{\sigma_{pm}^{2}} - 1 )\]
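Transcribed literally, the final equation above reads as follows in code. This is only a sketch of the formula as written, with hypothetical function names; keep in mind the TODO above, which says the real method is not yet working correctly.

```python
import math


def cv_ratio(pedigree_variance):
    """CV ratio for a given pedigree matrix variance (sigma_pm squared)."""
    return math.sqrt(math.exp(pedigree_variance) - 1)


def normal_variance_with_pedigree(old_variance, pedigree_variance):
    """New variance: sigma_old^2 * (exp(sigma_pm^2) - 1)."""
    return old_variance * (math.exp(pedigree_variance) - 1)


mean = 10.0
old_variance = 4.0
pedigree_variance = 0.5
new_variance = normal_variance_with_pedigree(old_variance, pedigree_variance)

# The mean is unchanged, so sigma_new / mu equals sigma_old / mu * CV ratio.
old_cv = math.sqrt(old_variance) / mean
new_cv = math.sqrt(new_variance) / mean
```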
static repair(obj)

Fix some common failures in normal distributions.

obj is an object with a normal uncertainty distribution.

  • Make mean the same as amount
  • Resolve any conflicts between variance and variance with pedigree matrix by preferring values in variance with pedigree uncertainty and pedigree matrix
static rescale(obj, factor)

Rescale uncertainty distribution by a numeric factor.

Following Müller et al., rescaling should preserve the coefficient of variation, i.e. \(\sigma / \mu\). We are given the original variance, \(\sigma^{2}\). Therefore, we can find the new variance using:

\[\frac{\sigma_{old}}{\mu_{old}} = \frac{\sigma_{new}}{\mu_{new}}\]
\[\frac{\sigma_{old}^{2}}{\mu_{old}^{2}} = \frac{\sigma_{new}^{2}}{\mu_{new}^{2}}\]
\[\sigma_{new}^{2} = \frac{\mu_{new}^{2}}{\mu_{old}^{2}} \sigma_{old}^{2}\]
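Because the new mean is the old mean times the scaling factor, the ratio of squared means is simply the factor squared. A minimal numeric sketch (the function name is hypothetical, and it works on bare numbers rather than on an exchange dictionary):

```python
import math


def rescale_normal(mean, variance, factor):
    """Scale a normal distribution so that sigma / mu stays constant."""
    new_mean = mean * factor
    # (mu_new / mu_old) ** 2 == factor ** 2
    new_variance = variance * factor ** 2
    return new_mean, new_variance


new_mean, new_variance = rescale_normal(mean=10.0, variance=4.0, factor=3.0)
old_ratio = math.sqrt(4.0) / 10.0
new_ratio = math.sqrt(new_variance) / new_mean
```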
classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

Uses standard deviation instead of variance for compatibility with scipy and numpy.

class ocelot.transformations.uncertainty.distributions.Triangular

Triangular distribution, defined by minimum, mode, and maximum.

static recalculate(obj)

This is currently a no-op, as pedigree matrices are not used for this distribution. However, it would be nice to have it in the future for completeness.

static repair(obj)

Make sure the provided values are a valid triangular distribution.

  • Set mode to amount.
  • Erases uncertainty if minimum == maximum == mode.
  • Flips minimum and maximum if necessary.
  • Raises ValueError if mode is outside (minimum, maximum)
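The four rules could be applied roughly as below. This is a sketch, not the Ocelot source: the key names inside the uncertainty dictionary, the order of the checks, erasing by deleting the uncertainty key, and treating the bounds as inclusive are all assumptions.

```python
def repair_triangular(obj):
    """Repair a triangular uncertainty dictionary in place (sketch)."""
    unc = obj['uncertainty']
    unc['mode'] = obj['amount']
    low, high = unc['minimum'], unc['maximum']
    if low == high == unc['mode']:
        del obj['uncertainty']  # no spread left, so no uncertainty
        return obj
    if low > high:
        low, high = high, low
        unc['minimum'], unc['maximum'] = low, high
    if not low <= unc['mode'] <= high:
        raise ValueError("mode is outside (minimum, maximum)")
    return obj


flipped = repair_triangular({
    'amount': 5.0,
    'uncertainty': {'type': 'triangular',
                    'minimum': 8.0, 'mode': 4.0, 'maximum': 2.0},
})
erased = repair_triangular({
    'amount': 1.0,
    'uncertainty': {'type': 'triangular',
                    'minimum': 1.0, 'mode': 1.0, 'maximum': 1.0},
})
```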
static rescale(obj, factor)

Rescale the exchange by a constant numeric factor.

classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

class ocelot.transformations.uncertainty.distributions.Uniform

Uniform distribution, defined by minimum and maximum.

distribution

alias of UniformUncertainty

static recalculate(obj)

This is currently a no-op, as pedigree matrices are not used for this distribution. However, it would be nice to have it in the future for completeness.

static repair(obj)

Make sure the provided values are a valid uniform distribution.

  • Erases uncertainty if minimum == maximum == amount.
  • Flips minimum and maximum if necessary.
  • Raises ValueError if mode is outside (minimum, maximum)
  • If amount is not close to halfway between minimum and maximum, change to triangular distribution.
static rescale(obj, factor)

Rescale the exchange by a constant numeric factor.

classmethod to_stats_arrays(obj)

Returns a stats_arrays compatible dictionary.

Pedigree Matrix

Pedigree matrices are stored as dictionaries (see the data format). Currently, Ocelot only adjusts the temporal correlation, in order to bring datasets to the reference year, but other adjustments are possible.

To adjust uncertainty values for a new pedigree matrix, call the method recalculate for the correct uncertainty distribution, i.e. one of the following:

# Works always
get_uncertainty_class(exc).recalculate(exc)
# If you know the specific distribution
Lognormal.recalculate(exc)

To adjust the pedigree matrix value for temporal correlation to a given reference year, use the following utility function (which also recalculates the uncertainty values):

ocelot.transformations.uncertainty.adjust_pedigree_matrix_time(ds, exchange, year)

Ocelot includes pedigree matrix values for the original pedigree matrix from ecoinvent 2, as well as the revised values from Empirically based uncertainty factors for the pedigree matrix in ecoinvent. However, there is not yet an API to use these updated factors.