Regression Test Manager

The test manager for PFLOTRAN is a Python program responsible for reading a configuration file, identifying the tests declared in that file, running PFLOTRAN on the appropriate input files, and then comparing the results to a known gold standard output file.

Running the Test Manager

The test manager can be run in two ways: as part of the build system using make, or manually.

There are two options for calling the test manager through make: make check and make test. The check target runs a small set of tests that verify that PFLOTRAN is built and running on a given system. This would be run by a user to verify that their installation of PFLOTRAN is working. The test target runs a more complete set of regression tests intended to identify when changes to the code cause significant changes to PFLOTRAN’s results.

$ cd $PFLOTRAN_DIR/regression_tests
$ make check

or

$ cd $PFLOTRAN_DIR/regression_tests
$ make test

When finished, it is useful to remove all of the output files generated by running the regression tests with the command make clean-tests:

$ cd $PFLOTRAN_DIR/regression_tests
$ make clean-tests

Calling the test manager through make relies on make variables from PETSc to determine the correct version of python to use, whether PFLOTRAN was built with MPI, and which optional configurations (such as unstructured meshes) are available. The version of python used to call the test manager can be changed from the command line by specifying the PYTHON variable:

$ cd ${PFLOTRAN_DIR}/src/pflotran
$ make PYTHON=/opt/local/bin/python3.3 check

To call the test manager manually:

$ cd ${PFLOTRAN_DIR}/regression_tests
$ python regression_tests.py \
    --executable ../src/pflotran/pflotran \
    --config-file shortcourse/copper_leaching/cu_leaching.cfg \
    --tests cu_leaching

Some important command line arguments when running manually are:

  • executable: the path to the PFLOTRAN executable

  • mpiexec: the name of the executable for launching parallel jobs (mpiexec, mpirun, aprun, etc.).

  • config-file: the path to the configuration file containing the tests you want to run

  • recursive-search: the path to a directory. The test manager searches the directory and all its sub-directories for configuration files.

  • tests: a list of test names that should be run

  • suites: a list of test suites that should be run

  • update: indicates that the gold standard file for a given test should be updated to the current output.

  • new-tests: indicates that the tests are new and that the current output should be used as the gold standard file.

  • check-performance: include the performance metrics (SOLUTION blocks) in regression checks.
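
For example, a parallel run that searches a directory tree for configuration files and runs two test suites might look like the following (the paths and suite names are illustrative, not a prescribed setup):

$ python regression_tests.py \
    --executable ../src/pflotran/pflotran \
    --mpiexec /usr/bin/mpiexec \
    --recursive-search . \
    --suites standard standard_parallel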

The full list of command line options and a brief description can be found by running with the --help flag:

$ python regression_tests.py --help

Test Output

The test manager produces (fairly terse) screen output that includes a progress bar and the status of each test. A legend is provided to help decipher the screen output, and a more detailed explanation of failures and errors can be found in the test log file. Example screen output follows:

Test log file : pflotran-tests-2021-12-21_10-05-24.testlog

Running pflotran regression tests :

  Legend

    . - success
    F - failed regression test (results are outside error tolerances)
    M - failed regression test (results are FAR outside error tolerances)
    G - general error
    U - user error
    V - simulator failure (e.g. failure to converge)
    X - simulator crash
    T - time out error
    C - configuration file [.cfg] error
    I - missing information (e.g. missing files)
    B - pre-processing error (e.g. error in simulation setup scripts)
    A - post-processing error (e.g. error in solution comparison)
    S - test skipped
    W - warning
    ? - unknown

.............................................................................
..............................M....................................FF..F.....
.............FFFFFF...FFFF...X..............................................F
FF........FFF.FFFFFFFFFFFFF..F...............................................
...U..........................................F............

------------------------------------------------------------------------------
Regression test summary:
    Total run time: 135.333 [s]
    Total tests : 365
    Tests run : 365
    Failed : 35
    Errors : 2

Users should not be surprised if regression test results produce many F failures. Regression test tolerances are set very tight to catch minuscule changes to simulation results (i.e. default absolute and relative error tolerance: 1.e-12). The correct results stored in .regression.gold files are based on a specific OS and compiler (e.g. Ubuntu, GNU compiler, no optimization). A change in operating system or compiler optimization settings will generate very small differences in the solution. However, larger discrepancies (denoted by M) or errors are concerning and should be discussed with the developers.

The test directories contain any files generated by PFLOTRAN during the run. Screen output for each test is contained in the file ${TEST_NAME}.stdout.

Configuration Files

The regression test manager reads tests specified in a series of configuration files in standard cfg (Windows INI file) format. They consist of a series of sections containing key-value pairs:

[section-name]
key = value

Section names should be all lower case, and spaces must be replaced by a hyphen or underscore. Comments are specified by a # character.
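
Because the files use a standard format, they can be inspected with Python's configparser module. A minimal sketch (the file name is illustrative):

import configparser

config = configparser.ConfigParser()
config.read("cu_leaching.cfg")
for section in config.sections():
    print(section, dict(config[section]))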

A test is declared as a section in the configuration file. It is assumed that there is a PFLOTRAN input file with the same name as the test section. The key-value pairs in a test section define how the test is run and how the output is compared to the gold standard file.

[calcite-kinetics]
# look for an input file named 'calcite-kinetics.in'
np = 2
timeout = 30.0
concentration = 1.0e-10 absolute

  • np = N, (optional), indicates a parallel test run with N processors. Default is serial. If mpiexec is not provided on the command line, then parallel tests are skipped.

  • timeout = N, (optional), indicates that the test should be allowed to run for N seconds before it is killed. Default is 60.0 seconds.

  • TYPE = TOLERANCE COMPARISON, indicates that data in the regression file of type TYPE should be compared using a tolerance of TOLERANCE. Known data types are listed below.

The data types and default tolerances are:

  • time = 5 percent

  • concentration = \(1\times 10^{-12}\) absolute

  • generic = \(1\times 10^{-12}\) absolute

  • discrete = 0 absolute

  • rate = \(1\times 10^{-12}\) absolute

  • volume_fraction = \(1\times 10^{-12}\) absolute

  • pressure = \(1\times 10^{-12}\) absolute

  • saturation = \(1\times 10^{-12}\) absolute

  • residual = \(1\times 10^{-12}\) absolute

The default tolerances are deliberately set very tight, and are expected to be overridden on a per-test or per configuration file basis. There are three known comparisons: “absolute”, for absolute differences (\(\delta=|c-g|\)), “relative” for relative differences (\(\delta={|c-g|}/{g}\)), and “percent” for specifying a percent difference (\(\delta=100\cdot{|c-g|}/{g}\)).
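
As an illustration of these comparisons, a minimal sketch in Python follows; this is not the actual implementation in regression_tests.py, and the function name and error handling are assumptions:

def compare(current, gold, tolerance, comparison):
    """Return True if current agrees with gold within tolerance."""
    if comparison == "absolute":
        delta = abs(current - gold)
    elif comparison == "relative":
        # a real implementation must also guard against gold == 0
        delta = abs(current - gold) / gold
    elif comparison == "percent":
        delta = 100.0 * abs(current - gold) / gold
    else:
        raise ValueError("unknown comparison: %s" % comparison)
    return delta <= tolerance

For example, compare(1.0000001, 1.0, 1.0e-12, "absolute") fails, while compare(1.0000001, 1.0, 5.0, "percent") passes.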

In addition, there are two optional sections in configuration files. The section “default-test-criteria” specifies the default criteria to be used for all tests in the current file; criteria specified in a test section override these values (see the example after the suites listing below). A section named “suites” defines aliases for groups of tests.

[suites]
standard = test-1 test-2 test-3
standard_parallel = test-4 test-5 test-6

Common test suites are standard and standard_parallel, which are used by make test, and domain-specific test suites such as geochemistry, flow, transport, and mesh.
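
A “default-test-criteria” section might look like the following; the values shown are illustrative, not recommendations:

[default-test-criteria]
time = 5 percent
concentration = 1.0e-10 absolute
pressure = 1.0e-8 relative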

Creating New Regression Tests

We want running tests to become a habit for developers, so that make pflotran is always followed by make test. With that in mind, ideal test cases are small and fast (< 0.1 seconds) and exercise a small subsection of the code, so it is easier to diagnose where a problem has occurred. While it may (will) be necessary to create some platform-specific tests, we want as many tests as possible to be platform independent and widely used. There is a real danger of test output becoming stale if running it requires special access to a particular piece of hardware, operating system, or compiler.

The steps for creating new regression tests are:

  • Create the PFLOTRAN input file, and get the simulation running correctly.

  • Tell PFLOTRAN to generate a regression file by adding a regression block to the input file, e.g.:

    REGRESSION
      CELL_IDS
        1
        3978
      /
      CELLS_PER_PROCESS 4
      VARIABLES
        LIQUID_PRESSURE
        GAS_SATURATION
      /
    END
    
  • Add the test to the configuration file (see the example following this list).

  • Refine the tolerances so that they will be tight enough to identify problems, but loose enough that they do not create a lot of false positives and discourage users and developers from running the tests.

  • Add the test to the appropriate test suite.

  • Add the configuration file, input file and “gold” file to revision control.
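
Putting the configuration steps together, the new entries in the configuration file might look like the following; the test name and tolerance values are illustrative:

[my_new_test]
np = 2
timeout = 30.0
pressure = 1.0e-8 absolute

[suites]
# append the new test to an existing suite
standard = test-1 test-2 test-3 my_new_test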

Updating Test Results

The output from PFLOTRAN should be fairly stable, and we consider the current output to be “correct”. Changes to regression output should be rare, and primarily done for bug fixes. Updating the test results is simply a matter of replacing the gold standard file with the newly generated output. This can be done with a simple rename in the file system:

$ mv test_1.regression test_1.regression.gold

Or using the regression test manager:

$ python regression_tests.py --executable ../src/pflotran/pflotran \
    --config-file my_test.cfg --tests test_1 --update

Updating through the regression test manager ensures that the output is from your current executable rather than a stale file.

Please document why you updated gold standard files in your revision control commit message.