MATK

class matk.matk.matk(model='', model_args=None, model_kwargs={}, cpus=1, workdir_base=None, workdir=None, results_file=None, seed=None, sample_size=10, hosts={})

Class for Model Analysis ToolKit (MATK) module

Jac(h=None, cpus=1, workdir_base=None, save=True, reuse_dirs=False, verbose=False)

Numerical Jacobian calculation

Parameters:
  • h (fl64 or ndarray(fl64)) – Parameter increment, single value or array with npar values
Returns:

ndarray(fl64) – Jacobian matrix

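A minimal sketch of a forward-difference Jacobian, to illustrate what a scalar versus per-parameter increment h means. This is not MATK's implementation: forward_jacobian and toy_model are hypothetical names, and the use of forward (rather than central) differences is an assumption.

```python
def forward_jacobian(model, pars, h=1e-6):
    """Approximate d(model_i)/d(par_j) with forward differences."""
    npar = len(pars)
    # Accept one increment for all parameters or one increment per parameter
    steps = [h] * npar if isinstance(h, (int, float)) else list(h)
    if len(steps) != npar:
        raise ValueError("h must be a scalar or have one entry per parameter")
    base = model(pars)
    jac = [[0.0] * npar for _ in base]
    for j, step in enumerate(steps):
        shifted = list(pars)
        shifted[j] += step  # perturb one parameter at a time
        perturbed = model(shifted)
        for i in range(len(base)):
            jac[i][j] = (perturbed[i] - base[i]) / step
    return jac


def toy_model(p):
    # Hypothetical two-output model used only for illustration
    return [p[0] * p[1], p[0] + 2.0 * p[1]]


J = forward_jacobian(toy_model, [3.0, 4.0], h=1e-6)
```

Each column costs one extra model run, so a Jacobian with npar parameters needs npar + 1 evaluations; these are independent of each other, which is why running them concurrently on several cpus can help.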
MCMC(nruns=10000, burn=1000, init_error_std=1.0, max_error_std=100.0, verbose=1)

Perform Markov Chain Monte Carlo sampling using pymc package

Parameters:
  • nruns (int) – Number of MCMC iterations (samples)
  • burn (int) – Number of initial samples to burn (discard)
  • verbose (int) – verbosity of output
  • init_error_std (fl64) – Initial standard deviation of residuals
  • max_error_std (fl64) – Maximum standard deviation of residuals that will be considered
Returns:

pymc MCMC object
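To illustrate the roles of nruns and burn, here is a plain random-walk Metropolis sampler in standard-library Python. It is a stand-in for illustration only: MATK delegates sampling to pymc, and the names metropolis, log_post, and step are hypothetical.

```python
import math
import random


def metropolis(log_post, start, nruns=10000, burn=1000, step=1.0, seed=0):
    """Random-walk Metropolis; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(nruns):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_post(cand)
        # Accept with probability min(1, exp(lp_cand - lp))
        if lp_cand >= lp or rng.random() < math.exp(lp_cand - lp):
            x, lp = cand, lp_cand
        chain.append(x)
    # Discard the first `burn` iterations, which still depend on the start
    return chain[burn:]


def log_post(x):
    # Hypothetical target: standard normal, up to an additive constant
    return -0.5 * x * x


samples = metropolis(log_post, start=5.0, nruns=5000, burn=1000)
mean = sum(samples) / len(samples)
```

The chain starts far from the mode (at 5.0), so the discarded burn-in portion is what keeps the starting point from biasing the retained samples.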

add_obs(name, sim=None, weight=1.0, value=None)

Add observation to problem

Parameters:
  • name (str) – Observation name
  • sim (fl64) – Simulated value
  • weight (fl64) – Observation weight
  • value (fl64) – Value of observation
Returns:

Observation object

add_par(name, value=None, vary=True, min=None, max=None, expr=None, discrete_vals=[], **kwargs)

Add parameter to problem

Parameters:
  • name (str) – Name of parameter
  • value (float) – Initial parameter value
  • vary (bool) – Whether parameter should be varied or not, currently only used with lmfit
  • min (float) – Minimum bound
  • max (float) – Maximum bound
  • expr (str) – Mathematical expression to use to calculate parameter value
  • discrete_vals ((lst,lst)) – tuple of two array_like defining discrete values and associated probabilities
  • kwargs – keyword arguments passed to parameter class

calibrate(cpus=1, maxiter=100, lambdax=0.001, minchange=1e-16, minlambdax=1e-06, verbose=False, workdir=None, reuse_dirs=False, h=1e-06)

Calibrate MATK model using the Levenberg-Marquardt algorithm, based on original code written by Ernesto P. Adorio, PhD (UPDEPP at Clarkfield, Pampanga)

Parameters:
  • cpus (int) – Number of cpus to use
  • maxiter (int) – Maximum number of iterations
  • lambdax (fl64) – Initial Marquardt lambda
  • minchange (fl64) – Minimum change between successive ChiSquares
  • minlambdax (fl64) – Minimum lambda value
  • verbose (bool) – If True, additional information written to screen during calibration
Returns:
  • best fit parameters found by routine
  • best sum of squares
  • covariance matrix
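The interplay of maxiter, lambdax, minchange, and minlambdax can be sketched with a small Levenberg-Marquardt loop. This is an illustrative two-parameter version under stated assumptions, not MATK's routine: the factor-of-ten lambda updates, the diagonal scaling, the cap on lambda, and the names levenberg_marquardt, residuals, and jacobian are all hypothetical, and the covariance matrix is omitted.

```python
def levenberg_marquardt(residuals, jacobian, p, maxiter=100, lambdax=0.001,
                        minchange=1e-16, minlambdax=1e-6):
    """Two-parameter Levenberg-Marquardt loop; returns (parameters, ssr)."""
    p = list(p)
    r = residuals(p)
    ssr = sum(v * v for v in r)
    lam = lambdax
    for _ in range(maxiter):
        J = jacobian(p)
        # Build J^T J and the gradient J^T r
        a11 = sum(row[0] * row[0] for row in J)
        a12 = sum(row[0] * row[1] for row in J)
        a22 = sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * v for row, v in zip(J, r))
        g2 = sum(row[1] * v for row, v in zip(J, r))
        # Solve (J^T J + lam * diag(J^T J)) dp = -J^T r for the 2x2 case
        d11, d22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        dp = [(-g1 * d22 + g2 * a12) / det, (-g2 * d11 + g1 * a12) / det]
        trial = [p[0] + dp[0], p[1] + dp[1]]
        r_trial = residuals(trial)
        ssr_trial = sum(v * v for v in r_trial)
        if ssr_trial < ssr:
            change = ssr - ssr_trial
            p, r, ssr = trial, r_trial, ssr_trial
            lam = max(lam / 10.0, minlambdax)  # move toward Gauss-Newton
            if change < minchange:
                break
        else:
            lam *= 10.0  # move toward small gradient-descent steps
            if lam > 1e12:
                break
    return p, ssr


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2x


def residuals(p):
    return [p[0] + p[1] * x - y for x, y in zip(xs, ys)]


def jacobian(p):
    return [[1.0, x] for x in xs]


best, best_ssr = levenberg_marquardt(residuals, jacobian, [0.0, 0.0])
```

A small lambda makes the step close to a Gauss-Newton step; each rejected step raises lambda, which shortens the step until the sum of squares decreases again.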

copy_sampleset(oldname, newname=None)

Copy sampleset

Parameters:
  • oldname (str) – Name of sampleset to copy
  • newname (str) – Name of new sampleset

cpus

Set number of cpus to use for concurrent model evaluations

create_sampleset(samples, name=None, responses=None, indices=None, index_start=1)

Add sample set to problem

Parameters:
  • name (str) – Name of sample set
  • samples (list(fl64),ndarray(fl64)) – Matrix of parameter samples with npar columns in order of matk.pars.keys()
  • responses (list(fl64),ndarray(fl64)) – Matrix of associated responses with nobs columns in order matk.obs.keys() if observation exists (existence of observations is not required)
  • indices (list(int),ndarray(int)) – Sample indices to use when creating working directories and output files

differential_evolution(bounds=(), workdir=None, strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube', save_evals=False)

Perform differential evolution calibration using scipy.optimize.differential_evolution:

Differential Evolution is stochastic in nature: it does not use gradient methods to find the minimum. It can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient-based techniques.

The algorithm is due to Storn and Price.

Parameters:
  • func (callable) – The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function.
  • bounds (sequence) – Bounds for variables. (min, max) pairs for each element in x, defining the lower and upper bounds for the optimizing argument of func. It is required to have len(bounds) == len(x). len(bounds) is used to determine the number of parameters in x.
  • strategy (str, optional) – The differential evolution strategy to use. Should be one of ‘best1bin’, ‘best1exp’, ‘rand1exp’, ‘randtobest1exp’, ‘best2exp’, ‘rand2exp’, ‘randtobest1bin’, ‘best2bin’, ‘rand2bin’, ‘rand1bin’. The default is ‘best1bin’.
  • maxiter (int, optional) – The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is (maxiter + 1) * popsize * len(x).
  • popsize (int, optional) – A multiplier for setting the total population size. The population has popsize * len(x) individuals.
  • tol (float, optional) – When the mean of the population energies, multiplied by tol, divided by the standard deviation of the population energies is greater than 1, the solving process terminates: convergence = mean(pop) * tol / stdev(pop) > 1.
  • mutation (float or tuple(float, float), optional) – The mutation constant. In the literature this is also known as differential weight, being denoted by F. If specified as a float it should be in the range [0, 2]. If specified as a tuple (min, max) dithering is employed. Dithering randomly changes the mutation constant on a generation by generation basis. The mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence.
  • recombination (float, optional) – The recombination constant, should be in the range [0, 1]. In the literature this is also known as the crossover probability, being denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability.
  • seed (int or np.random.RandomState, optional) – If seed is not specified, the np.random.RandomState singleton is used. If seed is an int, a new np.random.RandomState instance is used, seeded with seed. If seed is already a np.random.RandomState instance, then that np.random.RandomState instance is used. Specify seed for repeatable minimizations.
  • disp (bool, optional) – Display status messages.
  • callback (callable, callback(xk, convergence=val), optional) – A function to follow the progress of the minimization. xk is the current value of x0. val represents the fractional value of the population convergence. When val is greater than one the function halts. If callback returns True, then the minimization is halted (any polishing is still carried out).
  • polish (bool, optional) – If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly.
  • init (str, optional) – Specify how the population initialization is performed. Should be one of ‘latinhypercube’ or ‘random’. The default is ‘latinhypercube’. Latin Hypercube sampling tries to maximize coverage of the available parameter space. ‘random’ initializes the population randomly; this has the drawback that clustering can occur, preventing the whole of parameter space being covered.
Returns:

res (OptimizeResult) – The optimization result represented as an OptimizeResult object. Important attributes are: x, the solution array; success, a Boolean flag indicating if the optimizer exited successfully; and message, which describes the cause of the termination. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute.
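The two formulas quoted above, the evaluation budget and the termination test, can be written out directly. This is a sketch of the arithmetic only, using hypothetical helper names and the population standard deviation; it does not call scipy.

```python
import statistics


def max_function_evaluations(maxiter, popsize, nparams):
    # (maxiter + 1) * popsize * len(x), ignoring the optional polishing step
    return (maxiter + 1) * popsize * nparams


def has_converged(energies, tol=0.01):
    # Terminate when mean(pop) * tol / stdev(pop) > 1
    spread = statistics.pstdev(energies)
    if spread == 0.0:
        return True  # all members have identical energy
    return statistics.mean(energies) * tol / spread > 1.0


budget = max_function_evaluations(maxiter=1000, popsize=15, nparams=3)
tight = has_converged([10.0, 10.01, 9.99])
loose = has_converged([1.0, 5.0, 9.0])
```

With the default maxiter and popsize, a three-parameter problem can therefore use up to 45045 model runs before polishing, which is worth checking against the cost of a single run.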

emcee(lnprob=None, lnprob_args=(), nwalkers=100, nsamples=500, burnin=50, pos0=None, ncpus=1)

Perform Markov Chain Monte Carlo sampling using emcee package

Parameters:
  • lnprob (function) – Function specifying the natural logarithm of the likelihood function
  • nwalkers (int) – Number of random walkers
  • nsamples (int) – Number of samples per walker
  • burnin (int) – Number of “burn-in” samples per walker to be discarded
  • pos0 (list) – list of initial positions for the walkers
  • ncpus (int) – number of cpus
Returns:

numpy array containing samples
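As an illustration of what a log-likelihood function for lnprob might look like, here is a Gaussian log-likelihood for a straight-line model. The emcee package calls such a function with the walker position first, followed by extra arguments; that MATK forwards lnprob_args in the same way is an assumption, and the model, argument names, and data are hypothetical.

```python
import math


def lnprob(pars, xs, ys, sigma):
    """Natural log of a Gaussian likelihood for y = slope * x + intercept."""
    slope, intercept = pars
    total = 0.0
    for x, y in zip(xs, ys):
        resid = y - (slope * x + intercept)
        total += -0.5 * (resid / sigma) ** 2
        total -= math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


xs = [0.0, 1.0]
ys = [1.0, 3.0]
exact_fit = lnprob((2.0, 1.0), xs, ys, 1.0)
poor_fit = lnprob((0.0, 0.0), xs, ys, 1.0)
```

Parameter sets that reproduce the data more closely give a larger (less negative) value, which is what steers the walkers.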

forward(pardict=None, workdir=None, reuse_dirs=False, job_number=None, hostname=None, processor=None)

Run MATK model using current values

Parameters:
  • pardict (dict) – Dictionary of parameter values keyed by parameter names
  • workdir (str) – Name of directory where model will be run. It will be created if it does not exist
  • reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
  • job_number (int) – Sample id
  • hostname (str) – Name of host to run job on, will be passed to MATK model as kwarg ‘hostname’
  • processor (str or int) – Processor id to run job on, will be passed to MATK model as kwarg ‘processor’
Returns:

int – 0: Successful run, 1: workdir exists

levmar(workdir=None, reuse_dirs=False, max_iter=1000, full_output=True)

Calibrate MATK model using levmar package

Parameters:
  • workdir (str) – Name of directory where model will be run. It will be created if it does not exist
  • reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
  • max_iter (int) – Maximum number of iterations
  • full_output – If True, additional output displayed during calibration
Returns:

levmar output

lhs(name=None, siz=None, noCorrRestr=False, corrmat=None, seed=None, index_start=1)

Draw Latin hypercube samples (LHS) of parameter values from scipy.stats module distributions

Parameters:
  • name (str) – Name of sample set to be created
  • siz (int) – Number of samples to generate, ignored if samples are provided
  • noCorrRestr (bool) – If True, correlation structure is not enforced on sample, use if siz is less than number of parameters
  • corrmat (matrix) – Correlation matrix
  • seed (int) – Random seed to allow replication of samples
  • index_start (int) – Starting value for sample indices
Returns:

matrix – Parameter samples
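The stratification behind Latin hypercube sampling can be sketched on the unit cube with the standard library. This is an assumption-laden illustration, not MATK's sampler: it ignores the correlation options (noCorrRestr, corrmat), and latin_hypercube is a hypothetical name. Mapping each column through a distribution's inverse CDF would then give samples from that distribution.

```python
import random


def latin_hypercube(siz, npar, seed=None):
    """Return siz rows of npar values in [0, 1), one per stratum per column."""
    rng = random.Random(seed)
    columns = []
    for _ in range(npar):
        # One point drawn uniformly inside each of siz equal-width strata
        col = [(k + rng.random()) / siz for k in range(siz)]
        rng.shuffle(col)  # decouple the stratum order between parameters
        columns.append(col)
    # Transpose columns into sample rows
    return [list(row) for row in zip(*columns)]


samples = latin_hypercube(5, 2, seed=42)
```

Every parameter range is covered evenly even for small siz, which is the advantage over independent random draws.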

lmfit(maxfev=0, report_fit=True, cpus=1, epsfcn=None, xtol=1e-07, ftol=1e-07, workdir=None, verbose=False, save_evals=False, difference_type='forward', **kwargs)

Calibrate MATK model using lmfit package

Parameters:
  • maxfev (int) – Max number of function evaluations, if 0, 100*(npars+1) will be used
  • report_fit (bool) – If True, parameter statistics and correlations are printed to the screen
  • cpus (int) – Number of cpus to use for concurrent simulations during jacobian approximation
  • epsfcn (float or lst[float]) – jacobian finite difference approximation increment (single float or list of npar floats)
  • xtol (float) – Relative error in approximate solution
  • ftol (float) – Relative error in the desired sum of squares
  • workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
  • verbose (bool) – If true, print diagnostic information to the screen
  • difference_type (str) – Type of finite difference approximation, ‘forward’ or ‘central’
  • save_evals – If True, a MATK sampleset of calibration function evaluation parameters and responses will be returned
Returns:

tuple(lmfit minimizer object; parameter object; if save_evals=True, also returns a MATK sampleset of calibration function evaluation parameters and responses)

Additional keyword arguments will be passed to the scipy leastsq function: http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.leastsq.html

make_workdir(workdir=None, reuse_dirs=False)

Create a working directory

Parameters:
  • workdir (str) – Name of directory where model will be run. It will be created if it does not exist
  • reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
Returns:

int – 0: Successful run, 1: workdir exists

minimize(method='SLSQP', maxiter=100, workdir=None, bounds=(), constraints=(), options={'eps': 0.001}, save_evals=False)

Minimize a scalar function of one or more variables

Parameters:
  • maxiter (int) – Max number of iterations
  • workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
Returns:

OptimizeResult; if save_evals=True, also returns a MATK sampleset of calibration function evaluation parameters and responses

model

Python function that runs model

model_args

Tuple of extra arguments to MATK model expected to come after parameter dictionary

model_kwargs

Dictionary of extra keyword arguments to MATK model expected to come after parameter dictionary and model_args
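The argument order described for model, model_args, and model_kwargs can be sketched as follows. The helper run_model and the decay model are hypothetical, and returning a dictionary keyed by observation name is an assumption about the model's output rather than something stated here.

```python
def run_model(model, pardict, model_args=None, model_kwargs=None):
    # Parameter dictionary first, then positional extras, then keyword extras
    args = tuple(model_args) if model_args is not None else ()
    kwargs = dict(model_kwargs) if model_kwargs is not None else {}
    return model(pardict, *args, **kwargs)


def decay(pars, times, scale=1.0):
    # Hypothetical model: simulated values keyed by observation name
    return {
        "obs%d" % (i + 1): scale * pars["amp"] * 0.5 ** (t / pars["half_life"])
        for i, t in enumerate(times)
    }


sims = run_model(
    decay,
    {"amp": 8.0, "half_life": 2.0},
    model_args=([0.0, 2.0, 4.0],),
    model_kwargs={"scale": 1.0},
)
```

Note that model_args must be a tuple, so a single extra argument needs a trailing comma.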

nomvalues

Nominal parameter values used in info gap analyses

obsnames

Get observation names

obsvalues

Observation values

obsweights

Get observation weights

pardist_pars

Get parameters needed by parameter distributions

pardists

Get parameter probabilistic distributions

parmaxs

Get parameter upper bounds

parmins

Get parameter lower bounds

parnames

Get parameter names

parstudy(nvals=2, name=None)

Generate parameter study samples.

For discrete parameters with nvals>3, bins are chosen to be spaced as far apart as possible, while still being evenly spaced (note that this is based on bins, not actual values).

Parameters:
  • name (str) – Name of sample set to be created
  • outfile (str) – Name of file where samples will be written. If outfile=None, no file is written.
  • nvals (int or list(int)) – number of values for each parameter
Returns:

ndarray(fl64) – Array of samples
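A parameter study over continuous parameters can be pictured as a grid of evenly spaced values per parameter, combined in every possible way. The sketch below assumes that full-factorial behavior and spacing between the bounds; parameter_study and its bounds argument are hypothetical and not part of MATK.

```python
from itertools import product


def parameter_study(bounds, nvals=2):
    """Full-factorial grid over (min, max) bounds with nvals points each."""
    counts = [nvals] * len(bounds) if isinstance(nvals, int) else list(nvals)
    if len(counts) != len(bounds):
        raise ValueError("nvals must be an int or have one entry per parameter")
    axes = []
    for (lo, hi), n in zip(bounds, counts):
        if n < 2:
            raise ValueError("this sketch needs at least two values per parameter")
        # n evenly spaced values including both bounds
        axes.append([lo + i * (hi - lo) / (n - 1) for i in range(n)])
    return [list(combo) for combo in product(*axes)]


grid = parameter_study([(0.0, 1.0), (10.0, 20.0)], nvals=[2, 3])
```

The number of samples is the product of the per-parameter counts, so it grows quickly with the number of parameters.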

parvalues

Parameter values

read_sampleset(file, name=None)

Read MATK output file and assemble corresponding sampleset with responses.

Parameters:
  • name (str) – Name of sample set
  • file (str) – Path to MATK output file

residuals

Get least squares values

results_file

Set the name of the results_file for parallel runs

saltelli(nsamples, name=None, calc_second_order=True, index_start=1, problem={})

Create sampleset using Saltelli’s extension of the Sobol sequence, intended to be used with the sobol method. This method calls functionality from the SALib package.

Parameters:
  • nsamples (int) – Number of samples to create for each parameter. If calc_second_order is False, the actual sample size will be N * (D + 2); otherwise, it will be N * (2D + 2), where N is nsamples and D is the number of parameters
  • name (str) – Name of sample set to be created
  • calc_second_order (bool) – Calculate second-order sensitivities
  • index_start (int) – Starting value for sample indices
  • problem (dict) – Dictionary of model attributes used by sampler. For example, a dictionary whose key ‘groups’ holds a list of parameter group names, one entry per parameter, can be used to group parameters with similar effects on the observation.
Returns:

MATK sampleset
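The sample-size rule given for nsamples is easy to check before launching a run. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def saltelli_sample_size(nsamples, npar, calc_second_order=True):
    # N * (2D + 2) with second-order terms, N * (D + 2) without
    if calc_second_order:
        return nsamples * (2 * npar + 2)
    return nsamples * (npar + 2)


with_second_order = saltelli_sample_size(100, 3)
first_order_only = saltelli_sample_size(100, 3, calc_second_order=False)
```

For 100 samples and three parameters this gives 800 model runs with second-order sensitivities and 500 without.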

seed

Set the seed for random sampling

simdict

Simulated values

Returns:

lst(fl64) – simulated values in order of matk.obs.keys()

simvalues

Simulated values

Returns:

lst(fl64) – simulated values in order of matk.obs.keys()

ssr

Sum of squared residuals
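A sketch of how observation values, simulated values, and weights could combine into a sum of squared residuals. The residual definition used here, weight * (value - sim), is an assumption, and both function names are hypothetical.

```python
def weighted_residuals(values, sims, weights):
    # Assumed definition: weight * (observed value - simulated value)
    return [w * (v - s) for v, s, w in zip(values, sims, weights)]


def sum_squared_residuals(values, sims, weights):
    return sum(r * r for r in weighted_residuals(values, sims, weights))


res = weighted_residuals([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [1.0, 1.0, 2.0])
total = sum_squared_residuals([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [1.0, 1.0, 2.0])
```

Under this definition, doubling an observation's weight quadruples its contribution to the total.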

workdir

Set the base name for parallel working directories

workdir_base

Set the base name for parallel working directories

workdir_index

Set the working directory index for parallel runs