MATK¶

class matk.matk.matk(model='', model_args=None, model_kwargs={}, cpus=1, workdir_base=None, workdir=None, results_file=None, seed=None, sample_size=10, hosts={})¶

Class for Model Analysis ToolKit (MATK) module

Jac(h=None, cpus=1, workdir_base=None, save=True, reuse_dirs=False, verbose=False)¶

Numerical Jacobian calculation

Parameters:	h (fl64 or ndarray(fl64)) – Parameter increment, single value or array with npar values
Returns:	ndarray(fl64) – Jacobian matrix

MCMC(nruns=10000, burn=1000, init_error_std=1.0, max_error_std=100.0, verbose=1)¶

Perform Markov Chain Monte Carlo sampling using pymc package

Parameters:	nruns (int) – Number of MCMC iterations (samples) burn (int) – Number of initial samples to burn (discard) verbose (int) – verbosity of output init_error_std (fl64) – Initial standard deviation of residuals max_error_std (fl64) – Maximum standard deviation of residuals that will be considered
Returns:	pymc MCMC object

add_obs(name, sim=None, weight=1.0, value=None, group=None)¶

Add observation to problem

Parameters:	name (str) – Observation name sim (fl64) – Simulated value weight (fl64) – Observation weight value (fl64) – Value of observation group (str) – Name identifying group or type of observation
Returns:	Observation object

add_par(name, value=None, vary=True, min=None, max=None, expr=None, discrete_vals=[], **kwargs)¶

Add parameter to problem

Parameters:

name (str) – Name of parameter
value (float) – Initial parameter value
vary (bool) – Whether parameter should be varied or not, currently only used with lmfit
min (float) – Minimum bound
max (float) – Maximum bound
expr (str) – Mathematical expression to use to calculate parameter value
discrete_vals ((lst,lst)) – tuple of two array_like defining discrete values and associated probabilities
kwargs – keyword arguments passed to parameter class

calibrate(cpus=1, maxiter=100, lambdax=0.001, minchange=1e-16, minlambdax=1e-06, verbose=False, workdir=None, reuse_dirs=False, h=1e-06)¶

Calibrate MATK model using Levenberg-Marquardt algorithm based on original code written by Ernesto P. Adorio PhD. (UPDEPP at Clarkfield, Pampanga)

Parameters:	cpus (int) – Number of cpus to use maxiter (int) – Maximum number of iterations lambdax (fl64) – Initial Marquardt lambda minchange (fl64) – Minimum change between successive ChiSquares minlambdax (fl4) – Minimum lambda value verbose (bool) – If True, additional information written to screen during calibration
Returns:	best fit parameters found by routine
Returns:	best Sum of squares.
Returns:	covariance matrix

copy_sampleset(oldname, newname=None)¶

Copy sampleset

Parameters:	oldname (str) – Name of sampleset to copy newname (str) – Name of new sampleset

cpus¶: Set number of cpus to use for concurrent model evaluations

create_sampleset(samples, name=None, responses=None, indices=None, index_start=1)¶

Add sample set to problem

Parameters:

name (str) – Name of sample set
samples (list(fl64),ndarray(fl64)) – Matrix of parameter samples with npar columns in order of matk.pars.keys()
responses (list(fl64),ndarray(fl64)) – Matrix of associated responses with nobs columns in order matk.obs.keys() if observation exists (existence of observations is not required)
indices (list(int),ndarray(int)) – Sample indices to use when creating working directories and output files

differential_evolution(bounds=(), workdir=None, strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube', save_evals=False)¶

Perform differential evolution calibration using scipy.optimize.differential_evolution:

Differential Evolution is stochastic in nature (does not use gradient methods) to find the minimium, and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient based techniques.

The algorithm is due to Storn and Price.

Parameters func : callable The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function. bounds : sequence Bounds for variables. (min, max) pairs for each element in x, defining the lower and upper bounds for the optimizing argument of func. It is required to have len(bounds) == len(x). len(bounds) is used to determine the number of parameters in x. strategy : str, optional The differential evolution strategy to use. Should be one of: - ‘best1bin’ - ‘best1exp’ - ‘rand1exp’ - ‘randtobest1exp’ - ‘best2exp’ - ‘rand2exp’ - ‘randtobest1bin’ - ‘best2bin’ - ‘rand2bin’ - ‘rand1bin’ The default is ‘best1bin’. maxiter : int, optional The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is: (maxiter + 1) * popsize * len(x) popsize : int, optional A multiplier for setting the total population size. The population has popsize * len(x) individuals. tol : float, optional When the mean of the population energies, multiplied by tol, divided by the standard deviation of the population energies is greater than 1 the solving process terminates: convergence = mean(pop) * tol / stdev(pop) > 1 mutation : float or tuple(float, float), optional The mutation constant. In the literature this is also known as differential weight, being denoted by F. If specified as a float it should be in the range [0, 2]. If specified as a tuple (min, max) dithering is employed. Dithering randomly changes the mutation constant on a generation by generation basis. The mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence. recombination : float, optional The recombination constant, should be in the range [0, 1]. In the literature this is also known as the crossover probability, being denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability. seed : int or np.random.RandomState, optional If seed is not specified the np.RandomState singleton is used. If seed is an int, a new np.random.RandomState instance is used, seeded with seed. If seed is already a np.random.RandomState instance, then that np.random.RandomState instance is used. Specify seed for repeatable minimizations. disp : bool, optional Display status messages callback : callable, callback(xk, convergence=val), optional A function to follow the progress of the minimization. xk is the current value of x0. val represents the fractional value of the population convergence. When val is greater than one the function halts. If callback returns True, then the minimization is halted (any polishing is still carried out). polish : bool, optional If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly. init : string, optional Specify how the population initialization is performed. Should be one of:

‘latinhypercube’

‘random’

The default is ‘latinhypercube’. Latin Hypercube sampling tries to maximize coverage of the available parameter space. ‘random’ initializes the population randomly - this has the drawback that clustering can occur, preventing the whole of parameter space being covered. Returns ——- res : OptimizeResult The optimization result represented as a OptimizeResult object. Important attributes are: x the solution array, success a Boolean flag indicating if the optimizer exited successfully and message which describes the cause of the termination. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute.

emcee(lnprob=None, lnprob_args=(), nwalkers=100, nsamples=500, burnin=50, pos0=None, ncpus=1)¶

Perform Markov Chain Monte Carlo sampling using emcee package

Parameters:	lnprob (function) – Function specifying the natural logarithm of the likelihood function nwalkers (int) – Number of random walkers nsamples (int) – Number of samples per walker burnin (int) – Number of “burn-in” samples per walker to be discarded pos0 (list) – list of initial positions for the walkers ncpus (int) – number of cpus
Returns:	numpy array containing samples

forward(pardict=None, workdir=None, reuse_dirs=False, job_number=None, hostname=None, processor=None)¶

Run MATK model using current values

Parameters:

pardict (dict) – Dictionary of parameter values keyed by parameter names
workdir (str) – Name of directory where model will be run. It will be created if it does not exist
reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
job_number (int) – Sample id
hostname (str) – Name of host to run job on, will be passed to MATK model as kwarg ‘hostname’
processor (str or int) – Processor id to run job on, will be passed to MATK model as kwarg ‘processor’

Returns:

int – 0: Successful run, 1: workdir exists

levmar(workdir=None, reuse_dirs=False, max_iter=1000, full_output=True)¶

Calibrate MATK model using levmar package

Parameters:	workdir (str) – Name of directory where model will be run. It will be created if it does not exist reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory max_iter (int) – Maximum number of iterations full_output – If True, additional output displayed during calibration
Returns:	levmar output

lhs(name=None, siz=None, noCorrRestr=False, corrmat=None, seed=None, index_start=1)¶

Draw lhs samples of parameter values from scipy.stats module distribution

Parameters:	name (str) – Name of sample set to be created siz (int) – Number of samples to generate, ignored if samples are provided noCorrRestr (bool) – If True, correlation structure is not enforced on sample, use if siz is less than number of parameters corrmat (matrix) – Correlation matrix seed (int) – Random seed to allow replication of samples index_start – Starting value for sample indices
Type:	int
Returns:	matrix – Parameter samples

lmfit(maxfev=0, report_fit=True, cpus=1, epsfcn=None, xtol=1e-07, ftol=1e-07, workdir=None, verbose=False, save_evals=False, difference_type='forward', **kwargs)¶

Calibrate MATK model using lmfit package

Parameters:

maxfev (int) – Max number of function evaluations, if 0, 100*(npars+1) will be used
report_fit (bool) – If True, parameter statistics and correlations are printed to the screen
cpus (int) – Number of cpus to use for concurrent simulations during jacobian approximation
epsfcn (float or lst[float]) – jacobian finite difference approximation increment (single float of list of npar floats)
xtol (float) – Relative error in approximate solution
ftol (float) – Relative error in the desired sum of squares
workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
verbose (bool) – If true, print diagnostic information to the screen
difference_type (str) – Type of finite difference approximation, ‘forward’ or ‘central’
save_evals (bool) – If True, a MATK sampleset of calibration function evaluation parameters and responses will be returned

Returns:

tuple(lmfit minimizer object; parameter object; if save_evals=True, also returns a MATK sampleset of calibration function evaluation parameters and responses)

Additional keyword argments will be passed to scipy leastsq function: http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.leastsq.html

make_workdir(workdir=None, reuse_dirs=False)¶

Create a working directory

Parameters:	workdir (str) – Name of directory where model will be run. It will be created if it does not exist reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
Returns:	int – 0: Successful run, 1: workdir exists

minimize(method='SLSQP', maxiter=100, workdir=None, bounds=(), constraints=(), options={'eps': 0.001}, save_evals=False)¶

Minimize a scalar function of one or more variables

Parameters:	maxiter (int) – Max number of iterations workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
Returns:	OptimizeResult; if save_evals=True, also returns a MATK sampleset of calibration function evaluation parameters and responses

model¶: Python function that runs model

model_args¶: Tuple of extra arguments to MATK model expected to come after parameter dictionary

model_kwargs¶: Dictionary of extra keyword arguments to MATK model expected to come after parameter dictionary and model_args

nomvalues¶: Nominal parameter values used in info gap analyses

obsgroups¶: Get observation groups

obsnames¶: Get observation names

obsvalues¶: Observation values

obsweights¶: Get observation weights

pardist_pars¶: Get parameters needed by parameter distributions

pardists¶: Get parameter probabilistic distributions

parmaxs¶: Get parameter upper bounds

parmins¶: Get parameter lower bounds

parnames¶: Get parameter names

parstudy(nvals=2, name=None)¶

Generate parameter study samples.: For discrete parameters with nvals>3, bins are chosen to be spaced as far apart as possible, while still being evenly spaced (note that this is based on bins, not actual values).

Parameters:	name (str) – Name of sample set to be created outfile (str) – Name of file where samples will be written. If outfile=None, no file is written. nvals (int or list(int)) – number of values for each parameter
Returns:	ndarray(fl64) – Array of samples

parvalues¶: Parameter values

read_sampleset(file, name=None)¶

Read MATK output file and assemble corresponding sampleset with responses.

Parameters:	name (str) – Name of sample set file (str) – Path to MATK output file

residuals(group=None)¶

Get least squares values

Parameters:	group (str) – Group name of observations; if not None, only residuals in group will be returned

results_file¶: Set the name of the results_file for parallel runs

saltelli(nsamples, name=None, calc_second_order=True, index_start=1, problem={})¶

Create sampleset using Saltelli’s extension of the Sobol sequence intended to be used with sobol method. This method calls functionality from the SALib package.

Parameters:

nsamples (int) – Number of samples to create for each parameter. If calc_second_order is False, the actual sample size will be N * (D + 2), otherwise, it will be N * (2D + 2)
name (str) – Name of sample set to be created
calc_second_order (bool) – Calculate second-order sensitivities
index_start (int) – Starting value for sample indices
problem (dict) – Dictionary of model attributes used by sampler
problem – Dictionary of model attributes used by sampler. For example, dictionary with a list with keyname ‘groups’ containing a list of length of the number of parameters with parameter group names can be used to group parameters with similar effects on the observation.

Returns:

MATK sampleset

seed¶: Set the seed for random sampling

simdict¶: Simulated values :returns: lst(fl64) – simulated values in order of matk.obs.keys()

simvalues¶: Simulated values :returns: lst(fl64) – simulated values in order of matk.obs.keys()

ssr(group=None)¶

Sum of squared residuals

Parameters:	group (str) – Group name of observations; if not None, ssr for observation group will be returned

workdir¶: Set the base name for parallel working directories

workdir_base¶: Set the base name for parallel working directories

workdir_index¶: Set the working directory index for parallel runs

Previous topic

Next topic

This Page

MATK¶