MATK
-
class matk.matk.matk(model='', model_args=None, model_kwargs={}, cpus=1, workdir_base=None, workdir=None, results_file=None, seed=None, sample_size=10, hosts={})
Class for Model Analysis ToolKit (MATK) module
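For example, a minimal sketch of constructing a MATK object (the exponential-decay model function and its parameter/observation names are illustrative, not part of MATK; the method examples below reuse this m instance):

```python
import numpy as np
from matk.matk import matk  # class path documented above

# Illustrative model: MATK models are assumed here to take a dictionary of
# parameter values and return a dictionary of simulated values keyed by
# observation name.
def dbexpl(p):
    t = np.arange(0, 100, 20.0)
    y = p['par1'] * np.exp(-p['par2'] * t) + p['par3'] * np.exp(-p['par4'] * t)
    return dict(zip(['o1', 'o2', 'o3', 'o4', 'o5'], y))

m = matk(model=dbexpl)
```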
-
Jac(h=None, cpus=1, workdir_base=None, save=True, reuse_dirs=False, verbose=False)
Numerical Jacobian calculation
Parameters: h (fl64 or ndarray(fl64)) – Parameter increment, single value or array with npar values
Returns: ndarray(fl64) – Jacobian matrix
-
MCMC(nruns=10000, burn=1000, init_error_std=1.0, max_error_std=100.0, verbose=1)
Perform Markov Chain Monte Carlo sampling using pymc package
Parameters: - nruns (int) – Number of MCMC iterations (samples)
- burn (int) – Number of initial samples to burn (discard)
- verbose (int) – verbosity of output
- init_error_std (fl64) – Initial standard deviation of residuals
- max_error_std (fl64) – Maximum standard deviation of residuals that will be considered
Returns: pymc MCMC object
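A hedged example, reusing the m instance from the class sketch above (requires the pymc package and observations with target values defined via add_obs):

```python
# Sample the posterior of the calibration parameters; returns a pymc MCMC object.
M = m.MCMC(nruns=5000, burn=500, init_error_std=0.5)
```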
-
add_obs(name, sim=None, weight=1.0, value=None, group=None)
Add observation to problem
Parameters: - name (str) – Observation name
- sim (fl64) – Simulated value
- weight (fl64) – Observation weight
- value (fl64) – Value of observation
- group (str) – Name identifying group or type of observation
Returns: Observation object
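For example, continuing with the m instance sketched above (observation names, values, and the group label are illustrative):

```python
# Attach target values to the model's observations
for name, val in zip(['o1', 'o2', 'o3', 'o4', 'o5'],
                     [2.52, 1.16, 0.54, 0.25, 0.12]):
    m.add_obs(name, value=val, weight=1.0, group='conc')
```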
-
add_par(name, value=None, vary=True, min=None, max=None, expr=None, discrete_vals=[], **kwargs)
Add parameter to problem
Parameters: - name (str) – Name of parameter
- value (float) – Initial parameter value
- vary (bool) – Whether parameter should be varied or not, currently only used with lmfit
- min (float) – Minimum bound
- max (float) – Maximum bound
- expr (str) – Mathematical expression to use to calculate parameter value
- discrete_vals ((lst,lst)) – tuple of two array_like defining discrete values and associated probabilities
- kwargs – keyword arguments passed to parameter class
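For example, continuing with the m instance sketched above (parameter names, bounds, and values are illustrative):

```python
# Continuous parameters with bounds and initial values
m.add_par('par1', value=1.0, min=0.0, max=2.0)
m.add_par('par2', value=0.1, min=0.0, max=0.2)
m.add_par('par3', value=1.0, min=0.0, max=2.0)
m.add_par('par4', value=0.01, min=0.0, max=0.02)
# Discrete parameter: values with associated probabilities (illustrative)
# m.add_par('mode', discrete_vals=([1, 2, 3], [0.2, 0.5, 0.3]))
```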
-
calibrate(cpus=1, maxiter=100, lambdax=0.001, minchange=1e-16, minlambdax=1e-06, verbose=False, workdir=None, reuse_dirs=False, h=1e-06)
Calibrate MATK model using the Levenberg-Marquardt algorithm, based on original code written by Ernesto P. Adorio, PhD (UPDEPP at Clarkfield, Pampanga)
Parameters: - cpus (int) – Number of cpus to use
- maxiter (int) – Maximum number of iterations
- lambdax (fl64) – Initial Marquardt lambda
- minchange (fl64) – Minimum change between successive ChiSquares
- minlambdax (fl64) – Minimum lambda value
- verbose (bool) – If True, additional information written to screen during calibration
Returns: best-fit parameters found by routine, best sum of squares, and covariance matrix
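A hedged example, reusing the m instance from above; the three documented return values are assumed here to come back as a single tuple:

```python
best_pars, best_ssr, cov = m.calibrate(cpus=2, maxiter=50, verbose=True)
print(best_pars)  # best-fit parameter values
print(best_ssr)   # best sum of squared residuals
```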
-
copy_sampleset(oldname, newname=None)
Copy sampleset
Parameters: - oldname (str) – Name of sampleset to copy
- newname (str) – Name of new sampleset
-
cpus
Set number of cpus to use for concurrent model evaluations
-
create_sampleset(samples, name=None, responses=None, indices=None, index_start=1)
Add sample set to problem
Parameters: - name (str) – Name of sample set
- samples (list(fl64),ndarray(fl64)) – Matrix of parameter samples with npar columns in order of matk.pars.keys()
- responses (list(fl64),ndarray(fl64)) – Matrix of associated responses with nobs columns in order matk.obs.keys() if observation exists (existence of observations is not required)
- indices (list(int),ndarray(int)) – Sample indices to use when creating working directories and output files
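For example, building a sampleset from externally generated samples (uniform random draws here, using the documented parmins/parmaxs/parnames properties):

```python
import numpy as np

# 50 samples, one column per parameter in matk.pars.keys() order
samples = np.random.uniform(low=m.parmins, high=m.parmaxs,
                            size=(50, len(m.parnames)))
ss = m.create_sampleset(samples, name='uniform50')
```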
-
differential_evolution(bounds=(), workdir=None, strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube', save_evals=False)
Perform differential evolution calibration using scipy.optimize.differential_evolution.
Differential Evolution is stochastic in nature (it does not use gradient methods) to find the minimum, and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient-based techniques. The algorithm is due to Storn and Price.
Parameters: - func (callable) – The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function.
- bounds (sequence) – Bounds for variables. (min, max) pairs for each element in x, defining the lower and upper bounds for the optimizing argument of func. It is required to have len(bounds) == len(x). len(bounds) is used to determine the number of parameters in x.
- strategy (str, optional) – The differential evolution strategy to use. Should be one of: 'best1bin', 'best1exp', 'rand1exp', 'randtobest1exp', 'best2exp', 'rand2exp', 'randtobest1bin', 'best2bin', 'rand2bin', 'rand1bin'. The default is 'best1bin'.
- maxiter (int, optional) – The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is (maxiter + 1) * popsize * len(x).
- popsize (int, optional) – A multiplier for setting the total population size. The population has popsize * len(x) individuals.
- tol (float, optional) – When the mean of the population energies, multiplied by tol, divided by the standard deviation of the population energies is greater than 1, the solving process terminates: convergence = mean(pop) * tol / stdev(pop) > 1.
- mutation (float or tuple(float, float), optional) – The mutation constant. In the literature this is also known as differential weight, denoted by F. If specified as a float it should be in the range [0, 2]. If specified as a tuple (min, max), dithering is employed. Dithering randomly changes the mutation constant on a generation-by-generation basis; the mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence.
- recombination (float, optional) – The recombination constant, should be in the range [0, 1]. In the literature this is also known as the crossover probability, denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability.
- seed (int or np.random.RandomState, optional) – If seed is not specified, the np.random.RandomState singleton is used. If seed is an int, a new np.random.RandomState instance is used, seeded with seed. If seed is already an np.random.RandomState instance, then that instance is used. Specify seed for repeatable minimizations.
- disp (bool, optional) – Display status messages.
- callback (callable, callback(xk, convergence=val), optional) – A function to follow the progress of the minimization. xk is the current value of x0. val represents the fractional value of the population convergence. When val is greater than one the function halts. If callback returns True, then the minimization is halted (any polishing is still carried out).
- polish (bool, optional) – If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly.
- init (str, optional) – Specify how the population initialization is performed. Should be one of 'latinhypercube' or 'random'. The default is 'latinhypercube'. Latin Hypercube sampling tries to maximize coverage of the available parameter space. 'random' initializes the population randomly; this has the drawback that clustering can occur, preventing the whole of parameter space being covered.
Returns: res (OptimizeResult) – The optimization result represented as an OptimizeResult object. Important attributes are: x, the solution array; success, a Boolean flag indicating if the optimizer exited successfully; and message, which describes the cause of the termination. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute.
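A hedged example, reusing the m instance from above and assuming bounds are supplied as (min, max) pairs built from the documented parmins/parmaxs properties:

```python
res = m.differential_evolution(bounds=list(zip(m.parmins, m.parmaxs)),
                               maxiter=200, popsize=10, seed=1000)
print(res.x, res.success)  # solution array and success flag
```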
-
emcee(lnprob=None, lnprob_args=(), nwalkers=100, nsamples=500, burnin=50, pos0=None, ncpus=1)
Perform Markov Chain Monte Carlo sampling using emcee package
Parameters: - lnprob (function) – Function specifying the natural logarithm of the likelihood function
- nwalkers (int) – Number of random walkers
- nsamples (int) – Number of samples per walker
- burnin (int) – Number of “burn-in” samples per walker to be discarded
- pos0 (list) – list of initial positions for the walkers
- ncpus (int) – number of cpus
Returns: numpy array containing samples
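A hedged example, reusing the m instance from above (requires the emcee package; with lnprob=None a default likelihood based on the model residuals is assumed):

```python
samples = m.emcee(nwalkers=50, nsamples=200, burnin=20, ncpus=2)
print(samples.shape)  # numpy array of posterior samples
```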
-
forward(pardict=None, workdir=None, reuse_dirs=False, job_number=None, hostname=None, processor=None)
Run MATK model using current values
Parameters: - pardict (dict) – Dictionary of parameter values keyed by parameter names
- workdir (str) – Name of directory where model will be run. It will be created if it does not exist
- reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
- job_number (int) – Sample id
- hostname (str) – Name of host to run job on, will be passed to MATK model as kwarg ‘hostname’
- processor (str or int) – Processor id to run job on, will be passed to MATK model as kwarg ‘processor’
Returns: int – 0: Successful run, 1: workdir exists
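For example, a single forward run with explicit parameter values (names and values are illustrative):

```python
status = m.forward(pardict={'par1': 0.7, 'par2': 0.12, 'par3': 1.1, 'par4': 0.01},
                   workdir='run1', reuse_dirs=True)
if status == 0:
    print(m.simvalues)  # simulated values in matk.obs.keys() order
```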
-
levmar(workdir=None, reuse_dirs=False, max_iter=1000, full_output=True)
Calibrate MATK model using levmar package
Parameters: - workdir (str) – Name of directory where model will be run. It will be created if it does not exist
- reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
- max_iter (int) – Maximum number of iterations
- full_output – If True, additional output displayed during calibration
Returns: levmar output
-
lhs(name=None, siz=None, noCorrRestr=False, corrmat=None, seed=None, index_start=1)
Draw LHS (Latin hypercube) samples of parameter values from scipy.stats module distributions
Parameters: - name (str) – Name of sample set to be created
- siz (int) – Number of samples to generate, ignored if samples are provided
- noCorrRestr (bool) – If True, correlation structure is not enforced on sample, use if siz is less than number of parameters
- corrmat (matrix) – Correlation matrix
- seed (int) – Random seed to allow replication of samples
- index_start (int) – Starting value for sample indices
Returns: matrix – Parameter samples
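For example, a repeatable Latin hypercube draw over the parameter distributions defined on m:

```python
# 100 LHS samples; the docstring above describes the return as a matrix of
# parameter samples with one column per parameter.
s = m.lhs(name='lhs100', siz=100, seed=1000)
```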
-
lmfit(maxfev=0, report_fit=True, cpus=1, epsfcn=None, xtol=1e-07, ftol=1e-07, workdir=None, verbose=False, save_evals=False, difference_type='forward', **kwargs)
Calibrate MATK model using lmfit package
Parameters: - maxfev (int) – Max number of function evaluations, if 0, 100*(npars+1) will be used
- report_fit (bool) – If True, parameter statistics and correlations are printed to the screen
- cpus (int) – Number of cpus to use for concurrent simulations during jacobian approximation
- epsfcn (float or lst[float]) – Jacobian finite difference approximation increment (single float or list of npar floats)
- xtol (float) – Relative error in approximate solution
- ftol (float) – Relative error in the desired sum of squares
- workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
- verbose (bool) – If True, print diagnostic information to the screen
- difference_type (str) – Type of finite difference approximation, ‘forward’ or ‘central’
- save_evals (bool) – If True, a MATK sampleset of calibration function evaluation parameters and responses will be returned
Returns: tuple(lmfit minimizer object, parameter object); if save_evals=True, a MATK sampleset of calibration function evaluation parameters and responses is also returned
Additional keyword arguments will be passed to the scipy leastsq function: http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.leastsq.html
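A hedged example, reusing the m instance from above (requires the lmfit package; the documented return is unpacked here as the minimizer object and the parameter object):

```python
result, pars = m.lmfit(cpus=2, xtol=1e-7, ftol=1e-7, report_fit=True)
```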
-
make_workdir(workdir=None, reuse_dirs=False)
Create a working directory
Parameters: - workdir (str) – Name of directory where model will be run. It will be created if it does not exist
- reuse_dirs (bool) – If True and workdir exists, the model will reuse the directory
Returns: int – 0: Successful run, 1: workdir exists
-
minimize(method='SLSQP', maxiter=100, workdir=None, bounds=(), constraints=(), options={'eps': 0.001}, save_evals=False)
Minimize a scalar function of one or more variables
Parameters: - maxiter (int) – Max number of iterations
- workdir (str) – Name of directory to use for model runs, calibrated parameters will be run there after calibration
Returns: OptimizeResult; if save_evals=True, also returns a MATK sampleset of calibration function evaluation parameters and responses
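A hedged example, reusing the m instance from above and assuming bounds are supplied as (min, max) pairs; the return is the documented OptimizeResult:

```python
res = m.minimize(method='SLSQP', maxiter=50,
                 bounds=list(zip(m.parmins, m.parmaxs)))
print(res.x)  # solution array
```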
-
model
Python function that runs model
-
model_args
Tuple of extra arguments to MATK model expected to come after parameter dictionary
-
model_kwargs
Dictionary of extra keyword arguments to MATK model expected to come after parameter dictionary and model_args
-
nomvalues
Nominal parameter values used in info gap analyses
-
obsgroups
Get observation groups
-
obsnames
Get observation names
-
obsvalues
Observation values
-
obsweights
Get observation weights
-
pardist_pars
Get parameters needed by parameter distributions
-
pardists
Get parameter probabilistic distributions
-
parmaxs
Get parameter upper bounds
-
parmins
Get parameter lower bounds
-
parnames
Get parameter names
-
parstudy(nvals=2, name=None)
Generate parameter study samples. For discrete parameters with nvals>3, bins are chosen to be spaced as far apart as possible while still being evenly spaced (note that this is based on bins, not actual values).
Parameters: - name (str) – Name of sample set to be created
- outfile (str) – Name of file where samples will be written. If outfile=None, no file is written.
- nvals (int or list(int)) – number of values for each parameter
Returns: ndarray(fl64) – Array of samples
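For example, a small parameter study over the four illustrative parameters defined above (three values each):

```python
samples = m.parstudy(name='study', nvals=[3, 3, 3, 3])
print(samples.shape)  # array of parameter study samples
```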
-
parvalues
Parameter values
-
read_sampleset(file, name=None)
Read MATK output file and assemble corresponding sampleset with responses.
Parameters: - name (str) – Name of sample set
- file (str) – Path to MATK output file
-
residuals(group=None)
Get least squares values
Parameters: group (str) – Group name of observations; if not None, only residuals in group will be returned
-
results_file
Set the name of the results_file for parallel runs
-
saltelli(nsamples, name=None, calc_second_order=True, index_start=1, problem={})
Create sampleset using Saltelli's extension of the Sobol sequence, intended to be used with the sobol method. This method calls functionality from the SALib package.
Parameters: - nsamples (int) – Number of samples to create for each parameter. If calc_second_order is False, the actual sample size will be N * (D + 2), otherwise, it will be N * (2D + 2)
- name (str) – Name of sample set to be created
- calc_second_order (bool) – Calculate second-order sensitivities
- index_start (int) – Starting value for sample indices
- problem (dict) – Dictionary of model attributes used by the sampler. For example, a dictionary with key 'groups' containing a list (of length equal to the number of parameters) of parameter group names can be used to group parameters with similar effects on the observations.
Returns: MATK sampleset
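A hedged example, reusing the m instance from above (requires the SALib package; with calc_second_order=True the docstring gives a total sample size of N * (2D + 2)):

```python
# N=500 and D=4 illustrative parameters -> 5000 samples per the formula above
ss = m.saltelli(500, name='sobol_samples', calc_second_order=True)
```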
-
seed
Set the seed for random sampling
-
simdict
Simulated values
Returns: lst(fl64) – simulated values in order of matk.obs.keys()
-
simvalues
Simulated values
Returns: lst(fl64) – simulated values in order of matk.obs.keys()
-
ssr(group=None)
Sum of squared residuals
Parameters: group (str) – Group name of observations; if not None, ssr for observation group will be returned
-
workdir
Set the base name for parallel working directories
-
workdir_base
Set the base name for parallel working directories
-
workdir_index
Set the working directory index for parallel runs