SampleSet¶
-
class
matk.sampleset.
SampleSet
(name, samples, parent, index_start=1, **kwargs)¶ MATK SampleSet class - Stores information related to a sample including parameter samples, associated responses, and sample indices
-
corner
(bins=20, range=None, weights=None, color='k', smooth=None, smooth1d=None, labels=None, label_kwargs=None, show_titles=False, title_fmt='.2f', title_kwargs=None, truths=None, truth_color='#4682b4', scale_hist=False, quantiles=None, verbose=False, fig=None, max_n_ticks=5, top_ticks=False, use_math_text=False, hist_kwargs=None, **hist2d_kwargs)¶ Plot corner plot using the corner package written by Dan Foreman-Mackey (https://pypi.python.org/pypi/corner/1.0.0)
-
corr
(type='pearson', plot=False, printout=True, plotvals=True, figsize=None, title=None, xrotation=0, filename=None, adjust_dict=None)¶ Calculate correlation coefficients of parameters and responses
Parameters: - type (str) – Type of correlation coefficient ('pearson' by default; 'spearman' is also available)
- plot (bool) – If True, plot correlation matrix
- printout (bool) – If True, print correlation matrix with row and column headings
- plotvals (bool) – If True, print correlation coefficients on plot matrix
- figsize (tuple(fl64,fl64)) – Width and height of figure in inches
- title (str) – Title of plot
- xrotation – Rotation for x axis tick labels
- filename (str) – Name of file to save plot. File ending determines plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. Plot will not be displayed.
- adjust_dict (dict) – Dictionary of kwargs to pass to plt.subplots_adjust. Keys and defaults are: left = 0.125, right = 0.9, bottom = 0.1, top = 0.9, wspace = 0.2, hspace = 0.2
Returns: ndarray(fl64) – Correlation coefficients
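The difference between the two coefficient types can be illustrated with a small pure-Python sketch. This is not MATK's implementation; the helper names and the data are hypothetical and only show that the Spearman coefficient is the Pearson coefficient computed on ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: one parameter column and one response column
par = [1.0, 2.0, 3.0, 4.0, 5.0]
resp = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic but nonlinear in par

r_pearson = pearson(par, resp)    # below 1 because the relation is nonlinear
r_spearman = spearman(par, resp)  # 1 (up to rounding) because it is monotonic
```

A monotonic but nonlinear relationship is the typical case where 'spearman' is more informative than 'pearson'.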
-
index_start
¶ Starting integer value for sample indices
-
indices
¶ Array of sample indices
-
main_effects
()¶ For each parameter, compile array of main effects.
-
mean
(pretty_print=False)¶ Mean of samples
-
name
¶ Sample set name
-
obsnames
¶ Array of observation names
-
panels
(type='pearson', alpha=0.2, figsize=None, title=None, tight=False, symbol='.', fontsize=None, corrfontsize=None, ms=5, mins=None, maxs=None, frequency=False, bins=10, ylim=None, labels=[], filename=None, xticks=2, yticks=2, color=None, cmap=None, edgecolors='face')¶ Plot histograms, scatterplots, and correlation coefficients in paired matrix
Parameters: - type (str) – Type of correlation coefficient ('pearson' by default; 'spearman' is also available)
- alpha (float) – Histogram color shading
- figsize (tuple(fl64,fl64)) – Width and height of figure in inches
- title (str) – Title of plot
- tight (bool) – Use matplotlib tight layout
- symbol (str) – matplotlib symbol for scatterplots
- fontsize (fl64) – Size of font for axis labels
- corrfontsize (fl64) – Size of font for correlation coefficients
- ms (fl64) – Scatterplot marker size
- frequency (bool) – If True, histogram counts are normalized by the number of data points, i.e., n/len(x)
- bins (int) – Number of bins in histograms
- ylim (tuples - 2 element tuples with y limits for histograms) – y-axis limits for histograms.
- labels (lst(str)) – Names to use instead of parameter names in plot
- filename (str) – Name of file to save plot. File ending determines plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. Plot will not be displayed.
- xticks (int) – Number of ticks along x axes
- yticks (int) – Number of ticks along y axes
- color (str) – Name of parameter or observation to color points in colorplots by
- cmap (matplotlib.colors.Colormap) – Colormap for color option
- edgecolors (str) – Color of edges of markers in scatterplots
-
pardict
(index)¶ Get parameter dictionary for sample with specified index
Parameters: index (int) – Sample index
Returns: dict(fl64)
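As a rough illustration of what such a lookup involves, the sketch below builds a parameter dictionary for a sample index. The data, the names, and the assumption that the index is looked up in the array of sample indices are hypothetical; MATK's internals may differ.

```python
# Hypothetical sample set: two parameters, three samples
parnames = ["k", "phi"]
index_start = 1
samples = [[1e-3, 0.1], [2e-3, 0.2], [3e-3, 0.3]]

# Sample indices begin at index_start rather than at 0
indices = list(range(index_start, index_start + len(samples)))

def pardict(index):
    """Return {parameter name: value} for the sample with the given index."""
    row = samples[indices.index(index)]  # raises ValueError for unknown indices
    return dict(zip(parnames, row))

second = pardict(2)
```

Looking the index up in indices, instead of subtracting index_start, keeps the lookup valid if samples have been removed from the set.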
-
parnames
¶ Array of parameter names
-
percentile
(pct, interpolation='linear', pretty_print=False)¶ Percentile of samples
Parameters: - pct (fl64 or lst[fl64]) – Percentile in range [0,100] or list of percentiles
- interpolation (str - {'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – Interpolation method to use when quantile lies between data points
- pretty_print (bool) – If True, print with row and column headings
Returns: ndarray(fl64)
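The effect of the interpolation option can be seen in a small pure-Python sketch. The option names match those of numpy.percentile; whether MATK delegates to that function is an assumption, and the 'nearest' option is left out here for brevity.

```python
import math

def percentile(data, pct, interpolation="linear"):
    """Percentile of data for pct in [0, 100] (sketch; 'nearest' omitted)."""
    s = sorted(data)
    pos = (len(s) - 1) * pct / 100.0  # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "lower":
        return s[lo]
    if interpolation == "higher":
        return s[hi]
    if interpolation == "midpoint":
        return (s[lo] + s[hi]) / 2.0
    if interpolation == "linear":
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)
    raise ValueError("unsupported interpolation: %s" % interpolation)

data = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample values
p25 = {m: percentile(data, 25, m) for m in ("linear", "lower", "higher", "midpoint")}
```

For these four values the 25th percentile falls between 10 and 20, so the four methods return 17.5, 10.0, 20.0, and 15.0 respectively.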
-
rank_parameter_frequencies
()¶ Yields a printout of parameter value frequencies in the sample set
Returns: An array of tuples, each containing the parameter name tagged as min or max and a second tuple containing the parameter value and the frequency of its appearance in the sample set.
-
rbd_fast
(obsname='sse', M=10, print_to_console=True, problem={})¶ Perform RBD_Fast analysis on model output. This assumes that the sampleset has been run so that responses have been generated. This method calls functionality from the SALib package.
Parameters: - obsname (str) – Name of observation to perform analysis on. The default is to use the sum-of-squared errors of all observations. This requires that observation values were designated. An individual observation name can be used instead.
- M (int) – The interference parameter, i.e., the number of harmonics to sum in the Fourier series decomposition
- print_to_console (bool) – Print results directly to console
- problem (dict) – Dictionary of model attributes used by sampler. For example, a dictionary with the key ‘groups’ holding a list of parameter group names, one per parameter, can be used to group parameters with similar effects on the observation. This will reduce the number of samples required.
Returns: Dictionary of rbd_fast analysis results
-
recarray
¶ Structured (record) array of samples
-
run
(cpus=1, workdir_base=None, save=True, reuse_dirs=False, outfile=None, logfile=None, restart_logfile=None, verbose=True, hosts={})¶ Run model using values in samples for parameter values. If samples are not specified, LHS samples are produced.
Parameters: - cpus (int,dict(lst)) – number of cpus; alternatively, dictionary of lists of processor ids keyed by hostnames to run models on (i.e. on a cluster); hostname provided as kwarg to model (hostname=<hostname>); processor id provided as kwarg to model (processor=<processor id>)
- workdir_base (str) – Base name for model run folders, run index is appended to workdir_base
- save (bool) – If True, model files and folders will not be deleted during parallel model execution
- reuse_dirs (bool) – If True, existing directories are reused; if False, an error is returned when a directory already exists
- outfile (str) – File to write results to
- logfile (str) – File to write details of run to during execution
- restart_logfile (str) – Existing logfile containing completed runs, used to complete an incomplete sampling; Warning: sample indices are expected to match!
- verbose (bool or str) – If True, print results as they are calculated; if ‘progress’, display a progress bar in the console
- hosts (lst(str)) – Option deprecated, use cpus instead
Returns: tuple(ndarray(fl64),ndarray(fl64)) - (Matrix of responses from sampled model runs siz rows by npar columns, Parameter samples, same as input samples if provided)
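The dictionary form of cpus and the hostname and processor keyword arguments can be pictured with the following sketch. It does not call MATK and does not reproduce its scheduling; the model function, host names, and the way the run index is joined to workdir_base are all assumptions made for illustration.

```python
def model(pars, hostname=None, processor=None):
    """Hypothetical model: returns one response and reports where it ran."""
    return {"obs1": pars["a"] + pars["b"], "ran_on": (hostname, processor)}

# Dictionary form of cpus: lists of processor ids keyed by hostname (made-up names)
cpus = {"node1": [0, 1], "node2": [0]}

# Each (hostname, processor id) pair is one slot a model run can occupy
slots = [(host, proc) for host, procs in cpus.items() for proc in procs]

workdir_base = "run"  # the run index is appended to this base name
results = []
for index, (host, proc) in enumerate(slots, start=1):
    workdir = workdir_base + str(index)  # assumed: plain concatenation
    out = model({"a": 1.0, "b": float(index)}, hostname=host, processor=proc)
    results.append((workdir, out))
```

A model that ignores hostname and processor can still accept them through keyword arguments, so the same function works with either form of cpus.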
-
savestats
(outfile, q=[2.5, 5.0, 50.0, 95.0, 97.5], interpolation='linear')¶ Save statistical measures of sampleset to file
Parameters: - outfile (str) – Name of file
- q (fl64 or lst[fl64]) – percentile or list of percentiles to compute
- interpolation (str - {'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – Interpolation method to use when quantile lies between data points
-
savetxt
(outfile, sse=False)¶ Save sampleset to file
Parameters: - outfile (str) – Name of file where sampleset will be written
- sse (bool) – Print out sum-of-squared-errors instead of observations
-
sobol
(obsname='sse', calc_second_order=True, print_to_console=True, num_resamples=100, conf_level=0.95, problem={})¶ Perform Sobol analysis on model output. This requires that the sampleset is a Saltelli sample and has been run. This method calls functionality from the SALib package.
Parameters: - obsname (str) – Name of observation to perform analysis on. The default is to use the sum-of-squared errors of all observations. This requires that observation values were designated. An individual observation name can be used instead.
- calc_second_order (bool) – Calculate second-order sensitivities
- num_resamples (int) – The number of resamples
- conf_level (fl64) – The confidence interval level
- print_to_console (bool) – Print results directly to console
- problem (dict) – Dictionary of model attributes used by sampler. For example, a dictionary with the key ‘groups’ holding a list of parameter group names, one per parameter, can be used to group parameters with similar effects on the observation. This will reduce the number of samples required.
Returns: Dictionary of sobol analysis results
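The saving from grouping can be estimated with simple arithmetic. The sketch below uses the row counts documented for SALib's Saltelli sampler, N*(2D+2) with second-order terms and N*(D+2) without, where D is the number of parameters or, when groups are given, the number of groups. The parameter names, group names, and base sample size are hypothetical.

```python
def saltelli_rows(n_base, n_dims, calc_second_order=True):
    """Rows in a Saltelli sample: N*(2D+2) with second-order terms, else N*(D+2)."""
    factor = 2 * n_dims + 2 if calc_second_order else n_dims + 2
    return n_base * factor

parnames = ["k1", "k2", "k3", "phi"]                   # hypothetical parameter names
problem = {"groups": ["perm", "perm", "perm", "por"]}  # one group name per parameter

n_groups = len(set(problem["groups"]))

ungrouped = saltelli_rows(1000, len(parnames))  # 1000 * (2*4 + 2) = 10000
grouped = saltelli_rows(1000, n_groups)         # 1000 * (2*2 + 2) = 6000
```

With two groups instead of four individual parameters, the same base sample size needs 6000 model runs instead of 10000.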
-
sse
(group=None)¶ Sum of squared errors (sse) for all samples
Parameters: group (str) – Group name of observations; if not None, sse for observation group will be returned
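For a single sample, the quantity can be sketched as follows. The observation names, groups, and values are hypothetical, and any observation weighting is ignored here.

```python
def sse(simulated, observed):
    """Sum of squared differences between simulated and observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

# Hypothetical observations for one sample: (name, group, simulated, observed)
observations = [
    ("h1", "head", 10.0, 10.5),
    ("h2", "head", 12.0, 11.0),
    ("c1", "conc", 0.3, 0.1),
]

def sse_for(group=None):
    """SSE over all observations, or only those in the named group."""
    rows = [o for o in observations if group is None or o[1] == group]
    return sse([o[2] for o in rows], [o[3] for o in rows])

total = sse_for()             # all three observations
head_only = sse_for("head")   # 0.25 + 1.0 = 1.25
```

Restricting to a group is useful when observations of different kinds (here heads and concentrations) have very different magnitudes.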
-
std
(pretty_print=False)¶ Standard deviation of samples
-
subset
(boolfcn, field, *args, **kwargs)¶ Collect subset of samples based on parameter or response values, remove all others
Parameters: - boolfcn – Function that returns true for samples to keep and false for samples to remove
- field (str) – Name of parameter or observations to apply boolfcn to
- args – Additional arguments to add to boolfcn
- kwargs – Keyword arguments to add to boolfcn
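The calling pattern can be sketched in plain Python. The sketch assumes that boolfcn receives the value of the named field first, followed by args and kwargs; it returns a new list, whereas the documented method removes the other samples from the set. All names and data are hypothetical.

```python
def subset(samples, boolfcn, field, *args, **kwargs):
    """Keep the samples for which boolfcn(sample[field], *args, **kwargs) is true."""
    return [s for s in samples if boolfcn(s[field], *args, **kwargs)]

def below(value, threshold, inclusive=False):
    """Example boolfcn: true if value is below (or equal to) the threshold."""
    return value <= threshold if inclusive else value < threshold

# Hypothetical samples with one parameter 'k' and one response 'sse'
samples = [
    {"k": 0.1, "sse": 4.0},
    {"k": 0.2, "sse": 1.0},
    {"k": 0.3, "sse": 2.0},
]

kept = subset(samples, below, "sse", 2.0, inclusive=True)
```

Here 2.0 is passed through args and inclusive=True through kwargs, so the two samples with sse of 2.0 or less are kept.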
-
var
(pretty_print=False)¶ Variance of samples
-
-
class
matk.sampleset.
DataSet
(samples, names, mins=None, maxs=None)¶ MATK DataSet class used by SampleSet class to store samples (parameter combinations; ‘SampleSet.samples’) and responses (model outputs; ‘SampleSet.responses’)
-
corner
(bins=20, range=None, weights=None, color='k', smooth=None, smooth1d=None, labels=None, label_kwargs=None, show_titles=False, title_fmt='.2f', title_kwargs=None, truths=None, truth_color='#4682b4', scale_hist=False, quantiles=None, verbose=False, fig=None, max_n_ticks=5, top_ticks=False, use_math_text=False, hist_kwargs=None, **hist2d_kwargs)¶ Plot corner plot using the corner package written by Dan Foreman-Mackey (https://pypi.python.org/pypi/corner/1.0.0)
-
corr
(type='pearson', plot=False, printout=True, plotvals=True, figsize=None, title=None, xrotation=0, filename=None, adjust_dict=None)¶ Calculate correlation coefficients of dataset values
Parameters: - type (str) – Type of correlation coefficient ('pearson' by default; 'spearman' is also available)
- plot (bool) – If True, plot correlation matrix
- plotvals (bool) – If True, print correlation coefficients on plot matrix
- printout (bool) – If True, print correlation matrix with row and column headings
- figsize (tuple(fl64,fl64)) – Width and height of figure in inches
- title (str) – Title of plot
- xrotation – Rotation for x axis tick labels
- filename (str) – Name of file to save plot. File ending determines plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. Plot will not be displayed.
- adjust_dict (dict) – Dictionary of kwargs to pass to plt.subplots_adjust. Keys and defaults are: left = 0.125, right = 0.9, bottom = 0.1, top = 0.9, wspace = 0.2, hspace = 0.2
Returns: ndarray(fl64) – Correlation coefficients
-
hist
(ncols=4, alpha=0.2, figsize=None, title=None, tight=False, mins=None, maxs=None, frequency=False, bins=10, ylim=None, printout=True, labels=[], filename=None, fontsize=None, xticks=3)¶ Plot histograms of dataset
Parameters: - ncols (int) – Number of columns in plot matrix
- alpha (float) – Histogram color shading
- figsize (tuple(fl64,fl64)) – Width and height of figure in inches
- title (str) – Title of plot
- tight (bool) – Use matplotlib tight layout
- frequency (bool) – If True, histogram counts are normalized by the number of data points, i.e., n/len(x)
- bins (int) – Number of bins in histograms
- ylim (tuple) – Two-element tuple of y-axis limits for histograms
- printout (bool) – If True, histogram values are printed to the terminal
- labels (lst(str)) – Names to use instead of parameter names in plot
- filename (str) – Name of file to save plot. File ending determines plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. Plot will not be displayed.
- fontsize (fl64) – Size of font
- xticks (int) – Number of ticks along x axes
Returns: dict(lst(int),lst(fl64)) - dictionary of histogram data (counts,bins) keyed by name
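The shape of the returned data and the effect of the frequency option can be sketched without plotting. This is an illustrative equal-width histogram, not MATK's code; the function name and data are hypothetical.

```python
def histogram(x, bins=10, frequency=False):
    """Return (counts, edges) for equal-width bins; edges has bins + 1 entries."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins  # assumes max(x) > min(x)
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in x:
        # The last bin is closed on the right so that max(x) is counted
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    if frequency:
        counts = [c / len(x) for c in counts]  # n/len(x)
    return counts, edges

# Hypothetical dataset with a single named column
data = {"k": [0.0, 0.1, 0.2, 0.5, 0.9, 1.0]}

raw = {name: histogram(vals, bins=2) for name, vals in data.items()}
freq = {name: histogram(vals, bins=2, frequency=True) for name, vals in data.items()}
```

With frequency=True the counts sum to 1, which makes histograms of sample sets with different sizes comparable.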
-
mean
(pretty_print=False)¶ Mean of samples
-
names
¶ Array of parameter names
-
panels
(type='pearson', alpha=0.2, figsize=None, title=None, tight=False, symbol='.', fontsize=None, corrfontsize=None, ms=5, mins=None, maxs=None, frequency=False, bins=10, ylim=None, labels=[], filename=None, xticks=2, yticks=2, color=None, cmap=None, edgecolors='face')¶ Plot histograms, scatterplots, and correlation coefficients in paired matrix
Parameters: - type (str) – Type of correlation coefficient ('pearson' by default; 'spearman' is also available)
- alpha (float) – Histogram color shading
- figsize (tuple(fl64,fl64)) – Width and height of figure in inches
- title (str) – Title of plot
- tight (bool) – Use matplotlib tight layout
- symbol (str) – matplotlib symbol for scatterplots
- corrfontsize (fl64) – Size of font for correlation coefficients
- fontsize (fl64) – Size of font for axis labels
- ms (fl64) – Scatterplot marker size
- frequency (bool) – If True, histogram counts are normalized by the number of data points, i.e., n/len(x)
- bins (int or lst(lst(int))) – If an integer is given, bins + 1 bin edges are returned. Unequally spaced bins are supported if bins is a list of sequences for each histogram.
- ylim (tuple - 2 element tuples with y limits for histograms) – y-axis limits for histograms.
- labels (lst(str)) – Names to use instead of parameter names in plot
- filename (str) – Name of file to save plot. File ending determines plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. Plot will not be displayed.
- xticks (int) – Number of ticks along x axes
- yticks (int) – Number of ticks along y axes
- color (str) – Name of parameter or observation to color points in colorplots by
- cmap (matplotlib.colors.Colormap) – Colormap for color option
- edgecolors (str) – Color of edges of markers in scatterplots
-
percentile
(pct, interpolation='linear', pretty_print=False)¶ Percentile of samples
Parameters: - pct (fl64 or lst[fl64]) – Percentile in range [0,100] or list of percentiles
- interpolation (str - {'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – Interpolation method to use when quantile lies between data points
- pretty_print (bool) – If True, print with row and column headings
Returns: ndarray(fl64)
-
recarray
¶ Structured (record) array of samples
-
savestats
(outfile, q=[2.5, 5.0, 50.0, 95.0, 97.5], interpolation='linear')¶ Save statistical measures to file
Parameters: - outfile (str) – Name of file
- q (fl64 or lst[fl64]) – percentile or list of percentiles to compute
- interpolation (str - {'linear', 'lower', 'higher', 'midpoint', 'nearest'}) – Interpolation method to use when quantile lies between data points
-
std
(pretty_print=False)¶ Standard deviation of samples
-
values
¶ Ndarray of parameter samples, rows are samples, columns are parameters in order of MATKobject.parlist
-
var
(pretty_print=False)¶ Variance of samples
-