doubt.models.boot package

Submodules

doubt.models.boot.boot module

Bootstrap wrapper for datasets and models

class doubt.models.boot.boot.Boot(input: object, random_seed: Optional[float] = None)

Bases: object

Bootstrap wrapper for datasets and models.

Datasets can be any sequence of numeric input, from which bootstrapped statistics can be calculated, with confidence intervals included.

The models can be any model that is either callable or equipped with a predict method, such as all the models in scikit-learn, PyTorch and TensorFlow. The bootstrapped model can then produce predictions with prediction intervals.

The bootstrapped prediction intervals are computed using an extension of the method from [2] which also takes validation error into account, based on the .632+ bootstrap estimate from [1]. Read more in [3].
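As a rough illustration of the .632+ idea referenced above, the following sketch combines a training error and an out-of-bag (bootstrap validation) error following the formula given in [1]. The function name and arguments are hypothetical and chosen for this example only; this is not the code used inside Boot, and the guards on the overfitting rate are an assumption made here to keep the weight well defined.

```python
def err_632_plus(train_err: float, oob_err: float, no_info_err: float) -> float:
    """Combine training and out-of-bag error into a .632+ style estimate.

    Hypothetical helper for illustration; not part of doubt.
    """
    # Cap the out-of-bag error at the no-information error rate.
    oob_err = min(oob_err, no_info_err)

    # Relative overfitting rate, set to 0 when it is not well defined.
    if oob_err > train_err and no_info_err > train_err:
        rate = (oob_err - train_err) / (no_info_err - train_err)
    else:
        rate = 0.0

    # The weight grows from .632 (no overfitting) towards 1 (maximal overfitting).
    weight = 0.632 / (1.0 - 0.368 * rate)
    return (1.0 - weight) * train_err + weight * oob_err
```

When the out-of-bag error equals the training error the estimate reduces to that common value, and as the out-of-bag error approaches the no-information rate the estimate leans entirely on the out-of-bag error.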

Parameters
  • input (float array or model) – Either a dataset to calculate bootstrapped statistics on, or a model for which bootstrapped predictions will be computed.

  • random_seed (float or None) – The random seed used for bootstrapping. If set to None then no seed will be set. Defaults to None.

Examples

Compute the bootstrap distribution of the mean, with a 95% confidence interval:

>>> import numpy as np
>>> from doubt.models.boot.boot import Boot
>>> from doubt.datasets import FishToxicity
>>> X, y = FishToxicity().split()
>>> boot = Boot(y, random_seed=42)
>>> boot.compute_statistic(np.mean)
(4.064430616740088, array([3.97621225, 4.16582087]))

Alternatively, we can output the whole bootstrap distribution:

>>> boot.compute_statistic(np.mean, n_boots=3, return_all=True)
(4.064430616740088, array([4.05705947, 4.06197577, 4.05728414]))

Wrap a scikit-learn model and get prediction intervals:

>>> from sklearn.linear_model import LinearRegression
>>> from doubt.datasets import PowerPlant
>>> X, y = PowerPlant().split()
>>> linreg = Boot(LinearRegression(), random_seed=42)
>>> linreg = linreg.fit(X, y)
>>> linreg.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.99688920651676, array([473.50425407, 490.14061895]))

Sources:

[1]: Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.

[2]: Kumar, S., & Srivastava, A. N. (2012). Bootstrap prediction intervals in non-parametric regression with applications to anomaly detection.

[3]: https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/

doubt.models.boot.boot.compute_statistic(self, statistic: Callable[[Sequence[Union[float, int]]], float], n_boots: Optional[int] = None, uncertainty: float = 0.05, quantiles: Optional[Sequence[float]] = None, return_all: bool = False) -> Union[float, Tuple[float, numpy.ndarray]]

Compute bootstrapped statistic.

Parameters
  • statistic (numeric array -> float) – The statistic to be computed on bootstrapped samples.

  • n_boots (int or None) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.

  • uncertainty (float) – The uncertainty used to compute the confidence interval of the bootstrapped statistic. Not used if return_all is set to True or if quantiles is not None. Defaults to 0.05.

  • quantiles (sequence of floats or None, optional) – List of quantiles to output, as an alternative to the uncertainty argument. It will not be used if that argument is set. If None then uncertainty is used. Defaults to None.

  • return_all (bool) – Whether all bootstrapped statistics should be returned instead of the confidence interval. Defaults to False.

Returns

The statistic, and if uncertainty is set then also the confidence interval, or if quantiles is set then also the specified quantiles, or if return_all is set then also all of the bootstrapped statistics.

Return type

a float or a pair of a float and an array of floats
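To make the parameters above concrete, here is a minimal sketch of a percentile bootstrap for a statistic, using NumPy only. It assumes that an uncertainty of 0.05 corresponds to the 2.5% and 97.5% quantiles of the bootstrapped statistics, and it mirrors the documented default of using the square root of the data set size as the number of resamples. The function bootstrap_statistic is hypothetical and is not the implementation behind compute_statistic.

```python
import numpy as np


def bootstrap_statistic(data, statistic, n_boots=None, uncertainty=0.05, seed=None):
    """Return the statistic on the data and a percentile bootstrap interval.

    Hypothetical helper for illustration; not part of doubt.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]

    # Default number of resamples: square root of the data set size.
    if n_boots is None:
        n_boots = max(1, int(np.sqrt(n)))

    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, n, size=(n_boots, n))
    boot_stats = np.array([statistic(data[row]) for row in idx])

    # Symmetric percentile interval around the bootstrap distribution.
    lower, upper = np.quantile(boot_stats, [uncertainty / 2, 1 - uncertainty / 2])
    return statistic(data), np.array([lower, upper])
```

Calling bootstrap_statistic(y, np.mean, seed=42) returns a pair of the same shape as the documented output, although the numbers will generally differ from those produced by Boot.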

doubt.models.boot.boot.fit(self, X: Sequence[float], y: Sequence[float], n_boots: Optional[int] = None)

Fits the model to the data.

Parameters
  • X (float array) – The array containing the data set, either of shape (f,) or (n, f), with n being the number of samples and f being the number of features.

  • y (float array) – The array containing the target values, of shape (n,).

  • n_boots (int or None) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.
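The general idea of fitting on bootstrap resamples can be sketched as follows. This example assumes a plain least-squares linear model as a stand-in for the wrapped model and expects X of shape (n, f). The function fit_bootstrap_models is hypothetical, and the actual fit method may store additional information, such as residuals, that this sketch omits.

```python
import numpy as np


def fit_bootstrap_models(X, y, n_boots=None, seed=None):
    """Fit one least-squares model per bootstrap resample.

    Hypothetical helper for illustration; not part of doubt.
    Returns an array of shape (n_boots, f + 1) holding the intercept
    followed by the feature coefficients of each fitted model.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Default number of resamples: square root of the data set size.
    if n_boots is None:
        n_boots = max(1, int(np.sqrt(n)))

    # Prepend a column of ones so the model has an intercept.
    design = np.column_stack([np.ones(n), X])

    coefs = []
    for _ in range(n_boots):
        # Draw n row indices with replacement and fit on that resample.
        idx = rng.integers(0, n, size=n)
        coef = np.linalg.lstsq(design[idx], y[idx], rcond=None)[0]
        coefs.append(coef)
    return np.array(coefs)
```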

doubt.models.boot.boot.predict(self, X: Sequence[float], n_boots: Optional[int] = None, uncertainty: Optional[float] = None, quantiles: Optional[Sequence[float]] = None) -> Tuple[Union[float, numpy.ndarray], numpy.ndarray]

Compute bootstrapped predictions.

Parameters
  • X (float array) – The array containing the data set, either of shape (f,) or (n, f), with n being the number of samples and f being the number of features.

  • n_boots (int or None, optional) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.

  • uncertainty (float or None, optional) – The uncertainty used to compute the prediction interval of the bootstrapped prediction. If None then no prediction intervals are returned. Defaults to None.

  • quantiles (sequence of floats or None, optional) – List of quantiles to output, as an alternative to the uncertainty argument. It will not be used if that argument is set. If None then uncertainty is used. Defaults to None.

Returns

The bootstrapped predictions, and the prediction intervals if uncertainty is not None, or the specified quantiles if quantiles is not None.

Return type

float array or pair of float arrays
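A deliberately simplified view of how predictions and intervals can be derived from a set of bootstrapped models is sketched below. It assumes linear models represented as rows of (intercept, coefficients), and it takes percentiles of the models' predictions directly. This captures only the variation between bootstrapped models. It leaves out the noise and validation-error terms that the class description says the library accounts for, so intervals from this sketch would typically be narrower. The function predict_with_interval is hypothetical.

```python
import numpy as np


def predict_with_interval(coefs, X, uncertainty=0.05):
    """Return mean predictions and percentile intervals across bootstrapped models.

    Hypothetical helper for illustration; not part of doubt.
    coefs has shape (n_boots, f + 1); X has shape (n, f).
    """
    coefs = np.asarray(coefs, dtype=float)
    X = np.asarray(X, dtype=float)

    # Prepend a column of ones to match the intercept in coefs.
    design = np.column_stack([np.ones(X.shape[0]), X])

    # preds[i, b] is the prediction of bootstrapped model b on sample i.
    preds = design @ coefs.T

    point = preds.mean(axis=1)
    bounds = np.quantile(preds, [uncertainty / 2, 1 - uncertainty / 2], axis=1)

    # Transpose so that each row is a (lower, upper) pair for one sample.
    return point, bounds.T
```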

Module contents