doubt.models.boot package
Submodules
doubt.models.boot.boot module
Bootstrap wrapper for datasets and models
- class doubt.models.boot.boot.Boot(input: object, random_seed: Optional[float] = None)
Bases: object
Bootstrap wrapper for datasets and models.
Datasets can be any sequence of numeric input, from which bootstrapped statistics can be calculated, with confidence intervals included.
The models can be any model that is either callable or equipped with a predict method, such as the models in scikit-learn, PyTorch and TensorFlow. The bootstrapped model can then produce predictions with prediction intervals.
The bootstrapped prediction intervals are computed using an extension of the method from [2] that also takes validation error into account. To balance training error against validation error, the .632+ bootstrap estimate from [1] is used. Read more in [3].
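To make the .632+ estimate mentioned above concrete, here is a minimal sketch of the formula as given in [1]. The function name and arguments are hypothetical and are not part of doubt; how Boot folds this estimate into its prediction intervals is described in [3] and is not reproduced here.

```python
def err_632_plus(train_err: float, oob_err: float, no_info_err: float) -> float:
    """Sketch of the .632+ error estimate (hypothetical helper, not doubt's API).

    train_err:   error of the model on the data it was fitted on
    oob_err:     average error on samples left out of each bootstrap resample
    no_info_err: error expected if inputs and targets were independent
    """
    # Cap the out-of-bag error at the no-information rate.
    oob_err = min(oob_err, no_info_err)

    # Relative overfitting rate, kept within [0, 1].
    if no_info_err > train_err and oob_err > train_err:
        overfit = (oob_err - train_err) / (no_info_err - train_err)
    else:
        overfit = 0.0

    # Weight on the out-of-bag error: 0.632 without overfitting,
    # rising to 1 when the model overfits completely.
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * train_err + weight * oob_err
```

With no overfitting the estimate reduces to the plain .632 weighting of the two errors; with complete overfitting it falls back to the out-of-bag error alone.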
- Parameters
input (float array or model) – Either a dataset to calculate bootstrapped statistics on, or a model for which bootstrapped predictions will be computed.
random_seed (float or None) – The random seed used for bootstrapping. If set to None then no seed will be set. Defaults to None.
Examples
Compute the bootstrap distribution of the mean, with a 95% confidence interval:
>>> import numpy as np
>>> from doubt.models.boot.boot import Boot
>>> from doubt.datasets import FishToxicity
>>> X, y = FishToxicity().split()
>>> boot = Boot(y, random_seed=42)
>>> boot.compute_statistic(np.mean)
(4.064430616740088, array([3.97621225, 4.16582087]))
Alternatively, we can output the whole bootstrap distribution:
>>> boot.compute_statistic(np.mean, n_boots=3, return_all=True)
(4.064430616740088, array([4.05705947, 4.06197577, 4.05728414]))
Wrap a scikit-learn model and get prediction intervals:
>>> from sklearn.linear_model import LinearRegression
>>> from doubt.datasets import PowerPlant
>>> X, y = PowerPlant().split()
>>> linreg = Boot(LinearRegression(), random_seed=42)
>>> linreg = linreg.fit(X, y)
>>> linreg.predict([10, 30, 1000, 50], uncertainty=0.05)
(481.99688920651676, array([473.50425407, 490.14061895]))
- Sources:
[1]: Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
[2]: Kumar, S., & Srivastava, A. N. (2012). Bootstrap prediction intervals in non-parametric regression with applications to anomaly detection.
[3]: https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/
- doubt.models.boot.boot.compute_statistic(self, statistic: Callable[[Sequence[Union[float, int]]], float], n_boots: Optional[int] = None, uncertainty: float = 0.05, quantiles: Optional[Sequence[float]] = None, return_all: bool = False) → Union[float, Tuple[float, numpy.ndarray]]
Compute bootstrapped statistic.
- Parameters
statistic (numeric array -> float) – The statistic to be computed on bootstrapped samples.
n_boots (int or None) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.
uncertainty (float) – The uncertainty used to compute the confidence interval of the bootstrapped statistic. Not used if return_all is set to True or if quantiles is not None. Defaults to 0.05.
quantiles (sequence of floats or None, optional) – List of quantiles to output, as an alternative to the uncertainty argument. If None then uncertainty is used. Defaults to None.
return_all (bool) – Whether all bootstrapped statistics should be returned instead of the confidence interval. Defaults to False.
- Returns
The statistic, and if uncertainty is set then also the confidence interval, or if quantiles is set then also the specified quantiles, or if return_all is set then also all of the bootstrapped statistics.
- Return type
a float or a pair of a float and an array of floats
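As a rough illustration of what a bootstrapped statistic with a confidence interval involves, the following is a minimal standard-library sketch of a percentile bootstrap. It is not the implementation behind compute_statistic: the names bootstrap_statistic and quantile are hypothetical, and rounding the default n_boots down to an integer is an assumption.

```python
import math
import random
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def bootstrap_statistic(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_boots: Optional[int] = None,
    uncertainty: float = 0.05,
    seed: Optional[int] = None,
) -> Tuple[float, Tuple[float, float]]:
    """Return the statistic and a percentile bootstrap interval around it."""
    rng = random.Random(seed)
    n = len(data)
    if n_boots is None:
        # Assumption: square root of the data set size, rounded down.
        n_boots = max(1, math.isqrt(n))

    # Resample with replacement and evaluate the statistic on each resample.
    boots = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boots))

    # An uncertainty of 0.05 corresponds to the 2.5% and 97.5% quantiles.
    lower = quantile(boots, uncertainty / 2)
    upper = quantile(boots, 1.0 - uncertainty / 2)
    return statistic(data), (lower, upper)


values = [float(i) for i in range(1, 101)]
point, (low, high) = bootstrap_statistic(values, mean, n_boots=200, seed=0)
```

Passing explicit quantiles instead of uncertainty would simply replace the two quantile levels computed from uncertainty.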
- doubt.models.boot.boot.fit(self, X: Sequence[float], y: Sequence[float], n_boots: Optional[int] = None)
Fits the model to the data.
- Parameters
X (float array) – The array containing the data set, either of shape (f,) or (n, f), with n being the number of samples and f being the number of features.
y (float array) – The array containing the target values, of shape (n,).
n_boots (int or None) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.
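The resampling that n_boots controls can be pictured with the sketch below, which draws bootstrap index sets and records which samples each resample leaves out. The function bootstrap_indices is hypothetical. That fit uses the left-out samples for validation error is an assumption based on the class description, and so is rounding the default n_boots down.

```python
import math
import random
from typing import List, Optional, Tuple


def bootstrap_indices(
    n: int, n_boots: Optional[int] = None, seed: Optional[int] = None
) -> List[Tuple[List[int], List[int]]]:
    """Return (in_bag, out_of_bag) index lists for each bootstrap resample."""
    rng = random.Random(seed)
    if n_boots is None:
        # Assumption: square root of the number of samples, rounded down.
        n_boots = max(1, math.isqrt(n))

    resamples = []
    for _ in range(n_boots):
        # Draw n indices with replacement; some samples repeat, others are missed.
        in_bag = [rng.randrange(n) for _ in range(n)]
        drawn = set(in_bag)
        # Samples never drawn can serve as a validation set for this resample.
        out_of_bag = [i for i in range(n) if i not in drawn]
        resamples.append((in_bag, out_of_bag))
    return resamples


resamples = bootstrap_indices(100, seed=1)
```

On average roughly 63.2% of the distinct samples end up in each resample, which is where the constant in the .632 family of estimators comes from.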
- doubt.models.boot.boot.predict(self, X: Sequence[float], n_boots: Optional[int] = None, uncertainty: Optional[float] = None, quantiles: Optional[Sequence[float]] = None) → Tuple[Union[float, numpy.ndarray], numpy.ndarray]
Compute bootstrapped predictions.
- Parameters
X (float array) – The array containing the data set, either of shape (f,) or (n, f), with n being the number of samples and f being the number of features.
n_boots (int or None, optional) – The number of resamples to bootstrap. If None then it is set to the square root of the size of the data set. Defaults to None.
uncertainty (float or None, optional) – The uncertainty used to compute the prediction interval of the bootstrapped prediction. If None then no prediction intervals are returned. Defaults to None.
quantiles (sequence of floats or None, optional) – List of quantiles to output, as an alternative to the uncertainty argument. Ignored if uncertainty is set. If None then uncertainty is used. Defaults to None.
- Returns
The bootstrapped predictions, and the prediction intervals if uncertainty is not None, or the specified quantiles if quantiles is not None.
- Return type
float array or pair of float arrays
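To show the general shape of a bootstrapped prediction, here is a naive standard-library sketch: refit a one-feature least-squares line on each resample and take quantiles of the resulting predictions. All names are hypothetical. This simple percentile approach reflects only the variability of the fitted line, so it is not the extended method from [2] and [3] that the class description refers to.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Guard against a degenerate resample in which every x is identical.
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def bootstrap_predict(
    xs: Sequence[float],
    ys: Sequence[float],
    x_new: float,
    n_boots: int = 100,
    uncertainty: float = 0.05,
    seed: Optional[int] = None,
) -> Tuple[float, Tuple[float, float]]:
    """Return a point prediction and a naive percentile interval at x_new."""
    rng = random.Random(seed)
    n = len(xs)

    # Point prediction from the model fitted on the full data set.
    slope, intercept = fit_line(xs, ys)
    point = slope * x_new + intercept

    # Refit on each resample and collect the predictions at x_new.
    preds = []
    for _ in range(n_boots):
        idx = [rng.randrange(n) for _ in range(n)]
        s, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(s * x_new + b)
    preds.sort()

    lower = quantile(preds, uncertainty / 2)
    upper = quantile(preds, 1.0 - uncertainty / 2)
    return point, (lower, upper)


# Synthetic data: y = 2x + 1 with a small alternating perturbation.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
point, (low, high) = bootstrap_predict(xs, ys, 25.0, n_boots=200, seed=0)
```

Because residual noise is ignored, an interval built this way will generally be narrower than a proper prediction interval.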