There are two categories of uncertainty:
- Epistemic (systematic) uncertainty is a repeatable uncertainty due to factors that could be known, such as limitations of a model or imprecision of measuring apparatus.
- Aleatoric (stochastic) uncertainty is uncertainty due to inherently random, unknown or immeasurable factors; it varies unpredictably from one repetition to the next.
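A small simulation can make the distinction concrete. This is a minimal sketch that assumes the epistemic part can be modelled as a fixed calibration offset and the aleatoric part as normally distributed noise. The true value, offset and noise level are hypothetical. Averaging more readings shrinks the random scatter, but the offset remains however many readings are taken.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

TRUE_VALUE = 100.0  # hypothetical quantity being measured
OFFSET = 0.8        # hypothetical calibration error (epistemic): identical on every reading
NOISE_SD = 2.0      # hypothetical random noise (aleatoric): different on every reading

def measure() -> float:
    """One simulated reading: true value + fixed offset + random noise."""
    return TRUE_VALUE + OFFSET + random.gauss(0.0, NOISE_SD)

errors = {}
for n in (10, 1_000, 100_000):
    # Average n independent readings and record how far the average is from the truth.
    average = statistics.fmean(measure() for _ in range(n))
    errors[n] = average - TRUE_VALUE
    print(f"n = {n:>7}: error of the average = {errors[n]:+.3f}")
```

As `n` grows, the error of the average should settle near the 0.8 offset rather than near zero: repetition reduces the aleatoric part only.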
Definition
Stochastic behaviour has random components: repeated runs will be largely the same, but with some unpredictable variability.
From a set of data points $x_1, x_2, \dots, x_n$, we can produce an estimate for the true value $x$, denoted $\hat{x}$. As this is an estimate, it is important to know how good it is, so we also estimate the uncertainty in $\hat{x}$, written $\delta x$. We then express the estimate as $\hat{x} \pm \delta x$.
Notation: empirical values
A hat ($\hat{x}$) over a symbol denotes that it is an empirical estimate (derived from data).
These values should be computed so that we have high confidence that the true value $x$ lies in the range $\hat{x} - \delta x \le x \le \hat{x} + \delta x$.
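The range implied by a reported estimate and its uncertainty can be written out directly. This is a minimal sketch; the function names `uncertainty_range` and `within` are hypothetical helpers, and the example result is invented for illustration.

```python
def uncertainty_range(estimate: float, uncertainty: float) -> tuple[float, float]:
    """Return (estimate - uncertainty, estimate + uncertainty)."""
    if uncertainty < 0:
        raise ValueError("uncertainty must be non-negative")
    return estimate - uncertainty, estimate + uncertainty

def within(value: float, estimate: float, uncertainty: float) -> bool:
    """True if `value` lies inside the range implied by estimate ± uncertainty."""
    low, high = uncertainty_range(estimate, uncertainty)
    return low <= value <= high

# Hypothetical result g = 9.79 ± 0.03 m/s^2, compared with a reference value of 9.81.
print(within(9.81, 9.79, 0.03))  # True
```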
Determining $\hat{x}$ and $\delta x$
A number of Measures of Central Tendency can be used as the value of $\hat{x}$; which one is best depends on the specific dataset.
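As an illustration of why the choice depends on the dataset, the sketch below compares the mean and the median on a set of invented readings containing one outlier. The data are hypothetical; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical repeated readings; the last one is an outlier (e.g. a misread scale).
readings = [4.9, 5.1, 5.0, 5.2, 4.8, 9.7]

mean = statistics.mean(readings)      # pulled upwards by the outlier
median = statistics.median(readings)  # middle of the sorted data, barely affected

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
```

Here the median (5.05) stays with the bulk of the data while the mean is dragged towards the outlier, so for data like this the median may be the better estimate.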
It may seem logical to use one of the standard Measures of Spread as $\delta x$. However, while the standard deviation gives a good picture of the overall precision of the data, and of the uncertainty in any single measurement, it is not the uncertainty in the mean value. We therefore introduce a new measure:
Standard Error in the Mean
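The Standard Error in the Mean (SEM), or standard error, is given by the formula:
$$\mathrm{SEM} = \frac{\hat{\sigma}}{\sqrt{n}}$$
where $\hat{\sigma}$ is the sample standard deviation and $n$ is the number of data points.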
This measure gives the uncertainty in the empirical mean as an estimate of the true value. That is, the SEM is the value used as $\delta x$ when $\hat{x}$ is the empirical mean.
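A minimal sketch of the SEM calculation, assuming the sample standard deviation uses the Bessel-corrected $n - 1$ denominator (which is what `statistics.stdev` computes). The function name `standard_error` and the readings are hypothetical.

```python
import math
import statistics

def standard_error(data: list[float]) -> float:
    """Standard Error in the Mean: sample standard deviation divided by sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    # statistics.stdev uses the n - 1 (Bessel-corrected) denominator.
    return statistics.stdev(data) / math.sqrt(n)

readings = [4.9, 5.1, 5.0, 5.2, 4.8]  # hypothetical data
x_hat = statistics.mean(readings)      # empirical mean used as the estimate
dx = standard_error(readings)          # its uncertainty
print(f"x = {x_hat:.2f} ± {dx:.2f}")   # x = 5.00 ± 0.07
```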
The SEM is also commonly used to draw error bars on a plot, extending one SEM above and one SEM below the plotted mean.
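The claim that the standard deviation is not the uncertainty in the mean can be checked by simulation. This is a minimal sketch assuming normally distributed measurement noise; the true value, noise level, sample size and number of repeats are all hypothetical. It repeats the same experiment many times and compares the spread of the resulting means with the spread of individual measurements and with $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

TRUE_VALUE = 10.0   # hypothetical true value being measured
NOISE_SD = 0.5      # hypothetical spread of individual measurements
N = 25              # measurements per experiment
EXPERIMENTS = 2000  # number of times the whole experiment is repeated

# Run the experiment many times and keep the mean of each run.
means = []
for _ in range(EXPERIMENTS):
    sample = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(N)]
    means.append(statistics.fmean(sample))

spread_of_means = statistics.stdev(means)
print(f"SD of individual measurements: {NOISE_SD}")
print(f"SD of the sample means:        {spread_of_means:.3f}")
print(f"sigma / sqrt(n):               {NOISE_SD / math.sqrt(N):.3f}")
```

With these settings the spread of the means should come out close to 0.1, a fifth of the 0.5 spread of individual readings, which is the reduction the $\sqrt{n}$ in the SEM describes.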
Accuracy and Precision
When describing gathered data, these two words have well-defined, specific definitions:
- Accuracy is the degree to which collected values are close to the true value.
- Precision is the degree to which collected values are close to each other.
This means that it is possible, perhaps counterintuitively, for a highly inaccurate dataset to be very precise. The opposites of these two properties are sometimes more relevant:
- Bias is the lack of accuracy. It is often caused by systematic errors.
- Variability is the lack of precision.
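A precise but inaccurate dataset can be quantified as below. This sketch assumes the true value is known (for example from a calibration reference) and measures bias as mean minus true value and variability as the sample standard deviation, which is one common choice. The reference value and readings are hypothetical.

```python
import statistics

TRUE_VALUE = 20.0  # hypothetical known reference value

# Hypothetical readings from a miscalibrated instrument: tightly grouped, but offset.
readings = [21.02, 20.98, 21.01, 20.99, 21.00]

bias = statistics.mean(readings) - TRUE_VALUE  # lack of accuracy
variability = statistics.stdev(readings)       # lack of precision

print(f"bias        = {bias:+.2f}")
print(f"variability = {variability:.3f}")
```

The result is a bias of about +1.00 against a variability of about 0.016: very precise, yet inaccurate.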
Formatting and Reporting
- Uncertainty should usually be rounded to one or two significant figures.
- The estimate’s value should never be given to a higher level of precision than the uncertainty.
- The results of a calculation or conversion should not be presented as more precise than the gathered data.
- A description of how the uncertainty was calculated should always be included.
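The first two rules can be applied mechanically. This is a minimal sketch; `format_measurement` is a hypothetical helper that rounds the uncertainty to one or two significant figures and then rounds the value to the same decimal place.

```python
import math

def format_measurement(value: float, uncertainty: float, sig_figs: int = 1) -> str:
    """Format 'value ± uncertainty', rounding the uncertainty to `sig_figs`
    significant figures and the value to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    if sig_figs not in (1, 2):
        raise ValueError("sig_figs should be 1 or 2")
    # Decimal exponent of the uncertainty's leading digit, e.g. 0.0347 -> -2.
    exponent = math.floor(math.log10(uncertainty))
    decimals = sig_figs - 1 - exponent
    rounded_u = round(uncertainty, decimals)
    # Rounding can carry into a new leading digit (0.096 -> 0.10); drop one decimal if so.
    if rounded_u >= 10.0 ** (exponent + 1):
        decimals -= 1
        rounded_u = round(uncertainty, decimals)
    rounded_v = round(value, decimals)
    shown = max(decimals, 0)  # negative decimals means rounding to tens, hundreds, ...
    return f"{rounded_v:.{shown}f} ± {rounded_u:.{shown}f}"

print(format_measurement(9.81234, 0.0347))              # 9.81 ± 0.03
print(format_measurement(9.81234, 0.0347, sig_figs=2))  # 9.812 ± 0.035
print(format_measurement(2.345, 0.096))                 # 2.3 ± 0.1
```

The helper does not cover the last rule: the description of how the uncertainty was obtained still has to be written alongside the number.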