Setting Uncertainties
One of the most important parts of any data analysis is the inclusion of proper uncertainties. Uncertainties help quantify the reliability and precision of a conclusion obtained from data.
When comparing detector data to simulations, you may see a difference that looks significant. However, deciding whether that difference is interesting or important requires understanding uncertainties. Agreement within uncertainties implies that the observed and predicted values are consistent. If a number is measured to be 1000 and it was predicted to be 2000±1000, then the measurement and prediction agree. Despite the measurement appearing far from the prediction, the large uncertainty range indicates that the prediction is not very precise, allowing for agreement.
Similarly, it is important not to misinterpret agreement that is better than the uncertainty suggests. If a number is measured to be 1000 and the prediction was 1000±500, that does not mean that the true value will be 1000. A more precise model might give a prediction of 600±100, which would be consistent with the original prediction, but would no longer agree with the measurement.
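To make this concrete, here is a minimal sketch (using the hypothetical numbers above) that expresses the difference between a measurement and a prediction in units of the prediction's uncertainty:
```python
# A minimal sketch: how far is a measurement from a prediction, in units of the
# prediction's uncertainty? Numbers are the hypothetical ones from the text.
def pull(measured, predicted, uncertainty):
    """Difference between measurement and prediction, in units of sigma."""
    return (measured - predicted) / uncertainty

# 1000 measured against a prediction of 2000 ± 1000: only 1 sigma away, so they agree.
print(pull(1000, 2000, 1000))   # -1.0

# The same measurement against a more precise prediction of 600 ± 100: 4 sigma away.
print(pull(1000, 600, 100))     # 4.0
```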
A key part of scientific training is understanding when a difference between a prediction and an observation is meaningful and significant, and that comes down to understanding uncertainties.
Why Consider Uncertainties?
In ATLAS analyses we consider uncertainties for several reasons:
- Accurate Parameter Estimation: To get reliable estimates of the parameters of interest (POIs), such as the Higgs boson couplings or the top quark mass, we need to account for all sources of uncertainty. Ignoring systematic uncertainties can lead to biased estimates and incorrect conclusions.
- Robust Hypothesis Testing: In testing theoretical models against experimental data, systematic uncertainties ensure that discrepancies between the observed data and theoretical predictions are not mistakenly attributed to new physics or phenomena, but are instead correctly identified as arising from known uncertainties in the experimental or theoretical setup.
- Credible Confidence Intervals: Confidence intervals derived from the data should reflect the true level of uncertainty in the measurements. By incorporating systematic uncertainties, these intervals provide a more realistic range of values for the parameters of interest.
- Improved Comparisons with Other Experiments: Systematic uncertainties enable more meaningful comparisons between results from different experiments or analyses.
- Informed Decision Making in Future Experiments: Understanding and quantifying systematic uncertainties helps guide the design and improvement of future experiments. By identifying the sources of uncertainty, we can target specific areas for enhancement, such as improving detector calibration methods or refining theoretical models, reducing uncertainties in future measurements.
Types of Uncertainties
Statistical Uncertainties
Statistical uncertainties are the most intuitive, as we encounter them every day. These are the uncertainties related to using sample data to draw conclusions about a larger population. A friend might tell you, "Every time I flip this coin, it comes up heads!" If they've only flipped the coin once, you would be (rightly) unimpressed by their claim. If they'd flipped the coin thousands of times, then there would be a much smaller statistical uncertainty on their claim - and it would be much more believable.
The statistical uncertainty depends on the size of the sample, specifically on the number of events, trials, electrons, or other objects being counted.
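As a minimal illustration, the two most common counting estimates - a Poisson uncertainty of roughly the square root of the count, and the binomial uncertainty on a measured fraction - show why more trials mean a smaller relative uncertainty:
```python
import numpy as np

# Poisson: counting N independent events gives an uncertainty of roughly sqrt(N).
n_events = 100
poisson_uncertainty = np.sqrt(n_events)   # ~10 events, i.e. a 10% relative uncertainty

# Binomial: estimating a probability (e.g. the "heads" fraction) from n trials.
n_flips, n_heads = 1000, 512
p_hat = n_heads / n_flips
binomial_uncertainty = np.sqrt(p_hat * (1 - p_hat) / n_flips)   # shrinks as 1/sqrt(n_flips)

print(f"{n_events} +/- {poisson_uncertainty:.0f} events")
print(f"heads fraction = {p_hat:.3f} +/- {binomial_uncertainty:.3f}")
```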
Systematic Uncertainties
A much more complex class of uncertainties to calculate in any analysis is "systematic uncertainties." They come in many different forms and types, and describe many different aspects of an analysis. A high-precision analysis might have hundreds of different uncertainties applied to the final result.
At ATLAS, one of the simplest systematic uncertainties conceptually is the luminosity uncertainty. ATLAS has released the 2015 and 2016 data; the 2015 data amount to 3.24±0.04 fb⁻¹, and the 2016 data to 33.40±0.30 fb⁻¹. That means that for a physics process with a cross section of 1 fb (one femtobarn), 3.24 events are expected to appear in the 2015 data, and 33.4 events are expected to appear in the 2016 data. The number of proton collisions is therefore known with an uncertainty of about 1%, the most precise luminosity determination at a hadron collider to date.
Similarly, each physics process has a cross-section uncertainty. The cross section of top quark pair production, for example, might be predicted to be 834 pb with an uncertainty of about 30 pb. In that case, any prediction of top quark pair production might be too high or too low by a few percent without being in disagreement with the data.
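As a rough sketch using the numbers quoted above, the expected event count is the product of luminosity and cross section, and (assuming the two uncertainties are independent) their relative uncertainties add in quadrature:
```python
import numpy as np

# Luminosity and cross-section values quoted above (2016 data, top quark pair production).
lumi, lumi_unc = 33.40, 0.30          # fb^-1
xsec, xsec_unc = 834e3, 30e3          # fb (834 +/- ~30 pb)

expected = lumi * xsec                # expected number of top quark pair events

# For a product of independent quantities, relative uncertainties add in quadrature.
rel_unc = np.sqrt((lumi_unc / lumi) ** 2 + (xsec_unc / xsec) ** 2)
print(f"expected events: {expected:.3g} +/- {expected * rel_unc:.2g}")
```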
Modeling Systematic Uncertainty
Some of the most important and diverse systematic uncertainties are often called "modeling systematic uncertainties." An event generator is used to model a particular physics process - say, Sherpa is used to model Z-boson production. Sherpa might not be a good model of nature, though. In order to understand how wrong that model might be, several tests can be done:
- Different event generators might be compared (e.g. Sherpa could be compared to Powheg);
- Unphysical parameters inside of Sherpa itself might be varied within some allowed ranges. Some of the parameters, like "scales", are numbers used in calculations that shouldn't affect the result - we should get the same answer no matter what scale is used. In practice, they do affect the result, so the variation in prediction when changing the scale is often included as a systematic uncertainty (a sketch of how such variations can be combined follows this list);
- Physical parameters inside of Sherpa might be varied within some allowed range. For example, the top quark mass might be varied within a few hundred MeV;
- The parton distribution function (PDF) used by Sherpa might be changed. PDFs are highly constrained and can only vary in specific ways; they are often provided with "uncertainty sets" that specify exactly how the PDFs are allowed to vary.
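The sketch below illustrates, with purely hypothetical weight arrays rather than any actual open-data interface, how per-event weights for scale and PDF variations might be turned into uncertainties on a predicted yield:
```python
import numpy as np

# Hypothetical per-event weights (illustrative only).
nominal_w = np.array([1.0, 0.9, 1.1, 1.0])                # nominal weights
scale_w = {                                               # weights for each scale variation
    "muR_up":   np.array([1.10, 1.00, 1.20, 1.10]),
    "muR_down": np.array([0.90, 0.80, 1.00, 0.90]),
}
pdf_w = np.random.default_rng(0).normal(1.0, 0.02, size=(100, 4))  # 100 PDF replica weights

nominal_yield = nominal_w.sum()

# Scale uncertainty: take the envelope (largest deviation from nominal) of the variations.
scale_unc = max(abs(w.sum() - nominal_yield) for w in scale_w.values())

# PDF uncertainty: for a replica-style set, the spread of the yields over the replicas.
pdf_yields = (nominal_w * pdf_w).sum(axis=1)
pdf_unc = pdf_yields.std()

print(f"yield = {nominal_yield:.2f} +/- {scale_unc:.2f} (scale) +/- {pdf_unc:.2f} (PDF)")
```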
Statistical Analysis
In the eventual comparisons between detector data and simulation (often called "statistical analysis"), each of these uncertainties is included separately (each one is called a "nuisance parameter"). These nuisance parameters describe all the different ways that the simulation might be incorrect. In the open data, several different event generators' predictions for various processes are included, and internal "event weights" are provided for many other variations, so that these modeling uncertainties can be calculated.
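As a minimal sketch of the idea (not the actual ATLAS statistical machinery, and with purely illustrative numbers), a likelihood with a single Gaussian-constrained nuisance parameter might look like this:
```python
import numpy as np
from scipy import stats, optimize

# Observed count and a simple signal-plus-background model.
observed = 110
signal, background = 20.0, 80.0
syst = 0.10                                   # assumed 10% uncertainty on the background

def negative_log_likelihood(params):
    mu, theta = params                        # mu: signal strength, theta: nuisance parameter
    expected = mu * signal + background * (1.0 + syst * theta)
    # Poisson term for the data plus a Gaussian constraint on the nuisance parameter.
    return -(stats.poisson.logpmf(observed, expected) + stats.norm.logpdf(theta))

fit = optimize.minimize(negative_log_likelihood, x0=[1.0, 0.0])
print("best-fit signal strength and nuisance parameter:", np.round(fit.x, 2))
```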
One particularly difficult situation, which unfortunately arises quite often, is that the comparison made to calculate the uncertainty might not really give a "range" of allowed predictions. Take, for example, a simulation of vehicles on a highway. One simulation might have many different kinds of cars. A second might have many different kinds of cars and motorcycles. An uncertainty made by comparing those two could suggest that the relative number of cars and motorcycles isn't known (probably true), or that anything between a car and a motorcycle is allowed (not really true). And of course, when looking at the real vehicles, some might be trucks - which aren't included in either simulation. In the same way, Sherpa might model nature, or Powheg might, but it's not always obvious whether nature could be "part way between Sherpa and Powheg." And at the same time, it's possible that neither is a good model of nature, and some other tricks are needed to understand the uncertainties.
Validation Systematic Uncertainty
Another very common type of modeling uncertainty can be described as a "validation systematic uncertainty." In essence: the simulation and the data are compared in some set of events that are close to, but not quite the same as, the events that are included in the analysis. In a search for high-mass particles, for example, the validation might be done at slightly lower mass. If the real data and the model disagree, that discrepancy can be included as a systematic uncertainty.
Calibration Uncertainty
Another major class of systematic uncertainties are the uncertainties on the calibration of physics objects like electrons and jets. When deriving the calibrations that ensure these objects are measured correctly, a great deal of effort is spent understanding how precisely each calibration is known and determining the systematic uncertainties associated with it. These can come in several different types, for example:
- Scale factor uncertainties arise when the real data and simulation differ in, for example, the efficiency with which something is identified. Perhaps in real data an electron is correctly identified 85% of the time, and in simulation it is only correctly identified 83% of the time. A scale factor is applied to the simulation to correct it to match the data, and the uncertainties on that scale factor are included as systematic uncertainties in an analysis (a sketch of how this works follows this list). Common examples are lepton reconstruction, identification, isolation, and trigger scale factors and uncertainties.
- Calibration uncertainties arise when, for example, the momentum of a physics object must be calculated from measurements in the detector. For example, the momentum of a jet is calculated from measurements of charged particles in the inner detector and measurements of energy deposits in the calorimeter, among other things. These individual measurements all have uncertainties, and there are additional uncertainties in the way they are combined to produce a final momentum. All those uncertainties need to be included in an analysis.
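As mentioned in the scale-factor item above, the sketch below uses purely hypothetical numbers to show how a scale factor and its uncertainty might enter an analysis as up and down event-weight variations:
```python
import numpy as np

# Hypothetical efficiencies from the text: 85% in data, 83% in simulation.
eff_data, eff_sim = 0.85, 0.83
sf = eff_data / eff_sim                 # scale factor applied to simulated events
sf_unc = 0.01                           # assumed uncertainty on the scale factor

event_weights = np.array([1.0, 1.0, 0.9, 1.1])   # hypothetical per-event simulation weights

nominal = (event_weights * sf).sum()
up      = (event_weights * (sf + sf_unc)).sum()
down    = (event_weights * (sf - sf_unc)).sum()

print(f"yield = {nominal:.2f}, +{up - nominal:.2f} / {down - nominal:.2f} (scale factor systematic)")
```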
Ensuring that all the possible variations and uncertainties have been included is quite difficult, and can require a great deal of experience. One excellent starting point is always to check a comparable data analysis and understand all the sources of uncertainty that were included for that analysis.
Combining Uncertainties
There are many ways that uncertainties, particularly systematic uncertainties, are combined in order to simplify analyses. The simplest example is when drawing a figure that compares real data to simulation: it isn't possible to show dozens of different variations of the simulation, so they are normally added in quadrature (the square root of the sum of the squares is used) and displayed as a band around the baseline prediction.
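A minimal sketch of that quadrature combination, using hypothetical per-bin predictions, might look like this:
```python
import numpy as np

# Hypothetical nominal prediction and a few varied predictions, per histogram bin.
nominal = np.array([120.0, 80.0, 40.0, 10.0])
variations = {
    "jet_energy_scale_up":   np.array([126.0, 83.0, 42.0, 11.0]),
    "luminosity_up":         np.array([121.2, 80.8, 40.4, 10.1]),
    "alternative_generator": np.array([118.0, 82.0, 39.0,  9.5]),
}

# Add the per-bin deviations in quadrature to form the band around the nominal prediction.
deviations = np.array([v - nominal for v in variations.values()])
band = np.sqrt((deviations ** 2).sum(axis=0))

print("band per bin:", np.round(band, 2))   # drawn as nominal +/- band on the figure
```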
The original Higgs boson discovery paper, for example, included a figure with the distribution of the mass of the possible particle in events with four leptons:
The hashed band on the estimated background (around the red histogram in this case) is a combination of all of the uncertainties included when calculating that background.
When performing a rigorous statistical analysis, having many systematic uncertainties can cause simple practical problems - it is slow to calculate all the necessary numbers! One common approach to simplifying an analysis is to combine small uncertainties beforehand into "effective uncertainties". These uncertainties don't represent any single variation in particular, but a sum of several. This often solves the problem: for jets, for example, more than 100 uncertainties can be reduced to 20 or 30 effective terms. Unfortunately, it also makes the meaning of an individual uncertainty difficult to understand. Instead of simply representing "the uncertainty on the jet momentum from mis-modeling the charged particle reconstruction efficiency", for example, each term represents a purely mathematical construct that combines various uncertainties.
Particularly for physics object uncertainties, therefore, while some of them have clear names that can be easily understood, some of them may simply not represent something physical and have a name like "effective nuisance parameter number 3".
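One way such effective terms can be built - sketched below with assumed inputs, not the actual ATLAS reduction procedure - is to diagonalize the total covariance of the per-bin shifts and keep only the leading components:
```python
import numpy as np

# Hypothetical inputs: 100 independent uncertainty sources, each giving a 1-sigma
# shift in 5 bins of some distribution.
rng = np.random.default_rng(1)
shifts = rng.normal(0.0, 1.0, size=(100, 5))

# The total covariance from independent sources is the sum of their outer products.
cov = shifts.T @ shifts
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Keep the 3 largest components as "effective nuisance parameters".
order = np.argsort(eigenvalues)[::-1][:3]
effective = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

print("effective 1-sigma shifts per bin:\n", np.round(effective.T, 2))
```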
Uncertainties in Results
Although uncertainties might all be combined together for display in a figure, when doing a statistical analysis to compare real data and simulation, each individual term is treated with care. The reason is simple: a combined band doesn't properly capture whether the real data and simulation agree. Take, for example, a count of red and green balls. A simulation might expect 10 red balls and 10 green balls, or 5 and 15, or 15 and 5, but always 20 total balls. If in fact 13 red and 14 green balls are observed, either number on its own would be perfectly allowed by the simulation, but the two together would not be.
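Written out explicitly (a trivial sketch of the example above): either count alone passes, but the pair fails the constraint that the total must be 20:
```python
# The simulation allows any split of red and green, but always 20 balls in total.
allowed_total = 20
observed_red, observed_green = 13, 14

red_ok = 0 <= observed_red <= allowed_total                  # 13 red is allowed on its own
green_ok = 0 <= observed_green <= allowed_total              # 14 green is allowed on its own
pair_ok = observed_red + observed_green == allowed_total     # but 27 in total is not

print(red_ok, green_ok, pair_ok)   # True True False
```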
The most common way that statistical analysis is done is through a "likelihood" - a statistical formalism that expresses how likely the observed data are, given the available simulation and the uncertainties that are included. One of the key numbers that can be extracted from the likelihood is a "p-value"; very simply, it expresses how probable it is that the data would look the way they do under a given hypothesis. A p-value can be calculated, for example, to express how likely it is that the real data and simulation match. In the case of the Higgs boson discovery, this is what the p-value looked like:
The very low value around 126 GeV was the indication that the data could not be explained by the existing simulations without including the Higgs boson. Separate simulations were then needed to show that the real data did match predictions that included the Higgs boson, of course.
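As a minimal, purely illustrative sketch of the idea (not the actual Higgs analysis), a counting-experiment p-value expresses how likely it is to observe at least the seen number of events if only the background model is true:
```python
from scipy import stats

# Purely illustrative numbers: expected background and an observed excess.
expected_background = 10.0
observed = 25

# Probability of seeing >= observed events from a Poisson-distributed background.
p_value = stats.poisson.sf(observed - 1, expected_background)

# The same number is often quoted as a significance in Gaussian sigma.
significance = stats.norm.isf(p_value)
print(f"p-value = {p_value:.2e}, significance = {significance:.1f} sigma")
```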
Practical Uncertainties
With the open data for research, all the tools are available to calculate hundreds of different systematic uncertainties for any analysis. Very often, a good physicist will be able to say in advance, "These are the uncertainties that are going to be most important for this analysis." Very small uncertainties are often neglected entirely - much like when weighing yourself each day, you might ignore whether or not you are wearing socks. Of course, as analyses become more precise, more uncertainties matter. Moreover, when making particularly important claims, like the discovery of a new particle, it can be important to check every conceivable source of uncertainty to be absolutely sure that nothing was forgotten, missed, or too quickly rejected.