The 13 TeV Data
❗ For detailed information about this release, you can read "Review of the 13 TeV ATLAS Open Data release."
A set of proton-proton (pp) collision data has been released to the public by the ATLAS Collaboration for educational purposes. The data was collected by the ATLAS detector at the LHC at a centre-of-mass energy of 13 TeV during 2016 and corresponds to an integrated luminosity of 10 fb⁻¹. The pp collision data is accompanied by a set of Monte Carlo (MC) simulated samples describing several processes, which are used to model the expected distributions of different signal and background events.
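To give a feeling for what an integrated luminosity of 10 fb⁻¹ means, the expected number of produced events for a process is its cross section multiplied by the integrated luminosity. The following is a minimal sketch of that arithmetic; the function name and the example cross section are hypothetical and are not taken from the release.

```python
# Integrated luminosity of the 2016 pp dataset quoted above, in fb^-1.
INTEGRATED_LUMINOSITY_FB = 10.0


def expected_events(cross_section_pb, luminosity_fb=INTEGRATED_LUMINOSITY_FB):
    """Expected number of produced events, N = sigma * L.

    The cross section is given in picobarns. Since 1 pb = 1000 fb,
    a luminosity of 1 fb^-1 equals 1000 pb^-1.
    """
    luminosity_pb = luminosity_fb * 1000.0
    return cross_section_pb * luminosity_pb


# Hypothetical process with a 2 pb cross section (illustrative value only).
n_expected = expected_events(2.0)
print(n_expected)
```

This counts produced events only; detector acceptance and selection efficiency reduce the number that actually ends up in a histogram.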
- The released samples are provided in a simplified data format, reducing the information content of the original data analysis format used within the ATLAS Collaboration.
- The resulting format is a ROOT tuple with more than 80 branches. For those not familiar with this modular scientific software toolkit, please refer to the ROOT documentation, which provides a rich set of tutorials and code examples.

Several final-state collections are provided within this release. The corresponding multiplicities of final-state objects, minimum transverse momentum requirements and collection names are shown below:
| Final-state categories | Leading object p_T (min) [GeV] | Collection name |
|---|---|---|
| | 25 | 1lep |
| | 25 | 2lep |
| | 25 | 3lep |
| | 25 | 4lep |
| | 250 (large-R jet), 25 (lepton) | 1largeRjet1lep |
| | 20 (τ-lepton), 25 (lepton) | 1lep1tau |
| | 35 | GamGam |
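The thresholds listed above can be written down as a small lookup, which is handy when checking that a user-defined selection is not looser than the one already applied to a collection. This is only a sketch: the object labels (`lepton`, `largeRjet`, `tau`, `photon`) and the function name are assumptions made for illustration, and the use of `>=` rather than `>` is a guess.

```python
# Minimum leading-object pT [GeV] per collection, copied from the table above.
# The object labels used as keys are assumptions, not names from the release.
MIN_LEADING_PT_GEV = {
    "1lep": {"lepton": 25.0},
    "2lep": {"lepton": 25.0},
    "3lep": {"lepton": 25.0},
    "4lep": {"lepton": 25.0},
    "1largeRjet1lep": {"largeRjet": 250.0, "lepton": 25.0},
    "1lep1tau": {"tau": 20.0, "lepton": 25.0},
    "GamGam": {"photon": 35.0},
}


def passes_leading_pt(collection, leading_pt_gev):
    """Return True if every leading object meets the collection threshold.

    leading_pt_gev maps an object label to the pT of the leading object
    of that type; a missing object counts as pT = 0.
    """
    thresholds = MIN_LEADING_PT_GEV[collection]
    return all(
        leading_pt_gev.get(obj, 0.0) >= minimum
        for obj, minimum in thresholds.items()
    )


print(passes_leading_pt("1lep1tau", {"tau": 22.0, "lepton": 31.0}))
```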
Reconstructed physics objects
Several reconstructed physics objects (electrons, muons, photons, hadronically decaying tau-leptons, small-R jets, large-R jets) are contained within the 13 TeV ATLAS Open Data, and their preselection requirements are detailed below:
Electron (e) | Muon (μ) | Photon (γ) |
---|---|---|
InDet & EMCAL rec. | InDet & MS rec. | InDet & EMCAL rec. |
loose identification | loose identification | tight identification |
loose isolation | loose isolation | loose isolation |
GeV | GeV | GeV |
Hadronically decaying τ-leptons (τ_had) | Small-R jets | Large-R jets |
---|---|---|
InDet & EMCAL rec. | EMCAL & HCAL rec. | EMCAL & HCAL rec. |
medium identification | anti-kt, R = 0.4 | anti-kt, R = 1.0 |
GeV | GeV | GeV |
1 or 3 associated tracks | b-tagging (MV2c10) | trimming: , |
The 13 TeV ATLAS Open Data events are selected by applying several event-quality and trigger criteria, and classified according to the type and multiplicity of reconstructed objects with high transverse momentum. Several standard selection requirements, referred to as preselection, are applied to each of the reconstructed physics objects within the 13 TeV ATLAS Open Data, as detailed in the table below:
Electrons & Muons | Small-R jets | Photons | Large-R jets | |
---|---|---|---|---|
GeV | GeV | GeV | GeV | |
GeV | ||||
In addition, several data-quality criteria ensure that the detector was functioning properly. Events are rejected if they contain reconstructed jets associated with energy deposits that can arise from hardware problems, beam-halo events or cosmic-ray showers. Furthermore, events are required to have at least one reconstructed vertex with two or more associated tracks.
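The vertex requirement is already applied to the released events, so there is nothing to re-apply. Purely to illustrate the logic, it can be sketched as follows; the input (a list of track counts, one per reconstructed vertex) is a hypothetical representation and not a branch of the released tuples.

```python
def has_good_vertex(tracks_per_vertex, min_tracks=2):
    """Return True if at least one vertex has min_tracks or more tracks.

    tracks_per_vertex is a hypothetical list with one integer per
    reconstructed vertex, giving its number of associated tracks.
    """
    return any(n_tracks >= min_tracks for n_tracks in tracks_per_vertex)


print(has_good_vertex([1, 4]))  # one vertex with four tracks
print(has_good_vertex([1, 1]))  # no vertex with two or more tracks
```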
Processes
The 13 TeV ATLAS Open Data set comprises not only pp collision data recorded with the ATLAS detector in 2016 but also MC simulation samples describing several Standard Model (SM) and beyond the Standard Model (BSM) processes, which are used to model the expected distributions of different signal and background events. All simulated samples were processed through the same reconstruction algorithms and analysis chain as the data and were subjected to a loose event preselection to reduce processing time.
The simulated SM processes include top-quark-pair production, single-top production, production of weak bosons in association with jets (W+jets, Z+jets), production of a pair of bosons (diboson WW, WZ, ZZ) and SM Higgs production. This basic set is complemented by simulations of BSM processes (heavy Z' and SUSY production). The MC samples released in the 13 TeV ATLAS Open Data are described below:
Top-quark production
Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
---|---|---|---|
tt̄+jets | 410000 | Powheg-Box V2 + Pythia 8 | only and decays of tt̄-system |
single (anti)top t-channel | (410012) 410011 | Powheg-Box v1 + Pythia 6 | |
single (anti)top Wt-channel | (410014) 410013 | Powheg-Box V2 + Pythia 6 | |
single (anti)top s-channel | (410026) 410025 | Powheg-Box V2 + Pythia 6 |
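Because each simulated sample carries a unique "channelNumber", a simple dictionary is enough to label events by process. The sketch below covers only the top-quark samples listed above. It assumes that the number in parentheses belongs to the antitop sample in each pair; the label strings and the function name are illustrative choices.

```python
# channelNumber -> process label for the top-quark samples in the table above.
# Assumption: the parenthesised number in each pair is the antitop sample.
TOP_SAMPLES = {
    410000: "top-quark pair + jets",
    410011: "single top, t-channel",
    410012: "single antitop, t-channel",
    410013: "single top, Wt-channel",
    410014: "single antitop, Wt-channel",
    410025: "single top, s-channel",
    410026: "single antitop, s-channel",
}


def process_label(channel_number):
    """Return the process label, or 'unknown' for numbers not listed here."""
    return TOP_SAMPLES.get(channel_number, "unknown")


print(process_label(410013))
```

The same pattern extends to the other sample tables once their process names are filled in.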
W/Z (+jets) production
| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| | 361100 – 361108 | Powheg-Box V2 + Pythia 8 | LO accuracy up to Njets = 1 |
| | 361500 – 361505 | Powheg-Box V2 + Pythia 8 | LO accuracy up to 3-jets final states |
| | 361400 – 361441 | Sherpa 2.2 | LO accuracy up to 3-jets final states |
Diboson production
| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| | 363359, 363360 | Sherpa 2.2 | final states |
| | 363492 | Sherpa 2.2 | final states |
| | 363356 | Sherpa 2.2 | final states |
| | 363490 | Sherpa 2.2 | final states |
| | 363358 | Sherpa 2.2 | final states |
| | 363489 | Sherpa 2.2 | final states |
| | 363491 | Sherpa 2.2 | final states |
| | 363493 | Sherpa 2.2 | final states |
SM Higgs production (m_H = 125 GeV)
| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| | 345324 | Powheg-Box V2 + Pythia 8 | final states |
| | 345323 | Powheg-Box V2 + Pythia 8 | final states |
| | 345060 | Powheg-Box V2 + Pythia 8 | final states |
| | 344235 | Powheg-Box V2 + Pythia 8 | final states |
| | 341947 | Pythia 8 | final states |
| | 341964 | Pythia 8 | final states |
| | 343981 | Powheg-Box V2 + Pythia 8 | final states |
| | 345041 | Powheg-Box V2 + Pythia 8 | final states |
| γγ | 345318, 345319 | Powheg-Box V2 + Pythia 8 | final states |
| | 341081 | aMC@NLO + Pythia 8 | final states |
BSM production
| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| | 301325 | Pythia 8 | TeV |
| | 392985 | aMC@NLO + Pythia 8 | GeV, GeV |
General Capabilities of the Datasets
The publicly released datasets can be used for educational purposes with different levels of task difficulty.
At a beginner level, one could visualise the content of the datasets and produce simple distributions. An intermediate-level task would consist of making histograms with collision data after some basic selection. Advanced-level tasks would allow for a deeper look into the ATLAS data, with possibilities of measuring real event properties and physical quantities.
A non-exhaustive list of possible tasks with the proposed datasets includes:
- Comparisons of several distributions of event variables for simulated signal and background events.
- Finding variables that are able to separate signal from background (jet multiplicity, transverse momenta of jets and leptons, lepton isolation, b-tagging, missing transverse energy, angular distributions).
- Development and modification of cuts on these variables in order to improve the separation of signal from background.
- Optimisation of the signal-over-background ratio and estimation of the purity based on simulation only.
- Comparisons of the selection efficiency between data and simulation.
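The cut-optimisation tasks in this list can be sketched with a simple threshold scan on simulated events. The example below is a minimal illustration under stated assumptions: events are represented as hypothetical `(value, weight)` pairs, the toy numbers are invented, and purity is defined as S / (S + B). Purity increases monotonically with S/B, so maximising it is equivalent to maximising the signal-over-background ratio while avoiding a division by zero when no background survives.

```python
def yield_after_cut(events, threshold):
    """Sum the weights of events whose value exceeds the threshold."""
    return sum(weight for value, weight in events if value > threshold)


def scan_purity(signal, background, thresholds):
    """Return (threshold, S, B, purity) for every threshold in the scan."""
    results = []
    for threshold in thresholds:
        s = yield_after_cut(signal, threshold)
        b = yield_after_cut(background, threshold)
        purity = s / (s + b) if (s + b) > 0 else 0.0
        results.append((threshold, s, b, purity))
    return results


def best_cut(results):
    """Pick the scan point with the highest purity."""
    return max(results, key=lambda row: row[3])


# Toy simulated events: (variable value, event weight). Invented numbers.
signal = [(30, 1.0), (50, 1.0), (70, 1.0)]
background = [(20, 1.0), (25, 1.0), (40, 1.0), (60, 1.0)]

results = scan_purity(signal, background, thresholds=[0, 35, 65])
for row in results:
    print(row)
print("best threshold:", best_cut(results)[0])
```

In a real study the highest purity is usually reached by cutting away almost all signal, so a minimum signal yield or a figure of merit that also rewards signal efficiency would normally be added.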
Advanced-level tasks might include:
- Derivation of production cross sections and masses of objects.
- Reconstruction of the objects (quarks or bosons) by assigning the detector physics objects (jets, leptons, missing energy) to the hypothetical decay trees.
- Estimation of the impact of other sources of systematic uncertainties (luminosity uncertainty, b-tagging efficiency, background modelling) by adding approximate and conservative values.
- A test-bed for new data-analysis techniques, e.g. kinematic fitting procedures, multivariate discrimination of signal from background and other machine learning tasks.
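As one example of deriving a mass from reconstructed objects, the invariant mass of two leptons follows from their summed four-momenta, m² = E² − |p|². The sketch below assumes each object is described by transverse momentum, pseudorapidity, azimuthal angle and energy; the function names and example values are hypothetical and do not correspond to specific branches of the release.

```python
import math


def four_momentum(pt, eta, phi, energy):
    """Convert (pT, eta, phi, E) to Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (energy, px, py, pz)


def invariant_mass(p1, p2):
    """Invariant mass of the two-object system, in the units of the inputs."""
    energy = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    mass_squared = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mass_squared, 0.0))


# Two massless, back-to-back objects with pT = E = 45 GeV (invented values).
lepton_1 = four_momentum(45.0, 0.0, 0.0, 45.0)
lepton_2 = four_momentum(45.0, 0.0, math.pi, 45.0)
print(invariant_mass(lepton_1, lepton_2))  # approximately 90 GeV
```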
Limitations
An important aspect of the 13 TeV ATLAS Open Data is that it is prepared specifically for educational purposes. To this end, precision has been traded for simplicity of use. The simplifications are:
- Scale factors implementing corrections for different object efficiencies are calculated using the preselection cuts. This selection does not have to coincide with the object selection defined by the user; discrepancies may therefore arise from non-matching object definitions.
- The per-jet b-tagging scale factor (scaleFactor_BTAG) is computed for a specific working point of a specific b-tagging algorithm (MV2c10), corresponding to a 70% b-jet efficiency. Choosing a different operating point for the MV2c10 algorithm introduces a potential mismatch between data and MC simulation.
- No data-driven estimation of the multijet background is provided. The contribution of fake or non-prompt leptons may be suppressed using strict object definitions such as lepton identification, isolation and transverse-momentum requirements. Any residual disagreement might nevertheless be a sign that the multijet contribution to the electron and muon channels is not taken into account.
- To provide a basis for systematic-uncertainty studies while avoiding excessive complexity, only a simplified single-component systematic-uncertainty estimate related to object transverse-momentum reconstruction is included in the datasets.
- The current content does not support the creation of unfolded distributions or searches for signals that are not covered by the signal datasets made available by the ATLAS Collaboration.
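To make the role of the scale factors concrete, a simulated event is typically weighted by the product of its generator weight and the relevant correction factors. The sketch below shows only that multiplication. Apart from `scaleFactor_BTAG`, which is named above, every key and value is a hypothetical placeholder, and the function is not part of any ATLAS software.

```python
import math


def event_weight(mc_weight, scale_factors):
    """Multiply a generator weight by all supplied scale factors.

    scale_factors maps a scale-factor name to its value for this event.
    """
    return mc_weight * math.prod(scale_factors.values())


# Invented example values; 'scaleFactor_HYPOTHETICAL' is a placeholder name.
weight = event_weight(
    2.0,
    {"scaleFactor_BTAG": 0.5, "scaleFactor_HYPOTHETICAL": 1.5},
)
print(weight)
```

As noted in the first limitation, these factors are derived with the preselection cuts, so the resulting weight is only approximate when the user's object selection differs.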