Legacy 13 TeV Data for Education

Overview

A set of proton-proton (pp) collision data has been released to the public by the ATLAS Collaboration for educational purposes. The data was collected by the ATLAS detector at the LHC at a centre-of-mass energy of 13 TeV during 2016 and corresponds to an integrated luminosity of 10 fb$^{-1}$. The pp collision data is accompanied by a set of Monte Carlo (MC) simulated samples describing several processes, which are used to model the expected distributions of different signal and background events.

  • The released samples are provided in a simplified data format, reducing the information content of the original data analysis format used within the ATLAS Collaboration.

  • The resulting format is a ROOT tuple with more than 80 branches. Readers unfamiliar with this modular scientific software toolkit can refer to the ROOT documentation, which provides a rich set of tutorials and code examples.

  • Several final-state collections are provided within this release. The corresponding multiplicities of final-state objects, minimum transverse momentum requirements and collection names are shown below:

| Final-state categories | Leading object $p_T$ (min) [GeV] | Collection name |
|---|---|---|
| $N_l = 1$ | 25 | 1lep |
| $N_l \leq 1$ | 25 | 2lep |
| $N_l = 3$ | 25 | 3lep |
| $N_l \leq 4$ | 25 | 4lep |
| $N_{\mathrm{largeRjet}} \leq 1$ & $N_l = 1$ | 250 (large-R jet), 25 (lepton) | 1largeRjet1lep |
| $N_{\tau - \mathrm{had}} = 1$ & $N_l = 1$ | 20 ($\tau_h$), 25 (lepton) | 1lep1tau |
| $N_{\gamma} \leq 2$ | 35 | GamGam |

Reconstructed physics objects

Several reconstructed physics objects (electrons, muons, photons, hadronically decaying tau-leptons, small-R jets, large-R jets) are contained within the 13 TeV ATLAS Open Data, and their preselection requirements are detailed below:

| Electron (e) | Muon ($\mu$) | Photon ($\gamma$) |
|---|---|---|
| InDet & EMCAL rec. | InDet & MS rec. | InDet & EMCAL rec. |
| loose identification | loose identification | tight identification |
| loose isolation | loose isolation | loose isolation |
| $p_T > 7$ GeV | $p_T > 7$ GeV | $E_T > 25$ GeV |
| $\lvert\eta\rvert < 2.47$ | $\lvert\eta\rvert < 2.5$ | $\lvert\eta\rvert < 2.37$ |

| Hadronically decaying $\tau$-leptons ($\tau_h$) | Small-R jets | Large-R jets |
|---|---|---|
| InDet & EMCAL rec. | EMCAL & HCAL rec. | EMCAL & HCAL rec. |
| medium identification | anti-kt, R = 0.4 | anti-kt, R = 1.0 |
| $p_T > 20$ GeV | $p_T > 20$ GeV | $p_T > 250$ GeV |
| $\lvert\eta\rvert < 2.5$ | $\lvert\eta\rvert < 2.5$ | $\lvert\eta\rvert < 2.0$ |
| 1 or 3 associated tracks | b-tagging (MV2c10) | trimming: $R_{sub} = 0.2$, $f_{cut} = 0.05$ |

The 13 TeV ATLAS Open Data events are selected by applying several event-quality and trigger criteria, and classified according to the type and multiplicity of reconstructed objects with high transverse momentum. Several standard selection requirements, referred to as preselection, are applied to each of the reconstructed physics objects within the 13 TeV ATLAS Open Data, as detailed in the table below:

| Electrons & Muons | Small-R jets | Photons | Large-R jets | $\tau_h$ |
|---|---|---|---|---|
| $p_T > 25$ GeV | $p_T > 25$ GeV | | $p_T < 1500$ GeV | $p_T > 25$ GeV |
| $\mathrm{lep\_ptcone30} < 0.15$ | $\mathrm{JVT} > 0.59$ | $\mathrm{photon\_ptcone30} < 0.065$ | $\mathrm{mass} > 50$ GeV | |
| $\mathrm{lep\_etcone20} < 0.15$ | | $\mathrm{photon\_etcone20} < 0.065$ | | |

In addition, several data-quality criteria ensure that the detector was functioning properly. Events are rejected if they contain reconstructed jets associated with energy deposits that can arise from hardware problems, beam-halo events or cosmic-ray showers. Furthermore, events are required to have at least one reconstructed vertex with two or more associated tracks.
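
As a minimal sketch of how the electron and muon preselection listed above could be applied in an analysis script, the Python function below checks the three lepton requirements. The dictionary keys (`pt`, `ptcone30`, `etcone20`) are hypothetical names chosen for illustration, `pt` is assumed to be in GeV, and the isolation values are assumed to be expressed in the same form as the thresholds quoted in the table.

```python
# Thresholds taken from the electron and muon preselection quoted above.
LEPTON_PT_MIN_GEV = 25.0
LEPTON_PTCONE30_MAX = 0.15
LEPTON_ETCONE20_MAX = 0.15


def passes_lepton_preselection(lepton):
    """Return True if a lepton record satisfies all three requirements.

    `lepton` is a hypothetical dict with keys "pt" (GeV), "ptcone30"
    and "etcone20"; these names are assumptions, not the ntuple layout.
    """
    return (
        lepton["pt"] > LEPTON_PT_MIN_GEV
        and lepton["ptcone30"] < LEPTON_PTCONE30_MAX
        and lepton["etcone20"] < LEPTON_ETCONE20_MAX
    )


# Toy records for illustration only (not real data).
leptons = [
    {"pt": 42.0, "ptcone30": 0.02, "etcone20": 0.05},  # passes
    {"pt": 18.0, "ptcone30": 0.01, "etcone20": 0.01},  # fails pT
    {"pt": 60.0, "ptcone30": 0.30, "etcone20": 0.04},  # fails isolation
]
selected = [lep for lep in leptons if passes_lepton_preselection(lep)]
```

Keeping the thresholds as named constants makes it easy to tighten them later, which is one of the exercises the datasets are intended for.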

Processes

The 13 TeV ATLAS Open Data set comprises not only pp collision data recorded with the ATLAS detector in 2016 but also MC simulation samples describing several Standard Model (SM) and beyond the Standard Model (BSM) processes, which are used to model the expected distributions of different signal and background events. All simulated samples were processed through the same reconstruction algorithms and analysis chain as the data and subjected to a loose event preselection to reduce processing time.
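
To compare simulated distributions with data, simulated events are usually scaled to the integrated luminosity of the data. The sketch below shows the standard normalisation (cross section times luminosity divided by the sum of generated event weights); it is an illustration rather than a description of the ATLAS tooling. The function name, the 1 pb cross section and the sum of weights are hypothetical; only the 10 fb$^{-1}$ luminosity comes from the text.

```python
def luminosity_weight(cross_section_pb, sum_of_weights, luminosity_fb=10.0):
    """Per-event weight that scales a simulated sample to the data luminosity.

    cross_section_pb : process cross section in picobarn (hypothetical input)
    sum_of_weights   : sum of generator weights of the sample (hypothetical input)
    luminosity_fb    : integrated luminosity in fb^-1 (10 fb^-1 for this release)
    """
    if sum_of_weights <= 0:
        raise ValueError("sum_of_weights must be positive")
    luminosity_pb = luminosity_fb * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return cross_section_pb * luminosity_pb / sum_of_weights


# Hypothetical sample: 1 pb cross section, 50,000 generated unit-weight events.
weight = luminosity_weight(cross_section_pb=1.0, sum_of_weights=50_000.0)
expected_events = weight * 50_000.0  # expected yield before any selection
```

With these illustrative inputs each simulated event stands for 0.2 expected events, so the full sample corresponds to 10,000 expected events in 10 fb$^{-1}$.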

The simulated SM processes include top-quark-pair production, single-top production, production of weak bosons in association with jets (W+jets, Z+jets), production of a pair of bosons (diboson WW, WZ, ZZ) and SM Higgs production. This basic set is complemented by simulations of BSM processes (heavy Z' and SUSY production). The MC samples released in the 13 TeV ATLAS Open Data are described below:

Top-quark production

| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| $t\bar{t}$+jets | 410000 | Powheg-Box V2 + Pythia 8 | only $1\ell$ and $2\ell$ decays of $t\bar{t}$-system |
| single (anti)top t-channel | (410012) 410011 | Powheg-Box V1 + Pythia 6 | |
| single (anti)top Wt-channel | (410014) 410013 | Powheg-Box V2 + Pythia 6 | |
| single (anti)top s-channel | (410026) 410025 | Powheg-Box V2 + Pythia 6 | |

W/Z (+jets) production

| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| $Z \rightarrow ee, \mu\mu, \tau\tau$ | 361100 – 361108 | Powheg-Box V2 + Pythia 8 | LO accuracy up to Njets = 1 |
| $W \rightarrow e\nu, \mu\nu, \tau\nu + \mathrm{jets}$ | 361500 – 361505 | Powheg-Box V2 + Pythia 8 | LO accuracy up to 3-jets final states |
| $Z \rightarrow ee, \mu\mu, \tau\tau + \mathrm{jets}$ | 361400 – 361441 | Sherpa 2.2 | LO accuracy up to 3-jets final states |

Diboson production

| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| $WW$ | 363359, 363360 | Sherpa 2.2 | $qq' \ell\nu$ final states |
| $WW$ | 363492 | Sherpa 2.2 | $\ell\nu\ell'\nu'$ final states |
| $ZZ$ | 363356 | Sherpa 2.2 | $qq'\ell^{+}\ell^{-}$ final states |
| $ZZ$ | 363490 | Sherpa 2.2 | $\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states |
| $WZ$ | 363358 | Sherpa 2.2 | $qq'\ell^{+}\ell^{-}$ final states |
| $WZ$ | 363489 | Sherpa 2.2 | $\ell\nu qq'$ final states |
| $WZ$ | 363491 | Sherpa 2.2 | $\ell\nu\ell^{+}\ell^{-}$ final states |
| $WZ$ | 363493 | Sherpa 2.2 | $\ell\nu\nu\nu$ final states |

SM Higgs production ($m_H = 125$ GeV)

| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| $ggF, H \rightarrow WW$ | 345324 | Powheg-Box V2 + Pythia 8 | $\ell\nu\ell\nu$ final states |
| $VBF, H \rightarrow WW$ | 345323 | Powheg-Box V2 + Pythia 8 | $\ell\nu\ell\nu$ final states |
| $ggF, H \rightarrow ZZ$ | 345060 | Powheg-Box V2 + Pythia 8 | $\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states |
| $VBF, H \rightarrow ZZ$ | 344235 | Powheg-Box V2 + Pythia 8 | $\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states |
| $ZH, H \rightarrow ZZ$ | 341947 | Pythia 8 | $\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states |
| $WH, H \rightarrow ZZ$ | 341964 | Pythia 8 | $\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states |
| $ggF, H \rightarrow \gamma\gamma$ | 343981 | Powheg-Box V2 + Pythia 8 | $\gamma\gamma$ final states |
| $VBF, H \rightarrow \gamma\gamma$ | 345041 | Powheg-Box V2 + Pythia 8 | $\gamma\gamma$ final states |
| $WH (ZH), H \rightarrow \gamma\gamma$ | 345318, 345319 | Powheg-Box V2 + Pythia 8 | $\gamma\gamma$ final states |
| $ttH, H \rightarrow \gamma\gamma$ | 341081 | aMC@NLO + Pythia 8 | $\gamma\gamma$ final states |

BSM production

| Process | Unique "channelNumber" | Generator, hadronisation | Additional information |
|---|---|---|---|
| $Z' \rightarrow t\bar{t}$ | 301325 | Pythia 8 | $m_{Z'} = 1$ TeV |
| $\tilde{\ell}\tilde{\ell}' \rightarrow \ell\tilde{\chi}^0_1 \ell' \tilde{\chi}_1^{0\prime}$ | 392985 | aMC@NLO + Pythia 8 | $m_{\tilde{\ell}} = 600$ GeV, $m_{\tilde{\chi}^0_1} = 300$ GeV |
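
When looping over simulated events it can be handy to map a "channelNumber" back to the process group it belongs to. The following Python sketch builds such a lookup from the numbers listed above. The names `PROCESS_GROUPS`, `CHANNEL_TO_GROUP` and `process_group` are hypothetical, and the "first – last" entries are assumed to denote contiguous, inclusive ranges.

```python
def inclusive(first, last):
    """Integers from first to last, both included (assumed meaning of 'first – last')."""
    return range(first, last + 1)


# Channel numbers as listed in the sample description above.
PROCESS_GROUPS = {
    "Top-quark production": [
        410000, 410011, 410012, 410013, 410014, 410025, 410026,
    ],
    "W/Z (+jets) production": [
        *inclusive(361100, 361108),
        *inclusive(361400, 361441),
        *inclusive(361500, 361505),
    ],
    "Diboson production": [
        363356, 363358, 363359, 363360,
        363489, 363490, 363491, 363492, 363493,
    ],
    "SM Higgs production": [
        341081, 341947, 341964, 343981, 344235, 345041,
        345060, 345318, 345319, 345323, 345324,
    ],
    "BSM production": [301325, 392985],
}

# Reverse lookup: channel number -> group name.
CHANNEL_TO_GROUP = {
    number: group
    for group, numbers in PROCESS_GROUPS.items()
    for number in numbers
}


def process_group(channel_number):
    """Return the process group for a channel number, or None if it is not listed."""
    return CHANNEL_TO_GROUP.get(channel_number)
```

For example, `process_group(410000)` returns `"Top-quark production"`, while a number that does not appear in the tables returns `None`.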

General Capabilities of the Datasets

The publicly released datasets can be used for educational purposes with different levels of task difficulty.

At a beginner level, one could visualise the content of the datasets and produce simple distributions. An intermediate-level task would consist of making histograms with collision data after some basic selection. Advanced-level tasks would allow for a deeper look into the ATLAS data, with possibilities of measuring real event properties and physical quantities.

A non-exhaustive list of possible tasks with the proposed datasets includes:

  • Comparisons of several distributions of event variables for simulated signal and background events.
  • Finding variables that are able to separate signal from background (jet multiplicity, transverse momenta of jets and leptons, lepton isolation, b-tagging, missing transverse energy, angular distributions).
  • Development and modification of cuts on these variables in order to improve the separation of signal from background.
  • Optimisation of the signal-over-background ratio and estimation of the purity based on simulation only.
  • Comparisons of the selection efficiency between data and simulation.
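
A cut optimisation of the kind listed above can be sketched in a few lines of Python. The example scans a lower threshold on a single variable and reports the signal-over-background ratio and the purity, defined here as S / (S + B). The function names and the toy transverse-momentum values are hypothetical, and events are counted without weights for simplicity; a real study would use weighted simulated events.

```python
def count_passing(values, threshold):
    """Number of entries strictly above the threshold."""
    return sum(1 for value in values if value > threshold)


def scan_cut(signal_values, background_values, thresholds):
    """For each threshold return (threshold, S, B, S/B, purity)."""
    results = []
    for threshold in thresholds:
        n_sig = count_passing(signal_values, threshold)
        n_bkg = count_passing(background_values, threshold)
        total = n_sig + n_bkg
        ratio = n_sig / n_bkg if n_bkg > 0 else float("inf")
        purity = n_sig / total if total > 0 else 0.0
        results.append((threshold, n_sig, n_bkg, ratio, purity))
    return results


# Toy lepton pT values in GeV, for illustration only (not real data).
signal_pt = [45.0, 60.0, 72.0, 88.0, 95.0]
background_pt = [26.0, 31.0, 38.0, 52.0, 64.0]

scan = scan_cut(signal_pt, background_pt, thresholds=[25.0, 40.0, 55.0])
for threshold, n_sig, n_bkg, ratio, purity in scan:
    print(f"pT > {threshold:5.1f} GeV: S={n_sig} B={n_bkg} "
          f"S/B={ratio:.2f} purity={purity:.2f}")
```

In this toy example, tightening the cut raises the purity while removing part of the signal, which is the trade-off the optimisation exercise is meant to explore.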

Advanced-level tasks might include:

  • Derivation of production cross sections and masses of objects.
  • Reconstruction of the objects (quarks or bosons) by assigning the detector physics objects (jets, leptons, missing energy) to the hypothetical decay trees.
  • Estimation of the impact of other sources of systematic uncertainties (luminosity uncertainty, b-tagging efficiency, background modelling) by adding approximate and conservative values.
  • A test-bed for new data-analysis techniques, e.g. kinematic fitting procedures, multivariate discrimination of signal from background and other machine learning tasks.
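
As an illustration of the mass reconstruction mentioned above, the sketch below computes the invariant mass of a set of reconstructed objects from their transverse momentum, pseudorapidity, azimuthal angle and energy, using $m^2 = (\sum E)^2 - \lvert\sum \vec{p}\rvert^2$. The function names and the two toy leptons are hypothetical; all quantities are assumed to be in GeV (angles in radians).

```python
import math


def four_vector(pt, eta, phi, energy):
    """Convert (pT, eta, phi, E) to Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (energy, px, py, pz)


def invariant_mass(vectors):
    """Invariant mass of the summed four-vectors."""
    energy = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    mass_squared = energy**2 - px**2 - py**2 - pz**2
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(mass_squared, 0.0))


# Two toy back-to-back leptons, treated as massless (E = |p|), for illustration.
lepton_1 = four_vector(pt=45.6, eta=0.0, phi=0.0, energy=45.6)
lepton_2 = four_vector(pt=45.6, eta=0.0, phi=math.pi, energy=45.6)
dilepton_mass = invariant_mass([lepton_1, lepton_2])
```

For this back-to-back configuration the momenta cancel, so the dilepton mass equals the summed energy of 91.2 GeV.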

Limitations

An important aspect of the 13 TeV ATLAS Open Data is that it is prepared specifically for educational purposes. To this end, precision has been traded for simplicity of use. The simplifications are:

  • Scale factors implementing corrections for different object efficiencies are calculated using the preselection cuts. This selection does not have to coincide with the actual object selection defined by the user; therefore, discrepancies may arise due to non-matching object definitions.

  • The per-jet b-tagging scale factor (scaleFactor_BTAG) is computed for one specific working point of the MV2c10 b-tagging algorithm, corresponding to a 70% b-jet efficiency. If a different operating point of the MV2c10 algorithm is chosen, a mismatch between data and MC simulation may be introduced.

  • No data-driven estimation of the multijet background is provided. The contributing effects of fake or non-prompt leptons may be countered using strict object definitions such as lepton identification, isolation and transverse momentum requirements. However, any residual disagreement might be understood as a sign that the multijet contribution to the electron and muon channels is not taken into account.

  • To provide a basis for systematic-uncertainty estimation studies while limiting complexity, only a simplified single-component systematic-uncertainty estimate related to object transverse-momentum reconstruction is included in the datasets.

  • The current content does not support the creation of unfolded distributions or searches for signals that are not supported by signal datasets made available by the ATLAS Collaboration.