The 13 TeV 2020 Data

Note: For detailed information about this release, see "Review of the 13 TeV ATLAS Open Data release".

A set of proton-proton (pp) collision data has been released to the public by the ATLAS Collaboration for educational purposes. The data were collected by the ATLAS detector at the LHC at a centre-of-mass energy of 13 TeV during 2016 and correspond to an integrated luminosity of $10~\mathrm{fb}^{-1}$. The pp collision data are accompanied by a set of Monte Carlo (MC) simulated samples describing several processes, which are used to model the expected distributions of different signal and background events.

Explore the 13 TeV Data for Education

The released samples are provided in a simplified data format, reducing the information content of the original data analysis format used within the ATLAS Collaboration.
The resulting format is a ROOT tuple with more than 80 branches. For those not familiar with ROOT, a modular scientific software toolkit, the ROOT documentation provides a rich set of tutorials and code examples.
Several final-state collections are provided within this release. The corresponding multiplicities of final-state objects, minimum transverse momentum requirements and collection names are shown below:

Final-state categories	Leading object $p_T$ (min) [GeV]	Collection name
$N_l = 1$	25	1lep
$N_l \geq 2$	25	2lep
$N_l = 3$	25	3lep
$N_l \geq 4$	25	4lep
$N_{\mathrm{largeRjet}} \geq 1$ & $N_l = 1$	250 (large-R jet), 25 (lepton)	1largeRjet1lep
$N_{\tau - \mathrm{had}} = 1$ & $N_l = 1$	20 ( $\tau_h$ ), 25 (lepton)	1lep1tau
$N_{\gamma} \geq 2$	35	GamGam

Reconstructed physics objects

Several reconstructed physics objects (electrons, muons, photons, hadronically decaying tau-leptons, small-R jets, large-R jets) are contained within the 13 TeV ATLAS Open Data, and their preselection requirements are detailed below:

Electron (e)	Muon ( $\mu$ )	Photon ( $\gamma$ )
InDet & EMCAL rec.	InDet & MS rec.	InDet & EMCAL rec.
loose identification	loose identification	tight identification
loose isolation	loose isolation	loose isolation
$p_T > 7$ GeV	$p_T > 7$ GeV	$E_T > 25$ GeV
$|\eta| < 2.47$	$|\eta| < 2.5$	$|\eta| < 2.37$

Hadronically decaying $\tau$ -leptons ( $\tau_h$ )	Small-R jets	Large-R jets
InDet & EMCAL rec.	EMCAL & HCAL rec.	EMCAL & HCAL rec.
medium identification	anti-kt, R = 0.4	anti-kt, R = 1.0
$p_T > 20$ GeV	$p_T > 20$ GeV	$p_T > 250$ GeV
$|\eta| < 2.5$	$|\eta| < 2.5$	$|\eta| < 2.0$
1 or 3 associated tracks	b-tagging (MV2c10)	trimming: $R_{sub} = 0.2$ , $f_{cut} = 0.05$
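The kinematic part of the preselection tables above can be written as a small lookup. This is a minimal sketch: the dictionary keys and the function name are hypothetical labels chosen for illustration, and only the $p_T$ (or $E_T$) and $|\eta|$ thresholds are encoded. The identification, isolation, track and trimming requirements are not modelled.

```python
# Kinematic preselection thresholds transcribed from the tables above.
# Momenta are in GeV (E_T for photons); the eta limit applies to |eta|.
# The object labels used as keys are hypothetical names for this sketch.
PRESELECTION = {
    "electron": {"min_pt": 7.0, "max_abs_eta": 2.47},
    "muon": {"min_pt": 7.0, "max_abs_eta": 2.5},
    "photon": {"min_pt": 25.0, "max_abs_eta": 2.37},
    "tau_had": {"min_pt": 20.0, "max_abs_eta": 2.5},
    "small_r_jet": {"min_pt": 20.0, "max_abs_eta": 2.5},
    "large_r_jet": {"min_pt": 250.0, "max_abs_eta": 2.0},
}


def passes_kinematic_preselection(kind, pt_gev, eta):
    """Return True if an object satisfies the pT and |eta| thresholds for its kind."""
    cuts = PRESELECTION[kind]
    return pt_gev > cuts["min_pt"] and abs(eta) < cuts["max_abs_eta"]


# A 10 GeV object at |eta| = 2.48 passes as a muon but not as an electron.
print(passes_kinematic_preselection("muon", 10.0, 2.48))
print(passes_kinematic_preselection("electron", 10.0, 2.48))
```

Keeping the thresholds in one table makes it easy to compare them with the documentation and to tighten them for a specific analysis.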

The 13 TeV ATLAS Open Data events are selected by applying several event-quality and trigger criteria, and classified according to the type and multiplicity of reconstructed objects with high transverse momentum. Several standard selection requirements, referred to as preselection, are applied to each of the reconstructed physics objects within the 13 TeV ATLAS Open Data, as detailed in the table below:

Electrons & Muons	Small-R jets	Photons	Large-R jets	$\tau_h$
$p_T > 25$ GeV	$p_T > 25$ GeV		$p_T < 1500$ GeV	$p_T > 25$ GeV
$\mathrm{lep\_ptcone30} < 0.15$	$\mathrm{JVT} > 0.59$	$\mathrm{photon\_ptcone30} < 0.065$	$\mathrm{mass} > 50$ GeV
$\mathrm{lep\_etcone20} < 0.15$		$\mathrm{photon\_etcone20} < 0.065$

In addition, several data-quality criteria ensure that the detector was functioning properly. Events are rejected if they contain reconstructed jets associated with energy deposits that can arise from hardware problems, beam-halo events or cosmic-ray showers. Furthermore, events are required to have at least one reconstructed vertex with two or more associated tracks.

Processes

The 13 TeV ATLAS Open Data set comprises not only pp collision data recorded with the ATLAS detector in 2016 but also MC simulation samples describing several Standard Model (SM) processes, which are used to model the expected distributions of different signal and background events. All simulated samples were processed through the same reconstruction algorithms and analysis chain as the data and subjected to a loose event preselection to reduce processing time.

MC simulation samples describing several Standard Model (SM) and beyond the Standard Model (BSM) processes, which are used to model the expected distributions of different signal and background processes, are included in the release.
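To compare simulated distributions with data, each simulated event is commonly scaled so that the sample corresponds to the integrated luminosity of the data. The sketch below shows that standard normalisation, weight = cross section × luminosity / sum of generated weights. The function name, its arguments and the numbers in the example are assumptions made for illustration. They are not values or branch names taken from the release.

```python
def luminosity_weight(xsec_pb, lumi_fb_inv, sum_of_weights, event_weight=1.0):
    """Per-event weight that scales a simulated sample to a target luminosity.

    xsec_pb        -- process cross section in pb (hypothetical input)
    lumi_fb_inv    -- integrated luminosity of the data in fb^-1
    sum_of_weights -- sum of generator weights over the full simulated sample
    event_weight   -- generator weight of this particular event
    """
    lumi_pb_inv = lumi_fb_inv * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return event_weight * xsec_pb * lumi_pb_inv / sum_of_weights


# Illustrative numbers only: a 2 pb process, 10 fb^-1 of data,
# and one million unit-weight generated events.
weight = luminosity_weight(xsec_pb=2.0, lumi_fb_inv=10.0, sum_of_weights=1_000_000.0)
print(weight)
```

With these toy inputs, 20 000 events are expected in the data, so each of the one million simulated events counts as 0.02 of an event.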

A set of simulated SM processes includes top-quark-pair production, single-top production, production of weak bosons in association with jets (W+jets, Z+jets), production of a pair of bosons (diboson WW, WZ, ZZ) and SM Higgs production. The basic set of SM processes is complemented by simulations of BSM processes (heavy Z' and SUSY production). The description of the MC samples released in the 13 TeV ATLAS Open Data is presented below:

Top-quark production

Process	Unique "channelNumber"	Generator, hadronisation	Additional information
$t\bar{t}$ +jets	410000	Powheg-Box V2 + Pythia 8	only $1\ell$ and $2\ell$ decays of $t\bar{t}$ -system
single (anti)top t-channel	(410012) 410011	Powheg-Box v1 + Pythia 6
single (anti)top Wt-channel	(410014) 410013	Powheg-Box V2 + Pythia 6
single (anti)top s-channel	(410026) 410025	Powheg-Box V2 + Pythia 6

W/Z (+jets) production

Process	Unique "channelNumber"	Generator, hadronisation	Additional information
$Z \rightarrow ee, \mu\mu, \tau\tau$	361100 – 361108	Powheg-Box V2 + Pythia 8	LO accuracy up to Njets = 1
$W \rightarrow e\nu, \mu\nu, \tau\nu + \mathrm{jets}$	361500 – 361505	Powheg-Box V2 + Pythia 8	LO accuracy up to 3-jets final states
$Z \rightarrow ee, \mu\mu, \tau\tau + \mathrm{jets}$	361400 – 361441	Sherpa 2.2	LO accuracy up to 3-jets final states

Diboson production

Process	Unique "channelNumber"	Generator, hadronisation	Additional information
$WW$	363359, 363360	Sherpa 2.2	$qq' \ell\nu$ final states
$WW$	363492	Sherpa 2.2	$\ell\nu\ell'\nu '$ final states
$ZZ$	363356	Sherpa 2.2	$qq'\ell^{+}\ell^{-}$ final states
$ZZ$	363490	Sherpa 2.2	$\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states
$WZ$	363358	Sherpa 2.2	$qq'\ell^{+}\ell^{-}$ final states
$WZ$	363489	Sherpa 2.2	$\ell\nu qq'$ final states
$WZ$	363491	Sherpa 2.2	$\ell\nu\ell^{+}\ell^{-}$ final states
$WZ$	363493	Sherpa 2.2	$\ell\nu\nu\nu$ final states

SM Higgs production (m_H = 125 GeV)

Process	Unique "channelNumber"	Generator, hadronisation	Additional information
$ggF, H \rightarrow WW$	345324	Powheg-Box V2 + Pythia 8	$\ell\nu\ell\nu$ final states
$VBF, H \rightarrow WW$	345323	Powheg-Box V2 + Pythia 8	$\ell\nu\ell\nu$ final states
$ggF, H \rightarrow ZZ$	345060	Powheg-Box V2 + Pythia 8	$\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states
$VBF, H \rightarrow ZZ$	344235	Powheg-Box V2 + Pythia 8	$\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states
$ZH, H \rightarrow ZZ$	341947	Pythia 8	$\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states
$WH, H \rightarrow ZZ$	341964	Pythia 8	$\ell^{+}\ell^{-}\ell^{+}\ell^{-}$ final states
$ggF, H \rightarrow \gamma\gamma$	343981	Powheg-Box V2 + Pythia 8	$\gamma\gamma$ final states
$VBF, H \rightarrow \gamma\gamma$	345041	Powheg-Box V2 + Pythia 8	$\gamma\gamma$ final states
$WH (ZH), H \rightarrow \gamma\gamma$	345318, 345319	Powheg-Box V2 + Pythia 8	$\gamma\gamma$ final states
$ttH, H \rightarrow \gamma\gamma$	341081	aMC@NLO + Pythia 8	$\gamma\gamma$ final states
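Because each simulated sample carries a unique "channelNumber", an analysis can group samples by process with a simple lookup. The sketch below transcribes the SM Higgs table above into a dictionary. The dictionary, the helper names and the "gamgam" label are hypothetical, and the assignment of 345318 to WH and 345319 to ZH assumes the two numbers are listed in the same order as the processes.

```python
# channelNumber -> (production mode, Higgs decay), transcribed from the table above.
# Assumption: 345318 is WH and 345319 is ZH, following the order "WH (ZH)".
HIGGS_SAMPLES = {
    345324: ("ggF", "WW"),
    345323: ("VBF", "WW"),
    345060: ("ggF", "ZZ"),
    344235: ("VBF", "ZZ"),
    341947: ("ZH", "ZZ"),
    341964: ("WH", "ZZ"),
    343981: ("ggF", "gamgam"),
    345041: ("VBF", "gamgam"),
    345318: ("WH", "gamgam"),
    345319: ("ZH", "gamgam"),
    341081: ("ttH", "gamgam"),
}


def describe(channel_number):
    """Return a short label such as 'ggF, H -> ZZ' for a known channelNumber."""
    production, decay = HIGGS_SAMPLES[channel_number]
    return f"{production}, H -> {decay}"


def channels_for_decay(decay):
    """Return the sorted channelNumbers of all samples with the given Higgs decay."""
    return sorted(n for n, (_, d) in HIGGS_SAMPLES.items() if d == decay)


print(describe(345060))
print(channels_for_decay("WW"))
```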

BSM production

Process	Unique "channelNumber"	Generator, hadronisation	Additional information
$Z' \rightarrow t\bar{t}$	301325	Pythia 8	$m_{Z'} = 1$ TeV
$\tilde{\ell}\tilde{\ell}'\rightarrow \ell\tilde{\chi}^0_1 \ell' \tilde{\chi}_1^{0}{'}$	392985	aMC@NLO + Pythia 8	$m_{\tilde{\ell}} = 600$ GeV, $m_{\tilde{\chi}^0_1} = 300$ GeV

General Capabilities of the Datasets

The publicly released datasets can be used for educational purposes with different levels of task difficulty.

At a beginner level, one could visualise the content of the datasets and produce simple distributions. An intermediate-level task would consist of making histograms with collision data after some basic selection. Advanced-level tasks would allow for a deeper look into the ATLAS data, with possibilities of measuring real event properties and physical quantities.

A non-exhaustive list of possible tasks with the proposed datasets includes:

Comparisons of several distributions of event variables for simulated signal and background events.
Finding variables that are able to separate signal from background (jet multiplicity, transverse momenta of jets and leptons, lepton isolation, b-tagging, missing transverse energy, angular distributions).
Development and modification of cuts on these variables in order to improve the signal-over-background separation.
Optimisation of the signal-over-background ratio and estimation of the purity based on simulation only.
Comparisons of the selection efficiency between data and simulation.
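The cut-based tasks listed above can be prototyped with a simple threshold scan. The following is a minimal sketch using toy numbers, not values from the released samples: each event is a (value, weight) pair, and for every candidate threshold the code reports the weighted signal yield S, the background yield B and the purity S / (S + B). The function names and the choice of purity as the figure of merit are assumptions for illustration.

```python
def scan_cut(signal, background, thresholds):
    """For each threshold, keep events with value > threshold and compute S, B and purity.

    signal, background -- lists of (value, weight) pairs from simulation
    thresholds         -- candidate cut values on the discriminating variable
    """
    results = []
    for cut in thresholds:
        s = sum(weight for value, weight in signal if value > cut)
        b = sum(weight for value, weight in background if value > cut)
        purity = s / (s + b) if (s + b) > 0 else 0.0
        results.append({"cut": cut, "S": s, "B": b, "purity": purity})
    return results


def best_cut(results):
    """Return the scan entry with the highest purity."""
    return max(results, key=lambda entry: entry["purity"])


# Toy events: the variable could be, for example, a lepton pT in GeV.
toy_signal = [(60.0, 1.0), (80.0, 1.0), (100.0, 1.0)]
toy_background = [(20.0, 1.0), (40.0, 1.0), (70.0, 1.0)]

scan = scan_cut(toy_signal, toy_background, thresholds=[0.0, 30.0, 50.0, 75.0])
for entry in scan:
    print(entry)
print(best_cut(scan)["cut"])
```

Maximising purity alone tends to favour very tight cuts that discard signal, so in practice the signal yield S should be inspected alongside the purity.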

Advanced-level tasks might include:

Derivation of production cross sections and masses of objects.
Reconstruction of the objects (quarks or bosons) by assigning the detector physics objects (jets, leptons, missing energy) to the hypothetical decay trees.
Estimation of the impact of other sources of systematic uncertainties (luminosity uncertainty, b-tagging efficiency, background modelling) by adding approximate and conservative values.
A test-bed for new data-analysis techniques, e.g. kinematic fitting procedures, multivariate discrimination of signal from background and other machine learning tasks.
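As an illustration of reconstructing the mass of a decaying object from its decay products, the sketch below builds four-vectors from transverse momentum, pseudorapidity, azimuthal angle and energy, and then computes the invariant mass of their sum. The function names and the input values are assumptions for illustration. The code does not read the released files, and the energy and momentum inputs only need to share the same units.

```python
import math


def four_vector(pt, eta, phi, energy):
    """Convert (pT, eta, phi, E) to Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (energy, px, py, pz)


def invariant_mass(vectors):
    """Invariant mass of the summed four-vectors, m^2 = E^2 - |p|^2."""
    e = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    m_squared = e * e - px * px - py * py - pz * pz
    # Rounding can push m^2 slightly below zero for light systems.
    return math.sqrt(max(m_squared, 0.0))


# Two massless, back-to-back leptons with pT = E = 45 (same units) at eta = 0.
lepton_1 = four_vector(pt=45.0, eta=0.0, phi=0.0, energy=45.0)
lepton_2 = four_vector(pt=45.0, eta=0.0, phi=math.pi, energy=45.0)
print(invariant_mass([lepton_1, lepton_2]))
```

For this toy configuration the momenta cancel, so the invariant mass equals the summed energy of 90, close to the Z boson mass if the inputs are read as GeV.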
