8 TeV Data for Education

The ATLAS Collaboration published two distinct datasets from the 8 TeV proton-proton (pp) collision data recorded in 2012 at the Large Hadron Collider (LHC). These datasets, one released in XML format and one in ROOT ntuple format, represent an effort to make LHC data accessible for educational purposes: 2 fb⁻¹ is provided in XML format and an additional 1 fb⁻¹ in ROOT ntuple format. They serve as a cornerstone of the ATLAS Open Data initiative.

  • Scope: Encompassing nearly 15 million events, the ROOT ntuple dataset (ATLAS Open Data 2016) offers a comprehensive view of the ATLAS experiment's 2012 data-taking period.
  • Format: Provided in a simplified TTree tuple (or ROOT ntuple) format, it contains 45 branches detailed for ease of analysis.
  • Versatility: The 2016 dataset's broad scope and extensive event count allow for diverse, in-depth studies.

Event Selection and Data Quality

Event selection for the ATLAS Open Data 2016 dataset was performed to streamline the data, making it more manageable for analysis while preserving its scientific value. This process involved:

  • Corrupted Event Protection: Removal of events affected by short-term detector issues.
  • Trigger Satisfaction: Inclusion of events that satisfy single-lepton triggers for electrons or muons, with a pT threshold of 5 GeV.
  • Veto on Bad Jets: Exclusion of events containing jets not associated with energy deposits in the calorimeters.
  • Primary Vertex Requirement: Selection of events with at least one primary vertex associated with four or more tracks.
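
The four criteria above can be read as a chain of boolean cuts. The following is a minimal sketch of that logic only, not the code used by ATLAS: the `Event` class and all of its field names are hypothetical, and the trigger decision (including its pT threshold) is assumed to be already stored as a single flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    # All field names are hypothetical, chosen for this illustration only.
    has_detector_problem: bool          # flagged during a short-term detector issue
    passed_single_lepton_trigger: bool  # single-electron or single-muon trigger fired
    has_bad_jet: bool                   # contains a jet not associated with calorimeter deposits
    vertex_track_counts: List[int]      # number of tracks for each primary vertex candidate


# "Four or more tracks", as stated in the list above.
MIN_VERTEX_TRACKS = 4


def passes_selection(event: Event) -> bool:
    """Apply the four selection criteria in the order they are listed."""
    if event.has_detector_problem:
        return False
    if not event.passed_single_lepton_trigger:
        return False
    if event.has_bad_jet:
        return False
    # At least one primary vertex with enough associated tracks.
    return any(n >= MIN_VERTEX_TRACKS for n in event.vertex_track_counts)
```

An event is kept only if every cut passes, so the order of the checks affects speed but not the result.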

The layout is optimised towards simplicity to reduce the complexities encountered in a full-scale analysis, emphasising the educational character of the dataset.

By providing these datasets, the ATLAS Collaboration aims to demystify particle physics, encouraging exploration and discovery among the next generation of scientists. The open data initiative exemplifies the collaboration's commitment to open science, inviting learners and researchers to delve into the intricacies of the universe with real data from the forefront of particle physics.

ATLAS Open Data 2016 contains two data files: one where the presence of an electron triggered the event recording (called the ‘egamma dataset’) and one where a muon triggered the event recording (the ‘muon dataset’). The data are accompanied by relevant simulated data:

  • ATLAS Open Data 2016 egamma dataset containing ∼8 million events.
  • ATLAS Open Data 2016 muon dataset containing ∼7 million events.
  • ATLAS simulated data consisting of 42 datasets containing ∼45 million events.

Overlap removal was applied to ensure that the same event does not appear in both the egamma and muon datasets. Electron-triggered events may also contain a muon with high transverse momentum; for that reason, the muon dataset has a veto on such events.
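
As a generic illustration of what removing overlap between two trigger streams means, the sketch below drops from a muon-stream list every event that also appears in the egamma stream. It is not the procedure ATLAS applied: identifying events by a `(run_number, event_number)` pair and the function name `remove_overlap` are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical event identifier: (run_number, event_number).
EventId = Tuple[int, int]


def remove_overlap(egamma_ids: Iterable[EventId],
                   muon_ids: Iterable[EventId]) -> List[EventId]:
    """Return the muon-stream events that do not also appear in the egamma stream."""
    egamma_set = set(egamma_ids)  # set lookup keeps the veto fast for large streams
    return [event_id for event_id in muon_ids if event_id not in egamma_set]


egamma = [(1, 10), (1, 11), (2, 5)]
muon = [(1, 11), (2, 6), (3, 1)]
unique_muon = remove_overlap(egamma, muon)  # (1, 11) is vetoed from the muon stream
```

After this step each event belongs to exactly one of the two lists, so the two streams can be combined without double counting.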

Dataset Details

An important aspect of the data samples is that they were prepared specifically for educational purposes. To this end, precision has been traded for simplicity of use. The introduced simplifications are:

  • No facilities to estimate systematic uncertainties have been included as these quickly introduce large complexities.

  • The b-tagging scale factor is computed for a specific working point (MV1@70% efficiency). The user, however, is free to specify the b-tagging weight used for tagging jets, which allows for a potential mismatch between the definition assumed in the scale factor calculation and the one actually applied.

  • No QCD simulated samples were prepared, as they would have had insufficient statistics while introducing a large set of additional samples.

  • The description of the W boson properties in simulated W + jets events is not ideal. Corrections are only available for samples produced with the Monte Carlo generator Alpgen but not for those produced with the Sherpa generator. However, using Alpgen would have introduced a prohibitively large number of samples. Sherpa was therefore used.

  • The missing transverse momentum was calculated using the object preselection. For the sake of simplicity, a recalculation of the missing transverse momentum is not implemented in the tools provided. Changes in the object selection are therefore not reflected in the missing transverse momentum, which can lead to mis-modelling of variables that rely on it.

  • The simulated data takes into account the pile-up and vertex position profile of the whole 2012 data taking, although the measured data is taken from a small list of runs from period D. This introduces a certain mismatch regarding the number of vertices and the primary vertex position.

Details of the available simulated Monte Carlo datasets

The datasets have been reduced in size to optimise the storage requirements. The number of events available in each sample after the preselection cuts is given in the column N events.

The factor FE denotes the filter efficiency for a given sample, and f_k is the factor used to rescale the leading-order estimate to next-to-leading order in perturbative QCD.

The following samples represent about 6.5 GB.

| Process | DataSet ID | Generator | σ*FE [pb] | f_k | L [fb⁻¹] | N events | Size [MB] |
|---|---|---|---|---|---|---|---|
| ttbar -> l + X | 117050 | PowHeg+Pythia | 114.51 | 1.2 | 26.236 | 1500000 | 291 |
| ttbar -> Jets | 117049 | PowHeg+Pythia | 96.35 | 1.2 | 85.027 | 25170 | 5.7 |
| single top t-chan top | 110090 | PowHeg+Pythia | 17.52 | 1.05 | 24.21 | 150000 | 21 |
| single top t-chan antitop | 110091 | PowHeg+Pythia | 9.4 | 1.06 | 43.23 | 150000 | 15 |
| single top s-chan | 110119 | PowHeg+Pythia | 1.64 | 1.107 | 167.73 | 100000 | 15 |
| single top Wt-chan | 110140 | PowHeg+Pythia | 20.46 | 1.09 | 28.50 | 150000 | 26 |
| Z+Jets ee | 147770 | Sherpa | 1207.4 | 1.028 | 10.08 | 7500000 | 938 |
| Z+Jets mumu | 147771 | Sherpa | 1207.4 | 1.028 | 9.63 | 7500000 | 918 |
| Z+Jets tautau | 147772 | Sherpa | 1207.1 | 1.028 | 11.08 | 750000 | 93 |
| Drell-Yan ee M08to15 | 173041 | Sherpa | 92.15 | 1.0 | 45.95 | 400000 | 57 |
| Drell-Yan ee M15to40 | 173042 | Sherpa | 279.19 | 1.0 | 47.22 | 750000 | 100 |
| Drell-Yan mumu M08to15 | 173043 | Sherpa | 92.08 | 1.0 | 51.93 | 500000 | 74 |
| Drell-Yan mumu M15to40 | 173044 | Sherpa | 279.2 | 1.0 | 41.01 | 750000 | 103 |
| Drell-Yan tautau M08to15 | 173045 | Sherpa | 92.12 | 1.0 | 27.13 | 9993 | 1.5 |
| Drell-Yan tautau M15to40 | 173046 | Sherpa | 279.11 | 1.0 | 49.54 | 32393 | 4.5 |
| W+Jets enu with b | 167740 | Sherpa | 140.34 | 1.1 | 12.333 | 750000 | 86 |
| W+Jets enu with jets, bveto | 167741 | Sherpa | 537.84 | 1.1 | 9.563 | 2600000 | 296 |
| W+Jets enu no jets, bveto | 167742 | Sherpa | 10295 | 1.1 | 1.971 | 8000000 | 722 |
| W+Jets munu with b | 167743 | Sherpa | 140.39 | 1.1 | 11.935 | 750000 | 84 |
| W+Jets munu with jets, bveto | 167744 | Sherpa | 466.47 | 1.1 | 10.582 | 2500000 | 287 |
| W+Jets munu no jets, bveto | 167745 | Sherpa | 10368 | 1.1 | 1.719 | 7500000 | 666 |
| W+Jets taunu with b | 167746 | Sherpa | 140.34 | 1.1 | 18.245 | 100000 | 13 |
| W+Jets taunu with jets, bveto | 167747 | Sherpa | 506.45 | 1.1 | 9.821 | 250000 | 31 |
| W+Jets taunu no jets, bveto | 167748 | Sherpa | 10327 | 1.1 | 1.945 | 550000 | 55 |
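
To show how the table columns combine, here is a minimal sketch of two common normalisation steps. It assumes that the column L is the equivalent integrated luminosity of each simulated sample, which the text does not state explicitly, and it ignores any per-event weights. The function names are hypothetical, and the numbers are the ttbar -> l + X row read as σ*FE = 114.51 pb, f_k = 1.2 and L = 26.236 fb⁻¹.

```python
PB_TO_FB = 1000.0  # 1 pb = 1000 fb


def produced_events(xsec_times_fe_pb: float, k_factor: float, lumi_fb: float) -> float:
    """Events expected before any selection: sigma*FE * f_k * integrated luminosity."""
    return xsec_times_fe_pb * PB_TO_FB * k_factor * lumi_fb


def lumi_weight(target_lumi_fb: float, sample_lumi_fb: float) -> float:
    """Per-event weight that scales a simulated sample to the target luminosity.

    Assumes sample_lumi_fb is the equivalent luminosity of the simulated sample.
    """
    return target_lumi_fb / sample_lumi_fb


# ttbar -> l + X, scaled to the 1 fb^-1 of data released in ROOT ntuple format.
n_expected = produced_events(114.51, 1.2, 1.0)  # roughly 1.37e5 events before selection
weight = lumi_weight(1.0, 26.236)               # roughly 0.038 per simulated event
```

Because N events is counted after the preselection cuts, it cannot be divided by σ*FE·f_k to recover L; the weight has to be taken from the L column directly.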

The Z' and Higgs samples represent a further 150 MB.

| Process | DataSet ID | Generator | σ*FE [pb] | f_k | L [fb⁻¹] | N events | Size [MB] |
|---|---|---|---|---|---|---|---|
| Z' -> ttbar [ 400] GeV | 110899 | Pythia | 4.259 | 1.0 | 23.48 | 18307 | 4.3 |
| Z' -> ttbar [ 500] GeV | 110901 | Pythia | 3.925 | 1.0 | 25.48 | 19737 | 4.7 |
| Z' -> ttbar [ 750] GeV | 110902 | Pythia | 1.243 | 1.0 | 80.45 | 21051 | 5.3 |
| Z' -> ttbar [1000] GeV | 110903 | Pythia | 0.394 | 1.0 | 253.81 | 20649 | 5.5 |
| Z' -> ttbar [1250] GeV | 110904 | Pythia | 0.139 | 1.0 | 719.43 | 19274 | 5.5 |
| Z' -> ttbar [1500] GeV | 110905 | Pythia | 0.0524 | 1.0 | 1908 | 17695 | 5.4 |
| Z' -> ttbar [1750] GeV | 110906 | Pythia | 0.0211 | 1.0 | 4739 | 15949 | 5.1 |
| Z' -> ttbar [2000] GeV | 110907 | Pythia | 0.00894 | 1.0 | 11186 | 14455 | 4.9 |
| Z' -> ttbar [2250] GeV | 110908 | Pythia | 0.00394 | 1.0 | 25381 | 13389 | 4.7 |
| Z' -> ttbar [2500] GeV | 110909 | Pythia | 0.00180 | 1.0 | 55556 | 12723 | 4.5 |
| Z' -> ttbar [3000] GeV | 110910 | Pythia | 0.000434 | 1.0 | 230415 | 12387 | 4.3 |
| gg-> H-> WW-> llnunu ; M(H) = 125 GeV | 161005 | PowHeg+Pythia | 6.463 | 1.0 | 32.13 | 100000 | 14 |
| VBF H-> WW-> llnunu ; M(H) = 125 GeV | 161055 | PowHeg+Pythia | 0.819 | 1.0 | 229.93 | 100000 | 18 |
| gg-> H-> ZZ -> 4l ; M(H) = 125 GeV | 160155 | PowHeg+Pythia | 13.17 | 1.0 | 14.31 | 100000 | 15 |
| VBF H-> ZZ -> 4l ; M(H) = 125 GeV | 160205 | PowHeg+Pythia | 1.617 | 1.0 | 104.96 | 100000 | 19 |

More information

For detailed information about this release, you can read "Review of ATLAS Open Data 8 TeV datasets, tools and activities"