The 8 TeV Data
❗For detailed information about this release, you can read "Review of ATLAS Open Data 8 TeV datasets, tools and activities."
As part of its first release of open data for education, the ATLAS Collaboration published two distinct datasets drawn from the 8 TeV proton-proton (pp) collision data recorded in 2012 at the Large Hadron Collider (LHC): 2 fb⁻¹ in XML format and a further 1 fb⁻¹ in ROOT ntuple format.
- Scope: The ROOT ntuple dataset, referred to below as ATLAS Open Data 2016, contains nearly 15 million events from the ATLAS experiment's 2012 data-taking period.
- Format: Provided in a simplified TTree tuple (or ROOT ntuple) format, it contains 45 branches detailed for ease of analysis.
- Versatility: The 2016 dataset's broad scope and extensive event count allow for diverse, in-depth studies.
Event Selection and Data Quality
Event selection for the ATLAS Open Data 2016 dataset was performed to streamline the data, making it more manageable for analysis while preserving its scientific value. This process involved:
- Corrupted Event Protection: Removal of events affected by short-term detector issues.
- Trigger Satisfaction: Inclusion of events that satisfy single-lepton triggers for electrons or muons, with a pT threshold of 5 GeV.
- Veto on Bad Jets: Exclusion of events containing jets not associated with energy deposits in the calorimeters.
- Primary Vertex Requirement: Selection of events with at least one primary vertex associated with four or more tracks.
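As a rough illustration, the four criteria above can be written as a single pass/fail function. This is a minimal sketch, not the ATLAS production code: the `Event` class and all of its field names are hypothetical, chosen for this example, and are not the branch names of the released ntuple.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    # All field names are hypothetical, chosen for illustration only.
    is_corrupted: bool             # affected by a short-term detector issue
    passed_electron_trigger: bool  # a single-electron trigger fired
    passed_muon_trigger: bool      # a single-muon trigger fired
    has_bad_jet: bool              # contains at least one "bad" jet
    # Number of associated tracks for each primary vertex in the event.
    vertex_track_counts: List[int] = field(default_factory=list)


MIN_TRACKS_PER_VERTEX = 4  # "four or more tracks", as stated in the text


def passes_selection(event: Event) -> bool:
    """Return True if the event survives all four selection steps."""
    if event.is_corrupted:
        return False
    if not (event.passed_electron_trigger or event.passed_muon_trigger):
        return False
    if event.has_bad_jet:
        return False
    # At least one primary vertex must have enough associated tracks.
    return any(n >= MIN_TRACKS_PER_VERTEX for n in event.vertex_track_counts)
```

Ordering the checks from cheapest to most involved is only a stylistic choice here; the result does not depend on the order because every criterion must hold.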
The layout is optimised towards simplicity to reduce the complexities encountered in a full-scale analysis, emphasising the educational character of the dataset.
By providing these datasets, the ATLAS Collaboration aims to demystify particle physics, encouraging exploration and discovery among the next generation of scientists. The open data initiative exemplifies the collaboration's commitment to open science, inviting learners and researchers to delve into the intricacies of the universe with real data from the forefront of particle physics.
ATLAS Open Data 2016 contains two data files: one in which the presence of an electron triggered the event recording (the ‘egamma dataset’) and one in which a muon triggered the event recording (the ‘muon dataset’). The data are accompanied by relevant simulated data:
- ATLAS Open Data 2016 egamma dataset containing ∼8 million events.
- ATLAS Open Data 2016 muon dataset containing ∼7 million events.
- ATLAS simulated data consisting of 42 datasets containing ∼45 million events.

Overlap removal was applied to ensure that no event appears in both the egamma and muon datasets. The egamma dataset contains electron-triggered events, which may also include a muon with high transverse momentum, whereas the muon dataset vetoes such events.
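One simple way to picture the overlap removal is as a rule that assigns each event to at most one dataset. The sketch below is an assumption about how such a rule could look, not the procedure ATLAS actually used: it supposes that two boolean trigger flags decide the assignment, and the function name `assign_stream` is hypothetical.

```python
from typing import Optional


def assign_stream(fired_electron_trigger: bool,
                  fired_muon_trigger: bool) -> Optional[str]:
    """Assign an event to exactly one dataset, or to none.

    Illustrative rule only: electron-triggered events take priority,
    so an event never ends up in both datasets.
    """
    if fired_electron_trigger:
        # Kept in egamma even if a muon trigger fired as well.
        return "egamma"
    if fired_muon_trigger:
        # Reached only when no electron trigger fired (the veto).
        return "muon"
    return None
```

Under this rule an event that fired both triggers is stored once, in the egamma dataset, which matches the statement that electron-triggered events may contain a high-momentum muon.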
Dataset Details
An important aspect of the data samples is that they were prepared specifically for educational purposes. To this end, precision has been traded for simplicity of use. The introduced simplifications are:
- No facilities to estimate systematic uncertainties have been included as these quickly introduce large complexities.
- The b-tagging scale factor is computed for a specific working point (MV1@70% efficiency). The user, however, is free to choose the b-tagging weight used for tagging jets, which allows a mismatch between the definition assumed in the scale factor calculation and the one actually applied.
- No QCD simulated samples were prepared, as they would have had insufficient statistics while introducing a large set of additional samples.
- The description of the boson properties in simulated W/Z + jets events is not ideal. Corrections are available only for samples produced with the Alpgen Monte Carlo generator, not for those produced with the Sherpa generator. However, using Alpgen would have introduced a prohibitively large number of samples, so Sherpa was used.
- The missing transverse momentum was calculated using the object preselection. For the sake of simplicity, a recalculation of the missing transverse momentum is not implemented in the tools provided. Changes in the object selection are therefore not reflected in the missing transverse momentum, which can lead to mis-modelling of variables that rely on it.
- The simulated data takes into account the pile-up and vertex position profile of the whole 2012 data-taking period, although the measured data is taken from a small list of runs from period D. This introduces a certain mismatch in the number of vertices and the primary vertex position.
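The b-tagging caveat in the list above can be made concrete with a small consistency check. The following is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and `reference_cut` stands for the discriminant cut at which the scale factor was derived, whose actual value should be taken from the release documentation rather than from this example.

```python
import math
from typing import Tuple


def btag_weight(scale_factor: float,
                user_cut: float,
                reference_cut: float,
                rel_tol: float = 1e-6) -> Tuple[float, bool]:
    """Return the scale factor and whether it is consistent with the user's cut.

    The scale factor is only meaningful when the cut applied by the user
    equals the cut of the working point it was computed for.
    """
    is_consistent = math.isclose(user_cut, reference_cut, rel_tol=rel_tol)
    return scale_factor, is_consistent
```

Returning a flag rather than raising an error mirrors the freedom described above: the user may still pick a different cut, but the analysis can record that the scale factor no longer matches the tagging definition.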