
Using the PHYSLITE Format

The research data is available in the PHYSLITE format, which is user-friendly and ready for analysis. This notebook demonstrates how to use ATLAS Open Data in PHYSLITE format with uproot and awkward arrays for a basic physics analysis: reconstructing the hadronically decaying top quark in semi-leptonic $t\bar{t}$ events.
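At its core, reconstructing the hadronic top quark means combining three jets (the b-jet and the two light-quark jets from the W decay) and computing their invariant mass. The notebook does this with awkward arrays and the vector library. The plain-Python sketch below shows only the underlying arithmetic, under stated assumptions: jets are (pT, η, φ, m) tuples in GeV, and the candidate is taken as the three-jet combination whose mass is closest to an assumed reference top mass of 172.5 GeV. That selection criterion and the function names (to_four_vector, invariant_mass, best_trijet) are illustrative choices, not necessarily what the notebook uses.

```python
import itertools
import math

TOP_MASS_GEV = 172.5  # assumed reference value for the top-quark mass

def to_four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, m) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz

def invariant_mass(objects):
    """Invariant mass of the summed four-momenta of (pT, eta, phi, m) tuples."""
    e = px = py = pz = 0.0
    for obj in objects:
        de, dpx, dpy, dpz = to_four_vector(*obj)
        e, px, py, pz = e + de, px + dpx, py + dpy, pz + dpz
    m2 = e**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors

def best_trijet(jets, target=TOP_MASS_GEV):
    """Return (indices, mass) of the 3-jet combination closest to `target`."""
    if len(jets) < 3:
        raise ValueError("need at least three jets")
    best = min(
        itertools.combinations(range(len(jets)), 3),
        key=lambda idx: abs(invariant_mass([jets[i] for i in idx]) - target),
    )
    return best, invariant_mass([jets[i] for i in best])

if __name__ == "__main__":
    # Made-up jets for illustration: (pT [GeV], eta, phi, m [GeV])
    jets = [
        (120.0, 0.3, 0.1, 12.0),
        (80.0, -0.5, 2.0, 9.0),
        (60.0, 1.1, -2.2, 7.0),
        (40.0, 0.0, 1.0, 5.0),
    ]
    indices, mass = best_trijet(jets)
    print(f"best trijet {indices} with mass {mass:.1f} GeV")
```

Picking the combination closest to a nominal mass is the simplest possible choice; it biases the reconstructed mass distribution toward the target, so the notebook or a real analysis may prefer a different pairing strategy.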

What's Inside the Notebook

In this notebook, you will learn:

  • How to read PHYSLITE data with uproot and inspect its branches.
  • How to compile branches into records.
  • How to perform basic event and object selection.
  • How to conduct basic overlap removal.

Together, these steps lead up to the reconstruction of the hadronically decaying top quark.
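The object selection, overlap removal, and event selection steps can be sketched in plain Python, with each object as a dict of pt (GeV), eta, and phi. In the notebook the same logic is expressed as awkward-array masks. The thresholds here (pT > 25 GeV, |η| < 2.5 for jets and |η| < 2.47 for electrons, a ΔR cone of 0.4) and the event requirement of exactly one lepton plus at least four jets are assumptions for illustration. The overlap rule (drop any jet within ΔR < 0.4 of a selected lepton) is a simplified stand-in, not the full ATLAS overlap-removal recipe.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal-angle difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi

def delta_r(a, b):
    """Angular distance sqrt(deta^2 + dphi^2) between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))

def select_objects(objects, pt_min, abs_eta_max):
    """Keep objects above a pT threshold (GeV) and inside an |eta| window."""
    return [o for o in objects if o["pt"] > pt_min and abs(o["eta"]) < abs_eta_max]

def remove_jets_near_leptons(jets, leptons, dr_min=0.4):
    """Simplified overlap removal: drop jets within dr_min of any lepton."""
    return [j for j in jets if all(delta_r(j, lep) >= dr_min for lep in leptons)]

def passes_event_selection(leptons, jets):
    """Assumed semi-leptonic ttbar topology: exactly one lepton, >= 4 jets."""
    return len(leptons) == 1 and len(jets) >= 4

if __name__ == "__main__":
    # One made-up event; the first jet overlaps with the electron
    electrons = [{"pt": 45.0, "eta": 0.5, "phi": 1.0}]
    jets = [
        {"pt": 50.0, "eta": 0.52, "phi": 1.05},
        {"pt": 90.0, "eta": -1.0, "phi": -2.0},
        {"pt": 70.0, "eta": 1.5, "phi": 2.8},
        {"pt": 45.0, "eta": 0.1, "phi": -0.5},
        {"pt": 30.0, "eta": -2.0, "phi": 0.3},
        {"pt": 20.0, "eta": 0.0, "phi": 3.0},  # fails the pT cut
    ]
    good_electrons = select_objects(electrons, pt_min=25.0, abs_eta_max=2.47)
    good_jets = select_objects(jets, pt_min=25.0, abs_eta_max=2.5)
    clean_jets = remove_jets_near_leptons(good_jets, good_electrons)
    print(len(clean_jets), passes_event_selection(good_electrons, clean_jets))  # prints: 4 True
```

Wrapping Δφ into [-π, π) matters because φ is periodic: two objects at φ = 3.1 and φ = -3.1 are close in angle, even though the raw difference is 6.2.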

Transforming PHYSLITE to NTuple

To convert PHYSLITE to an NTuple, follow these sections from the Analysis Software Tutorial:

  1. Basic Analysis Algorithm: From Introduction to Algorithms to Run Algorithm.
  2. CP Algorithms: All subsections starting with Common CP Algorithms Introduction.
  3. Physics Objects: Electrons in Analysis, Muons in Analysis, Jets in Analysis.

These steps require the AnalysisBase release. See the next section for how to set up a container.

Resources

Downloading all the available Open Data requires significant resources. For those who wish to tinker but may not have the computing resources at hand, there are a few options.

On these resources, we recommend installing dependencies via a terminal:

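pip3 install --user jupyterlab matplotlib tqdm xrootd zstandard uproot==5.1.2 awkward==2.5.0 vector==1.1.1 "cernopendata-client[xrootd]"

The quotes around the last package stop shells such as zsh from treating the square brackets as a filename pattern.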

If you wish to run without xrootd, this is sufficient:

pip3 install --user jupyterlab matplotlib tqdm zstandard uproot==5.1.2 awkward==2.5.0 vector==1.1.1 cernopendata-client
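After installing, one quick way to confirm that the pinned versions were actually picked up is a small check using the standard library's importlib.metadata (Python 3.8 or newer). This is a minimal sketch: the PINNED dictionary simply mirrors the three pinned versions in the pip commands above, and check_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the pip commands above
PINNED = {"uproot": "5.1.2", "awkward": "2.5.0", "vector": "1.1.1"}

def check_versions(pinned=PINNED):
    """Return {package: problem} for packages missing or not at the pinned version."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems

if __name__ == "__main__":
    issues = check_versions()
    if issues:
        for name, problem in issues.items():
            print(f"{name}: {problem}")
    else:
        print("all pinned packages found at the expected versions")
```

An empty result means the environment matches the pins; anything else is worth fixing before running the notebook, since the examples were written against those specific versions.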

With the CERN Open Data client, you can identify the files you wish to access and either download a few of them locally or run on them remotely. Files from the ATLAS Open Data for research have file names like root://eospublic.cern.ch//eos/opendata/atlas/rucio/; in such a file name, root://eospublic.cern.ch/ can be replaced with https://opendata.cern.ch. For downloading, you can try both xrdcp (with a file name beginning with root://) and curl -O (with a file name beginning with https://). Which one is faster depends on your system.
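The prefix swap between the two access methods is a plain string operation. This minimal sketch (xrootd_to_https is a hypothetical helper, and the file name in the usage example is made up) builds both download commands from a root:// file name, using the https:// form that curl expects.

```python
EOS_XROOTD_PREFIX = "root://eospublic.cern.ch/"
HTTPS_PREFIX = "https://opendata.cern.ch"

def xrootd_to_https(path):
    """Rewrite an eospublic root:// file name into its https:// equivalent."""
    if not path.startswith(EOS_XROOTD_PREFIX):
        raise ValueError(f"not an eospublic xrootd path: {path}")
    return HTTPS_PREFIX + path[len(EOS_XROOTD_PREFIX):]

if __name__ == "__main__":
    # Hypothetical file name, for illustration only
    path = "root://eospublic.cern.ch//eos/opendata/atlas/rucio/example_scope/example.root"
    print("xrdcp", path, ".")
    print("curl -O", xrootd_to_https(path))
```

Raising an error on unexpected prefixes, instead of returning the input unchanged, avoids silently handing curl a root:// URL it cannot fetch.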

Once you have a file to run on, feel free to try the notebooks described above. With xrootd installed, you can also read a file remotely (without downloading it in advance) by giving the full file name starting with root://. Remote-reading performance depends on the network connection between CERN and the computing resources you are using. For slower connections, downloading a file once and then working on it locally will generally give a better experience.

At the University of Nebraska-Lincoln site, there is a special xcache instance that allows faster access to the open data on the CERN Open Data portal. In the file names you get from the Open Data client, replace root://eospublic.cern.ch/ with root://red-xcache1.unl.edu:1096/ for a more responsive experience, especially when the file you access is already in the local cache.
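The xcache redirection described above can be wrapped in a small helper so that a list of file names from the Open Data client can be rewritten in one pass. This is a sketch under the assumption that only names starting with root://eospublic.cern.ch/ should be redirected; via_unl_xcache is a hypothetical name, and the file name in the test is made up.

```python
EOS_XROOTD_PREFIX = "root://eospublic.cern.ch/"
UNL_XCACHE_PREFIX = "root://red-xcache1.unl.edu:1096/"

def via_unl_xcache(path):
    """Route an eospublic root:// file name through the UNL xcache instance."""
    if not path.startswith(EOS_XROOTD_PREFIX):
        return path  # leave local paths and other servers untouched
    return UNL_XCACHE_PREFIX + path[len(EOS_XROOTD_PREFIX):]

def redirect_all(paths):
    """Apply the xcache redirection to every file name in a list."""
    return [via_unl_xcache(p) for p in paths]
```

Unlike a strict converter, this helper passes unrelated paths through unchanged, so it is safe to apply to a mixed list of local and remote files. The rewritten root:// names can then be passed to `uproot.open`, which needs the xrootd Python bindings installed to read them remotely.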

On Google resources, we have found that xrootd is slow to build. Skipping its installation and using curl to download files seems to give a reasonable user experience.