The CERN Open Data Client
For remote data access, you can use the CERN Open Data Client together with the xrootd protocol. The client retrieves the URLs of the data files, and xrootd lets you read those files directly from CERN's storage systems.
Install the cernopendata-client package together with fsspec-xrootd, which lets fsspec-based Python libraries open root:// URLs, by running:
pip install cernopendata-client fsspec-xrootd
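If you want to confirm the installation from Python before going further, a quick check like the one below can help. It is a minimal sketch that assumes the import names cernopendata_client and fsspec_xrootd for the two pip packages; the function name check_environment is hypothetical.

```python
import importlib.util
import shutil


def check_environment() -> dict[str, bool]:
    """Report whether the tools installed above are available.

    Assumes the import names ``cernopendata_client`` and ``fsspec_xrootd``
    for the two pip packages (an assumption, not taken from their docs).
    """
    return {
        # command-line entry point installed by the cernopendata-client package
        "cernopendata-client CLI": shutil.which("cernopendata-client") is not None,
        # Python modules behind the two pip packages; find_spec returns None if absent
        "cernopendata_client": importlib.util.find_spec("cernopendata_client") is not None,
        "fsspec_xrootd": importlib.util.find_spec("fsspec_xrootd") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```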
Retrieving File URLs
Once the client is installed, you can use it to obtain the URLs of the data files in a record, for example by looking the record up via its DOI (Digital Object Identifier). The following command lists the xrootd URLs of the files in the record with DOI 10.7483/OPENDATA.ATLAS.TC5G.AC24:
cernopendata-client get-file-locations --doi 10.7483/OPENDATA.ATLAS.TC5G.AC24 --protocol xrootd
The command prints the location of each file in the record, one per line, for example:
root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2016-07-29/MC/mc_147770.Zee.root
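To use these URLs in an analysis script, you can call the same command from Python and collect its output. The sketch below simply wraps the command shown above with subprocess; the helper names are hypothetical, and keeping only lines that start with root:// is a defensive assumption in case the client prints anything other than URLs.

```python
import shutil
import subprocess


def parse_locations(output: str) -> list[str]:
    """Keep only the xrootd URLs from the client's output (one URL per line)."""
    return [line.strip() for line in output.splitlines()
            if line.strip().startswith("root://")]


def get_xrootd_locations(doi: str) -> list[str]:
    """Run `cernopendata-client get-file-locations` for a DOI and return its root:// URLs."""
    if shutil.which("cernopendata-client") is None:
        raise RuntimeError("cernopendata-client is not on PATH; install it with pip first")
    result = subprocess.run(
        ["cernopendata-client", "get-file-locations",
         "--doi", doi, "--protocol", "xrootd"],
        capture_output=True, text=True, check=True,  # raise if the command fails
    )
    return parse_locations(result.stdout)


# Example (needs the client on PATH and network access):
# for url in get_xrootd_locations("10.7483/OPENDATA.ATLAS.TC5G.AC24"):
#     print(url)
```

Running the command through the CLI, rather than importing the client's internals, keeps the sketch tied to the interface documented above.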
Accessing Data
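Using the xrootd protocol, you can stream data directly into your code without first downloading the whole file to your local machine. Here's a Python example that opens a file with ROOT (PyROOT); it requires a ROOT installation with XRootD support, which the pip packages above do not provide: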
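import ROOT
file_url = "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2016-07-29/MC/mc_147770.Zee.root"
f = ROOT.TFile.Open(file_url)  # root:// URLs are read remotely through ROOT's XRootD plugin
if not f or f.IsZombie():  # TFile.Open returns a null pointer if the file cannot be opened
    raise OSError(f"Could not open {file_url}")
f.ls()  # list the objects stored in the file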
Because ROOT fetches only the parts of the file that your code actually reads, this approach lets you work with large datasets hosted on CERN's servers without copying them to local disk first.
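Since fsspec-xrootd was installed above, root:// URLs can also be opened from plain Python through fsspec, without ROOT. The sketch below assumes that fsspec-xrootd is registered as fsspec's handler for the root:// protocol; the helper names are hypothetical. It reads the first four bytes of a remote file and checks for the "root" signature that begins every ROOT file. fsspec reads remote files in buffered blocks on demand, so the file is not downloaded up front just to inspect its header.

```python
ROOT_MAGIC = b"root"  # every ROOT file begins with these four bytes


def looks_like_root_file(header: bytes) -> bool:
    """Check for the ROOT file signature at the start of the data."""
    return header[:4] == ROOT_MAGIC


def read_remote_header(url: str, nbytes: int = 4) -> bytes:
    """Read the first bytes of a remote file through fsspec.

    Assumes fsspec-xrootd is installed so that fsspec can resolve root:// URLs.
    """
    import fsspec  # imported lazily so looks_like_root_file works without fsspec

    with fsspec.open(url, "rb") as f:
        return f.read(nbytes)


# Example (needs fsspec-xrootd and network access):
# url = ("root://eospublic.cern.ch//eos/opendata/atlas/"
#        "OutreachDatasets/2016-07-29/MC/mc_147770.Zee.root")
# print(looks_like_root_file(read_remote_header(url)))
```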