Atlas Open Magic 🪄📊
Atlas Open Magic is a Python package made to simplify working with ATLAS Open Data by providing utilities to manage metadata and URLs for streaming the data.
Installation​
You can install this package using pip
.
pip install atlasopenmagic
Alternatively, clone the repository and install locally:
git clone https://github.com/atlas-outreach-data-tools/atlasopenmagic.git
cd atlasopenmagic
pip install .
Quick start​
First, import the package:
import atlasopenmagic as atom
See the available releases and set to one of the options given by available_releases()
atom.available_releases()
set_release('2024r-pp')
Check in the Monte Carlo Metadata which datasets do you want to retrieve and use the 'Dataset ID'. For example, to get the metadata from Pythia8EvtGen_A14MSTW2008LO_Zprime_NoInt_ee_SSM3000:
all_metadata = atom.get_metadata('301204')
If we only want a specific variable:
xsec = atom.get_metadata('301204', 'cross_section')
To get the URLs to stream the files for that MC dataset:
all_mc = atom.get_urls('301204')
To get some data instead, check the available options:
atom.available_data()
And get the URLs for the one that's to be used:
all_mc = atom.get_urls('2016')
Open Data functions description and usage​
available_releases()
​
Shows the available open data releases keys and descriptions.
Usage:
import atlasopenmagic as atom
atom.available_releases()
get_current_release()
​
Retrieves the release that the package is currently set at.
Usage:
release = atom.get_current_release()
print(release)
set_release(release)
​
Set the release (scope) in which to look for information (research open data, education 8 TeV, et). The release
passed to the function has to be one of the keys listed by available_releases()
.
Args:
release
: name of the release to use.
Usage:
atom.set_release('2024r-pp')
get_metadata(key, var)
​
Get metadata information for MC data.
Args:
key
: Dataset ID.var
: Variable to retrieve.
Usage: You can get a dictionary with all the metadata
metadata = atom.get_metadata('301209')
Or a single variable
xsec = atom.get_metadata('301209', 'cross_section')
The available variables are: dataset_id
, short_name
, e-tag
, cross_section
, filter_efficiency
, k_factor
, number_events
, sum_weights
, sum_weights_squared
, process
, generators
, keywords
, description
, job_link
.
The keys to be used for research data are the Dataset IDs found in the Monte Carlo Metadata
get_urls(key, skim, protocol)
​
Retrieves the list of URLs corresponding to a given key. This is used for MC data.
Args:
key
: Dataset ID.skim
: Skim for the dataset. This parameter is only taken into account when using the2025e-13tev-beta
release.protocol
: protocol for the URLs. Options: 'root' and 'https'.
Usage:
urls = atom.get_urls('12345', protocol='root')
available_data()
​
Retrieves the list of keys for the data available for a scope/release.
Usage:
atom.available_data()
get_urls_data(data_key, protocol)
​
Retrieves the list of URLs corresponding to one of the keys listed by available_data()
.
Args:
data_key
: For non-beta releases (e.g. '2015', '2016', etc.), the data key to look up.skim
: Only for the 2025e-13tev-beta release: the skim name to look up.
Usage:
data = get_urls_data(data_key='2016', protocol='https')
Notebooks utilities description and usage​
install_from_environment(*packages, environment_file)
​
Install specific packages listed in an environment.yml
file via pip.
Args:
*packages
: Package names to install (e.g., 'coffea', 'dask').environment_file
: Path to the environment.yml file. If None, defaults to the environment file for the educational resources.
Usage:
import atlasopenmagic as atom
atom.install_from_environment("coffea", "pandas", environment_file="./myfile.yml")
build_mc_dataset(mc_defs, skim='noskim', protocol='https')
​
Build a dict of MC samples URLs.
Args:
mc_defs
: Dictionary with DIDs and optional color:{ sample_name: {'list': [...urls...], 'color': ...}, … }
skim
: The MC skim tag (only meaningful in the 2025e-13tev-beta release)protocol
: Protocol to use for URLs.
Usage:
import atlasopenmagic as atom
mc_defs = {
r'Background $t\bar t$': {'dids': [410470], 'color': 'yellow'},
r'Background $V+$jets': {'dids': [700335,700336,700337], 'color': 'orange'},
r'Background Diboson': {'dids': [700488,700489,700490,700491],'color': 'green'},
r'Background $ZZ^{*}$': {'dids': [700600,700601], 'color': '#ff0000'},
r'Signal ($m_H$=125 GeV)': {'dids': [345060,346228], 'color': '#00cdff'},
}
mc_samples = build_mc_dataset(mc_defs, skim='2bjets', protocol='https')
build_data_dataset(data_keys, name="Data", color=None, protocol="https")
​
Build a dict of Data samples URLS.
Args:
data_keys
: The data_key(s) to fetch (e.g. '2015' or ['2015','2016']).name
: The key under which the sample appears in the returned dict.color
: A color string to attach to the sample.protocol
: Protocol to use for URLs.
Usage:
import atlasopenmagic as atom
data_samples = build_data_samples("2bjets", name="Data", color="red", protocol="root")
Contributing​
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (
git checkout -b feature-name
). - Commit your changes (
git commit -am 'Add some feature'
). - Push to the branch (
git push origin feature-name
). - Create a Pull Request.
Please ensure all tests pass before submitting a pull request.
License​
This project is licensed under the Apache 2.0 License