API Reference
This page lists the classes and methods needed to build an analysis that are not related to expressions (see Building expressions for a description of those). The aim is to provide a set of easy-to-use classes and methods.
Plots and selections
The bamboo.plots module provides high-level classes to represent and manipulate selections and plots.
- class bamboo.plots.CategorizedSelection(parent=None, categories=None, name=None)[source]
Helper class to represent a group of similar selections on different categories
The interface is similar, but not identical, to that of Selection (constructing Plot objects is done through the makePlots() method, which takes additional arguments). Each category selection can have a candidate, typically the object or group of objects that differs between the categories. The axis variables can then either be expressions, or callables that will be passed this per-category object.
- Example:
>>> muonSel = noSel.refine("hasMuon", cut=(
>>>     op.rng_len(muons) > 0, op.OR(op.rng_len(electrons) == 0,
>>>     muons[0].pt > electrons[0].pt)))
>>> electronSel = noSel.refine("hasElectron", cut=(
>>>     op.rng_len(electrons) > 0, op.OR(op.rng_len(muons) == 0,
>>>     electrons[0].pt > muons[0].pt)))
>>> oneLeptonSel = CategorizedSelection(categories={
...     "Mu": (muonSel, muons[0]),
...     "El": (electronSel, electrons[0])
...     })
>>> oneLep2JSel = oneLeptonSel.refine("hasLep2J", cut=(op.rng_len(jets) >= 2))
>>> plots += oneLep2JSel.makePlots("J1PT", jets[0].pt, EqB(50, 0., 150.))
>>> plots += oneLep2JSel.makePlots("LJ1Mass",
...     (lambda l: op.invariant_mass(jets[0].p4, l.p4)), EqB(50, 0., 200.))
- __init__(parent=None, categories=None, name=None)[source]
Construct a group of related selections
- Parameters:
name – name (optional)
parent – parent CategorizedSelection (optional)
categories – dictionary with a Selection and a candidate (any python object) per category (the key is the category name); see the addCategory() method below
- addCategory(catName, selection, candidate=None)[source]
Add a category
- Parameters:
catName – category name
selection – Selection for this category
candidate – any python object with event-level quantities specific to this category
- makePlots(name, axisVariables, binnings, construct=None, savePerCategory=True, saveCombined=True, combinedPlotType=<class 'bamboo.plots.SummedPlot'>, **kwargs)[source]
Make a plot for all categories, and/or a combined one
- Parameters:
name – plot name (per-category plot names will be "{name}_{category}")
axisVariables – one or more axis variables
binnings – as many binnings as variables
construct – plot factory method; by default the make{N}D method of Plot (with N the number of axis variables)
savePerCategory – save the individual plots (enabled by default)
saveCombined – save the combined plot (enabled by default)
combinedPlotType – combined plot type, SummedPlot by default
- Returns:
a list of plots
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Equivalent of refine(), but for all categories at a time
- Parameters:
name – common part of the name for the new category selections (individual names will be "{name}_{category}")
)cut – cut(s) to add. If callable, the category’s candidate will be passed
weight – weight(s) to add. If callable, the category’s candidate will be passed
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
CategorizedSelection
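As with makePlots(), callable cuts and weights receive the per-category candidate. A minimal sketch, reusing the (assumed) names from the example above:

```python
>>> # hedged sketch: tighten each category with its own candidate lepton;
>>> # oneLeptonSel and the candidate objects come from the example above
>>> tightLepSel = oneLeptonSel.refine("tightLep", cut=(lambda l: l.pt > 25.))
```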
- class bamboo.plots.CutFlowReport(name, selections=None, recursive=False, titles=None, autoSyst=False, cfres=None, printInLog=False)[source]
Collect and print yields at different selection stages, and cut efficiencies
The simplest way to use this, just to get an overview of the number of events passing each selection stage in the log file, is by adding a CutFlowReport("yields", selections=<list of selections>, recursive=True, printInLog=True) to the list of plots. recursive=True will add all parent selections recursively, so only the final selection categories need to be passed to the selections keyword argument.
It is also possible to output a LaTeX yields table, and to specify exactly which selections and row or column headers are used. The CutFlowReport should then be constructed like this:
yields = CutFlowReport("yields")
plots.append(yields)
yields.add(<selection1-or-list-of-selections1>, title=title1)
yields.add(<selection2-or-list-of-selections2>, title=title2)
...
Each yields.add call will then add one entry in the yields table, with the yield being that of the corresponding selection, or the sum over the list (e.g. different categories that should be taken together); the other dimension is the samples (or sample groups). The sample (group) titles and the formatting of the table can be customised in the same way as in plotIt; see printCutFlowReports() for a detailed description of the different options.
- __init__(name, selections=None, recursive=False, titles=None, autoSyst=False, cfres=None, printInLog=False)[source]
Constructor. name is mandatory; all others are optional. For full control, the add() method should be used to add entries.
Using the constructor with a list of Selection instances passed to the selections keyword argument, together with recursive=True, printInLog=True, is the easiest way to get a debugging printout of the numbers of passing events.
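For instance, the debugging printout described above can be requested with a single line (the selection names here are placeholders):

```python
>>> plots.append(CutFlowReport("yields", selections=[muMuSel, elElSel],
...     recursive=True, printInLog=True))
```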
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- readFromResults(resultsFile)[source]
Reconstruct the
CutFlowReport
, reading counters from a results file
- class bamboo.plots.DerivedPlot(name, dependencies, **kwargs)[source]
Base class for a plot with results based on other plots’ results
The dependencies attribute lists the Plot-like objects this one depends on (which may be used e.g. to order operations). The other necessary properties (binnings, titles, labels, etc.) are taken from the keyword arguments to the constructor, or from the first dependency. The produceResults() method, which is called by the backend to retrieve the derived results, should be overridden with the desired calculation.
Typical use cases are summed histograms, background subtraction, etc. (the results for different subjobs are combined with hadd, so derived quantities that require the full statistics should be calculated in the postprocessing step; alternative or additional systematic variations calculated from the existing ones can be added by subclassing Plot).
- collectDependencyResults(fbe, key=None)[source]
helper method: collect all results of the dependencies
- Returns:
[ (nominalResult, {"variation" : variationResult}) ]
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend (none)
fbe – reference to the backend, can be used to retrieve the histograms for the dependencies, e.g. with
collectDependencyResults()
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
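As an illustration of the pattern described above (this subclass is a hypothetical sketch, not part of bamboo; it assumes the dependency results are ROOT histograms with the usual Clone/Add interface), a background-subtracting derived plot could look like:

```python
from bamboo.plots import DerivedPlot

class SubtractedPlot(DerivedPlot):
    """Hypothetical sketch: histogram of the first dependency
    minus that of the second (nominal results only)."""
    def produceResults(self, bareResults, fbe, key=None):
        (hNomA, _), (hNomB, _) = self.collectDependencyResults(fbe, key=key)
        hSub = hNomA.Clone(self.name)  # copy the first nominal histogram
        hSub.Add(hNomB, -1.)           # subtract the second dependency
        return [hSub]
```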
- class bamboo.plots.FactoryBackend[source]
Interface for factory backend (to separate Plots classes from ROOT::RDataFrame part)
- class bamboo.plots.LateSplittingSelection(parent, name, cuts=None, weights=None, autoSyst=True, keepInclusive=None)[source]
A drop-in replacement for Selection to efficiently split a sample
The concept is quite similar to SelectionWithDataDriven, but with very different performance trade-offs: the former creates two parallel branches of the RDF graph, each for its own set of events (with a typically small performance overhead due to duplication), whereas this is for cases where all events should be processed identically until they are filled into histograms (e.g. separating subprocesses based on MC truth). It is worth defining columns with these categories early on, such that the splitting does not need to do it many times for different selections and categories.
- __init__(parent, name, cuts=None, weights=None, autoSyst=True, keepInclusive=None)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- static create(parent, name, splitCuts=None, keepInclusive=True, cut=None, weight=None, autoSyst=True)[source]
Create a selection that will lazily split into categories
- Parameters:
name – name of the new selection (after applying the cut and weight, as in bamboo.plots.Selection.refine())
splitCuts – dictionary of regions; the values should be the cuts that define the region
keepInclusive – also produce the plots without splitting
cut – common selection
weight – common weight
autoSyst – automatically propagate systematic uncertainties
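A usage sketch (the genBJets definition and the region names are assumptions):

```python
>>> # split a ttbar sample by MC truth, keeping the inclusive plots as well
>>> ttSel = LateSplittingSelection.create(noSel, "ttbarSplit",
...     splitCuts={"ttbb": op.rng_len(genBJets) >= 2,
...                "ttlf": op.rng_len(genBJets) < 2},
...     keepInclusive=True)
```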
- class bamboo.plots.Plot(name, variables, selection, binnings, weight=None, title='', axisTitles=(), axisBinLabels=(), plotopts=None, autoSyst=True, key=None)[source]
A Plot object contains all information needed to produce a histogram: the variable(s) to plot, binnings and options (axis titles, optionally some style information), and a reference to a Selection (which holds all cuts and weights to apply for the plot).
Note
All Plot (and Selection) instances need to have a unique name. This name is used to construct output filenames, and internally to define DataFrame columns with readable names. The constructor will raise an exception if an existing name is used.
- __init__(name, variables, selection, binnings, weight=None, title='', axisTitles=(), axisBinLabels=(), plotopts=None, autoSyst=True, key=None)[source]
Generic constructor. Please use the static make1D(), make2D() and make3D() methods, which provide a more convenient interface to construct histograms (filling in some defaults requires knowing the dimensionality).
- clone(name=None, variables=None, selection=None, binnings=None, weight=None, title=None, axisTitles=None, axisBinLabels=None, plotopts=None, autoSyst=True, key=None)[source]
Helper method: create a copy with optional re-setting of attributes
- classmethod make1D(name, variable, selection, binning, **kwargs)[source]
Construct a 1-dimensional histogram plot
- Parameters:
name – unique plot name
variable – x-axis variable expression
selection – the Selection with cuts and weights to apply
binning – x-axis binning
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, taken from plot title by default)
xBinLabels – x-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new Plot instance with a 1-dimensional histogram
- Example:
>>> hasTwoEl = noSel.refine("hasTwoEl", cut=(op.rng_len(t.Electron) >= 2))
>>> mElElPlot = Plot.make1D(
>>>     "mElEl", op.invariant_mass(t.Electron[0].p4, t.Electron[1].p4), hasTwoEl,
>>>     EquidistantBinning(80, 50., 130.), title="Invariant mass of the leading-PT electrons")
- classmethod make2D(name, variables, selection, binnings, **kwargs)[source]
Construct a 2-dimensional histogram plot
- Parameters:
name – unique plot name
variables – x- and y-axis variable expression (iterable, e.g. tuple or list)
selection – the Selection with cuts and weights to apply
binnings – x- and y-axis binnings (iterable, e.g. tuple or list)
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, empty by default)
yTitle – y-axis title (optional, empty by default)
xBinLabels – x-axis bin labels (optional)
yBinLabels – y-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new
Plot
instance with a 2-dimensional histogram
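A usage sketch in the style of the make1D() example above (hasTwoJets, jets and the EqB binning alias are assumptions):

```python
>>> j1PtVsEta = Plot.make2D("j1_pt_vs_eta",
...     (jets[0].eta, jets[0].pt), hasTwoJets,
...     (EqB(25, -2.5, 2.5), EqB(50, 0., 250.)),
...     xTitle="Leading jet #eta", yTitle="Leading jet p_{T} [GeV]")
```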
- classmethod make3D(name, variables, selection, binnings, **kwargs)[source]
Construct a 3-dimensional histogram
- Parameters:
name – unique plot name
variables – x-, y- and z-axis variable expression (iterable, e.g. tuple or list)
selection – the Selection with cuts and weights to apply
binnings – x-, y- and z-axis binnings (iterable, e.g. tuple or list)
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, empty by default)
yTitle – y-axis title (optional, empty by default)
zTitle – z-axis title (optional, empty by default)
xBinLabels – x-axis bin labels (optional)
yBinLabels – y-axis bin labels (optional)
zBinLabels – z-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new
Plot
instance with a 3-dimensional histogram
- produceResults(bareResults, fbe, key=None)[source]
Trivial implementation of
produceResults()
Subclasses can e.g. calculate additional systematic variation histograms from the existing ones
- Parameters:
bareResults – list of nominal and systematic variation histograms for this
Plot
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
bareResults
- class bamboo.plots.Product(name, key=None)[source]
Interface for output products (plots, counters etc.)
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- class bamboo.plots.Selection(parent, name, cuts=None, weights=None, autoSyst=True)[source]
A Selection object groups a set of selection criteria (cuts) and weight factors that belong to a specific stage of the selection and analysis. Selections should be constructed by calling the refine() method on a ‘root’ selection (which may include overall selections and weights, e.g. a lumi mask for data and pileup reweighting for MC).
Note
All Selection (and Plot) instances need to have a unique name. This name is used internally to define DataFrame columns with readable names. The constructor will raise an exception if an existing name is used.
- __init__(parent, name, cuts=None, weights=None, autoSyst=True)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Create a new selection by adding cuts and/or weight factors
- Parameters:
name – unique name of the new selection
cut – expression (or list of expressions) with additional selection criteria (combined using logical AND)
weight – expression (or list of expressions) with additional weight factors
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
Selection
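Successive refine() calls build up the selection chain; a minimal sketch (jets and btagSF are assumed to be defined earlier in the module):

```python
>>> twoJetSel = noSel.refine("twoJets", cut=(op.rng_len(jets) >= 2))
>>> twoJetSelB = twoJetSel.refine("twoJetsBtagW", weight=btagSF)
```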
- class bamboo.plots.SelectionWithDataDriven(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- A main Selection with the corresponding “shadow” Selection instances for evaluating data-driven backgrounds (alternative cuts and/or weights)
- static create(parent, name, ddSuffix, cut=None, weight=None, autoSyst=True, ddCut=None, ddWeight=None, ddAutoSyst=True, enable=True)[source]
Create a selection with a data-driven shadow selection
Drop-in replacement for a bamboo.plots.Selection.refine() call: the main selection is made from the parent with cut and weight, the shadow selection is made from the parent with ddCut and ddWeight. With enable=False no shadow selection is made (this may help to avoid duplication in the calling code).
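A usage sketch (the cut and weight expressions here are placeholders):

```python
>>> withNP = SelectionWithDataDriven.create(prevSel, "sel2L", "NP",
...     cut=tightLeptonCut, ddCut=looseNotTightCut, ddWeight=fakeRateWeight,
...     enable=enableNonprompt)
```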
- class bamboo.plots.SelectionWithSub(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- A common base class for Selection subclasses with related/alternative/sub-Selection instances attached
A dictionary of additional selections is kept in the sub attribute (it can be None to disable them).
- __init__(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- static getSubsForPlot(p, requireActive=False, silent=False)[source]
Helper method: gather the sub-selections for which a plot is produced
- initSub()[source]
- Initialize related selections
(no-op by default, subclasses can request to call this to enable some functionality)
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Create a new selection by adding cuts and/or weight factors
- Parameters:
name – unique name of the new selection
cut – expression (or list of expressions) with additional selection criteria (combined using logical AND)
weight – expression (or list of expressions) with additional weight factors
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
Selection
- class bamboo.plots.Skim(name, branches, selection, keepOriginal=None, maxSelected=-1, treeName=None, key=None)[source]
Save selected branches for events that pass the selection to a skimmed tree
- __init__(name, branches, selection, keepOriginal=None, maxSelected=-1, treeName=None, key=None)[source]
Skim constructor
- Parameters:
name – name of the skim (also default name of the TTree)
branches – dictionary of branches to keep (name and definition for new branches, or name and None for specific branches from the input tree)
selection – Selection of events to save
keepOriginal – list of branch names to keep, bamboo.plots.Skim.KeepRegex instances with patterns of branch names to keep, or bamboo.plots.Skim.KeepAll to keep all branches from the input tree
maxSelected – maximal number of events to keep (default: no limit)
- Example:
>>> plots.append(Skim("dimuSkim", {
>>>     "run": None,  # copy from input
>>>     "luminosityBlock": None,
>>>     "event": None,
>>>     "dimu_m": op.invariant_mass(muons[0].p4, muons[1].p4),
>>>     "mu1_pt": muons[0].pt,
>>>     "mu2_pt": muons[1].pt,
>>>     }, twoMuSel,
>>>     keepOriginal=[
>>>         Skim.KeepRegex("PV_.*"),
>>>         "nOtherPV",
>>>         Skim.KeepRegex("OtherPV_.*")
>>>         ]))
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- class bamboo.plots.SummedPlot(name, termPlots, **kwargs)[source]
A DerivedPlot implementation that sums histograms
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend (none)
fbe – reference to the backend, can be used to retrieve the histograms for the dependencies, e.g. with
collectDependencyResults()
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
Analysis modules
Minimally, bambooRun needs a class with a constructor that takes a single argument (the list of command-line arguments that it does not recognize as its own), and a run method that takes no arguments.
bamboo.analysismodules provides more interesting base classes, starting from AnalysisModule, which implements a large part of the common functionality for loading samples and distributing worker tasks. HistogramsModule specializes this further for modules that output stack histograms, and NanoAODHistoModule supplements this with loading the decorations for NanoAOD, and merging of the counters for generator weights etc.
Note
When defining a base class that should also be usable for other things than only making plots or only making skims (e.g. both of these), it should not inherit from HistogramsModule or SkimmerModule (but the concrete classes should); otherwise a concrete class may end up inheriting from both (at which point the method resolution order will decide whether it behaves as a skimmer or a plotter, and the result may not be obvious).
A typical case should look like this:
class MyBaseClass(NanoAODModule):
... # define addArgs, prepareTree etc.
class MyPlotter(MyBaseClass, HistogramsModule):
...
class MySkimmer(MyBaseClass, SkimmerModule):
...
- class bamboo.analysismodules.AnalysisModule(args)[source]
Base analysis module
Adds common infrastructure for parsing analysis config files and running on a batch system, with customization points for concrete classes to implement
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- customizeAnalysisCfg(analysisCfg)[source]
- Hook to modify the analysis configuration before jobs are created
(only called in driver or non-distributed mode)
- getATree(fileName=None, sampleName=None, config=None)[source]
- Retrieve a representative TTree, e.g. for defining the plots or for interactive inspection, and a dictionary with metadata
- getTasks(analysisCfg, resolveFiles=None, **extraOpts)[source]
Get tasks from the analysis config (and args), called in driver or sequential mode
- Returns:
a list of
SampleTask
instances
- initialize()[source]
Hook for module-specific initialization (called from the constructor after parsing arguments)
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Do postprocessing on the results of the tasks, if needed
should be implemented by concrete modules
- Parameters:
taskList – (inputs, output), kwargs for the tasks (list, string, and dictionary)
config – parsed analysis configuration file
workdir – working directory for the current run
resultsdir – path with the results files
- run()[source]
Main method
Depending on the arguments passed, this will:
- if -i or --interactive is given, call interact() (which could do some initialization and start an IPython shell)
- if --distributed=worker is given, call processTrees() with the appropriate input, output, tree name, lumi mask and run range
- if --distributed=driver is given, or neither (sequential mode): parse the analysis configuration file, construct the tasks with getTasks(), run them (on a batch cluster or in the same process with processTrees()), and finally call postProcess() with the results.
- class bamboo.analysismodules.DataDrivenBackgroundAnalysisModule(args)[source]
AnalysisModule with support for data-driven backgrounds
A number of contributions can be defined, each based on a list of samples or groups needed to evaluate the contribution (typically just data) and a list of samples or groups that should be left out when making the plot with data-driven contributions. The contributions should be defined in the analysis YAML file, with a datadriven block (at the top level) that could look as follows:
datadriven:
  chargeMisID:
    uses: [ data ]
    replaces: [ DY ]
  nonprompt:
    uses: [ data ]
    replaces: [ TTbar ]
The --datadriven command-line switch then allows a scenario for data-driven backgrounds to be specified, i.e. a list of data-driven contributions to include (all and none are also possible; the latter is the default setting). The parsed contributions are available as self.datadrivenContributions, and the scenarios (each a list of contributions) as self.datadrivenScenarios.
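Assuming a datadriven block like the one above, a scenario can then be selected on the command line as follows (the module and config file names are placeholders):

```shell
bambooRun -m myAnalysis.py:MyModule analysis.yml --datadriven=nonprompt
bambooRun -m myAnalysis.py:MyModule analysis.yml --datadriven=all
```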
- class bamboo.analysismodules.DataDrivenBackgroundHistogramsModule(args)[source]
HistogramsModule with support for data-driven backgrounds
See the DataDrivenBackgroundAnalysisModule class for more details about configuring data-driven backgrounds, and the SelectionWithDataDriven class for ensuring that the necessary histograms are filled correctly. HistogramsModule writes the histograms for the data-driven contributions to different files; this class additionally runs plotIt for the different scenarios.
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Do postprocessing on the results of the tasks, if needed
should be implemented by concrete modules
- Parameters:
taskList – (inputs, output), kwargs for the tasks (list, string, and dictionary)
config – parsed analysis configuration file
workdir – working directory for the current run
resultsdir – path with the results files
- class bamboo.analysismodules.DataDrivenContribution(name, config)[source]
Configuration helper class for data-driven contributions
An instance is constructed for each contribution in any of the scenarios by the bamboo.analysismodules.DataDrivenBackgroundAnalysisModule.initialize() method, with the name and configuration dictionary found in the YAML file. The usesSample(), replacesSample() and modifiedSampleConfig() methods can be customised for other things than using the data samples to estimate a background contribution.
- modifiedSampleConfig(sampleName, sampleConfig, lumi=None)[source]
Construct the sample configuration for the reweighted counterpart of a sample
The default implementation assumes a data sample and turns it into a MC sample (the luminosity is set as
generated-events
to avoid changing the normalisation).
- class bamboo.analysismodules.HistogramsModule(args)[source]
Base histogram analysis module
- __init__(args)[source]
Constructor
Defines a plotList member variable, which will store a list of plots (the result of definePlots(), which will be called after prepareTree()). The postProcess() method specifies what to do with the results.
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- definePlots(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define plots on the trees (for a given systematic variation)
should be implemented by concrete modules, and return a list of bamboo.plots.Plot objects. The structure (name, binning) of the histograms should not depend on the sample or era, and the list should be the same for all values (the weights and systematic variations associated with weights or collections may differ for data and different MC samples, so the actual set of histograms will not be identical).
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
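A minimal concrete implementation could look as follows (the muon selection and binning are illustrative; op is the bamboo.treefunctions module):

```python
from bamboo import treefunctions as op
from bamboo.plots import Plot, EquidistantBinning

def definePlots(self, tree, noSel, sample=None, sampleCfg=None):
    # sketch: one histogram of the leading-muon pT for events with a muon
    plots = []
    hasMuon = noSel.refine("hasMuon", cut=(op.rng_len(tree.Muon) >= 1))
    plots.append(Plot.make1D("mu1_pt", tree.Muon[0].pt, hasMuon,
        EquidistantBinning(50, 0., 150.), title="Leading muon p_{T}"))
    return plots
```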
- getPlotList(fileHint=None, sampleHint=None, resultsdir=None, config=None)[source]
Helper method for postprocessing: construct the plot list
The path (and sample name) of an input file can be specified; otherwise the results directory is searched for a skeleton tree. Please note that in the latter case the skeleton file is arbitrary (in practice it probably corresponds to the first sample encountered when running in sequential or --distributed=driver mode), so if the postprocessing depends on things that differ between samples, one needs to be extra careful to avoid surprises.
- Parameters:
fileHint – name of an input file for one of the samples
sampleHint – sample name for the input file passed in fileHint
resultsdir – directory with the produced results files (mandatory if no fileHint and sampleHint are passed)
config – analysis config (to override the default - optional)
- makeBackendAndPlotList(inputFiles, tree=None, certifiedLumiFile=None, runRange=None, sample=None, sampleCfg=None, inputFileLists=None, backend=None)[source]
Prepare the backend and plot list definition (internal helper)
- Parameters:
inputFiles – input file names
tree – key name of the tree inside the files
certifiedLumiFile – lumi mask json file name
runRange – run range to consider (for efficiency of the lumi mask)
sample – sample name (key in the samples block of the configuration file)
sampleCfg – that sample’s entry in the configuration file
inputFileLists – names of files with the input files (optional, to avoid rewriting if this already exists)
backend – type of backend (lazy/default, debug, compiled, distributed)
- Returns:
the backend and plot list (which can be None if run in “onlyprepare” mode)
- mergeCounters(outF, infileNames, sample=None)[source]
Merge counters
should be implemented by concrete modules
- Parameters:
outF – output file (TFile pointer)
infileNames – input file names
sample – sample name
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Postprocess: run plotIt
The list of plots is created if needed (from a representative file, this enables rerunning the postprocessing step on the results files), and then plotIt is executed
- prepareTree(tree, sample=None, sampleCfg=None, backend=None)[source]
Create decorated tree, selection root (noSel), backend, and (run,LS) expressions
should be implemented by concrete modules
- Parameters:
tree – decorated tree
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
backend – type of backend (lazy/default, debug, compiled, distributed)
- class bamboo.analysismodules.NanoAODHistoModule(args)[source]
- A HistogramsModule for NanoAOD, with decorations and merging of counters from NanoAODModule
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- class bamboo.analysismodules.NanoAODModule(args)[source]
- An AnalysisModule extension for NanoAOD, adding decorations and merging of the counters
- prepareTree(tree, sample=None, sampleCfg=None, description=None, backend=None)[source]
Add NanoAOD decorations, and create an RDataFrame backend
In addition to the arguments needed for the base class prepareTree() method, a description of the tree, and settings for reading systematic variations or corrections from alternative branches, or calculating these on the fly, should be passed, such that the decorations can be constructed accordingly.
- Parameters:
description – description of the tree format, and configuration for reading or calculating systematic variations and corrections: a NanoAODDescription instance (see also bamboo.treedecorators.NanoAODDescription.get())
- class bamboo.analysismodules.NanoAODSkimmerModule(args)[source]
- A SkimmerModule for NanoAOD, with decorations and merging of counters from NanoAODModule
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- class bamboo.analysismodules.SkimmerModule(args)[source]
Base skimmer module
Left for backwards compatibility; please use a HistogramsModule that defines one or more bamboo.plots.Skim products instead.
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- definePlots(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define plots on the trees (for a given systematic variation)
should be implemented by concrete modules, and return a list of
bamboo.plots.Plot
objects. The structure (name, binning) of the histograms should not depend on the sample or era, and the list should be the same in all cases (the weights and systematic variations associated with weights or collections may differ for data and different MC samples, so the actual set of histograms will not be identical).
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
- defineSkimSelection(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define a selection for the skim
should be implemented by concrete modules, and return a
bamboo.plots.Selection
object
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
- Returns:
the skim
bamboo.plots.Selection
, and a map{ name: expression }
of branches to store (to store all the branches of the original tree in addition, pass --keepOriginalBranches to bambooRun; individual branches can be added with an entry
entry)
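As an illustration, a skim selecting events with at least two muons and storing the leading muon PT could be defined as follows (a sketch; the muon selection is illustrative):
- Example:
>>> def defineSkimSelection(self, tree, noSel, sample=None, sampleCfg=None):
>>>     muons = op.select(tree.Muon, lambda mu : mu.pt > 20.)
>>>     hasTwoMu = noSel.refine("hasTwoMu", cut=(op.rng_len(muons) >= 2))
>>>     return hasTwoMu, {"mu1_pt": muons[0].pt}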
Tree decorator customisation
Expressions are constructed by executing Python code on decorated versions of
the trees. The bamboo.treedecorators
module contains helper
methods to do so for commonly used formats, e.g. decorateNanoAOD()
for CMS NanoAOD.
- class bamboo.treedecorators.NanoSystematicVarSpec(nomName=None, origName=None, exclVars=None, isCalc=False)[source]
- Interface for classes that specify how to incorporate systematics
or on-the-fly corrections in the decorated tree
See
NanoAODDescription
anddecorateNanoAOD()
- appliesTo(name)[source]
- Return true if this systematic variation requires action
for this variable, group, or collection
- changesTo(name)[source]
Return the new name(s) for a collection or group (assuming appliesTo(name) is True)
- class bamboo.treedecorators.ReadVariableVarWithSuffix(commonName, sep='_', nomName='nominal', exclVars=None)[source]
Read variations of a single branch from branches with the same name with a suffix
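For instance, a branch puWeight with variations stored in puWeight_up and puWeight_down can be picked up with (a sketch, using the default separator and nominal suffix):
- Example:
>>> nanoPUWeightVar = ReadVariableVarWithSuffix("puWeight")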
- class bamboo.treedecorators.ReadJetMETVar(jetsName, metName, jetsNomName='nom', jetsOrigName='raw', metNomName='', metOrigName='raw', jetsExclVars=None, metExclVars=None, bTaggers=None, bTagWPs=None)[source]
Read jet and MET kinematic variations from different branches for automatic systematic uncertainties
- Parameters:
jetsName – jet collection prefix (e.g.
"Jet"
)metName – MET prefix (e.g.
"MET"
)jetsNomName – name of the nominal jet variation (
"nom"
by default)jetsOrigName – name of the original jet variation (
"raw"
by default)
metNomName – name of the nominal MET variation (
"nom"
by default)metOrigName – name of the original jet variation (
"raw"
by default)jetsExclVars – jet variations that are present but should be ignored (if not specified, only
jetsOrigName
is taken, so if specified this should usually be added explicitly)metExclVars – MET variations that are present but should be ignored (if not specified, only
metOrigName
is taken, so if specified this should usually be added explicitly)bTaggers – list of b-tagging algorithms, for scale factors stored in a branch
bTagWPs – list of b-tagging working points, for scale factors stored in a branch (
shape
should be included here, if wanted)
Note
The implementation of automatic systematic variations treats “xyzup” and “xyzdown” independently (since this is the most flexible). If a source of systematic uncertainty should be excluded, both the “up” and “down” variation should then be added to the list of variations to exclude (
jetsExclVars
ormetExclVars
).
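A typical configuration for postprocessed NanoAOD with b-tagging scale factor branches might look like (a sketch; the tagger and working point lists depend on the production):
- Example:
>>> ReadJetMETVar("Jet", "MET",
>>>     bTaggers=["csvv2", "deepcsv"], bTagWPs=["L", "M", "T", "shape"])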
- class bamboo.treedecorators.NanoReadRochesterVar(systName=None)[source]
Read precalculated Rochester correction variations
- Parameters:
systName – name of the systematic uncertainty, if variations should be enabled
- class bamboo.treedecorators.NanoReadTauESVar(systName=None)[source]
Read precalculated Tau energy scale variations
- Parameters:
systName – name of the systematic uncertainty, if variations should be enabled
- class bamboo.treedecorators.CalcCollectionsGroups(nomName='nominal', origName='raw', exclVars=None, changes=None, **colsAndAttrs)[source]
NanoSystematicVarSpec
for on-the-fly corrections and systematic variation calculation
- appliesTo(name)[source]
- Return true if this systematic variation requires action
for this variable, group, or collection
- class bamboo.treedecorators.NanoAODDescription(groups=None, collections=None, systVariations=None)[source]
Description of the expected NanoAOD structure, and configuration for systematics and corrections
- Essentially, a collection of three containers:
collections: a list of collections (by the name of the length leaf)
groups: a list of non-collection groups (by prefix, e.g. HLT_)
systVariations: a list of NanoSystematicVarSpec instances, to configure reading systematic variations from branches, or calculating them on the fly
The recommended way to obtain a configuration is from the factory method
get()
- static get(tag, year='2016', isMC=False, addGroups=None, removeGroups=None, addCollections=None, removeCollections=None, systVariations=None)[source]
Create a suitable NanoAODDescription instance based on a production version
A production version is defined by a tag, data-taking year, and a flag to distinguish data from simulation. Any number of groups or collections can be added or removed from this. The systVariations option configures reading systematic variations from branches, or calculating them on the fly (a list of NanoSystematicVarSpec instances, see above).
- Example:
>>> decorateNanoAOD(tree, NanoAODDescription.get( >>> "v5", year="2016", isMC=True, >>> systVariations=[nanoRochesterCalc, nanoJetMETCalc])) >>> decorateNanoAOD(tree, NanoAODDescription.get( >>> "v5", year="2017", isMC=True, >>> systVariations=[nanoPUWeightVar, nanoReadJetMETVar_METFixEE2017]))
- Parameters:
tag – production version (e.g. “v5”)
year – data-taking year
isMC – simulation or not
addGroups – (optional) list of groups of leaves to add (e.g.
["L1_", "HLT_"]
, if not present)removeGroups – (optional) list of groups of leaves to remove (e.g.
["L1_"]
, if skimmed)addCollections – (optional) list of containers to add (e.g.
["nMyJets"]
)removeCollections – (optional) list of containers to remove (e.g.
["nPhoton", "nTau"]
)systVariations – list of correction or systematic variation on-the-fly calculators or configurations to add (
NanoSystematicVarSpec
instances)
See also
decorateNanoAOD()
- bamboo.treedecorators.decorateNanoAOD(aTree, description=None)[source]
Decorate a CMS NanoAOD Events tree
Variation branches following the NanoAODTools conventions (e.g. Jet_pt_nom) are automatically used (but calculators for the same collection take precedence, if requested).
- Parameters:
aTree – TTree to decorate
description – description of the tree format, and configuration for reading or calculating systematic variations and corrections, a
NanoAODDescription
instance (see also NanoAODDescription.get()
)
Helper functions
The bamboo.analysisutils
module bundles a number of more
specific helper methods that use the tree decorators and integrate with
other components, connect to external services, or are factored out of the
classes in bamboo.analysismodules
to facilitate reuse.
- class bamboo.analysisutils.YMLIncludeLoader(stream)[source]
- Custom yaml loading to support including config files.
Use !include (file) to insert content of file at that position.
- bamboo.analysisutils.addLumiMask(sel, jsonName, runRange=None, runAndLS=None, name='goodlumis')[source]
Refine selection with a luminosity block filter
Typically applied directly to the root selection (for data). runAndLS should be a tuple of expressions with the run number and luminosity block ID. The run range is used to limit the part of the JSON file to consider, see the LumiMask helper class for details.
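A minimal sketch for data, assuming the certified-luminosity JSON path is stored in the per-sample configuration under a (hypothetical) certified_lumi_file key:
- Example:
>>> if not self.isMC(sample):
>>>     noSel = addLumiMask(noSel, sampleCfg["certified_lumi_file"],
>>>         runRange=sampleCfg.get("run_range"),
>>>         runAndLS=(tree.run, tree.luminosityBlock))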
- bamboo.analysisutils.addPrintout(selection, funName, *args)[source]
Call a method with debugging printout, as part of the RDataFrame graph
This method is only meant to work with the default backend, since it works by inserting a
Filter
node that lets all events pass.
- Parameters:
selection – selection for which to add the printout. The function call will be added to the RDataFrame graph in its current state, so if a plot causes a problem this method should be called before defining it.
funName – name of a C++ method to call. This method should always return
true
, and can take any number of arguments.args – arguments to pass to the function
The following example would print the entry and event number for each event that passes some selection.
- Example:
>>> from bamboo.root import gbl
>>> gbl.gInterpreter.Declare("""
... bool bamboo_printEntry(long entry, long event) {
...   std::cout << "Processing entry #" << entry << ": event " << event << std::endl;
...   return true;
... }""")
>>> addPrintout(sel, "bamboo_printEntry", op.extVar("ULong_t", "rdfentry_"), t.event)
- bamboo.analysisutils.configureJets(variProxy, jetType, jec=None, jecLevels='default', smear=None, useGenMatch=True, genMatchDR=0.2, genMatchDPt=3.0, jesUncertaintySources=None, regroupTag='', uncertaintiesFallbackJetType=None, splitJER=False, addHEM2018Issue=False, enableSystematics=None, subjets=None, mcYearForFatJets=None, isTau21DDT=False, jms=None, jmr=None, gms=None, gmr=None, cachedir=None, mayWriteCache=False, isMC=False, backend=None, uName='')[source]
Reapply JEC, set up jet smearing, or prepare JER/JES uncertainties collections
- Parameters:
variProxy – jet variations proxy, e.g.
tree._Jet
jetType – jet type, e.g. AK4PFchs
smear – tag of resolution (and scalefactors) to use for smearing (no smearing is done if unspecified)
jec – tag of the new JEC to apply, or for the JES uncertainties (pass an empty list to jecLevels to produce only the latter without reapplying the JEC)
jecLevels – list of JEC levels to apply (if left out the recommendations are used: L1FastJet, L2Relative, L3Absolute, and also L2L3Residual for data)
jesUncertaintySources – list of jet energy scale uncertainty sources (see the JECUncertaintySources twiki),
"All"
, or"Merged"
, for all regrouped uncertainties (in which case regroupTag can be specified)
regroupTag – version of the regrouped uncertainties to use
uncertaintiesFallbackJetType – jet type from which to use the (regrouped) JES uncertainties if those for jetType are not found (e.g. AK4PFchs, see JME HN)
enableSystematics – filter systematics variations to enable (collection of names or callable that takes the variation name; default: all that are available for MC, none for data)
useGenMatch – use matching to generator-level jets for resolution smearing
genMatchDR – DeltaR for generator-level jet matching (half the cone size is recommended, default is 0.2)
genMatchDPt – maximal relative PT difference (in units of the resolution) between reco and gen jet
splitJER – vary the JER uncertainty independently in six kinematic bins (see the JER uncertainty twiki)
addHEM2018Issue – add a JES uncertainty for the HEM issue in 2018 (see this hypernews post)
subjets – subjets proxy (
tree.SubJet
)
mcYearForFatJets – data-taking year for fat jet parameters ((softdrop) mass scale and resolution; these should not be passed for data). They can also be passed explicitly, see the following parameters. If none are passed, no jet mass scale corrections are applied.
isTau21DDT – if used in combination with mcYearForFatJets, will use different values for the softdrop mass. Warning: differently from nanoAOD-Tools, these will be propagated to the JEC uncertainties, and this combination of settings has not been validated. Please check carefully if you need to use this.
jms – jet mass scale correction (nominal, up, down), for fat jets
jmr – jet mass resolution (nominal, up, down), for fat jets
gms – jet groomed mass scale correction (nominal, up, down), for fat jets, same as
jms
by default
gmr – jet groomed mass resolution (nominal, up, down), for fat jets, same as
jmr
by default
cachedir – alternative root directory to use for the txt files cache, instead of
$XDG_CACHE_HOME/bamboo
(usually ~/.cache/bamboo)
mayWriteCache – flag to indicate if this task is allowed to write to the cache status file (set to False for worker tasks to avoid corruption due to concurrent writes)
isMC – MC or not
backend – backend pointer (returned from
prepareTree()
)
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
- bamboo.analysisutils.configureRochesterCorrection(variProxy, paramsFile, isMC=False, backend=None, uName='')[source]
Apply the Rochester correction for muons
- Parameters:
variProxy – muon variations proxy, e.g.
tree._Muon
for NanoAOD
paramsFile – path of the text file with correction parameters
isMC – MC or not
backend – backend pointer (returned from
prepareTree()
)
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
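A minimal usage sketch (the parameter file name is illustrative):
- Example:
>>> configureRochesterCorrection(tree._Muon, "RoccoR2018.txt",
>>>     isMC=self.isMC(sample), backend=be)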
- bamboo.analysisutils.configureSVfitCalculator(pathToSVfit='', backend=None)[source]
Configure SVfit
- Parameters:
pathToSVfit – path to your SVfit installation
backend – backend pointer (returned from
prepareTree()
)
- bamboo.analysisutils.configureTauESCorrection(variProxy, paramsFile, tauIdAlgo, backend=None)[source]
Apply the energy correction for taus
- Parameters:
variProxy – tau variations proxy, e.g.
tree._Tau
for NanoAOD
paramsFile – path of the JSON file with correction parameters
tauIdAlgo – name of the algorithm for the tau identification, e.g. “DeepTau2017v2p1”
backend – backend pointer (returned from
prepareTree()
)
- bamboo.analysisutils.configureType1MET(variProxy, jec=None, smear=None, isT1Smear=False, useGenMatch=True, genMatchDR=0.2, genMatchDPt=3.0, jesUncertaintySources=None, regroupTag='', splitJER=False, addHEM2018Issue=False, enableSystematics=None, cachedir=None, mayWriteCache=False, isMC=False, backend=None, uName='')[source]
Reapply JEC, set up jet smearing, or prepare JER/JES uncertainties collections
- Parameters:
variProxy – MET variations proxy, e.g.
tree._MET
smear – tag of resolution (and scalefactors) to use for smearing (no smearing is done if unspecified)
isT1Smear – T1Smear (smeared as nominal, all variations with respect to that) if True, otherwise T1 (JES variations with respect to the unsmeared MET, jerup and jerdown variations are nominally smeared)
jec – tag of the new JEC to apply, or for the JES uncertainties
jesUncertaintySources –
list of jet energy scale uncertainty sources (see the JECUncertaintySources twiki),
"All"
, or"Merged"
, for all regrouped uncertainties (in which case regroupTag can be specified)
regroupTag – version of the regrouped uncertainties to use
enableSystematics – filter systematics variations to enable (collection of names or callable that takes the variation name; default: all that are available for MC, none for data)
useGenMatch – use matching to generator-level jets for resolution smearing
genMatchDR – DeltaR for generator-level jet matching (half the cone size is recommended, default is 0.2)
genMatchDPt – maximal relative PT difference (in units of the resolution) between reco and gen jet
splitJER –
vary the JER uncertainty independently in six kinematic bins (see the JER uncertainty twiki)
addHEM2018Issue –
add a JES uncertainty for the HEM issue in 2018 (see this hypernews post)
cachedir – alternative root directory to use for the txt files cache, instead of
$XDG_CACHE_HOME/bamboo
(usually ~/.cache/bamboo)
mayWriteCache – flag to indicate if this task is allowed to write to the cache status file (set to False for worker tasks to avoid corruption due to concurrent writes)
backend – backend pointer (returned from prepareTree())
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
isMC – MC or not
- bamboo.analysisutils.forceDefine(arg, selection, includeSub=True)[source]
Force the definition of an expression as a column at a selection stage
Use only for really computation-intensive operations that need to be precalculated
- Parameters:
arg – expression to define as a column
selection –
Selection
for which the expression should be definedincludeSub – also precalculate for data-driven background ‘shadow’ selections (
bamboo.plots.SelectionWithSub
‘sub’-selections)
- bamboo.analysisutils.getAFileFromAnySample(samples, resolveFiles=None, cfgDir='.')[source]
Helper method: get a file from any sample (minimizing the risk of errors)
Tries to find any sample with:
- a list of files
- a cache file
- a SAMADhi path
- a DAS path
If successful, a single read / query is sufficient to retrieve a file
- bamboo.analysisutils.loadPlotIt(config, plotList, eras=None, workdir='.', resultsdir='.', readCounters=<function <lambda>>, vetoFileAttributes=None, plotDefaults=None)[source]
Load the plotit configuration with the plotIt python library
The plotIt YAML file writing and parsing is skipped in this case (to write the file, the
writePlotIt()
method should be used, with the same arguments).- Parameters:
config – parsed analysis configuration. Only the
configuration
(if present) and eras sections (to get the luminosities) are read.
plotList – list of plots to convert (name and plotopts, combined with the default style)
eras – list of eras to consider (None for all that are in the config)
workdir – output directory
resultsdir – directory with output ROOT files with histograms
readCounters – method to read the sum of event weights from an output file
vetoFileAttributes – list of per-sample keys that should be ignored (those specific to the bamboo part, e.g. job splitting and DAS paths)
plotDefaults – plot defaults to add (added to those from
config["plotIt"]["plotdefaults"]
, with higher precedence if present in both)
- bamboo.analysisutils.makeMultiPrimaryDatasetTriggerSelection(sampleName, datasetsAndTriggers)[source]
Construct a selection that prevents processing multiple times (from different primary datasets)
If an event passes triggers for different primary datasets, it will be taken from the first of those (i.e. the selection will be 'passes one of the triggers that select it for this primary dataset, and not any of those that come before in the input dictionary').
- Parameters:
sampleName – sample name
datasetsAndTriggers – a dictionary
{primary-dataset, set-of-triggers}
, where the key is either a callable that takes a sample name and returns true in case it originates from the corresponding primary datasets, or a string that is the first part of the sample name in that case. The value (second item) can be a single expression (e.g. a trigger flag, or an OR of them), or a list of those (in which case an OR-expression is constructed from them).
- Returns:
an expression to filter the events in the sample with given name
- Example:
>>> if not self.isMC(sample): >>> trigSel = noSel.refine("trigAndPrimaryDataset", >>> cut=makeMultiPrimaryDatasetTriggerSelection(sample, { >>> "DoubleMuon" : [t.HLT.Mu17_TrkIsoVVL_Mu8_TrkIsoVVL, >>> t.HLT.Mu17_TrkIsoVVL_TkMu8_TrkIsoVVL], >>> "DoubleEG" : t.HLT.Ele23_Ele12_CaloIdL_TrackIdL_IsoVL_DZ, >>> "MuonEG" : [t.HLT.Mu23_TrkIsoVVL_Ele12_CaloIdL_TrackIdL_IsoVL, >>> t.HLT.Mu8_TrkIsoVVL_Ele23_CaloIdL_TrackIdL_IsoVL ] >>> }))
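The "first primary dataset that selects the event wins" logic can also be sketched in plain Python, independently of bamboo expressions (all names below are hypothetical illustration names):

```python
def pd_selection(sample_pd, datasets_and_triggers, fired):
    """Decide whether an event belongs to the sample from primary dataset sample_pd.

    datasets_and_triggers: ordered mapping {primary dataset: trigger names}
    fired: set of trigger names that fired for this event
    """
    for pd, triggers in datasets_and_triggers.items():
        if pd == sample_pd:
            # keep only if one of this primary dataset's own triggers fired
            return bool(fired & set(triggers))
        if fired & set(triggers):
            # an earlier primary dataset already selects this event: veto it
            return False
    return False  # sample_pd not listed in the dictionary
```

An event firing triggers of several primary datasets is thus kept only in the first one listed, which is what the constructed filter expression implements on the decorated tree.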
- bamboo.analysisutils.makePileupWeight(puWeights, numTrueInteractions, systName=None, nameHint=None, sel=None, defineOnFirstUse=True)[source]
Construct a pileup weight for MC, based on the weights in a JSON file
- Parameters:
puWeights – path of the JSON file with weights (binned in NumTrueInteractions) for cp3-llbb JSON, or tuple of JSON path and correction name (correctionlib JSON)
numTrueInteractions – expression to get the number of true interactions (Poissonian expectation value for an event)
systName – name of the associated systematic nuisance parameter
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
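A usage sketch with a correctionlib JSON file (the file and correction names are illustrative):
- Example:
>>> puWeight = makePileupWeight(("puweights.json", "Collisions18_goldenJSON"),
>>>     tree.Pileup_nTrueInt, systName="pileup", sel=noSel)
>>> noSel = noSel.refine("withPUWeight", weight=puWeight)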
- bamboo.analysisutils.printCutFlowReports(config, reportList, workdir='.', resultsdir='.', suffix=None, readCounters=<function <lambda>>, eras=('all', None), verbose=False)[source]
Print yields to the log file, and write a LaTeX yields table for each
Samples can be grouped (only for the LaTeX table) by specifying the
yields-group
key (overriding the regular groups
used for plots). The sample (or group) name to use in this table should be specified through the yields-title
sample key. In addition, the following options in the
plotIt
section of the YAML configuration file influence the layout of the LaTeX yields table:
yields-table-stretch
:\arraystretch
value, 1.15 by default
yields-table-align
: orientation, h (default, samples in rows) or v (samples in columns)
yields-table-text-align
: alignment of text in table cells (default: c)
yields-table-numerical-precision-yields
: number of digits after the decimal point for yields (default: 1)
yields-table-numerical-precision-ratio
: number of digits after the decimal point for ratios (default: 2)
- bamboo.analysisutils.readEnvConfig(explName=None)[source]
Read computing environment config file (batch system, storage site etc.)
For using a batch cluster, the [batch] section should have a ‘backend’ key, and there should be a section with the name of the backend (slurm, htcondor…), see bamboo.batch_<backend> for details. The storage site information needed to resolve the PFNs for datasets retrieved from DAS should be specified under the [das] section (sitename and storageroot).
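A minimal environment configuration file might look like the following sketch (only the backend key and the sitename and storageroot keys under [das] are described above; the section for the backend depends on the cluster, see the corresponding bamboo.batch_<backend> module):

```ini
[batch]
backend = slurm

[slurm]
; cluster-specific settings, see bamboo.batch_slurm

[das]
sitename = T2_BE_UCL
storageroot = /storage/data/cms
```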
- bamboo.analysisutils.runPlotIt(cfgName, workdir='.', plotsdir='plots', plotIt='plotIt', eras=('all', None), verbose=False)[source]
Run plotIt
- Parameters:
cfgName – plotIt YAML config file name
workdir – working directory (also the starting point for finding the histograms files,
--i
option)
plotsdir – name of the plots directory inside workdir (plots, by default)
plotIt – path of the plotIt executable
eras – (mode, eras), mode being one of "split", "combined", or "all" (both of the former), and eras a list of era names, or None for all
verbose – print the plotIt command being run
- bamboo.analysisutils.splitVariation(variProxy, variation, regions, nomName='nom')[source]
Split a systematic variation between (kinematic) regions (to decorrelate the nuisance parameter)
- Parameters:
variProxy – jet variations proxy, e.g.
tree._Jet
variation – name of the variation that should be split (e.g. “jer”)
regions – map of region names and selections (for non-collection objects: boolean expression, for collection objects: a callable that returns a boolean for an item from the collection)
nomName – name of the nominal variation (“nom” for postprocessed, “nominal” for calculator)
- Example:
>>> splitVariation(tree._Jet, "jer", {"forward" : lambda j : j.eta > 0., >>> "backward" : lambda j : j.eta < 0.})
- bamboo.analysisutils.writePlotIt(config, plotList, outName, eras=None, workdir='.', resultsdir='.', readCounters=<function <lambda>>, vetoFileAttributes=None, plotDefaults=None)[source]
Combine creation and saving of a plotIt config file
for convenience inside a
HistogramsModule
, the individual parts are also available inbamboo.analysisutils
.- Parameters:
config – parsed analysis configuration. Only the
configuration
(if present) and eras sections (to get the luminosities) are read.
plotList – list of plots to convert (name and plotopts, combined with the default style)
outName – output YAML config file name
eras – valid era list
workdir – output directory
resultsdir – directory with output ROOT files with histograms
readCounters – method to read the sum of event weights from an output file
vetoFileAttributes – list of per-sample keys that should be ignored (those specific to the bamboo part, e.g. job splitting and DAS paths)
plotDefaults – plot defaults to add (added to those from
config["plotIt"]["plotdefaults"]
, with higher precedence if present in both)
Scale factors
The bamboo.scalefactors
module contains helper methods
for configuring scale factors, fake rates etc.
The basic configuration parameter is the JSON file path for a set of scale factors. The two basic types are:
- lepton scale factors (dependent on a number of object variables, e.g. pt and eta)
- jet (b-tagging) scale factors (a grouped set for different flavours, for convenience)
Different values (depending on the data-taking period) can be taken into account by weighting or by randomly sampling.
- class bamboo.scalefactors.BtagSF(taggerName, csvFileName, wp=None, sysType='central', otherSysTypes=None, measurementType=None, getters=None, jesTranslate=None, sel=None, uName=None)[source]
Helper for b- and c-tagging scalefactors using the BTV POG reader
- __call__(jet, nomVar=None, systVars=None)[source]
Evaluate the scalefactor for a jet
Please note that this only gives the applicable scalefactor: to obtain the event weight one of the recipes in the POG twiki should be used.
By default the nominal and systematic variations are taken from the
bamboo.scalefactors.BtagSF
instance, but they can be overridden with the
andsystVars
keyword arguments. Please note that when using split uncertainties (e.g. for the reshaping method) some uncertainties only apply to specific jet flavours (e.g. c-jets) and the csv file contains zeroes for the other flavours. Then the user code should check the jet flavours, and call this method with the appropriate list of variations for each.
- __init__(taggerName, csvFileName, wp=None, sysType='central', otherSysTypes=None, measurementType=None, getters=None, jesTranslate=None, sel=None, uName=None)[source]
- Declare a BTagCalibration (if needed) and BTagCalibrationReader (unique, based on
uName
), and decorate for evaluation
Warning
This function is deprecated. Use correctionlib and helpers in
scalefactors
instead.- Parameters:
taggerName – first argument for
BTagCalibration
csvFileName – name of the CSV file with scalefactors
wp – working point (used as
BTagEntry::OP_{wp.upper()}
)sysType – nominal value systematic type (
"central"
, by default)otherSysTypes – other systematic types to load in the reader
measurementType – dictionary with measurement type per true flavour (B, C, and UDSG), or a string if the same for all (if not specified,
"comb"
will be used for b- and c-jets, andincl
for light-flavour jets)getters – dictionary of methods to get the kinematics and classifier for a jet (the keys
Pt
,Eta
,JetFlavour
, andDiscri
are used. For the former three, the defaults are for NanoAOD)jesTranslate – translation function for JEC systematic variations, from the names in the CSV file to those used for the jets (the default should work for on-the-fly corrections)
sel – a selection in the current graph
uName – unique name, to declare the reader (e.g. sample name)
- Declare a BTagCalibration (if needed) and BTagCalibrationReader (unique, based on
- bamboo.scalefactors.get_bTagSF_fixWP(json_path, tagger, wp, flavour, sel, jet_pt_variation=None, heavy_method='comb', syst_prefix='btagSF_fixWP_', decorr_wps=False, decorr_eras=False, era=None, full_scheme=False, syst_mapping=None, defineOnFirstUse=True)[source]
Build correction evaluator for fixed working point b-tagging scale factors
Loads the b-tagging scale factors as correction object from the JSON file, configures the systematic variations, and returns a callable that can be evaluated on a jet to return the scale factor.
- Parameters:
json_path – JSON file path
tagger – name of the tagger inside the JSON (not the same as in the event!)
wp – working point of the tagger (“L”, “M”, “T”)
flavour – hadron flavour of the jet (0, 4, 5)
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
jet_pt_variation – if specified, only use that specific systematic variation (e.g. the nominal) of the jet pt to evaluate the scale factors. By default, the scale factors are evaluated for each variation.
heavy_method – B-tagging measurement method for heavy-flavour jets (“comb” or “mujets”). For light jets, there is only the “incl” method.
syst_prefix – Prefix to prepend to the names of all resulting b-tagging systematic variations. Variations for light or heavy jets will be prefixed resp. by {syst_prefix}light or {syst_prefix}heavy (unless the full scheme is used).
decorr_wps – If True, insert the working point into the systematic name for the uncorrelated/statistical component. Otherwise, all working points will be taken as fully correlated when using several in the analysis.
decorr_eras – If True, use the scale factor uncertainties split into “uncorrelated” and “correlated” parts, and insert the era name into the variation names. If False, only use the total scale factor uncertainties (not split).
era – Name of era, used in the name of systematic variations if one of decorr_eras or full_scheme is True.
full_scheme – If True, use split uncertainty sources, with the mapping specified via the syst_mapping argument
syst_mapping – Dictionary used to list the systematics to consider, and to map the naming of the full-scheme b-tagging uncertainties to variations defined elsewhere in the analysis, for varying them together when needed (see example below).
defineOnFirstUse – see description in
get_correction()
- Returns:
a callable that takes a jet and returns the correction (with systematic variations as configured here) obtained by evaluating the b-tagging scale factors on the jet
- Example:
>>> btvSF_b = get_bTagSF_fixWP("btv.json", "deepJet", "M", 5, sel, syst_prefix="btagSF_", >>> era="2018UL", full_scheme=True, >>> syst_mapping={ >>> "pileup": "pileup", >>> "type3": None, >>> "jer0": "jer", >>> "jer1": "jer" >>> })
- Will result in the following systematic uncertainties:
btagSF_statistic_2018UL{up/down}: mapped to {up/down}_statistic in the JSON
btagSF_pileup{up/down}: mapped to {up/down}_pileup in the JSON, and correlated with the pileup{up/down} variations in the analysis
btagSF_type3{up/down}: mapped to {up/down}_type3 in the JSON
btagSF_jer0{up/down}: mapped to {up/down}_jer in the JSON, and correlated with the jer0{up/down} variations in the analysis
btagSF_jer1{up/down}: mapped to {up/down}_jer in the JSON, and correlated with the jer1{up/down} variations in the analysis
>>> btvSF_b = get_bTagSF_fixWP("btv.json", "deepJet", "M", 5, sel, syst_prefix="btagSF_", >>> era="2018UL", decorr_wps=True, decorr_eras=True)
- Will result in the following systematic uncertainties:
btagSF_heavy_M_2018UL{up/down}: mapped to {up/down}_uncorrelated in the JSON
btagSF_heavy{up/down}: mapped to {up/down}_correlated in the JSON
- bamboo.scalefactors.get_bTagSF_itFit(json_path, tagger_json, tagger_jet, flavour, sel, jet_pt_variation=None, syst_prefix='btagSF_shape_', decorr_eras=False, era=None, syst_mapping=None, defineOnFirstUse=True)[source]
Build correction evaluator for continuous (iterativeFit) b-tagging scale factors
Loads the b-tagging scale factors as correction object from the JSON file, configures the systematic variations, and returns a callable that can be evaluated on a jet to return the scale factor.
- Parameters:
json_path – JSON file path
tagger_json – name of the tagger inside the JSON
tagger_jet – name of the tagger in the tree
flavour – hadron flavour of the jet (0, 4, 5)
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
jet_pt_variation – see description in
get_bTagSF_fixWP()
syst_prefix – Prefix to prepend to the names of all resulting b-tagging systematic variations.
decorr_eras – If True, insert the era into the variation name for statistical uncertainties
era – Name of era, used in the name of systematic variations if decorr_eras is True
syst_mapping – see description in
get_bTagSF_fixWP()
, with the difference that here the “basic” (non-JES-related) variations are already included no matter what.defineOnFirstUse – see description in
get_correction()
- Returns:
a callable that takes a jet and returns the correction (with systematic variations as configured here) obtained by evaluating the b-tagging scale factors on the jet
- Example:
>>> btvSF_b = get_bTagSF_itFit("btv.json", "deepJet", "btagDeepFlavB", 5, sel,
...     syst_prefix="btagSF_", decorr_eras=True, era="2018UL",
...     syst_mapping={"jesTotal": "jes"})
- Will result in the following systematic uncertainties:
- btagSF_hfstats1_2018UL{up/down}: mapped to {up/down}_hfstats1 in the JSON
- btagSF_hfstats2_2018UL{up/down}: mapped to {up/down}_hfstats2 in the JSON
- btagSF_lfstats1_2018UL{up/down}: mapped to {up/down}_lfstats1 in the JSON
- btagSF_lfstats2_2018UL{up/down}: mapped to {up/down}_lfstats2 in the JSON
- btagSF_hf{up/down}: mapped to {up/down}_hf in the JSON
- btagSF_lf{up/down}: mapped to {up/down}_lf in the JSON
- btagSF_jesTotal{up/down}: mapped to {up/down}_jes in the JSON, and correlated
with the jesTotal{up/down} variations in the analysis
(for c jets, the hf and lf variations are absent and replaced by cferr1 and cferr2)
- bamboo.scalefactors.get_correction(path, correction, params=None, systParam=None, systNomName='nominal', systVariations=None, systName=None, defineOnFirstUse=True, sel=None)[source]
Load a correction from a CMS JSON file
The JSON file is parsed with correctionlib. The contents and structure of a JSON file can be checked with the
correction
script, e.g. correction summary sf.json
- Parameters:
path – JSON file path
correction – name of the correction inside
CorrectionSet
in the JSON file
params – parameter definitions, a dictionary of values or functions
systParam – name of the parameter (category axis) to use for systematic variations
systNomName – name of the systematic category axis to use as nominal
systVariations – systematic variations list or
{variation: name_in_json}
systName – systematic uncertainty name (to prepend to names, if ‘up’ and ‘down’)
defineOnFirstUse – wrap with
defineOnFirstUse()
(to define as a column and reuse afterwards), this is enabled by default since it is usually more efficient
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
- Returns:
a callable that takes
(object)
and returns the correction (with systematic variations, if present and unless a specific variation is requested) obtained by evaluating the remaining parameters with the object
- Example:
>>> elIDSF = get_correction("EGM_POG_SF_UL.json", "electron_cutbased_looseID",
...     params={"pt": lambda el : el.pt, "eta": lambda el : el.eta},
...     systParam="weight", systNomName="nominal", systName="elID",
...     systVariations=("up", "down"))
>>> looseEl = op.select(t.Electron, lambda el : el.looseId)
>>> withDiEl = noSel.refine("withDiEl",
...     cut=(op.rng_len(looseEl) >= 2),
...     weight=[elIDSF(looseEl[0]), elIDSF(looseEl[1])])
- bamboo.scalefactors.get_scalefactor(objType, key, combine=None, additionalVariables=None, sfLib=None, paramDefs=None, lumiPerPeriod=None, periods=None, getFlavour=None, isElectron=False, systName=None, seedFun=None, defineOnFirstUse=True)[source]
Construct a scalefactor callable
Warning
This function is deprecated. Use correctionlib and
get_correction()
instead.
- Parameters:
objType – object type: "lepton", "dilepton", or "jet"
key – key in sfLib (or tuple of keys, in case of a nested dictionary), or JSON path (or list thereof) if sfLib is not specified
is not specifiedsfLib – dictionary (or nested dictionary) of scale factors. A scale factor entry is either a path to a JSON file, or a list of pairs
(periods, path)
, whereperiods
is a list of periods found inlumiPerPeriod
andpath
is a path to the JSON file with the scale factors corresponding to those run periods.combine – combination strategy for combining different run periods (
"weight"
or"sample"
)paramDefs – dictionary of binning variable definitions (name to callable)
additionalVariables – additional binning variable definitions (TODO: remove)
lumiPerPeriod – alternative definitions and relative weights of run periods
periods – Only combine scale factors for those periods
isElectron – if True, will use supercluster eta instead of eta (for "lepton" type only) (TODO: find a better way of doing that)
systName – name of the associated systematic nuisance parameter
seedFun – (only when combining scale factors by sampling) callable to get a random generator seed for an object, e.g.
lambda l : l.idx+42
defineOnFirstUse – wrap with
defineOnFirstUse()
(to define as a column and reuse afterwards), this is enabled by default since it is usually more efficient
- Returns:
a callable that takes
(object, variation="Nominal")
and returns a floating-point number proxy
- bamboo.scalefactors.lumiPerPeriod_default = {'Run2016B': 5750.491, 'Run2016C': 2572.903, 'Run2016D': 4242.292, 'Run2016E': 4025.228, 'Run2016F': 3104.509, 'Run2016G': 7575.824, 'Run2016H': 8650.628, 'Run2017B': 4793.97, 'Run2017C': 9632.746, 'Run2017D': 4247.793, 'Run2017E': 9314.581, 'Run2017F': 13539.905, 'Run2018A': 14027.614, 'Run2018B': 7066.552, 'Run2018C': 6898.817, 'Run2018D': 31747.582, 'Run271036to275783': 6274.191, 'Run275784to276500': 3426.131, 'Run276501to276811': 3191.207, 'Run315264to316360': 8928.97, 'Run316361to325175': 50789.746}
Integrated luminosity (in 1/pb) per data taking period
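As a quick sanity check on the table above, summing the per-period values for a given year gives that year's integrated luminosity. This snippet copies the 2017 entries from `lumiPerPeriod_default` (values as listed above):

```python
# 2017 entries copied from lumiPerPeriod_default (in 1/pb)
run2017 = {
    "Run2017B": 4793.97, "Run2017C": 9632.746, "Run2017D": 4247.793,
    "Run2017E": 9314.581, "Run2017F": 13539.905,
}
total_2017 = sum(run2017.values())  # about 41.5/fb for full 2017
```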
- bamboo.scalefactors.makeBtagWeightItFit(jets, sfGetter)[source]
Construct the full event weight based on b-tagging scale factors (continuous/iterativeFit)
Combines the b-tagging scale factors into the event weight needed to correct the simulation (the event weight can then directly be passed to a selection), by making a product of the scale factors over all jets. See the note about correcting the normalization when using these scale factors.
- Parameters:
jets – the jet collection in the event
sfGetter – a callable that takes the hadron flavour (int) and returns the correction object for the b-tagging scale factors of that jet flavour (i.e., itself a callable that takes the jet and returns the scale factor). See
bamboo.scalefactors.get_bTagSF_itFit()
.
- Returns:
a weight proxy (with all systematic variations configured in the scale factors)
- Example:
>>> btvSF = lambda flav: get_bTagSF_itFit("btv.json", "deepJet", "btagDeepFlavB", flav, ...)
>>> btvWeight = makeBtagWeightItFit(tree.jet, btvSF)
>>> sel_btag = sel.refine("btag", cut=..., weight=btvWeight)
- bamboo.scalefactors.makeBtagWeightMeth1a(jets, tagger, wps, workingPoints, sfGetter, effGetters)[source]
Construct the full event weight based on b-tagging scale factors (fixed working point) and efficiencies
Combines the b-tagging scale factors and MC efficiencies for fixed working points into the event weight needed to correct the simulation (the event weight can then directly be passed to a selection). The weight is computed according to Method 1a, with support for several working points.
While the scale factors can be loaded from the POG-provided JSON files, the efficiencies need to be computed by the analyzers.
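For a single working point, the Method 1a combination can be sketched in plain Python. This is an illustrative sketch only (bamboo builds the equivalent columnar expression internally), using the standard per-jet factors: SF_i for tagged jets and (1 − SF_i·ε_i)/(1 − ε_i) for untagged jets, where ε_i is the MC efficiency.

```python
# Illustrative sketch of the Method 1a event weight for one working point
# (not bamboo's implementation, which works on expression proxies).
def method1a_weight(jets):
    """jets: list of (tagged, eff, sf) tuples per jet.

    weight = prod_{tagged} sf_i * prod_{untagged} (1 - sf_i*eff_i) / (1 - eff_i)
    """
    w = 1.0
    for tagged, eff, sf in jets:
        if tagged:
            w *= sf
        else:
            w *= (1.0 - sf * eff) / (1.0 - eff)
    return w
```

The tagged factor corrects the tagging rate directly, while the untagged factor preserves the overall normalization of the untagged jet population.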
- Parameters:
jets – the jet collection in the event
tagger – the name of the tagger in the event
wps – a list of working points for which the b tagging must be corrected, e.g. ["L", "T"] for computing the weight using scale factors for the L and T working points. Note: always provide the working points in increasing order of “tightness”!
workingPoints – a dictionary providing the working point value for the discriminator
sfGetter – a callable that takes the working point (str) and the hadron flavour (int) and returns the correction object for the b-tagging scale factors of that working point and jet flavour (i.e., itself a callable that takes the jet and returns the scale factor). See
bamboo.scalefactors.get_bTagSF_fixWP()
.effGetters – a dictionary with keys = working points and values = callables that can be evaluated on a jet and return the b-tagging efficiency for that working point. Typically, these would be correction objects parameterized using the jet pt, eta and hadron flavour.
- Returns:
a weight proxy (with all systematic variations configured in the scale factors)
- Example:
>>> btvSF = lambda wp, flav: get_bTagSF_fixWP("btv.json", "deepJet", wp, flav, ...)
>>> btvEff = {"M": get_correction("my_btag_eff.json", ...)}
>>> btvWeight = makeBtagWeightMeth1a(tree.jet, "btagDeepFlavB", ["M"], {"M": 0.2783},
...     btvSF, btvEff)
>>> sel_btag = sel.refine("btag", cut=..., weight=btvWeight)
ROOT utilities
The bamboo.root
module collects a set of thin wrappers around ROOT
methods, and centralizes the import of the Cling interpreter global namespace
in PyROOT. For compatibility, it is recommended that user code use
from bamboo.root import gbl
rather than import ROOT as gbl
or
from cppyy import gbl
.
- bamboo.root.findLibrary(libName)[source]
Check if a library can be found, and return its path in that case
- bamboo.root.loadDependency(bambooLib=None, includePath=None, headers=None, dynamicPath=None, libraries=None)[source]
Load a C++ extension
- Parameters:
bambooLib – name(s) of the bamboo extension libraries, if any
includePath – include directory for headers
headers – headers to load explicitly (which can depend on other headers in the include path)
dynamicPath – dynamic library path to add
libraries – additional shared libraries to load