API Reference
This page lists the classes and methods needed to build an analysis that are not related to expressions (see Building expressions for a description of those). The aim is to provide a set of easy-to-use classes and methods.
Plots and selections
The bamboo.plots module provides high-level classes to represent and manipulate selections and plots.
- class bamboo.plots.CategorizedSelection(parent=None, categories=None, name=None)[source]
Helper class to represent a group of similar selections on different categories
The interface is similar, but not identical, to that of Selection (constructing Plot objects is done through the makePlots() method, which takes additional arguments). Each category selection can have a candidate, typically the object or group of objects that differs between the categories. The axis variables can then either be expressions, or callables that will be passed this per-category object.
- Example:
>>> muonSel = noSel.refine("hasMuon", cut=(
>>>     op.rng_len(muons) > 0, op.OR(op.rng_len(electrons) == 0,
>>>     muons[0].pt > electrons[0].pt)))
>>> electronSel = noSel.refine("hasElectron", cut=(
>>>     op.rng_len(electrons) > 0, op.OR(op.rng_len(muons) == 0,
>>>     electrons[0].pt > muons[0].pt)))
>>> oneLeptonSel = CategorizedSelection(categories={
...     "Mu": (muonSel, muons[0]),
...     "El": (electronSel, electrons[0])
...     })
>>> oneLep2JSel = oneLeptonSel.refine("hasLep2J", cut=(op.rng_len(jets) >= 2))
>>> plots += oneLep2JSel.makePlots("J1PT", jets[0].pt, EqB(50, 0., 150.))
>>> plots += oneLep2JSel.makePlots("LJ1Mass",
...     (lambda l: op.invariant_mass(jets[0].p4, l.p4)), EqB(50, 0., 200.))
- __init__(parent=None, categories=None, name=None)[source]
Construct a group of related selections
- Parameters:
name – name (optional)
parent – parent CategorizedSelection (optional)
categories – dictionary with a Selection and a candidate (any python object) per category (the key is the category name); see the addCategory() method below
- addCategory(catName, selection, candidate=None)[source]
Add a category
- Parameters:
catName – category name
selection – Selection for this category
candidate – any python object with event-level quantities specific to this category
- makePlots(name, axisVariables, binnings, construct=None, savePerCategory=True, saveCombined=True, combinedPlotType=<class 'bamboo.plots.SummedPlot'>, **kwargs)[source]
Make a plot for all categories, and/or a combined one
- Parameters:
name – plot name (per-category plot names will be "{name}_{category}")
axisVariables – one or more axis variables
binnings – as many binnings as variables
construct – plot factory method; by default the make{N}D method of Plot (with N the number of axis variables)
savePerCategory – save the individual plots (enabled by default)
saveCombined – save the combined plot (enabled by default)
combinedPlotType – combined plot type, SummedPlot by default
- Returns:
a list of plots
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Equivalent of refine(), but for all categories at a time
- Parameters:
name – common part of the name for the new category selections (individual names will be "{name}_{category}")
)cut – cut(s) to add. If callable, the category’s candidate will be passed
weight – weight(s) to add. If callable, the category’s candidate will be passed
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
CategorizedSelection
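As with makePlots(), callable cuts and weights receive the per-category candidate. A minimal sketch, reusing the (assumed) names from the example above:

```python
>>> # hedged sketch: tighten each category with its own candidate lepton;
>>> # oneLeptonSel and the candidate objects come from the example above
>>> tightLepSel = oneLeptonSel.refine("tightLep", cut=(lambda l: l.pt > 25.))
```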
- class bamboo.plots.CutFlowReport(name, selections=None, recursive=False, titles=None, autoSyst=False, cfres=None, printInLog=False)[source]
Collect and print yields at different selection stages, and cut efficiencies
The simplest way to use this, just to get an overview of the number of events passing each selection stage in the log file, is by adding a CutFlowReport("yields", selections=<list of selections>, recursive=True, printInLog=True) to the list of plots. recursive=True will add all parent selections recursively, so only the final selection categories need to be passed to the selections keyword argument.
It is also possible to output a LaTeX yields table, and to specify exactly which selections and row or column headers are used. The CutFlowReport should then be constructed like this:
yields = CutFlowReport("yields")
plots.append(yields)
yields.add(<selection1-or-list-of-selections1>, title=title1)
yields.add(<selection2-or-list-of-selections2>, title=title2)
...
Each yields.add call will then add one entry in the yields table, with the yield being that of the corresponding selection, or the sum over the list (e.g. different categories that should be taken together); the other dimension is the samples (or sample groups). The sample (group) titles and the formatting of the table can be customised in the same way as in plotIt; see printCutFlowReports() for a detailed description of the different options.
- __init__(name, selections=None, recursive=False, titles=None, autoSyst=False, cfres=None, printInLog=False)[source]
Constructor. name is mandatory; all others are optional. For full control, the add() method should be used to add entries.
Using the constructor with a list of Selection instances passed to the selections keyword argument, together with recursive=True, printInLog=True, is the easiest way to get a debugging printout of the numbers of passing events.
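For instance, the debugging printout described above can be requested with a single line (the selection names here are placeholders):

```python
>>> plots.append(CutFlowReport("yields", selections=[muMuSel, elElSel],
...     recursive=True, printInLog=True))
```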
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- readFromResults(resultsFile)[source]
Reconstruct the
CutFlowReport
, reading counters from a results file
- class bamboo.plots.DerivedPlot(name, dependencies, **kwargs)[source]
Base class for a plot with results based on other plots’ results
The dependencies attribute lists the Plot-like objects this one depends on (which may be used e.g. to order operations). The other necessary properties (binnings, titles, labels, etc.) are taken from the keyword arguments to the constructor, or from the first dependency. The produceResults() method, which is called by the backend to retrieve the derived results, should be overridden with the desired calculation.
Typical use cases are summed histograms, background subtraction, etc. (the results for different subjobs are combined with hadd, so derived quantities that require the full statistics should be calculated in the postprocessing step; alternative or additional systematic variations calculated from the existing ones can be added by subclassing Plot).
- collectDependencyResults(fbe, key=None)[source]
helper method: collect all results of the dependencies
- Returns:
[ (nominalResult, {"variation" : variationResult}) ]
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend (none)
fbe – reference to the backend, can be used to retrieve the histograms for the dependencies, e.g. with
collectDependencyResults()
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
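As an illustration of the pattern described above (this subclass is a hypothetical sketch, not part of bamboo; it assumes the dependency results are ROOT histograms with the usual Clone/Add interface), a background-subtracting derived plot could look like:

```python
from bamboo.plots import DerivedPlot

class SubtractedPlot(DerivedPlot):
    """Hypothetical sketch: histogram of the first dependency
    minus that of the second (nominal results only)."""
    def produceResults(self, bareResults, fbe, key=None):
        (hNomA, _), (hNomB, _) = self.collectDependencyResults(fbe, key=key)
        hSub = hNomA.Clone(self.name)  # copy the first nominal histogram
        hSub.Add(hNomB, -1.)           # subtract the second dependency
        return [hSub]
```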
- class bamboo.plots.FactoryBackend[source]
Interface for factory backend (to separate Plots classes from ROOT::RDataFrame part)
- class bamboo.plots.LateSplittingSelection(parent, name, cuts=None, weights=None, autoSyst=True, keepInclusive=None)[source]
A drop-in replacement for Selection to efficiently split a sample
The concept is quite similar to SelectionWithDataDriven, but with very different performance trade-offs: the former creates two parallel branches of the RDF graph, each for its own set of events (with a typically small performance overhead due to duplication), whereas this is for cases where all events should be processed identically until they are filled into histograms (e.g. separating subprocesses based on MC truth). It is worth defining columns with these categories early on, such that the splitting does not need to do it many times for different selections and categories.
- __init__(parent, name, cuts=None, weights=None, autoSyst=True, keepInclusive=None)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- static create(parent, name, splitCuts=None, keepInclusive=True, cut=None, weight=None, autoSyst=True)[source]
Create a selection that will lazily split into categories
- Parameters:
name – name of the new selection (after applying the cut and weight, as in bamboo.plots.Selection.refine())
splitCuts – dictionary of regions; the values should be the cuts that define the region
keepInclusive – also produce the plots without splitting
cut – common selection
weight – common weight
autoSyst – automatically propagate systematic uncertainties
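A usage sketch (the genBJets definition and the region names are assumptions):

```python
>>> # split a ttbar sample by MC truth, keeping the inclusive plots as well
>>> ttSel = LateSplittingSelection.create(noSel, "ttbarSplit",
...     splitCuts={"ttbb": op.rng_len(genBJets) >= 2,
...                "ttlf": op.rng_len(genBJets) < 2},
...     keepInclusive=True)
```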
- class bamboo.plots.Plot(name, variables, selection, binnings, weight=None, title='', axisTitles=(), axisBinLabels=(), plotopts=None, autoSyst=True, key=None)[source]
A Plot object contains all information needed to produce a histogram: the variable(s) to plot, binnings and options (axis titles, optionally some style information), and a reference to a Selection (which holds all cuts and weights to apply for the plot).
Note
All Plot (and Selection) instances need to have a unique name. This name is used to construct output filenames, and internally to define DataFrame columns with readable names. The constructor will raise an exception if an existing name is used.
- __init__(name, variables, selection, binnings, weight=None, title='', axisTitles=(), axisBinLabels=(), plotopts=None, autoSyst=True, key=None)[source]
Generic constructor. Please use the static make1D(), make2D() and make3D() methods, which provide a more convenient interface to construct histograms (filling in some defaults requires knowing the dimensionality).
- clone(name=None, variables=None, selection=None, binnings=None, weight=None, title=None, axisTitles=None, axisBinLabels=None, plotopts=None, autoSyst=True, key=None)[source]
Helper method: create a copy with optional re-setting of attributes
- classmethod make1D(name, variable, selection, binning, **kwargs)[source]
Construct a 1-dimensional histogram plot
- Parameters:
name – unique plot name
variable – x-axis variable expression
selection – the Selection with cuts and weights to apply
binning – x-axis binning
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, taken from plot title by default)
xBinLabels – x-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new Plot instance with a 1-dimensional histogram
- Example:
>>> hasTwoEl = noSel.refine("hasTwoEl", cut=(op.rng_len(t.Electron) >= 2))
>>> mElElPlot = Plot.make1D(
>>>     "mElEl", op.invariant_mass(t.Electron[0].p4, t.Electron[1].p4), hasTwoEl,
>>>     EquidistantBinning(80, 50., 130.), title="Invariant mass of the leading-PT electrons")
- classmethod make2D(name, variables, selection, binnings, **kwargs)[source]
Construct a 2-dimensional histogram plot
- Parameters:
name – unique plot name
variables – x- and y-axis variable expression (iterable, e.g. tuple or list)
selection – the Selection with cuts and weights to apply
binnings – x- and y-axis binnings (iterable, e.g. tuple or list)
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, empty by default)
yTitle – y-axis title (optional, empty by default)
xBinLabels – x-axis bin labels (optional)
yBinLabels – y-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new
Plot
instance with a 2-dimensional histogram
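A usage sketch in the style of the make1D() example above (hasTwoJets, jets and the EqB binning alias are assumptions):

```python
>>> j1PtVsEta = Plot.make2D("j1_pt_vs_eta",
...     (jets[0].eta, jets[0].pt), hasTwoJets,
...     (EqB(25, -2.5, 2.5), EqB(50, 0., 250.)),
...     xTitle="Leading jet #eta", yTitle="Leading jet p_{T} [GeV]")
```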
- classmethod make3D(name, variables, selection, binnings, **kwargs)[source]
Construct a 3-dimensional histogram
- Parameters:
name – unique plot name
variables – x-, y- and z-axis variable expression (iterable, e.g. tuple or list)
selection – the Selection with cuts and weights to apply
binnings – x-, y- and z-axis binnings (iterable, e.g. tuple or list)
weight – per-entry weight (optional, multiplied with the selection weight)
title – plot title
xTitle – x-axis title (optional, empty by default)
yTitle – y-axis title (optional, empty by default)
zTitle – z-axis title (optional, empty by default)
xBinLabels – x-axis bin labels (optional)
yBinLabels – y-axis bin labels (optional)
zBinLabels – z-axis bin labels (optional)
plotopts – dictionary of options to pass directly to plotIt (optional)
autoSyst – automatically add systematic variations (True by default - set to False to turn off)
- Returns:
the new
Plot
instance with a 3-dimensional histogram
- produceResults(bareResults, fbe, key=None)[source]
Trivial implementation of
produceResults()
Subclasses can e.g. calculate additional systematic variation histograms from the existing ones
- Parameters:
bareResults – list of nominal and systematic variation histograms for this
Plot
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
bareResults
- class bamboo.plots.Product(name, key=None)[source]
Interface for output products (plots, counters etc.)
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- class bamboo.plots.Selection(parent, name, cuts=None, weights=None, autoSyst=True)[source]
A Selection object groups a set of selection criteria (cuts) and weight factors that belong to a specific stage of the selection and analysis. Selections should be constructed by calling the refine() method on a ‘root’ selection (which may include overall selections and weights, e.g. a lumi mask for data and pileup reweighting for MC).
Note
All Selection (and Plot) instances need to have a unique name. This name is used internally to define DataFrame columns with readable names. The constructor will raise an exception if an existing name is used.
- __init__(parent, name, cuts=None, weights=None, autoSyst=True)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Create a new selection by adding cuts and/or weight factors
- Parameters:
name – unique name of the new selection
cut – expression (or list of expressions) with additional selection criteria (combined using logical AND)
weight – expression (or list of expressions) with additional weight factors
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
Selection
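Successive refine() calls build up the selection chain; a minimal sketch (jets and btagSF are assumed to be defined earlier in the module):

```python
>>> twoJetSel = noSel.refine("twoJets", cut=(op.rng_len(jets) >= 2))
>>> twoJetSelB = twoJetSel.refine("twoJetsBtagW", weight=btagSF)
```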
- class bamboo.plots.SelectionWithDataDriven(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- A main Selection with the corresponding “shadow” Selection instances for evaluating data-driven backgrounds (alternative cuts and/or weights)
- static create(parent, name, ddSuffix, cut=None, weight=None, autoSyst=True, ddCut=None, ddWeight=None, ddAutoSyst=True, enable=True)[source]
Create a selection with a data-driven shadow selection
Drop-in replacement for a bamboo.plots.Selection.refine() call: the main selection is made from the parent with cut and weight, the shadow selection is made from the parent with ddCut and ddWeight. With enable=False no shadow selection is made (this may help to avoid duplication in the calling code).
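A usage sketch (the cut and weight expressions here are placeholders):

```python
>>> withNP = SelectionWithDataDriven.create(prevSel, "sel2L", "NP",
...     cut=tightLeptonCut, ddCut=looseNotTightCut, ddWeight=fakeRateWeight,
...     enable=enableNonprompt)
```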
- class bamboo.plots.SelectionWithSub(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- A common base class for Selection subclasses with related/alternative/sub-Selection instances attached
A dictionary of additional selections is kept in the sub attribute (it can be None to disable them).
- __init__(parent, name, cuts=None, weights=None, autoSyst=True, sub=None)[source]
- Constructor. Prefer using
refine()
instead (except for the ‘root’ selection)
- Parameters:
parent – backend or parent selection
name – (unique) name of the selection
cuts – iterable of selection criterion expressions (optional)
weights – iterable of weight factors (optional)
- static getSubsForPlot(p, requireActive=False, silent=False)[source]
Helper method: gather the sub-selections for which a plot is produced
- initSub()[source]
- Initialize related selections
(no-op by default, subclasses can request to call this to enable some functionality)
- refine(name, cut=None, weight=None, autoSyst=True)[source]
Create a new selection by adding cuts and/or weight factors
- Parameters:
name – unique name of the new selection
cut – expression (or list of expressions) with additional selection criteria (combined using logical AND)
weight – expression (or list of expressions) with additional weight factors
autoSyst – automatically add systematic variations (True by default - set to False to turn off; note that this would also turn off automatic systematic variations for any selections and plots that derive from the one created by this method)
- Returns:
the new
Selection
- class bamboo.plots.Skim(name, branches, selection, keepOriginal=None, maxSelected=-1, treeName=None, key=None)[source]
Save selected branches for events that pass the selection to a skimmed tree
- __init__(name, branches, selection, keepOriginal=None, maxSelected=-1, treeName=None, key=None)[source]
Skim constructor
- Parameters:
name – name of the skim (also default name of the TTree)
branches – dictionary of branches to keep (name and definition for new branches, or name and None for specific branches from the input tree)
selection – Selection of events to save
keepOriginal – list of branch names to keep, bamboo.plots.Skim.KeepRegex instances with patterns of branch names to keep, or bamboo.plots.Skim.KeepAll to keep all branches from the input tree
maxSelected – maximal number of events to keep (default: no limit)
- Example:
>>> plots.append(Skim("dimuSkim", {
>>>     "run": None,  # copy from input
>>>     "luminosityBlock": None,
>>>     "event": None,
>>>     "dimu_m": op.invariant_mass(muons[0].p4, muons[1].p4),
>>>     "mu1_pt": muons[0].pt,
>>>     "mu2_pt": muons[1].pt,
>>>     }, twoMuSel,
>>>     keepOriginal=[
>>>         Skim.KeepRegex("PV_.*"),
>>>         "nOtherPV",
>>>         Skim.KeepRegex("OtherPV_.*")
>>>         ]))
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend
fbe – reference to the backend
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
- class bamboo.plots.SummedPlot(name, termPlots, **kwargs)[source]
A DerivedPlot implementation that sums histograms
- produceResults(bareResults, fbe, key=None)[source]
Main interface method, called by the backend
- Parameters:
bareResults – iterable of histograms for this plot produced by the backend (none)
fbe – reference to the backend, can be used to retrieve the histograms for the dependencies, e.g. with
collectDependencyResults()
key – key under which the backend stores the results (if any)
- Returns:
an iterable with ROOT objects to save to the output file
Analysis modules
Minimally, bambooRun needs a class with a constructor that takes a single argument (the list of command-line arguments that it does not recognize as its own), and a run method that takes no arguments.
bamboo.analysismodules provides more interesting base classes, starting from AnalysisModule, which implements a large part of the common functionality for loading samples and distributing worker tasks. HistogramsModule specializes this further for modules that output stack histograms, and NanoAODHistoModule supplements this with loading the decorations for NanoAOD, and merging of the counters for generator weights etc.
Note
When defining a base class that should also be usable for other things than only making plots or only making skims (e.g. both of these), it should not inherit from HistogramsModule or SkimmerModule (but the concrete classes should); otherwise a concrete class may end up inheriting from both (at which point the method resolution order will decide whether it behaves as a skimmer or a plotter, and the result may not be obvious).
A typical case should look like this:
class MyBaseClass(NanoAODModule):
... # define addArgs, prepareTree etc.
class MyPlotter(MyBaseClass, HistogramsModule):
...
class MySkimmer(MyBaseClass, SkimmerModule):
...
- class bamboo.analysismodules.AnalysisModule(args)[source]
Base analysis module
Adds common infrastructure for parsing analysis config files and running on a batch system, with customization points for concrete classes to implement
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- customizeAnalysisCfg(analysisCfg)[source]
- Hook to modify the analysis configuration before jobs are created
(only called in driver or non-distributed mode)
- getATree(fileName=None, sampleName=None, config=None)[source]
- Retrieve a representative TTree, e.g. for defining the plots or for interactive inspection, and a dictionary with metadata
- getTasks(analysisCfg, resolveFiles=None, **extraOpts)[source]
Get tasks from the analysis config (and args), called in driver or sequential mode
- Returns:
a list of
SampleTask
instances
- initialize()[source]
Hook for module-specific initialization (called from the constructor after parsing arguments)
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Do postprocessing on the results of the tasks, if needed
should be implemented by concrete modules
- Parameters:
taskList – (inputs, output), kwargs for the tasks (list, string, and dictionary)
config – parsed analysis configuration file
workdir – working directory for the current run
resultsdir – path with the results files
- run()[source]
Main method
Depending on the arguments passed, this will:
- if -i or --interactive is given, call interact() (which could do some initialization and start an IPython shell)
- if --distributed=worker is given, call processTrees() with the appropriate input, output, tree name, lumi mask and run range
- if --distributed=driver is given, or neither (sequential mode): parse the analysis configuration file, construct the tasks with getTasks(), run them (on a batch cluster or in the same process with processTrees()), and finally call postProcess() with the results.
- class bamboo.analysismodules.DataDrivenBackgroundAnalysisModule(args)[source]
AnalysisModule with support for data-driven backgrounds
A number of contributions can be defined, each based on a list of samples or groups needed to evaluate the contribution (typically just data) and a list of samples or groups that should be left out when making the plot with data-driven contributions. The contributions should be defined in the analysis YAML file, with a datadriven block (at the top level) that could look as follows:
datadriven:
  chargeMisID:
    uses: [ data ]
    replaces: [ DY ]
  nonprompt:
    uses: [ data ]
    replaces: [ TTbar ]
The --datadriven command-line switch then allows a scenario for data-driven backgrounds to be specified, i.e. a list of data-driven contributions to include (all and none are also possible; the latter is the default setting). The parsed contributions are available as self.datadrivenContributions, and the scenarios (each a list of contributions) as self.datadrivenScenarios.
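Assuming a datadriven block like the one above, a scenario can then be selected on the command line as follows (the module and config file names are placeholders):

```shell
bambooRun -m myAnalysis.py:MyModule analysis.yml --datadriven=nonprompt
bambooRun -m myAnalysis.py:MyModule analysis.yml --datadriven=all
```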
- class bamboo.analysismodules.DataDrivenBackgroundHistogramsModule(args)[source]
HistogramsModule with support for data-driven backgrounds
See the DataDrivenBackgroundAnalysisModule class for more details about configuring data-driven backgrounds, and the SelectionWithDataDriven class for ensuring that the necessary histograms are filled correctly. HistogramsModule writes the histograms for the data-driven contributions to different files; this class additionally runs plotIt for the different scenarios.
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Do postprocessing on the results of the tasks, if needed
should be implemented by concrete modules
- Parameters:
taskList – (inputs, output), kwargs for the tasks (list, string, and dictionary)
config – parsed analysis configuration file
workdir – working directory for the current run
resultsdir – path with the results files
- class bamboo.analysismodules.DataDrivenContribution(name, config)[source]
Configuration helper class for data-driven contributions
An instance is constructed for each contribution in any of the scenarios by the bamboo.analysismodules.DataDrivenBackgroundAnalysisModule.initialize() method, with the name and configuration dictionary found in the YAML file. The usesSample(), replacesSample() and modifiedSampleConfig() methods can be customised for other things than using the data samples to estimate a background contribution.
- modifiedSampleConfig(sampleName, sampleConfig, lumi=None)[source]
Construct the sample configuration for the reweighted counterpart of a sample
The default implementation assumes a data sample and turns it into a MC sample (the luminosity is set as
generated-events
to avoid changing the normalisation).
- class bamboo.analysismodules.HistogramsModule(args)[source]
Base histogram analysis module
- __init__(args)[source]
Constructor
Defines a plotList member variable, which will store a list of plots (the result of definePlots(), which will be called after prepareTree()). The postProcess() method specifies what to do with the results.
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- definePlots(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define plots on the trees (for a given systematic variation)
should be implemented by concrete modules, and return a list of bamboo.plots.Plot objects. The structure (name, binning) of the histograms should not depend on the sample or era, and the list should be the same for all values (the weights and systematic variations associated with weights or collections may differ for data and different MC samples, so the actual set of histograms will not be identical).
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
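A minimal concrete implementation could look as follows (the muon selection and binning are illustrative; op is the bamboo.treefunctions module):

```python
from bamboo import treefunctions as op
from bamboo.plots import Plot, EquidistantBinning

def definePlots(self, tree, noSel, sample=None, sampleCfg=None):
    # sketch: one histogram of the leading-muon pT for events with a muon
    plots = []
    hasMuon = noSel.refine("hasMuon", cut=(op.rng_len(tree.Muon) >= 1))
    plots.append(Plot.make1D("mu1_pt", tree.Muon[0].pt, hasMuon,
        EquidistantBinning(50, 0., 150.), title="Leading muon p_{T}"))
    return plots
```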
- getPlotList(fileHint=None, sampleHint=None, resultsdir=None, config=None)[source]
Helper method for postprocessing: construct the plot list
The path (and sample name) of an input file can be specified; otherwise the results directory is searched for a skeleton tree. Please note that in the latter case the skeleton file is arbitrary (in practice it probably corresponds to the first sample encountered when running in sequential or --distributed=driver mode), so if the postprocessing depends on things that differ between samples, one needs to be extra careful to avoid surprises.
- Parameters:
fileHint – name of an input file for one of the samples
sampleHint – sample name for the input file passed in fileHint
resultsdir – directory with the produced results files (mandatory if no fileHint and sampleHint are passed)
config – analysis config (to override the default - optional)
- makeBackendAndPlotList(inputFiles, tree=None, certifiedLumiFile=None, runRange=None, sample=None, sampleCfg=None, inputFileLists=None, backend=None)[source]
Prepare the backend and plot list definition (internal helper)
- Parameters:
inputFiles – input file names
tree – key name of the tree inside the files
certifiedLumiFile – lumi mask json file name
runRange – run range to consider (for efficiency of the lumi mask)
sample – sample name (key in the samples block of the configuration file)
sampleCfg – that sample’s entry in the configuration file
inputFileLists – names of files with the input files (optional, to avoid rewriting if this already exists)
backend – type of backend (lazy/default, debug, compiled, distributed)
- Returns:
the backend and plot list (which can be None if run in “onlyprepare” mode)
- mergeCounters(outF, infileNames, sample=None)[source]
Merge counters
should be implemented by concrete modules
- Parameters:
outF – output file (TFile pointer)
infileNames – input file names
sample – sample name
- postProcess(taskList, config=None, workdir=None, resultsdir=None)[source]
Postprocess: run plotIt
The list of plots is created if needed (from a representative file, this enables rerunning the postprocessing step on the results files), and then plotIt is executed
- prepareTree(tree, sample=None, sampleCfg=None, backend=None)[source]
Create decorated tree, selection root (noSel), backend, and (run,LS) expressions
should be implemented by concrete modules
- Parameters:
tree – decorated tree
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
backend – type of backend (lazy/default, debug, compiled, distributed)
- class bamboo.analysismodules.NanoAODHistoModule(args)[source]
- A HistogramsModule for NanoAOD, with decorations and merging of counters from NanoAODModule
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- class bamboo.analysismodules.NanoAODModule(args)[source]
- An AnalysisModule extension for NanoAOD, adding decorations and merging of the counters
- prepareTree(tree, sample=None, sampleCfg=None, description=None, backend=None)[source]
Add NanoAOD decorations, and create an RDataFrame backend
In addition to the arguments needed for the base class prepareTree() method, a description of the tree, and settings for reading systematic variations or corrections from alternative branches, or calculating these on the fly, should be passed, such that the decorations can be constructed accordingly.
- Parameters:
description – description of the tree format, and configuration for reading or calculating systematic variations and corrections: a NanoAODDescription instance (see also bamboo.treedecorators.NanoAODDescription.get())
- class bamboo.analysismodules.NanoAODSkimmerModule(args)[source]
- A SkimmerModule for NanoAOD, with decorations and merging of counters from NanoAODModule
- __init__(args)[source]
Constructor
set up argument parsing, calling
addArgs()
andinitialize()
- Parameters:
args – list of command-line arguments that are not parsed by
bambooRun
- class bamboo.analysismodules.SkimmerModule(args)[source]
Base skimmer module
Left for backwards compatibility; please use a HistogramsModule that defines one or more bamboo.plots.Skim products instead.
- addArgs(parser)[source]
- Hook for adding module-specific argument parsing (receives an argument group),
parsed arguments are available in
self.args
afterwards
- definePlots(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define plots on the trees (for a given systematic variation)
should be implemented by concrete modules, and return a list of
bamboo.plots.Plot
objects. The structure (name, binning) of the histograms should not depend on the sample or era, and the list should be the same in all cases (the weights and systematic variations associated with weights or collections may differ for data and different MC samples, so the actual set of histograms will not be identical).
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
- defineSkimSelection(tree, noSel, sample=None, sampleCfg=None)[source]
Main method: define a selection for the skim
should be implemented by concrete modules, and return a
bamboo.plots.Selection
object
- Parameters:
tree – decorated tree
noSel – base selection
sample – sample name (as in the samples section of the analysis configuration file)
sampleCfg – that sample’s entry in the configuration file
- Returns:
the skim
bamboo.plots.Selection
, and a map{ name: expression }
of branches to store (to store all the branches of the original tree in addition, pass --keepOriginalBranches to bambooRun; individual branches can be added with an entry
entry)
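As an illustration, a skim selecting events with at least two muons and storing the leading muon PT could be defined as follows (a sketch; the muon selection is illustrative):
- Example:
>>> def defineSkimSelection(self, tree, noSel, sample=None, sampleCfg=None):
>>>     muons = op.select(tree.Muon, lambda mu : mu.pt > 20.)
>>>     hasTwoMu = noSel.refine("hasTwoMu", cut=(op.rng_len(muons) >= 2))
>>>     return hasTwoMu, {"mu1_pt": muons[0].pt}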
Tree decorator customisation
Expressions are constructed by executing Python code on decorated versions of
the trees. The bamboo.treedecorators
module contains helper
methods to do so for commonly used formats, e.g. decorateNanoAOD()
for CMS NanoAOD.
- class bamboo.treedecorators.NanoSystematicVarSpec(nomName=None, origName=None, exclVars=None, isCalc=False)[source]
- Interface for classes that specify how to incorporate systematics
or on-the-fly corrections in the decorated tree
See
NanoAODDescription
anddecorateNanoAOD()
- appliesTo(name)[source]
- Return true if this systematic variation requires action
for this variable, group, or collection
- changesTo(name)[source]
Return the new name(s) for a collection or group (assuming appliesTo(name) is True)
- class bamboo.treedecorators.ReadVariableVarWithSuffix(commonName, sep='_', nomName='nominal', exclVars=None)[source]
Read variations of a single branch from branches with the same name with a suffix
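For instance, a branch puWeight with variations stored in puWeight_up and puWeight_down can be picked up with (a sketch, using the default separator and nominal suffix):
- Example:
>>> nanoPUWeightVar = ReadVariableVarWithSuffix("puWeight")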
- class bamboo.treedecorators.ReadJetMETVar(jetsName, metName, jetsNomName='nom', jetsOrigName='raw', metNomName='', metOrigName='raw', jetsExclVars=None, metExclVars=None, bTaggers=None, bTagWPs=None)[source]
Read jet and MET kinematic variations from different branches for automatic systematic uncertainties
- Parameters:
jetsName – jet collection prefix (e.g.
"Jet"
)metName – MET prefix (e.g.
"MET"
)jetsNomName – name of the nominal jet variation (
"nom"
by default)jetsOrigName – name of the original jet variation (
"raw"
by default)
metNomName – name of the nominal MET variation (
"nom"
by default)metOrigName – name of the original jet variation (
"raw"
by default)jetsExclVars – jet variations that are present but should be ignored (if not specified, only
jetsOrigName
is taken, so if specified this should usually be added explicitly)metExclVars – MET variations that are present but should be ignored (if not specified, only
metOrigName
is taken, so if specified this should usually be added explicitly)bTaggers – list of b-tagging algorithms, for scale factors stored in a branch
bTagWPs – list of b-tagging working points, for scale factors stored in a branch (
shape
should be included here, if wanted)
Note
The implementation of automatic systematic variations treats “xyzup” and “xyzdown” independently (since this is the most flexible). If a source of systematic uncertainty should be excluded, both the “up” and “down” variation should then be added to the list of variations to exclude (
jetsExclVars
ormetExclVars
).
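A typical configuration for postprocessed NanoAOD with b-tagging scale factor branches might look like (a sketch; the tagger and working point lists depend on the production):
- Example:
>>> ReadJetMETVar("Jet", "MET",
>>>     bTaggers=["csvv2", "deepcsv"], bTagWPs=["L", "M", "T", "shape"])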
- class bamboo.treedecorators.NanoReadRochesterVar(systName=None)[source]
Read precalculated Rochester correction variations
- Parameters:
systName – name of the systematic uncertainty, if variations should be enabled
- class bamboo.treedecorators.NanoReadTauESVar(systName=None)[source]
Read precalculated Tau energy scale variations
- Parameters:
systName – name of the systematic uncertainty, if variations should be enabled
- class bamboo.treedecorators.CalcCollectionsGroups(nomName='nominal', origName='raw', exclVars=None, changes=None, **colsAndAttrs)[source]
NanoSystematicVarSpec
for on-the-fly corrections and systematic variation calculation
- appliesTo(name)[source]
- Return true if this systematic variation requires action
for this variable, group, or collection
- class bamboo.treedecorators.NanoAODDescription(groups=None, collections=None, systVariations=None)[source]
Description of the expected NanoAOD structure, and configuration for systematics and corrections
- Essentially, a collection of three containers:
collections: a list of collections (by the name of the length leaf)
groups: a list of non-collection groups (by prefix, e.g. HLT_)
systVariations: a list of NanoSystematicVarSpec instances, to configure reading systematic variations from branches, or calculating them on the fly
The recommended way to obtain a configuration is from the factory method
get()
- static get(tag, year='2016', isMC=False, addGroups=None, removeGroups=None, addCollections=None, removeCollections=None, systVariations=None)[source]
Create a suitable NanoAODDescription instance based on a production version
A production version is defined by a tag, data-taking year, and a flag to distinguish data from simulation. Any number of groups or collections can be added or removed from this. The systVariations option configures reading systematic variations from branches, or calculating them on the fly (a list of NanoSystematicVarSpec instances, see above).
- Example:
>>> decorateNanoAOD(tree, NanoAODDescription.get( >>> "v5", year="2016", isMC=True, >>> systVariations=[nanoRochesterCalc, nanoJetMETCalc])) >>> decorateNanoAOD(tree, NanoAODDescription.get( >>> "v5", year="2017", isMC=True, >>> systVariations=[nanoPUWeightVar, nanoReadJetMETVar_METFixEE2017]))
- Parameters:
tag – production version (e.g. “v5”)
year – data-taking year
isMC – simulation or not
addGroups – (optional) list of groups of leaves to add (e.g.
["L1_", "HLT_"]
, if not present)removeGroups – (optional) list of groups of leaves to remove (e.g.
["L1_"]
, if skimmed)addCollections – (optional) list of containers to add (e.g.
["nMyJets"]
)removeCollections – (optional) list of containers to remove (e.g.
["nPhoton", "nTau"]
)systVariations – list of correction or systematic variation on-the-fly calculators or configurations to add (
NanoSystematicVarSpec
instances)
See also
decorateNanoAOD()
- bamboo.treedecorators.decorateNanoAOD(aTree, description=None)[source]
Decorate a CMS NanoAOD Events tree
Variation branches following the NanoAODTools conventions (e.g. Jet_pt_nom) are automatically used (but calculators for the same collection take precedence, if requested).
- Parameters:
aTree – TTree to decorate
description – description of the tree format, and configuration for reading or calculating systematic variations and corrections, a
NanoAODDescription
instance (see also NanoAODDescription.get()
)
Helper functions
The bamboo.analysisutils
module bundles a number of more
specific helper methods that use the tree decorators and integrate with
other components, connect to external services, or are factored out of the
classes in bamboo.analysismodules
to facilitate reuse.
- class bamboo.analysisutils.YMLIncludeLoader(stream)[source]
- Custom yaml loading to support including config files.
Use !include (file) to insert content of file at that position.
- bamboo.analysisutils.addLumiMask(sel, jsonName, runRange=None, runAndLS=None, name='goodlumis')[source]
Refine selection with a luminosity block filter
Typically applied directly to the root selection (for data). runAndLS should be a tuple of expressions with the run number and luminosity block ID. The run range is used to limit the part of the JSON file to consider, see the LumiMask helper class for details.
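A minimal sketch for data, assuming the certified-luminosity JSON path is stored in the per-sample configuration under a (hypothetical) certified_lumi_file key:
- Example:
>>> if not self.isMC(sample):
>>>     noSel = addLumiMask(noSel, sampleCfg["certified_lumi_file"],
>>>         runRange=sampleCfg.get("run_range"),
>>>         runAndLS=(tree.run, tree.luminosityBlock))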
- bamboo.analysisutils.addPrintout(selection, funName, *args)[source]
Call a method with debugging printout, as part of the RDataFrame graph
This method is only meant to work with the default backend, since it works by inserting a
Filter
node that lets all events pass.
- Parameters:
selection – selection for which to add the printout. The function call will be added to the RDataFrame graph in its current state, so if a plot causes a problem this method should be called before defining it.
funName – name of a C++ method to call. This method should always return
true
, and can take any number of arguments.args – arguments to pass to the function
The following example would print the entry and event number for each event that passes some selection.
- Example:
>>> from bamboo.root import gbl
>>> gbl.gInterpreter.Declare("""
... bool bamboo_printEntry(long entry, long event) {
...   std::cout << "Processing entry #" << entry << ": event " << event << std::endl;
...   return true;
... }""")
>>> addPrintout(sel, "bamboo_printEntry", op.extVar("ULong_t", "rdfentry_"), t.event)
- bamboo.analysisutils.configureJets(variProxy, jetType, jec=None, jecLevels='default', smear=None, useGenMatch=True, genMatchDR=0.2, genMatchDPt=3.0, jesUncertaintySources=None, regroupTag='', uncertaintiesFallbackJetType=None, splitJER=False, addHEM2018Issue=False, enableSystematics=None, subjets=None, mcYearForFatJets=None, isTau21DDT=False, jms=None, jmr=None, gms=None, gmr=None, cachedir=None, mayWriteCache=False, isMC=False, backend=None, uName='')[source]
Reapply JEC, set up jet smearing, or prepare JER/JES uncertainties collections
- Parameters:
variProxy – jet variations proxy, e.g.
tree._Jet
jetType – jet type, e.g. AK4PFchs
smear – tag of resolution (and scalefactors) to use for smearing (no smearing is done if unspecified)
jec – tag of the new JEC to apply, or for the JES uncertainties (pass an empty list to jecLevels to produce only the latter without reapplying the JEC)
jecLevels – list of JEC levels to apply (if left out the recommendations are used: L1FastJet, L2Relative, L3Absolute, and also L2L3Residual for data)
jesUncertaintySources – list of jet energy scale uncertainty sources (see the JECUncertaintySources twiki),
"All"
, or"Merged"
, for all regrouped uncertainties (in which case regroupTag can be specified)
regroupTag – version of the regrouped uncertainties to use
uncertaintiesFallbackJetType – jet type from which to use the (regrouped) JES uncertainties if those for jetType are not found (e.g. AK4PFchs, see JME HN)
enableSystematics – filter systematics variations to enable (collection of names or callable that takes the variation name; default: all that are available for MC, none for data)
useGenMatch – use matching to generator-level jets for resolution smearing
genMatchDR – DeltaR for generator-level jet matching (half the cone size is recommended, default is 0.2)
genMatchDPt – maximal relative PT difference (in units of the resolution) between reco and gen jet
splitJER – vary the JER uncertainty independently in six kinematic bins (see the JER uncertainty twiki)
addHEM2018Issue – add a JES uncertainty for the HEM issue in 2018 (see this hypernews post)
subjets – subjets proxy (
tree.SubJet
)
mcYearForFatJets – data-taking year for fat jet parameters ((softdrop) mass scale and resolution; these should not be passed for data). They can also be passed explicitly, see the following parameters. If none are passed, no jet mass scale corrections are applied.
isTau21DDT – if used in combination with mcYearForFatJets, will use different values for the softdrop mass. Warning: differently from nanoAOD-Tools, these will be propagated to the JEC uncertainties, and this combination of settings has not been validated. Please check carefully if you need to use this.
jms – jet mass scale correction (nominal, up, down), for fat jets
jmr – jet mass resolution (nominal, up, down), for fat jets
gms – jet groomed mass scale correction (nominal, up, down), for fat jets, same as
jms
by default
gmr – jet groomed mass resolution (nominal, up, down), for fat jets, same as
jmr
by default
cachedir – alternative root directory to use for the txt files cache, instead of
$XDG_CACHE_HOME/bamboo
(usually ~/.cache/bamboo)
mayWriteCache – flag to indicate if this task is allowed to write to the cache status file (set to False for worker tasks to avoid corruption due to concurrent writes)
isMC – MC or not
backend – backend pointer (returned from
prepareTree()
)
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
- bamboo.analysisutils.configureRochesterCorrection(variProxy, paramsFile, isMC=False, backend=None, uName='')[source]
Apply the Rochester correction for muons
- Parameters:
variProxy – muon variations proxy, e.g.
tree._Muon
for NanoAOD
paramsFile – path of the text file with correction parameters
isMC – MC or not
backend – backend pointer (returned from
prepareTree()
)
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
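A minimal usage sketch (the parameter file name is illustrative):
- Example:
>>> configureRochesterCorrection(tree._Muon, "RoccoR2018.txt",
>>>     isMC=self.isMC(sample), backend=be)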
- bamboo.analysisutils.configureSVfitCalculator(pathToSVfit='', backend=None)[source]
Configure SVfit
- Parameters:
pathToSVfit – path to your SVfit installation
backend – backend pointer (returned from
prepareTree()
)
- bamboo.analysisutils.configureTauESCorrection(variProxy, paramsFile, tauIdAlgo, backend=None)[source]
Apply the energy correction for taus
- Parameters:
variProxy – tau variations proxy, e.g.
tree._Tau
for NanoAOD
paramsFile – path of the JSON file with correction parameters
tauIdAlgo – name of the algorithm for the tau identification, e.g. “DeepTau2017v2p1”
backend – backend pointer (returned from
prepareTree()
)
- bamboo.analysisutils.configureType1MET(variProxy, jec=None, smear=None, isT1Smear=False, useGenMatch=True, genMatchDR=0.2, genMatchDPt=3.0, jesUncertaintySources=None, regroupTag='', splitJER=False, addHEM2018Issue=False, enableSystematics=None, cachedir=None, mayWriteCache=False, isMC=False, backend=None, uName='')[source]
Reapply JEC, set up jet smearing, or prepare JER/JES uncertainties collections
- Parameters:
variProxy – MET variations proxy, e.g.
tree._MET
smear – tag of resolution (and scalefactors) to use for smearing (no smearing is done if unspecified)
isT1Smear – T1Smear (smeared as nominal, all variations with respect to that) if True, otherwise T1 (JES variations with respect to the unsmeared MET, jerup and jerdown variations are nominally smeared)
jec – tag of the new JEC to apply, or for the JES uncertainties
jesUncertaintySources –
list of jet energy scale uncertainty sources (see the JECUncertaintySources twiki),
"All"
, or"Merged"
, for all regrouped uncertainties (in which case regroupTag can be specified)
regroupTag – version of the regrouped uncertainties to use
enableSystematics – filter systematics variations to enable (collection of names or callable that takes the variation name; default: all that are available for MC, none for data)
useGenMatch – use matching to generator-level jets for resolution smearing
genMatchDR – DeltaR for generator-level jet matching (half the cone size is recommended, default is 0.2)
genMatchDPt – maximal relative PT difference (in units of the resolution) between reco and gen jet
splitJER –
vary the JER uncertainty independently in six kinematic bins (see the JER uncertainty twiki)
addHEM2018Issue –
add a JES uncertainty for the HEM issue in 2018 (see this hypernews post)
cachedir – alternative root directory to use for the txt files cache, instead of
$XDG_CACHE_HOME/bamboo
(usually ~/.cache/bamboo)
mayWriteCache – flag to indicate if this task is allowed to write to the cache status file (set to False for worker tasks to avoid corruption due to concurrent writes)
backend – backend pointer (returned from prepareTree())
uName – [deprecated, ignored] unique name for the correction calculator (sample name is a safe choice)
isMC – MC or not
- bamboo.analysisutils.forceDefine(arg, selection, includeSub=True)[source]
Force the definition of an expression as a column at a selection stage
Use only for really computation-intensive operations that need to be precalculated
- Parameters:
arg – expression to define as a column
selection –
Selection
for which the expression should be definedincludeSub – also precalculate for data-driven background ‘shadow’ selections (
bamboo.plots.SelectionWithSub
‘sub’-selections)
- bamboo.analysisutils.getAFileFromAnySample(samples, resolveFiles=None, cfgDir='.')[source]
Helper method: get a file from any sample (minimizing the risk of errors)
Tries to find any sample with:
- a list of files
- a cache file
- a SAMADhi path
- a DAS path
If successful, a single read / query is sufficient to retrieve a file
- bamboo.analysisutils.loadPlotIt(config, plotList, eras=None, workdir='.', resultsdir='.', readCounters=<function <lambda>>, vetoFileAttributes=None, plotDefaults=None)[source]
Load the plotit configuration with the plotIt python library
The plotIt YAML file writing and parsing is skipped in this case (to write the file, the
writePlotIt()
method should be used, with the same arguments).- Parameters:
config – parsed analysis configuration. Only the
configuration
(if present) and eras sections (to get the luminosities) are read.
plotList – list of plots to convert (name and plotopts, combined with the default style)
eras – list of eras to consider (None for all that are in the config)
workdir – output directory
resultsdir – directory with output ROOT files with histograms
readCounters – method to read the sum of event weights from an output file
vetoFileAttributes – list of per-sample keys that should be ignored (those specific to the bamboo part, e.g. job splitting and DAS paths)
plotDefaults – plot defaults to add (added to those from
config["plotIt"]["plotdefaults"]
, with higher precedence if present in both)
- bamboo.analysisutils.makeMultiPrimaryDatasetTriggerSelection(sampleName, datasetsAndTriggers)[source]
Construct a selection that prevents processing multiple times (from different primary datasets)
If an event passes triggers for different primary datasets, it will be taken from the first of those (i.e. the selection will be 'passes one of the triggers that select it for this primary dataset, and not any of those that come before in the input dictionary').
- Parameters:
sampleName – sample name
datasetsAndTriggers – a dictionary
{primary-dataset, set-of-triggers}
, where the key is either a callable that takes a sample name and returns true in case it originates from the corresponding primary datasets, or a string that is the first part of the sample name in that case. The value (second item) can be a single expression (e.g. a trigger flag, or an OR of them), or a list of those (in which case an OR-expression is constructed from them).
- Returns:
an expression to filter the events in the sample with given name
- Example:
>>> if not self.isMC(sample): >>> trigSel = noSel.refine("trigAndPrimaryDataset", >>> cut=makeMultiPrimaryDatasetTriggerSelection(sample, { >>> "DoubleMuon" : [t.HLT.Mu17_TrkIsoVVL_Mu8_TrkIsoVVL, >>> t.HLT.Mu17_TrkIsoVVL_TkMu8_TrkIsoVVL], >>> "DoubleEG" : t.HLT.Ele23_Ele12_CaloIdL_TrackIdL_IsoVL_DZ, >>> "MuonEG" : [t.HLT.Mu23_TrkIsoVVL_Ele12_CaloIdL_TrackIdL_IsoVL, >>> t.HLT.Mu8_TrkIsoVVL_Ele23_CaloIdL_TrackIdL_IsoVL ] >>> }))
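The "first primary dataset that selects the event wins" logic can also be sketched in plain Python, independently of bamboo expressions (all names below are hypothetical illustration names):

```python
def pd_selection(sample_pd, datasets_and_triggers, fired):
    """Decide whether an event belongs to the sample from primary dataset sample_pd.

    datasets_and_triggers: ordered mapping {primary dataset: trigger names}
    fired: set of trigger names that fired for this event
    """
    for pd, triggers in datasets_and_triggers.items():
        if pd == sample_pd:
            # keep only if one of this primary dataset's own triggers fired
            return bool(fired & set(triggers))
        if fired & set(triggers):
            # an earlier primary dataset already selects this event: veto it
            return False
    return False  # sample_pd not listed in the dictionary
```

An event firing triggers of several primary datasets is thus kept only in the first one listed, which is what the constructed filter expression implements on the decorated tree.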
- bamboo.analysisutils.makePileupWeight(puWeights, numTrueInteractions, systName=None, nameHint=None, sel=None, defineOnFirstUse=True)[source]
Construct a pileup weight for MC, based on the weights in a JSON file
- Parameters:
puWeights – path of the JSON file with weights (binned in NumTrueInteractions) for cp3-llbb JSON, or tuple of JSON path and correction name (correctionlib JSON)
numTrueInteractions – expression to get the number of true interactions (Poissonian expectation value for an event)
systName – name of the associated systematic nuisance parameter
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
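A usage sketch with a correctionlib JSON file (the file and correction names are illustrative):
- Example:
>>> puWeight = makePileupWeight(("puweights.json", "Collisions18_goldenJSON"),
>>>     tree.Pileup_nTrueInt, systName="pileup", sel=noSel)
>>> noSel = noSel.refine("withPUWeight", weight=puWeight)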
- bamboo.analysisutils.printCutFlowReports(config, reportList, workdir='.', resultsdir='.', suffix=None, readCounters=<function <lambda>>, eras=('all', None), verbose=False)[source]
Print yields to the log file, and write a LaTeX yields table for each
Samples can be grouped (only for the LaTeX table) by specifying the
yields-group
key (overriding the regular groups
used for plots). The sample (or group) name to use in this table should be specified through the yields-title
sample key. In addition, the following options in the
plotIt
section of the YAML configuration file influence the layout of the LaTeX yields table:
yields-table-stretch
:\arraystretch
value, 1.15 by default
yields-table-align
: orientation, h (default, samples in rows) or v (samples in columns)
yields-table-text-align
: alignment of text in table cells (default: c)
yields-table-numerical-precision-yields
: number of digits after the decimal point for yields (default: 1)
yields-table-numerical-precision-ratio
: number of digits after the decimal point for ratios (default: 2)
- bamboo.analysisutils.readEnvConfig(explName=None)[source]
Read computing environment config file (batch system, storage site etc.)
For using a batch cluster, the [batch] section should have a ‘backend’ key, and there should be a section with the name of the backend (slurm, htcondor…), see bamboo.batch_<backend> for details. The storage site information needed to resolve the PFNs for datasets retrieved from DAS should be specified under the [das] section (sitename and storageroot).
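A minimal environment configuration file might look like the following sketch (only the backend key and the sitename and storageroot keys under [das] are described above; the section for the backend depends on the cluster, see the corresponding bamboo.batch_<backend> module):

```ini
[batch]
backend = slurm

[slurm]
; cluster-specific settings, see bamboo.batch_slurm

[das]
sitename = T2_BE_UCL
storageroot = /storage/data/cms
```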
- bamboo.analysisutils.runPlotIt(cfgName, workdir='.', plotsdir='plots', plotIt='plotIt', eras=('all', None), verbose=False)[source]
Run plotIt
- Parameters:
cfgName – plotIt YAML config file name
workdir – working directory (also the starting point for finding the histograms files,
--i
option)
plotsdir – name of the plots directory inside workdir (plots, by default)
plotIt – path of the plotIt executable
eras – (mode, eras), mode being one of "split", "combined", or "all" (both of the former), and eras a list of era names, or None for all
verbose – print the plotIt command being run
- bamboo.analysisutils.splitVariation(variProxy, variation, regions, nomName='nom')[source]
Split a systematic variation between (kinematic) regions (to decorrelate the nuisance parameter)
- Parameters:
variProxy – jet variations proxy, e.g.
tree._Jet
variation – name of the variation that should be split (e.g. “jer”)
regions – map of region names and selections (for non-collection objects: boolean expression, for collection objects: a callable that returns a boolean for an item from the collection)
nomName – name of the nominal variation (“nom” for postprocessed, “nominal” for calculator)
- Example:
>>> splitVariation(tree._Jet, "jer", {"forward" : lambda j : j.eta > 0., >>> "backward" : lambda j : j.eta < 0.})
- bamboo.analysisutils.writePlotIt(config, plotList, outName, eras=None, workdir='.', resultsdir='.', readCounters=<function <lambda>>, vetoFileAttributes=None, plotDefaults=None)[source]
Combine creation and saving of a plotIt config file
for convenience inside a
HistogramsModule
, the individual parts are also available inbamboo.analysisutils
.- Parameters:
config – parsed analysis configuration. Only the
configuration
(if present) and eras sections (to get the luminosities) are read.
plotList – list of plots to convert (name and plotopts, combined with the default style)
outName – output YAML config file name
eras – valid era list
workdir – output directory
resultsdir – directory with output ROOT files with histograms
readCounters – method to read the sum of event weights from an output file
vetoFileAttributes – list of per-sample keys that should be ignored (those specific to the bamboo part, e.g. job splitting and DAS paths)
plotDefaults – plot defaults to add (added to those from
config["plotIt"]["plotdefaults"]
, with higher precedence if present in both)
Scale factors
The bamboo.scalefactors
module contains helper methods
for configuring scale factors, fake rates etc.
The basic configuration parameter is the JSON file path for a set of scale factors. The two basic types are:
- lepton scale factors (dependent on a number of object variables, e.g. pt and eta)
- jet (b-tagging) scale factors (a grouped set for different flavours, for convenience)
Different values (depending on the data-taking period) can be taken into account by weighting or by randomly sampling.
- class bamboo.scalefactors.BtagSF(taggerName, csvFileName, wp=None, sysType='central', otherSysTypes=None, measurementType=None, getters=None, jesTranslate=None, sel=None, uName=None)[source]
Helper for b- and c-tagging scalefactors using the BTV POG reader
- __call__(jet, nomVar=None, systVars=None)[source]
Evaluate the scalefactor for a jet
Please note that this only gives the applicable scalefactor: to obtain the event weight one of the recipes in the POG twiki should be used.
By default the nominal and systematic variations are taken from the
bamboo.scalefactors.BtagSF
instance, but they can be overridden with the
andsystVars
keyword arguments. Please note that when using split uncertainties (e.g. for the reshaping method) some uncertainties only apply to specific jet flavours (e.g. c-jets) and the csv file contains zeroes for the other flavours. Then the user code should check the jet flavours, and call this method with the appropriate list of variations for each.
- __init__(taggerName, csvFileName, wp=None, sysType='central', otherSysTypes=None, measurementType=None, getters=None, jesTranslate=None, sel=None, uName=None)[source]
- Declare a BTagCalibration (if needed) and BTagCalibrationReader (unique, based on
uName
), and decorate for evaluation
Warning
This function is deprecated. Use correctionlib and helpers in
scalefactors
instead.- Parameters:
taggerName – first argument for
BTagCalibration
csvFileName – name of the CSV file with scalefactors
wp – working point (used as
BTagEntry::OP_{wp.upper()}
)sysType – nominal value systematic type (
"central"
, by default)otherSysTypes – other systematic types to load in the reader
measurementType – dictionary with measurement type per true flavour (B, C, and UDSG), or a string if the same for all (if not specified,
"comb"
will be used for b- and c-jets, andincl
for light-flavour jets)getters – dictionary of methods to get the kinematics and classifier for a jet (the keys
Pt
,Eta
,JetFlavour
, andDiscri
are used. For the former three, the defaults are for NanoAOD)jesTranslate – translation function for JEC systematic variations, from the names in the CSV file to those used for the jets (the default should work for on-the-fly corrections)
sel – a selection in the current graph
uName – unique name, to declare the reader (e.g. sample name)
- Declare a BTagCalibration (if needed) and BTagCalibrationReader (unique, based on
- bamboo.scalefactors.get_bTagSF_fixWP(json_path, tagger, wp, flavour, sel, jet_pt_variation=None, heavy_method='comb', syst_prefix='btagSF_fixWP_', decorr_wps=False, decorr_eras=False, era=None, full_scheme=False, syst_mapping=None, defineOnFirstUse=True)[source]
Build correction evaluator for fixed working point b-tagging scale factors
Loads the b-tagging scale factors as correction object from the JSON file, configures the systematic variations, and returns a callable that can be evaluated on a jet to return the scale factor.
- Parameters:
json_path – JSON file path
tagger – name of the tagger inside the JSON (not the same as in the event!)
wp – working point of the tagger (“L”, “M”, “T”)
flavour – hadron flavour of the jet (0, 4, 5)
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
jet_pt_variation – if specified, only use that specific systematic variation (e.g. the nominal) of the jet pt to evaluate the scale factors. By default, the scale factors are evaluated for each variation.
heavy_method – B-tagging measurement method for heavy-flavour jets (“comb” or “mujets”). For light jets, there is only the “incl” method.
syst_prefix – Prefix to prepend to the names of all resulting b-tagging systematic variations. Variations for light or heavy jets will be prefixed resp. by {syst_prefix}light or {syst_prefix}heavy (unless the full scheme is used).
decorr_wps – If True, insert the working point into the systematic name for the uncorrelated/statistical component. Otherwise, all working points will be taken as fully correlated when using several in the analysis.
decorr_eras – If True, use the scale factor uncertainties split into “uncorrelated” and “correlated” parts, and insert the era name into the variation names. If False, only use the total scale factor uncertainties (not split).
era – Name of era, used in the name of systematic variations if one of decorr_eras or full_scheme is True.
full_scheme – If True, use split uncertainty sources, with the mapping specified via the syst_mapping argument
syst_mapping – Dictionary used to list the systematics to consider, and to map the naming of the full-scheme b-tagging uncertainties to variations defined elsewhere in the analysis, for varying them together when needed (see example below).
defineOnFirstUse – see description in
get_correction()
- Returns:
a callable that takes a jet and returns the correction (with systematic variations as configured here) obtained by evaluating the b-tagging scale factors on the jet
- Example:
>>> btvSF_b = get_bTagSF_fixWP("btv.json", "deepJet", "M", 5, sel, syst_prefix="btagSF_", >>> era="2018UL", full_scheme=True, >>> syst_mapping={ >>> "pileup": "pileup", >>> "type3": None, >>> "jer0": "jer", >>> "jer1": "jer" >>> })
- Will result in the following systematic uncertainties:
btagSF_statistic_2018UL{up/down}: mapped to {up/down}_statistic in the JSON
btagSF_pileup{up/down}: mapped to {up/down}_pileup in the JSON, and correlated with the pileup{up/down} variations in the analysis
btagSF_type3{up/down}: mapped to {up/down}_type3 in the JSON
btagSF_jer0{up/down}: mapped to {up/down}_jer in the JSON, and correlated with the jer0{up/down} variations in the analysis
btagSF_jer1{up/down}: mapped to {up/down}_jer in the JSON, and correlated with the jer1{up/down} variations in the analysis
>>> btvSF_b = get_bTagSF_fixWP("btv.json", "deepJet", "M", 5, sel, syst_prefix="btagSF_", >>> era="2018UL", decorr_wps=True, decorr_eras=True)
- Will result in the following systematic uncertainties:
btagSF_heavy_M_2018UL{up/down}: mapped to {up/down}_uncorrelated in the JSON
btagSF_heavy{up/down}: mapped to {up/down}_correlated in the JSON
- bamboo.scalefactors.get_bTagSF_itFit(json_path, tagger_json, tagger_jet, flavour, sel, jet_pt_variation=None, syst_prefix='btagSF_shape_', decorr_eras=False, era=None, syst_mapping=None, defineOnFirstUse=True)[source]
Build correction evaluator for continuous (iterativeFit) b-tagging scale factors
Loads the b-tagging scale factors as correction object from the JSON file, configures the systematic variations, and returns a callable that can be evaluated on a jet to return the scale factor.
- Parameters:
json_path – JSON file path
tagger_json – name of the tagger inside the JSON
tagger_jet – name of the tagger in the tree
flavour – hadron flavour of the jet (0, 4, 5)
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
jet_pt_variation – see description in
get_bTagSF_fixWP()
syst_prefix – Prefix to prepend to the names of all resulting b-tagging systematic variations.
decorr_eras – If True, insert the era into the variation name for statistical uncertainties
era – Name of era, used in the name of systematic variations if decorr_eras is True
syst_mapping – see description in
get_bTagSF_fixWP()
, with the difference that here the “basic” (non-JES-related) variations are already included no matter what.defineOnFirstUse – see description in
get_correction()
- Returns:
a callable that takes a jet and returns the correction (with systematic variations as configured here) obtained by evaluating the b-tagging scale factors on the jet
- Example:
>>> btvSF_b = get_bTagSF_itFit("btv.json", "deepJet", "btagDeepFlavB", 5, sel,
...     syst_prefix="btagSF_", decorr_eras=True, era="2018UL",
...     syst_mapping={"jesTotal": "jes"})
- Will result in the following systematic uncertainties:
- btagSF_hfstats1_2018UL{up/down}: mapped to {up/down}_hfstats1 in the JSON
- btagSF_hfstats2_2018UL{up/down}: mapped to {up/down}_hfstats2 in the JSON
- btagSF_lfstats1_2018UL{up/down}: mapped to {up/down}_lfstats1 in the JSON
- btagSF_lfstats2_2018UL{up/down}: mapped to {up/down}_lfstats2 in the JSON
- btagSF_hf{up/down}: mapped to {up/down}_hf in the JSON
- btagSF_lf{up/down}: mapped to {up/down}_lf in the JSON
- btagSF_jesTotal{up/down}: mapped to {up/down}_jes in the JSON, and correlated
with the jesTotal{up/down} variations in the analysis
(for c jets, the hf and lf variations are absent and replaced by cferr1 and cferr2)
- bamboo.scalefactors.get_correction(path, correction, params=None, systParam=None, systNomName='nominal', systVariations=None, systName=None, defineOnFirstUse=True, sel=None)[source]
Load a correction from a CMS JSON file
The JSON file is parsed with correctionlib. The contents and structure of a JSON file can be checked with the
correction
script, e.g. correction summary sf.json
- Parameters:
path – JSON file path
correction – name of the correction inside
CorrectionSet
in the JSON file
params – parameter definitions, a dictionary of values or functions
systParam – name of the parameter (category axis) to use for systematic variations
systNomName – name of the systematic category axis to use as nominal
systVariations – systematic variations list or
{variation: name_in_json}
systName – systematic uncertainty name (to prepend to names, if ‘up’ and ‘down’)
defineOnFirstUse – wrap with
defineOnFirstUse()
(to define as a column and reuse afterwards), this is enabled by default since it is usually more efficient
sel – a selection in the current graph (only used to retrieve a pointer to the backend)
- Returns:
a callable that takes
(object)
and returns the correction (with systematic variations, if present and unless a specific variation is requested) obtained by evaluating the remaining parameters with the object
- Example:
>>> elIDSF = get_correction("EGM_POG_SF_UL.json", "electron_cutbased_looseID",
...     params={"pt": lambda el : el.pt, "eta": lambda el : el.eta},
...     systParam="weight", systNomName="nominal", systName="elID",
...     systVariations=("up", "down"))
>>> looseEl = op.select(t.Electron, lambda el : el.looseId)
>>> withDiEl = noSel.refine("withDiEl",
...     cut=(op.rng_len(looseEl) >= 2),
...     weight=[elIDSF(looseEl[0]), elIDSF(looseEl[1])])
- bamboo.scalefactors.get_scalefactor(objType, key, combine=None, additionalVariables=None, sfLib=None, paramDefs=None, lumiPerPeriod=None, periods=None, getFlavour=None, isElectron=False, systName=None, seedFun=None, defineOnFirstUse=True)[source]
Construct a scalefactor callable
Warning
This function is deprecated. Use correctionlib and
get_correction()
instead.
- Parameters:
objType – object type: "lepton", "dilepton", or "jet"
key – key in sfLib (or tuple of keys, in case of a nested dictionary), or JSON path (or list thereof) if sfLib is not specified
is not specifiedsfLib – dictionary (or nested dictionary) of scale factors. A scale factor entry is either a path to a JSON file, or a list of pairs
(periods, path)
, whereperiods
is a list of periods found inlumiPerPeriod
andpath
is a path to the JSON file with the scale factors corresponding to those run periods.combine – combination strategy for combining different run periods (
"weight"
or"sample"
)paramDefs – dictionary of binning variable definitions (name to callable)
additionalVariables – additional binning variable definitions (TODO: remove)
lumiPerPeriod – alternative definitions and relative weights of run periods
periods – Only combine scale factors for those periods
isElectron – if True, will use supercluster eta instead of eta (for "lepton" type only) (TODO: find a better way of doing that)
systName – name of the associated systematic nuisance parameter
seedFun – (only when combining scale factors by sampling) callable to get a random generator seed for an object, e.g.
lambda l : l.idx+42
defineOnFirstUse – wrap with
defineOnFirstUse()
(to define as a column and reuse afterwards), this is enabled by default since it is usually more efficient
- Returns:
a callable that takes
(object, variation="Nominal")
and returns a floating-point number proxy
- bamboo.scalefactors.lumiPerPeriod_default = {'Run2016B': 5750.491, 'Run2016C': 2572.903, 'Run2016D': 4242.292, 'Run2016E': 4025.228, 'Run2016F': 3104.509, 'Run2016G': 7575.824, 'Run2016H': 8650.628, 'Run2017B': 4793.97, 'Run2017C': 9632.746, 'Run2017D': 4247.793, 'Run2017E': 9314.581, 'Run2017F': 13539.905, 'Run2018A': 14027.614, 'Run2018B': 7066.552, 'Run2018C': 6898.817, 'Run2018D': 31747.582, 'Run271036to275783': 6274.191, 'Run275784to276500': 3426.131, 'Run276501to276811': 3191.207, 'Run315264to316360': 8928.97, 'Run316361to325175': 50789.746}
Integrated luminosity (in 1/pb) per data taking period
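As a quick sanity check on the table above, summing the per-period values for a given year gives that year's integrated luminosity. This snippet copies the 2017 entries from `lumiPerPeriod_default` (values as listed above):

```python
# 2017 entries copied from lumiPerPeriod_default (in 1/pb)
run2017 = {
    "Run2017B": 4793.97, "Run2017C": 9632.746, "Run2017D": 4247.793,
    "Run2017E": 9314.581, "Run2017F": 13539.905,
}
total_2017 = sum(run2017.values())  # about 41.5/fb for full 2017
```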
- bamboo.scalefactors.makeBtagWeightItFit(jets, sfGetter)[source]
Construct the full event weight based on b-tagging scale factors (continuous/iterativeFit)
Combines the b-tagging scale factors into the event weight needed to correct the simulation (the event weight can then directly be passed to a selection), by making a product of the scale factors over all jets. See the note about correcting the normalization when using these scale factors.
- Parameters:
jets – the jet collection in the event
sfGetter – a callable that takes the hadron flavour (int) and returns the correction object for the b-tagging scale factors of that jet flavour (i.e., itself a callable that takes the jet and returns the scale factor). See
bamboo.scalefactors.get_bTagSF_itFit()
.
- Returns:
a weight proxy (with all systematic variations configured in the scale factors)
- Example:
>>> btvSF = lambda flav: get_bTagSF_itFit("btv.json", "deepJet", "btagDeepFlavB", flav, ...)
>>> btvWeight = makeBtagWeightItFit(tree.jet, btvSF)
>>> sel_btag = sel.refine("btag", cut=..., weight=btvWeight)
- bamboo.scalefactors.makeBtagWeightMeth1a(jets, tagger, wps, workingPoints, sfGetter, effGetters)[source]
Construct the full event weight based on b-tagging scale factors (fixed working point) and efficiencies
Combines the b-tagging scale factors and MC efficiencies for fixed working points into the event weight needed to correct the simulation (the event weight can then directly be passed to a selection). The weight is computed according to Method 1a, with support for several working points.
While the scale factors can be loaded from the POG-provided JSON files, the efficiencies need to be computed by the analyzers.
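For a single working point, the Method 1a combination can be sketched in plain Python. This is an illustrative sketch only (bamboo builds the equivalent columnar expression internally), using the standard per-jet factors: SF_i for tagged jets and (1 − SF_i·ε_i)/(1 − ε_i) for untagged jets, where ε_i is the MC efficiency.

```python
# Illustrative sketch of the Method 1a event weight for one working point
# (not bamboo's implementation, which works on expression proxies).
def method1a_weight(jets):
    """jets: list of (tagged, eff, sf) tuples per jet.

    weight = prod_{tagged} sf_i * prod_{untagged} (1 - sf_i*eff_i) / (1 - eff_i)
    """
    w = 1.0
    for tagged, eff, sf in jets:
        if tagged:
            w *= sf
        else:
            w *= (1.0 - sf * eff) / (1.0 - eff)
    return w
```

The tagged factor corrects the tagging rate directly, while the untagged factor preserves the overall normalization of the untagged jet population.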
- Parameters:
jets – the jet collection in the event
tagger – the name of the tagger in the event
wps – a list of working points for which the b tagging must be corrected, e.g. ["L", "T"] for computing the weight using scale factors for the L and T working points. Note: always provide the working points in increasing order of “tightness”!
workingPoints – a dictionary providing the working point value for the discriminator
sfGetter – a callable that takes the working point (str) and the hadron flavour (int) and returns the correction object for the b-tagging scale factors of that working point and jet flavour (i.e., itself a callable that takes the jet and returns the scale factor). See
bamboo.scalefactors.get_bTagSF_fixWP()
.effGetters – a dictionary with keys = working points and values = callables that can be evaluated on a jet and return the b-tagging efficiency for that working point. Typically, these would be correction objects parameterized using the jet pt, eta and hadron flavour.
- Returns:
a weight proxy (with all systematic variations configured in the scale factors)
- Example:
>>> btvSF = lambda wp, flav: get_bTagSF_fixWP("btv.json", "deepJet", wp, flav, ...)
>>> btvEff = {"M": get_correction("my_btag_eff.json", ...)}
>>> btvWeight = makeBtagWeightMeth1a(tree.jet, "btagDeepFlavB", ["M"], {"M": 0.2783},
...     btvSF, btvEff)
>>> sel_btag = sel.refine("btag", cut=..., weight=btvWeight)
ROOT utilities
The bamboo.root
module collects a set of thin wrappers around ROOT
methods, and centralizes the import of the Cling interpreter global namespace
in PyROOT. For compatibility, it is recommended that user code use
from bamboo.root import gbl
rather than import ROOT as gbl
or
from cppyy import gbl
.
- bamboo.root.findLibrary(libName)[source]
Check if a library can be found, and return its path in that case
- bamboo.root.loadDependency(bambooLib=None, includePath=None, headers=None, dynamicPath=None, libraries=None)[source]
Load a C++ extension
- Parameters:
bambooLib – name(s) of the bamboo extension libraries, if any
includePath – include directory for headers
headers – headers to load explicitly (which can depend on other headers in the include path)
dynamicPath – dynamic library path to add
libraries – additional shared libraries to load