Building expressions
In order to efficiently process the input files, bamboo builds up an object representation
of the expressions (cuts, weights, plotted variables) needed to fill the histograms, and
dynamically generates C++ code that is passed to RDataFrame.
The expression trees are built up through proxy classes, which mimic the final type (there are
e.g. integer and floating-point number proxy classes that overload the basic mathematical operators),
and generate a new proxy when called.
As an example: t.Muon[0].charge gives an integer proxy to the operation corresponding
to Muon_charge[0]; when the addition operator is called in t.Muon[0].charge+t.Muon[1].charge,
an integer proxy to (the object representation of) Muon_charge[0]+Muon_charge[1] is returned.
The proxy classes try to behave as much as possible as the objects they represent, so in most cases
they can be used as if they really were a number, boolean, momentum fourvector… or a muon,
electron, jet etc.—simple ‘struct’ types for those are generated when decorating the tree,
based on the branches that are found.
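The proxy mechanism described above can be illustrated with a toy model. This is only a sketch of the idea (operator overloading that returns a new proxy instead of computing a value); `ToyProxy` and `ToyMuon` are hypothetical names that do not exist in bamboo, and the real proxy classes carry type information and an operation tree rather than a plain string.

```python
class ToyProxy:
    """Toy stand-in for a number proxy: it only records a C++ expression string."""

    def __init__(self, expr):
        self.expr = expr

    @staticmethod
    def _text(value):
        # Proxies contribute their expression, plain Python numbers their literal
        return value.expr if isinstance(value, ToyProxy) else repr(value)

    def __add__(self, other):
        # No arithmetic happens here: a new proxy for the combined expression is returned
        return ToyProxy(f"({self.expr} + {self._text(other)})")

    def __gt__(self, other):
        return ToyProxy(f"({self.expr} > {self._text(other)})")


class ToyMuon:
    """Toy 'struct' type: attribute access yields a proxy to the indexed branch."""

    def __init__(self, index):
        self.index = index

    @property
    def charge(self):
        return ToyProxy(f"Muon_charge[{self.index}]")


total_charge = ToyMuon(0).charge + ToyMuon(1).charge
print(total_charge.expr)        # (Muon_charge[0] + Muon_charge[1])
print((total_charge > 0).expr)  # ((Muon_charge[0] + Muon_charge[1]) > 0)
```

Nothing is evaluated in Python in this model: each operator call only extends the recorded expression, which is what allows code to be generated from it later.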
Some operations, however, cannot easily be implemented in this way, for instance mathematical functions and
operations on containers. Therefore, the bamboo.treefunctions module provides a set of
additional helper methods (such that the user does not need to know about the implementation details
in the bamboo.treeoperations and bamboo.treeproxies modules). In order to keep
the analysis code compact, it is recommended to import it with
from bamboo import treefunctions as op
inside every analysis module. The available functions are listed below.
List of functions
- bamboo.treefunctions.typeOf(arg)[source]
Get the inferred C++ type of a bamboo expression (proxy or TupleOp)
- bamboo.treefunctions.c_int(num, typeName='int', cast=None)[source]
Construct an integer number constant (static_cast inserted automatically if not ‘int’,
a boolean can be passed to ‘cast’ to force or disable this)
- bamboo.treefunctions.c_float(num, typeName='double', cast=None)[source]
Construct a floating-point number constant (static_cast inserted automatically if not ‘double’,
a boolean can be passed to ‘cast’ to force or disable this)
- bamboo.treefunctions.switch(test, trueBranch, falseBranch, checkTypes=True)[source]
Pick one or another value, based on a third one (ternary operator in C++)
- Example:
>>> op.switch(runOnMC, mySF, 1.) ## incomplete pseudocode
- bamboo.treefunctions.multiSwitch(*args)[source]
Construct arbitrary-length switch (if-elif-elif-…-else sequence)
- Example:
>>> op.multiSwitch((lepton.pt > 30, 4.), (op.AND(lepton.pt > 15, op.abs(lepton.eta) < 2.1), 5.), 3.)
is equivalent to:
>>> if lepton.pt > 30:
>>>     return 4.
>>> elif lepton.pt > 15 and abs(lepton.eta) < 2.1:
>>>     return 5.
>>> else:
>>>     return 3.
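The branch ordering of such a call can be checked with a plain-Python reference that works on ordinary numbers instead of proxies. This is a sketch of the semantics only: `multi_switch` and `lepton_value` are hypothetical helpers, not part of bamboo, and the values follow the `op.multiSwitch` call above (4. for the first condition, 5. for the second, 3. as default).

```python
def multi_switch(*args):
    """(condition, value) pairs, tested in order, followed by a default value."""
    *branches, default = args
    for condition, value in branches:
        if condition:
            return value  # the first passing condition wins
    return default


def lepton_value(pt, eta):
    return multi_switch((pt > 30, 4.), (pt > 15 and abs(eta) < 2.1, 5.), 3.)


print(lepton_value(40., 0.5))  # 4.0
print(lepton_value(20., 1.0))  # 5.0
print(lepton_value(20., 2.5))  # 3.0
```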
- bamboo.treefunctions.extMethod(name, returnType=None)[source]
Retrieve a (non-member) C(++) method
- Parameters:
name – name of the method
returnType – return type (otherwise deduced by introspection)
- Returns:
a method proxy, that can be called and returns a value decorated as the return type of the method
- Example:
>>> phi_0_2pi = op.extMethod("ROOT::Math::VectorUtil::Phi_0_2pi")
>>> dphi_2pi = phi_0_2pi(a.Phi()-b.Phi())
- bamboo.treefunctions.extVar(typeName, name)[source]
Use a variable or object defined outside bamboo
- Parameters:
typeName – C++ type name
name – name in the current scope
- Returns:
a proxy to the variable or object
- bamboo.treefunctions.construct(typeName, args)[source]
Construct an object
- Parameters:
typeName – C++ type name
args – constructor arguments
- Returns:
a proxy to the constructed object
- bamboo.treefunctions.static_cast(typeName, arg)[source]
Compile-time type conversion
mostly for internal use, prefer higher-level functions where possible
- Parameters:
typeName – C++ type to cast to
arg – value to cast
- Returns:
a proxy to the casted value
- bamboo.treefunctions.initList(typeName, valueType, elements)[source]
Construct a C++ initializer list
mostly for internal use, prefer higher-level functions where possible
- Parameters:
typeName – C++ type to use for the proxy (note that initializer lists do not have a type)
valueType – C++ type of the elements in the list
elements – list elements
- Returns:
a proxy to the list
- bamboo.treefunctions.array(valueType, *elements)[source]
Helper to make constructing a std::array easier
- Parameters:
valueType – array element C++ type
elements – array elements
- Returns:
a proxy to the array
- bamboo.treefunctions.define(typeName, definition, nameHint=None)[source]
Define a variable as a symbol with the interpreter
- Parameters:
typeName – result type name
definition – C++ definition string, with <<name>> instead of the variable name (which will be replaced by nameHint or a unique name)
nameHint – (optional) name for the variable
Caution
nameHint (if given) should be unique (for the sample), otherwise an exception will be thrown
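The role of the <<name>> placeholder can be sketched in plain Python. This is a toy model under stated assumptions, not bamboo's implementation: `fill_definition`, the `myVar` prefix for generated names, and the use of `ValueError` are all hypothetical choices made for the illustration.

```python
import itertools

_unique_ids = itertools.count()
_used_names = set()


def fill_definition(definition, name_hint=None):
    """Replace <<name>> by the hint, or by a generated unique name (toy model)."""
    name = name_hint if name_hint is not None else f"myVar{next(_unique_ids)}"
    if name in _used_names:
        # Mirrors the caution above: a repeated name is an error
        raise ValueError(f"name {name!r} is already in use")
    _used_names.add(name)
    return name, definition.replace("<<name>>", name)


name, code = fill_definition("const double <<name>> = 80.4;", name_hint="mW")
print(name)  # mW
print(code)  # const double mW = 80.4;
```

Writing the definition with a placeholder keeps the C++ string independent of the final symbol name, so the same template can be declared more than once under different names.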
- bamboo.treefunctions.defineOnFirstUse(sth)[source]
Construct an expression that will be precalculated (with an RDataFrame::Define node) when first used
This may be useful for expensive function calls, and should prevent double work in most cases. Sometimes it is useful to insert the Define node explicitly; in that case bamboo.analysisutils.forceDefine() can be used.
- bamboo.treefunctions.abs(sth)[source]
Return the absolute value
- Example:
>>> op.abs(t.Muon[0].p4.Eta())
- bamboo.treefunctions.sign(sth)[source]
Return the sign of a number
- Example:
>>> op.sign(t.Muon[0].p4.Eta())
- bamboo.treefunctions.sum(*args, **kwargs)[source]
Return the sum of the arguments
- Example:
>>> op.sum(t.Muon[0].p4.Eta(), t.Muon[1].p4.Eta())
- bamboo.treefunctions.product(*args)[source]
Return the product of the arguments
- Example:
>>> op.product(t.Muon[0].p4.Eta(), t.Muon[1].p4.Eta())
- bamboo.treefunctions.sqrt(sth)[source]
Return the square root of a number
- Example:
>>> m1, m2 = t.Muon[0].p4, t.Muon[1].p4
>>> m12dR = op.sqrt( op.pow(m1.Eta()-m2.Eta(), 2) + op.pow(m1.Phi()-m2.Phi(), 2) )
- bamboo.treefunctions.pow(base, exp)[source]
Return a power of a number
- Example:
>>> m1, m2 = t.Muon[0].p4, t.Muon[1].p4
>>> m12dR = op.sqrt( op.pow(m1.Eta()-m2.Eta(), 2) + op.pow(m1.Phi()-m2.Phi(), 2) )
- bamboo.treefunctions.exp(sth)[source]
Return the exponential of a number
- Example:
>>> op.exp(op.abs(t.Muon[0].p4.Eta()))
- bamboo.treefunctions.log(sth)[source]
Return the natural logarithm of a number
- Example:
>>> op.log(t.Muon[0].p4.Pt())
- bamboo.treefunctions.log10(sth)[source]
Return the base-10 logarithm of a number
- Example:
>>> op.log10(t.Muon[0].p4.Pt())
- bamboo.treefunctions.sin(sth)[source]
Return the sine of a number
- Example:
>>> op.sin(t.Muon[0].p4.Phi())
- bamboo.treefunctions.cos(sth)[source]
Return the cosine of a number
- Example:
>>> op.cos(t.Muon[0].p4.Phi())
- bamboo.treefunctions.tan(sth)[source]
Return the tangent of a number
- Example:
>>> op.tan(t.Muon[0].p4.Phi())
- bamboo.treefunctions.asin(sth)[source]
Return the arcsine of a number
- Example:
>>> op.asin(op.c_float(3.1415))
- bamboo.treefunctions.acos(sth)[source]
Return the arccosine of a number
- Example:
>>> op.acos(op.c_float(3.1415))
- bamboo.treefunctions.atan(sth)[source]
Return the arctangent of a number
- Example:
>>> op.atan(op.c_float(3.1415))
- bamboo.treefunctions.max(a1, a2)[source]
Return the maximum of two numbers
- Example:
>>> op.max(op.abs(t.Muon[0].eta), op.abs(t.Muon[1].eta))
- bamboo.treefunctions.min(a1, a2)[source]
Return the minimum of two numbers
- Example:
>>> op.min(op.abs(t.Muon[0].eta), op.abs(t.Muon[1].eta))
- bamboo.treefunctions.in_range(low, arg, up)[source]
Check if a value is inside a range (boundaries excluded)
- Example:
>>> op.in_range(10., t.Muon[0].p4.Pt(), 20.)
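As a plain-Python reading of "boundaries excluded", both comparisons are strict. The helper below is a hypothetical stand-in that works on ordinary numbers rather than proxies.

```python
def in_range(low, arg, up):
    """True only if arg lies strictly between low and up."""
    return low < arg < up


print(in_range(10., 15., 20.))  # True
print(in_range(10., 10., 20.))  # False, the lower boundary itself is excluded
print(in_range(10., 20., 20.))  # False, the upper boundary itself is excluded
```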
- bamboo.treefunctions.withMass(arg, massVal)[source]
Construct a Lorentz vector with given mass (taking the other components from the input)
- Example:
>>> pW = op.withMass((j1.p4+j2.p4), 80.4)
- bamboo.treefunctions.invariant_mass(*args)[source]
Calculate the invariant mass of the arguments
- Example:
>>> mElEl = op.invariant_mass(t.Electron[0].p4, t.Electron[1].p4)
Note
Unlike in the example above,
bamboo.treefunctions.combine()
should be used to make N-particle combinations in most practical cases
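For reference, the quantity being calculated is M² = (ΣE)² − |Σp|². The sketch below evaluates it in plain Python for vectors given as (pT, η, φ, m); `four_vector` and `invariant_mass` are hypothetical helpers for illustration and do not reflect how bamboo or ROOT implement the calculation.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian components (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (px, py, pz, energy)


def invariant_mass(*vectors):
    """Invariant mass of the sum of any number of (px, py, pz, E) vectors."""
    px, py, pz, energy = (sum(component) for component in zip(*vectors))
    m2 = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Two massless, back-to-back particles with pt = 45.6 each
el1 = four_vector(45.6, 0.0, 0.0, 0.0)
el2 = four_vector(45.6, 0.0, math.pi, 0.0)
m_elel = invariant_mass(el1, el2)
print(round(m_elel, 6))  # 91.2
```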
- bamboo.treefunctions.invariant_mass_squared(*args)[source]
Calculate the squared invariant mass of the arguments using ROOT::Math::VectorUtil::InvariantMass2
- Example:
>>> m2ElEl = op.invariant_mass_squared(t.Electron[0].p4, t.Electron[1].p4)
- bamboo.treefunctions.deltaPhi(a1, a2)[source]
Calculate the difference in azimuthal angles (using ROOT::Math::VectorUtil::DeltaPhi)
- Example:
>>> elelDphi = op.deltaPhi(t.Electron[0].p4, t.Electron[1].p4)
- bamboo.treefunctions.deltaR(a1, a2)[source]
Calculate the Delta R distance (using ROOT::Math::VectorUtil::DeltaR)
- Example:
>>> elelDR = op.deltaR(t.Electron[0].p4, t.Electron[1].p4)
- bamboo.treefunctions.rng_len(sth)[source]
Get the number of elements in a range
- Parameters:
rng – input range
- Example:
>>> nElectrons = op.rng_len(t.Electron)
- bamboo.treefunctions.rng_sum(rng, fun=<function <lambda>>, start=None)[source]
Sum the values of a function over a range
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
start – initial value (0. by default)
- Example:
>>> totalMuCharge = op.rng_sum(t.Muon, lambda mu : mu.charge)
- bamboo.treefunctions.rng_count(rng, pred=None)[source]
Count the number of elements passing a selection
- Parameters:
rng – input range
pred – selection predicate (a callable that takes an element of the range and returns a boolean)
- Example:
>>> nCentralMu = op.rng_count(t.Muon, lambda mu : op.abs(mu.p4.Eta()) < 2.4)
- bamboo.treefunctions.rng_product(rng, fun=<function <lambda>>)[source]
Calculate the product of a function over a range
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Example:
>>> overallMuChargeSign = op.rng_product(t.Muon, lambda mu : mu.charge)
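The reductions rng_sum, rng_count and rng_product correspond to familiar Python idioms on an ordinary list. The sketch below uses a made-up list of dictionaries in place of t.Muon; it shows the intended semantics only, since the bamboo versions build expressions that are evaluated per event in the generated code.

```python
from math import prod

# Made-up stand-in for the muons of a single event
muons = [
    {"charge": +1, "eta": 0.3},
    {"charge": -1, "eta": -2.6},
    {"charge": -1, "eta": 1.9},
]

# rng_sum: sum of a function over the range, starting from 0.
total_charge = sum((mu["charge"] for mu in muons), 0.)

# rng_count: number of elements passing a predicate
n_central = sum(1 for mu in muons if abs(mu["eta"]) < 2.4)

# rng_product: product of a function over the range
charge_sign = prod(mu["charge"] for mu in muons)

print(total_charge, n_central, charge_sign)  # -1.0 2 1
```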
- bamboo.treefunctions.rng_max(rng, fun=<function <lambda>>)[source]
Find the highest value of a function in a range
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Example:
>>> mostForwardMuEta = op.rng_max(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.rng_min(rng, fun=<function <lambda>>)[source]
Find the lowest value of a function in a range
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Example:
>>> mostCentralMuEta = op.rng_min(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.rng_max_element_index(rng, fun=<function <lambda>>)[source]
Find the index of the element for which the value of a function is maximal
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Returns:
the index of the maximal element in the base collection if rng is a collection, otherwise (e.g. if rng is a vector or array proxy) the index of the maximal element in rng
- Example:
>>> i_mostForwardMu = op.rng_max_element_index(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.rng_max_element_by(rng, fun=<function <lambda>>)[source]
Find the element for which the value of a function is maximal
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Example:
>>> mostForwardMu = op.rng_max_element_by(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.rng_min_element_index(rng, fun=<function <lambda>>)[source]
Find the index of the element for which the value of a function is minimal
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Returns:
the index of the minimal element in the base collection if rng is a collection, otherwise (e.g. if rng is a vector or array proxy) the index of the minimal element in rng
- Example:
>>> i_mostCentralMu = op.rng_min_element_index(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.rng_min_element_by(rng, fun=<function <lambda>>)[source]
Find the element for which the value of a function is minimal
- Parameters:
rng – input range
fun – function whose value should be used (a callable that takes an element of the range and returns a number)
- Example:
>>> mostCentralMu = op.rng_min_element_by(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
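The distinction made for the *_element_index functions (index in the base collection, rather than position in the range that was passed) can be illustrated with a toy model. Here a selection is represented as a list of indices into the base collection; this representation and the helper names are assumptions made for the sketch.

```python
# Made-up base collection for a single event
muons = [{"eta": 0.3}, {"eta": -2.6}, {"eta": 1.9}, {"eta": -0.1}]

# Toy selection: indices (into the base collection) of the central muons
selected = [i for i, mu in enumerate(muons) if abs(mu["eta"]) < 2.4]


def max_element_index(indices, fun):
    """Base-collection index of the selected element that maximises fun."""
    return max(indices, key=lambda i: fun(muons[i]))


def min_element_index(indices, fun):
    """Base-collection index of the selected element that minimises fun."""
    return min(indices, key=lambda i: fun(muons[i]))


i_forward = max_element_index(selected, lambda mu: abs(mu["eta"]))
i_central = min_element_index(selected, lambda mu: abs(mu["eta"]))

print(selected)              # [0, 2, 3]
print(i_forward, i_central)  # 2 3
print(muons[i_forward])      # {'eta': 1.9}
```

In this model the most forward selected muon is at position 1 within the selection but has index 2 in the base collection, and it is the latter that is returned.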
- bamboo.treefunctions.rng_mean(rng)[source]
Return the mean of a range
- Parameters:
rng – input range
- Example:
>>> pdf_mean = op.rng_mean(t.LHEPdfWeight)
- bamboo.treefunctions.rng_stddev(rng)[source]
Return the (sample) standard deviation of a range
- Parameters:
rng – input range
- Example:
>>> pdf_uncertainty = op.rng_stddev(t.LHEPdfWeight)
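On an ordinary list of numbers, the mean and standard deviation can be reproduced with the standard library. The sketch below assumes that "sample" refers to the n − 1 normalisation (as in `statistics.stdev`); the weights are made-up numbers.

```python
import math
import statistics

# Made-up stand-in for the values of a weight array in one event
weights = [1.0, 2.0, 3.0, 4.0]

mean = statistics.mean(weights)
sample_stddev = statistics.stdev(weights)       # divides by n - 1
population_stddev = statistics.pstdev(weights)  # divides by n, shown for comparison

print(mean)                         # 2.5
print(round(sample_stddev, 4))      # 1.291
print(round(population_stddev, 4))  # 1.118
```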
- bamboo.treefunctions.rng_any(rng, pred=<function <lambda>>)[source]
Test if any item in a range passes a selection
- Parameters:
rng – input range
pred – selection predicate (a callable that takes an element of the range and returns a boolean)
- Example:
>>> hasCentralMu = op.rng_any(t.Muon, lambda mu : op.abs(mu.p4.Eta()) < 2.4)
- bamboo.treefunctions.rng_find(rng, pred=<function <lambda>>)[source]
Find the first item in a range that passes a selection
- Parameters:
rng – input range
pred – selection predicate (a callable that takes an element of the range and returns a boolean)
- Example:
>>> leadCentralMu = op.rng_find(t.Muon, lambda mu : op.abs(mu.p4.Eta()) < 2.4)
- bamboo.treefunctions.select(rng, pred=<function <lambda>>)[source]
Select elements from the range that pass a cut
- Parameters:
rng – input range
pred – selection predicate (a callable that takes an element of the range and returns a boolean)
- Example:
>>> centralMuons = op.select(t.Muon, lambda mu : op.abs(mu.p4.Eta()) < 2.4)
- bamboo.treefunctions.sort(rng, fun=<function <lambda>>)[source]
Sort the range (ascendingly) by the value of a function applied on each element
- Parameters:
rng – input range
fun – function by whose value the elements should be sorted
- Example:
>>> muonsByCentrality = op.sort(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
- bamboo.treefunctions.map(rng, fun, valueType=None)[source]
Create a list of derived values for a collection
This is useful for storing a derived quantity for each item of a collection on a skim, and also for filling a histogram for each entry in a collection.
- Parameters:
rng – input range
fun – function to calculate derived values
valueType – stored return type (optional, fun(rng[i]) should be convertible to this type)
- Example:
>>> muon_absEta = op.map(t.Muon, lambda mu : op.abs(mu.p4.Eta()))
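The container operations select, sort and map have direct analogues on a Python list, which may help when reading the examples. The sketch below uses made-up muons and plain Python; it illustrates the semantics only and does not use bamboo.

```python
# Made-up stand-in for the muons of a single event
muons = [
    {"pt": 40., "eta": -2.6},
    {"pt": 25., "eta": 1.9},
    {"pt": 12., "eta": 0.3},
]

# select: keep the elements that pass a predicate (original order is kept)
central = [mu for mu in muons if abs(mu["eta"]) < 2.4]

# sort: ascending by the value of a function applied to each element
by_centrality = sorted(muons, key=lambda mu: abs(mu["eta"]))

# map: one derived value per element
abs_eta = [abs(mu["eta"]) for mu in muons]

print([mu["pt"] for mu in central])        # [25.0, 12.0]
print([mu["pt"] for mu in by_centrality])  # [12.0, 25.0, 40.0]
print(abs_eta)                             # [2.6, 1.9, 0.3]
```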
- bamboo.treefunctions.rng_pickRandom(rng, seed=0)[source]
Pick a random element from a range
- Parameters:
rng – range to pick an element from
seed – seed for the random generator
Caution
empty placeholder, to be implemented
- bamboo.treefunctions.svFitMTT(MET, lepton1, lepton2, category)[source]
Calculate the mass of the reconstructed tau pair using the SVfit algorithm. It employs the ClassicSVfit method.
- Parameters:
MET – Missing transverse energy. It must include a covariance matrix.
lepton1 – 1st lepton (e/mu/tau) from the tau pair.
lepton2 – 2nd lepton (e/mu/tau) from the tau pair.
category – Tau pair category. Only “1mu1tau,” “1ele1tau,” and “2tau” are supported.
Caution
This function works only if the SVfit package is installed.
- bamboo.treefunctions.svFitFastMTT(MET, lepton1, lepton2, category)[source]
Calculate the four-vector of the reconstructed tau pair using the SVfit algorithm. It employs the FastMTT method.
- Parameters:
MET – Missing transverse energy. It must include a covariance matrix.
lepton1 – 1st lepton (e/mu/tau) from the tau pair.
lepton2 – 2nd lepton (e/mu/tau) from the tau pair.
category – Tau pair category. Only “1mu1tau,” “1ele1tau,” and “2tau” are supported.
Caution
This function works only if the SVfit package is installed.
- bamboo.treefunctions.combine(rng, N=None, pred=<function <lambda>>, samePred=<function <lambda>>)[source]
Create N-particle combination from one or several ranges
- Parameters:
rng – range (or iterable of ranges) with basic objects to combine
N – number of objects to combine (at least 2), in case of multiple ranges it does not need to be given (len(rng) will be taken; if specified they should match)
pred – selection to apply to candidates (a callable that takes the constituents and returns a boolean)
samePred – additional selection for objects from the same base container (a callable that takes two objects and returns a boolean, it needs to be true for any sorted pair of objects from the same container in a candidate combination). The default avoids duplicates by keeping the indices (in the base container) sorted; None will not apply any selection, and consider all combinations, including those with the same object repeated.
- Example:
>>> osdimu = op.combine(t.Muon, N=2, pred=lambda mu1,mu2 : mu1.charge != mu2.charge)
>>> firstosdimu = osdimu[0]
>>> firstosdimu_Mll = op.invariant_mass(firstosdimu[0].p4, firstosdimu[1].p4)
>>> oselmu = op.combine((t.Electron, t.Muon), pred=lambda el,mu : el.charge != mu.charge)
>>> trijet = op.combine(t.Jet, N=3, samePred=lambda j1,j2 : j1.pt > j2.pt)
>>> trijet = op.combine(
>>>     t.Jet, N=3, pred=lambda j1,j2,j3 : op.AND(j1.pt > j2.pt, j2.pt > j3.pt), samePred=None)
Note
The default value for samePred undoes the sorting that may have been applied between the base container(s) and the argument(s) in rng. The third and fourth examples above are equivalent, and show how to get three-jet combinations, with the jets sorted by decreasing pT. The latter is more efficient since it avoids the unnecessary comparison j1.pt > j3.pt, which follows from the other two. In that case no other sorting should be done, otherwise combinations will only be retained if sorted by both criteria; this can be done by passing samePred=None. samePred=(lambda o1,o2 : o1.idx != o2.idx) can be used to get all permutations.
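The interplay of pred and samePred can be made concrete with a brute-force reference for the single-range case. This is a sketch under stated assumptions: `toy_combine` is a hypothetical helper on lists of dictionaries, each carrying an `idx` key for its index in the base container, and it ignores the multi-range case and any efficiency concerns.

```python
from itertools import product


def toy_combine(rng, N, pred=lambda *objs: True,
                same_pred=lambda o1, o2: o1["idx"] < o2["idx"]):
    """All N-element candidates from one range that pass same_pred and pred."""
    candidates = []
    for cand in product(rng, repeat=N):
        # same_pred must hold for every ordered pair (i < j) in the candidate
        if same_pred is not None and not all(
                same_pred(cand[i], cand[j])
                for i in range(N) for j in range(i + 1, N)):
            continue
        if pred(*cand):
            candidates.append(cand)
    return candidates


muons = [{"idx": 0, "charge": +1}, {"idx": 1, "charge": -1}, {"idx": 2, "charge": +1}]
pairs = toy_combine(muons, 2)                   # default: indices kept sorted
everything = toy_combine(muons, 2, same_pred=None)
permutations = toy_combine(muons, 2, same_pred=lambda o1, o2: o1["idx"] != o2["idx"])
os_pairs = toy_combine(muons, 2, pred=lambda m1, m2: m1["charge"] != m2["charge"])
print(len(pairs), len(everything), len(permutations), len(os_pairs))  # 3 9 6 2

jets = [{"idx": 0, "pt": 50.}, {"idx": 1, "pt": 30.}, {"idx": 2, "pt": 40.}]
trijet_same = toy_combine(jets, 3, same_pred=lambda j1, j2: j1["pt"] > j2["pt"])
trijet_pred = toy_combine(
    jets, 3, same_pred=None,
    pred=lambda j1, j2, j3: j1["pt"] > j2["pt"] and j2["pt"] > j3["pt"])
print([j["pt"] for j in trijet_pred[0]])  # [50.0, 40.0, 30.0]
```

In this toy model the two trijet variants give the same single combination; the `pred` version checks two comparisons per candidate where the `same_pred` version checks three.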
- bamboo.treefunctions.systematic(nominal, name=None, **kwargs)[source]
Construct an expression that will change under some systematic variations
This is useful when e.g. changing weights for some systematics. The expressions for different variations are assumed (but not checked) to be of the same type, so this should only be used for simple types (typically a number or a range of numbers); containers etc. need to be taken into account in the decorators.
- Example:
>>> psWeight = op.systematic(tree.ps_nominal, name="pdf", up=tree.ps_up, down=tree.ps_down)
>>> addSys10percent = op.systematic(
>>>     op.c_float(1.), name="additionalSystematic1", up=op.c_float(1.1), down=op.c_float(0.9))
>>> importantSF = op.systematic(op.c_float(1.),
>>>     mySF_systup=op.c_float(1.1), mySF_systdown=op.c_float(0.9),
>>>     mySF_statup=1.04, mySF_statdown=.97)
- Parameters:
nominal – nominal expression
kwargs – alternative expressions. “up” and “down” (any capitalization) will be prefixed with name, if given
name – optional name of the systematic uncertainty source to prepend to “up” or “down”
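The naming rule for the keyword arguments can be sketched as follows. `variation_names` is a hypothetical helper, not a bamboo function, and lower-casing the "up"/"down" suffix is an assumption of this sketch (the text above only states that any capitalization is recognised).

```python
def variation_names(name=None, **kwargs):
    """Names of the variations that a set of keyword arguments would define."""
    names = []
    for key in kwargs:
        if name is not None and key.lower() in ("up", "down"):
            # "up"/"down" in any capitalization get the source name as prefix;
            # the lower-cased suffix is an assumption of this sketch
            names.append(f"{name}{key.lower()}")
        else:
            # any other keyword is taken as the full variation name
            names.append(key)
    return names


print(variation_names(name="pdf", up=1.1, Down=0.9))
# ['pdfup', 'pdfdown']
print(variation_names(mySF_systup=1.1, mySF_statdown=0.97))
# ['mySF_systup', 'mySF_statdown']
```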
- bamboo.treefunctions.getSystematicVariations(expr)[source]
Get the list of systematic variations affecting an expression
- bamboo.treefunctions.forSystematicVariation(expr, varName)[source]
Get the equivalent expression with a specific systematic uncertainty variation
- Parameters:
expr – an expression (or proxy)
varName – name of the variation (e.g. jesTotalup)
- Returns:
the expression for the chosen variation (frozen, so without variations)
- class bamboo.treefunctions.MVAEvaluator(evaluate, returnType=None, toArray=False, toVector=True, useSlots=False)[source]
Small wrapper to make sure MVA evaluation is cached
- bamboo.treefunctions.mvaEvaluator(fileName, mvaType=None, otherArgs=None, nameHint=None)[source]
Declare and initialize an MVA evaluator
The C++ object is defined (with bamboo.treefunctions.define()), and can be used as a callable to evaluate. The result of any evaluation will be cached automatically.
Currently the following formats are supported:
  - .xml (mvaType='TMVA') TMVA weights file, evaluated with a TMVA::Experimental::RReader
  - .pt (mvaType='Torch') pytorch script files (loaded with torch::jit::load).
  - .pb (mvaType='Tensorflow') tensorflow graph definition (loaded with Tensorflow-C). The otherArgs keyword argument should be (inputNodeNames, outputNodeNames), where each of the two can be a single string, or an iterable of them. In the case of multiple input nodes, the input values for each should also be passed as separate arguments when evaluating (see below). Input values for multi-dimensional nodes should be flattened (row-order per node, and then the different nodes). The output will be flattened in the same way if the output node has more than one dimension, or if there are multiple output nodes.
  - .json (mvaType='lwtnn') lwtnn json. The otherArgs keyword argument should be passed the lists of input and output nodes/values, as C++ initializer list strings, e.g. '{ { "node_0", "variable_0" }, { "node_0", "variable_1" } ... }' and '{ "out_0", "out_1" }'.
  - .onnx (mvaType='ONNXRuntime') ONNX file, evaluated with ONNX Runtime. The otherArgs keyword argument should be the name of the output node (or a list of those)
  - .hxx (mvaType='SOFIE') ROOT SOFIE generated header file. The otherArgs keyword argument should be the path to the .dat weights file (if not specified, it will be taken by replacing the weight file extension from .hxx to .dat). Note: only available in ROOT>=6.26.04.
- Parameters:
fileName – file with MVA weights and structure
mvaType – type of MVA, or library used to evaluate it (Tensorflow, Torch, or lwtnn). If absent, this is guessed from the fileName extension
otherArgs – other arguments to construct the MVA evaluator (either as a string (safest), or as an iterable)
nameHint – name hint, see bamboo.treefunctions.define()
- Returns:
a proxy to a method that takes the inputs as arguments, and returns a std::vector<float> of outputs
For passing the inputs to the evaluator, there are two options:
  - if a list of numbers is passed, as in the example below, they will be converted to an array of float (with a static_cast). The rationale is that this is the most common simple case, which should be made as convenient as possible.
  - if the MVA takes inputs in a different type than float or has multiple input nodes (supported for Tensorflow and ONNX Runtime), an array-like object of the correct type should be passed for each of the input nodes. No other conversions will be automatically inserted, so these should be done when constructing the inputs (e.g. with array() and initList()). This is a bit more work, but gives maximal control over the generated code.
- Example:
>>> mu = tree.Muon[0]
>>> nn1 = op.mvaEvaluator("nn1.pt")
>>> Plot.make1D("mu_nn1", nn1(mu.pt, mu.eta, mu.phi), hasMu)
Warning
By default the MVA output will be added as a column (Define node in the RDataFrame graph) when used, because it is almost always more efficient. In some cases, e.g. if the MVA should only be evaluated if some condition is true, this can cause problems. To avoid this, defineOnFirstUse=False should be passed when calling the evaluation, e.g. nn1(mu.pt, mu.eta, mu.phi, defineOnFirstUse=False) in the example above.