Most likely, you are already aware of energy and momentum conservation laws.
Energy and momentum of an isolated system should be constant,
or as they say, conserved.
There are plenty of other quantities in physics that are kept at a constant level during the evolution of an isolated system.
Thanks to one of the most profound theorems, proven in the first half of the 20th century by Emmy Noether, every conservation law is tightly connected to a certain kind of symmetry of the universe. For example, energy conservation is connected to the uniformity of time.
That is, it doesn't matter whether you start the experiment today or tomorrow; it will develop the same way, under the assumption of an isolated system, of course.
Let's recall what lepton flavor is. For every decay, we can compute the number of first-generation leptons, second-generation leptons, and third-generation leptons. Similarly, there is a quark flavor number. It is known not to be conserved, that is, it is violated.
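To make this bookkeeping concrete, here is a minimal Python sketch of lepton flavor counting; the particle labels, the dictionary, and the helper function are illustrative choices, not notation from the lecture.

```python
# A minimal sketch of lepton-flavor bookkeeping (illustrative labels only).
# Each lepton carries +1 in its own generation, each antilepton -1.
LEPTON_FLAVOR = {
    "e-": {"Le": +1}, "e+": {"Le": -1},
    "nu_e": {"Le": +1}, "anti-nu_e": {"Le": -1},
    "mu-": {"Lmu": +1}, "mu+": {"Lmu": -1},
    "nu_mu": {"Lmu": +1}, "anti-nu_mu": {"Lmu": -1},
    "tau-": {"Ltau": +1}, "tau+": {"Ltau": -1},
    "gamma": {},  # photons carry no lepton flavor
}

def flavor_numbers(particles):
    """Sum the electron, muon, and tau flavor numbers of a particle list."""
    totals = {"Le": 0, "Lmu": 0, "Ltau": 0}
    for p in particles:
        for flavor, value in LEPTON_FLAVOR[p].items():
            totals[flavor] += value
    return totals

# Ordinary muon decay conserves every lepton flavor number ...
print(flavor_numbers(["mu-"]), flavor_numbers(["e-", "anti-nu_e", "nu_mu"]))
# ... while mu- -> e- gamma conserves total lepton number but violates flavor:
print(flavor_numbers(["mu-"]), flavor_numbers(["e-", "gamma"]))
```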
Look at the decay of the Bs meson. It consists of an anti-b quark and an s quark. Through a transition involving a W boson, we end up with a charm, anti-charm pair that forms a particle called the J/psi, and a strange, anti-strange pair that joins into a particle called the phi.
Neutrinos can also transform into each other. The Nobel Prize in 2015 was awarded to Takaaki Kajita and Arthur McDonald exactly for the discovery of these neutrino oscillations.
However, the Standard Model prediction for charged lepton flavor violation is negligible. At the same time, in many new physics models the expectations for charged lepton flavor violation are much higher. One of the simplest decays that conserves energy, charge, and lepton number, but violates the lepton flavor numbers, is presented at the bottom of the slide. It is the transformation of a muon into an electron and a gamma quantum.
On this slide, you see examples of more complicated decays that violate lepton flavor. For example, in the top-right diagram, you have a tau particle that decays into three muons. The probability of this decay in the Standard Model is very low, less than 10^-40, so we cannot measure a process of such probability using the LHC or other existing technologies.
But according to new physics predictions, for example in supersymmetry, such a decay can happen thanks to other particles or bosons that do not exist in the regular Standard Model. You can see on the right, and also below, examples of such decays, and their probabilities might be much higher than the Standard Model prediction.
The analysis strategy for the decay of a tau to three muons could be the following. Since we are looking for a decay like the one depicted below, we want our trigger to catch muons that come from a single vertex. Having three of those reduces the amount of background quite drastically. An additional constraint comes from the fact that the tau flies some distance from the proton collision point, so the source of those muons should be displaced from the primary vertex. There might be other restrictions on the muon momentum and energy.
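As a rough sketch of what such a preselection could look like in code, here is a toy example on a hypothetical candidate table; the column names, distributions, and cut values are purely illustrative, not the experiment's actual variables.

```python
import numpy as np
import pandas as pd

# Toy candidate table; the column names and distributions are illustrative.
rng = np.random.default_rng(0)
n = 100_000
candidates = pd.DataFrame({
    "vertex_chi2": rng.exponential(5.0, n),       # quality of the three-muon vertex fit
    "flight_distance": rng.exponential(1.0, n),   # mm, displacement from the primary vertex
    "mu_min_pt": rng.exponential(1000.0, n),      # MeV, transverse momentum of the softest muon
})

# A possible preselection, with cut values chosen only for illustration:
preselected = candidates[
    (candidates["vertex_chi2"] < 15.0)        # muons compatible with a single vertex
    & (candidates["flight_distance"] > 0.1)   # decay vertex displaced from the collision point
    & (candidates["mu_min_pt"] > 500.0)       # basic momentum requirement on each muon
]
print(len(preselected), "candidates pass the preselection")
```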
Then we have to design an event selection technique based on machine learning. But, not to spoil the fun, we have to hide a particular part of the data until we are happy with the classifier. This is called blinding. We train our classifier on a mixture of real data and simulated data. Once we are satisfied with its properties, we can apply it to the signal region and estimate the number of events passing the selection. But then we have to convert this number into a branching fraction somehow, to compare it with the prediction of the Standard Model.
That's why we need a normalization, or calibration, channel. If we apply the same selection to this channel, we get a number of events that corresponds to a well-known branching fraction. Since the topology of the calibration channel is very similar, it is assumed that the ratio of counted events to branching fraction is the same for both channels.
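In other words, the number of selected events divided by the branching fraction (times the efficiency) is taken to be the same for the signal and the calibration channel, so the unknown branching fraction follows from a simple ratio. A minimal sketch of this conversion, with purely illustrative numbers:

```python
def branching_fraction_signal(n_sig, n_cal, bf_cal, eff_ratio=1.0):
    """Convert a signal-region event count into a branching fraction.

    Assumes N / (BF * efficiency) is the same for the signal and the
    calibration channel; eff_ratio = eff_cal / eff_sig corrects for any
    residual efficiency difference (close to 1 for similar topologies).
    """
    return bf_cal * eff_ratio * n_sig / n_cal

# Purely illustrative numbers, not a real measurement:
print(branching_fraction_signal(n_sig=5, n_cal=10_000, bf_cal=1.3e-5))
```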
So why is blinding useful? Let's recap: the signal region is the region of the mass spectrum with a high probability of signal, where the distribution of the signal is very different from the distribution of the background. This is due to the Feynman diagram and the nature of the decay process. This region is hidden during the analysis to avoid psychological, or experimenter's, bias, that is, to avoid making decisions about which cuts to apply, when to stop the analysis, or when to search for bugs, based on looking at the very data from which we have to draw the final conclusion.
This plot shows hypothetical distributions: the signal is in blue, the distribution of the background is in black, and the innermost region is the signal region. The distribution in the outer regions is used to interpolate the background contribution into the signal region, and the narrow regions are used for analysis optimization.
In rare decay searches, blinding is done by defining the entire analysis prior to evaluating the part of the data in which the signal is sought. This part is also referred to as the signal region. In the case of tau to three muons, the candidates with invariant mass between m_tau - 20 MeV and m_tau + 20 MeV were removed from the dataset used for the development of the strategy and the optimization of the classifier.
Once the analysis is defined,
the signal region is analyzed.
This means that the number of candidates is
evaluated and can be compared to the expectation.
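A minimal sketch of such a blinding cut, assuming the three-muon invariant mass of each candidate is available as an array (a uniform toy array stands in for it here):

```python
import numpy as np

M_TAU = 1776.9           # MeV, nominal tau mass
BLIND_HALF_WIDTH = 20.0  # MeV, as quoted above

# 'mass' would be the three-muon invariant mass of each candidate;
# a uniform toy array stands in for it in this sketch.
rng = np.random.default_rng(0)
mass = rng.uniform(1600.0, 1950.0, size=100_000)

in_signal_region = np.abs(mass - M_TAU) < BLIND_HALF_WIDTH
sidebands = mass[~in_signal_region]  # used for strategy development and optimization
# mass[in_signal_region] is only looked at once the analysis is frozen.
```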
To discriminate between signal and background, we should include features such as the vertex fit quality, which tells how well the muons actually come together; the displacement from the primary vertex, that is, how distant the trajectory of the particle, or the secondary vertex, is from the primary vertex; the track quality; and the track isolation. The samples used for training the classifier are taken from Monte Carlo simulation for the signal and from real data for the background. A channel with a similar topology, the Ds decaying into a phi and a pion, is used for calibration and normalization of the classifier. As a metric, or rather a proxy metric, because we are ultimately interested in the branching fraction, which might not be directly related to it, we use the area under the ROC curve (AUC).
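As an illustration of this training setup, here is a hedged sketch using scikit-learn, with toy arrays standing in for the simulated signal and the data sidebands; the feature values are random numbers, not real analysis variables.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-ins: X_mc plays the role of simulated signal candidates,
# X_sb the role of real-data sideband candidates used as background.
rng = np.random.default_rng(1)
X_mc = rng.normal(loc=1.0, size=(5_000, 6))
X_sb = rng.normal(loc=0.0, size=(5_000, 6))

X = np.vstack([X_mc, X_sb])
y = np.concatenate([np.ones(len(X_mc)), np.zeros(len(X_sb))])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier().fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))
```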
As we mentioned before, the signal in the signal region of the mass spectrum has a very different shape from the background. You can probably spot the problem: if we give the mass, or a feature that correlates with the mass, to the classifier, we might get a biased estimation of the number of background events. We will come back to this point a little bit later.
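One crude way such mass correlations could be flagged is sketched below; the function, the threshold, and the toy data are illustrative choices, and a real analysis would rather inspect the background mass shape after the classifier cut than a single linear correlation coefficient.

```python
import numpy as np

def mass_correlation_report(features, mass, names, threshold=0.2):
    """Print the linear correlation of each feature with the candidate mass
    and flag features whose correlation is suspiciously large."""
    for column, name in zip(np.asarray(features).T, names):
        r = np.corrcoef(column, mass)[0, 1]
        flag = "  <-- consider dropping" if abs(r) > threshold else ""
        print(f"{name:20s} corr(mass) = {r:+.2f}{flag}")

# Toy example: one harmless feature, one feature built from the mass itself.
rng = np.random.default_rng(2)
mass = rng.uniform(1600.0, 1950.0, 10_000)
features = np.column_stack([
    rng.normal(size=10_000),                      # uncorrelated with mass
    mass + rng.normal(scale=50.0, size=10_000),   # strongly mass-correlated
])
mass_correlation_report(features, mass, ["vertex_chi2", "suspicious_feature"])
```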
But let's continue with the strategy.
For example, suppose we have the model, the classifier that gives the best area under the ROC curve we can imagine. Of course, there is a question: if two classifiers give the same area under the ROC curve, which one should we choose? But for the educational purpose, let's leave this question aside. We pick the threshold for this classifier that maximizes the fraction shown on the slide. Essentially, it is the true positive rate squared, or the number of signal events squared, over the number of background events that are misclassified as signal, which gives us a rough estimate of the efficiency of our classifier.
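A small sketch of such a threshold scan, assuming we have classifier scores for signal (from simulation) and background (from data sidebands); the threshold grid and the toy score distributions are illustrative.

```python
import numpy as np

def best_threshold(scores_sig, scores_bkg):
    """Scan classifier thresholds and maximize S^2 / B, where S is the
    signal efficiency (true positive rate) at the cut and B is the number
    of background candidates passing it."""
    best_t, best_fom = None, -np.inf
    for t in np.linspace(0.0, 1.0, 201):
        s = np.mean(scores_sig >= t)   # signal efficiency at this cut
        b = np.sum(scores_bkg >= t)    # background candidates passing the cut
        if b == 0:
            continue                   # no background estimate at this cut
        fom = s**2 / b
        if fom > best_fom:
            best_t, best_fom = t, fom
    return best_t, best_fom

# Toy scores: signal peaked towards 1, background towards 0.
rng = np.random.default_rng(3)
t, fom = best_threshold(rng.beta(5, 2, 5_000), rng.beta(2, 5, 5_000))
print("best threshold:", t, "figure of merit:", fom)
```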
Then we apply this classifier with this threshold to our real-data sample, with the signal region still hidden. We estimate the amount of background events in the signal region by extrapolating the sidebands into the signal region. We'll show it on the next slide.
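A hedged sketch of such a sideband extrapolation, assuming for simplicity a linear background shape in mass; the fit range, the binning, and the toy data are illustrative, not the actual analysis choices.

```python
import numpy as np

M_TAU, HALF_WIDTH = 1776.9, 20.0      # MeV
LOW_EDGE, HIGH_EDGE = 1600.0, 1950.0  # MeV, illustrative fit range

def expected_background(mass_sidebands):
    """Fit a straight line to the sideband mass histogram and integrate it
    over the blinded window; a smoothly falling (e.g. exponential) shape
    could be used instead, linear is enough for a sketch."""
    counts, edges = np.histogram(mass_sidebands, bins=40, range=(LOW_EDGE, HIGH_EDGE))
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_width = edges[1] - edges[0]
    in_sb = np.abs(centers - M_TAU) > HALF_WIDTH      # ignore the blinded bins
    slope, intercept = np.polyfit(centers[in_sb], counts[in_sb], deg=1)
    density_at_mtau = (slope * M_TAU + intercept) / bin_width  # counts per MeV
    return density_at_mtau * 2 * HALF_WIDTH           # integral over the blinded window

# Toy data: flat background with the blinded window removed.
rng = np.random.default_rng(4)
toy = rng.uniform(LOW_EDGE, HIGH_EDGE, 50_000)
print("expected background:", expected_background(toy[np.abs(toy - M_TAU) > HALF_WIDTH]))
```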
Then we unblind the signal region, apply the classifier to it, and count the number of events in this region. Then we apply the same classifier to the signal region of the normalization channel and count the number of events there, which is N_cal. Finally, we check the p-value of our hypotheses.
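As a toy illustration of that last step, here is a single-bin Poisson counting p-value; a real analysis would use a more sophisticated statistical treatment (for example CLs or a profile likelihood), and the numbers below are purely illustrative.

```python
from scipy.stats import poisson

def background_only_p_value(n_observed, b_expected):
    """Probability of observing n_observed or more events when only
    b_expected background events are expected (a simple counting test)."""
    return poisson.sf(n_observed - 1, b_expected)

print(background_only_p_value(n_observed=8, b_expected=3.2))  # illustrative numbers
```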