134 changes: 134 additions & 0 deletions TrkQual/README.md
@@ -0,0 +1,134 @@
# TrkQual

## Introduction
The TrkQual algorithm is trained to classify tracks as either "high quality" or "low quality". At the moment, the model implemented in Offline is an Artificial Neural Network (see the [TrackQuality module](https://github.com/Mu2e/Offline/blob/main/TrkDiag/src/TrackQuality_module.cc)).

This README covers:
* where an analyzer can find things they might like to know (e.g. the definition of high quality and low quality),
* instructions for those interested in retraining or improving on the model, and
* a table of commits for each training version

## For the Interested Analyzer
The [jupyter notebook](TrkQualTrain.ipynb) contains the following information near the top of the file:
* EventNtuple datasets used (search for ```training_dataset```),
* definitions of high quality and low quality (search for ```high_qual``` and ```low_qual```),
* the features trained on (search for ```feature```), and
* the model structure (search for "Model Definitions")
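
If you would rather locate these search terms without opening Jupyter, note that an `.ipynb` file is plain JSON. The following is a minimal sketch, assuming the standard notebook layout (a top-level `cells` list whose entries carry a `source` field). `find_cells` and `find_cells_in_file` are hypothetical helpers and are not part of this repository.

```python
import json


def find_cells(notebook, terms):
    """Return {term: [cell indices]} for cells whose source contains the term."""
    hits = {term: [] for term in terms}
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # Notebook JSON stores source either as one string or as a list of lines
        text = source if isinstance(source, str) else "".join(source)
        for term in terms:
            if term in text:
                hits[term].append(index)
    return hits


def find_cells_in_file(path, terms):
    """Load a notebook from disk and search it."""
    with open(path, encoding="utf-8") as handle:
        return find_cells(json.load(handle), terms)
```

For example, `find_cells_in_file("TrkQualTrain.ipynb", ["training_dataset", "high_qual", "low_qual", "feature"])` would report which cells mention each term.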

## For the Interested (Re)Trainer
This section is for those who want to either (a) retrain the current algorithm (e.g. because reconstruction has been updated), or (b) investigate or update an old model.

### General Overview
There are two steps to releasing an updated TrkQual algorithm:

1. Train the algorithm and save the model as an ONNX file, and
1. Convert the trained model into C++ inference code to copy into Offline

Each of these will be done in a different environment.

### General Setup
You will need to create your own fork of the repository:

* go to www.github.com/Mu2e/MLTrain and click "fork"
* then in your terminal:
```
cd /path/to/your/work/area/

# only need to do this once
git clone https://www.github.com/YourGitHubUsername/MLTrain.git
cd MLTrain/
git remote add -f mu2e https://www.github.com/Mu2e/MLTrain.git

# do these whenever you are doing new development
git fetch mu2e main # get the latest and greatest
git checkout --no-track -b your-new-branchname mu2e/main
```

### Training a Model
For training, you need to ssh into a mu2egpvm machine with a port forwarded, and set up the correct Python environment:

```
ssh -L XXXX:localhost:XXXX username@mu2egpvmYY.fnal.gov # XXXX is any port number, and YY is the gpvm number
cd /path/to/your/work/area/
mu2einit
pyenv rootana 2.0.0
```

You can start a jupyter notebook like so:

```
cd MLTrain/TrkQual
jupyter-notebook --no-browser --port=XXXX # XXXX is the same port that you forwarded when you ssh'd in
```

Then copy and paste the URL that Jupyter prints into your browser to open it.

You will see a directory listing of the TrkQual directory. Click TrkQualTrain.ipynb to open the notebook in your browser.
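
The port given to `ssh -L` and to `jupyter-notebook --port` must be the same, which is easy to get wrong when typing the commands by hand. As a rough sketch, the helper below builds both command strings from a single port value. `tunnel_commands` is a hypothetical name, and zero-padding the gpvm number to two digits is an assumption based on the `YY` placeholder.

```python
def tunnel_commands(port, username, gpvm_number):
    """Build the ssh and jupyter commands so that both use the same port."""
    if not 1024 <= port <= 65535:
        raise ValueError("choose an unprivileged port between 1024 and 65535")
    # Assumption: the gpvm number is written with two digits, as in "YY"
    host = f"mu2egpvm{gpvm_number:02d}.fnal.gov"
    ssh = f"ssh -L {port}:localhost:{port} {username}@{host}"
    jupyter = f"jupyter-notebook --no-browser --port={port}"
    return ssh, jupyter
```

Printing the two returned strings gives commands to run locally and on the gpvm machine respectively.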

Make any changes that you want to make:
* if this is just a retraining with updated datasets, you can just change the dataset names in section "Common Definitions", and the ```training_version_numbers``` in "Model Definitions"
* if you want to add or remove features, you can do that in "Common Definitions" too
* if you want to add a new model, then you can do that in the cell that says ```A new model can go in this cell```
* if you want to modify the ANN1 model (e.g. change the structure or the activation function), then I would copy it into this new cell and call it ANN2
* if you want to try a brand new model (e.g. a BDT), then you may need to write a new ```save_func``` etc.

Once ready, click "Kernel->Restart & Run All". You will see a bunch of plots, including some comparisons to previous models. Your model will be saved in the model/ directory along with a ```*plots.root``` file containing histograms.
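
To confirm that the run produced what you expect, you can list the output directory. This is a minimal sketch: the `.onnx` extension for the saved model is an assumption (the text only says the model is saved as an ONNX file), and `list_training_outputs` is a hypothetical helper.

```python
from pathlib import Path


def list_training_outputs(model_dir="model"):
    """Group the files in the output directory by kind."""
    directory = Path(model_dir)
    return {
        # Assumption: saved models use the .onnx extension
        "onnx": sorted(p.name for p in directory.glob("*.onnx")),
        # Histogram files match the *plots.root pattern described above
        "plots": sorted(p.name for p in directory.glob("*plots.root")),
    }
```

An empty list under either key suggests that the corresponding step of the notebook did not complete.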


### Converting a Model for Use in Offline
There are two things we need to get the model running in Offline:
* a ```.hxx``` file containing code, and
* a ```.dat``` file containing parameters

For creating inference code, we use a different environment than for training:

```
ssh username@mu2egpvmYY.fnal.gov
cd /path/to/your/work/area/
mu2einit
muse setup EventNtuple
cd MLTrain/TrkQual/
```

You can then generate the inference code using TMVA::SOFIE like so:

```
root -l -b scripts/CreateInference.C\(\"TrkQual_ANN1_v2\"\)
```
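
The backslashes in the command above exist only to stop the shell from interpreting the parentheses and quotes. If you script this step, one option is to pass the arguments as a list so that no shell escaping is needed. This is a sketch under that assumption; `create_inference_command` and `run_create_inference` are hypothetical helpers.

```python
import subprocess


def create_inference_command(model_name, macro="scripts/CreateInference.C"):
    """Return the argv list for running the macro with a quoted string argument."""
    # A list passed to subprocess bypasses the shell, so the parentheses and
    # double quotes are written literally, without backslashes.
    return ["root", "-l", "-b", f'{macro}("{model_name}")']


def run_create_inference(model_name):
    """Run the macro and raise CalledProcessError if ROOT exits non-zero."""
    subprocess.run(create_inference_command(model_name), check=True)
```

The flags mirror the command above. As there, ROOT may stay at its prompt after the macro finishes unless you quit it or add ROOT's `-q` flag.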

If you did not change the model, then you should just need to copy the .dat file to Offline. However, we have found that TMVA::SOFIE sometimes changes the node names and so a new .hxx file is made with a new .dat file. If the structure of the ANN truly hasn't changed then, instead of copying the new .hxx file, you can convert the .dat file from the new format to the old format like so:

```
python3 scripts/sortdat.py code/TrkQual_ANN1_v2.dat code/TrkQual_ANN1_v2.dat_conv
```

(Note: you may need to change the new node names in the ```name_dict``` dictionary. The left-hand strings are the new names, and the right-hand strings are the names we want to convert to)
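
To illustrate the renaming idea only (this is not the actual `sortdat.py`), the sketch below applies a new-to-old name mapping to lines of text. It assumes the node names appear as whole tokens in the file, and the names in the usage example are made up.

```python
import re


def rename_nodes(lines, name_dict):
    """Replace each new node name with its old name, matching whole tokens only."""
    if not name_dict:
        return list(lines)
    # Longest names first, so a name that is a prefix of another cannot shadow it
    ordered = sorted(name_dict, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(name) for name in ordered) + r")(?!\w)"
    )
    return [pattern.sub(lambda match: name_dict[match.group(1)], line) for line in lines]
```

For instance, `rename_nodes(["tensor_new_a 3"], {"tensor_new_a": "tensor_old_a"})` gives `["tensor_old_a 3"]`, while a name not listed in the mapping is left untouched.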

You can then copy the converted file to Offline like so:

```
cp code/TrkQual_ANN1_v2.dat_conv ../Offline/TrkDiag/data/TrkQual_ANN1_v2.dat
```

and make sure that the new .dat file is used in the TrackQuality module. (For example, change EventNtuple/fcl/prolog.fcl)

If you modified the ANN model, then you need to copy both the .hxx and .dat files:

```
cp code/TrkQual_ANN2_v1.hxx ../Offline/TrkDiag/inc/
cp code/TrkQual_ANN2_v1.dat ../Offline/TrkDiag/data/
```

and make sure that the new model is implemented correctly in the TrackQuality module.
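
The two copy cases above (unchanged structure versus modified model) can be summarized in a small sketch. `offline_copy_plan` is a hypothetical helper; the paths follow the `cp` commands in this section and assume Offline is checked out next to MLTrain.

```python
def offline_copy_plan(model_name, structure_changed, offline_dir="../Offline"):
    """Return (source, destination) pairs for the files Offline needs."""
    data_dst = f"{offline_dir}/TrkDiag/data/{model_name}.dat"
    if structure_changed:
        # New structure: both the inference code and the parameters are copied
        return [
            (f"code/{model_name}.hxx", f"{offline_dir}/TrkDiag/inc/{model_name}.hxx"),
            (f"code/{model_name}.dat", data_dst),
        ]
    # Same structure: only the converted parameters are copied, renamed back to .dat
    return [(f"code/{model_name}.dat_conv", data_dst)]
```

Each pair can then be passed to `shutil.copy(source, destination)` once you have checked that the plan matches what you intend.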

If you trained a different model, then you are entering new territory and should discuss with experts how best to implement it. Either:
* we make the ```TrackQuality``` module model agnostic, or
* we write separate ```TrackQuality``` modules for different models...

## Version History

| Model | Version | Commit |
|-------|---------|--------|
| ANN1 | v2 | `034f7c3` |
| ANN1 | v1.1 | `fd008e6` (previous repo) |
| ANN1 | v1 | `3d8a9b8` (previous repo) |