Retraining the pre-trained network#
Retraining the classification network is often the key step to ensure high-performance of cellfinder. We recommend that the presupplied network is retrained for each new application (e.g. microscope, labelling strategy etc.). The design of the cellfinder software means that this is different, but often simpler than other deep-learning-based analysis tools you may have used.
Theory#
Hint
It may be useful to consult the original PLOS Computational Biology paper to get a better idea of the ideas behind the software.
Pre-trained network#
cellfinder is supplied with a network that was trained on approximately 100,000 manually annotated cell candidates (with a roughly 50/50 split between cells and non-cells). These came from serial-section two-photon data with whole-cell labelling. This is likely to be different from your data in some ways, e.g.:
Microscopy technique
Fluorescent label
Labelled brain regions
Labelled cell types
However, we usually find that this network is a good starting point in a new analysis.
Workflow#
The typical workflow for using cellfinder on new data is:
Run cellfinder using the pre-trained network (or a network you or a collaborator has already trained)
Assess the performance of the network
Generate training data to “correct” the network in areas it has performed poorly
Retrain the network
Run cellfinder with the new network
Repeat steps 2 to 5
Considering generating training data requires the input of a skilled human, but the other steps can be run automatically, we suggest that only small amounts of training data are generated at a time. Training data can be pooled from separate batches of annotations, so the network can be iteratively improved step by step.
Training data generation strategy#
There is no “correct” way to create training data, but it is usually best to target the areas in which the current network performs worse. Typically, we recommend generating 1000-5000 cell candidates with a roughly even split between;
Correctly classified cells
Correctly classified artefacts
Incorrectly classified cells
Incorrectly classified artefacts
This process would usually take 2-4 hours (with practice it usually becomes much quicker!). Once the network is retrained, the process can be repeated until performance is satisfactory for the application.
Caution
Make sure you save all your training data, you can reuse it later (or share it with others).
Training the network#
Generate training data#
To generate training data, you will need:
The cellfinder output file,
cell_classification.xml
as saved using either the napari plugin or the Python API (it’s automatically saved in thebrainmapper
points
subdirectory).The raw data used initially for cellfinder
To generate training data for a single brain, use the napari plugin.
Start training#
You can then use these yaml files for training, either using the napari plugin, or the following command-line tool.
Hint
If you would like to use the data that was originally used to train the supplied network, please see Using supplied training data
cellfinder_train -y yaml_1.yml yaml_2.yml -o /path/to/output/directory/
Arguments#
-y
or--yaml
The path to the yaml files defining training data-o
or--output
Output directory for the trained model (or model weights)
Optional
--continue-training
Continue training from an existing trained model. If no model or model weights are specified, this will continue from the included model.--trained-model
Path to a trained model to continue training--model-weights
Path to existing model weights to continue training--network-depth
Resnet depth (based on He et al. (2015)). Choose from(18, 34, 50, 101 or 152). In theory, a deeper network should classify better,
at the expense of a larger model, and longer training time. Default: 50
--batch-size
Batch size for training (how many cell candidates to process at once). Default: 16--epochs
How many times to use each sample for training. Default: 1000--test-fraction
What fraction of data to keep for validation. Default: 0.1--learning-rate
Learning rate for training the model--no-augment
Do not use data augmentation--save-weights
Only store the model weights, and not the full model. Useful to save storage space.--no-save-checkpoints
Do not save the model after each training epoch. Useful to save storage space if you are happy to wait for the chosen number of epochs to complete. Each model file can be large, and if you don’t have much training data, they can be generated quickly.--tensorboard
Log tooutput_directory/tensorboard
. Usetensorboard --logdir outputdirectory/tensorboard
to view.--save-progress
Save training progress to a .csv file (output_directory/training.csv
).
Further help#
All cellfinder_train
options can be found by running:
cellfinder_train -h