Reconstructing connectomes from human, macaque, and mouse cortex with Voxelytics

Interview with Daniel Werner

Today, we chat with Daniel Werner, data scientist at scalable minds. For their recent publication “Connectomic comparison of mouse and human cortex”, the Max Planck Institute for Brain Research commissioned scalable minds to process and analyze their microscopy data. 
Daniel played an essential role in this project, as he managed and carried out the connectomic reconstruction of 9 EM datasets.

Neuron reconstruction of Human cortex from volume EM. Connectomes for 9 datasets across species were generated with Voxelytics. Segmentation, connectome, and animation by scalable minds.

What is your role at scalable minds and how do you like it?
I have been employed as a software engineer and data scientist at scalable minds for about 10 years. My work there is very diverse and challenging. I constantly learn new things, which I really enjoy. Sometimes I work on a typical software engineering problem or manage a project (like the one we are talking about today); at other times I analyze reconstruction results or even read scientific papers to improve our machine learning models.
Most of all, it means a lot to me that my work is meaningful and contributes to a better understanding of the human body and diseases.

Can you tell us more about the project? What was it about, what was its scope…?
Sure. So, the lab contacted us, wanting to make a comparative study across three species: mouse, macaque, and human. For this, they had acquired multiple datasets and wanted us to process them. In the past, we usually focused on a single dataset at a time and iterated until we achieved satisfactory results. This was the first time we could not focus on every single dataset but instead needed a more general solution. We were aiming to generate connectomes for 9 different datasets, from different species and different microscopes: a real challenge! We first tried out our existing machine learning models to see how well they generalized. After the first encouraging results, our work officially started.
The project lasted about a year, from May 2021 to April 2022. Much of the time was spent evaluating the quality of the intermediate reconstruction results and training some new machine learning models for the species we hadn’t worked with before. Once we had everything in place and were clear on the workflow, the connectome reconstruction of the datasets was much faster: it took us one day to process one dataset… So 9 days for 9 datasets!
Generating a connectome means segmenting the neurons, classifying cell types, detecting the synapses and their types, and combining all these things. This is a very complex task for which you must handle massive amounts of data. To give you an idea, each dataset we processed is between 200 GB and 1 TB in size, and most of them have around 700,000 synapses; one mouse dataset even has 1.5 million synapses.
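To make the "combining" step more concrete, here is a minimal sketch of how detected synapses and a segment-to-neuron mapping could be merged into a connectome. The data layout (pairs of segment IDs, a plain dictionary for the mapping) and the function name `build_connectome` are assumptions for illustration, not the format Voxelytics actually uses.

```python
from collections import Counter


def build_connectome(synapses, segment_to_neuron):
    """Count synapses for each (presynaptic neuron, postsynaptic neuron) pair.

    synapses: iterable of (pre_segment_id, post_segment_id) pairs (assumed layout)
    segment_to_neuron: dict mapping a segment id to its neuron (agglomerate) id
    """
    connectome = Counter()
    for pre_segment, post_segment in synapses:
        pre = segment_to_neuron.get(pre_segment)
        post = segment_to_neuron.get(post_segment)
        if pre is None or post is None:
            continue  # synapse touches a segment that belongs to no known neuron
        connectome[(pre, post)] += 1
    return connectome


if __name__ == "__main__":
    # Toy data, invented for this example.
    mapping = {1: "A", 2: "A", 3: "B", 4: "C"}
    detected = [(1, 3), (2, 3), (3, 4), (9, 4)]
    print(build_connectome(detected, mapping))
```

In this toy case, the two synapses from segments 1 and 2 both count towards the connection from neuron A to neuron B, because both segments belong to the same neuron.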


Exploring a connectome in webKnossos: segmented EM data and agglomerate skeleton of a neuron with all its synapses.

This is why it means a lot to us that this study was such a great success. The paper has been published in Science, and this is the first human connectome ever to have been accepted in a scientific journal! The scientists extracted some significant insights, advancing brain research one step further.

9 datasets in 9 days: this sounds like a challenge. How did you manage to do that?
Well, this was essentially possible thanks to our machine learning toolbox Voxelytics. We established a reusable workflow with pre-trained ML models, which enabled us to compute a full connectome from the raw EM data.
Let me go a bit deeper into this whole process: Voxelytics offers multiple pre-trained ML models to, for example, segment cells, predict types, and detect synapses. Additionally, it comes with a variety of predefined tasks, for example, tasks to train a model from scratch, perform voxel prediction inference, agglomerate a segmentation, or compute a connectome. From these tasks, you can establish a workflow in which you declaratively describe what the reconstruction pipeline should look like and configure the parameters for each step. You can then use this pipeline as often as you like. In our case: put a new dataset in, get a new connectome out. Of course, we iterated a little on the pipeline at the beginning of the project. But once it was established, we simply had to run it on all datasets — with one click.
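The idea of describing a pipeline declaratively and then running it unchanged on each dataset can be sketched in a few lines of Python. This is not the Voxelytics API: the task names, parameters, dataset names, and the helpers `execution_order` and `run_workflow` are all hypothetical and only illustrate the concept.

```python
from graphlib import TopologicalSorter

# Hypothetical declarative workflow: each task lists the tasks it needs and its
# parameters. Names and values are illustrative, not a real configuration.
WORKFLOW = {
    "segmentation": {"needs": [], "params": {"model": "pretrained_segmentation"}},
    "agglomeration": {"needs": ["segmentation"], "params": {"threshold": 0.5}},
    "synapse_detection": {"needs": [], "params": {"model": "pretrained_synapses"}},
    "type_prediction": {"needs": ["agglomeration"], "params": {}},
    "connectome": {
        "needs": ["agglomeration", "synapse_detection", "type_prediction"],
        "params": {},
    },
}


def execution_order(workflow):
    """Return task names ordered so that every task comes after its inputs."""
    graph = {name: set(task["needs"]) for name, task in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


def run_workflow(workflow, dataset_name):
    """Walk the tasks in order; here each task only records what it would do."""
    log = []
    for name in execution_order(workflow):
        params = workflow[name]["params"]
        log.append(f"{dataset_name}: {name} {params}")
    return log


if __name__ == "__main__":
    for dataset in ["mouse_1", "human_1"]:  # hypothetical dataset names
        for line in run_workflow(WORKFLOW, dataset):
            print(line)
```

The point of the declarative form is that the same `WORKFLOW` description is reused for every dataset; only the input changes.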


High-level diagram of the pipeline we used to process datasets: connectome reconstruction from raw EM data.

To configure the workflow, we usually choose a small bounding box of the global dataset and run the pipeline for the first time, “exploratively”. We look at the results of the intermediate steps with WEBKNOSSOS. This software is great for quickly looking at the data and for deep error investigations. How good is the segmentation? Which agglomeration state is the best? What are the precision and recall of the synapse detection? We then tweak some parameters and run the next iteration.
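For readers less familiar with the two metrics: precision is the fraction of detected synapses that are real, and recall is the fraction of real synapses that were detected. A minimal sketch follows; the counts in the example are invented, not results from the project.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from detection counts.

    true_positives: detected synapses that match a ground-truth synapse
    false_positives: detected synapses with no ground-truth match
    false_negatives: ground-truth synapses that were not detected
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented counts for illustration only.
    precision, recall = precision_recall(90, 10, 30)
    print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.90 recall=0.75
```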


Example of two tasks run in Voxelytics and their results viewed in webKnossos

The reports offered by Voxelytics are really helpful for keeping track of these iterations. They contain the setup of the pipeline, the configuration of the tasks, the links to each version to visualize in webKnossos, and more.
And last but not least: Voxelytics is designed for working on high-performance computing (HPC) clusters. This means that Voxelytics breaks down the datasets into smaller chunks and distributes the work over the cluster. The different parts get processed in parallel and are combined in the end to give a coherent result. Intermediate results are persisted and computations can be paused, resumed, and reused.
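The chunk-and-combine pattern described here can be illustrated with a small, self-contained sketch. It uses a local thread pool instead of a cluster scheduler, and the per-chunk work is a stand-in that simply counts voxels. The function names and the chunk size are assumptions for the example, not Voxelytics internals.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def split_into_chunks(shape, chunk_size):
    """Yield (offset, size) boxes that tile a 3D volume of the given shape."""
    ranges = [range(0, dim, step) for dim, step in zip(shape, chunk_size)]
    for offset in product(*ranges):
        # Boxes at the volume border are clipped so they do not overshoot.
        size = tuple(
            min(step, dim - start)
            for start, step, dim in zip(offset, chunk_size, shape)
        )
        yield offset, size


def process_chunk(box):
    """Stand-in for real per-chunk work: return the number of voxels in the box."""
    _, size = box
    return size[0] * size[1] * size[2]


def process_volume(shape, chunk_size, workers=4):
    """Process all chunks in parallel and combine the partial results."""
    chunks = list(split_into_chunks(shape, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(process_volume((100, 100, 50), (64, 64, 64)))  # 500000
```

On a real cluster, the thread pool would be replaced by jobs submitted to a scheduler, and each chunk result would be written to disk so that an interrupted run can resume without recomputing finished chunks.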

Okay, so if I understand correctly, during this explorative phase where you try to define the pipeline, you regularly switch between Voxelytics and WEBKNOSSOS. How complicated is this switching?
This inspection of intermediate results is essential, which is why we made it very easy. Let me show you a video that will explain it better than words!

Workflow examples using Voxelytics and webKnossos.

This looks like a very smooth workflow indeed. Did you encounter any challenges at all?
Oh for sure, we did. Earlier, I told you that right at the beginning we had some “very encouraging results”. This is completely true for the 5 mouse datasets: almost no manual tweaking was needed, although the datasets were obtained using different microscopes! However, our pipeline did not generalize very well to the two other species, macaque and human, which makes sense since it had never seen such data before. For example, there are significantly fewer synapses per volume in primates than in mice, which our model did not know about. So we had to adjust our pipeline and re-train some of the models with added species-specific training data to obtain satisfying results. Our collaborators at the Max Planck Institute were of great help and worked closely with us, for example by delivering the needed training and evaluation data very quickly.

Another challenge was the so-called super-mergers in some of the datasets. To eliminate them, we implemented a new task in our pipeline, which we called “soma exclusion”. This task automatically corrects merge errors if it finds two somata in one process or if an axon and a dendrite are touching without a soma in between.
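One simple way to enforce the "at most one soma per neuron" constraint is sketched below. It may well differ from what the actual soma exclusion task does. The sketch re-agglomerates segments with a union-find structure, going from the most to the least confident merge edge, and skips any edge that would join two groups that each already contain a soma. The edge scores, segment IDs, and the function name are assumptions for illustration, and the second rule mentioned above (axon touching dendrite without a soma in between) is not covered.

```python
def split_multi_soma_agglomerate(segments, edges, soma_segments):
    """Group segments so that no group contains more than one soma.

    segments: iterable of segment ids
    edges: list of (score, segment_a, segment_b); higher score = more confident merge
    soma_segments: set of segment ids that contain a soma
    Returns a dict mapping each segment id to the representative of its group.
    """
    parent = {s: s for s in segments}
    has_soma = {s: s in soma_segments for s in parent}

    def find(s):
        # Follow parent links to the group representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for _, a, b in sorted(edges, key=lambda edge: edge[0], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same group
        if has_soma[root_a] and has_soma[root_b]:
            continue  # would join two somata: treat this edge as a merge error
        parent[root_b] = root_a
        has_soma[root_a] = has_soma[root_a] or has_soma[root_b]
    return {s: find(s) for s in parent}


if __name__ == "__main__":
    # Toy chain 1-2-3-4-5 with somata in segments 1 and 5 (invented data).
    toy_edges = [(0.9, 1, 2), (0.8, 2, 3), (0.2, 3, 4), (0.7, 4, 5)]
    print(split_multi_soma_agglomerate([1, 2, 3, 4, 5], toy_edges, {1, 5}))
```

In the toy chain, the weakest edge (between segments 3 and 4) is the one that gets dropped, which splits the merged object into two neurons with one soma each.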

In my view, that’s what’s great about encountering such challenges. They force us to come up with solutions that improve our pipeline and that all future reconstructions will benefit from!