Interacting with cloud atlas data through Python

This example demonstrates how to calculate the volume of the retrosplenial area in the Allen mouse brain atlas at 10 μm resolution, the highest-resolution level of the annotation image used below. It does so without using BrainGlobe tools and with minimal other dependencies, and is intended to showcase direct use of the atlas data.

This tutorial will guide you through the steps needed to extract the volume of the retrosplenial area (RSP) in the Allen Mouse Brain Atlas programmatically, but without using BrainGlobe. To do this, we need some preliminary knowledge about how BrainGlobe atlases such as this one are structured under the hood: each atlas provides a terminologies file (a CSV listing every region together with its identifier, its parents and the integer value used to label it) and an annotation image (an OME-Zarr volume whose voxel values are those integer labels), both hosted in the cloud.

To compute the volume of RSP, we will therefore:

  • access the terminologies file

  • determine the IDs of the RSP region and all its children from the terminologies file

  • access the annotation file

  • count how many pixels in the annotation file correspond to any of those IDs

  • convert that pixel count to cubic millimeters

Now that the conceptual part is covered, let’s dive into the code:

We start by importing the Python libraries that we will need:

import dask.array as da  # lazy array operations on the Zarr-backed annotation image
import ngff_zarr as nz  # reading OME-Zarr images
import numpy as np
import pandas as pd  # reading the terminologies CSV
from matplotlib import pyplot as plt
from matplotlib import colormaps as cm

Next, we use pandas to access the terminologies file via its S3 URI.

s3_bucket_stub = "s3://brainglobe/atlas/{}"
annotation_uri = s3_bucket_stub.format("annotation-sets/allen-adult-mouse-annotation/2017/annotation.ome.zarr")
terminologies_uri = s3_bucket_stub.format("terminologies/allen-adult-mouse-terminology/2017/terminology.csv")

terminologies_df = pd.read_csv(terminologies_uri, storage_options={"anon": True})

By printing the first few rows of the dataframe, we can observe that the terminologies file contains one row per region: the identifier and parent_identifier columns encode the region hierarchy, the annotation_value column holds the integer used to label the region in the annotation image, and the abbreviation column holds the short region name we will filter on:

terminologies_df.head()
   identifier  parent_identifier  annotation_value                           name abbreviation color_hex_triplet     root_identifier_path
0         997                NaN               997                           root         root           #FFFFFF                    [997]
1           8              997.0                 8  Basic cell groups and regions         grey           #BFDAE3                 [997, 8]
2         567                8.0               567                       Cerebrum           CH           #B0F0FF            [997, 8, 567]
3         688              567.0               688                Cerebral cortex          CTX           #B0FFB8       [997, 8, 567, 688]
4         695              688.0               695                 Cortical plate        CTXpl           #70FF70  [997, 8, 567, 688, 695]


We know that all child regions of RSP have an abbreviation starting with “RSP”, which we can use to identify our IDs of interest and store them in a list called rsp_ids.

terminologies_filtered = terminologies_df[terminologies_df["abbreviation"].str.startswith("RSP")]

rsp_ids = terminologies_filtered["annotation_value"].tolist()
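
As an aside, the parent/child hierarchy encoded in the parent_identifier and root_identifier_path columns offers an alternative way to collect a region and all of its descendants, without relying on the abbreviation prefix. A minimal sketch, assuming a row with the abbreviation "RSP" exists and that root_identifier_path is read in as the string representation of a list (as the printout above suggests):

import ast

# Identifier of the RSP region itself, looked up from its abbreviation
rsp_identifier = int(
    terminologies_df.loc[terminologies_df["abbreviation"] == "RSP", "identifier"].iloc[0]
)

# A region is RSP or one of its descendants if RSP's identifier appears
# somewhere in its root_identifier_path (parsed here from its string form)
is_descendant = terminologies_df["root_identifier_path"].apply(
    lambda path: rsp_identifier in ast.literal_eval(path)
)
rsp_ids_from_hierarchy = terminologies_df.loc[is_descendant, "annotation_value"].tolist()

If those assumptions hold, this should yield the same list of annotation values as rsp_ids above.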

Equipped with this information, we can now access the annotation image for the atlas. Annotations are stored as an OME-Zarr image in the cloud.

annotations = nz.from_ngff_zarr(annotation_uri, storage_options={"anon": True})

print(annotations.metadata)
Metadata(axes=[Axis(name='z', type='space', unit='millimeter', orientation={'type': 'anatomical', 'direction': 'anterior-to-posterior'}), Axis(name='y', type='space', unit='millimeter', orientation={'type': 'anatomical', 'direction': 'superior-to-inferior'}), Axis(name='x', type='space', unit='millimeter', orientation={'type': 'anatomical', 'direction': 'right-to-left'})], datasets=[Dataset(path='0', coordinateTransformations=[Scale(scale=[0.01, 0.01, 0.01], type='scale')]), Dataset(path='1', coordinateTransformations=[Scale(scale=[0.025, 0.025, 0.025], type='scale')]), Dataset(path='2', coordinateTransformations=[Scale(scale=[0.05, 0.05, 0.05], type='scale')]), Dataset(path='3', coordinateTransformations=[Scale(scale=[0.1, 0.1, 0.1], type='scale')])], coordinateTransformations=None, omero=None, name='image', type=None, metadata=None)

We can see from the metadata that the Zarr contains multiple resolution levels (an image pyramid).
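
The voxel size of each pyramid level can be read from the scale transforms in the metadata printed above. A minimal sketch, relying only on the Metadata/Dataset/Scale structure shown in that printout:

# Print the voxel size (in millimeters) of every pyramid level
for dataset in annotations.metadata.datasets:
    scale = dataset.coordinateTransformations[0].scale
    print(f"level {dataset.path}: voxel size {scale} mm")

For this tutorial, we will use the highest-resolution level (level 0, which has 10 μm voxels).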

pyramid_level = 0
annotation_image = annotations.images[pyramid_level]

print(annotation_image)
NgffImage(data=dask.array<from-zarr, shape=(1320, 800, 1140), dtype=uint32, chunksize=(83, 100, 143), chunktype=numpy.ndarray>, dims=('z', 'y', 'x'), scale={'z': 0.01, 'y': 0.01, 'x': 0.01}, translation={'z': 0.0, 'y': 0.0, 'x': 0.0}, name='image', axes_units={'z': 'millimeter', 'y': 'millimeter', 'x': 'millimeter'}, axes_orientations=None, computed_callbacks=[])

By plotting a slice of the array contents, we can see the various regions encoded by integer values:

# Get the middle section and plot
middle_section = annotation_image.data.shape[0] // 2

# Build a color lookup table; annotation values in the Allen atlas are large, so we plot them modulo N
N = 512
colors = cm.get_cmap('tab20').resampled(N)
lut = colors(np.arange(N))

# Map label image to lookup table and plot
plt.imshow(lut[annotation_image.data[middle_section,:,:] % N])
[image: middle section of the annotation image, with each region shown in a different color]
<matplotlib.image.AxesImage object at 0x7fbf53526f30>
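
Before computing the volume, it can be reassuring to check that the IDs we collected really do pick out the retrosplenial area. A minimal sketch that plots a binary RSP mask of the same middle section, reusing the variables defined above:

# Binary mask of the middle section: True where a pixel belongs to RSP or one of its children
rsp_mask = da.isin(annotation_image.data[middle_section, :, :], rsp_ids).compute()

plt.figure()
plt.imshow(rsp_mask, cmap="gray")
plt.title("RSP pixels in the middle section")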

Combining the annotation data with our RSP IDs allows us to count the matching pixels and convert that count to a volume, which we finally print. This achieves the goal of the tutorial.

print(annotation_image.scale)
print(annotation_image.axes_units)

num_pixels = da.isin(annotation_image.data[:], rsp_ids).sum().compute()
pixel_size_list = list(annotation_image.scale.values())
pixel_volume = pixel_size_list[0] * pixel_size_list[1] * pixel_size_list[2]  # in cubic millimeters

print(f"RSP has volume of around {np.round(num_pixels*pixel_volume)} cubic millimeters")
{'z': 0.01, 'y': 0.01, 'x': 0.01}
{'z': 'millimeter', 'y': 'millimeter', 'x': 'millimeter'}
RSP has volume of around 11.0 cubic millimeters
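
As a quick sanity check on the final conversion: at this pyramid level each pixel measures 0.01 mm × 0.01 mm × 0.01 mm = 1e-6 cubic millimeters, so the reported 11 cubic millimeters corresponds to roughly 11 million RSP-labelled pixels.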

In conclusion, this tutorial shows how to programmatically access and process atlas data for the specific purpose of calculating a region volume. More generally, it is also an example of BrainGlobe atlas data being used independently of the BrainGlobe software tools.

Total running time of the script: (0 minutes 28.508 seconds)
