Co-culture leukemia high-content image analysis using supervised machine learning

Hayden L Bell

Published: 2023-07-29 DOI: 10.17504/protocols.io.rm7vzxy52gx1/v1

Disclaimer

Not intended for medical purposes. This protocol is not intended to diagnose or treat any medical condition and should not be used for any medical purpose. This protocol is intended for use only in a research capacity.

The author/s accept no responsibility for the accuracy of data resulting from this protocol. The author/s further assume no responsibility or liability for any errors or omissions in the content of this protocol. The information contained in this protocol is provided on an "as is" basis with no guarantees of completeness, accuracy, usefulness or timeliness and without any warranties of any kind whatsoever, express or implied.

Abstract

This protocol describes a non-informatics-based approach to high-content image analysis of acute leukemia cells in co-culture with mesenchymal stromal cells (MSCs) using supervised machine learning. The analysis pipeline uses two powerful, open-source software applications: Cell Profiler and Ilastik. The aim of this protocol is to provide a basic skeleton pipeline for image analysis. At minimum, the pipeline determines absolute cell numbers for each cell class from a fluorescence microscopy image of cells stained with a DNA dye.

This protocol is a detailed companion walkthrough for the Github repository available at https://github.com/hayden-bell/Image_Analysis.

Before start

Two open-source applications are required for this image analysis pipeline: Ilastik (https://github.com/ilastik/ilastik) and Cell Profiler (https://github.com/CellProfiler).

This protocol uses the project files from the Github repository available at https://github.com/hayden-bell/Image_Analysis. Download the BaseProject.ilp and BaseProject.cppipe files before starting.

Use high-quality fluorescence microscopy images in a lossless high-resolution file format such as TIF.

Steps

Training a supervised machine learning model (semantic segmentation)

1.

Open the Ilastik software and load the BaseProject.ilp project.

2.

In the Input Data tab, load several different training images (up to ~10) that are representative of the different experimental conditions.

For example, include images from positive and negative controls in which cell number is maximised or minimised.
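One way to keep the training set representative is to take images from each experimental condition in turn until the ~10-image limit is reached. The sketch below does this with a round-robin pick. It assumes you have already grouped file names by condition. The condition names and file names are hypothetical, and the round-robin rule is an illustrative choice, not part of the published protocol.

```python
from itertools import zip_longest

def pick_training_images(images_by_condition, max_images=10):
    """Round-robin pick so every condition is represented before any repeats."""
    picked = []
    # Take the 1st image of each condition, then the 2nd of each, and so on.
    for tier in zip_longest(*images_by_condition.values()):
        for image in tier:
            if image is not None:  # a condition with fewer images has run out
                picked.append(image)
                if len(picked) == max_images:
                    return picked
    return picked

# Hypothetical grouping of images by condition.
conditions = {
    "positive_control": ["B02.tif", "B03.tif", "B04.tif"],
    "negative_control": ["G02.tif", "G03.tif"],
    "treatment": ["D05.tif", "D06.tif", "D07.tif", "D08.tif"],
}
print(pick_training_images(conditions, max_images=5))
# ['B02.tif', 'G02.tif', 'D05.tif', 'B03.tif', 'G03.tif']
```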

3.

In the Feature Selection tab, click Select Features... and ensure all 37 features are selected.
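The figure of 37 follows from the shape of Ilastik's feature table. The sketch below assumes Ilastik's default grid: six filters at seven Gaussian scales, where the smallest scale (σ = 0.3) can only be selected for Gaussian Smoothing. That gives 6 × 7 − 5 = 37 checkboxes. The filter names and scales are recalled from Ilastik's default dialog and should be checked against your installed version. Some filters, such as the eigenvalue filters, produce more than one output channel, so 37 counts selectable checkboxes, not output channels.

```python
# Assumed Ilastik default feature grid (verify against your Ilastik version).
SCALES = [0.3, 0.7, 1.0, 1.6, 3.5, 5.0, 10.0]
FEATURES = [
    "Gaussian Smoothing",
    "Laplacian of Gaussian",
    "Gaussian Gradient Magnitude",
    "Difference of Gaussians",
    "Structure Tensor Eigenvalues",
    "Hessian of Gaussian Eigenvalues",
]

def selectable(feature, sigma):
    # At the smallest scale only Gaussian Smoothing can be selected.
    return sigma > 0.3 or feature == "Gaussian Smoothing"

grid = [(f, s) for f in FEATURES for s in SCALES if selectable(f, s)]
print(len(grid))  # 37
```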

4.

In the Training tab, ensure there are three separate Labels (classes), in this order: PDX, MSCs, background.

4.1.

Using the Brush Cursor, manually annotate inside several nuclei of each class with the corresponding Label. Zoom in on the image so that annotations are placed precisely.

Errors can be corrected using the Eraser Cursor and the image contrast can be changed using the Window Leveling tool to better visualise dimmer nuclei.

Example annotation for each semantic segmentation class: PDX, MSCs and background.

4.2.

Use the Live Update feature to view a real-time overlay of the probability map for each class over the original training images.
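A probability map stores one probability per Label class for every pixel. One simple way to read it is to assign each pixel to its most probable class (an argmax over the class axis). The sketch below does this for a tiny map in yxc axis order, using plain nested lists. The class order and the 2 × 2 example values are assumptions for illustration. The Cell Profiler pipeline works from the exported probability channels and may segment them differently, for example by thresholding.

```python
CLASSES = ["PDX", "MSC", "background"]  # assumed Label order in the Ilastik project

def label_map(prob_yxc):
    """Assign each pixel to the class with the highest probability (argmax over c)."""
    labels = []
    for row in prob_yxc:          # y axis
        out_row = []
        for probs in row:         # x axis; probs holds one value per class (c axis)
            best = max(range(len(probs)), key=probs.__getitem__)
            out_row.append(CLASSES[best])
        labels.append(out_row)
    return labels

# A hypothetical 2 x 2 image with 3 class channels (axis order yxc).
probs = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.4, 0.3, 0.3]],
]
print(label_map(probs))
# [['PDX', 'MSC'], ['background', 'PDX']]
```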

4.3.

Iteratively refine the annotations across the training images until the probability maps reliably separate PDX nuclei, MSC nuclei and background.

Note

Avoid over-annotating the training images in Ilastik. An over-annotated model learns characteristics specific to the training dataset (overfitting), which compromises its generalisability and gives poorer performance on unseen images.

5.

In the Prediction Export panel, select Source: Probabilities.

5.1.

Click 'Choose Export Image Settings...' and ensure the output file is tif format with the axis order yxc.

5.2.

Choose the Output File destination as {dataset_dir}/probabilities/{nickname}_{result_type}.tif
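The output destination is a template whose placeholders Ilastik fills in for each input image. The sketch below shows where the probability map for a given raw image should end up. It assumes that {dataset_dir} is the folder containing the raw image, {nickname} is the file name without its extension, and {result_type} becomes "Probabilities" for this export source. The helper name `export_path` is hypothetical, and Ilastik's own nickname rules may differ for unusual inputs.

```python
from pathlib import PurePosixPath

def export_path(raw_image, result_type="Probabilities",
                template="{dataset_dir}/probabilities/{nickname}_{result_type}.tif"):
    """Hypothetical helper: predict where Ilastik writes the export for one input."""
    p = PurePosixPath(raw_image)
    return template.format(
        dataset_dir=str(p.parent),  # folder containing the raw image
        nickname=p.stem,            # file name without extension (assumed)
        result_type=result_type,    # assumed value for Source: Probabilities
    )

print(export_path("/data/plate1/A01.tif"))
# /data/plate1/probabilities/A01_Probabilities.tif
```

Writing the exports to a separate probabilities subfolder keeps them apart from the raw images, which simplifies loading only the probability maps into Cell Profiler later.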

6.

In the Batch Processing tab, click 'Select Raw Data Files...' to import all of the image files to be analysed.

7.

Click 'Process all files'.

Quantifying individual cell nuclei (instance segmentation)

8.

Open the Cell Profiler software and import the BaseProject.cppipe pipeline (File > Import > Pipeline from File...).

9.

In the Images module, load the probability map images generated by the Ilastik project.

Note: do not load the original images at this step.
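If raw images and probability maps sit under the same folder, they must be separated before analysis. The sketch below keeps only the probability maps by file-name suffix. It assumes the naming produced by the export template in step 5.2, so that exports end in "_Probabilities.tif". The function name is hypothetical. Cell Profiler's Images module can apply a comparable filename filter inside the pipeline itself.

```python
def probability_maps(filenames, suffix="_Probabilities.tif"):
    """Keep only Ilastik probability exports; drop raw images (assumed naming)."""
    return sorted(f for f in filenames if f.endswith(suffix))

files = ["A01.tif", "A01_Probabilities.tif", "A02.tif", "A02_Probabilities.tif"]
print(probability_maps(files))
# ['A01_Probabilities.tif', 'A02_Probabilities.tif']
```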

10.

Optional: In the Metadata module, regular expressions (regex) can be used to extract metadata such as plate ID and well ID from each image filename.

By default, the pipeline will attempt to extract the well ID of each image in the format A1 or A01.

11.

Optional: To check how well the pipeline identifies individual PDX or MSC nuclei, their outlines can be visualised using the OverlayOutlines module. Tick the module's checkbox to enable it, and save the output using the SaveImages module.

12.

In the ExportToDatabase module, modify the Experiment name and SQLite database filename to better identify the experimental data output.

Note

The default output location can be modified by clicking the 'Output Settings' button.

13.

Click Analyze Images to process the imported dataset and export the data in SQLite database format.

Reading the data output

14.

Data can be read using any database software application that can open SQLite files.

Data can be retrieved from the [Experiment name]_Per_Image data table.

Recorded data include:

Predicted PDX nuclei counts (Image_Count_LeukaemicNuclei)
Predicted MSC nuclei counts (Image_Count_MSCNuclei)
Image file name (Image_FileName_CyQ)
Image well location (Image_Metadata_Well)
Plus any additional data exported from separate modules or metadata extractions.
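The per-image table can also be queried directly with Python's built-in `sqlite3` module. The sketch below averages the predicted nuclei counts per well, using the column names listed above, and reports how many images contributed to each well. The table name "MyExpt_Per_Image" is a hypothetical placeholder, so substitute the Per_Image table name in your own database. The toy rows are made up and only show the query shape. Averaging over images is one choice; replace `AVG` with `SUM` if total counts per well are wanted.

```python
import sqlite3

def counts_per_well(conn, table="MyExpt_Per_Image"):
    """Average predicted nuclei counts per well, plus the number of images per well.

    `table` is a hypothetical default; use the Per_Image table name in your database.
    """
    # Table names cannot be bound as SQL parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    query = (
        "SELECT Image_Metadata_Well, "
        "AVG(Image_Count_LeukaemicNuclei), AVG(Image_Count_MSCNuclei), COUNT(*) "
        f"FROM {quoted} "
        "GROUP BY Image_Metadata_Well ORDER BY Image_Metadata_Well"
    )
    return conn.execute(query).fetchall()

# Toy in-memory table with made-up values, only to show the query shape.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "MyExpt_Per_Image" (Image_Metadata_Well TEXT, '
    "Image_Count_LeukaemicNuclei INTEGER, Image_Count_MSCNuclei INTEGER)"
)
conn.executemany(
    'INSERT INTO "MyExpt_Per_Image" VALUES (?, ?, ?)',
    [("A01", 100, 20), ("A01", 120, 22), ("A02", 50, 18)],
)
print(counts_per_well(conn))
# [('A01', 110.0, 21.0, 2), ('A02', 50.0, 18.0, 1)]
```

For real data, open the exported file with `sqlite3.connect("path/to/your_database.db")` (hypothetical path) and pass that connection instead.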
