LI Detector Analytical Pipeline
Saurin B Parikh
beneficial
normalization method
genetic screen
phenomics
microbiology
yeast
genomics
high-throughput
screening
Abstract

The LI Detector framework consists of integrated experimental and analytical pipelines. A. The pin-copy-upscale experimental pipeline from frozen glycerol stocks (top) to imaging (bottom). Each box represents a pinning step, and the steps within the sky-blue highlighted portion can be repeated until the desired colony density is reached. The illustrations to the right of the flowchart are a simplified representation of four experimental plates. A reference population (grey) is introduced on every plate during the first upscale step. The analytical pipeline uses this population for spatial bias correction and relative fitness estimation for the mutant strains of interest (purple). B. Workflow of the analysis pipeline, where columns from left to right represent user inputs, analytical steps, and outputs. User inputs consist of raw colony size estimates and the strain layout of the plates. The analytical pipeline performs: i) local artifact correction, ii) source normalization, iii) reference-based background colony size estimation using a 2-dimensional linear interpolation, iv) spatial bias correction, dividing the local artifact corrected colony sizes by the background colony sizes to give a measure of relative fitness, and v) empirical p-value assignment using the reference strain relative fitness distribution. The outputs include local artifact corrected colony sizes, background colony sizes, spatially corrected relative fitness, and mutant strains identified as having a mean colony size that is significantly larger or smaller than that of the reference strain.
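The background estimation and spatial bias correction steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LI Detector implementation (which is written in MATLAB): it assumes reference colonies sit at every `step`-th row and column of the plate, it skips local artifact correction and source normalization, and the names `bilinear_background` and `relative_fitness` are hypothetical.

```python
def bilinear_background(ref, n_rows, n_cols, step=2):
    """Estimate a background colony size for every plate position.

    ref maps (row, col) -> colony size for reference colonies, which are
    ASSUMED to sit at rows and columns that are multiples of `step`
    (0-indexed). Positions past the last reference row or column reuse the
    nearest reference values.
    """
    last_r = ((n_rows - 1) // step) * step
    last_c = ((n_cols - 1) // step) * step
    background = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        r0 = (r // step) * step
        r1 = min(r0 + step, last_r)
        tr = (r - r0) / step if r1 > r0 else 0.0
        for c in range(n_cols):
            c0 = (c // step) * step
            c1 = min(c0 + step, last_c)
            tc = (c - c0) / step if c1 > c0 else 0.0
            # Interpolate along the columns first, then along the rows.
            top = ref[(r0, c0)] * (1 - tc) + ref[(r0, c1)] * tc
            bottom = ref[(r1, c0)] * (1 - tc) + ref[(r1, c1)] * tc
            background[r][c] = top * (1 - tr) + bottom * tr
    return background


def relative_fitness(colony_size, background_size):
    """Spatially corrected fitness: observed size divided by background."""
    return colony_size / background_size


# Toy 4 x 4 plate with four reference colonies (made-up pixel counts).
toy_refs = {(0, 0): 100.0, (0, 2): 200.0, (2, 0): 300.0, (2, 2): 400.0}
toy_background = bilinear_background(toy_refs, n_rows=4, n_cols=4)
```

A mutant colony at row 1, column 1 would then be scored with `relative_fitness(size, toy_background[1][1])`, so a value near 1 means it grew about as well as the surrounding references.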
Before start
The LI Detector analytical pipeline can only be applied to experiments conducted in accordance with the LI Detector experimental pipeline. Please refer to the LI Detector manuscript for best practices on conducting the colony-based high-throughput experiment.
Steps
Files
Plate maps of the starting density plate
- A .xlsx file with one plate per sheet
- Cells contain strain-id
- Example
Table specifying strain-id to orf-name relationship
- A .xlsx file mapping each unique strain_id to its orf_name
- First column is strain_id
- Second column is orf_name
- Each strain_id from Step 1 should have an associated orf_name
- Example
*The orf_name variable holds the names of the mutants in the experiment.
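Before initializing, it can help to confirm that every strain_id in the plate maps has an orf_name. The sketch below assumes the two workbooks have already been read into plain Python structures (a list of plates, each a list of rows, and a dictionary from strain_id to orf_name); the function name and the sample values are hypothetical.

```python
def missing_orf_names(plate_maps, strain_to_orf):
    """Return strain ids that appear on a plate map but have no orf_name.

    plate_maps: list of plates, each a list of rows of strain ids
                (None marks an empty cell).
    strain_to_orf: dict mapping strain_id -> orf_name.
    """
    seen = {
        strain_id
        for plate in plate_maps
        for row in plate
        for strain_id in row
        if strain_id is not None
    }
    return sorted(seen - set(strain_to_orf))


# Made-up example: strain 3 is on the plate but missing from the table.
example_plates = [[[1, 2], [3, None]]]
example_table = {1: "orf_a", 2: "orf_b"}
unmatched = missing_orf_names(example_plates, example_table)
```

An empty result means the two files are consistent with each other.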
Download LID and dependencies
Dependencies:
- Install Database Toolbox from the APPS > Get More Apps option within MATLAB
- Download and unzip the MySQL Connector/J JDBC driver.
Download LID and associated scripts from GitHub into your MATLAB folder.
~$ cd MATLAB
~/MATLAB$ git clone https://github.com/sauriiiin/Matlab-Colony-Analyzer-Toolkit.git
~/MATLAB$ git clone https://github.com/sauriiiin/bean-matlab-toolkit.git
~/MATLAB$ git clone https://github.com/sauriiiin/lidetector.git
~/MATLAB$ git clone https://github.com/sauriiiin/sau-matlab-toolkit.git
Make LID bash scripts executable.
~/MATLAB$ cd lidetector
~/MATLAB/lidetector$ chmod +x initialize.sh
~/MATLAB/lidetector$ chmod +x buildraw.sh
~/MATLAB/lidetector$ chmod +x lid.sh
Initialize
Information to have on hand before proceeding:
- MySQL credentials - username, password, database name
- Name of experiment - this will be used as a prefix for all the tables that will be generated
- Upscale patterns from the experiment - i.e., in what combinations the lower density plates were condensed to form the higher density plates
- Name (orf_name) of reference strain used
- File path to plate map .xlsx file from Step 1
- File path to the strain_id to orf_name .xlsx file from Step 2
Execute the initialize bash script from within the lidetector folder.
~/MATLAB/lidetector$ ./initialize.sh
A successful run will create the following tables:
- _pos2coor = position ids and their corresponding plate coordinate (density, plate number, column number and row number).
- _pos2orf_name = position ids and the corresponding orf-name
- _pos2rep = position ids on the lowest density plates mapped to their replicate positions on the higher density plates, based on the upscale pattern
- _pos2strain_id = position ids and their corresponding strain ids
- _strainid2orf_name = same as table from Step 2
Example files can be found in Data.zip.
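To make the idea of an upscale pattern concrete, here is a small sketch of how four lower density plates might interleave into one plate of four times the density (for example, four 96-colony plates into one 384-colony plate). The 2 x 2 offsets below are an assumed convention for illustration only; the actual pattern is whatever was used during pinning, and `upscale_coordinate` is a hypothetical helper rather than part of LID.

```python
# ASSUMED 2 x 2 interleaving: source plate number -> (row, column) offset.
OFFSETS = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}


def upscale_coordinate(source_plate, row, col, offsets=OFFSETS):
    """Map a 1-indexed (row, col) on a source plate to the upscaled plate."""
    d_row, d_col = offsets[source_plate]
    return (2 * (row - 1) + d_row + 1, 2 * (col - 1) + d_col + 1)


# Every position of four 8 x 12 plates lands on a distinct 16 x 24 position.
targets = {
    upscale_coordinate(plate, row, col)
    for plate in OFFSETS
    for row in range(1, 9)
    for col in range(1, 13)
}
```

Recording which source position feeds which target position is the kind of relationship the _pos2rep table stores.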
Colony Size Data
Organize colony size estimations from your favorite colony size estimator, like the MATLAB Colony Analyzer Toolkit (MCAT), in ascending order of hours, plate number, column number, and row number.
Below is the structure of such a file. Here image1, image2, and image3 are pixel counts from three different images of the same plate, and the average column holds the mean pixel count of the three images.
A | B | C | D | E |
---|---|---|---|---|
hours | image1 | image2 | image3 | average |
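The ordering and the average column can be produced as in the sketch below. It assumes each colony measurement is held as a dictionary that also carries its plate, column, and row numbers (used only for sorting); the field names and the function `build_rows` are hypothetical.

```python
def build_rows(records):
    """Sort measurements and emit [hours, image1, image2, image3, average]."""
    ordered = sorted(
        records,
        key=lambda rec: (rec["hours"], rec["plate"], rec["col"], rec["row"]),
    )
    rows = []
    for rec in ordered:
        images = [rec["image1"], rec["image2"], rec["image3"]]
        rows.append([rec["hours"], *images, sum(images) / len(images)])
    return rows


# Two made-up measurements supplied out of order.
example_records = [
    {"hours": 24, "plate": 1, "col": 1, "row": 1,
     "image1": 300, "image2": 310, "image3": 320},
    {"hours": 12, "plate": 1, "col": 1, "row": 1,
     "image1": 100, "image2": 110, "image3": 120},
]
example_rows = build_rows(example_records)
```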
Combine the above table with position ids from the _pos2coor table using the command below.
~/MATLAB/lidetector$ ./buildraw.sh
Successful completion of this command will generate:
- _RAW = raw colony size estimations per hour per position id of all the images
- _smudgebox = position ids to be excluded from analysis that correspond to the user-defined coordinates
- _JPEG = clean version of the raw table in which border colonies, colonies in the smudge box, and colonies with a pixel count of less than 10 are set to NULL
Example files can be found in Data.zip.
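The cleaning rules behind the _JPEG table can be illustrated with the sketch below, which works on a single plate held as a list of rows. It is not the buildraw.sh implementation: the one-colony border width is an assumption, `None` stands in for SQL NULL, and `clean_plate` is a hypothetical name.

```python
def clean_plate(sizes, smudge=(), min_pixels=10, border=1):
    """Blank out border, smudged, and very small colonies on one plate.

    sizes: list of rows of pixel counts (None for already missing values).
    smudge: iterable of 0-indexed (row, col) positions to exclude.
    border: ASSUMED number of outer rows/columns treated as border colonies.
    """
    n_rows, n_cols = len(sizes), len(sizes[0])
    smudge = set(smudge)
    cleaned = []
    for r, row in enumerate(sizes):
        out = []
        for c, value in enumerate(row):
            on_border = (
                r < border or r >= n_rows - border
                or c < border or c >= n_cols - border
            )
            too_small = value is None or value < min_pixels
            if on_border or (r, c) in smudge or too_small:
                out.append(None)
            else:
                out.append(value)
        cleaned.append(out)
    return cleaned


# Made-up 4 x 4 plate: only the inner 2 x 2 block can survive cleaning.
example_plate = [
    [50, 50, 50, 50],
    [50, 120, 5, 50],
    [50, 130, 140, 50],
    [50, 50, 50, 50],
]
example_clean = clean_plate(example_plate, smudge=[(2, 2)])
```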
Users who choose MCAT as their colony size estimation tool can skip steps 8 and 9 and use LI Detector's imageanalyzer function instead.
LID: imageanalyzer
Skip this step if you have successfully executed steps 8 and 9.
Spatial Bias Correction
Information to have on hand before proceeding:
- Path to MATLAB directory
- Path to lidetector directory
- Path to where the JDBC driver was unzipped from Step 3
Execute the LI Detector
~/MATLAB/lidetector$ ./lid.sh
Successful run will create the following tables:
- _NORM = position ids and their corresponding relative fitness measurements along with the background pixel count measurement based on references
- _FITNESS = similar to _NORM but with strain ids and orf-names included
- _FITNESS_STAT = strain-id-wise mean, median and standard deviation of relative fitness
- _PVALUE = strain-id-wise empirical p-values, where
stat = (strain mean fitness - reference mean fitness) / reference fitness standard deviation
es = (strain mean fitness - reference mean fitness) / reference mean fitness
Example files can be found in Data.zip.
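For readers who want to sanity-check values in the _PVALUE table, the stat and es quantities can be recomputed as sketched below. The empirical p-value function shows one common two-sided construction against a set of reference mean fitness values; whether LID builds its reference distribution in exactly this way is an assumption, as is the use of the sample standard deviation. All function names are hypothetical.

```python
from statistics import mean, stdev


def fitness_stat_and_es(strain_fitness, ref_fitness):
    """Return (stat, es) for one strain relative to the reference strain."""
    ref_mean = mean(ref_fitness)
    ref_sd = stdev(ref_fitness)  # ASSUMED: sample standard deviation
    strain_mean = mean(strain_fitness)
    stat = (strain_mean - ref_mean) / ref_sd
    es = (strain_mean - ref_mean) / ref_mean
    return stat, es


def empirical_p_value(strain_mean, ref_means):
    """Two-sided empirical p-value against reference mean fitness values.

    Counts how many reference means are at least as extreme as the strain
    mean on each side, doubles the smaller tail, and caps the result at 1.
    """
    lower = sum(1 for m in ref_means if m <= strain_mean)
    upper = sum(1 for m in ref_means if m >= strain_mean)
    return min(1.0, 2 * min(lower, upper) / len(ref_means))


# Made-up relative fitness values.
example_stat, example_es = fitness_stat_and_es([1.2, 1.2], [0.9, 1.0, 1.1])
example_p = empirical_p_value(1.2, [0.8, 0.9, 1.0, 1.1, 1.3])
```

A positive es indicates a strain that grew larger than the reference, and a negative es one that grew smaller.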