- Docker (to build and run pipeline)
- Output from SBU BMI Cancer and Lymphocyte prediction pipelines
Once you clone/fork/download this repository, set it as your working directory and build the container as below
git clone https://github.com/SBU-BMI/til_align.git cd til_align docker build -t til_analyses . You can then call the alignment and analytics functions.
The functionality for this repository is split into two functions, callAlign.sh and callAnalytics.sh. They should be run in this order, as callAnalytics.sh requires output from callALign.sh. Please see details below. You can also call their BaSH level help functions using the -h flag after building the container.
This script calls the alignment portion of the pipeline. Assuming appropriate input it will:
- Read and parse cancer and lymphocyte prediction probabilities
- Threshold data into yes/no calls based on algorithm specific thresholds
- Overlay lymphocyte and cancer maps
- Return invasion metrics
- You will need a base directory with two subfolders, each containing the outputs of our prediction pipelines (either txt or json). Note: If the subdirs are not named exactly like below, you will need to specify the folder name in the align call (see help for how to update path)
- tilPreds
- cancPreds
- If the file names within the directories are exactly the same, you don't need more information. If they vary, you will need
- A csv in in basedir, specified with a -s flag in the align call, with
- Col 1 is sample name
- Col 2 is lymph prediction file name for that sample
- Col 3 is cancer prediction file name for the same sample
- For a sample basedir, look at "data_for_sample_run/"
- A csv in in basedir, specified with a -s flag in the align call, with
- Histogram of Percent Invasion
- A csv with 7 columns
- Sample Name: Col 1 in sampFile or file name
- n_Canc_patch: total n of cancer patches in WSI
- n_TIL_patch: total n of lymph patches in WSI
- n_TIL_patch_overlap: total n of patches containing both Canc and TIL
- percent_pos: n_TIL_patch_overlap / n_Canc_patch
- scaled_PP: percent_pos / standard deviation of percent_pos
- TIL_Class: Binned low vs high around mean invasion value
- -t -- tilPreds: "tilPreds" ## A directory of lymph predictions (see below)
- -T -- tilThresh: .5 ## The probability threshold that separates a Lymph call from a no-lymph call (Inception, 0.1; VGG, 0.42; ResNet, 0.5, overwritten by -a)
- -c -- cancPreds: "cancPreds" ## A directory of cancer predictions (see below)
- -C -- cancThresh: .5 ## The probability threshold that separates a cancer call from a no-lymph call (0.5)
- -s -- sampFile: "" ## Optional file with structure broken down below (to be used if file names differ between canc and lymph)
- -o -- outputFile: "Percent_Invasion.csv" (file name for csv)
- -O -- outputDir: "outputs" ## folder within mounted volume to save all outputs
- -w -- writePNG ##If passed as a flag, will make a folder PNGs/ within outputDir and write thresholded pngs for all images
- -h -- help ## Print help
To see help, run
docker run til_analyses callAlign.sh -h To run with default parameters (only works if all prediction files have exact same names), run
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAlign.sh If you know the algorithm your lymphocyte predictions were made using, run:
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAlign.sh -a [First letter of algorithm (i,v, or r)] If you need to specify sample pairs between Lymph and Canc predictions AND want to write the overlaid maps, run
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAlign.sh -s /path/to/csv (can be relative) -w This script will take the invasion metrics calculated by callAlign.sh and calculate:
- Descriptive statistics:
- Overall Invasion distribution
- Continuous invasion distribution faceted by variables of interest
- Invasion class calls faceted by variables of interest
- Survival correlations:
- Univariable Kaplan-Meier and Cox regression of class calls
- Univariable Cox regression of scaled invasion (Percent Invasion scaled by standard deviation)
- Bivariable KM of class calls and variables of interest
- Bivariable Cox regressions of scaled invasion and variables of interest
- Fully modified Cox regression using scaled invasion and all variables of interest
You will need a csv with the following columns
-
scaled_PP (output from callAlign)
-
TIL_Class (output from callAlign)
-
variables of interest (can be any valid column name, code will grab all)
- Note: this code will use ALL columns (excludes survival columns from descriptive stats), so please trim any excess columns before running
-
Column of 0s and 1s indicating outcome censor status names survCensor
- This code expects right censored data, with 0's indicating censor and 1's indicating event
-
Column of time to event (numeric), whatever your endpoint of choice may be, named survTime
- You may use different colnames for censor and time, but they must be specific in the call (see samples)
- -p -- csvPath: "" ## Path to csv to analyze (will assume it is in basedir [/data] of mounted volume)
- -s -- excludeSurv ## If passed, code will skip survival analyses portion
- -t -- survTime: "survivalA" ## Column name in csv with time to event
- -c -- survCensor: .5 ## Column name in csv indicating if patient is censored (0) or has an event (1)
- -h -- help ## Print help
- A pdf or html of descriptive statistics and survival correlations (if requested)
- A csv of stratification metrics
- Do we want to write test outcomes for each plot?
- Do we want to write Hazard ratios and confidence intervals for each group?
To run using all defaults, run
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAnalytics.sh -p CSVFILENAME To run with different survival column names
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAnalytics.sh -p CSVFILENAME -c censorColname -t timeColName To run without survival information
docker run -v /PATH/TO/BASEDIR:/data til_analyses callAnalytics.sh -p CSVFILENAME -s FALSE Happy analyzing! Please report any issues you may find through the issues tab.