1 Abstract

Drug response profiling plays an important role in the development of personalized care of cancer patients. Despite the increasing amount of drug response profiling data, a robust computational data processing pipeline is currently lacking, which can lead to low reproducibility among different studies. In addition, technical artefacts, such as edge effect, have not been adequately addressed and could potentially distorted drug response patterns among samples. In order to tackle above issues, we developed an R/Bioconductor package, DrugScreenExplorer, to streamline the tasks of processing drug response profiling data, including data import, normalization, quality assessment, and reporting. We also included a novel 2D-sigmoid curve fitting method to correct incubation effect. We show the 2D-sigmoid fitting method better captures the shape of the edge effect than current previous method based on local regression. DrugScreenExplorer facilitates and standardize the processing of drug response profiling data and therefore increase the their reproducibility and translational value.

2 Introduction

3 Getting started

DrugScreenExplorer is an R package designed for streamlined processing, quality control and visualizing of high-throughput drug sensitivity screen data.

library(DrugScreenExplorer)
library(tidyr)
library(dplyr)
library(gridExtra)
library(tools)

4 Data import

4.1 Introduction of input file types

For creating the complete drug screen datasets, three types of data are needed:

  1. the raw screening data in text format, which stores the values out from a plate reader;

  2. the well annotation or plate layout, which stores the metadata for all the wells on a screening plate. The most important annotations are drug names and concentrations;

  3. the plate annotation, which stores the metadata for each plate. For example, the sample information, if each plate represents one sample.

The examples for each type of the input files can be found in the test data affiliated with this package:

Raw screening data

system.file("testData/rawData", package = "DrugScreenExplorer")
## [1] "/Users/lu/Library/R/x86_64/4.2/library/DrugScreenExplorer/testData/rawData"

If the text files containing the raw screening data are organized by subfolders, the subfolders will be considered as batches.

Plate annotation file

system.file("testData/plateAnno/plateAnno.tsv", package = "DrugScreenExplorer")
## [1] "/Users/lu/Library/R/x86_64/4.2/library/DrugScreenExplorer/testData/plateAnno/plateAnno.tsv"

The fileName column is mandatory. If the plates do not all have the same layout, for example, several plates are used for the same sample but with different drug collections, a plateID column can be included to indicate the plate index.

Well annotation file

system.file("testData/wellAnno/wellAnno.tsv", package = "DrugScreenExplorer")
## [1] "/Users/lu/Library/R/x86_64/4.2/library/DrugScreenExplorer/testData/wellAnno/wellAnno.tsv"

The wellID column is mandatory. If plates have different layouts, a plateID column can be used as in the plate annotation file.

4.2 Functions for helping create annotation files

Besides create the well and plate annotations files manually, there are two functions can be used to help create the annotation input files.

4.2.1 createPlateInput()

The function creates the framework for the plate annotation input based on the raw data files and folder structure in the raw data folder.

rawFolder <- system.file("testData/rawData", package = "DrugScreenExplorer")
createPlateInput(rawDir = rawFolder,
                 file = "plateAnno.tsv",
                 entries = c("sampleID", "patientID"))

A tab-separated values (tsv) file named “plateAnno.tsv” will be created and it has three columns fileName, sampleID and patientID. The fileName column is automatically filled based on the files in the raw data folder, while sampleID and patientID columns need to be filled manually.

If the text files containing the raw data are grouped in sub-folders, and you want the sub-folders to be considered as batches. The createPlateInput function can also generate the batch information automatically if the batchAsFolder is TRUE.

rawFolder <- system.file("testData/rawData", package = "DrugScreenExplorer")
createPlateInput(rawDir = rawFolder,
                 file = "plateAnno.tsv",
                 entries = c("sampleID", "patientID"),
                 batchAsFolder = TRUE)

In the output plateAnno.tsv file, there’s an additional column batch, which contains the batch information. Notice that the folder names are not used as batch names. Instead, the series of integer numbers will be assigned to batches based on the ordering of the folder names.

4.2.2 createWellInput()

The function creates the framework for the well annotation input.

createWellInput(file = "wellAnno.tsv", colNum = 24, rowNum = 16,
                entries = c("name", "concentration"))

A wellAnno.tsv file will be created for a 384-well (16*24) plate. The wellID is automatically filled by the name and concentration columns need to be filled manual.

If more than one plate layouts for a sample are used in the screen, for example, when one plate is not enough for testing all the drugs, the platePerSample parameter can be used to specify how many different plate layouts are used for each sample.

createWellInput(file = "wellAnno.tsv", colNum = 24, rowNum = 16,
                entries = c("name", "concentration"),
                platePerSample = 2)

In the wellAnno.tsv file, an additional column plateID will be created, indicating the index of the plate layout type. It is important that a plateID column must also be created manually in the plate annotation file, to specify which plate layout a certain plate is using.

4.3 Read in the whole experiment

After preparing the raw data folder, the plate annotation file and the well annotation file, the whole experiment can be read in using the readScreen() function.

plateFile <- system.file("testData/plateAnno/plateAnno.tsv",
                         package = "DrugScreenExplorer")
wellFile <- system.file("testData/wellAnno/wellAnno.tsv",
                         package = "DrugScreenExplorer")
screenData <- readScreen(rawDir = rawFolder, plateAnnotationFile = plateFile,
                         wellAnnotationFile = wellFile, 
                         rowRange = c(3,18), colRange = 2, 
                         sep = "[;\t]",
                         negWell <- c("DMSO","PBS"), posWell = c())
head(screenData)
## # A tibble: 6 × 14
##   wellID rowID colID  value fileName batch sampl…¹ patie…² Mutat…³ Expre…⁴ Group
##   <chr>  <chr> <chr>  <dbl> <chr>    <fct> <chr>   <chr>     <dbl>   <dbl> <chr>
## 1 A001   A0    01    507840 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## 2 A002   A0    02    569760 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## 3 A003   A0    03    487880 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## 4 A004   A0    04     13400 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## 5 A005   A0    05    528920 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## 6 A006   A0    06    244640 CTGLumi… 1     12PB02… P0007         0     2.2 B    
## # … with 3 more variables: name <chr>, concentration <dbl>, wellType <chr>, and
## #   abbreviated variable names ¹​sampleID, ²​patientID, ³​Mutation, ⁴​Expression
## # ℹ Use `colnames()` to see all variable names

A tidy table containing the drug screening result and metadata will be created.
Because each platform can generate different raw data format, the rowRange and colRange function can tell the function the staring and ending place of the numeric data that represent the signal. Different delimiters can be specified by sep and regular expression is supported.
The postie control and negative control wells can be specified by negWell and posWell. The input for negWell or posWell can be either a character vector of wellIDs or drug names if the name column is given in the well annotation file. Wells that are not labeled as positive controls (“pos”) or negative controls (“neg”) will be labelled as “drug” in the “wellType” column in the final output table. This column is important for plate-wise normalization.

The inhibition rate for each well can also be calculated at this stage by specifying “normalization = TRUE”. Currently, two methods are supported: “negatives”, which calculates the inhibition rate by simple dividing each measurement on the plate by the median of the values of negative control wells on the same plate; “both”, which uses the “normalized percent inhibition (NPI)” is used. For an inhibition assay, this method divides the difference between each measurement on a plate and the median measurement of the positive controls on that plate by the difference between the median measurement of negative controls and positive controls on that well.

screenData <- readScreen(rawDir = rawFolder, plateAnnotationFile = plateFile,
                         wellAnnotationFile = wellFile, 
                         rowRange = c(3,18), colRange = 2, 
                         sep = "[;\t]",
                         negWell <- c("DMSO","PBS"), posWell = c(),
                         normalization = TRUE, method = "negatives", discardLayer = 2)
head(screenData %>% dplyr::select(wellID, value, sampleID, name, concentration, wellType, normVal))
## # A tibble: 6 × 7
##   wellID  value sampleID name      concentration wellType normVal
##   <chr>   <dbl> <chr>    <chr>             <dbl> <chr>      <dbl>
## 1 A001   507840 12PB0265 DMSO                0.3 neg       1.14  
## 2 A002   569760 12PB0265 DMSO                0.3 neg       1.28  
## 3 A003   487880 12PB0265 A-1210477          10   sample    1.09  
## 4 A004    13400 12PB0265 Afatinib           10   sample    0.0301
## 5 A005   528920 12PB0265 A-1210477           2   sample    1.19  
## 6 A006   244640 12PB0265 Afatinib            2   sample    0.549

A new column, “normVal”, is created, which contains the normalized values. Inhibition rate can also be calculated later by using normalizePlate() function.

Reading in the screen data and normalization can also be separated into two steps.

  1. Read in the screen data without normalization
#disable normalization by setting "normalization = FALSE"
screenData <- readScreen(rawDir = rawFolder, plateAnnotationFile = plateFile,
                         wellAnnotationFile = wellFile, 
                         rowRange = c(3,18), colRange = 2, 
                         sep = "[;\t]",
                         negWell <- c("DMSO","PBS"), posWell = c(),
                         normalization = FALSE)
  1. Perform normalization
screenData <- normalizePlate(screenData, method = "negatives", discardLayer = 2)

head(screenData %>% dplyr::select(wellID, value, sampleID, name, concentration, wellType, normVal))
## # A tibble: 6 × 7
##   wellID  value sampleID name      concentration wellType normVal
##   <chr>   <dbl> <chr>    <chr>             <dbl> <chr>      <dbl>
## 1 A001   507840 12PB0265 DMSO                0.3 neg       1.14  
## 2 A002   569760 12PB0265 DMSO                0.3 neg       1.28  
## 3 A003   487880 12PB0265 A-1210477          10   sample    1.09  
## 4 A004    13400 12PB0265 Afatinib           10   sample    0.0301
## 5 A005   528920 12PB0265 A-1210477           2   sample    1.19  
## 6 A006   244640 12PB0265 Afatinib            2   sample    0.549

The results should be the same. # Quality control

4.4 Raw count distribution

The DrugScreenExplorer package provides several functions for checking the screen plot. First, we can use the plotRawCount function to plot the distribution of raw count (signal) on each plate, to check whether a certain plate has a strange behavior.

g <- plotRawCount(screenData, ifLog10=TRUE)
plot(g)

We can notice that the plate “CTGLuminescence_20161124_006_P0010_16PB0066” has a rather overall low counts that other plates.

4.5 Raw count distribution for each well type

If the plate layout information has been provided in the “wellType” column, the raw count distribution for each well type on each plate can also be plotted to check whether the raw count distribution for each well type is reasonable, for example, the raw count for negative control wells should be generally higher than sample wells.

g <- plotTypeDist(screenData, ifLog10 = TRUE)
grid.arrange(grobs = g, ncol=3)

The plotTypeDist function can also output a pdf file by specifying the pdfName parameter.

4.6 Plate heatmap plot

A direct way to examine the screen quality is to plot the signal intensity heatmap for each screen. The plotPlate function can be used to plot the heatmaps. From plate heatmap plots, screen artifacts, such as incubation effect, can be spotted.

g <- plotPlate(screenData = screenData, plotPlate = "all", plotType = "viability")
grid.arrange(grobs = g[1:8], ncol = 2)

If plotType = “viability”, the percent inhibition rate relative to negative controls will be used for the heatmap and the screen data must be normalized before hand, i.e the normVal column must be in the screenData object.

For unnormalized values, per-plate z-score can be used for heatmap plot.

g <- plotPlate(screenData = screenData, plotPlate = "all", plotType = "zscore")
grid.arrange(grobs = g[1:8], ncol = 2)

The plotPlate function can also be used to plot the plate layout, by specifying the type parameter as “layout”

g <- plotPlate(screenData = screenData, plotPlate = "all", plotType = "layout")
g[[1]]

4.7 Automatic screen quality report

An html Rmarkdown report on the screen quality can be automatically generated by using the makeReport function. Depend on the annotations included in the screen data tidy table, different quality check items will be included in the report.

makeReport(screenData = screenData, showCode = FALSE, 
           title = "Report for my drug screening project",
           author = "Junyan Lu", ifPlatePlot = TRUE)

An html file will be created in report folder under the working directory.
The information on more customizable options can be found at the manual page for makeReport

5 Adjusting incubation effect

From the plate heatmap plots above, we can notice some obvious artifacts on the screening plates. For example, some wells in a certain region of a plate show significantly higher or lower signal intensity. This is know as incubation effect, because usually those artifacts are caused by technical problems, like solvent evaporation, pipetting issues, and etc.,. If those problematic wells are only located on the edge of the plate, it is also called edge effect.

Plates show typical incubation effect and edge effect.

plotPlates <- c("CTGLuminescence_P0024_14PB0550",
                "CTGLuminescence_P0014_15PB0031")
subScreen <- dplyr::filter(screenData, fileName %in% plotPlates)
g <- plotPlate(subScreen, plotPlate = "all", plotType = "viability")
grid.arrange(grobs=g, ncol=2)

The plate on the left shows typical edge effect, because the wells on the edge of the plate generally show lower values than the well in the middle. On the other hand, the plate on the right show incubation effect, in which left half of the plate shows higher value than the left half.

DrugScreenExplorer package provides two functions fitEdgeEffect and correctEdgeEffect for estimating and correcting incubation effect or edge effect on the plate.

Firstly, the incubation effect can be estimated by using a local regression fit method.

subScreen <- fitEdgeEffect(subScreen, method = "loess", useNeg = TRUE,
                           useLowConcentrations = 1, span = 1)

The estimated incubation effect will be stored in the edgeFactor column.
useNeg = TRUE means only negative control wells are used for incubation effect estimation.
useLowConcentrations = 1 means the lowest 1 concentration of the drugs is also considered as the de facto negative controls and used for incubation effect estimation.
Please refer to the manual page for more explanations of the parameters.

The incubation (edge) effect can be visualized using the plotPlate function by specifying plotType = edgeEffect

g <- plotPlate(subScreen, plotType = "edgeEffect")
grid.arrange(grobs = g, ncol =2)

Next, the incubation effect can be removed from the screen plate by correctEdgeEffect function.

subScreen <- correctEdgeEffect(subScreen, correctMethod = "bliss")

In this example, the edge effect will be removed by Bliss drug combination model, in which the real effect equal observed effect divided by edge factor. The corrected value is stored in a new column normVal.cor
Currently, the estimation and correction of incubation effect can only be performed on normalized values

The corrected values can also be visualized by plotPlate function, by specifying plotType = “viability” and ifCorrected = TRUE

g <- plotPlate(subScreen, plotType = "viability", ifCorrected = TRUE)
grid.arrange(grobs = g, ncol =2)

Now the incubation effect has been largely removed.

Adjust edge effect for the whole screening data

screenData <- correctEdgeEffect(screenData, correctMethod = "bliss")

6 Summarisation of drug effect

screenData <- summariseScreen(screenData, method = c("average","AUC"))

7 Interactive data exploration using Shiny app

Prepare data set for shiny app

makeShiny(screenData, sampleAnnotations = c("batch","sampleID","patientID","Mutation","Expression","Group"))

The functions avaialbe for the shiny app will be dependent on the sample annotations provided by the user.

Run shiny app

The shiny app can be excuted through running “app.R” under “shiny” folder.