Skip to the content.

scDIV

Introduction

This documentation gives an introduction and user manual of scDIV (acronym of the Single Cell RNA sequencing data Demultiplexing using Interindividual Variations) an R package to use inter-individual differential co-expression patterns for demultiplexing the pooled samples without any extra experimental steps.

Easy Installation

  1. Install the R (LINK)
  2. Install the free version of rStudio (LINK) [optional step]
  3. Run the following command in R/rStudio to install scDIV as an R package:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
    
library(devtools)
install_github("isarnassiri/scDIV")

All dependency packages automatically will be downloaded, installed and loaded from CRAN-like repositories. We only tested scDIV in the R version 4.1.3 environment. You need to have root permission for this distribution, including the installation of any package.

You can find sample input files in system.file("extdata", package = "scDIV") folder.

Step 1: Infer genetic variants from scRNA-seq data

cellsnp-lite is used to pileup the expressed alleles in single-cell data, which can be directly used for donor deconvolution in multiplexed single-cell RNA-seq data and assign cells to donors without genotyping reference (LINK).

cellsnp-lite gets bam file, list of barcodes as variable inputs, and a Variant Call Format (vcf) file listing all candidate SNPs (regionsVCF) as backend input variable:

cellsnp-lite -s possorted_genome_bam.bam -b barcodes.tsv.gz -O FOLDER-NAME -R regionsVCF -p 22 --minMAF 0.05 --minCOUNT 10 --gzip 

cellsnp-lite generates a vcf file including called genetic variants as follows (Figure 1):

Figure 1
Figure 1. Example of vcf file generated by cellsnp-lite.

Step 2: Demultiplex pooled samples using genetic variants

We use Vireo (Variational Inference for Reconstructing Ensemble Origin) for donor deconvolution using expressed SNPs in multiplexed scRNA-seq data (LINK).

Vireo gets variants info file provided by cellsnp-lite as an input:

vireo -c input-vcf-file -o output-folder --randSeed 2 -N Number-of-donors -t GP  

We use “donor_ids.tsv” file from outputs of Vireo for downstream analysis (Figure 2).

Figure 2
Figure 2. Example of output from Vireo.

Step 3: Generate gene-cell count matrix

Raw count data from 10X CellRanger (outs/read_count.csv) or other single-cell experiments has the gene as a row (the gene name should be Ensembl gene ID) and the cell as a column. You can convert an HDF5 Feature-Barcode Matrix (LINK) to a gene-cell count matrix using the cellranger mat2csv (LINK) command provided by 10Xgenomics. The cells in the read_count.csv file are from the filtered feature-barcode matrix generated by the cell ranger.

cellranger mat2csv filtered_feature_bc_matrix.h5 read_count.csv

You can see an example of gene-cell count matrix in figure 3:

Figure 3
*Figure 3. Example of gene-cell count matrix file generated by mat2csv. Each column represent a borcode (cell). *

Step 4: Gene Expression Recovery

In this step, we start to use a function from scDIV package. We use GeneExpressionRecovery() function for Gene Expression Recovery as follows. The GeneExpressionRecovery() function uses SAVER (single-cell analysis via expression recovery), an expression recovery method for unique molecule index (UMI)-based scRNA-seq data to provide accurate expression estimates for all genes in a scRNA-seq profile.

library("scDIV")
csQCEAdir <- system.file("extdata", package = "scDIV")
Donors='donor6_donor2'
FC='FAI5649A17'
GeneExpressionRecovery( InputDir = csQCEAdir, Donors = Donors, FC = FC )

You can find the results in the SAVER/ folder with ‘AssignedCells.txt’ and ‘AllCells.txt’ extensions. You can see an example of transformed gene-cell count matrix in figure 4.

Figure 4
Figure 4. Example of gene-cell count matrix file generated by SAVER.

Step 5: Inter-individual Differential gene Correlation Analysis (IDCA)

The IDCA() function uses correlation coefficients and performed Inter-individual Differential gene Correlation Analysis (IDCA) for two donors (D1 and D2) and genes (G1 and G2).

library("scDIV")
ERP = "donor6_donor2_FAI5649A17_AssignedCells.txt"
Donors='donor6_donor2'
FC='FAI5649A17'
InputDir = system.file("extdata", package = "scDIV")
IDCA( InputDir, Donors, FC, ERP, TEST = T )

You can find the results in the IDCA_Analysis/ folder with “IDCA.txt” extention. Figure 5 represents an example output of IDCA().

Figure 5
Figure 5. Top differentially correlated gene pairs as well as their differential correlation statistics and correlation category (e.g., +/+). AIC and TPrate stands for “Akaike information criterion” and “true positive rate” which are indicators of gaussian mixture model clustering performance to replicate the cluster of cells per donor assigned by genetics demultiplexing.

Notes

  1. The following script can be utilized to find all donor combinations and execute the function accordingly.
List_Donors <- c('Donor1', 'Donor2', 'Donor3')

for(Donor_Pair in apply( combn(List_Donors, 2), 2 , paste , collapse = "-" ))
{
  print(Donor_Pair)
  IDCA( InputDir, Donor_Pair, FC, ERP, TEST = T )
}
  1. To run the IDCAvis() function without genotype demultiplexing, you need to create donor_ids.tsv file in the running directory of the package incluidng barcodes, donor name, and best guess for unassigned cells as follows (Figure 6):
Figure 6
Figure 6. Example of output for running the function without genotype demultiplexing.

Step 6: Visualization of IDCA outputs

The IDCAvis() function visualizes the outputs of IDC analysis.

library("scDIV")
InputDir = system.file("extdata", package = "scDIV")
IDCAvis( InputDir )

You can find the results in the IDCA_Analysis/IDCA_Plots/ as pdf file(s) (Figure 7).

Figure 7
Figure 7. Example of output from IDCAvis() function.

Step 7: Expression Aware Demultiplexing per Donor Pair

The EADDonorPair() function uses inter-individual differential co-expression patterns for demultiplexing per donor pair.

library("scDIV")
InputDir = system.file("extdata", package = "scDIV")
EADDonorPair( InputDir )

You can find the results in the IDCA_Analysis/Expression_Aware_Cell_Assignment/ folder called “Expression_Aware_Cell_Assignment.txt” (Figure 8).

Figure 8
Figure 8. Example of output from EADDonorPair() function. For an indicated pair of the donors (columns Donor1 and Donor2), EADDonorPair() uses a top differentially correlated gene pair (e.g., XIST and RPS27) and gaussian mixture model clustering to assign each cell (BARCODE column) to a donor (predicted_clusters column). In addition, we append the results of genetic demultiplexing (donor_id, prob_max, prob_doublet, n_vars, best_singlet, best_doublet) and flow cell ID (FC).

Step 8: Expression Aware Demultiplexing per sample pool

The EADPoolSmaple() function uses inter-individual differential co-expression patterns for demultiplexing the pooled samples:

library("scDIV")
InputDir = system.file("extdata", package = "scDIV")
EADPoolSmaple( InputDir )

You can find the results in the IDCA_Analysis/Expression_Aware_Cell_Assignment/ folder called “Results_Expression_Aware_Cell_Assignment.txt” and “Summary.txt”.

For instance as you can see in figure 9, for an indicated cell (BARCODE column) (e.g., AAACCTGGTTTGACTG-1), we consider all possibilities, and most of the time the cell is assigned to the donor 7 (predicted_clusters column). We confirm that the cell belongs to donor-7 (EA_Assignment column) if we successfully assign it to donor-7 (NumAssignedtoDonor column) for an equal or greater number of all pairs of donors (NumDonorPairs column) minus 1.

Figure 9
Figure 9. Example of output from EADPoolSmaple() function. Please see text for more details.

Citation

Isar Nassiri, Benjamin Fairfax, Andrew J Kwok, Aneesha Bhandari, Katherine Bull, Angela Lee, Yanxia Wu, Julian Knight, David Buck, Paolo Piazza. Demultiplexing of Single Cell RNA Sequencing Data using Interindividual Variation in Gene Expression.

History

Release v0.1.1 (25/01/2023) A completed version for all planned features.

Quick Resources

Latest version on GitHub (LINK)

Issue Reports

If you find any error or suspicious bug, we will appreciate your report. Please write them in the GitHub issues (LINK)