Mice dorsal striatum scRNA-seq analysis
1 Introduction
We’ll perform single cell RNA-seq analysis on mice dorsal striatum tissue. These data include a subset of cells studied on ‘Molecular Architecture of the Mouse Nervous System’ (Zeisel et al. 2018). The data have been taken from this repository stored in PanglaoDB, a web repository of scRNA-seq data (Franzén, Gan, and Björkegren 2019). These data were obtained with 10x Chromium protocol sequenced by Illumina HiSeq 2500.
The main workflow is also heavily inspired by Roman Hillje github repo scRNA-seq analysis workflow (Hillje 2020).
1.1 Dependencies
First of all we load all necessary dependencies plus some useful plotting functions stored in another file. We’ll mainly use the Seurat package (Stuart et al. 2019), which focuses on single cell QC and explorative analysis. More or less we’ll follow the standard Seurat pipeline (see Seurat tutorial) with some additions.
library(SingleCellExperiment)
library(Seurat)
library(scDblFinder)
library(Matrix)
###
library(cluster)
library(tidyverse)
library(here)
library(patchwork)
library(ggExtra)
library(ggrepel)
library(gghalves)
library(patchwork)
library(viridis)
library(cowplot)
# my own useful functions - see funs.r
source("R/funs.R")
1.2 Data loading
To load the already prepared count table, we downloaded the Rdata file containg the sparse matrix from PanglaoDB. Then, we load it as a standard R object. In this matrix rows are gene, columns are cells and counts are not normalized. To only keep gene symbol we’ll substitute everything after “ENS”. Furthermore, we also filter this matrix with the same filters shown in the Panglao repo (counted reads > 1000 and genes expression over all cells > 0)
load(here("data", "SRA667466_SRS3059955.sparse.RData"))
rownames(sm) <- gsub("_ENS.*", "", rownames(sm))
sm_red <- sm[, colSums(sm) > 1000]
sm_red <- sm_red[rowSums(sm_red) != 0, ]
There are 3 duplicates genes, but Seurat will take care of that. The full matrix has these dimensions: 26190, 11591 (genes, cells). The panglao reduced one has 25618, 3444 (as in the dataset description).
After that, we use the raw counts to initialize a Seurat object. This type of object has useful slots and methods that will be useful in further analysis. Everytime a Seurat object is initialized a metadata slot is also created. It can be accessed through Seurat[[]]
or Seurat@meta.data
. This is basically a dataframe containing useful information about the cells in our dataset. We’ll append two columns to this df which are percentage of mitochondrial reads for each cell (percent.mt) and novelty (computed as number of genes / number of reads). This measures range from 0 to 1 and is related to how deep we sequenced. Low values means that every new read will probably will be assigned to an already discovered gene whereas high values would mean that for each new reads a new gene will be discovered. This can be useful in detecting low complexity cells (few expressed genes that are easily saturated). We do the same fore the panglao reduced data. To ease our laptop life we also remove the raw matrix from the environment
sm_seurat <- CreateSeuratObject(counts = sm, project = "sc_proj")
sm_seurat[["percent.mt"]] <- PercentageFeatureSet(sm_seurat, pattern = "^mt-")
sm_seurat[["novelty"]] <- sm_seurat[[]]$nFeature_RNA/sm_seurat[[]]$nCount_RNA
panglao <- CreateSeuratObject(counts = sm_red, project = "panglao")
panglao[["percent.mt"]] <- PercentageFeatureSet(panglao, pattern = "^mt-")
# there are only 17 annotated genes but there are 37 genes in the mt
panglao[["novelty"]] <- panglao[[]]$nFeature_RNA/panglao[[]]$nCount_RNA
rm(sm, sm_red)
2 Data pre-processing
To preprocess single cell RNA-seq data on Seurat we’ll follow the standard pipeline, but we’ll also try to analyze the doublets influence on our dataset. The main steps will be:
- Doublet analysis
- QC analysis
- Counts normalization
- Detection of highly variable genes
- Gene scaling
2.1 Doublet analysis
In order to remove doublets from our dataset we follow the same procedure as in (Hillje 2020) using the ScDblFinder
package (Germain 2020). This package finds doublets in single cell data by creating artificial doublets and looking at their prevalence in the neighborhood of each cell.
This function is inspired by (Hillje 2020).
dbl <- invisible(colData(scDblFinder(as.SingleCellExperiment(sm_seurat), verbose = FALSE)))
sm_seurat$multiplet_class <- dbl$scDblFinder.class
sm_seurat$doublet_score <- dbl$scDblFinder.score
rm(dbl)
doublet_analysis(sm_seurat@meta.data, col = c("#ffc313", "#c4e538"))
We can see that the proportion of inferred doublets in our dataset is 10.84% of the cells. Also the different distribution of qc covariates is evident.
We then remove the doublet cells.
2.2 QC analysis
QC analysis will be performed on 4 main covariates:
- Number of counts per cell
- Number of genes per cell
- Mitochondrial fraction of genes
- Novelty
The filtering is made by playing with thresholds of these covariates. Cells with few genes and high mito percentage may be dead cells with cytoplasmic mRNA leaked out. Whereas cells with exceptionally high number of reads may be doublets. Low novelty may represent low complexity cells as blood red cells. Nevertheless, we must be careful with these threshold as we could exclude informative cells population such as cells involved in respiratory processes (high mito) or quiescient populations (low reads) (Luecken and Theis 2019).
We can examine the distribution of these covariates with few different visualizations.
cont <- list(scale_color_viridis(discrete = FALSE), guides(color = guide_colourbar(barwidth = 0.2,
barheight = 10, title.position = "left", title.theme = element_text(angle = 90,
hjust = 0.5))))
scatter_full1 <- scatter_plot(sm_seurat@meta.data, "nCount_RNA", "nFeature_RNA",
"percent.mt") + cont
scatter_full2 <- scatter_plot(sm_seurat@meta.data, "nCount_RNA", "percent.mt", "nFeature_RNA") +
cont
scatter_full1 | scatter_full2
We can observe that, as expected, the number of genes and the number of transcritpts are positevely correlated. However, there seems to be two main trajectories. This could be due to Neurons (our majority class presumably) vs all other cells. Also, we can see a small population of cells with low novelty values (lots of reads and few genes), this may be due to low complexity cells.
scatter_panglao1 <- scatter_plot(panglao@meta.data, "nCount_RNA", "nFeature_RNA",
"percent.mt") + cont
scatter_panglao2 <- scatter_plot(panglao@meta.data, "nCount_RNA", "percent.mt", "nFeature_RNA") +
cont
scatter_panglao1 | scatter_panglao2
The Panglao data show more or less the same trend, however many cells (nCount<1000) were filtered out.
We can also visualize QC covariates distribution with violin plots
t <- list(theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank()),
scale_color_manual(values = "#FDA7DF"), scale_fill_manual(values = "#B53471"))
p1 <- vln_plot(sm_seurat@meta.data, "orig.ident", "nCount_RNA", log = T, jitter = T) +
t
p2 <- vln_plot(sm_seurat@meta.data, "orig.ident", "nFeature_RNA", log = T, jitter = T) +
t
patch1 <- p1 | p2
patch1 + plot_annotation(tag_levels = "A")
We can also visualize their distribution with a parallel coordinates plot. This kind of plot is useful to visualize how may covariates change with respect to one another. Given that ther’s no grouping in the data, colors are given based on the quartiles of the first covariates in order to bettere visualize their relationships. Furthermore, this function is not performed on all the cells given that it’s reallly slow. Maybe using base r graphics or other optimized plotting libraries might help. Features are scaled from 0 to 100 in order to have the same range.
p3 <- par_coor(head(sm_seurat@meta.data, 1000))
p4 <- par_coor(head(panglao@meta.data, 1000))
patch2 <- p3 | p4
patch2 + plot_annotation(tag_levels = "A")
We can more or less draw the same conclusions as before.
In order to decide the filters we wrote a high level function that analyse the dataset tryin to capture all its possible interesting features. (Heavily inspired by (Luecken and Theis 2019))
2.3 Final datasets
After visualizing this variables we decided to continue the analysis on three different datasets.
- Loosely filtered
- Strictly filtered
- Panglao criteria
loose <- subset(sm_seurat, subset = nFeature_RNA < 4000 & nFeature_RNA > 50 & nCount_RNA >
70 & nCount_RNA < 10000 & percent.mt < 18 & novelty > 0.3)
strict <- subset(sm_seurat, subset = nFeature_RNA < 3000 & nFeature_RNA > 100 & nCount_RNA >
1000 & nCount_RNA < 8000 & percent.mt < 12 & novelty > 0.4)
rm(sm_seurat)
So we obtain three datasets with these abundance of cells:
- Loose: 7441 cells
- Strict: 2482 cells
- Panglao: 3444 cells
2.4 View
Loose
prelim_analysis(loose, depth_threshold = c(70, 10000), genes_threshold = c(50, 4000),
mit_threshold = 18, novelty_threshold = 0.3)
Loose filters are:
- number of genes between 50 and 4000
- number of reads between 70 and 10000
- mito percentage < 18%
- novelty > 0.3
2.5 Counts normalization
To normalize counts we’ll follow the standard Seurat procedure using NormalizeData()
function. By default this function normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default). This normalization method is basically ‘counts per million’ normalization as done in bulk exoression analyisi. After this the counts are log-transformed (log(x+1)). This is useful because it’s the canonical way in which cheanges in epression are measured, reduce the skewness of the data and mitigate the mean variance relationship (Luecken and Theis 2019).
Normalized data are in seurat[['RNA']]@data
whereas raw counts are in sm_seurat[['RNA']]@counts
2.6 Detection of highly variable genes
We then perform feature selection, this means trying to decide which are the most informative genes. We’ll use the Seurat function FindVariableFeatures()
with default options, only increasing the final number of genes from 2000 to 3000. This function models the mean and variance relationship of each gene.
2.7 Results
2.8 Gene scaling
Finally, we scale the counts so each gene has mean across cells is 0 and variance across cells is 1. This is a debated step as dimensionality reduction methods like PCA often requires it so to not being excessively driven by genes with higher ranges of expression. However, by doing this we assume that the magnitude of difference in expression is not informative (Luecken and Theis 2019).
We’ll do that only on the genes we previously found. Otherwise our laptop will have a hard time and after all our further analysis won’t be affected.
The results are in Seurat[['RNA']]@scale.data
.
3 Principal Component Analysis
Principal Component Analysis (PCA) is often used to reduce dimensionality of the dataset. Reducing the dimensions of the original matrix is essential as single cell data are inherently low-dimensionals (Heimberg et al. 2016). PCA is a linear algorithm to do this, therefore, it’s not suited for visualizing these kind of data in low dimensions. However, it is often used to summarize data to perform further analysis, as in this case. In fact, the next steps will be computed on N reduced dataset, where N is the number of top principal components kept. This parameter is commonly estimated through elbow technique.
Furthermore, PCA is also used for visualization, given that it can project multidimensional features spaces in 2 or 3 PCs. Even if it does not perfrom well on this kind of data it’ might be ’s a good idea to visualize the first dimensions.
loose <- RunPCA(loose, features = VariableFeatures(object = loose))
strict <- RunPCA(strict, features = VariableFeatures(object = strict))
panglao <- RunPCA(panglao, features = VariableFeatures(object = panglao))
3.1 Results
Loose
Strict
Panglao
From now on we’ll use the first 15 PCs, this was decided through some trials. Anyway, there seems to be agreement that is better to overestimate rather than using too few PCs.
4 Clustering
To cluster our cells we’ll follow the Seurat functions FindNeighbors()
and FindClusters()
. Briefly, FindNeighbors()
computes clustering starting from a KNN graph (with euclidean distances) in the reduced PCA spaces (15 dimensions). Then, each edge of the graph is weighted based on the overlap of local neighbors (using Jaccard similarity)
Finally, the FindClusters()
function by default implements the Louvain algorithm to iteratively group cells together. This function has a resolution
parameter which sets the granularity of the clustering. Lower resolution lower granularity. It’s suggested to set this parameter to 0.4-1.2 for datasets around 3000 cells.
In our case we’ll use 4 different values based on the dimensions of our datasets, then we’ll explore which resolution produced a bettere clustering.
invisible({
capture.output({
npcs = 15
### HERE CHOOSE NUMBER OF DIMENSIONS
loose <- FindNeighbors(loose, dims = 1:npcs)
loose <- FindClusters(loose, resolution = c(0.8, 1, 2, 4))
strict <- FindNeighbors(strict, dims = 1:npcs)
strict <- FindClusters(strict, resolution = c(0.4, 0.8, 1, 1.2))
panglao <- FindNeighbors(panglao, dims = 1:npcs)
panglao <- FindClusters(panglao, resolution = c(0.8, 1, 1.2, 2))
})
})
4.1 Results
To understand which resolution worked best for our purposes we can visualize the number of clusters for each value or the distribution of reads/genes. Furthermore, we can compute the silhouette for each different resolution value. This is done following the instructions at this GitHub page.
Loose
cluster_abundance(loose, res = T) + scale_fill_viridis(discrete = TRUE) + theme(axis.text.x = element_blank())
cluster_density(loose, "nCount_RNA", res = TRUE) + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
We can see that by increasing the resolution the number of clusters increases. The distribution of number of genes in clusters varies quite a bit with some clusters with low counts and some with high variability. The silhouette analysis suggests that despite the high number of cells a lower value of resolution seems to work best. For this reason we’ll go on with the clusters defined with res = 0.8.
Strict
We then perform the same analysis on our strictly filtered dataset in order to decide which resolution is the best.
cluster_abundance(strict, res = T) + scale_fill_viridis(discrete = TRUE) + theme(axis.text.x = element_blank())
cluster_density(strict, "nCount_RNA", res = TRUE) + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
Here we decided to use res = 0.8 in order to try to not oversimplify the groups (with res = 0.4) even if, probably, with these filters we may have lost some cells sub-population. Neverthless the mean silhouette value is similar.
Panglao
Finally, we do the same on panglao data.
cluster_abundance(panglao, res = T) + scale_fill_viridis(discrete = TRUE) + theme(axis.text.x = element_blank())
cluster_density(panglao, "nCount_RNA", res = TRUE) + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
For these data we decided to proceed with res = 1.
5 Cell cycle scoring
To assess how the cell cycle was influencing our dataset we decided to infer the stage for each cell using the CellCycleScoring()
function. This function assign a score for each cell representing the probability of being in phase S, G2M or G1. It manages to do that thanks to a list of phase specific marker genes stored in cc.genes
. We’ll only explore how this variable is distributed in our data but we won’t regress out cell cycle information.
cc.genes <- lapply(cc.genes, str_to_title)
loose <- CellCycleScoring(loose, s.features = cc.genes$s.genes, g2m.features = cc.genes$g2m.genes)
strict <- CellCycleScoring(strict, s.features = cc.genes$s.genes, g2m.features = cc.genes$g2m.genes)
panglao <- CellCycleScoring(panglao, s.features = cc.genes$s.genes, g2m.features = cc.genes$g2m.genes)
loose@meta.data <- loose@meta.data %>% mutate(Phase = factor(Phase))
strict@meta.data <- strict@meta.data %>% mutate(Phase = factor(Phase))
panglao@meta.data <- panglao@meta.data %>% mutate(Phase = factor(Phase))
5.1 Results
Loose
l1 <- phase_prop(loose) + labs(subtitle = "Loose")
l2 <- phase_prop(loose, by.cluster = T) + labs(subtitle = "Loose")
l1 + l2 + plot_layout(widths = c(1, 3)) + plot_annotation(tag_levels = "A")
Strict
#
s1 <- phase_prop(strict) + labs(subtitle = "Strict")
s2 <- phase_prop(strict, by.cluster = T) + labs(subtitle = "Strict")
s1 + s2 + plot_layout(widths = c(1, 3)) + plot_annotation(tag_levels = "A")
Panglao
pp1 <- phase_prop(panglao) + labs(subtitle = "Panglao")
pp2 <- phase_prop(panglao, by.cluster = T) + labs(subtitle = "Panglao")
pp1 + pp2 + plot_layout(widths = c(1, 3)) + plot_annotation(tag_levels = "A")
We can see that more or less the proportion of cell phases are the same between datsets and between clusters. Except for few peculiar clusters with almost no cells in G1 phase (12 in loose, 9-11 in strict and 9-12 in panglao). We’ll try to figure out the reason behind this, they may represent specific proliferating cells.
6 Non-linear dimensionality reduction
Non linear dimensionality algortihms are often used to visualize complex data in 2 dimensions. The most common techiniques emplyed in scRNA-seq analysis are t-distributed stochastic neighbour embedding (tSNE) (Maaten and Hinton 2008) and Uniform Approximation and Projection method (UMAP) (McInnes, Healy, and Melville 2018). The first tends to capture local affinity rather than global structure. This characteristic is controllled by the perplexity parameter, at lower perplexity the projected data will be spread out with little preservation of global structure. UMAP is a new algorithm able to scale up well with increasing data dimensions and to preserve global structure. UMAP main parameter is the number of neighbors used to construct the initial graph. Low values will cause to focus on local structures while higher values will result in a representation of the big picture.
We’ll run both of these algorithms on our datasets with default Seurat parameters.
loose <- RunUMAP(loose, dims = 1:npcs)
loose <- RunTSNE(loose, dims = 1:npcs)
strict <- RunUMAP(strict, dims = 1:npcs)
strict <- RunTSNE(strict, dims = 1:npcs)
panglao <- RunUMAP(panglao, dims = 1:npcs)
panglao <- RunTSNE(panglao, dims = 1:npcs)
For each dataset we’ll perform a visual analysis in the newly reduced space:
- Continous vars contains the scatterplot in the reduced dimensions where cells are colored by number of reads, mito percentage and novelty.
- Phase plot represent the cells in redcude dimensions colore by cellphase
- Phase facet is the same plot as before but faceted by phase in order to visualize overlapping points
- Clusters represents cells colored by cluster in the reduced dimensions
6.1 Results
Loose
Strict
7 Cell type markers
We can find differently expressed genes of single clusters compared to all other cells. This can be done through the FindAllMArkers()
function. This function returns a dataframe of genes for each cluster with associated statistics like p-value and average log fold change. The default is to do a wilcoxon test against all other groups.
loose.markers <- FindAllMarkers(loose, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
strict.markers <- FindAllMarkers(strict, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
panglao.markers <- FindAllMarkers(panglao, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
7.1 Genes
We can visualize the expression of this gene in our cells by coloring each cell by the intensity of its expression or by doing a violinplot of the gene expression in each cluster.
Loose
FeaturePlot(loose, features = top_spread(loose.markers, 1)$rank1, reduction = "umap",
cols = c("#e7e1ef", "#b71540"), coord.fixed = TRUE)
VlnPlot(loose, features = top_spread(loose.markers, 1)$rank1, slot = "counts", log = TRUE,
pt.size = 0)
7.2 Save progress
7.3 Type assignment
To assign each cluster to its type we’ll search on three different database of markers gene and we’ll cross information to be as sure as possible of the assigned cell type. Panglao Gene Search, Dropviz (Saunders et al. 2018) and Mousebrain Gene Search (Zeisel et al. 2018).
We’ll use a little function panglao_searcher()
that helps us copypasting and assign the new type (in this case I saved the data in another file and run this script on the saved data). This chunk won’t be evaluated!
ol <- "Oligodendrocytes"
n <- "Neurons"
u <- "Unknown"
a <- "Astrocytes"
m <- "Microglia"
e <- "Endothelial cells"
opc <- "Oligodendrocytes parental cells"
i <- "Interneurons"
p <- "Polydendrocytes"
er <- "Erythroid-like and erythroid precursor cells"
ep <- "Ependymal cells"
msp <- "Medium spiny neurons"
ipc <- "Intermediate progenitor cells"
int <- "Interneurons"
dspn <- "Direct spiny neurons"
ispn <- "Indirect spiny neurons"
loose_df <- top_spread(loose.markers, 3)
loose_df <- panglao_searcher(loose_df, collapse = ",", append = "cell_type")
loose_df <- loose_df %>% mutate_all(as.factor)
saveRDS(loose_df, "data/loose_type.rds")
strict_df <- top_spread(strict.markers, 3)
strict_df <- panglao_searcher(strict_df, collapse = ",", append = "cell_type")
strict_df <- strict_df %>% mutate_all(as.factor)
saveRDS(strict_df, "data/strict_type.rds")
panglao_df <- top_spread(panglao.markers, 3)
panglao_df <- panglao_searcher(panglao_df, collapse = ",", append = "cell_type")
panglao_df <- panglao_df %>% mutate_all(as.factor)
saveRDS(panglao_df, "data/panglao_type.rds")
We load a dataframe containing the associated cell type for each cluster and associated cell type. We join this dataset with our Seurat object metadata, then, we set the cell type as new identities.
loose_df <- readRDS("data/loose_type.rds")
strict_df <- readRDS("data/strict_type.rds")
panglao_df <- readRDS("data/panglao_type.rds")
l_id <- as.vector(loose_df$cell_type)
names(l_id) <- levels(loose)
loose <- RenameIdents(loose, l_id)
loose@meta.data <- loose@meta.data %>% left_join(loose_df, by = c(seurat_clusters = "cluster"))
s_id <- as.vector(strict_df$cell_type)
names(s_id) <- levels(strict)
strict <- RenameIdents(strict, s_id)
strict@meta.data <- strict@meta.data %>% left_join(strict_df, by = c(seurat_clusters = "cluster"))
p_id <- as.vector(panglao_df$cell_type)
names(p_id) <- levels(panglao)
panglao <- RenameIdents(panglao, p_id)
panglao@meta.data <- panglao@meta.data %>% left_join(panglao_df, by = c(seurat_clusters = "cluster"))
8 SI
loose_mrk <- loose.markers %>% left_join(loose_df, by = "cluster") %>% select(-contains("rank")) %>%
group_by(cluster) %>% top_n(6, wt = avg_logFC) %>% ungroup()
xlsx::write.xlsx(as.data.frame(loose_mrk), "data/SI_Mutti.xlsx", sheetName = "loose_markers",
row.names = FALSE)
loose_n <- loose@meta.data %>% group_by(cell_type, seurat_clusters) %>% tally() %>%
ungroup()
xlsx::write.xlsx(as.data.frame(loose_n), "data/SI_Mutti.xlsx", sheetName = "loose_n_by_clust",
row.names = FALSE, append = TRUE)
strict_mrk <- strict.markers %>% left_join(strict_df, by = "cluster") %>% select(-contains("rank")) %>%
group_by(cluster) %>% top_n(6, wt = avg_logFC) %>% ungroup()
xlsx::write.xlsx(as.data.frame(strict_mrk), "data/SI_Mutti.xlsx", sheetName = "strict_markers",
row.names = FALSE, append = TRUE)
strict_n <- strict@meta.data %>% group_by(cell_type, seurat_clusters) %>% tally() %>%
ungroup()
xlsx::write.xlsx(as.data.frame(strict_n), "data/SI_Mutti.xlsx", sheetName = "strict_n_by_clust",
row.names = FALSE, append = TRUE)
panglao_mrk <- panglao.markers %>% left_join(panglao_df, by = "cluster") %>% select(-contains("rank")) %>%
group_by(cluster) %>% top_n(6, wt = avg_logFC) %>% ungroup()
xlsx::write.xlsx(as.data.frame(panglao_mrk), "data/SI_Mutti.xlsx", sheetName = "panglao_markers",
row.names = FALSE, append = TRUE)
panglao_n <- panglao@meta.data %>% group_by(cell_type, seurat_clusters) %>% tally() %>%
ungroup()
xlsx::write.xlsx(as.data.frame(panglao_n), "data/SI_Mutti.xlsx", sheetName = "panglao_n_by_clust",
row.names = FALSE, append = TRUE)
9 Session
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] cowplot_1.1.0 viridis_0.5.1
[3] viridisLite_0.3.0 gghalves_0.1.0
[5] ggrepel_0.8.2 ggExtra_0.9
[7] patchwork_1.0.1 here_0.1
[9] forcats_0.5.0 stringr_1.4.0
[11] dplyr_1.0.2 purrr_0.3.4
[13] readr_1.3.1 tidyr_1.1.2
[15] tibble_3.0.3 ggplot2_3.3.2
[17] tidyverse_1.3.0 cluster_2.1.0
[19] Matrix_1.2-18 scDblFinder_1.2.0
[21] Seurat_3.2.2 SingleCellExperiment_1.10.1
[23] SummarizedExperiment_1.18.1 DelayedArray_0.14.0
[25] matrixStats_0.56.0 Biobase_2.48.0
[27] GenomicRanges_1.40.0 GenomeInfoDb_1.24.0
[29] IRanges_2.22.2 S4Vectors_0.26.1
[31] BiocGenerics_0.34.0 rmdformats_0.3.7
[33] knitr_1.30
loaded via a namespace (and not attached):
[1] reticulate_1.16 tidyselect_1.1.0
[3] htmlwidgets_1.5.1 grid_4.0.2
[5] BiocParallel_1.22.0 Rtsne_0.15
[7] munsell_0.5.0 codetools_0.2-16
[9] ica_1.0-2 statmod_1.4.34
[11] scran_1.16.0 future_1.19.1
[13] miniUI_0.1.1.1 withr_2.3.0
[15] colorspace_1.4-1 highr_0.8
[17] rstudioapi_0.11 ROCR_1.0-11
[19] tensor_1.5 rJava_0.9-13
[21] listenv_0.8.0 labeling_0.3
[23] GenomeInfoDbData_1.2.3 polyclip_1.10-0
[25] farver_2.0.3 rprojroot_1.3-2
[27] vctrs_0.3.4 generics_0.0.2
[29] xfun_0.17 randomForest_4.6-14
[31] R6_2.4.1 ggbeeswarm_0.6.0
[33] rsvd_1.0.3 locfit_1.5-9.4
[35] bitops_1.0-6 spatstat.utils_1.17-0
[37] assertthat_0.2.1 promises_1.1.1
[39] scales_1.1.1 beeswarm_0.2.3
[41] gtable_0.3.0 globals_0.13.0
[43] goftest_1.2-2 xlsx_0.6.4.2
[45] rlang_0.4.7 splines_4.0.2
[47] lazyeval_0.2.2 broom_0.7.0
[49] yaml_2.2.1 reshape2_1.4.4
[51] abind_1.4-5 modelr_0.1.8
[53] backports_1.1.10 httpuv_1.5.4
[55] tools_4.0.2 bookdown_0.20
[57] ellipsis_0.3.1 RColorBrewer_1.1-2
[59] ggridges_0.5.2 Rcpp_1.0.5
[61] plyr_1.8.6 zlibbioc_1.34.0
[63] RCurl_1.98-1.2 rpart_4.1-15
[65] deldir_0.1-29 pbapply_1.4-3
[67] zoo_1.8-8 haven_2.3.1
[69] fs_1.5.0 magrittr_1.5
[71] data.table_1.13.0 lmtest_0.9-38
[73] reprex_0.3.0 RANN_2.6.1
[75] fitdistrplus_1.1-1 hms_0.5.3
[77] xlsxjars_0.6.1 mime_0.9
[79] evaluate_0.14 xtable_1.8-4
[81] readxl_1.3.1 gridExtra_2.3
[83] compiler_4.0.2 scater_1.16.2
[85] KernSmooth_2.23-17 crayon_1.3.4
[87] htmltools_0.5.0 mgcv_1.8-31
[89] later_1.1.0.1 lubridate_1.7.9
[91] DBI_1.1.0 formatR_1.7
[93] dbplyr_1.4.4 MASS_7.3-53
[95] rappdirs_0.3.1 cli_2.0.2
[97] igraph_1.2.5 pkgconfig_2.0.3
[99] plotly_4.9.2.1 xml2_1.3.2
[101] vipor_0.4.5 dqrng_0.2.1
[103] XVector_0.28.0 rvest_0.3.6
[105] digest_0.6.25 sctransform_0.3
[107] RcppAnnoy_0.0.16 spatstat.data_1.4-3
[109] rmarkdown_2.3 cellranger_1.1.0
[111] leiden_0.3.3 uwot_0.1.8
[113] edgeR_3.30.3 DelayedMatrixStats_1.10.1
[115] shiny_1.5.0 lifecycle_0.2.0
[117] nlme_3.1-148 jsonlite_1.7.1
[119] BiocNeighbors_1.6.0 limma_3.44.3
[121] fansi_0.4.1 pillar_1.4.6
[123] lattice_0.20-41 fastmap_1.0.1
[125] httr_1.4.2 survival_3.2-3
[127] glue_1.4.2 spatstat_1.64-1
[129] png_0.1-7 stringi_1.5.3
[131] blob_1.2.1 BiocSingular_1.4.0
[133] irlba_2.3.3 future.apply_1.6.0
References
Franzén, Oscar, Li-Ming Gan, and Johan LM Björkegren. 2019. “PanglaoDB: A Web Server for Exploration of Mouse and Human Single-Cell Rna Sequencing Data.” Database 2019.
Germain, Pierre-Luc. 2020. ScDblFinder: ScDblFinder. https://github.com/plger/scDblFinder.
Heimberg, Graham, Rajat Bhatnagar, Hana El-Samad, and Matt Thomson. 2016. “Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing.” Cell Systems 2 (4): 239–50.
Hillje, R. 2020. “ScRNA-Seq Analysis Workflow.” GitHub Repository. https://romanhaa.github.io/projects/scrnaseq_workflow/; GitHub.
Luecken, Malte D, and Fabian J Theis. 2019. “Current Best Practices in Single-Cell Rna-Seq Analysis: A Tutorial.” Molecular Systems Biology 15 (6): e8746.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-Sne.” Journal of Machine Learning Research 9 (Nov): 2579–2605.
McInnes, L., J. Healy, and J. Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” ArXiv E-Prints, February. http://arxiv.org/abs/1802.03426.
Saunders, Arpiar, Evan Z Macosko, Alec Wysoker, Melissa Goldman, Fenna M Krienen, Heather de Rivera, Elizabeth Bien, et al. 2018. “Molecular Diversity and Specializations Among the Cells of the Adult Mouse Brain.” Cell 174 (4): 1015–30.
Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177 (7): 1888–1902.
Zeisel, Amit, Hannah Hochgerner, Peter Lönnerberg, Anna Johnsson, Fatima Memic, Job Van Der Zwan, Martin Häring, et al. 2018. “Molecular Architecture of the Mouse Nervous System.” Cell 174 (4): 999–1014.