今天我们来分享一篇关于细胞通讯的文章,其实关于细胞通讯我分享的就已经很多了,不知道大家对10X单细胞做细胞通讯有了怎样的见解,今天我们来分享一篇新的细胞通讯的分析文章,文章在Connectome: computation and visualization of cell-cell signaling topologies in single-cell systems data,还是按照我们原来的方法,先分享文章内容,再来示例参考代码。
Abstract
Single-cell RNA-sequencing data can revolutionize our understanding of the patterns of cell-cell and ligand-receptor connectivity that influence the function of tissues and organs.(这个是常识,但是大家应该知道,真正发挥通讯作用的其实是蛋白分子)。However,the quantification and visualization of these patterns are major computational and epistemological challenges.(这个软件的可视化做的相当不错)。Here, we present Connectome, a software package for R which facilitates rapid calculation, and interactive exploration, of cell-cell signaling network topologies contained in single-cell RNA-sequencing data. Connectome can be used with any reference set of known ligand-receptor mechanisms. It has built-in functionality to facilitate differential and comparative connectomics, in which complete mechanistic networks are quantitatively compared between systems. Connectome includes computational and graphical tools designed to analyze and explore cell-cell connectivity patterns across disparate(不同的) single-cell datasets. We present approaches to quantify these topologies and discuss some of the biologic theory leading to their design.(日常自夸自己的软件)
Introduction
Cell-to-cell communication is a major driver of cell differentiation and physiological function determining tissue/organ development, homeostasis, and response to injury. Within tissues, cells have local neighbors with whom they directly communicate via paracrine signaling and direct cell-cell contact(这个地方涉及到细胞临近通讯,光有单细胞数据是无法做到的,需要借助10X空间转录组技术), and long-range or mobile partners with whom they exchange information via endocrine signaling(通过内分泌信号与他们交换信息的远程或移动partner)。In solid tissues, cell types have specific cellular niches incorporating localized matrix and signaling environments, which facilitate phenotypic maintenance and support specialized cell functions。(看这里的分析主要还是看重细胞通讯和细胞位置的相对关系)。Circulating immune cells use an extensive library of chemokines to coordinate multicellular system responses to threat or injury.(免疫细胞相关性的信号具有流动性)。
接下来做了一些配受体分析的分析点,之前全部分享过,这里不再赘述。
然后介绍了文章开发的软件,Connectome,以及一些好用的方法。
Defining the data structure of tissue connectomics
The connectomic mapping discussed here treats every cell parcellation (e.g., different cell types, phenotypically distinct cell states of the same cell type, etc.) as a single node, averaging ligand and receptor values across a given cell parcellation to yield mean values which are then linked(仍然是以平均值作为配受体的强度)。Mean-wise connectomics has the advantage of accommodating the zero-values intrinsic to single-cell data,while simplifying the system so that every cell parcellation is represented by
a single, canonical node.(细胞类型的整体代表了细胞,容纳了部分细胞不表达配受体的情况)。However, as this approach will blend the effects of all cellular archetypes within a cluster, initial cell parcellation must be done carefully for the resulting connectomic networks to be biologically meaningful.(同一cluster的不同来源)。
An edgeweight must be defined for each edge in the celltype-celltype connectomic dataset. Connectome, by default, calculates two distinct edgeweights, each of which captures biologically relevant information. The first edgeweight, also referred to as ‘w1’ or ‘weightnorm,’ is defined as a function (by default, the product) of the celltype-wise normalized expression for the ligand and the receptor, or
(这里需要注意的是,矩阵是进行均一化的,均一化后的平均值)。edges are compared across tissue conditions.However, Weightnorm is encumbered by the fact that it ignores the cell-type specificity of many ligands and receptors.Weightnnorm will weigh a highly expressed but non-specific cell-cell link as stronger than a highly specific but lowlyexpressed gene.(均一化矩阵使用的缺点,忽视了细胞类型的配受体特异性,在之前的一篇文章中我提出了一个问题,文章在10X单细胞通讯分析之ICELLNET,对于最后的问题,不知道大家心里有没有答案)。This does not align with biologic intuition for many cell-cell signaling mechanisms, nor with the goal of parsing out rare cell types that may have outsized biological relevance.(这与许多细胞-细胞信号传导机制的生物学直觉不符,也与分析可能具有过大生物学相关性的稀有细胞类型的目的不符。)Therefore, Connectome also, by default, calculates a second edgeweight, alternatively referred to as ‘w2’ or ‘weightscale,’ which is defined as a function
single tissue system. It also can be used to compare cellular prominence within signaling networks across disparate tissue systems, or across species, in which normalized values may vary but their relative expressions across cell types do not(这里重点标注了,知道什么意思吧)。
Alternative edgeweights, such as those incorporating a score for likelihood of downstream signal transduction or those which take into account proximity information from spatial transcriptomics or histology-registered techniques(这里也需要注意),may reveal additional patterns. For the purposes of this software program, Connectome defaults to the above edgeweights definitions because we have found them to be effective for understanding biological questions addressed in our studies.(看来作者更倾向于scale矩阵的下游分析)。
Statistical significance depends upon the question being asked, the test being applied, and the threshold for important differences, and has a very different meaning depending on whether a research team is investigating a single tissue, comparing multiple tissues, or studying a physiologic process. However, we find two general statistical patterns to be of key importance, and we have built-in functionality to Connectome to help the researcher narrow in on these patterns(分析模式,我们来看一下):
(1)单样本分析First, when studying a single tissue system (i.e., one organ in one condition), it is often desirable to focus on those ligand-receptor interactions which are highly associated with (markers for) specific celltype-celltype vectors. While such a pattern is not a guarantee of biological relevance, it is often useful to learn those mechanisms which can mark celltypecelltype vectors in the same ways that transcripts can mark celltypes。
(2)多样本分析Second, when comparing two tissue systems (i.e., one organ in two experimental conditions) it is generally interesting to focus on those mechanisms which, regardless of their association with a specific celltypecelltype vector, are differentially regulated across condition。(多样本对比分析细胞通讯差异的问题不知道大家有没有什么深入的认识,多样本分析是需要不需要整合也是值得考虑的问题)
可视化Visualizing the connectomic signature of a single tissue(注释在图的后面)
看一下分析过程:
(1)the tissue data are parcellated into defined cell types using a standard clustering workflow.细胞定义,这个是无论如何都逃脱不了的。
(2)Then, a mapping is created against a ground-truth database of known ligand-receptor mechanisms.(匹配已知的配受体对,大部分软件都是这么做的)。This mapping yields a large edgelist, wherein nodes are defined as cell types and edge attributes contain quantitative information derived from cell type-specific expression levels.Source nodes denote(表示) the ligands, and target nodes refer to the cognate(同源的) receptors.
(3)Conceptually, the data architecture for a single edge attribute (one column in the above discussed edgelist data frame in Figure 1A) can be thought of as a 3D matrix (Figure 1B), where rows are source (sending) cells, columns are target (receiving) cells, and the z-axis is the full list of ground-truth known ligand-receptor mechanisms.(这是对图的解释说明),This allows clear visualization of the data that needs to be subset in order to explore a single interactome (red), outgoing network(purple), niche network (blue), or cell-cell vector (green).(划重点了啊)。
我们来分享一下图一的代码
library(Seurat)
library(SeuratData)
library(Connectome)
library(ggplot2)
library(cowplot)
####Load Pancreas Data (Used in Fig 1 & 2) ####
data('panc8')
table(Idents(panc8))
Idents(panc8) <- panc8[['celltype']]
table(Idents(panc8))
#Make Connectome
panc8 <- NormalizeData(panc8)
connectome.genes <-
union(Connectome::ncomms8866_human$Ligand.ApprovedSymbol,Connectome::ncomms8866_human$Receptor.A
pprovedSymbol)
genes <- connectome.genes[connectome.genes %in% rownames(panc8)]
panc8 <- ScaleData(panc8,features = genes)
panc8.con <- CreateConnectome(panc8,species = 'human',min.cells.per.ident = 75,p.values = T,calculate.DOR = F)
#Embed
pancreas.list <- SplitObject(panc8, split.by = "tech")
pancreas.list <- pancreas.list[c("celseq", "celseq2", "fluidigmc1", "smartseq2")]
for (i in 1:length(pancreas.list)) {
pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst",
nfeatures = 2000, verbose = FALSE)
}
reference.list <- pancreas.list[c("celseq", "celseq2", "smartseq2")]
pancreas.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:30)
pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims = 1:30)
DefaultAssay(pancreas.integrated) <- "integrated"
pancreas.integrated <- ScaleData(pancreas.integrated, verbose = FALSE)
pancreas.integrated <- RunPCA(pancreas.integrated, npcs = 30, verbose = FALSE)
pancreas.integrated <- RunUMAP(pancreas.integrated, reduction = "pca", dims = 1:30)
#Save
DefaultAssay(pancreas.integrated) <- "RNA"
save(pancreas.integrated,file = 'pancreas.integrated.Robj')
load("~/Box Sync/Connectome Paper/Figures/pancreas.integrated.Robj")
#### Plotting Figure 1 #####
#UMAP
png('panc8_UMAP.png',width = 5.5,height = 4,units = 'in',res = 300)
DimPlot(pancreas.integrated,reduction = 'umap',group.by = "celltype")
dev.off()
#colors~
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
cell.types <- sort(as.character(names(table(Idents(panc8)))))
cols.use <- gg_color_hue(length(cell.types))
names(cols.use) <- cell.types
# x1 Interactome
png('interactome_circos.png',width = 7,height = 4.5,units = 'in',res = 300)
CircosPlot(panc8.con,
mechanisms.include = c('DLL4 - NOTCH3'),
weight.attribute = 'weight_norm',
min.z = NULL,
min.pct = 0.1,
lab.cex = 0.01, max.p = 0.05,
gap.degree = 5,
cols.use = cols.use,
title = 'Single Interactome (DLL4 -> NOTCH3)')
dev.off()
# x1 Vector
png('vector_circos.png',width = 7,height = 4.5,units = 'in',res = 300)
CircosPlot(panc8.con,
sources.include = 'endothelial',
targets.include = 'activated_stellate',
min.z = 0,
min.pct = 0.1,
lab.cex = 0.3, max.p = 0.05,
weight.attribute = 'weight_sc',
cols.use = cols.use,
title = 'Single Vector (Endothelial -> Activated Stellate)')
dev.off()
# x1 Niche
png('niche_circos.png',width = 7,height = 4.5,units = 'in',res = 300)
CircosPlot(panc8.con,
targets.include = 'activated_stellate',
min.pct = 0.1,
lab.cex = 0.3,
min.z = 1,max.p = 0.05,
weight.attribute = 'weight_sc',
cols.use = cols.use,
title = 'Single Niche (Activated Stellate)')
dev.off()
可视化2Tissue network centrality analysis(组织网络集中度分析)
A major goal of tissue science and cell biology is to understand the roles that individual cells types play and their ability to affect other cell types.It is of interest, therefore, to rank cell types based on their ability to produce or receive information within a given signaling network, or signaling family(细胞之间的通讯网络)。如下图:
这个可视化做的还是相当到位的。
In this analysis, the total connectome for a single tissue is first subset down to only those edges belonging to a single signaling family。
(1)This weighted graph is then used to calculate two centrality metrics: the Kleinberg hub and Kleinberg authority scores (这个地方如果大家感兴趣可以查一下)(represented as the dot size), and the cumulative directed edgeweight (x-axis), for each cell type within the network.(这个图就是上面的图B,x轴的大小可以理解为某个配受体家族的权重,点的大小代表得分,点越大,下周越大,代表这个信号通路主要由该细胞类型“发射”(“接收))。The result is a ranking of outgoing and incoming centrality for every cell type, across every signaling family, for each target cell type within the tissue.
看看图二的代码
# Big connectome
png('Big_Connectome_v2.png',width = 10*.82,height = 8*.82,units = 'in',res = 300)
CircosPlot(panc8.con,min.pct = 0.2,max.p = 0.05,lab.cex = 0.01,gap.degree = 0.01,cols.use = cols.use)
dev.off()
png('Single_mode_connectome.png',width = 10*.8,height = 8*.8,units = 'in',res = 300)
CircosPlot(panc8.con,modes.include = 'VEGF',min.pct = 0.05,max.p = 0.05,lab.cex = 0.01,gap.degree =
0.01,cols.use = cols.use)
dev.off()
# Centrality categorization
Centrality(panc8.con,weight.attribute = 'weight_norm',cols.use = cols.use,min.pct = 0.05,max.p = 0.05)
epi.auth <- c('ADAM','Transferrin','MET','TNF','Ephrins','NEGF','MMP','Laminins','FGF','EGF','Intracellular
trafficking','Matrix glycoproteins','UNCAT','Neurotrophins','APOE')
ec.auth <- c('Interleukins','CXCL','VEGF','KIT','Collagens','SLIT','Semaphorins','TGFB','Fibronectin','IGF','WNT','Matrix
(assorted)','NOTCH','ANGPT','BMP')
mes.auth <- c('Vasoactive','Lysophosphatidic acid','PDGF','Protease inhibition')
imm.auth <- c('RARRES','MHC','Cell-cell adhesion','CC','CSF','Complement','TLR','Chemotaxis')
pdf('Centrality.Mes.pdf',width = 12,height = 3)
Centrality(panc8.con,weight.attribute = 'weight_norm',cols.use = cols.use,
modes.include = mes.auth,min.pct = 0.05,max.p = 0.05)
dev.off()
pdf('Centrality.Epi.pdf',width = 12,height = 4)
Centrality(panc8.con,weight.attribute = 'weight_norm',cols.use = cols.use,
modes.include = epi.auth,min.pct = 0.05,max.p = 0.05)
dev.off()
pdf('Centrality.Imm.pdf',width = 12,height = 4)
Centrality(panc8.con,weight.attribute = 'weight_norm',cols.use = cols.use,
modes.include = imm.auth,min.pct = 0.05,max.p = 0.05)
dev.off()
pdf('Centrality.Endo.pdf',width = 12,height = 5)
Centrality(panc8.con,weight.attribute = 'weight_norm',cols.use = cols.use,
modes.include = ec.auth,min.pct = 0.05,max.p = 0.05)
dev.off()
这个地方我们要重点看一下,首先配受体进行了家族的分类,这个很重要,不是所有的软件都有这个功能,更多时候需要我们的先验知识去实现这个能力。还有就是这样的可是画相当直观,直接可以看某个配受体家族的主要sender和receiver。。希望引起大家的重视
可视化3 Differential connectomics with aligned nodes
It is common in systems biology to examine changes in cell-cell signaling between two systems,(信号通路的强度变化,这个我多次强调过)。
In one such example, if the same cell types are present in both systems(我们可以理解为多个样本), we may calculate direct, one-to-one comparison of all edges using the Connectome package.
while conversely, when both the ligand and receptor are downregulated, we may think of the edge as ‘deactivated’. The two alternate cases, when a ligand is upregulated and a receptor down, or vice-versa, are more complicated to interpret. Although such patterns may initially suggest ligand-pressure or ligand-starvation, in which the downstream cell changes its character in response to upstream influence, it is important to note that in most tissue systems there are many cells interacting at once, and that there are likely multi-cell feedback loops in play which confound easy interpretation of such patterns.(四种情况)。
看一下这部分的代码
InstallData('ifnb')
data('ifnb')
table(Idents(ifnb))
Idents(ifnb) <- ifnb[['seurat_annotations']]
table(Idents(ifnb))
# Make differential connectomes:
# First identify ligands and receptors which have mapped in the dataset:
connectome.genes <-
union(Connectome::ncomms8866_human$Ligand.ApprovedSymbol,Connectome::ncomms8866_human$Receptor.A
pprovedSymbol)
genes <- connectome.genes[connectome.genes %in% rownames(ifnb)]
# Split the object by condition:
ifnb.list <- SplitObject(ifnb,split.by = 'stim')
# Normalize, Scale, and create Connectome:
ifnb.con.list <- list()
for (i in 1:length(ifnb.list)){
ifnb.list[[i]] <- NormalizeData(ifnb.list[[i]])
ifnb.list[[i]] <- ScaleData(ifnb.list[[i]],features = rownames(ifnb.list[[i]]))
ifnb.con.list[[i]] <- CreateConnectome(ifnb.list[[i]],species = 'human',p.values = F)
}
names(ifnb.con.list) <- names(ifnb.list)
diff <- DifferentialConnectome(ifnb.con.list[[1]],ifnb.con.list[[2]])
# Stat sig:
# Stash idents and make new identities which identify each as stimulated vs. control
celltypes <- as.character(unique(Idents(ifnb)))
celltypes.stim <- paste(celltypes, 'STIM', sep = '_')
celltypes.ctrl <- paste(celltypes, 'CTRL', sep = '_')
ifnb$celltype.condition <- paste(Idents(ifnb), ifnb$stim, sep = "_")
ifnb$celltype <- Idents(ifnb)
Idents(ifnb) <- "celltype.condition"
# Identify which ligands and receptors, for which cell populations, have an adjusted p-value < 0.05 based on a
Wilcoxon rank test
diff.p <- data.frame()
for (i in 1:length(celltypes)){
temp <- FindMarkers(ifnb,
ident.1 = celltypes.stim[i],
ident.2 = celltypes.ctrl[i],
verbose = FALSE,
features = genes,
min.pct = 0,
logfc.threshold = 0)
temp2 <- subset(temp, p_val_adj < 0.05)
if (nrow(temp2)>0){
temp3 <- data.frame(genes = rownames(temp2),cells = celltypes[i])
diff.p <- rbind(diff.p, temp3)
}
}
diff.p$cell.gene <- paste(diff.p$cells,diff.p$genes,sep = '.')
# Filter differential connectome to only include significantly perturbed edges
diff$source.ligand <- paste(diff$source,diff$ligand,sep = '.')
diff$target.receptor <- paste(diff$target,diff$receptor,sep = '.')
diff.sub <- subset(diff,source.ligand %in% diff.p$cell.gene & target.receptor %in% diff.p$cell.gene)
#### Plotting Figure 3 ####
pdf('diff.score.small.pdf',width = 12,height=3.25)
DifferentialScoringPlot(diff.sub,min.score = 1.5,min.pct = 0.1,infinity.to.max = T,aligned = F)
dev.off()
pdf('diff.score.big.pdf',width = 22 ,height=4.5)
DifferentialScoringPlot(diff.sub,min.score = 1.5,min.pct = 0.1,infinity.to.max = T,aligned = F)
dev.off()
pdf('diff.score.small.2.pdf',width = 12*0.9,height=3.25*0.9)
DifferentialScoringPlot(diff.sub,min.score = 2,min.pct = 0.2,infinity.to.max = T,aligned = F)
dev.off()
pdf('diff.score.big.2.pdf',width = 22*0.9 ,height=4.5*0.9)
DifferentialScoringPlot(diff.sub,min.score = 2,min.pct = 0.2,infinity.to.max = T,aligned = F)
dev.off()
## Ligand and receptor are both **UP**:
diff.up.up <- subset(diff.sub,ligand.norm.lfc > 0 & recept.norm.lfc > 0 )
diff.up.down <- subset(diff.sub,ligand.norm.lfc > 0 & recept.norm.lfc < 0 )
diff.down.up <- subset(diff.sub,ligand.norm.lfc < 0 & recept.norm.lfc > 0 )
diff.down.down <- subset(diff.sub,ligand.norm.lfc < 0 & recept.norm.lfc < 0 )
png('up.up.png',width = 8.5,height = 4.5,units = 'in',res = 300)
CircosDiff(diff.up.up,min.score = 2,min.pct = 0.1,lab.cex = 0.8)
dev.off()
png('up.down.png',width = 8.5,height = 4.5,units = 'in',res = 300)
CircosDiff(diff.up.down,min.score = 2,min.pct = 0.1,lab.cex = 0.8)
dev.off()
png('down.up.png',width = 8.5,height = 4.5,units = 'in',res = 300)
CircosDiff(diff.down.up,min.score = 2,min.pct = 0.1,lab.cex = 0.8)
dev.off()
png('down.down.png',width = 8.5,height = 4.5,units = 'in',res = 300)
CircosDiff(diff.down.down,min.score = 2,min.pct = 0.1,lab.cex = 0.8)
dev.off()
可视化真的做的相当不错。,注意作者用的矩阵是经过scale的。
可视化4 分析动态的组织过程
这一部分的代码,我希望大家可以自己写出来。
那么问题来了,究竟我们做细胞通讯应该用均一化的矩阵呢?还是scale处理过的?心里有答案的同学不妨写一下自己的见解吧!
生活很好,有你更好