单细胞分析之细胞交互-5：NicheNet多组间互作比较

常用的细胞通讯软件：

CellphoneDB：是公开的人工校正的，储存受体、配体以及两种相互作用的数据库。此外，还考虑了结构组成，能够描述异构复合物。（配体-受体+多聚体）

iTALK：通过平均表达量方式，筛选高表达的胚体和受体，根据结果作圈图。（配体-受体）

CellChat：CellChat将细胞的基因表达数据作为输入，并结合配体受体及其辅助因子的相互作用来模拟细胞间通讯。（配体-受体+多聚体+辅因子）

NicheNet // NicheNet多样本分析 // 目标基因的配体和靶基因活性预测：通过将相互作用细胞的表达数据与信号和基因调控网络的先验知识相结合来预测相互作用细胞之间的配体-靶标联系的方法。( 配体-受体+信号通路）
附：NicheNet使用的常见问题汇总

其它细胞互作软件还包括Celltalker，SingleCellSignalR，scTensor和SoptSC（这几个也是基于配体-受体相互作用）

NicheNet 在其分析单细胞转录组数据时，首先构建了细胞内配体、受体、信号蛋白、转录调控因子和靶基因互作信号网络先验知识数据库，然后基于该数据库，通过基因表达模式的强相关性，最终确定了细胞类型之间的潜在通讯，并且能够有效地预测配体活性。

原理简介

NicheNet是一种模拟细胞间相互作用如何影响细胞基因表达的计算方法。真实的细胞通讯过程是配体-受体-信号蛋白-转录调控因子-靶基因，因此仅仅分析配受体之间的相互作用存在一定的局限性。该方法的创新之处在于通过结合细胞表达数据、信号、基因调控网络的先验知识，来预测相互作用的细胞之间的配体-靶基因的连接情况。目前NicheNet可以处理人或者小鼠数据，以人类或小鼠的基因表达数据作为输入，并将其与通过整合配体到目标信号路径构建的先验模型进行结合。

NicheNet分析中依赖的先验模型，主要是为了表征现有知识对配体可以调节目标基因表达的支持程度，为了计算这种配体-靶标调节潜力， NicheNet的先验模型中不仅包括配体-受体的相互作用信息，还包括了细胞内信号传导信息。因此，NicheNet除了预测哪些配体影响另一个细胞的基因表达外，还可以预测哪些靶基因受到这些配体的影响，以及哪些信号介质可能参与其中。

首先，NicheNet收集涵盖配体-受体（例如，KEGG数据库）、细胞信号转导（例如，蛋白质-蛋白质相互作用和激酶-底物相互作用）、基因调控相互作用（例如，基于ChIP数据和motif的推断）的公共数据信息；然后，将这些独立的数据信息整合到两个加权网络中：（1）配体信号网络，其中包含蛋白质 - 蛋白质相互作用，涵盖从配体到下游转录调节因子的信号通路；（2) 基因调控网络，其中包含转录调控因子和靶基因之间的基因调控相互作用。

为了让信息数据源对最终模型做出更多贡献，该软件在集成过程中对每个数据源进行了加权，这些数据源权重是通过基于模型的参数优化自动确定的，以提高配体-目标预测的准确性。

最后，NicheNet结合配体信号传导和基因调控网络来计算所有配体和靶基因对之间的调控潜力评分：如果靶基因的调节因子位于配体信号网络的下游，则配体-靶标对具有高调节潜力。为了计算这一评分，该方法在整合的网络上使用网络传播方法来传播来自配体的信号，从配体开始，流经受体、信号蛋白、转录调节因子，最终在目标基因处结束。

之前写过NicheNet的标准分析pipeline，实际上做细胞互作分析的时候我们更多的还是在做样本间的互作差异比较。平常我用CellChat比较多，但其实NicheNet也可以做多样本互作比较，而且效果更好。

0. 读入`expression data of interest`, `NicheNet ligand-receptor network` 和 `ligand-target matrix`

加载所需要的包

library(nichenetr)
library(RColorBrewer)
library(tidyverse)
library(Seurat) #

读入演示数据

seurat_obj = readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
p1=DimPlot(seurat_obj, group.by = "celltype") # user adaptation required on own dataset
p2=DimPlot(seurat_obj, group.by = "pEMT") # user adaptation required on own dataset
p1|p2

table(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT)
               
#                 High  Low
#  CAF            396  104
#  Endothelial    105   53
#  Malignant     1093  549
#  Myeloid         92    7
#  myofibroblast  382   61
#  T.cell         689    3

这个演示里面比较的是pEMT-high-niche和pEMT-low-niche，换成不同组都一样的。

seurat_obj@meta.data$celltype_aggregate = paste(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT,sep = "_") # user adaptation required on own dataset
DimPlot(seurat_obj, group.by = "celltype_aggregate")

seurat_obj@meta.data$celltype_aggregate %>% table() %>% sort(decreasing = TRUE)
## .
##     Malignant_High        T.cell_High      Malignant_Low           CAF_High myofibroblast_High   Endothelial_High 
##               1093                689                549                396                382                105 
##            CAF_Low       Myeloid_High  myofibroblast_Low    Endothelial_Low        Myeloid_Low         T.cell_Low 
##                104                 92                 61                 53                  7                  3

celltype_id = "celltype_aggregate" # metadata column name of the cell type of interest
seurat_obj = SetIdent(seurat_obj, value = seurat_obj[[celltype_id]])

读入NicheNet受体靶基因矩阵（里面的数值可以理解为亲和力）和受体配体网络（受体配体是否结合）

ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
##                 CXCL1        CXCL2        CXCL3        CXCL5         PPBP
## A1BG     3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF     5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M      6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1  8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
lr_network = lr_network %>% mutate(bonafide = ! database %in% c("ppi_prediction","ppi_prediction_go"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor, bonafide)

head(lr_network)
## # A tibble: 6 x 3
##   ligand receptor bonafide
##   <chr>  <chr>    <lgl>   
## 1 CXCL1  CXCR2    TRUE    
## 2 CXCL2  CXCR2    TRUE    
## 3 CXCL3  CXCR2    TRUE    
## 4 CXCL5  CXCR2    TRUE    
## 5 PPBP   CXCR2    TRUE    
## 6 CXCL6  CXCR2    TRUE

table(lr_network$bonafide)

# FALSE  TRUE 
# 10629  1390 
###?为什么这么多是false？

如果分析的是小鼠的数据，需要先做一下基因的同源转换

organism = "human" # user adaptation required on own dataset
if(organism == "mouse"){
  lr_network = lr_network %>% mutate(ligand = convert_human_to_mouse_symbols(ligand), receptor = convert_human_to_mouse_symbols(receptor)) %>% drop_na()

  colnames(ligand_target_matrix) = ligand_target_matrix %>% colnames() %>% convert_human_to_mouse_symbols()
  rownames(ligand_target_matrix) = ligand_target_matrix %>% rownames() %>% convert_human_to_mouse_symbols()
  ligand_target_matrix = ligand_target_matrix %>% .[!is.na(rownames(ligand_target_matrix)), !is.na(colnames(ligand_target_matrix))]
}

1. Define the niches/microenvironments of interest

每个niche应该至少有一个“sender/niche”细胞群和一个“receiver/target”细胞群。
在这个演示数据集中，我们想要去查看pEMT high和pEMT low的肿瘤组织中免疫细胞对肿瘤细胞的作用差异。因此“Malignant_High”和“Malignant_Low”被定义为“receiver/target”细胞群，其它细胞被定义为“sender/niche”细胞群。注意：T.Cell和Myeloid细胞只有在pEMT-High样本中才被定义为sender，因为pEMT-low样本中这两类细胞数目太少了。

⚠️也就是说，NicheNet在做组间比较的时候，可以把condition-specific的细胞群考虑在内。（比较的是所有sender细胞的组间差异，而不是细胞特异性组间差异）

! Important: your receiver cell type should consist of 1 cluster!

（可以理解为：解析组织微环境如何决定某种细胞的行为）

niches = list(
  "pEMT_High_niche" = list(
    "sender" = c("myofibroblast_High", "Endothelial_High", "CAF_High", "T.cell_High", "Myeloid_High"),
    "receiver" = c("Malignant_High")),
  "pEMT_Low_niche" = list(
    "sender" = c("myofibroblast_Low",  "Endothelial_Low", "CAF_Low"),
    "receiver" = c("Malignant_Low"))
  ) # user adaptation required on own dataset

2. Calculate differential expression between the niches （得到的是差异性`受体配体矩阵`）

In this step, we will determine DE between the different niches for both senders and receivers to define the DE of L-R pairs.

2.1 计算DE

calculate_niche_de Calculate differential expression of cell types in one niche versus all other niches of interest. This is possible for sender cell types and receiver cell types.

计算差异基因的方法默认是Seurat Wilcoxon test（也可以使用其它方法）。

assay_oi = "SCT" # other possibilities: RNA,...
DE_sender = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), niches = niches, type = "sender", assay_oi = assay_oi) # only ligands important for sender cell types
## [1] "Calculate Sender DE between: myofibroblast_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: myofibroblast_High and Endothelial_Low"  
## [3] "Calculate Sender DE between: myofibroblast_High and CAF_Low"          
## [1] "Calculate Sender DE between: Endothelial_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Endothelial_High and Endothelial_Low"  
## [3] "Calculate Sender DE between: Endothelial_High and CAF_Low"          
## [1] "Calculate Sender DE between: CAF_High and myofibroblast_Low" 
## [2] "Calculate Sender DE between: CAF_High and Endothelial_Low"  
## [3] "Calculate Sender DE between: CAF_High and CAF_Low"          
## [1] "Calculate Sender DE between: T.cell_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: T.cell_High and Endothelial_Low"  
## [3] "Calculate Sender DE between: T.cell_High and CAF_Low"          
## [1] "Calculate Sender DE between: Myeloid_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Myeloid_High and Endothelial_Low"  
## [3] "Calculate Sender DE between: Myeloid_High and CAF_Low"          
## [1] "Calculate Sender DE between: myofibroblast_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: myofibroblast_Low and Endothelial_High"  
## [3] "Calculate Sender DE between: myofibroblast_Low and CAF_High"          
## [4] "Calculate Sender DE between: myofibroblast_Low and T.cell_High"       
## [5] "Calculate Sender DE between: myofibroblast_Low and Myeloid_High"      
## [1] "Calculate Sender DE between: Endothelial_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: Endothelial_Low and Endothelial_High"  
## [3] "Calculate Sender DE between: Endothelial_Low and CAF_High"          
## [4] "Calculate Sender DE between: Endothelial_Low and T.cell_High"       
## [5] "Calculate Sender DE between: Endothelial_Low and Myeloid_High"      
## [1] "Calculate Sender DE between: CAF_Low and myofibroblast_High" 
## [2] "Calculate Sender DE between: CAF_Low and Endothelial_High"  
## [3] "Calculate Sender DE between: CAF_Low and CAF_High"           
## [4] "Calculate Sender DE between: CAF_Low and T.cell_High"       
## [5] "Calculate Sender DE between: CAF_Low and Myeloid_High"

DE_receiver = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), niches = niches, type = "receiver", assay_oi = assay_oi) # only receptors now, later on: DE analysis to find targets
## # A tibble: 1 x 2
##   receiver       receiver_other_niche
##   <chr>          <chr>               
## 1 Malignant_High Malignant_Low       
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"

可以看到，它是先计算了Sender：high的5种sender细胞分别和low的3中sender细胞的Sender DE，又反过来计算了low的3中sender细胞分别和high的5种sender细胞的DE。
然后计算了Receiver：肿瘤细胞high-low的差异基因和low-high的差异基因。
这样把细胞类型分开挨个计算而不是把所有sender和receiver细胞合并计算的意义是避免差异分析的结果主要被丰度高的细胞驱动。

正着计算一遍，反着计算一遍，a-->b上调的/下调的基因和b-->a下调的/上调的难道不是一样的吗？
随后我去验证了一下，果然如此。以receiver中的IL6为例

前15个和后15个一一对应。p值是一样的，pct.1和pct.2是反的，log2fc也是反的。

DE_sender = DE_sender %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))
DE_receiver = DE_receiver %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))

2.2 处理DE结果

expression_pct = 0.10
DE_sender_processed = process_niche_de(DE_table = DE_sender, niches = niches, expression_pct = expression_pct, type = "sender")
DE_receiver_processed = process_niche_de(DE_table = DE_receiver, niches = niches, expression_pct = expression_pct, type = "receiver")

以receiver中的IL6为例

对照上面的图看

process_niche_de是对calculate_niche_de的结果做了上上图中所标的整合（红色的整合为了五种pEMT_High_niche，下面三种颜色分别整合为三种pEMT_Low_niche）

2.3 Combine sender-receiver DE based on L-R pairs:

如前所述，来自一种sender细胞的差异表达的配体是通过计算该样品这种sender细胞和另一样品中所有sender细胞得到的。因此我们有多种方法总结得到细胞类型的差异表达配体。我们可以使用average LFC，也可以使用minimum LFC。但是更推荐使用minimum LFC。因为它是评估配体表达的最强的特异性指标，因为高的min LFC意味着和niche 2中的所有细胞类型相比，这个配体在niche 1的这个细胞类型中表达更强（如果使用average LFC，则不能排除niche 2中一种或多种细胞也很强的表达这个配体）。

specificity_score_LR_pairs = "min_lfc"
DE_sender_receiver = combine_sender_receiver_de(DE_sender_processed, DE_receiver_processed, lr_network, specificity_score = specificity_score_LR_pairs)
table(DE_sender_receiver$sender)
#          CAF_High            CAF_Low   Endothelial_High    Endothelial_Low 
#              8171               8171               8171               8171 
#      Myeloid_High myofibroblast_High  myofibroblast_Low        T.cell_High 
#              8171               8171               8171               8171
table(DE_sender_receiver$receiver)
# Malignant_High  Malignant_Low 
#         40855          24513 
View(DE_sender_receiver)

8个sender和2个receiver的互作矩阵

这一步主要得到了DE_sender_receiver这个对象，也就是不同niche中的差异基因。

3. 计算空间互作差异（可选）

如果有空间转录组数据，把下面两行代码设置成TRUE，没有的话直接运行下面的代码即可。

include_spatial_info_sender = FALSE # if not spatial info to include: put this to false # user adaptation required on own dataset
include_spatial_info_receiver = FALSE # if spatial info to include: put this to true # user adaptation required on own dataset

spatial_info = tibble(celltype_region_oi = "CAF_High", celltype_other_region = "myofibroblast_High", niche =  "pEMT_High_niche", celltype_type = "sender") # user adaptation required on own dataset
specificity_score_spatial = "lfc"

# this is how this should be defined if you don't have spatial info
# mock spatial info
if(include_spatial_info_sender == FALSE & include_spatial_info_receiver == FALSE){
    spatial_info = tibble(celltype_region_oi = NA, celltype_other_region = NA) %>% mutate(niche =  niches %>% names() %>% head(1), celltype_type = "sender")
}

if(include_spatial_info_sender == TRUE){
  sender_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "sender"))
  sender_spatial_DE_processed = process_spatial_de(DE_table = sender_spatial_DE, type = "sender", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)

  # add a neutral spatial score for sender celltypes in which the spatial is not known / not of importance
  sender_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
  sender_spatial_DE_processed = sender_spatial_DE_processed %>% bind_rows(sender_spatial_DE_others)

  sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))

} else {
  # # add a neutral spatial score for all sender celltypes (for none of them, spatial is relevant in this case)
  sender_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
  sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))  

}
## [1] "Calculate Spatial DE between: CAF_High and myofibroblast_High"

if(include_spatial_info_receiver == TRUE){
  receiver_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "receiver"))
  receiver_spatial_DE_processed = process_spatial_de(DE_table = receiver_spatial_DE, type = "receiver", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)

  # add a neutral spatial score for receiver celltypes in which the spatial is not known / not of importance
  receiver_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
  receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% bind_rows(receiver_spatial_DE_others)

  receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))

} else {
    # # add a neutral spatial score for all receiver celltypes (for none of them, spatial is relevant in this case)
  receiver_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
  receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
}

4. 计算配体活性，推断`active ligand-target links`

在这一步中，我们将要预测不同niches中receiver细胞类型的每个配体的活性。（和常规NicheNet的配体活性分析类似）。

⚠️第2步是 受体 -- 配体，这一步是 配体--靶基因。

为了计算配体活性，我们首先需要在每个niche中分别定义一个感兴趣的基因集。在这个示例中，pEMT-high的基因集是和pEMT-low肿瘤相比，pEMT-high中的上调基因。pEMT-low的基因集则相反。

lfc_cutoff = 0.15 # recommended for 10x as min_lfc cutoff. 
specificity_score_targets = "min_lfc"

DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi) 
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
View(DE_receiver_processed_targets)

background = DE_receiver_processed_targets  %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
  
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))  
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))  

length(geneset_niche1) #niche1中受体的target
## [1] 1668
length(geneset_niche2) #niche2中受体的target
## [1] 2889

在做配体活性分析之前，最好还是做一下基因集中的基因数的检测。一般认为对配体活性分析来说，感兴趣的基因集中有20-1000基因是比较合适的。如果得到的DE基因数过多，推荐使用更高的lfc_cutoff阈值。在有>2的receiver细胞/niches时或，我们建议使用0.15的cutoff值。如果只有2组receiver细胞/niches时，我们建议使用更高的阈值（比如0.25）。如果是测序深度比较深的数据比如Smart-seq2，同样建议使用更高的阈值。
在这个演示数据中，我们使用的是Smart-seq2的数据，而且只有比较了2个niches，所以我们使用高LFC阈值以得到更少的DE基因（更高的阈值得到的DE基因更少，可信度更高）。

lfc_cutoff = 0.75 

specificity_score_targets = "min_lfc"

DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi) 
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
  
background = DE_receiver_processed_targets  %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
  
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
## [1] "ANXA8L2"  "PRKCDBP"  "IL8"      "PTRF"     "SEPP1"    "C1orf186"
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix)) 
## [1] "LOC344887"    "AGPAT9"       "C1orf110"     "KIAA1467"     "LOC100292680" "EPT1"         "CT45A4"

length(geneset_niche1)
## [1] 169
length(geneset_niche2)
## [1] 136

top_n_target = 250

niche_geneset_list = list(
  "pEMT_High_niche" = list(
    "receiver" = niches[[1]]$receiver,
    "geneset" = geneset_niche1,
    "background" = background),
  "pEMT_Low_niche" = list(
    "receiver" = niches[[2]]$receiver,
    "geneset" = geneset_niche2 ,
    "background" = background)
  )
  
ligand_activities_targets = get_ligand_activities_targets(niche_geneset_list = niche_geneset_list, ligand_target_matrix = ligand_target_matrix, top_n_target = top_n_target)
## [1] "Calculate Ligand activities for: Malignant_High"
## [1] "Calculate Ligand activities for: Malignant_Low"

上面这一步经常出现这个warning

5. 计算不同感兴趣细胞群中的受体、配体和靶基因的(scaled)表达 (log expression values and expression fractions)

在这一步中，我们将会计算受体、配体和靶基因在不同细胞群中的平均(scaled)表达和表达fraction。这里是使用DotPlot展示的，也可以用其他方式展示。

features_oi = union(lr_network$ligand, lr_network$receptor) %>% union(ligand_activities_targets$target) %>% setdiff(NA)
# 上一步得到1597个feature
dotplot = suppressWarnings(Seurat::DotPlot(seurat_obj %>% subset(idents = niches %>% unlist() %>% unique()), features = features_oi, assay = assay_oi))
exprs_tbl = dotplot$data %>% as_tibble()
exprs_tbl = exprs_tbl %>% rename(celltype = id, gene = features.plot, expression = avg.exp, expression_scaled = avg.exp.scaled, fraction = pct.exp) %>%
    mutate(fraction = fraction/100) %>% as_tibble() %>% select(celltype, gene, expression, expression_scaled, fraction) %>% distinct() %>% arrange(gene) %>% mutate(gene = as.character(gene))
  
exprs_tbl_ligand = exprs_tbl %>% filter(gene %in% lr_network$ligand) %>% rename(sender = celltype, ligand = gene, ligand_expression = expression, ligand_expression_scaled = expression_scaled, ligand_fraction = fraction) 
exprs_tbl_receptor = exprs_tbl %>% filter(gene %in% lr_network$receptor) %>% rename(receiver = celltype, receptor = gene, receptor_expression = expression, receptor_expression_scaled = expression_scaled, receptor_fraction = fraction)
exprs_tbl_target = exprs_tbl %>% filter(gene %in% ligand_activities_targets$target) %>% rename(receiver = celltype, target = gene, target_expression = expression, target_expression_scaled = expression_scaled, target_fraction = fraction)
dotplot

为什么要用Dotplot展示这么多基因？直接热图多好

exprs_tbl_ligand = exprs_tbl_ligand %>%  mutate(scaled_ligand_expression_scaled = scale_quantile_adapted(ligand_expression_scaled)) %>% mutate(ligand_fraction_adapted = ligand_fraction) %>% mutate_cond(ligand_fraction >= expression_pct, ligand_fraction_adapted = expression_pct)  %>% mutate(scaled_ligand_fraction_adapted = scale_quantile_adapted(ligand_fraction_adapted))

exprs_tbl_receptor = exprs_tbl_receptor %>% mutate(scaled_receptor_expression_scaled = scale_quantile_adapted(receptor_expression_scaled))  %>% mutate(receptor_fraction_adapted = receptor_fraction) %>% mutate_cond(receptor_fraction >= expression_pct, receptor_fraction_adapted = expression_pct)  %>% mutate(scaled_receptor_fraction_adapted = scale_quantile_adapted(receptor_fraction_adapted))

这一步得到了ligand, receptor和target的表达表。以exprs_tbl_ligand为例，每个表中都有ligand/ receptor/ target的细胞类型，表达量和在细胞中的表达百分比。

exprs_tbl_ligand

6. Expression fraction and receptor

在这一步中，我们将会基于受体表达强度计算配体-受体互作，基于此，我们将对各细胞类型里每个配体的受体进行打分，表达最强的受体将被给予最高的评分。这不会影响随后对单个配体的排序，但是将会帮助我们对每个配体最重要的受体进行排序。(next to other factors regarding the receptor - see later).

exprs_sender_receiver = lr_network %>% 
  inner_join(exprs_tbl_ligand, by = c("ligand")) %>% 
  inner_join(exprs_tbl_receptor, by = c("receptor")) %>% inner_join(DE_sender_receiver %>% distinct(niche, sender, receiver))
  
ligand_scaled_receptor_expression_fraction_df = exprs_sender_receiver %>% group_by(ligand, receiver) %>% mutate(rank_receptor_expression = dense_rank(receptor_expression), rank_receptor_fraction  = dense_rank(receptor_fraction)) %>% mutate(ligand_scaled_receptor_expression_fraction = 0.5*( (rank_receptor_fraction / max(rank_receptor_fraction)) + ((rank_receptor_expression / max(rank_receptor_expression))) ) )  %>% distinct(ligand, receptor, receiver, ligand_scaled_receptor_expression_fraction, bonafide) %>% distinct() %>% ungroup() 
View(ligand_scaled_receptor_expression_fraction_df)

7. Prioritization of ligand-receptor and ligand-target links

在这一步中，我们将会结合上面所有的计算结果来对ligand-receptor-target之间的links进行排序。
We scale every property of interest between 0 and 1, and the final prioritization score is a weighted sum of the scaled scores of all the properties of interest.

prioritizing_weights由以下代码中的12个参数综合决定。

prioritizing_weights = c("scaled_ligand_score" = 5,
                         "scaled_ligand_expression_scaled" = 1,
                         "ligand_fraction" = 1,
                         "scaled_ligand_score_spatial" = 2, 
                         "scaled_receptor_score" = 0.5,
                         "scaled_receptor_expression_scaled" = 0.5,
                          "receptor_fraction" = 1, 
                         "ligand_scaled_receptor_expression_fraction" = 1,
                         "scaled_receptor_score_spatial" = 0,
                         "scaled_activity" = 0,
                         "scaled_activity_normalized" = 1,
                         "bona_fide" = 1)

上面的参数是可变的，需要根据不同的需求设置。（每个参数的含义见：Differential NicheNet analysis between conditions of interest）

output = list(DE_sender_receiver = DE_sender_receiver, ligand_scaled_receptor_expression_fraction_df = ligand_scaled_receptor_expression_fraction_df, sender_spatial_DE_processed = sender_spatial_DE_processed, receiver_spatial_DE_processed = receiver_spatial_DE_processed,
         ligand_activities_targets = ligand_activities_targets, DE_receiver_processed_targets = DE_receiver_processed_targets, exprs_tbl_ligand = exprs_tbl_ligand,  exprs_tbl_receptor = exprs_tbl_receptor, exprs_tbl_target = exprs_tbl_target)
prioritization_tables = get_prioritization_tables(output, prioritizing_weights)

##查看list基本结构
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[1]]$receiver) %>% head(10)

prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[1]]$receiver) %>% head(10)

prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[2]]$receiver) %>% head(10)

prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[2]]$receiver) %>% head(10)

8. 可视化

8.1 Differential expression of ligand and expression

在可视化之前，我们需要先定义每个niche中最重要的配体受体对。We will do this by first determining for which niche the highest score is found for each ligand/ligand-receptor pair. 然后我们就可以得到每个niche中的top 50 ligands。

top_ligand_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
top_ligand_receptor_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand, receptor) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)

ligand_prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, prioritization_score) %>% group_by(ligand, niche) %>% top_n(1, prioritization_score) %>% ungroup() %>% distinct() %>% inner_join(top_ligand_niche_df) %>% filter(niche == top_niche) %>% group_by(niche) %>% top_n(50, prioritization_score) %>% ungroup() # get the top50 ligands per niche
table(ligand_prioritized_tbl_oi$receiver)
# Malignant_High  Malignant_Low 
#             50             50 
View(ligand_prioritized_tbl_oi)

每个niche中的top 50 ligands

然后就是可视化啦！首先我们展示top 受体-配体对，这里展示了每个配体的评分最高的top 2受体。(可以通过修改下面top_n后的数字修改)

receiver_oi = "Malignant_High" 

filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% pull(ligand) %>% unique()

prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand,  receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()

Visualization: minimum LFC compared to other niches

lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot

左图是Sender：左边五列是pEMT_High_niche的五种Sender细胞，右边是pEMT_Low_niche的三种Sender细胞。每一行是用ligand LFC计算出来的top配体受体对，每个配体展示了top2的受体。右图是Receiver：左右两列分别是pEMT_High_niche和pEMT_Low_niche的Receiver细胞。

Show the spatialDE as additional information（适用于空间数据）

lfc_plot_spatial = make_ligand_receptor_lfc_spatial_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, ligand_spatial = include_spatial_info_sender, receptor_spatial = include_spatial_info_receiver, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot_spatial

8.2 配体表达，活性和靶基因

visualization of ligand activity and ligand-target links

exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi,  prioritization_tables$prioritization_tbl_ligand_receptor,  prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand,  output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot

从左到右的四个图依次是：1. top50配体在pEMT_High_niche的五种Sender细胞和pEMT_Low_niche的三种Sender细胞中的表达；2. top50配体在pEMT_High_niche和pEMT_Low_niche的Receiver细胞中的表达；3. 2图的scale；4. top50配体的target的表达

基于这个plot，我们可以推断出许多假说假说，比如：“Interestingly, IL1 family ligands seem to have activity in inducing the DE genes between high pEMT and low pEMT malignant cells; and they are mainly expressed by myeloid cells, a cell type unique for pEMT-high tumors.”

important: ligand-receptor pairs with both high differential expression (or condition-specificity) and ligand activity (=target gene enrichment) are very interesting predictions as key regulators of your intercellular communication process of interest !

这个图默认展示top50的配体。如果觉得展示的信息太多，不容易从其中找到有价值的线索，也可以只展示top20。

filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(20, prioritization_score) %>% pull(ligand) %>% unique()

prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand,  receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup() 

exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi,  prioritization_tables$prioritization_tbl_ligand_receptor,  prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand,  output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot

8.3 Circos plot of prioritized ligand-receptor pairs

Because a top50 is too much to visualize in a circos plot, we will only visualize the top 15.

filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(15, prioritization_score) %>% pull(ligand) %>% unique()

prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand,  receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup() 

colors_sender = brewer.pal(n = prioritized_tbl_oi$sender %>% unique() %>% sort() %>% length(), name = 'Spectral') %>% magrittr::set_names(prioritized_tbl_oi$sender %>% unique() %>% sort())
colors_receiver = c("lavender")  %>% magrittr::set_names(prioritized_tbl_oi$receiver %>% unique() %>% sort())

circos_output = make_circos_lr(prioritized_tbl_oi, colors_sender, colors_receiver)

8.4 Interpretation of these results

多数排名靠前的差异性L-R对似乎来自仅存在于pEMT high肿瘤中的细胞类型。这可能部分是由于生物学的因素（在某种情况下某种独特的细胞类型可能非常重要），但也可能是由于优先排序的方式以及这些独特的细胞类型在其他niche中并没有 “counterpart”。
由于肿瘤微环境中，髓系细胞和T细胞与其他细胞有很大不同，因此它们的配体会显示出很强的差异表达。与来自相同细胞类型但不同niche/条件的细胞之间的差异表达（pEMT-high中的CAF与pEMT-low中的CAF相比）相比，同一niche/条件中（如pEMT-low肿瘤中）髓系/T 细胞vs肌成纤维细胞/CAFs/内皮细胞的差异表达可能更明显（这时候需要做同一niche不同细胞类型的互作差异）。

8.5 Visualization for the other condition: pEMT-low

前面可视化的是在pEMT-high niche中上调的受体配体对，我们也可以可视化在pEMT-low niche中上调的受体配体对。

receiver_oi = "Malignant_Low"  
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(50, prioritization_score) %>% pull(ligand) %>% unique()

prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand,  receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup() 

lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot

9. Notes, limitations, and comparison to default NicheNet.

在原始的NicheNet pipeline中，配体 - 受体对表达的排序仅仅是根据其配体活性得出的。在这个差异性NicheNet pipeline中，我们也是根据与其他niches（或者其他空间位置信息[空转数据]）相比，L-R 对的差异表达来得到信息。
因为我们在这里关注配体- 受体对的差异表达，并且通过默认将DE而不是activity赋予更高的优先级权重，我们倾向于找到许多与默认NicheNet pipeline不同的hits。在Differential NicheNet中，我们倾向于找到更多的high-DE, low-activity hits，而使用default NicheNet，我们找到更多的low-DE, high-activity hits。
应该注意的是，一些high-DE, low-activity hits可能非常重要，因为它们可能是由于NicheNet活性预测的限制而具有低NicheNet活性（例如NicheNet中关于该配体的不正确/不完全的先验知识），但其中一些也可能在DE中很高，但activity不高，因为它们没有强烈的信号效应（例如，仅参与细胞粘附的配体）。
相反的，对于在Diffifer NicheNet中没有被强烈优先考虑的low-DE, high-activity的受体配体对，应考虑以下因素：1）一些配体受到转录后调控，高预测活性可能仍然反映的是真实的信号传导; 2）高预测活性值可能是由于NicheNet的局限性（不准确的先验知识），这些低DE配体在感兴趣的生物学过程中并不重要（但该配体的高DE家族成员可能是重要的。因为家族成员之间的信号传导往往非常相似）; 3）一种情况下的高活性可能是由于另一种情况下的下调，导致高activity和DE低。目前，配体活性是在每个条件下根据上调基因计算的，但下调基因也可能是配体活性的标志。
当配体 - 受体对同时具有高DE和高activity时，我们可以认为它们是调节感兴趣过程的非常好的候选者，我们建议测试这些候选物以进行进一步的实验验证。

应用文献

纵向单细胞分析揭示stress促进转移性卵巢癌的耐药性

References

Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019) doi:10.1038/s41592-019-0667-5

Guilliams et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell (2022) doi:10.1016/j.cell.2021.12.018

Puram, Sidharth V., Itay Tirosh, Anuraag S. Parikh, Anoop P. Patel, Keren Yizhak, Shawn Gillespie, Christopher Rodman, et al. 2017. “Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer.” Cell 171 (7): 1611–1624.e24. https://doi.org/10.1016/j.cell.2017.10.044.

最后编辑于：2022.11.02 09:39:39

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 194,088评论 5赞 459
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 81,715评论 2赞 371
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 141,361评论 0赞 319
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 52,099评论 1赞 263
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 60,987评论 4赞 355
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 46,063评论 1赞 272
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 36,486评论 3赞 381
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 35,175评论 0赞 253
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 39,440评论 1赞 290
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 34,518评论 2赞 309
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 36,305评论 1赞 326
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 32,190评论 3赞 312
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 37,550评论 3赞 298
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 28,880评论 0赞 17
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,152评论 1赞 250
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 41,451评论 2赞 341
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 40,637评论 2赞 335