10X Single-Cell Communication Analysis with CrossTalkeR (Both Specificity and Changes in Communication Strength Matter)

Hello everyone. This time I'd like to share another cell-cell communication tool, CrossTalkeR. As usual, we will first walk through the paper and then share the code. The paper is "CrossTalkeR: Analysis and Visualisation of Ligand Receptor Networks". Let me stress this once more: when analysing cell-cell communication, do not focus only on the ligand-receptor pairs that are specific to one side of a comparison; it is just as important to look at how the communication strength of shared ligand-receptor pairs changes. I previously shared another method that also analyses changes in communication strength, in the post on NATMI (single-cell ligand-receptor communication analysis software). I hope this point gets the attention it deserves. Now, let's look at the paper.

Abstract

Motivation: Ligand-receptor (LR) analysis allows the characterization of cellular crosstalk from single cell RNA-seq data. However, current LR methods provide limited approaches for prioritization of cell types, ligands or receptors or characterizing changes in crosstalk between two biological conditions. (As noted above, ligand-receptor specificity and changes in strength are equally important.)
Results: CrossTalkeR is a framework for network analysis and visualisation of LR networks. CrossTalkeR identifies relevant ligands, receptors and cell types contributing to changes in cell communication when contrasting two biological states: disease vs. homeostasis. A case study on scRNA-seq of human myeloproliferative neoplasms reinforces the strengths of CrossTalkeR for characterisation of changes in cellular crosstalk in disease state. (This also describes our usual analysis workflow, so it should feel familiar.)

Introduction

Understanding cellular crosstalk is vital for uncovering molecular mechanisms associated with cell differentiation and disease progression. (This is a consensus view.) This section mainly makes the following points:
(1) Information on cellular proximity and crosstalk is not captured (a limitation of single-cell technology).
(2) Current LR inference methods usually predict hundreds of potential LR pairs for a given scRNA-seq dataset, making their interpretation challenging (a shortcoming of current methods; most tools predict ligand-receptor pairs this way).
(3) Most LR methods focus on the analysis of a single scRNA-seq experiment and are not able to characterize changes in cellular crosstalk between pairs of conditions (this is where changes in communication strength across conditions come in, rather than simply knowing which ligand-receptor pairs are condition-specific).
(4) CrossTalkeR explores network-based measures to rank cell types, receptors, ligands, cell-receptor and cell-ligand pairs regarding their importance in cellular crosstalk in a single or a pair of biological conditions. (The analysis strategy is excellent.) It is also compatible with the CellPhoneDB database, so CellPhoneDB's position as the leading resource remains unshaken.

Overview and implementation

[figure]

Let's look at a few features of this tool.
1. Disease and control samples are analysed separately. (Whether the data are integrated is something we need to look into further.)
2. The strength of a ligand-receptor pair is still computed by multiplying mean expression values (see the sketch after this list).
3. For ligand-receptor pairs shared between states, the change in strength (up- or down-regulation) and the resulting regulatory network are compared.
4. Ligand-receptor pairs are ranked by priority, which is quite innovative.
Let's go through each step in detail.
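Before going further, here is a minimal sketch of the mean-based scoring idea mentioned in point 2 above. This is my own illustration, not CrossTalkeR's or CellPhoneDB's internal code; the expression matrix, cell labels and gene names are made up, and whether the tool multiplies or averages the two means does not change the structure of the computation.

# Minimal sketch of mean-based LR scoring (illustrative only).
# expr: genes x cells normalized matrix; cell_type: per-cell cluster labels.
lr_score <- function(expr, cell_type, ligand, receptor, sender, receiver) {
  mean_lig <- mean(expr[ligand,   cell_type == sender])    # mean ligand expression in sender cells
  mean_rec <- mean(expr[receptor, cell_type == receiver])  # mean receptor expression in receiver cells
  mean_lig * mean_rec                                      # product of the two means = interaction weight
}

# Toy example with made-up data
set.seed(1)
expr <- matrix(rexp(4 * 10), nrow = 4,
               dimnames = list(c("TGFB1", "TGFBR1", "PF4", "CXCR3"), paste0("cell", 1:10)))
cell_type <- rep(c("MSC", "Megakaryocyte"), each = 5)
lr_score(expr, cell_type, "TGFB1", "TGFBR1", sender = "Megakaryocyte", receiver = "MSC")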

Network Construction

CrossTalkeR constructs two representations of the LR network.
(1) In the cell-cell interaction (CCI) network, the nodes are defined by each cell type and the edges are weighted by characteristics of the interactions: the number of LR pairs and the sum of the weights of the LR pairs (i.e. both the count and the strength of interactions).
(2) In the cell-gene interaction (CGI) network, the nodes represent gene and cell pairs (ligand and sender cell, or receptor and receiving cell) and the weights are given by the mean LR expression (a cell-gene interaction network).
(3) Differential CCI and CGI networks are obtained by subtracting the control state network from the condition state network, e.g. disease. In this way, interactions with positive values are up-regulated in the condition and negative-valued interactions are down-regulated in the experimental (disease) state (the comparative analysis; see the sketch below).
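To make the differential network concrete, here is a minimal sketch (my own, under the assumption that each CCI network is stored as an edge table with sender, receiver and an aggregated weight) of computing disease-minus-control edge differences:

library(dplyr)

# Toy CCI edge tables: one row per sender/receiver pair, weight = aggregated LR strength
ctrl <- data.frame(sender   = c("MSC", "MSC", "Megakaryocyte"),
                   receiver = c("Megakaryocyte", "MSC", "MSC"),
                   weight   = c(1.2, 0.8, 2.0))
dis  <- data.frame(sender   = c("MSC", "MSC", "Megakaryocyte"),
                   receiver = c("Megakaryocyte", "MSC", "MSC"),
                   weight   = c(0.5, 1.5, 3.1))

# Differential network: disease minus control, so positive edges are up-regulated in disease
diff_cci <- full_join(dis, ctrl, by = c("sender", "receiver"),
                      suffix = c("_disease", "_control")) %>%
  mutate(across(starts_with("weight"), ~ tidyr::replace_na(.x, 0)),
         diff = weight_disease - weight_control) %>%
  select(sender, receiver, diff)
diff_cci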

Network-based analysis

(1) Node ranking approaches (let's look at this part in detail):
We apply distinct network property methods to rank either the CCI or the CGI network. By default, the measures take the weights of the LR pairs into account. (See the sketch below for what weighted network measures look like in practice.)
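As an illustration of weighted, network-based node ranking, here is a minimal igraph sketch on a toy CCI network. It is not the package's own code; the edge list is invented, and the specific measures (weighted degree, PageRank, betweenness) are just common examples of network properties.

library(igraph)

# Toy directed, weighted CCI network: sender -> receiver, weight = aggregated LR strength
edges <- data.frame(from   = c("MSC", "MSC", "Megakaryocyte", "Fibroblast"),
                    to     = c("Megakaryocyte", "Fibroblast", "MSC", "MSC"),
                    weight = c(2.0, 0.7, 3.1, 1.2))
g <- graph_from_data_frame(edges, directed = TRUE)

# Weight-aware node measures used to rank cell types
rank_tbl <- data.frame(
  node        = V(g)$name,
  strength    = strength(g, mode = "all", weights = E(g)$weight),   # weighted degree
  pagerank    = page_rank(g, weights = E(g)$weight)$vector,
  betweenness = betweenness(g, weights = E(g)$weight)               # note: igraph treats weights as distances here
)
rank_tbl[order(-rank_tbl$pagerank), ]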

[figure]

See whether you can spot how this differs from the analysis methods covered previously.

Methods:

[figure]

This part is explanatory and relatively straightforward.

[figure]

This part describes the construction of the gene-cell network.

[figure]

This part explains the CCI network, which is not very different from the networks we usually build.

[figure]

Studying the differences this way is quite convenient 😄

[figure]

It is emphasized here that the networks are built on integrated data, and the recommended integration method is Seurat's (frankly, not my favourite).
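For completeness, here is a minimal sketch of the standard Seurat anchor-based integration workflow that could be run before the network construction. This is generic Seurat usage, not something prescribed by CrossTalkeR; it assumes a Seurat object `alldata` with a `cond` metadata column, as in the extraction script further below.

library(Seurat)

# Split by condition, normalize each subset, then integrate with Seurat's anchor-based workflow
obj_list <- SplitObject(alldata, split.by = "cond")
obj_list <- lapply(obj_list, NormalizeData)
obj_list <- lapply(obj_list, FindVariableFeatures, nfeatures = 2000)

anchors    <- FindIntegrationAnchors(object.list = obj_list)
integrated <- IntegrateData(anchorset = anchors)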

[figure]

As above, this part is about ranking.

The last part deserves our attention:

PC analysis of network properties

CrossTalkeR employs PCA as a means to combine distinct network measures. By default, it provides the scores of the first two PCs as a means to rank either gene-cell pairs in CGI networks or cells in CCI networks. (So the ranking is actually done with PCA.) CrossTalkeR also provides the importance of variables within the PCs to make them interpretable. The first PC is interpreted as capturing global interactions, while the second PC captures local interactions. Top up-regulated cell-gene pairs predicted by PC2 include matrix genes (FN1/MSC, COL1A1/MSC) and TGFB-signalling-associated genes (TGFB1/Megakaryocyte), as previously discussed.
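A minimal sketch of what PCA-based ranking of network measures can look like, using base R's prcomp on a made-up table of per-node measures (the measure names and values are illustrative, not CrossTalkeR output):

# Toy table of per-node network measures (e.g. for nodes of the CGI network)
set.seed(2)
measures <- data.frame(
  strength_in  = runif(6),
  strength_out = runif(6),
  pagerank     = runif(6),
  betweenness  = runif(6),
  row.names    = c("TGFB1/Megakaryocyte", "FN1/MSC", "COL1A1/MSC",
                   "PF4/Megakaryocyte", "CXCR3/NK", "TGFBR1/MSC")
)

# PCA over the scaled measures: PC scores rank the nodes, loadings show which measures drive each PC
pca <- prcomp(measures, scale. = TRUE)
sort(pca$x[, "PC1"], decreasing = TRUE)  # nodes ranked by PC1 score
pca$rotation[, 1:2]                      # variable importance within PC1/PC2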


[figures]

Let's look at the reference code.

Step 1 - Extract data from the Seurat object

require(Seurat)
require(EWCE)
require(tibble)
require(biomaRt)
require(tidyr)
require(dplyr)

## Load the Seurat object and log-normalize the counts
alldata <- readRDS('path_to_data')
alldata <- NormalizeData(alldata, normalization.method = "LogNormalize", scale.factor = 10000)
allgenes <- rownames(alldata)
matrix1 <- as.data.frame(alldata@assays$RNA@data)
matrix1 <- matrix1[rowSums(matrix1[,2:dim(matrix1)[2]])!=0,]  # drop genes with no expression

### If you are using a mouse data, then its needed to convert the gene names to human orthologs
human = useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")
genesV2 = getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = rownames(alldata@assays$RNA@data) , mart = mouse, attributesL = c("hgnc_symbol","hgnc_id",'ensembl_gene_id'), martL = human, uniqueRows=T)
print(head(genesV2))
idx <- match(genesV2$MGI.symbol, rownames(matrix1))   # map orthologs to rows of the filtered matrix
keep <- !is.na(idx)
matrix1 <- matrix1[idx[keep],]
matrix1$gene <- genesV2$Gene.stable.ID[keep]          # human Ensembl gene ID for each kept row
#rownames(matrix1) <-  genesV2$Gene.stable.ID

### Subsetting the matrix by condition (s1/s2 are logical masks over the cell columns)
s1 <- grepl('state1',alldata@meta.data$cond)
s2 <- grepl('state2',alldata@meta.data$cond)
s1[match('gene',colnames(matrix1))] <- TRUE  # also keep the appended 'gene' column
s2[match('gene',colnames(matrix1))] <- TRUE

## Checking the dimensions
print(dim(matrix1[,s1]))
print(dim(matrix1[,s2]))

## If the cluster names are categorical, you will need to convert them to numerical labels
alldata@meta.data$sub_cell_type <- as.factor(alldata@meta.data$sub_cell_type)
print(levels(alldata@meta.data$sub_cell_type))
levels(alldata@meta.data$sub_cell_type) <- 1:length(levels(alldata@meta.data$sub_cell_type))
print(1:length(levels(alldata@meta.data$sub_cell_type)))
alldata@meta.data$sub_cell_type <- as.numeric(alldata@meta.data$sub_cell_type)

write.table(matrix1[,s1], 's1_filtered_hcount.csv',row.names=T,sep=',')
write.table(matrix1[,s2], 's2_filtered_hcount.csv',row.names=T,sep=',')

## Per-condition metadata: cell barcode plus (numeric) cluster label
metadata_s1 <- data.frame(cells=rownames(alldata@meta.data)[grepl('state1',alldata@meta.data$cond)],
                          cluster=alldata@meta.data$sub_cell_type[grepl('state1',alldata@meta.data$cond)])
metadata_s2 <- data.frame(cells=rownames(alldata@meta.data)[!grepl('state1',alldata@meta.data$cond)],
                          cluster=alldata@meta.data$sub_cell_type[!grepl('state1',alldata@meta.data$cond)])  # just negate grepl() for the second state
print('Writing Metadata')
write.csv(metadata_s1, 's1_filtered_meta.csv', row.names=FALSE)
write.csv(metadata_s2, 's2_filtered_meta.csv', row.names=FALSE)

Note that gene IDs need to be unique; you will need an approach to combine multiply-mapped orthologs (e.g. sum, max or a bitwise OR).
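As one possible approach (my own sketch, not part of the official tutorial), duplicated ortholog IDs can be collapsed by summing the corresponding rows with base R's rowsum; the object names follow the script above:

# Collapse rows that map to the same human gene ID by summing them
expr_only <- matrix1[, setdiff(colnames(matrix1), "gene")]   # numeric cell columns only
collapsed <- rowsum(expr_only, group = matrix1$gene)         # one row per unique human gene ID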

Run CellPhoneDB

The full parameter list is available at https://github.com/Teichlab/cellphonedb

Ensembl-based gene IDs

#! /bin/bash
mkdir s1 s2  # creating the output folders

cellphonedb method statistical_analysis s1_filtered_meta.csv  s1_filtered_hcount.csv  --threads 30 --output-path s1/

cellphonedb method statistical_analysis s2_filtered_meta.csv  s2_filtered_hcount.csv  --threads 30 --output-path s2/

HGNC symbol (HUGO) based gene IDs

#! /bin/bash
mkdir s1 s2  # creating the output folders

cellphonedb method statistical_analysis s1_filtered_meta.csv --counts-data hgnc_symbol s1_filtered_hcount.csv  --threads 30 --output-path s1/

cellphonedb method statistical_analysis s2_filtered_meta.csv --counts-data hgnc_symbol s2_filtered_hcount.csv  --threads 30 --output-path s2/

Extracting LR

import pandas as pd

def correct_lr(data):
    '''
    Invert RL pairs to LR and R1R2 pairs to R2>R1
    '''
    def swap(a, b): return b, a
    data = data.to_dict('index')
    for k,v in data.items():
        if v['isReceptor_fst'] and v['isReceptor_scn']:
            v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
            v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
            v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
        elif v['isReceptor_fst'] and not v['isReceptor_scn']:
            v['isReceptor_fst'],v['isReceptor_scn'] = swap(v['isReceptor_fst'],v['isReceptor_scn'])
            v['Ligand'],v['Receptor'] = swap(v['Ligand'],v['Receptor'])
            v['Ligand.Cluster'],v['Receptor.Cluster'] = swap(v['Ligand.Cluster'],v['Receptor.Cluster'])
    res_df = pd.DataFrame.from_dict(data,orient='index')
    return (res_df)
def cpdb2df(data, clsmapping):
    '''
    Convert the CellPhoneDB significant_means table to a long LR table,
    mapping numeric cluster labels back to cluster names via clsmapping.
    '''
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:,12:].shape[1]):
            c_pair = list(data.columns)[j+12].split('|')
            if float(data.iloc[i,j+12]) !=0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(clsmapping[c_pair[0]])
                df_data['Receptor.Cluster'].append(clsmapping[c_pair[1]])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i,j+12])
    data_final = pd.DataFrame.from_dict(df_data)
    return(data_final)

def cpdb2df_nocls(data):
    '''
        When the cluster name is used on CPDB
    '''
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:,12:].shape[1]):
            c_pair = list(data.columns)[j+12].split('|')
            if float(data.iloc[i,j+12]) !=0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(c_pair[0])
                df_data['Receptor.Cluster'].append(c_pair[1])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i,j+12])
    data_final = pd.DataFrame.from_dict(df_data)
    return(data_final)

s1 = pd.read_csv('./s1/significant_means.txt', sep='\t')
s2 = pd.read_csv('./s2/significant_means.txt', sep='\t')
# dict mapping the numeric cluster labels back to the cluster names
num_to_clust = {'1': 'Cluster 1',
                '2': 'Cluster 2',
                '3': 'Cluster 3',
                '4': 'Cluster 4',
                '5': 'Cluster 5'}

s1_filtered = cpdb2df(s1,num_to_clust)
s2_filtered = cpdb2df(s2,num_to_clust)
s1_filtered = correct_lr(s1_filtered)
s2_filtered = correct_lr(s2_filtered)

s1_filtered.to_csv('s1_filtered_corrected.csv')
s2_filtered.to_csv('s2_filtered_corrected.csv')

CrossTalkeR

library('CrossTalkeR')

# the method always considers the first path as the control; the multiple-control case will be handled soon
paths <- c('CTR' = 's1_filtered_corrected.csv',
           'EXP' = 's2_filtered_corrected.csv')  # further experimental conditions (e.g. 'EXP1') can be appended
# Selected gene list     
genes <- c('TGFB1', 'PF4')

# Generating the report and the object

data <- generate_report(paths=paths, # paths list
                        selected_genes=genes, # Selected list
                        output='/home/nagai/Documents/', # output path
                        threshold = 0, # threshold of prune edges 0=keep all
                        out_file='All_Nils.html' # report name
                        )

As for the report:


[figure]

[figures]

Life is good, and even better with you.
