scanpy也可以使用harmony,但是其实调用的Harmonypy这个包,其实使用的话倒是比较简单,就是下面这些命令,但是我不是很关心这个,关键是它怎么写的
Fast, sensitive and accurate integration of single-cell data with Harmony | Nature Methods
import scanpy as sc
import scanpy.external as sce
adata = sc.datasets.pbmc3k()
sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata)
adata.obs['batch'] = 1350*['a'] + 1350*['b']
sce.pp.harmony_integrate(adata, 'batch')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['batch'], legend_fontsize=8)
harmonypy/harmony.py at master · slowkow/harmonypy (github.com)
obsm就是降维的数据
sce.pp.harmony_integrate(adata, 'batch')这句其实就是下面这个调用下面的语句
import harmonypy
harmony_out = harmonypy.run_harmony(adata.obsm["X_pca"], adata.obs, 'batch')
adata.obsm[adjusted_basis] = harmony_out.Z_corr.T ###obsm就是降维的数据
adata.obs
算法1这里讲的就是harmnoy通过使用PCA降维后的数据,不断重复聚类,收敛的过程
算法的主体
def harmonize(self, iter_harmony=10, verbose=True):
converged = False
for i in range(1, iter_harmony + 1):
if verbose:
logger.info("Iteration {} of {}".format(i, iter_harmony))
# STEP 1: Clustering
self.cluster()
# STEP 2: Regress out covariates
# self.moe_correct_ridge()
self.Z_cos, self.Z_corr, self.W, self.Phi_Rk = moe_correct_ridge(
self.Z_orig, self.Z_cos, self.Z_corr, self.R, self.W, self.K,
self.Phi_Rk, self.Phi_moe, self.lamb
)
# STEP 3: Check for convergence
converged = self.check_convergence(1)
if converged:
if verbose:
logger.info(
"Converged after {} iteration{}"
.format(i, 's' if i > 1 else '')
)
break
if verbose and not converged:
logger.info("Stopped before convergence")
return 0
算法2是最大优化多样性聚类
Harmony概率性地将细胞分配给cluster,从而使每个cluster内数据集的多样性最大化。
def cluster(self):
# Z_cos has changed
# R is assumed to not have changed
# Update Y to match new integrated data
self.dist_mat = 2 * (1 - np.dot(self.Y.T, self.Z_cos))
for i in range(self.max_iter_kmeans):
# print("kmeans {}".format(i))
# STEP 1: Update Y
self.Y = np.dot(self.Z_cos, self.R.T)
self.Y = self.Y / np.linalg.norm(self.Y, ord=2, axis=0)
# STEP 2: Update dist_mat
self.dist_mat = 2 * (1 - np.dot(self.Y.T, self.Z_cos))
# STEP 3: Update R
self.update_R()
# STEP 4: Check for convergence
self.compute_objective()
if i > self.window_size:
converged = self.check_convergence(0)
if converged:
break
self.kmeans_rounds.append(i)
self.objective_harmony.append(self.objective_kmeans[-1])
return 0
算法3,Mixture of Experts Correct
由于Harmony使用软聚类,因此可以通过多个因子的线性组合对其A中进行的软聚类分配进行线性校正,来修正每个单细胞。
def moe_correct_ridge(Z_orig, Z_cos, Z_corr, R, W, K, Phi_Rk, Phi_moe, lamb):
Z_corr = Z_orig.copy()
for i in range(K):
Phi_Rk = np.multiply(Phi_moe, R[i,:])
x = np.dot(Phi_Rk, Phi_moe.T) + lamb
W = np.dot(np.dot(np.linalg.inv(x), Phi_Rk), Z_orig.T)
W[0,:] = 0 # do not remove the intercept
Z_corr -= np.dot(W.T, Phi_Rk)
Z_cos = Z_corr / np.linalg.norm(Z_corr, ord=2, axis=0)
return Z_cos, Z_corr, W,
(A)Harmony概率性地将细胞分配给cluster,从而使每个cluster内数据集的多样性最大化。
(B)Harmony计算每个cluster的所有数据集的全局中心,以及特定数据集的中心。
(C)在每个cluster中,Harmony基于中心为每个数据集计算校正因子。
(D)最后,Harmony使用基于C的特定于细胞的因子校正每个细胞。由于Harmony使用软聚类,因此可以通过多个因子的线性组合对其A中进行的软聚类分配进行线性校正,来修正每个单细胞。
重复步骤A到D,直到收敛为止。聚类分配和数据集之间的依赖性随着每一轮的减少而减小。
“harmony”整合不同平台的单细胞数据之旅 - 腾讯云开发者社区-腾讯云 (tencent.com)