DSNE: A Velocity Visualization Python Tool – Analytics India Magazine

Handling high-dimensional data is a non-trivial task in Exploratory Data Analysis. Numerous tools, from the well-known Principal Component Analysis (PCA) to the Stochastic Neighbor Embedding (SNE) variants, embed high-dimensional data in a low-dimensional visualization, typically a two- or three-dimensional representation. The vanilla Stochastic Neighbor Embedding converts the high-dimensional Euclidean distances from a point to its K-nearest neighbors into conditional probabilities that represent similarities. Stochastic Neighbor Embedding variants have been developed for task-specific purposes, including t-SNE, Hierarchical-SNE, RNA-velocity methods, and UMAP.

Biological science often deals with high-dimensional velocity data such as embryo development, cell fluid transfer, cell-type transition, cell differentiation, and mRNA velocity. Recent work by Volker Bergen et al., scVelo, introduced a robust velocity visualization for transient cell systems. However, the method relies on an intuitive approach rather than a solid mathematical one, and it yields unreliable results for highly dynamic and unpredictable cell systems, such as in messenger-RNA (mRNA) velocity visualization.

Songting Shi of Peking University, China, has developed a mathematically sound approach to visualizing the velocity of cell components. The method is called DSNE, short for Directional Stochastic Neighbor Embedding. It can be regarded as a variant of the vanilla SNE, proposed to address velocity visualization problems such as cell differentiation and embryo development.

DSNE converts the high-dimensional Euclidean distance between the unit-length velocity and the unit-length direction from the center point to its nearest neighbor points into a conditional probability distribution. This conditional probability distribution represents the similarity between the velocity and the direction of cell particles.
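
To make the idea concrete, here is a minimal NumPy sketch of turning "velocity versus direction to neighbors" into a probability distribution. It is only an illustration under stated assumptions: the Gaussian kernel, the fixed bandwidth `sigma`, and the helper names `unit_rows` and `directional_probabilities` are hypothetical and are not taken from the DSNE paper or package, whose exact formula may differ.

```python
import numpy as np

def unit_rows(A, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A / np.maximum(norms, eps)

def directional_probabilities(X, V, i, K=3, sigma=1.0):
    """Hypothetical helper: similarity between the velocity of point i
    and the directions from point i to its K nearest neighbors."""
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf                              # exclude the point itself
    neighbors = np.argsort(dists)[:K]              # K nearest neighbors of i
    directions = unit_rows(X[neighbors] - X[i])    # unit-length directions
    v_hat = unit_rows(V[i:i + 1])[0]               # unit-length velocity
    sq_dist = np.sum((directions - v_hat) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))  # assumed Gaussian kernel
    return neighbors, weights / weights.sum()

# Toy example: point 0 moves to the right, so the neighbor on its right
# should receive the highest probability.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
V = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
neighbors, probs = directional_probabilities(X, V, i=0, K=3)
print(dict(zip(neighbors.tolist(), np.round(probs, 3).tolist())))
```

Because both the velocity and the directions are normalized, only their orientation matters: a neighbor lying along the velocity gets a high probability regardless of how far away it is.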



Mathematically, the conditional probability can be expressed as:

DSNE high dimensional representation

Similarly, the conditional probability distribution for the low-dimensional counterparts can be expressed as:

DSNE low dimensional representation

An entropy-based loss function (the Kullback-Leibler divergence) is incorporated to measure the mismatch between the high-dimensional and the low-dimensional distributions. DSNE is optimized during training by reducing this loss function, which can be expressed mathematically as:

DSNE loss function
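
As a rough illustration of how such a loss behaves, the sketch below computes a Kullback-Leibler divergence between two row-stochastic matrices with NumPy. The function name `kl_divergence`, the toy matrices, and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions for this example and not the DSNE package's internal code.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-16):
    """Sum over rows of KL(P_i || Q_i) for row-stochastic matrices P and Q."""
    P = np.asarray(P, dtype=np.float64)
    Q = np.asarray(Q, dtype=np.float64)
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

# One "high-dimensional" distribution and two candidate low-dimensional ones
P = np.array([[0.7, 0.2, 0.1]])
Q_good = np.array([[0.6, 0.3, 0.1]])   # close to P
Q_bad = np.array([[0.1, 0.2, 0.7]])    # mass on the wrong neighbor

loss_same = kl_divergence(P, P)
loss_good = kl_divergence(P, Q_good)
loss_bad = kl_divergence(P, Q_bad)
print(loss_same, loss_good, loss_bad)
```

The loss is zero when the two distributions coincide and grows as the low-dimensional distribution places probability on the wrong neighbors, which is why minimizing it pulls the two representations together.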

Pancreas Visualization using DSNE

The pancreas is one of the delicate and vital organs in the human body. It takes care of food digestion and blood sugar regulation. Pancreas cells exhibit highly dynamic behavior both between and within cells. Timely study of the pancreas leads to better treatment in case of impairment. Pancreas cell velocity can be analyzed and visualized using DSNE's Python package.

  1. DSNE is available as a PyPI package, so we can simply pip install it.
!pip install dsne
  1. Set up the development environment by importing the required modules.
 import numpy as np
 import scvelo as scv
 from scipy.sparse import issparse
 from dsne import DSNE, DSNE_approximate 
  1. Configure the visualization settings to suit the application and download the built-in pancreas dataset.
 scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
 scv.settings.presenter_view = True  # set max width size for presenter view
 scv.settings.set_figure_params('scvelo')  # for beautified visualization
 adata = scv.datasets.pancreas()
 scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
 scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
 scv.tl.velocity(adata) 

Output:

  1. Define a helper function to obtain the velocity components required to plot pancreas velocity.
 def get_X_V_Y(adata,vkey="velocity",
               xkey="Ms",
               basis=None,
               gene_subset=None,
               ):
     # Boolean mask over genes: start with all genes selected
     subset = np.ones(adata.n_vars, bool)
     if gene_subset is not None:
         var_names_subset = adata.var_names.isin(gene_subset)
         subset &= var_names_subset if len(var_names_subset) > 0 else gene_subset
     elif f"{vkey}_genes" in adata.var.keys():
         subset &= np.array(adata.var[f"{vkey}_genes"].values, dtype=bool)
     xkey = xkey if xkey in adata.layers.keys() else "spliced"
     basis = "umap" if basis is None else basis
     # Expression matrix X and velocity matrix V as dense arrays
     X = np.array(
         adata.layers[xkey].toarray()[:, subset]
         if issparse(adata.layers[xkey])
         else adata.layers[xkey][:, subset]
     )
     V = np.array(
         adata.layers[vkey].toarray()[:, subset]
         if issparse(adata.layers[vkey])
         else adata.layers[vkey][:, subset]
     )
     # V -= np.nanmean(V, axis=1)[:, None]
     # Existing low-dimensional embedding, e.g. adata.obsm["X_umap"]
     Y = np.array(
         adata.obsm[f"X_{basis}"]
     )
     # Drop genes whose velocity contains NaN values
     nans = np.isnan(np.sum(V, axis=0))
     if np.any(nans):
         X = X[:, ~nans]
         V = V[:, ~nans]
     return X,V,Y
  1. Get the two-dimensional velocity embedding of the high-dimensional pancreas velocity data with standard DSNE.
 X,V,X_2d = get_X_V_Y(adata,vkey="velocity",xkey="Ms",basis="umap")
 V_2d = DSNE(X, V, Y=X_2d,
             perplexity=3.0,
             K=16,
             threshold_V=1e-8,
             separate_threshold=1e-8,
             max_iter=600,
             mom_switch_iter=250,
             momentum=0.5,
             final_momentum=0.8,
             eta=0.1,
             epsilon_kl=1e-16,
             epsilon_dsne=1e-16,
             seed=6,
             random_state=None,
             copy_data=False,
             with_norm=True,
             verbose=True) 
  1. Plot the standard DSNE velocity data in a two-dimensional plot.
 adata.obsm["X_DSNE"] = X_2d
 adata.obsm["V_DSNE"] = V_2d
 title ="DSNE"
 scv.pl.velocity_embedding_stream(adata, title=title+' stream', basis="umap",V=adata.obsm["V_DSNE"], smooth=0.5,density=2,)
 scv.pl.velocity_embedding_grid(adata, title=title+' grid' , basis="umap",V=adata.obsm["V_DSNE"], smooth=0.5,density=2,)
 scv.pl.velocity_embedding(adata,  title=title+' embedding',basis="umap",V = adata.obsm["V_DSNE"])

Output: 

Pancreas DSNE stream
Pancreas DSNE grid
Pancreas DSNE embedding
  1. Plot the compute-efficient DSNE-approximate version of the above plots.
 title ="DSNE-approximate"
 V_2d = DSNE_approximate(X, V, Y=X_2d,
                         perplexity=3.0,
                         K=16,
                         threshold_V=1e-8,
                         separate_threshold=1e-8,
                         seed=6,
                         random_state=None,
                         copy_data=False,
                         with_norm=True,
                         verbose=True)
 adata.obsm["X_DSNE_approximate"] = X_2d
 adata.obsm["V_DSNE_approximate"] = V_2d
 scv.pl.velocity_embedding_stream(adata, basis="umap",V=adata.obsm["V_DSNE_approximate"],  title=title+' stream', smooth=0.5,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis="umap",V=adata.obsm["V_DSNE_approximate"],  title=title+' grid', smooth=0.5,density=2,)
 scv.pl.velocity_embedding(adata, basis="umap",V = adata.obsm["V_DSNE_approximate"], title=title+' embedding')

Output:

Pancreas DSNE approx stream
Pancreas DSNE approx grid
Pancreas DSNE approx embedding

Comparison of DSNE with scVelo on Exact Simulation

  1. Import the required packages and configure the hyperparameters.
 import os
 import numpy as np
 import scvelo as scv
 from anndata import AnnData, read_h5ad
 from dsne import DSNE, DSNE_approximate
 N=500  # points per simulated trajectory
 D=300  # dimension of the high-dimensional space
 d=2  # dimension of the low-dimensional space
 K=16
 perplexity =6
 n_rep=1
 exact = False
 with_norm = True
 basis="exact_embeddings"
 verbose = False
  1. Define helper functions for computing unit-length vectors and velocity accuracy.
 def unitLength(V):
     # Scale each row of V to unit Euclidean length
     V_ = V/np.sqrt(np.sum(V*V,axis=1,keepdims=True))
     return V_
 def velocity_accuracy(V, V_exact):
     # Mean cosine similarity between estimated and exact velocities
     V_unit = unitLength(V)
     V_exact_unit = unitLength(V_exact)
     accu = np.sum( V_unit* V_exact_unit )/(V.shape[0]*1.)
     return accu
  1. Define a function to simulate the data and package it as an AnnData object.
 def simulate_data(N=50, D=3, d=2, save=True, file_name_prefix="./data"):
     # Low-dimensional velocities and noise for three trajectories of N points each
     V_2d = np.random.randn(*(N * 3, d)) * 6
     err_2d = np.random.randn(*(N * 3, d))*2
     x_1 = np.asarray([0, ] * d)
     x_2 = np.asarray([50, ] * d)
     x_3 = np.asarray([160, ] * d)
     X_2d = np.zeros_like(V_2d)
     X_2d[0, :] = x_1
     X_2d[N, :] = x_2
     X_2d[N * 2, :] = x_3
     # Each point is the previous point plus its velocity plus noise
     for i in np.arange(N - 1):
         X_2d[i + 1, :] = X_2d[i, :] + V_2d[i, :] + err_2d[i,:]
         X_2d[i + N + 1, :] = X_2d[i + N, :] + V_2d[i + N, :] + err_2d[i + N, :]
         X_2d[i + N * 2 + 1, :] = X_2d[i + N * 2, :] + V_2d[i + N * 2, :] +  err_2d[i + N * 2, :]
     y = np.asarray([0, ] * N + [1, ] * N + [2, ] * N)
     # Lift positions and velocities to D dimensions with a random linear map
     U = np.array(np.random.randn(*(d, D)))
     X = X_2d @ U
     V = V_2d @ U
     adata = AnnData(X=X, layers={"velocity": V}, obs={"clusters": y}, obsm={"X_exact_embeddings": X_2d, "V_exact_embeddings": V_2d})
     if save:
         # Create the output directory if needed, then write the AnnData file
         out_dir = os.path.abspath(file_name_prefix)
         if not os.path.exists(out_dir):
             print("Directory {} does not exist, creating it!\n".format(out_dir))
             os.makedirs(out_dir)
         file_name = os.path.join(out_dir, "simulated_data_N_{}_D_{}.h5ad".format(N, D))
         adata.write_h5ad(file_name)
     return adata
  1. Simulate the data and prepare it for two-dimensional plotting.
 adata = simulate_data(N=N,D=D,d=d,save=False)
 X = adata.X
 V = adata.layers["velocity"]
 X_basis = f"X_{basis}"
 X = np.asarray(X, dtype=np.float64)
 V = np.asarray(V, dtype=np.float64)
 Y = None
 if (X_basis in adata.obsm.keys()) and adata.obsm[X_basis] is not None:
   Y = adata.obsm[X_basis]
 if Y is None:
   raise ValueError("Could not find the low-dimensional embedding Y!")
 Y = np.asarray(Y, dtype=np.float64)
  1. Plot the simulation with the recent model, scVelo, and determine the accuracy.
 adata_tmp = AnnData(X=X, obsm={"X_umap": Y}, layers={"velocity": V, "spliced": X})
 scv.tl.velocity_graph(adata_tmp, xkey='spliced')
 scv.tools.velocity_embedding(adata_tmp, basis="umap")
 W = adata_tmp.obsm["velocity_umap"]
 vkey = "velocity_scvelo_original"
 str_exact = "exact" if exact else "approx"
 method = 'scvelo_velocity_original'
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "scVelo"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,)

Output:

  1. Plot the exact simulation visualization with the DSNE-approximate version and determine the accuracy.
 W = DSNE_approximate(X, V, Y=Y,
                      perplexity=perplexity,
                      pca_d=None,
                      threshold_V=1e-8,
                      separate_threshold=1e-8,
                      seed=16,
                      random_state=None,
                      copy_data=False,
                      verbose=verbose)
 vkey = "velocity_scvelo"
 str_exact = "exact" if exact else "approx"
 method = "DSNE_approximate"
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "DSNE-approximate"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,)

Output:

DSNE approx embeddings
DSNE approx velocity
DSNE approx grid
  1. Plot the exact simulation visualization with standard DSNE and determine the accuracy.
 W = DSNE(X, V, Y=Y,
          K=K,
          perplexity=perplexity,
          pca_d=None,
          threshold_V=1e-8,
          separate_threshold=1e-8,
          max_iter=1000,
          mom_switch_iter=250,
          momentum=0.5,
          final_momentum=0.8,
          eta=0.1,
          epsilon_kl=1e-16,
          epsilon_dsne=1e-16,
          with_norm=with_norm,
          seed=16,
          random_state=None,
          copy_data=True,
          verbose=verbose)
 vkey = "velocity_dsne"
 method = 'DSNE'
 str_exact = "exact" if exact else "approx"
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "DSNE"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,)

Output:

DSNE velocity
DSNE embedding
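
The accuracy reported in the plot titles comes from the `velocity_accuracy` helper defined earlier, which is the mean cosine similarity between estimated and exact velocity vectors. The standalone sketch below re-implements that helper (under the hypothetical name `unit_length` for the normalizer) on hand-made toy vectors, so the scale of the score is easier to interpret. The toy arrays are assumptions for illustration only.

```python
import numpy as np

def unit_length(V):
    """Scale each row of V to unit Euclidean length."""
    return V / np.sqrt(np.sum(V * V, axis=1, keepdims=True))

def velocity_accuracy(V, V_exact):
    """Mean cosine similarity between rows of V and rows of V_exact."""
    return np.sum(unit_length(V) * unit_length(V_exact)) / V.shape[0]

V_exact = np.array([[1.0, 0.0], [0.0, 2.0]])
same_direction = 3.0 * V_exact                  # longer arrows, same directions
opposite = -V_exact                             # every arrow reversed
rotated = np.array([[0.0, 1.0], [-1.0, 0.0]])   # every arrow turned 90 degrees

acc_same = velocity_accuracy(same_direction, V_exact)
acc_opposite = velocity_accuracy(opposite, V_exact)
acc_rotated = velocity_accuracy(rotated, V_exact)
print(acc_same, acc_opposite, acc_rotated)  # 1.0 -1.0 0.0
```

A score of 1 therefore means every estimated arrow points exactly along the true velocity, 0 means the arrows are on average orthogonal to it, and -1 means they point the opposite way. The score ignores arrow length entirely.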

Find the Colab Notebook with these code implementations here.

Wrapping up

Directional Stochastic Neighbor Embedding (DSNE) outperforms recent successful methods, including scVelo, in velocity visualization. DSNE achieves excellent results in cell biology applications such as messenger RNA velocity analysis, embryo development, cell transition and cell differentiation, with accuracies close to unity!

Further Reading:

Original research paper

Github Repository

