API Reference
This section provides detailed information about the Nico SC-SP API, including all available modules, classes, and functions.
Module 1: nico_annotations
- nico_annotations.Annotations.create_directory(outputFolder)[source]
Create an empty directory.
This function checks if a specified directory exists, and if not, it creates the directory.
Parameters
- outputFolderstr
The path of the directory to be created.
Raises
- OSError
If the directory cannot be created due to permission issues or other OS-related errors.
Notes
If the directory already exists, no action is taken.
This function ensures that the directory path is available for subsequent file operations.
Example
>>> create_directory('./new_out/')
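The check-then-create behaviour described above can be sketched with the standard library. This is a minimal sketch rather than the NiCo source; the helper name ensure_directory is hypothetical.

```python
import os
import tempfile

def ensure_directory(output_folder):
    """Create output_folder if it does not exist yet (hypothetical helper)."""
    # exist_ok=True makes the call a no-op when the directory already exists;
    # permission problems still surface as OSError.
    os.makedirs(output_folder, exist_ok=True)
    return output_folder

# Usage: create a directory inside a temporary location, twice.
base = tempfile.mkdtemp()
target = os.path.join(base, "new_out")
ensure_directory(target)
ensure_directory(target)  # second call takes no action
```

Calling the helper repeatedly is safe, which matches the note that an existing directory is left untouched.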
- nico_annotations.Annotations.delete_files(input)[source]
Deletes the anchor file and the temporary files generated during annotation.
- nico_annotations.Annotations.findSpatialCells(midzoneCells, mnn)[source]
Finds the anchored cells for each cell type.
This helper function is used in find_all_the_spatial_cells_mapped_to_single_cells to identify the spatial cells that are anchored to each single-cell RNA sequencing (scRNAseq) cell type.
Parameters
- midzoneCellsnumpy.ndarray
An array of barcode IDs representing the single-cell RNA sequencing (scRNAseq) cells belonging to a specific cell type.
- mnnnumpy.ndarray
A 2D array where each row represents a mutual nearest neighbor (MNN) pair. The first column contains spatial cell barcode IDs and the second column contains scRNAseq cell barcode IDs.
Returns
- dict
A dictionary where keys are spatial cell barcode IDs and values are the counts of how many times each spatial cell is anchored to the scRNAseq cells.
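The counting step described above might look like the following. This is a sketch under assumptions: the helper name count_anchored_spatial_cells is hypothetical, and plain Python lists stand in for the numpy arrays.

```python
from collections import Counter

def count_anchored_spatial_cells(sc_barcodes, mnn_pairs):
    """Count how often each spatial cell is anchored to the given scRNAseq cells.

    mnn_pairs: iterable of (spatial_barcode, scRNAseq_barcode) rows.
    """
    wanted = set(sc_barcodes)
    # Keep only pairs whose scRNAseq partner belongs to the cell type of interest.
    counts = Counter(sp for sp, sc in mnn_pairs if sc in wanted)
    return dict(counts)

mnn = [("sp1", "scA"), ("sp1", "scB"), ("sp2", "scA"), ("sp3", "scZ")]
anchored = count_anchored_spatial_cells(["scA", "scB"], mnn)
print(anchored)  # {'sp1': 2, 'sp2': 1}
```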
- nico_annotations.Annotations.find_all_the_spatial_cells_mapped_to_single_cells(sc_ctype_id, sc_clusters, mnn, sc_ctype_name)[source]
Maps spatial cells to single-cell RNA sequencing (scRNAseq) cell types.
This helper function is used in nico_based_annotation to find the mapping of cells between spatial and scRNAseq modalities. It identifies the spatial cells that correspond to specific scRNAseq cell types based on mutual nearest neighbor (MNN) pairs.
Parameters
- sc_ctype_idnumpy.ndarray
An array of unique identifiers for each scRNAseq cell type.
- sc_clustersnumpy.ndarray
A 2D array where each row contains a barcode ID and a cluster ID representing the clustering of scRNAseq cells.
- mnnnumpy.ndarray
A 2D array where each row represents an MNN pair. The first column contains spatial cell barcode IDs and the second column contains scRNAseq cell barcode IDs.
- sc_ctype_namelist of str
A list of names corresponding to each scRNAseq cell type ID.
Returns
- dict
A dictionary where keys are spatial cell barcode IDs and values are lists containing the scRNAseq cell type names and their counts, with each spatial cell being mapped to the most frequent scRNAseq cell type if there are ties.
- nico_annotations.Annotations.find_anchor_cells_between_ref_and_query(refpath='./inputRef/', quepath='./inputQuery/', output_annotation_dir=None, output_nico_dir=None, neigh=50, no_of_pc=50, minkowski_order=2)[source]
Finds all anchor cells between query and reference data.
This function reads the reference and query data from the specified directories, performs necessary preprocessing, and finds mutual nearest neighbors in the PCA space to map cell types between the two datasets.
Parameters
- refpathstr, optional
Path to the directory containing reference scRNAseq data. (default is ‘./inputRef/’) This directory should contain:
  - ‘Original_counts.h5ad’: The original count matrix in raw layer.
  - ‘sct_singleCell.h5ad’: The scTransform-like normalized matrix.
- quepathstr, optional
Path to the directory containing spatial transcriptomics query data. (default is ‘./inputQuery/’) This directory should contain:
  - ‘sct_spatial.h5ad’: The scTransform-like normalized matrix.
- output_annotation_dirstr, optional
Directory to save output annotations. If None, ‘./nico_out/annotations/’ is used. (default is None)
- output_nico_dirstr, optional
Directory to save output NiCo results. If None, ‘./nico_out/’ is used. (default is None)
- neighint, optional
The number of K-nearest neighbors to find the anchor cells. (default is 50)
- no_of_pcint, optional
The number of principal components used to transform the normalized expression matrix into PCA space. (default is 50)
- minkowski_orderint, optional
The type of distance metric used (default is 2):
  - 2 for Euclidean distance
  - 1 for Manhattan distance
Outputs
The function produces the mapping of cell type information between two modalities and saves the results in the specified output directory.
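To make the minkowski_order parameter concrete, the two documented settings correspond to the following distances. This is an illustrative sketch only; the helper minkowski_distance is hypothetical and is not part of the NiCo API.

```python
def minkowski_distance(p, q, order=2):
    """Minkowski distance between two equal-length coordinate sequences."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

# order=2 gives the Euclidean distance, order=1 the Manhattan distance.
euclidean = minkowski_distance((0, 0), (3, 4), order=2)
manhattan = minkowski_distance((0, 0), (3, 4), order=1)
print(euclidean, manhattan)
```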
- nico_annotations.Annotations.find_annotation_index(annot_cellname, sct_cellname)[source]
Helper function for find_common_MNN to find the correct cell name.
This function maps the indices of cell barcode names to the corresponding indices in the anchored data.
Parameters
- annot_cellnamelist or array-like
List or array of cell barcode names.
- sct_cellnamelist or array-like
List or array of anchored cell barcode names.
Returns
- indexlist
List of indices in sct_cellname that correspond to the positions of the cell barcode names in annot_cellname.
- nico_annotations.Annotations.find_commnon_MNN(input)[source]
Helper function used in find_anchor_cells_between_ref_and_query to find the anchored cells between spatial and sequencing modalities using the mutual nearest neighbors (MNN) method in the PCA space.
This function reads MNN pairs from a provided file and processes the data to identify mutual nearest neighbors between spatial and single-cell datasets.
Parameters
- inputobject
An object containing various attributes required for the function.
Returns
- datanumpy.ndarray
The good anchors that remain after pruning the bad anchors.
- nico_annotations.Annotations.find_index(sp_genename, sc_genename)[source]
Find the common gene space submatrix between two modalities.
This helper function is used within the find_anchor_cells_between_ref_and_query function to identify the indices of common genes between two lists of gene names corresponding to spatial and scRNAseq modalities.
Parameters
- sp_genenamelist
A list of gene names from the spatial modality.
- sc_genenamelist
A list of gene names from the scRNAseq modality.
Returns
- list
A list of indices corresponding to the common genes found in both sp_genename and sc_genename.
Example
>>> sp_genes = ['gene1', 'gene2', 'gene3', 'gene4']
>>> sc_genes = ['gene3', 'gene4', 'gene5', 'gene6']
>>> index_sp,index_sc = find_index(sp_genes, sc_genes)
>>> print(index_sp)
[2, 3]
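One way to compute such index lists is sketched below. The helper common_gene_indices is hypothetical, and the ordering of the returned indices (here, the order of the spatial gene list) is an assumption; the actual function may order them differently.

```python
def common_gene_indices(sp_genes, sc_genes):
    """Return positions of the shared genes in each list (hypothetical helper)."""
    # Map each scRNAseq gene to its position for constant-time lookup.
    sc_position = {gene: i for i, gene in enumerate(sc_genes)}
    index_sp, index_sc = [], []
    for i, gene in enumerate(sp_genes):
        if gene in sc_position:
            index_sp.append(i)
            index_sc.append(sc_position[gene])
    return index_sp, index_sc

sp_genes = ['gene1', 'gene2', 'gene3', 'gene4']
sc_genes = ['gene3', 'gene4', 'gene5', 'gene6']
index_sp, index_sc = common_gene_indices(sp_genes, sc_genes)
print(index_sp)  # [2, 3]
print(index_sc)  # [0, 1]
```

Indexing both expression matrices with these lists yields submatrices whose columns refer to the same genes in the same order.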
- nico_annotations.Annotations.find_match_index_in_dist(t1, t2, s1, s2, index_1, index_2)[source]
Helper function used in find_mutual_nn to find the correct pairing of cell barcodes.
This function takes two lists of cell barcodes and their corresponding indices and returns the matched pair of distances for the specified indices.
Parameters
- t1list or numpy.ndarray
A list or array containing the distances corresponding to the cell barcodes in s1.
- t2list or numpy.ndarray
A list or array containing the distances corresponding to the cell barcodes in s2.
- s1list or numpy.ndarray
A list or array containing the cell barcodes for the first set of distances.
- s2list or numpy.ndarray
A list or array containing the cell barcodes for the second set of distances.
- index_1int
The index of the cell barcode in s1 to be matched.
- index_2int
The index of the cell barcode in s2 to be matched.
Returns
- tuple
A tuple (p1, p2) where p1 is the distance from t1 corresponding to index_1 and p2 is the distance from t2 corresponding to index_2.
- nico_annotations.Annotations.find_mutual_nn(minkowski_order, data1, data2, sp_barcode, sc_barcode, k1, k2)[source]
Helper function used in find_anchor_cells_between_ref_and_query to find mutual nearest neighbors using cKDTree.
This function finds mutual nearest neighbors (MNNs) between two datasets using the cKDTree algorithm. The mutual nearest neighbors are those pairs of points that are each other’s nearest neighbors.
Parameters
- minkowski_orderint
The order of the Minkowski distance to use. For example, 2 is the Euclidean distance.
- data1numpy.ndarray
The first dataset, typically the spatial (query) dataset.
- data2numpy.ndarray
The second dataset, typically the sequencing (reference) dataset.
- sp_barcodenumpy.ndarray
Array of barcodes for the spatial dataset.
- sc_barcodenumpy.ndarray
Array of barcodes for the single-cell dataset.
- k1int
The number of nearest neighbors to query in the spatial dataset.
- k2int
The number of nearest neighbors to query in the single-cell dataset.
Returns
- numpy.ndarray
An array where each row represents a mutual nearest neighbor pair and their distances. The columns are: [sp_barcode of mutual_1, sc_barcode of mutual_2, distance in data1, distance in data2]
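The mutual nearest neighbor idea can be illustrated without a KD-tree. The sketch below swaps in a brute-force neighbor search in place of cKDTree, uses hypothetical helper names, and returns a single distance per pair rather than the four-column array described above.

```python
def k_nearest(points_from, points_to, k, order=2):
    """For each point in points_from, map indices of its k nearest points_to to distances."""
    result = []
    for p in points_from:
        dists = [
            (sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order), j)
            for j, q in enumerate(points_to)
        ]
        dists.sort()
        result.append({j: d for d, j in dists[:k]})
    return result

def mutual_nearest_neighbors(data1, data2, barcodes1, barcodes2, k1, k2, order=2):
    """Return (barcode1, barcode2, distance) for pairs found in both neighbor lists."""
    nn_1_to_2 = k_nearest(data1, data2, k2, order)  # k2 neighbors searched in data2
    nn_2_to_1 = k_nearest(data2, data1, k1, order)  # k1 neighbors searched in data1
    pairs = []
    for i, neighbors in enumerate(nn_1_to_2):
        for j, dist in neighbors.items():
            if i in nn_2_to_1[j]:  # keep the pair only if the relation is mutual
                pairs.append((barcodes1[i], barcodes2[j], dist))
    return pairs

spatial = [(0.0, 0.0), (10.0, 10.0)]
single_cell = [(0.1, 0.0), (9.0, 10.0), (50.0, 50.0)]
pairs = mutual_nearest_neighbors(
    spatial, single_cell, ["sp0", "sp1"], ["sc0", "sc1", "sc2"], k1=1, k2=1
)
print([(sp, sc) for sp, sc, _ in pairs])  # [('sp0', 'sc0'), ('sp1', 'sc1')]
```

The third single-cell point has sp1 as its nearest spatial cell, but sp1 prefers sc1, so that pair is not mutual and is dropped.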
- nico_annotations.Annotations.find_unmapped_cells_and_deg(deg, unique_mapped)[source]
Identifies unmapped non-anchored cells and their degrees.
This helper function is used in nico_based_annotation to find cells that have not been mapped and their corresponding degree values. It returns the cell names and their degree values sorted in descending order of degree.
Parameters
- degdict
A dictionary where keys are cell node identifiers and values are their corresponding degrees (number of connections).
- unique_mappeddict
A dictionary where keys are mapped cell node identifiers and values are their corresponding cell type names.
Returns
- tuple
A tuple containing two numpy arrays:
- cellnamenumpy.ndarray
An array of unmapped cell node identifiers sorted by their degree values in descending order.
- degvaluenumpy.ndarray
An array of degree values corresponding to the unmapped cell node identifiers, sorted in descending order.
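The selection and sorting described above can be sketched as follows. The helper name is hypothetical, plain lists stand in for numpy arrays, and the cell type label is a placeholder.

```python
def unmapped_cells_by_degree(deg, unique_mapped):
    """Return unmapped node ids and their degrees, sorted by degree (descending)."""
    unmapped = [(node, d) for node, d in deg.items() if node not in unique_mapped]
    unmapped.sort(key=lambda item: item[1], reverse=True)
    cellname = [node for node, _ in unmapped]
    degvalue = [d for _, d in unmapped]
    return cellname, degvalue

deg = {"c1": 3, "c2": 7, "c3": 5, "c4": 1}
unique_mapped = {"c3": "typeA"}  # c3 is already annotated
cellname, degvalue = unmapped_cells_by_degree(deg, unique_mapped)
print(cellname)  # ['c2', 'c1', 'c4']
print(degvalue)  # [7, 3, 1]
```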
- nico_annotations.Annotations.nico_based_annotation(previous, ref_cluster_tag='cluster', across_spatial_clusters_dispersion_cutoff=0.15, guiding_spatial_cluster_resolution_tag='leiden0.5', number_of_iteration_to_perform_celltype_annotations=3, resolved_tie_issue_with_weighted_nearest_neighbor='No')[source]
This function performs NiCo-based annotation of spatial cell transcriptomes by using label transfer from scRNAseq data.
The function utilizes label transfer to annotate spatial transcriptomic data based on cell type information from scRNAseq data. It leverages anchored cells to iteratively annotate non-anchor cells. The annotations are either performed using a majority vote or a weighted vote based on distances in the transformed gene expression space.
The function starts by reading the output from find_anchor_cells_between_ref_and_query and annotates the spatial cells based on the provided parameters.
Parameters
- previousobject
The output object from find_anchor_cells_between_ref_and_query containing necessary data for annotation.
- ref_cluster_tagstr, optional
The slot in the reference anndata object file (‘Original_counts.h5ad’) where cell type information is stored, i.e. <anndata>.obs[ref_cluster_tag]. (default is ‘cluster’)
- across_spatial_clusters_dispersion_cutofffloat, optional
The cutoff used to remove noisy anchors. Anchored cells that belong to any guiding spatial cluster with a frequency lower than this cutoff will be discarded. (default is 0.15)
- guiding_spatial_cluster_resolution_tagstr, optional
The guiding spatial Leiden cluster resolution (clustering of spatial data used for anchor pruning). The sct_spatial.h5ad file should contain Leiden clusterings at the required resolutions, such as 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8, stored in the anndata.obs slot under the names ‘leiden0.3’, ‘leiden0.4’, ‘leiden0.5’, ‘leiden0.6’, ‘leiden0.7’ and ‘leiden0.8’. (default is ‘leiden0.5’)
- number_of_iteration_to_perform_celltype_annotationsint, optional
The number of iterations to perform the cell type annotations. Higher numbers of iterations may annotate more cells but decrease confidence due to dilution of anchor information. (default is 3)
- resolved_tie_issue_with_weighted_nearest_neighborstr, optional
Whether to resolve tie issues in cell type assignment with a weighted nearest neighbor approach (default is ‘No’):
  - ‘No’: Assigns ‘NM’ (not mapped) to non-anchor cells in case of a tie.
  - ‘Yes’: Utilizes the weighted average of cell type proportions for resolving ties, with weights inversely proportional to the distance.
Outputs
For each iteration, the function generates the following files in the specified output directory:
  - _nico_annotation_cluster.csv: Contains the annotated cluster information.
  - _nico_annotation_ct_name.csv: Contains the cell type names associated with each cluster.
The default output directory for these files is ./nico_out/annotations/.
Notes
The final iteration of the annotations is used by spatial_neighborhood_analysis to find niche cell type interactions.
- nico_annotations.Annotations.plot_all_ct(CTname, PP, cellsinCT, ax, flag, cmap)[source]
Helper function used in visualize_umap_and_cell_coordinates_with_all_celltypes to plot all cell types together.
This function plots the locations of all cell types together on a single plot, with each cell type assigned a different color. Optionally, it can label each cell type with its index.
Parameters
- CTnamelist of str
A list of cell type names to be plotted.
- PPnumpy.ndarray
A 2D numpy array of shape (n_cells, 2) containing the coordinates of the cells.
- cellsinCTlist of list of int
A list where each element is a list of indices corresponding to cells of a specific type.
- axmatplotlib.axes.Axes
The axes object where the plot will be drawn.
- flagbool
A flag indicating whether to label each cell type with its index on the plot.
- cmapmatplotlib.colors.Colormap
The colormap used to assign colors to different cell types.
- nico_annotations.Annotations.plot_specific_ct(CTname, PP, index, ax, cmap, ms, msna)[source]
Plots individual cell types for visualizing cell type annotations.
This helper function is used in visualize_umap_and_cell_coordinates_with_selected_celltypes to plot the locations of cells belonging to specific cell types. Cells not belonging to any specified cell type are also plotted, in a different color.
Parameters
- CTnamelist of str
A list of cell type names to be plotted.
- PPnumpy.ndarray
A 2D numpy array of shape (n_cells, 2) containing the coordinates of the cells.
- indexlist of list of int
A list where each element is a list of indices corresponding to cells of a specific type.
- axmatplotlib.axes.Axes
The axes object where the plot will be drawn.
- cmapmatplotlib.colors.Colormap
The colormap used to assign colors to different cell types.
- msint or float
The marker size for plotting the cells of specified types.
- msnaint or float
The marker size for plotting the cells not belonging to any specified types.
Returns
- None
This function does not return any value. It directly plots on the provided axes object.
- nico_annotations.Annotations.read_dist_and_nodes_as_graph(knn_dist, knn_nodes)[source]
Reads edges information from k-nearest neighbors (KNN) data and converts it into a graph representation.
This helper function is used in the nico_based_annotation function to interpret the relationships between nodes (cells) based on their KNN distances. It constructs a graph G where nodes represent cells and edges represent the KNN relationships with associated distances as weights. Additionally, it calculates the degree of each node.
Parameters
- knn_distarray-like
A 2D array where each row represents a node and contains the distances to its k-nearest neighbors.
- knn_nodesarray-like
A 2D array where each row represents a node and contains the indices of its k-nearest neighbors.
Returns
- degdict
A dictionary where keys are node indices and values are the degrees (number of edges) of the corresponding nodes.
- Gnetworkx.Graph
A graph object where nodes represent cells and edges represent KNN relationships between cells.
- weightsdict
A dictionary where keys are edge identifiers (formatted as ‘node1#node2’) and values are the distances between the nodes.
Notes
The function assumes that the first element in each row of knn_nodes is the node itself, and subsequent elements are its nearest neighbors.
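The conversion from KNN arrays to a graph can be sketched with plain dictionaries in place of networkx.Graph. The helper name knn_to_graph is hypothetical, and it is an assumption here that the first column of knn_dist holds the zero self-distance.

```python
def knn_to_graph(knn_dist, knn_nodes):
    """Build an undirected adjacency map, node degrees, and 'node1#node2' edge weights."""
    adjacency = {}
    weights = {}
    for dist_row, node_row in zip(knn_dist, knn_nodes):
        node = node_row[0]  # first entry of each row is the node itself
        adjacency.setdefault(node, set())
        for neighbor, dist in zip(node_row[1:], dist_row[1:]):
            adjacency[node].add(neighbor)
            adjacency.setdefault(neighbor, set()).add(node)
            weights[f"{node}#{neighbor}"] = dist
    deg = {node: len(neigh) for node, neigh in adjacency.items()}
    return deg, adjacency, weights

knn_nodes = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
knn_dist = [[0.0, 1.0, 2.0], [0.0, 1.0, 1.5], [0.0, 1.5, 2.0]]
deg, graph, weights = knn_to_graph(knn_dist, knn_nodes)
print(deg)             # {0: 2, 1: 2, 2: 2}
print(weights["0#1"])  # 1.0
```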
- nico_annotations.Annotations.remove_extra_character_from_name(name)[source]
Remove special characters from cell type names to avoid errors while saving figures.
This function replaces certain special characters in the input name with underscores or other appropriate characters to ensure the name is safe for use as a filename.
Parameters
- namestr
The original cell type name that may contain special characters.
Returns
- str
The modified cell type name with special characters removed or replaced.
Example
>>> name = 'T-cell (CD4+)/CD8+'
>>> clean_name = remove_extra_character_from_name(name)
>>> print(clean_name)
Tncell_CD4p_CD8p
Notes
The following replacements are made:
‘/’ is replaced with ‘_’
‘ ‘ (space) is replaced with ‘_’
‘”’ (double quote) is removed
“’” (single quote) is removed
‘)’ is removed
‘(’ is removed
‘+’ is replaced with ‘p’
‘-’ is replaced with ‘n’
‘.’ (dot) is removed
These substitutions help in creating filenames that do not contain characters that might be problematic for file systems or software.
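Applying the replacements listed in the Notes in order gives the following sketch. The helper name sanitize_celltype_name is hypothetical; only the substitution table is taken from the list above.

```python
# (old, new) pairs, in the order given in the Notes.
REPLACEMENTS = [
    ('/', '_'), (' ', '_'), ('"', ''), ("'", ''),
    (')', ''), ('(', ''), ('+', 'p'), ('-', 'n'), ('.', ''),
]

def sanitize_celltype_name(name):
    """Apply each documented substitution to make the name filename-safe."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name

clean_name = sanitize_celltype_name('T-cell (CD4+)/CD8+')
print(clean_name)  # Tncell_CD4p_CD8p
```

Note that the hyphen becomes ‘n’, so ‘T-cell’ turns into ‘Tncell’ under these rules.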
- nico_annotations.Annotations.resolved_confused_and_unmapped_mapping_of_cells_with_majority_vote(confused, G, all_mapped, unique_mapped, sp_leiden_barcode2cluid)[source]
Annotates confused anchored and non-anchored spatial cells using a majority vote scheme across the neighbors.
This helper function is used in nico_based_annotation to resolve the mapping of spatial cells that are either confused or not anchored by utilizing the majority vote of their neighbors’ annotations.
Parameters
- confusedlist
List of spatial cell identifiers that are confused and need to be resolved.
- Gnetworkx.Graph
Graph where nodes represent cells and edges represent connections between cells.
- all_mappeddict
Dictionary where keys are cell identifiers and values are their mapped cell types.
- unique_mappeddict
Dictionary to be updated where keys are cell identifiers and values are their resolved mapped cell types.
- sp_leiden_barcode2cluiddict
Dictionary mapping cell identifiers to their cluster IDs based on Leiden clustering.
Returns
- dict
Updated unique_mapped dictionary with resolved cell type annotations for the confused and unmapped cells.
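A majority vote over annotated neighbors can be sketched as below. This is a simplified illustration with a hypothetical helper name; it ignores the Leiden cluster information that the actual function receives, and it uses the ‘NM’ label for ties as described for nico_based_annotation.

```python
from collections import Counter

def majority_vote(cell, neighbors, labels, tie_label="NM"):
    """Return the most common label among a cell's annotated neighbors, or tie_label."""
    votes = Counter(labels[n] for n in neighbors.get(cell, ()) if n in labels)
    if not votes:
        return tie_label  # no annotated neighbor to vote
    ranked = votes.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label  # tie between the top two cell types
    return ranked[0][0]

neighbors = {"x": ["a", "b", "c"], "y": ["a", "c"]}
labels = {"a": "T cell", "b": "T cell", "c": "B cell"}
print(majority_vote("x", neighbors, labels))  # T cell
print(majority_vote("y", neighbors, labels))  # NM
```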
- nico_annotations.Annotations.resolved_confused_and_unmapped_mapping_of_cells_with_weighted_average_of_inverse_distance_in_neighbors(confused, G, weights, all_mapped, unique_mapped, sp_leiden_barcode2cluid_resolution_wise)[source]
Annotates confused and unmapped spatial cells using a weighted average score from their neighbors.
This helper function is used in nico_based_annotation to resolve the mapping of spatial cells that are either confused or not anchored by utilizing the weighted average of the inverse distance to their neighbors.
Parameters
- confusedlist
List of spatial cell identifiers that are confused and need to be resolved.
- Gnetworkx.Graph
Graph where nodes represent cells and edges represent connections between cells.
- weightsdict
A dictionary where keys are edge identifiers (formatted as ‘node1#node2’) and values are the corresponding weights (inverse of distances).
- all_mappeddict
Dictionary where keys are cell identifiers and values are their mapped cell types.
- unique_mappeddict
Dictionary to be updated where keys are cell identifiers and values are their resolved mapped cell types.
- sp_leiden_barcode2cluid_resolution_wisedict
Dictionary mapping cell identifiers to their cluster IDs based on Leiden clustering resolution.
Returns
- dict
Updated unique_mapped dictionary with resolved cell type annotations for the confused and unmapped cells.
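The inverse-distance weighting can be illustrated with a small sketch. The helper name and the input layout (a per-cell dictionary of neighbor distances) are assumptions for illustration; the actual function works on the graph and the weights dictionary and also takes Leiden clusters into account.

```python
from collections import defaultdict

def weighted_vote(cell, neighbor_distances, labels):
    """Score each cell type by the summed inverse distance of its annotated neighbors."""
    scores = defaultdict(float)
    for neighbor, dist in neighbor_distances[cell].items():
        if neighbor in labels and dist > 0:
            scores[labels[neighbor]] += 1.0 / dist  # closer neighbors weigh more
    if not scores:
        return None
    return max(scores, key=scores.get)

neighbor_distances = {"y": {"a": 0.5, "c": 2.0}}
labels = {"a": "T cell", "c": "B cell"}
resolved = weighted_vote("y", neighbor_distances, labels)
print(resolved)  # T cell
```

A plain majority vote would tie here (one neighbor of each type); the weighting resolves it in favor of the closer neighbor.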
- nico_annotations.Annotations.return_singlecells(cluster_data, midzone)[source]
Finds the scRNAseq cells belonging to a specific cell type.
This helper function is used in find_all_the_spatial_cells_mapped_to_single_cells to identify the single-cell RNA sequencing (scRNAseq) cells that belong to a specified cell type (midzone).
Parameters
- cluster_datanumpy.ndarray
A 2D array where the first column contains barcode IDs and the second column contains cluster IDs.
- midzoneint or str
The cluster ID representing the specific cell type of interest.
Returns
- numpy.ndarray
An array of unique barcode IDs corresponding to the cells that belong to the specified cell type (midzone).
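The filtering step can be sketched as follows, with a hypothetical helper name and a list of (barcode, cluster) rows standing in for the 2D numpy array. Returning the barcodes sorted is an assumption.

```python
def barcodes_in_cluster(cluster_data, cluster_id):
    """Return the sorted unique barcodes whose cluster ID equals cluster_id."""
    return sorted({barcode for barcode, cluster in cluster_data if cluster == cluster_id})

cluster_data = [("bc1", 0), ("bc2", 1), ("bc3", 0), ("bc1", 0)]
selected = barcodes_in_cluster(cluster_data, 0)
print(selected)  # ['bc1', 'bc3']
```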
- nico_annotations.Annotations.save_annotations_in_spatial_object(inputdict, anndata_object_name='nico_celltype_annotation.h5ad')[source]
Save NiCo cell type cluster annotations in the AnnData object.
This function takes a dictionary containing the necessary data and saves the cell type cluster annotations into the .obs[‘nico_ct’] slot of an AnnData object. The updated AnnData object is then saved to a specified file.
Inputs:
- inputdictdict
A dictionary containing the cell type annotations related objects.
- anndata_object_namestr, optional
Name of the AnnData file to save the annotated data. Default is ‘nico_celltype_annotation.h5ad’.
Outputs:
The function saves the annotated AnnData object in the specified directory (‘./nico_out/’) with the given file name.
Helper function used in find_anchor_cells_between_ref_and_query to transform the common gene expression data into PCA space.
This function scales the data, performs PCA on single-cell data, and then projects both spatial and single-cell data into the shared PCA space. The projections are normalized by z-scores.
Parameters
- ad_sp1AnnData
AnnData object containing the spatial gene expression data.
- ad_sc1AnnData
AnnData object containing the sequencing gene expression data.
- no_of_pcint
Number of principal components to compute.
- methodstr
Method used for scaling and PCA.
Returns
- transfer_sp_comnumpy.ndarray
The spatial data projected into the shared PCA space.
- transfer_sc_comnumpy.ndarray
The single-cell data projected into the shared PCA space.
- sp_barcodenumpy.ndarray
Barcodes of the spatial data.
- sc_barcodenumpy.ndarray
Barcodes of the single-cell data.
- nico_annotations.Annotations.visualize_spatial_anchored_cell_mapped_to_scRNAseq(input, saveas='pdf', transparent_mode=False, showit=True, figsize=(12, 10))[source]
Visualizes the anchored cells mapping between spatial and sequencing modalities.
This function generates a heatmap to visualize the mapping of anchored cells between spatial Leiden clusters and scRNAseq clusters.
Parameters
- inputobject
An object containing various attributes required for the function. Specifically, it must contain:
  - visualize_anchors : tuple
    A tuple containing the matrix of anchored cells, and the cluster names for spatial and scRNAseq data. Example: (matrix, spatial_cluster_names, scrnaseq_cluster_names)
  - KNN : int
    The number of nearest neighbors used in the mutual nearest neighbors (MNN) analysis.
  - output_annot : str
    The path where the output figure will be saved.
- saveasstr, optional
The format to save the figure, either ‘pdf’ or ‘png’ (default is ‘pdf’).
- transparent_modebool, optional
Whether the background of the figure should be transparent (default is False).
- showitbool, optional
Whether to display the figure immediately (default is True). If False, the figure is closed after saving.
- figsizetuple, optional
The size of the figure (default is (12,10)).
Outputs
The figure is saved in the directory given by input.output_annot (‘./nico_out/annotations/’ by default).
- nico_annotations.Annotations.visualize_umap_and_cell_coordinates_with_all_celltypes(output_annotation_dir=None, output_nico_dir=None, anndata_object_name='nico_celltype_annotation.h5ad', spatial_cluster_tag='nico_ct', spatial_coordinate_tag='spatial', umap_tag='X_umap', number_of_iteration_to_perform_celltype_annotations=3, cmap=<matplotlib.colors.LinearSegmentedColormap object>, saveas='pdf', transparent_mode=False, showit=True, figsize=(15, 6))[source]
Visualize UMAP and spatial coordinates with all cell types annotated in a single plot.
This function generates visualizations for UMAP projections and spatial coordinates of cells, annotated by cell types. It saves the figures to specified directories and supports customization of various visualization parameters.
Parameters:
- output_annotation_dirstr, optional
Directory to save the annotation figures. Default is ‘./nico_out/annotations/’.
- output_nico_dirstr, optional
Base directory for nico output files. Default is ‘./nico_out/’.
- anndata_object_namestr, optional
Name of the AnnData object file containing cell type annotations. Default is ‘nico_celltype_annotation.h5ad’.
- spatial_cluster_tagstr, optional
Key in AnnData object for spatial cluster annotations slot. Default is ‘nico_ct’.
- spatial_coordinate_tagstr, optional
Key in AnnData object for spatial coordinates slot. Default is ‘spatial’.
- umap_tagstr, optional
Key in AnnData object for UMAP embeddings slot. Default is ‘X_umap’.
- number_of_iteration_to_perform_celltype_annotationsint, optional
Number of iterations performed for cell type annotations. Default is 3.
- cmapmatplotlib.colors.Colormap, optional
Colormap used to color the cell types. Default is ‘jet’.
- saveasstr, optional
Format to save the figures (‘pdf’ or ‘png’). Default is ‘pdf’.
- transparent_modebool, optional
If True, sets the background color of the figures to transparent. Default is False.
- showitbool, optional
If True, displays the figures. Default is True.
- figsizetuple, optional
Dimensions of the figure size. Default is (15, 6).
Outputs:
Saves annotation figures to the following path ‘./nico_out/annotations/’
- nico_annotations.Annotations.visualize_umap_and_cell_coordinates_with_selected_celltypes(output_annotation_dir=None, output_nico_dir=None, anndata_object_name='nico_celltype_annotation.h5ad', spatial_cluster_tag='nico_ct', spatial_coordinate_tag='spatial', umap_tag='X_umap', number_of_iteration_to_perform_celltype_annotations=3, choose_celltypes=[], msna=0.1, ms=0.5, showit=True, cmap=<matplotlib.colors.LinearSegmentedColormap object>, saveas='pdf', transparent_mode=False, figsize=(8, 3.5))[source]
Visualize UMAP and cell coordinates with selected cell types.
This function visualizes the UMAP embedding and cell coordinates for selected cell types from spatial transcriptomics data.
Inputs:
- output_annotation_dirstr, optional
Directory path to save the annotation figures. Default is None, which uses ‘./nico_out/annotations/’.
- output_nico_dirstr, optional
Directory path for NiCo outputs. Default is None, which uses ‘./nico_out/’.
- anndata_object_namestr, optional
Name of the AnnData file containing cell type annotations. Default is ‘nico_celltype_annotation.h5ad’.
- spatial_cluster_tagstr, optional
Slot for spatial cluster annotations in the AnnData object. Default is ‘nico_ct’.
- spatial_coordinate_tagstr, optional
Slot for spatial coordinates in the AnnData object. Default is ‘spatial’.
- umap_tagstr, optional
Slot for UMAP embeddings in the AnnData object. Default is ‘X_umap’.
- number_of_iteration_to_perform_celltype_annotationsint, optional
Number of iterations for performing cell type annotations. Default is 3.
- choose_celltypeslist, optional
List of cell types to visualize. Default is an empty list, which shows annotations for all cell types.
- msnafloat, optional
Marker size for non-selected (NA) cell types. Default is 0.1.
- msfloat, optional
Marker size for selected cell types. Default is 0.5.
- showitbool, optional
Whether to display the figures. Default is True.
- cmapColormap, optional
Colormap used to color the cell types. Default is plt.cm.get_cmap(‘jet’).
- saveasstr, optional
Format to save the figures (‘pdf’ or ‘png’). Default is ‘pdf’.
- transparent_modebool, optional
Whether to use a transparent background for the figures. Default is False.
- figsizetuple, optional
Dimension of the figure size. Default is (8, 3.5).
Outputs:
The function saves individual cell type annotation figures in the specified directory.
Notes:
Ensure that the input AnnData object contains the required tags for UMAP, spatial coordinates, and cell type annotations.
The function will save the annotation figures in the specified directory.
- nico_annotations.Annotations.write_annotation(deg_annot_cluster_fname, deg_annot_ct_fname, unique_mapped, cellname)[source]
Generates CSV files for each iteration’s annotation clusters and cell type names.
This helper function is used in nico_based_annotation to create two CSV files: one for the cluster annotation and one for the cell type names with their frequencies.
Parameters
- deg_annot_cluster_fnamestr
The filename for the CSV file that will contain the cluster annotations.
- deg_annot_ct_fnamestr
The filename for the CSV file that will contain the cell type names and their frequencies.
- unique_mappeddict
A dictionary where keys are cell barcodes and values are their corresponding cell type names.
- cellnamenumpy.ndarray
An array of cell barcode IDs.
Returns
- numpy.ndarray
An array of cell type names corresponding to each cell barcode ID in the cellname array.
Module 2: nico_interactions
- nico_interactions.Interactions.create_directory(outputFolder)[source]
Create an empty directory.
This function checks if a specified directory exists, and if not, it creates the directory.
Parameters
- outputFolderstr
The path of the directory to be created.
Raises
- OSError
If the directory cannot be created due to permission issues or other OS-related errors.
Notes
If the directory already exists, no action is taken.
This function ensures that the directory path is available for subsequent file operations.
Example
>>> create_directory('./new_out/')
- nico_interactions.Interactions.create_spatial_CT_feature_matrix(radius, PP, louvain, noct, fraction_CT, saveSpatial, epsilonThreshold)[source]
Generate the expected spatial cell type neighborhood matrix.
This helper function is used in spatial_neighborhood_analysis to create a matrix that represents the expected neighborhood cell type composition based on spatial data. It uses either a radius-based approach or Delaunay triangulation to determine neighboring cells.
Parameters
- radiusfloat
Radius within which to find neighbors. If set to 0, Delaunay triangulation is used instead.
- PPnp.ndarray
Array of spatial coordinates of cells. Shape (n_cells, n_dimensions).
- louvainnp.ndarray
Array containing Louvain clustering results for each cell. Shape (n_cells, 1).
- noctint
Number of cell types.
- fraction_CTlist of float
List representing the fraction of each cell type.
- saveSpatialstr
Path to save the output file containing the normalized spatial neighborhood matrix.
- epsilonThresholdfloat
Threshold distance cutoff to limit the distant neighbors when using Delaunay triangulation.
Returns
- tuple
- Mint
Placeholder for future calculations (currently always returns 0).
- neighborslist of list of int
List of neighbors for each cell. Each sublist contains indices of neighboring cells.
- distancelist of list of float
List of distances to neighbors for each cell. Each sublist contains distances to the neighboring cells.
Notes
If radius is set to 0, Delaunay triangulation is used to find the neighbors within the epsilonThreshold distance.
The function saves the normalized spatial neighborhood matrix as a .npz file at the specified location.
- nico_interactions.Interactions.euclidean_dist(p1, p2)[source]
Calculate the Euclidean distance between two points in 2D or 3D.
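For reference, the computation amounts to the following. This is a sketch with a hypothetical function name, not the NiCo source.

```python
import math

def euclidean_distance(p1, p2):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

d2 = euclidean_distance((0, 0), (3, 4))        # 2D example
d3 = euclidean_distance((1, 2, 3), (1, 2, 5))  # 3D example
print(d2, d3)  # 5.0 2.0
```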
- nico_interactions.Interactions.findNeighbors_in_given_radius(location, radius)[source]
Find the neighbors for each cell using the given radius.
This helper function, used in create_spatial_CT_feature_matrix, identifies the neighboring cells for each cell within the specified radius and computes the average distance to these neighbors.
Parameters:
- locationnp.ndarray
An array of shape (n, 3) representing the coordinates of the cells.
- radiusfloat
The radius within which to search for neighboring cells. For immediate neighbors, it is 0.
Returns:
- list
A list of lists where each sublist contains the indices of the neighbors for each cell.
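A radius search of this kind can be sketched with a k-d tree. The example below assumes SciPy is available and uses scipy.spatial.cKDTree. The helper name neighbors_within_radius is hypothetical, only a positive radius is covered, and NiCo's own function may be implemented differently.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbors_within_radius(location, radius):
    """For each cell, list the indices of the other cells within `radius`."""
    tree = cKDTree(location)
    hits = tree.query_ball_point(location, r=radius)
    # query_ball_point also returns the query point itself, so drop it.
    return [sorted(j for j in found if j != i) for i, found in enumerate(hits)]

location = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
radius_neighbors = neighbors_within_radius(location, radius=1.5)
```

The third cell lies more than 1.5 units from the others and therefore gets an empty neighbor list.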
- nico_interactions.Interactions.find_interacting_cell_types(input, choose_celltypes=[], celltype_niche_interaction_cutoff=0.1, dpi=300, coeff_cutoff=20, saveas='pdf', transparent_mode=False, showit=True, figsize=(4.0, 2.0))[source]
Display regression coefficients indicating cell type interactions.
Parameters:
- inputobject
The main input is the output from spatial_neighborhood_analysis.
- choose_celltypeslist, optional
List of cell types to display the regression coefficients for. If empty, the output will be shown for all cell types. Default is [].
- celltype_niche_interaction_cutofffloat, optional
The cutoff applied to the normalized coefficients when considering cell type niche interactions. It is drawn as a blue dotted line. Default is 0.1.
- coeff_cutoffint, optional
Maximum number of neighborhood cell types shown on the X-axis of the figures for each central cell type. If there are too many interacting cell types, choosing a more stringent cutoff limits the display to the cell types with the largest positive or negative regression coefficients to avoid crowding in the figure. Default is 20.
- saveasstr, optional
Format to save the figures in, either ‘pdf’ or ‘png’. Default is ‘pdf’.
- transparent_modebool, optional
Whether to save the figures with a transparent background. Default is False.
- showitbool, optional
Whether to display the figures. Default is True.
- figsizetuple, optional
Dimension of the figure size. Default is (4.0, 2.0).
Outputs:
The figures are saved in ./nico_out/niche_prediction_linear/TopCoeff_R0/*
Notes:
The function normalizes the coefficients by dividing them by their maximum and marks the cutoff with a blue dotted line.
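The normalization and the two cutoffs can be pictured with the sketch below. It rests on assumptions: the coefficients are divided by their maximum absolute value, the coeff_cutoff limit is modelled by keeping the largest coefficients in absolute value, and the helper name rank_niche_coefficients is hypothetical.

```python
import numpy as np

def rank_niche_coefficients(coefficients, names, cutoff=0.1, max_shown=20):
    """Normalize coefficients and keep the `max_shown` largest in absolute value.

    Returns (name, normalized coefficient, above_cutoff) tuples.
    """
    coefficients = np.asarray(coefficients, dtype=float)
    scale = np.max(np.abs(coefficients))
    normalized = coefficients / scale if scale > 0 else coefficients
    order = np.argsort(-np.abs(normalized))[:max_shown]
    return [(names[i], float(normalized[i]), bool(normalized[i] > cutoff)) for i in order]

ranked = rank_niche_coefficients([2.0, -4.0, 0.2], ["A", "B", "C"], cutoff=0.1, max_shown=2)
```

In this toy case the normalized values are 0.5, -1.0, and 0.05, so only cell types B and A are retained, and only A exceeds the 0.1 cutoff.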
- nico_interactions.Interactions.find_neighbors(pindex, triang)[source]
Find the neighbors for a given point index using Delaunay triangulation.
This helper function, used in create_spatial_CT_feature_matrix, identifies the neighboring points (cells) for a given point index using the Delaunay triangulation.
Parameters:
- pindexint
The index of the point (cell) for which neighbors are to be found.
- triangscipy.spatial.Delaunay
The Delaunay triangulation of the point set.
Returns:
- np.ndarray
An array of indices representing the neighbors of the given point.
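One way to obtain such neighbors from a SciPy triangulation is the vertex_neighbor_vertices attribute of scipy.spatial.Delaunay. The sketch below assumes this approach. The helper name delaunay_neighbors is hypothetical, and the distance filtering by epsilonThreshold is not shown.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_neighbors(pindex, triang):
    """Indices of the points that share a triangulation edge with `pindex`."""
    indptr, indices = triang.vertex_neighbor_vertices
    return indices[indptr[pindex]:indptr[pindex + 1]]

# Four corners of a unit square plus its center.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
triang = Delaunay(points)
center_neighbors = sorted(delaunay_neighbors(4, triang).tolist())
corner_neighbors = sorted(delaunay_neighbors(0, triang).tolist())
```

The center point touches all four corners, while a corner is connected to its two adjacent corners and the center but not to the opposite corner.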
- nico_interactions.Interactions.model_log_regression(K_fold, n_repeats, neighborhoodClass, target, lambda_c, strategy, BothLinearAndCrossTerms, seed, n_jobs)[source]
Perform logistic regression classification to learn the probabilities of each cell type class. This helper function is used in spatial_neighborhood_analysis.
Parameters:
- K_foldint
Number of folds for cross-validation.
- n_repeatsint
Number of times the cross-validation is repeated.
- neighborhoodClassnumpy.ndarray
Matrix of neighborhood class features.
- targetnumpy.ndarray
Target labels (cell types).
- lambda_clist or numpy.ndarray
Regularization strength(s) to be tested in the logistic regression.
- strategystr
The regularization and multi-class strategy. Options include ‘L1_multi’, ‘L1_ovr’, ‘L2_multi’, ‘L2_ovr’, ‘elasticnet_multi’, ‘elasticnet_ovr’.
- BothLinearAndCrossTermsint
Degree of polynomial features including interaction terms only.
- seedint
Random seed for reproducibility.
- n_jobsint
Number of jobs to run in parallel.
Returns:
- log_reg_modelsklearn.linear_model.LogisticRegression
The logistic regression model with specified parameters.
- parametersdict
Dictionary of parameters used for model training.
- hyperparameter_scoringdict
Dictionary of scoring metrics used for hyperparameter tuning.
Notes:
The function uses polynomial features to create interaction terms based on the specified degree.
Hyperparameter tuning is performed using cross-validation with f1_weighted scoring metrics.
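The combination of interaction-only polynomial features, repeated stratified cross-validation, and f1_weighted scoring can be sketched with scikit-learn as below. This is an illustration on synthetic data, not NiCo's code: the pipeline layout, the default L2 penalty, the toy data, and the mapping of lambda_c onto the C parameter of LogisticRegression are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for neighborhoodClass (features) and target (labels).
rng = np.random.default_rng(0)
target = np.repeat([0, 1, 2], 20)
neighborhood = np.eye(3)[target] + 0.1 * rng.standard_normal((60, 3))

pipeline = make_pipeline(
    # degree=2 with interaction_only=True adds pairwise cross terms only.
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1000),
)
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=1, random_state=0)
search = GridSearchCV(
    pipeline,
    {"logisticregression__C": [0.25, 1.0, 4.0]},  # candidate regularization values
    scoring="f1_weighted",
    cv=cv,
)
search.fit(neighborhood, target)
best_C = search.best_params_["logisticregression__C"]
```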
- nico_interactions.Interactions.plot_coefficient_matrix(input, saveas='pdf', showit=True, transparent_mode=False, dpi=300, figsize=(5, 8))[source]
Generate and save a coefficient matrix plot from the results of spatial_neighborhood_analysis.
Parameters:
- inputdict, or similar object
The main input is the output from spatial_neighborhood_analysis.
- saveasstr, optional, default=’pdf’
Format to save the figure. Options are ‘pdf’ or ‘png’. If ‘png’, the dpi is set to 300.
- showitbool, optional, default=True
Whether to display the plot after saving. If False, the plot will be closed after saving.
- transparent_modebool, optional, default=False
Whether to save the figure with a transparent background.
- figsizetuple of float, optional, default=(5, 8)
Size of the figure in inches.
Outputs:
The function saves the coefficient matrix plot in the directory specified by ./nico_out/niche_prediction_linear/. The filename will be in the format “weight_matrix_R<Radius>.<saveas>”, where <Radius> is the radius value from the input and <saveas> is the file format.
- nico_interactions.Interactions.plot_confusion_matrix(input, saveas='pdf', showit=True, transparent_mode=False, dpi=300, figsize=(5.5, 5))[source]
Generate and save a confusion matrix plot from the results of spatial_neighborhood_analysis.
Parameters:
- inputdict, or similar object
The main input is the output from spatial_neighborhood_analysis.
- saveasstr, optional, default=’pdf’
Format to save the figure. Options are ‘pdf’ or ‘png’. If ‘png’, the dpi is set to 300.
- showitbool, optional, default=True
Whether to display the plot after saving. If False, the plot will be closed after saving.
- transparent_modebool, optional, default=False
Whether to save the figure with a transparent background.
- figsizetuple of float, optional, default=(5.5, 5)
Size of the figure in inches.
Outputs:
The function saves the confusion matrix plot in the directory specified by nico_out/niche_prediction_linear/. The filename will be in the format ‘Confusing_matrix_R<Radius>.<saveas>’, where <Radius> is the radius value from the input and <saveas> is the file format.
Notes:
The function loads data from a numpy file specified by input.fout, which should contain the confusion matrix and related data.
The confusion matrix is plotted using seaborn’s heatmap function with annotations.
The plot is saved in the specified format and directory, and optionally displayed based on the showit parameter.
- nico_interactions.Interactions.plot_evaluation_scores(input, saveas='pdf', transparent_mode=False, showit=True, dpi=300, figsize=(4, 3))[source]
This function generates and saves plots of evaluation scores obtained from the spatial_neighborhood_analysis. The plots can be saved in PDF or PNG format and can be displayed during execution.
Parameters
- inputdict or similar
The main input is the output from spatial_neighborhood_analysis. This should contain the evaluation scores to be plotted.
- saveasstr, optional
Format to save the figures. Options are ‘pdf’ or ‘png’. Default is ‘pdf’.
- transparent_modebool, optional
If True, the background color of the figures will be transparent. Default is False.
- showitbool, optional
If True, the figures will be displayed when the function is called. Default is True.
- figsizetuple, optional
Dimensions of the figure size in inches (width, height). Default is (4, 3).
Outputs
- None
The function saves the generated figures in the directory “./nico_out/niche_prediction_linear/” with filenames starting with “scores”.
Notes
The scores are saved in input.score in the following order:
accuracy
macro F1
macro precision
macro recall
micro F1
micro precision
micro recall
weighted F1
weighted precision
weighted recall
Cohen Kappa
cross entropy
Matthews correlation coefficient
Hamming loss
zero-one loss
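Because the scores are identified only by position, it can help to attach names to them. The sketch below assumes the scores for one evaluation are available as a flat sequence of 15 values in the order listed above. The names SCORE_NAMES and label_scores are hypothetical and not part of NiCo.

```python
# Score names in the documented order.
SCORE_NAMES = [
    "accuracy",
    "macro F1", "macro precision", "macro recall",
    "micro F1", "micro precision", "micro recall",
    "weighted F1", "weighted precision", "weighted recall",
    "Cohen Kappa",
    "cross entropy",
    "Matthews correlation coefficient",
    "Hamming loss",
    "zero-one loss",
]

def label_scores(scores):
    """Map a sequence of 15 scores onto their names."""
    scores = list(scores)
    if len(scores) != len(SCORE_NAMES):
        raise ValueError("expected one value per documented score")
    return dict(zip(SCORE_NAMES, scores))

# Placeholder values 0..14 stand in for real scores.
labelled = label_scores(range(15))
```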
- nico_interactions.Interactions.plot_multiclass_roc(clf, X_test, y_test, n_classes)[source]
Compute the ROC (Receiver Operating Characteristic) curve for each cell type prediction and evaluate its performance on the test dataset.
Parameters:
- clfclassifier object
The classifier used for making predictions. It should have a decision_function method.
- X_testarray-like of shape (n_samples, n_features)
Test feature set.
- y_testarray-like of shape (n_samples,)
True labels for the test set.
- n_classesint
Number of unique classes (cell types) in the dataset.
Returns:
- fprdict
A dictionary where the keys are class indices and the values are arrays of false positive rates.
- tprdict
A dictionary where the keys are class indices and the values are arrays of true positive rates.
- roc_aucdict
A dictionary where the keys are class indices and the values are the area under the ROC curve (AUC) scores.
Notes:
This function uses the decision_function method of the classifier to get the confidence scores for each class.
The true labels y_test are converted into a binary format using one-hot encoding.
The ROC curve is computed for each class and the AUC score is calculated for each ROC curve.
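The three steps in these notes can be sketched with scikit-learn as follows. The helper name multiclass_roc and the toy classifier are hypothetical, and the sketch assumes at least three classes labelled 0 to n_classes - 1, so that decision_function returns one score column per class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def multiclass_roc(clf, X_test, y_test, n_classes):
    """One-vs-rest ROC curve and AUC for each class index."""
    scores = clf.decision_function(X_test)                          # confidence per class
    y_bin = label_binarize(y_test, classes=list(range(n_classes)))  # one-hot labels
    fpr, tpr, roc_auc = {}, {}, {}
    for k in range(n_classes):
        fpr[k], tpr[k], _ = roc_curve(y_bin[:, k], scores[:, k])
        roc_auc[k] = auc(fpr[k], tpr[k])
    return fpr, tpr, roc_auc

# Toy data: three well-separated classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = np.eye(3)[y] + 0.1 * rng.standard_normal((60, 3))
clf = LogisticRegression(max_iter=1000).fit(X, y)
fpr, tpr, roc_auc = multiclass_roc(clf, X, y, n_classes=3)
```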
- nico_interactions.Interactions.plot_niche_interactions_with_edge_weight(input, niche_cutoff=0.1, saveas='pdf', transparent_mode=False, showit=True, figsize=(10, 7), dpi=300, input_colormap='jet', with_labels=True, node_size=300, linewidths=0.5, node_font_size=8, alpha=0.5, font_weight='normal', edge_label_pos=0.35, edge_font_size=3)[source]
Plot niche interactions map with edge weights.
This function generates and saves a directed graph that represents the niche interaction map based on the output of spatial_neighborhood_analysis. The nodes represent cell types, and the edges (with weights) indicate the strength of interactions between central cell types and niche cell types. The plot can be saved in PDF or PNG format and can be displayed during execution.
Parameters
- inputdict or similar
The main input is the output from spatial_neighborhood_analysis. This should contain the necessary data to plot the niche interactions.
- niche_cutofffloat, optional
Threshold for including interactions in the graph. Higher values result in fewer connections, while lower values include more connections. Default is 0.1.
- saveasstr, optional
The format for saving the figures. Options are ‘pdf’ or ‘png’. Default is ‘pdf’.
- transparent_modebool, optional
If True, saves the figure with a transparent background. Default is False.
- showitbool, optional
If True, displays the plot after generating. Default is True.
- figsizetuple, optional
Size of the figure in inches (width, height). Default is (10, 7).
- dpiint, optional
Resolution in dots per inch for saving the figure. Default is 300.
- input_colormapstr, optional
Color map for node colors, based on matplotlib colormaps. Default is ‘jet’. For details see documentation https://matplotlib.org/stable/gallery/color/colormap_reference.html
- with_labelsbool, optional
If True, displays cell type labels on the nodes. Default is True.
- node_sizeint, optional
Size of the nodes. Default is 300.
- linewidthsfloat, optional
Width of the node border lines. Default is 0.5.
- node_font_sizeint, optional
Font size for node labels. Default is 8.
- alphafloat, optional
Opacity level for nodes and edges. Default is 0.5.
- font_weightstr, optional
Weight of the font for node labels. Options are ‘normal’ or ‘bold’. Default is ‘normal’.
- edge_label_posfloat, optional
Position of edge labels along the edges. Default is 0.35.
- edge_font_sizeint, optional
Font size for edge labels. Default is 3.
Outputs
- None
The function saves the generated figures in the directory “./nico_out/niche_prediction_linear/” with filenames starting with “Niche_interactions_*”.
- nico_interactions.Interactions.plot_niche_interactions_without_edge_weight(input, niche_cutoff=0.1, saveas='pdf', transparent_mode=False, showit=True, figsize=(10, 7), dpi=300, input_colormap='jet', with_labels=True, node_size=300, linewidths=0.5, node_font_size=8, alpha=0.5, font_weight='normal')[source]
Plot niche interactions map without edge weights.
This function generates and saves a niche interactions map using data from the output of spatial_neighborhood_analysis. The graph illustrates connections between cell types based on their niche interactions, without weighting the edges by interaction strength. The output plot can be saved in either PDF or PNG format, and optionally displayed after generation.
Parameters
- inputdict or similar
The main input containing data from spatial_neighborhood_analysis. This should include information on cell types and interaction strengths needed to plot niche cell type interactions.
- niche_cutofffloat, optional
Threshold for plotting connections in the niche interactions map. Only connections with normalized interaction strengths above this cutoff are displayed. Higher values reduce connections, while lower values increase them. Default is 0.1.
- saveasstr, optional
Format to save the figures. Options are ‘pdf’ or ‘png’. Default is ‘pdf’.
- transparent_modebool, optional
If True, the plot background will be transparent. Default is False.
- showitbool, optional
If True, the figures will be displayed when the function is called. Default is True.
- figsizetuple, optional
Dimensions of the plot in inches (width, height). Default is (10, 7).
- dpiint, optional
Resolution in dots per inch for saving the figure. Default is 300.
- input_colormapstr, optional
Color map for node colors, based on matplotlib colormaps. Default is ‘jet’. For details see documentation https://matplotlib.org/stable/gallery/color/colormap_reference.html
- with_labelsbool, optional
If True, displays cell type labels on the nodes. Default is True.
- node_sizeint, optional
Size of the nodes. Default is 300.
- linewidthsfloat, optional
Width of the node border lines. Default is 0.5.
- node_font_sizeint, optional
Font size for node labels. Default is 8.
- alphafloat, optional
Opacity level for nodes and edges. Default is 0.5.
- font_weightstr, optional
Weight of the font for node labels. Options are ‘normal’ or ‘bold’. Default is ‘normal’.
Outputs
- None
The function saves the generated figures in the directory “./nico_out/niche_prediction_linear/” with filenames starting with “Niche_interactions_*”.
- nico_interactions.Interactions.plot_predicted_probabilities(input, saveas='pdf', showit=True, transparent_mode=False, dpi=300, figsize=(12, 6))[source]
Generate and save a plot of predicted probabilities from the results of spatial_neighborhood_analysis.
Parameters:
- inputdict, or similar object
The main input is the output from spatial_neighborhood_analysis.
- saveasstr, optional, default=’pdf’
Format to save the figure. Options are ‘pdf’ or ‘png’. If ‘png’, the dpi is set to 300.
- showitbool, optional, default=True
Whether to display the plot after saving. If False, the plot will be closed after saving.
- transparent_modebool, optional, default=False
Whether to save the figure with a transparent background.
- figsizetuple of float, optional, default=(12, 6)
Size of the figure in inches.
Outputs:
The function saves the plot of predicted probabilities in the directory specified by ./nico_out/niche_prediction_linear/. The filename will be in the format ‘predicted_probability_R<Radius>.<saveas>’, where <Radius> is the radius value from the input and <saveas> is the file format.
- nico_interactions.Interactions.plot_roc_results(input, nrows=4, ncols=4, saveas='pdf', showit=True, transparent_mode=False, dpi=300, figsize=(10, 7))[source]
Generate and save ROC curves for the top 16 cell type predictions from the results of spatial_neighborhood_analysis.
Parameters:
- inputdict, or similar object
The main input is the output from spatial_neighborhood_analysis.
- nrowsint, optional, default=4
Number of rows in the subplot grid.
- ncolsint, optional, default=4
Number of columns in the subplot grid.
- saveasstr, optional, default=’pdf’
Format to save the figure. Options are ‘pdf’ or ‘png’. If ‘png’, the dpi is set to 300.
- showitbool, optional, default=True
Whether to display the plot after saving. If False, the plot will be closed after saving.
- transparent_modebool, optional, default=False
Whether to save the figure with a transparent background.
- figsizetuple of float, optional, default=(10, 7)
Size of the figure in inches.
Outputs:
The function saves the ROC curves plot in the directory specified by ./nico_out/niche_prediction_linear. The filename will be in the format ‘ROC_R<Radius>.<saveas>’, where <Radius> is the radius value from the input and <saveas> is the file format.
Notes:
The function creates a grid of ROC curves for the top 16 cell types with the highest ROC AUC values.
- nico_interactions.Interactions.read_processed_data(radius, inputdir)[source]
Read and process the neighborhood expected feature matrix for spatial_neighborhood_analysis.
Parameters:
- radiusint or float
The radius value used in the spatial analysis.
- inputdirstr
The directory containing the input data file.
Returns:
- neighborhoodClassnumpy.ndarray
The matrix of neighborhood class features.
- targetnumpy.ndarray
The target labels (cell types).
- inputFeaturesrange
A range object representing the indices of the input features.
Notes:
The function reads a compressed .npz file containing the neighborhood expected feature matrix.
It filters out rows with NaN values.
It calculates the proportion of each cell type in the dataset.
The function returns the processed neighborhood class features, target labels, and input feature indices.
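The NaN filtering and the cell type proportions can be sketched with NumPy as below. The layout of the .npz file is not described here, so the example works on arrays already in memory. The helper names drop_nan_rows and celltype_proportions are hypothetical.

```python
import numpy as np

def drop_nan_rows(features, target):
    """Remove cells whose feature row contains at least one NaN."""
    keep = ~np.isnan(features).any(axis=1)
    return features[keep], target[keep]

def celltype_proportions(target):
    """Fraction of cells belonging to each cell type label."""
    values, counts = np.unique(target, return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

features = np.array([[0.2, 0.8], [np.nan, 1.0], [0.5, 0.5], [1.0, 0.0]])
target = np.array([0, 1, 1, 0])
clean_features, clean_target = drop_nan_rows(features, target)
proportions = celltype_proportions(clean_target)
```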
- nico_interactions.Interactions.reading_data(coordinates, louvainFull, degbased_ctname, saveSpatial, removed_CTs_before_finding_CT_CT_interactions)[source]
Helper function used in spatial_neighborhood_analysis to read the cell coordinate file, cluster file, and cluster name file according to the input cell type list provided for the prediction.
Parameters:
- coordinatesstr
Path to the file containing cell coordinates.
- louvainFullstr
Path to the file containing the full louvain clustering information.
- degbased_ctnamelist of tuples
A list where each element is a tuple containing the cell type ID and the cell type name.
- saveSpatialstr
Path where the spatial analysis results should be saved.
- removed_CTs_before_finding_CT_CT_interactionslist of str
A list of cell type names that should be excluded from the analysis.
Returns:
- CTnamelist of str
A list of cell type names that are included in the analysis after filtering out the removed cell types.
- CTidlist of int
A list of cell type IDs corresponding to the filtered cell type names.
Notes:
This function assumes that degbased_ctname is a list of tuples where the first element is an integer representing the cell type ID and the second element is a string representing the cell type name.
The function filters out the cell types listed in removed_CTs_before_finding_CT_CT_interactions from the degbased_ctname list and returns the remaining cell type names and IDs.
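The filtering step described in these notes amounts to the following minimal sketch. The helper name filter_celltypes and the example cell types are hypothetical, and reading the coordinate and cluster files is not shown.

```python
def filter_celltypes(degbased_ctname, removed):
    """Split (id, name) tuples into name and id lists, skipping removed cell types."""
    removed = set(removed)
    kept = [(ct_id, name) for ct_id, name in degbased_ctname if name not in removed]
    CTname = [name for _, name in kept]
    CTid = [ct_id for ct_id, _ in kept]
    return CTname, CTid

degbased_ctname = [(0, "Stem"), (1, "Paneth"), (2, "Goblet")]
CTname, CTid = filter_celltypes(degbased_ctname, removed=["Paneth"])
```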
- nico_interactions.Interactions.remove_extra_character_from_name(name)[source]
Remove special characters from cell type names to avoid errors while saving figures.
This function replaces certain special characters in the input name with underscores or other appropriate characters to ensure the name is safe for use as a filename.
Parameters
- namestr
The original cell type name that may contain special characters.
Returns
- str
The modified cell type name with special characters removed or replaced.
Example
>>> name = 'T-cell (CD4+)/CD8+'
>>> clean_name = remove_extra_character_from_name(name)
>>> print(clean_name)
Tncell_CD4p_CD8p
Notes
The following replacements are made:
‘/’ is replaced with ‘_’
‘ ‘ (space) is replaced with ‘_’
‘”’ (double quote) is removed
“’” (single quote) is removed
‘)’ is removed
‘(’ is removed
‘+’ is replaced with ‘p’
‘-’ is replaced with ‘n’
‘.’ (dot) is removed
These substitutions help in creating filenames that do not contain characters that might be problematic for file systems or software.
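The listed replacements can be applied in sequence, as in the sketch below. It follows the list above literally. The names REPLACEMENTS and clean_celltype_name are hypothetical, and the order in which NiCo applies the substitutions is assumed to match the list.

```python
# (old, new) pairs taken from the list of replacements above.
REPLACEMENTS = [
    ("/", "_"),
    (" ", "_"),
    ('"', ""),
    ("'", ""),
    (")", ""),
    ("(", ""),
    ("+", "p"),
    ("-", "n"),
    (".", ""),
]

def clean_celltype_name(name):
    """Apply each replacement in turn to make the name safe for filenames."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name

cleaned = clean_celltype_name("NK cells (CD56+)")
```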
- nico_interactions.Interactions.spatial_neighborhood_analysis(output_nico_dir=None, anndata_object_name='nico_celltype_annotation.h5ad', spatial_cluster_tag='nico_ct', spatial_coordinate_tag='spatial', Radius=0, n_repeats=1, K_fold=5, seed=36851234, n_jobs=-1, lambda_c_ranges=[0.000244140625, 0.00048828125, 0.0009765625, 0.001953125, 0.00390625, 0.0078125, 0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0, 2048.0], epsilonThreshold=100, removed_CTs_before_finding_CT_CT_interactions=[])[source]
Perform spatial neighborhood analysis to reconstruct the niche interaction patterns.
This is the primary function called by the user to perform spatial neighborhood analysis, i.e., reconstruction of the niche.
Prerequisites: Before calling this function, the user must have an annotation of the spatial cells from any method. This annotation is expected to comprise two files: clusterFilename, which contains cell and cluster-ID information, and celltypeFilename, which contains cluster-ID and cell type name information.
Inputs:
- output_nico_dirstr, optional
Directory to save the output of niche interaction prediction. Default is ‘./nico_out/’.
- anndata_object_namestr, optional
Name of the AnnData object file containing cell type annotations. Default is ‘nico_celltype_annotation.h5ad’.
- spatial_cluster_tagstr, optional
Slot for spatial cluster information. Default is ‘nico_ct’, meaning it is stored in the anndata.obs[‘nico_ct’] slot.
- spatial_coordinate_tagstr, optional
Slot for spatial coordinate information. Default is ‘spatial’, meaning it is stored in the anndata.obsm[‘spatial’] slot.
- Radiusint, optional
Niche radius to predict the cell type-cell type interactions. Radius 0 focuses on direct spatial neighbors inferred by Delaunay triangulation, and nonzero Radius extends the neighborhood to include all cells within a given radius for predicting niche interactions. Default is 0.
- n_repeatsint, optional
Number of times to repeat the logistic regression after finding the hyperparameters. Default is 1.
- K_foldint, optional
Number of cross-folds for the logistic regression. Default is 5.
- seedint, optional
Random seed used in RepeatedStratifiedKFold. Default is 36851234.
- n_jobsint, optional
Number of processors to use. See https://scikit-learn.org/stable/glossary.html#term-n_jobs for details. Default is -1.
- lambda_c_rangeslist, optional
The initial range of the inverse regularization parameter used in the logistic regression to find the optimal parameter. Default is list(np.power(2.0, np.arange(-12, 12))).
- epsilonThresholdint, optional
Distance threshold for neighboring cells during Delaunay triangulation. Cells farther apart than this cutoff are never treated as neighbors. Default is 100.
- removed_CTs_before_finding_CT_CT_interactionslist, optional
Exclude cell types from the niche interactions analysis. Default is [].
Outputs:
The function saves the output of niche interaction prediction in the specified “nico_out” directory.
Notes:
Before running this function, ensure you have the cell type annotation files in the anndata object slot.
If running for multiple radius parameters, it’s good practice to change the output directory name or delete the previously created one.
If the average number of neighbors is relatively low (<1), consider increasing the radius for neighborhood analysis.
Every input CSV file (positionFilename, clusterFilename, celltypeFilename) must contain header information.
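The default lambda_c_ranges in the signature is a grid of 24 powers of two from 2^-12 to 2^11. The sketch below reproduces it with NumPy and shows how a narrower grid could be built in the same way. The variable narrow_range is only an illustration, not a recommended setting.

```python
import numpy as np

# Default grid: 2**-12, 2**-11, ..., 2**11 (24 values).
lambda_c_ranges = list(np.power(2.0, np.arange(-12, 12)))

# A narrower, hypothetical grid built in the same way: 2**-4 ... 2**4.
narrow_range = list(np.power(2.0, np.arange(-4, 5)))
```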
Module 3: nico_covariations
- nico_covariations.Covariations.alignment_score(H, spH, ind_H, ind_spH)[source]
Calculate the alignment score between factors from two different modalities during integrated NMF.
This helper function is used in find_PC_of_invidualCluster_in_SC to evaluate the alignment score between factors from scRNAseq data and spatial data.
Parameters
- Hnumpy.ndarray
The matrix representing the factors from the scRNAseq data. Each row corresponds to a sample, and each column corresponds to a factor.
- spHnumpy.ndarray
The matrix representing the factors from the spatial data. Each row corresponds to a sample, and each column corresponds to a factor.
- ind_Hnumpy.ndarray or list
Indices of the common genes in the scRNAseq data.
- ind_spHnumpy.ndarray or list
Indices of the common genes in the spatial data.
Returns
- float
The alignment score between the factors from the scRNAseq and spatial data.
Notes
The alignment score is calculated by computing the cosine similarity between the factors of the common genes in the scRNAseq and spatial data. A higher score indicates better alignment between the factors from the two modalities.
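The cosine similarity underlying this score can be sketched as below for two factor matrices restricted to the same rows. How NiCo aggregates these similarities into a single alignment score is not specified here. The helper name cosine_similarity_matrix is hypothetical, and columns with zero norm are assumed not to occur.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Cosine similarity between every column of A and every column of B.

    A and B must have the same number of rows (for example, the common genes).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=0, keepdims=True)
    return A_unit.T @ B_unit

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 3.0]])
similarity = cosine_similarity_matrix(A, B)
```

Entry (i, j) compares column i of A with column j of B. A value of 1 means the two factors point in the same direction.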
- nico_covariations.Covariations.compute_PC_space(input, sct_ad_sc_full)[source]
Helper function in gene_covariation_analysis to find the weighted neighborhood average of cell types from the spatial factors.
This function computes the weighted neighborhood average of principal components (PCs) for each cell type from spatial transcriptomics data. The weights are based on the inverse of the distances between neighboring cells.
Parameters
- inputobject
An object containing the following attributes:
spatialcell_unique_clusterid: list of unique spatial cell cluster IDs.
neighbors: list of neighbors for each cell.
neigh_distances: list of distances to neighbors for each cell.
annotation_spatial_barcode_id: list of spatial barcode IDs for each cell.
annotation_spatial_cluster_id: list of spatial cluster IDs for each cell.
pc_of_sp_clusterid: matrix of principal components for each spatial cluster ID.
no_of_pc: int, number of principal components.
outputname: str, the name of the output file to save the results.
Returns
- None
This function saves the weighted neighborhood of factors in a niche to a .npz file specified by input.outputname.
Notes
This function calculates the weighted average of the principal components (PCs) for each cell’s neighborhood, using the inverse of the distances to its neighbors as weights.
The result is a matrix where each row represents a cell, and each column represents the weighted average PC values for each cluster in the cell’s neighborhood.
The weighted neighborhood feature matrix is saved to a file in .npz format.
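The inverse-distance weighting can be illustrated with the simplified sketch below, which averages a per-cell value matrix over each cell's neighbors. It ignores the per-cluster column layout described above, assumes strictly positive distances, and normalizes the weights to sum to 1, which may differ from NiCo's implementation. The helper name weighted_neighborhood_average is hypothetical.

```python
import numpy as np

def weighted_neighborhood_average(values, neighbors, distances):
    """Average neighbor rows of `values`, weighted by 1 / distance."""
    values = np.asarray(values, dtype=float)
    out = np.zeros((len(neighbors), values.shape[1]))
    for i, (neigh, dist) in enumerate(zip(neighbors, distances)):
        if len(neigh) == 0:
            continue  # cells without neighbors keep a zero row
        w = 1.0 / np.asarray(dist, dtype=float)
        out[i] = (w[:, None] * values[neigh]).sum(axis=0) / w.sum()
    return out

values = [[1.0], [3.0], [5.0]]
neighbors = [[1, 2], [0], []]
distances = [[1.0, 2.0], [1.0], []]
averaged = weighted_neighborhood_average(values, neighbors, distances)
```

For the first cell the weights are 1 and 0.5, so the closer neighbor (value 3) contributes twice as much as the farther one (value 5).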
- nico_covariations.Covariations.create_directory(outputFolder)[source]
Create an empty directory.
This function checks if a specified directory exists, and if not, it creates the directory.
Parameters
- outputFolderstr
The path of the directory to be created.
Raises
- OSError
If the directory cannot be created due to permission issues or other OS-related errors.
Notes
If the directory already exists, no action is taken.
This function ensures that the directory path is available for subsequent file operations.
Example
>>> create_directory('./new_out/')
- nico_covariations.Covariations.create_subtitle(fig: Figure, grid: SubplotSpec, title: str)[source]
Add a title to a specific set of subplots within a figure.
This helper function is used to create a title for a subset of plots within a Matplotlib figure. The title is added with a specific formatting and the subplot is hidden from view (no axes or frames).
Parameters:
- figmatplotlib.figure.Figure
The figure object to which the subplot belongs.
- gridmatplotlib.gridspec.SubplotSpec
The subplot specification that defines the location and size of the subplot within the figure.
- titlestr
The title text to be displayed above the subplot.
- nico_covariations.Covariations.extract_and_plot_top_genes_from_chosen_factor_in_celltype(input, choose_celltype, choose_factor_id, top_NOG=30, rps_rpl_mt_genes_included=True, correlation_with_spearman=True, positively_correlated=True, saveas='pdf', cmap='RdBu_r', transparent_mode=False, showit=True, dpi=300, figsize=(5, 6))[source]
Extract and plot top genes associated with a chosen factor in a specified cell type.
This function uses the output from gene_covariation_analysis to identify and visualize the top genes associated with a chosen factor in a specified cell type. The genes can be filtered and visualized based on their correlation with the factor, with options to include or exclude specific gene types.
Parameters
- inputobject
The main input is the output from gene_covariation_analysis.
- choose_celltypestr
Define the cell type to include in the analysis.
- choose_factor_idint
Define the factor ID of the cell type to be analyzed.
- top_NOGint, optional
Number of top genes to visualize. (default is 30)
- rps_rpl_mt_genes_includedbool, optional
Decide whether to include rps, rpl, and mt genes in the pathway analysis. If True, they are included. (default is True)
- correlation_with_spearmanbool, optional
If True, visualize gene-factor association using the Spearman correlation coefficient; otherwise, use cosine similarity. (default is True)
- positively_correlatedbool, optional
If the gene-factor association is selected as Spearman correlation, choose whether the associated genes should be positively correlated (True) or negatively correlated (False). (default is True)
- saveasstr, optional
Save the figures in PDF or PNG format (dpi for PNG format is 300). (default is ‘pdf’)
- cmapstr, optional
Define the colormap for visualizing factors. (default is ‘RdBu_r’)
- transparent_modebool, optional
Define the background color of the figures. If True, figures have a transparent background. (default is False)
- showitbool, optional
If True, the generated figures will be displayed. (default is True)
- figsizetuple, optional
Dimension of the figure size. (default is (5, 6))
Outputs
- pd.DataFrame
Returns a DataFrame containing the gene, factor, average expression, and proportion of the population expressing that gene.
Notes
The function saves the figures in the directory “nico_out/covariations_R*_F*/dotplots/Factors*”.
The DataFrame returned includes detailed information about the top genes associated with the chosen factor.
Example
>>> extract_and_plot_top_genes_from_chosen_factor_in_celltype(input_data, 'CellTypeA', 1, top_NOG=50, saveas='png', figsize=(10, 8))
- nico_covariations.Covariations.findXYZC(c, s)[source]
Helper function used in plot_top_selected_genes_as_dotplot.
This function extracts and transforms data from the given matrices c and s, creating four lists: x-coordinates, y-coordinates, values (z), and sizes (bigs).
Parameters
- carray-like
A 2D array (matrix) where each element represents a value at a specific (i, j) coordinate.
- sarray-like
A 2D array (matrix) of the same shape as c, where each element represents a size multiplier for the corresponding element in c.
Returns
- xlist
List of x-coordinates for each element in c.
- ylist
List of y-coordinates for each element in c.
- zlist
List of values from c corresponding to each (x, y) coordinate.
- bigslist
List of sizes, where each size is calculated as 100 times the corresponding element in s.
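The transformation can be pictured as flattening both matrices cell by cell. In the sketch below the column index is used as the x-coordinate and the row index as the y-coordinate, which is an assumption. The helper name flatten_for_dotplot is hypothetical.

```python
def flatten_for_dotplot(c, s):
    """Flatten matrices c (values) and s (size multipliers) into four lists."""
    x, y, z, bigs = [], [], [], []
    for i, row in enumerate(c):
        for j, value in enumerate(row):
            x.append(j)                  # assumed: column index as x
            y.append(i)                  # assumed: row index as y
            z.append(value)
            bigs.append(100 * s[i][j])   # dot size is 100 times the entry in s
    return x, y, z, bigs

c = [[0.1, 0.2], [0.3, 0.4]]
s = [[1, 2], [3, 4]]
x, y, z, bigs = flatten_for_dotplot(c, s)
```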
- nico_covariations.Covariations.find_LR_interactions_in_interacting_cell_types(input, choose_interacting_celltype_pair=[], choose_factors_id=[], pvalueCutoff=0.05, dpi=300, correlation_with_spearman=True, LR_plot_NMF_Fa_thres=0.2, LR_plot_Exp_thres=0.2, saveas='pdf', transparent_mode=False, showit=True, figsize=(12, 10))[source]
Find ligand-receptor (LR) interactions in interacting cell types and visualize them.
This function processes the output from gene_covariation_analysis to identify significant LR interactions between specified cell type pairs and visualizes the results.
Parameters
- inputobject
The main input is the output from gene_covariation_analysis.
- choose_interacting_celltype_pairlist, optional
Define the cell type pairs for which information on LR communication should be returned. The first element of the list is the central cell type (CC), and the second element is the niche cell type (NC). If the list is empty, LR interactions will be returned for all significant interacting cell types. Default is [].
- choose_factors_idlist, optional
Define factor IDs for which LR interactions are visualized. The first element of the list is the factor ID of the central cell type, and the second element is the factor ID of the niche cell type. If the list is empty, LR plots will be saved for all significant niche cell type factor interactions. Default is [].
- pvalueCutofffloat, optional
The p-value cutoff used to find the significant central cell type factor and niche cell type factor interactions. Default is 0.05.
- correlation_with_spearmanbool, optional
If True, compute gene-factor correlation as Spearman correlation coefficient; otherwise, compute as cosine similarity. Default is True.
- LR_plot_NMF_Fa_thresfloat, optional
Only ligands or receptors that exhibit a correlation to the respective factors higher than this cutoff are retained. Default is 0.2.
- LR_plot_Exp_thresfloat, optional
Only ligands or receptors that are expressed in a fraction of cells of the respective cell types exceeding this cutoff are retained. Default is 0.2.
- saveasstr, optional
Save the figures in PDF or PNG format (dpi for PNG format is 300). Default is ‘pdf’.
- transparent_modebool, optional
If True, the figures are saved with a transparent background. Default is False.
- showitbool, optional
If True, the figures are shown interactively. Default is True.
- figsizetuple, optional
Dimensions of the figure. The figure width (X-axis direction) is the number of genes multiplied by the factor 12/34, and the figure height (Y-axis direction) is the number of genes multiplied by the factor 10/44. All generated figures are scaled according to these factors. The initial figure size is (12, 10).
Outputs
The LR interaction figures are saved in “./nico_out/covariations_R*_F*/Plot_ligand_receptor_in_niche*”.
Notes
Our analysis accounts for bidirectional cellular crosstalk interactions of ligands and receptors in cell types A and B.
The ligand can be expressed on cell type A and signal to the receptor detected on cell type B, or vice versa.
Both ligand-receptor plots and Excel sheets profile bidirectional cellular crosstalk of ligand and receptor in cell types A and B.
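The two retention cutoffs, LR_plot_NMF_Fa_thres and LR_plot_Exp_thres, can be pictured as a simple filter. The sketch below is only an illustration: the function name `filter_lr_genes` and the dictionary inputs are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "higher than" and "exceeding".

```python
def filter_lr_genes(genes, factor_corr, expr_fraction,
                    corr_thres=0.2, exp_thres=0.2):
    """Keep genes that pass both the factor-correlation and expression cutoffs.

    factor_corr:   gene -> correlation with the respective factor
    expr_fraction: gene -> fraction of cells of the cell type expressing the gene
    """
    return [g for g in genes
            if factor_corr[g] > corr_thres and expr_fraction[g] > exp_thres]


genes = ["Lig1", "Rec1", "Rec2"]
factor_corr = {"Lig1": 0.45, "Rec1": 0.10, "Rec2": 0.30}
expr_fraction = {"Lig1": 0.60, "Rec1": 0.80, "Rec2": 0.05}
kept = filter_lr_genes(genes, factor_corr, expr_fraction)
```

Here only "Lig1" passes both cutoffs: "Rec1" is too weakly correlated with the factor and "Rec2" is expressed in too few cells.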
- nico_covariations.Covariations.find_PC_of_invidualCluster_in_SC(seed, spatial_integration_modality, scbarcode, iNMFmode, scadata, no_of_pc, spbarcode, spadata, sct_ad_sc_full, celltype_name, cutoff_to_count_exp_cell_population)[source]
Helper function used in compute_PC_space to find principal components (PCs) for individual clusters in single-cell RNA sequencing (scRNA-seq) data and spatial transcriptomics data.
This function integrates scRNA-seq and spatial transcriptomics data using non-negative matrix factorization (NMF) or integrative NMF (iNMF), and computes the alignment score, correlation, and other metrics for the identified principal components.
Parameters
- seedint
Random seed for reproducibility.
- spatial_integration_modalitystr
Modality for spatial integration, either ‘single’ or ‘double’.
- scbarcodelist
List of single-cell barcodes.
- iNMFmodebool
Flag indicating whether to use iNMF (True) or not (False).
- scadataAnnData
Single-cell RNA-seq data in AnnData format.
- no_of_pcint
Number of principal components to compute.
- spbarcodelist
List of spatial transcriptomics barcodes.
- spadataAnnData
Spatial transcriptomics data in AnnData format.
- sct_ad_sc_fullAnnData
Full single-cell RNA-seq data in AnnData format.
- celltype_namestr
Name of the cell type being analyzed.
- cutoff_to_count_exp_cell_populationfloat
Expression cutoff to count the proportion of cell population expressing a gene.
Returns
- transfer_sp_comndarray
Transformed spatial component matrix.
- transfer_sc_comlist
Transformed single-cell component matrix (currently not populated).
- sc_spearmanndarray
Spearman correlation between genes and principal components in single-cell data.
- sc_cosinendarray
Cosine similarity between genes and principal components in single-cell data.
- sc_genenamesndarray
Array of gene names.
- Hndarray
Principal component matrix for single-cell data.
- spHndarray
Principal component matrix for spatial data.
- sc_cluster_mean_expndarray
Mean expression of genes across single-cell clusters.
- sc_cluster_exp_more_than_thresholdndarray
Proportion of single-cell clusters expressing genes above the cutoff.
- alphaint
Optimal alpha value used in iNMF.
Notes
This function normalizes gene expression data and computes principal components using either NMF or iNMF.
It calculates the alignment score for spatial and single-cell data integration.
Spearman correlation and cosine similarity between genes and PCs are computed.
The results include the transformed spatial component matrix, gene correlations, and other metrics for downstream analysis.
- nico_covariations.Covariations.find_correlation_bw_genes_and_PC_component_in_singlecell(KcomponentCluster, clusterExpression)[source]
Calculate Spearman correlation between genes and principal components in single-cell data.
This helper function is used within the find_PC_of_invidualCluster_in_SC function to determine the Spearman correlation between common gene scRNAseq factors (principal components) and scRNAseq gene expression data.
Parameters
- KcomponentClusternumpy.ndarray or pandas.DataFrame
The matrix representing the principal components (factors) from scRNAseq data. Each column corresponds to a principal component.
- clusterExpressionnumpy.ndarray or pandas.DataFrame
The matrix representing the gene expression data from scRNAseq. Each row corresponds to a gene and each column corresponds to a cell.
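A gene-by-factor Spearman correlation of this kind can be sketched in plain Python. This is an illustration, not the library's implementation: the function names are hypothetical, and it assumes that the rows of the factor matrix are cells (so that each column is one factor) and that the expression matrix is genes by cells.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / (var_a * var_b) ** 0.5


def spearman_genes_vs_factors(factors, expression):
    """Return a genes x factors matrix of Spearman correlations."""
    factor_ranks = [average_ranks(list(col)) for col in zip(*factors)]
    result = []
    for gene_row in expression:
        gene_ranks = average_ranks(list(gene_row))
        result.append([pearson(gene_ranks, fr) for fr in factor_ranks])
    return result


factors = [[1, 4], [2, 3], [3, 2], [4, 1]]        # 4 cells x 2 factors
expression = [[10, 20, 30, 40], [5, 5, 1, 0]]     # 2 genes x 4 cells
rho = spearman_genes_vs_factors(factors, expression)
```

Spearman correlation is the Pearson correlation of the ranks, which is why the sketch ranks both vectors first.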
- nico_covariations.Covariations.find_correlation_bw_genes_and_PC_component_in_singlecell_cosine(KcomponentCluster, clusterExpression)[source]
Calculate cosine similarity between common gene scRNAseq factors and scRNAseq count data.
This helper function is used within the find_PC_of_invidualCluster_in_SC function to determine the cosine similarity between common gene scRNAseq factors (principal components) and scRNAseq gene expression data.
Parameters
- KcomponentClusternumpy.ndarray or pandas.DataFrame
The matrix representing the principal components (factors) from scRNAseq data. Each column corresponds to a principal component.
- clusterExpressionnumpy.ndarray or pandas.DataFrame
The matrix representing the gene expression data from scRNAseq. Each row corresponds to a gene and each column corresponds to a cell.
Returns
- numpy.ndarray
A matrix containing the cosine similarity scores between each gene and each principal component. Each row corresponds to a gene, and each column corresponds to a principal component.
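The returned matrix (genes in rows, factors in columns) can be illustrated with a small pure-Python sketch. The function name is hypothetical, and the layout of the factor matrix (cells in rows, one factor per column) is an assumption; zero vectors are given a similarity of 0.0 here, which may differ from the library's behaviour.

```python
import math


def cosine_genes_vs_factors(factors, expression):
    """Return a genes x factors matrix of cosine similarities."""
    components = [list(col) for col in zip(*factors)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0 or norm_b == 0:
            return 0.0  # assumption: similarity of a zero vector is 0
        return dot / (norm_a * norm_b)

    return [[cosine(list(gene), comp) for comp in components]
            for gene in expression]


factors = [[1, 0], [2, 0], [3, 1]]        # 3 cells x 2 factors
expression = [[2, 4, 6], [0, 0, 5]]       # 2 genes x 3 cells
sim = cosine_genes_vs_factors(factors, expression)
```

In this toy input, the first gene is proportional to the first factor and the second gene is proportional to the second factor, so both diagonal entries are 1.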
- nico_covariations.Covariations.find_fold_change(PCA, NH_PCA, gene, CCPC, NCPC, totalLRpairs, LRcutoff, CC_meanExpression, NC_meanExpression, CC_popExpression, NC_popExpression, number_of_top_genes_to_print)[source]
Identify ligand-receptor genes for cell type interaction analysis.
This helper function is used in find_LR_interactions_in_interacting_cell_types to find ligand-receptor (LR) genes based on principal component analysis (PCA) data. It identifies the top genes and checks for LR interactions between specific cell types.
Parameters:
- PCAnumpy.ndarray
PCA data for the cell type of interest.
- NH_PCAnumpy.ndarray
PCA data for non-host cell types.
- genelist of str
List of gene names corresponding to the PCA data.
- CCPCint
Principal component index for the cell type of interest.
- NCPCint
Principal component index for the non-host cell type.
- totalLRpairslist of tuples
List of tuples representing all possible ligand-receptor pairs.
- LRcutofffloat
Threshold for selecting significant ligand-receptor interactions.
- CC_meanExpressionnumpy.ndarray
Mean expression values for the central cell type.
- NC_meanExpressionnumpy.ndarray
Mean expression values for the niche cell type.
- CC_popExpressionnumpy.ndarray
Population expression values for the central cell type.
- NC_popExpressionnumpy.ndarray
Population expression values for the niche cell type.
- number_of_top_genes_to_printint
Number of top genes to include in the output.
Returns:
- cc_geneslist of str
List of significant genes for the cell type of interest.
- nc_geneslist of str
List of significant genes for the non-host cell type.
- cc_genes5list of list
Top genes for the cell type of interest with their PCA scores.
- nc_genes5list of list
Top genes for the non-host cell type with their PCA scores.
- Found1list of list
Ligand-receptor pairs with ligands in the cell type of interest and receptors in the non-host cell type.
- Found2list of list
Ligand-receptor pairs with ligands in the non-host cell type and receptors in the cell type of interest.
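The bidirectional matching behind Found1 and Found2 can be sketched as below. This is a simplified illustration with a hypothetical function name; the real entries may carry additional values such as factor scores, which are omitted here.

```python
def match_lr_pairs(cc_genes, nc_genes, total_lr_pairs):
    """Split known ligand-receptor pairs by signalling direction.

    found1: ligand in the central cell type, receptor in the niche cell type
    found2: ligand in the niche cell type, receptor in the central cell type
    """
    cc, nc = set(cc_genes), set(nc_genes)
    found1 = [[lig, rec] for lig, rec in total_lr_pairs if lig in cc and rec in nc]
    found2 = [[lig, rec] for lig, rec in total_lr_pairs if lig in nc and rec in cc]
    return found1, found2


total_lr_pairs = [("Wnt5a", "Fzd1"), ("Dll1", "Notch1"), ("Tgfb1", "Tgfbr2")]
cc_genes = ["Wnt5a", "Notch1"]
nc_genes = ["Fzd1", "Dll1"]
found1, found2 = match_lr_pairs(cc_genes, nc_genes, total_lr_pairs)
```

A pair whose ligand and receptor are both absent from the selected gene lists, such as the third pair here, appears in neither result.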
- nico_covariations.Covariations.find_index(sp_genename, sc_genename)[source]
Find the common gene space submatrix between two modalities.
This helper function is used within the gene_covariation_analysis function to identify the indices of common genes between two lists of gene names corresponding to spatial and scRNAseq modalities.
Parameters
- sp_genenamelist
A list of gene names from the spatial modality.
- sc_genenamelist
A list of gene names from the scRNAseq modality.
Returns
- list
A list of indices corresponding to the common genes found in both sp_genename and sc_genename.
Example
>>> sp_genes = ['gene1', 'gene2', 'gene3', 'gene4']
>>> sc_genes = ['gene3', 'gene4', 'gene5', 'gene6']
>>> index_sp,index_sc = find_index(sp_genes, sc_genes)
>>> print(index_sp)
[2, 3]
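One way to obtain index lists like those in the example is sketched below. The function name `find_common_gene_indices` is hypothetical, and returning one index list per modality is an assumption based on the example above; if a gene name is repeated in the scRNAseq list, this sketch keeps its last position.

```python
def find_common_gene_indices(sp_genename, sc_genename):
    """Return positions of the shared genes in each of the two gene lists."""
    sc_position = {gene: i for i, gene in enumerate(sc_genename)}
    index_sp, index_sc = [], []
    for i, gene in enumerate(sp_genename):
        if gene in sc_position:
            index_sp.append(i)                  # position in the spatial list
            index_sc.append(sc_position[gene])  # position in the scRNAseq list
    return index_sp, index_sc


sp_genes = ['gene1', 'gene2', 'gene3', 'gene4']
sc_genes = ['gene3', 'gene4', 'gene5', 'gene6']
index_sp, index_sc = find_common_gene_indices(sp_genes, sc_genes)
```

Indexing both expression matrices with these lists yields submatrices whose gene order matches across the two modalities.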
- nico_covariations.Covariations.find_interest_of_genes(source, pop, mu, rps_rpl_mt_genes_included, CC_gene, top_NOG)[source]
Find genes of interest based on factor values, population expression, and average expression.
Parameters:
- sourcenp.array
Factor values associated with genes.
- popnp.array
Population expression values.
- munp.array
Mean expression values.
- rps_rpl_mt_genes_includedbool
Include Rps, Rpl, and mt genes if True.
- CC_genenp.array
List of gene names.
- top_NOGint
Number of genes to select.
Returns:
- gp1list
Genes of interest based on factor values.
- vp1list
Corresponding factor values of the selected genes.
- nico_covariations.Covariations.find_logistic_regression_interacting_score(cmn, coef, CTFeatures, nameOfCellType, logistic_coef_cutoff)[source]
Helper function used in gene_covariation_analysis to find niche interaction scores from logistic regression classifier.
This function identifies the interacting cell types by analyzing the coefficients of a logistic regression classifier. It normalizes the coefficients, sorts them, and identifies the significant interactions based on a specified cutoff value.
Parameters
- cmnarray
Confusion matrix or similar matrix representing the performance of the logistic regression classifier.
- coefarray
Coefficients of the logistic regression model.
- CTFeatureslist
List of cell type features used in the logistic regression model.
- nameOfCellTypelist
List of names corresponding to cell types.
- logistic_coef_cutofffloat
The cutoff value to consider a coefficient as significant for interaction.
Returns
- logistic_predicted_interactionsdict
A dictionary where keys are cell types and values are lists of interacting cell types with their interaction scores.
Notes
The function normalizes the logistic regression coefficients.
It identifies the most significant interactions based on the absolute value of the coefficients.
Interactions with coefficients above the cutoff value are considered significant and are included in the output.
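The normalize, sort, and cut steps in the notes above can be sketched for a single cell type as follows. This is only an illustration: the function name is hypothetical, and dividing by the largest absolute coefficient is an assumed normalization, since the exact scheme is not stated here.

```python
def interacting_partners(coef_row, feature_names, cutoff=0.0):
    """Rank niche cell types by normalized logistic regression coefficient.

    Assumption: coefficients are scaled by the largest absolute value.
    """
    largest = max(abs(v) for v in coef_row) or 1.0   # avoid division by zero
    scored = sorted(((name, value / largest)
                     for name, value in zip(feature_names, coef_row)),
                    key=lambda pair: pair[1], reverse=True)
    # keep only interactions whose normalized score exceeds the cutoff
    return [(name, score) for name, score in scored if score > cutoff]


coef_row = [2.0, -1.0, 0.5]
feature_names = ["TypeA", "TypeB", "TypeC"]
partners = interacting_partners(coef_row, feature_names, cutoff=0.0)
```

With a cutoff of 0, only cell types with positive coefficients are kept, which matches the reading that positive values indicate likely interaction.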
- nico_covariations.Covariations.gene_covariation_analysis(Radius=0, output_niche_prediction_dir=None, refpath='./inputRef/', quepath='./inputQuery/', ref_cluster_tag='cluster', ref_original_counts='Original_counts.h5ad', LRdbFilename='./utils/NiCoLRdb.txt', iNMFmode=True, no_of_factors=3, shap_analysis=False, shap_cluster_cutoff=0.5, cutoff_to_count_exp_cell_population=0, seed=541, spatial_integration_modality='double', anndata_object_name='nico_celltype_annotation.h5ad', lambda_c=[0.0009765625, 0.001953125, 0.00390625, 0.0078125, 0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0], coeff_cutoff_for_rid_reg=0, logistic_coef_cutoff=0)[source]
Perform gene covariation analysis within the niche.
This is the primary function called by the user to perform gene covariation analysis within the niche. Before calling this function, the user must call the spatial_neighborhood_analysis function from the interaction module.
Parameters
- Radiusint, optional
This radius parameter should be the same as used in spatial neighborhood analysis to find the niche interactions. Default is 0.
- output_niche_prediction_dirstr, optional
The output directory location from the previous niche interaction runs generated by the function spatial_neighborhood_analysis. Default is ‘./nico_out/’.
- refpathstr, optional
Path to the reference scRNAseq count matrix in scTransform-like normalization. The filename must be sct_singleCell.h5ad. Default is ‘./inputRef/’.
- quepathstr, optional
Path to the query spatial count matrix in scTransform-like normalization. The filename must be sct_spatial.h5ad. Default is ‘./inputQuery/’.
- ref_cluster_tagstr, optional
The slot in the reference anndata object file where cell type information is stored. Default is ‘cluster’.
- ref_original_countsstr, optional
Path to the original scRNAseq count data as an anndata object. It must have the cluster information in .obs and the UMAP information in .obsm[‘X_umap’]. The anndata.raw layer should contain the count matrix. It is used to compute the Spearman correlation and cosine similarity. Default is ‘Original_counts.h5ad’.
- LRdbFilenamestr, optional
Filename of the ligand-receptor database. The first column should be Ligand, the second column Receptor, and the third column the resource list. Default is ‘./utils/NiCoLRdb.txt’.
- iNMFmodebool, optional
If True, uses an integrated NMF approach to learn a gene-by-factor submatrix from both modalities. If False, uses an ordinary NMF approach to learn a gene by factor submatrix only from scRNAseq data and transfers these factors to the spatial modality for learning the gene weights. Default is True.
- no_of_factorsint, optional
Number of factors used in NMF for finding the common gene latent dimension space. Default is 3.
- lambda_clist, optional
Initial range of regularization parameters used in the ridge regression step to find the optimal parameter. Default is list(np.power(2.0, np.arange(-10, 10))).
- shap_analysisbool, optional
Flag to perform SHAP analysis. Default is False.
- shap_cluster_cutofffloat, optional
SHAP analysis cutoff parameter. Default is 0.5.
- coeff_cutoff_for_rid_regfloat, optional
Cutoff used to create the list of significant celltype_factor-celltype_factor niche covariations with an absolute regression coefficient greater than this. Default is 0.
- cutoff_to_count_exp_cell_populationint, optional
Cutoff used to find the percentage of the cell population that expresses a given gene in a given cell type. A value of 0 is acceptable with count data. Default is 0.
- seedint, optional
Random seed used in RepeatedStratifiedKFold. Default is 541.
- spatial_integration_modalitystr, optional
Modality for spatial integration. Use ‘double’ if both scRNAseq and spatial data are available; for spatial data only, this value must be ‘single’. Default is ‘double’.
- anndata_object_namestr, optional
Name of the spatial anndata object file. Default is ‘nico_celltype_annotation.h5ad’.
- logistic_coef_cutofffloat, optional
Cutoff to retrieve the positive niche interactions (cell type - cell type). For values >0, cell type pairs are likely to interact. Default is 0.
Outputs
The output is saved in the directory specified by output_niche_prediction_dir, with default location being ‘./nico_out/covariations_R*_F*’.
Notes
Provide the Original_counts.h5ad and sct_singleCell.h5ad files for the scRNAseq data.
Provide the sct_spatial.h5ad file for the spatial transcriptomics data.
The Original_counts.h5ad object must also contain the cluster information in .obs, the UMAP information in .obsm, and the count data in the .raw layer.
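The default lambda_c grid in the signature is the set of powers of two from 2^-10 to 2^9. The short sketch below reproduces it without NumPy, purely to make the values easy to inspect; the variable name `coarser_grid` is a hypothetical illustration of passing a custom range.

```python
# Same values as list(np.power(2.0, np.arange(-10, 10))): 20 powers of two.
lambda_c = [2.0 ** k for k in range(-10, 10)]

# A coarser grid (every second value) could be passed instead, for example
# as gene_covariation_analysis(..., lambda_c=coarser_grid).
coarser_grid = lambda_c[::2]
```

Because the grid is logarithmic, it covers roughly six orders of magnitude of regularization strength with only 20 candidates.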
- nico_covariations.Covariations.makePCneighboorhoodFeatureMatrix(input)[source]
Helper function in gene_covariation_analysis to find the weighted neighborhood average of cell types from the spatial factors.
This function computes a matrix where each row corresponds to a cell and each column corresponds to a weighted average of principal components (PCs) from neighboring cells.
Parameters:
- inputobject
An object containing various attributes required for computation, such as:
spatialcell_unique_clusterid: Unique cluster IDs for spatial cells.
neighbors: List of neighboring cells for each cell.
neigh_distances: Distances to neighbors.
annotation_spatial_barcode_id: Barcode IDs for spatial annotations.
annotation_spatial_cluster_id: Cluster IDs for spatial annotations.
pc_of_sp_clusterid: Principal components for spatial cluster IDs.
no_of_pc: Number of principal components.
outputname: Name of the output file.
Outputs:
A .npz file containing the matrix of weighted neighborhood principal components.
- nico_covariations.Covariations.make_excel_sheet_for_gene_correlation(input)[source]
Create an Excel sheet compiling gene correlations with factors across different cell types.
This function generates an Excel sheet that provides a structured and accessible representation of gene factors associated with cell types. It includes various types of information, such as average gene expression, Spearman correlation values, and cosine similarity values for both scRNASeq and spatial data.
Parameters
- inputobject
The main input is the output from gene_covariation_analysis.
Outputs
- Excel sheets categorized into different types of information:
‘avg gene exp’: Average gene expression.
‘spearman scRNAseq Fa(i)’: Spearman correlation values for different factors within scRNASeq data.
‘cosine scRNAseq Fa(i)’: Cosine similarity values within scRNASeq data.
‘spearman spatial Fa(i)’: Spearman correlation values for common genes in the spatial data.
‘cosine spatial Fa(i)’: Cosine similarity values for common genes in the spatial data.
Notes
In the sheet names, ‘i’ corresponds to the factor ID.
Columns include factors representing all cell types.
For each factor, genes are sorted based on their association with the factor ID corresponding to the respective sheet.
- A color-coding scheme is used to distinguish genes:
Ligands are depicted in blue.
Receptors are depicted in red.
Genes with both ligand and receptor functions are depicted in magenta.
- nico_covariations.Covariations.model_linear_regression(input, logistic_predicted_interactions)[source]
Helper function for gene_covariation_analysis to prepare data Y (central cell factors) and X (neighborhood average spatial cell factors) for each cell type to perform regression.
This function loads the precomputed neighborhood feature matrix and prepares the data for linear regression analysis. It then performs ridge regression for each cell type to find the relationship between the central cell factors (Y) and the neighborhood average spatial cell factors (X).
Parameters
- inputobject
An object containing the following attributes:
shap_cluster_cutoff : float, cutoff value for SHAP clustering.
outputname : str, the name of the input file containing precomputed neighborhood features.
no_of_pc : int, number of principal components.
spatialcell_unique_clusterid : list, unique cluster IDs of spatial cells.
annotation_spatial_cluster_id : list, cluster IDs for each spatial cell.
spatialcell_unique_clustername : list, unique cluster names of spatial cells.
seed : int, seed value for regression.
lambda_c : float, regularization parameter for ridge regression.
K_fold : int, number of folds for cross-validation.
n_repeats : int, number of repeats for cross-validation.
- logistic_predicted_interactionsdict
A dictionary where keys are cell type names and values are lists of tuples. Each tuple contains a cell type name and a score representing the predicted interaction strength with the key cell type.
Returns
- save_coefdict
A dictionary where keys are unique cluster IDs and values are lists containing the following elements:
coef : array, coefficients of the ridge regression model.
intercept : array, intercepts of the ridge regression model.
alpha : float, regularization parameter of the ridge regression model.
xlabel : array, names of the features.
score : array, scores of the features.
target : array, target values (central cell factors).
neighborhoodClass : array, neighborhood average spatial cell factors.
pv : array, p-values of the regression coefficients.
percent_variance_explained : array, percentage of variance explained by the model.
residual_variance_explained : array, residual variance explained by the model.
Notes
The function uses ridge regression to model the relationship between central cell factors and neighborhood factors.
The precomputed neighborhood feature matrix is loaded from a file and NaN values are replaced with zeros.
The function selects relevant features based on logistic_predicted_interactions and performs ridge regression.
The results are stored in a dictionary and returned for further analysis.
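The regression step can be pictured with a closed-form ridge fit whose regularization strength is picked from the lambda_c grid by cross-validation. This is a sketch under stated assumptions, not the library's code: the function names are hypothetical, plain shuffled K-fold splitting stands in for the repeated K-fold scheme, and the data are synthetic.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ Yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def select_alpha(X, Y, lambda_c, n_folds=5, seed=541):
    """Pick the alpha with the lowest summed held-out squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best_alpha, best_error = None, np.inf
    for alpha in lambda_c:
        error = 0.0
        for k, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            coef, intercept = ridge_fit(X[train_idx], Y[train_idx], alpha)
            residual = X[test_idx] @ coef + intercept - Y[test_idx]
            error += float(np.mean(residual ** 2))
        if error < best_error:
            best_alpha, best_error = alpha, error
    return best_alpha


# Synthetic stand-ins: X = neighborhood factors, Y = central cell factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_coef = rng.normal(size=(4, 3))
Y = X @ true_coef + 0.01 * rng.normal(size=(200, 3))

lambda_c = list(np.power(2.0, np.arange(-10, 10)))
best_alpha = select_alpha(X, Y, lambda_c)
coef, intercept = ridge_fit(X, Y, best_alpha)
```

Each column of `coef` holds the weights of the neighborhood factors for one central cell factor, which is the kind of coefficient matrix the covariation plots summarize.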
- nico_covariations.Covariations.multiplicative_method(W, H, A, max_iter)[source]
Perform conventional Non-negative Matrix Factorization (NMF) using a multiplicative update rule.
This helper function is used in find_PC_of_invidualCluster_in_SC to decompose matrix A into two non-negative matrices W and H such that A ≈ W @ H.
Parameters
- Wndarray
Initial matrix representing the basis vectors. Shape: (n_samples, n_components).
- Hndarray
Initial matrix representing the coefficients. Shape: (n_components, n_features).
- Andarray
The input data matrix to be factorized. Shape: (n_samples, n_features).
- max_iterint
The maximum number of iterations for the multiplicative update algorithm.
Returns
- Wndarray
Updated basis matrix after NMF. Shape: (n_samples, n_components).
- Hndarray
Updated coefficient matrix after NMF. Shape: (n_components, n_features).
- normslist
List of Frobenius norms of the difference between A and W @ H for each iteration.
Notes
The update rules for W and H are based on minimizing the Frobenius norm of the difference between A and W @ H. The update for H is performed as:
\[H_{ij} = H_{ij} \frac{(W^T A)_{ij}}{(W^T W H)_{ij} + \epsilon}\]
where ε is a small constant to prevent division by zero.
The update for W has been commented out but follows a similar form. Uncomment the lines under “Update W” to perform updates for W as well.
\[W_{ij} = W_{ij} \frac{(A H^T)_{ij}}{(W H H^T)_{ij} + \epsilon}\]
This method is sensitive to initializations of W and H, and the results may vary across runs.
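The multiplicative update rules can be sketched with NumPy as below. This is an illustrative reimplementation rather than the library's code: the function name, the `eps` default, and the `update_W` flag (which stands in for commenting the W update in or out) are assumptions.

```python
import numpy as np


def multiplicative_update_sketch(W, H, A, max_iter, eps=1e-9, update_W=False):
    """NMF by multiplicative updates; returns (W, H, norms)."""
    W = np.array(W, dtype=float)   # copies, so the inputs are left untouched
    H = np.array(H, dtype=float)
    A = np.asarray(A, dtype=float)
    norms = []
    for _ in range(max_iter):
        # H_ij = H_ij * (W^T A)_ij / ((W^T W H)_ij + eps)
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        if update_W:
            # W_ij = W_ij * (A H^T)_ij / ((W H H^T)_ij + eps)
            W *= (A @ H.T) / (W @ H @ H.T + eps)
        norms.append(float(np.linalg.norm(A - W @ H, "fro")))
    return W, H, norms


rng = np.random.default_rng(0)
A = rng.random((6, 5))     # data matrix, shape (n_samples, n_features)
W0 = rng.random((6, 2))    # initial basis, shape (n_samples, n_components)
H0 = rng.random((2, 5))    # initial coefficients, shape (n_components, n_features)
W, H, norms = multiplicative_update_sketch(W0, H0, A, max_iter=50)
```

Because every update multiplies by a ratio of non-negative terms, W and H stay non-negative as long as their initial values and A are non-negative, and the recorded norms give a simple convergence trace.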
- nico_covariations.Covariations.pathway_analysis(input, NOG_pathway=50, choose_factors_id=[], correlation_with_spearman=True, saveas='pdf', savefigure=False, positively_correlated=True, rps_rpl_mt_genes_included=True, choose_celltypes=[], circlesize=12, pvalue_cutoff_enrichr=0.05, pathwayorganism='Mouse', database=['GO_Biological_Process_2021', 'BioPlanet_2019', 'Reactome_2016'], dotplot_x_order=False, dotplot_y_order=False, pvalue_cutoff=0.05, top_term=10, figsize=(4, 6), dotplot_xticklabels_rot=None, dotplot_yticklabels_rot=None, dotplot_marker='o', dotplot_show_ring=False, object_for_sorting='Adjusted P-value', object_for_color='Adjusted P-value', object_for_xaxis='Odds Ratio', object_for_yaxis='Term', barplot_edgecolor='black', barplot_linewidth=0.5, barplot_ascending_order=True, barplot_colorbar_length_shrink=0.5, barplot_log10_pvalue_roundoff=2, display_plot_as='barplot', fontsize=12, showit=True, transparent_mode=False, dpi=300, input_colormap='hot_r')[source]
Perform pathway enrichment analysis for gene covariations within cell type niches.
This function analyzes the gene covariation identified through NMF or iNMF in gene_covariation_analysis and performs pathway enrichment analysis using the GSEApy library. Enriched pathways associated with specific cell types and NMF latent factors are visualized in bar or dot plots.
Parameters
- inputobject
The main input object, which is the output from the gene_covariation_analysis.
- NOG_pathwayint, optional
The number of top genes associated with each NMF factor to include in the pathway enrichment analysis. If no pathways are observed, increase the number of genes or try different databases. (default is 50)
- choose_factors_idlist, optional
A list of specific factor IDs for which pathway enrichment analysis should be conducted. If empty, enrichment will be computed for all factors. (default is [])
- correlation_with_spearmanbool, optional
If True, uses the Spearman correlation coefficient for gene-factor association; otherwise, uses cosine similarity. (default is True)
- positively_correlatedbool, optional
If True, selects positively correlated genes for enrichment analysis; otherwise, selects negatively correlated genes. (default is True)
- rps_rpl_mt_genes_includedbool, optional
If True, include rps, rpl, and mt- genes in the pathway enrichment analysis; if False, exclude these genes. (default is True)
- pvalue_cutoff_enrichrfloat, optional
The significance threshold (adjusted p-value) passed to gseapy.enrichr. Only enriched terms with an adjusted p-value below this cutoff are shown. This only affects the output figure, not the final output file. (default is 0.05) For details, see https://gseapy.readthedocs.io/en/latest/run.html
- pvalue_cutofffloat, optional
Terms whose column value is below this cutoff are shown. This works only for the columns “Adjusted P-value”, “P-value”, “NOM p-val”, and “FDR q-val”. (default is 0.05) For details, see https://gseapy.readthedocs.io/en/latest/run.html
- pathwayorganismstr, optional
The organism for which to perform pathway analysis, supported by the GSEApy package (e.g., ‘Mouse’, ‘Human’). (default is ‘Mouse’)
- databaselist, optional
A list of pathway databases to use for enrichment analysis in the GSEApy package. Options include ‘GO_Biological_Process_2021’, ‘BioPlanet_2019’, and ‘Reactome_2016’. For the available databases, see https://gseapy.readthedocs.io/en/latest/gseapy_example.html (default is [‘GO_Biological_Process_2021’, ‘BioPlanet_2019’, ‘Reactome_2016’])
- choose_celltypeslist, optional
A list of cell types for which pathway enrichment analysis should be performed. If empty, analysis will be performed for all cell types. (default is [])
- saveasstr, optional
The file format for saving figures, either in PDF or PNG format. (default is ‘pdf’)
- circlesizeint, optional
The size of the dots in the dot plots in pathway enrichment visualization. Increase this value to control marker size in the visualization. (default is 12)
- savefigurebool, optional
If True, saves the generated figures to the specified path. (default is False)
- display_plot_asstr, optional
The format for displaying the pathway analysis plot, either ‘barplot’ or ‘dotplot’. (default is ‘barplot’)
- fontsizeint, optional
The font size for labels in the pathway visualization plots. (default is 12)
- input_colormapstr, optional
The colormap used for visualizing the pathways; any matplotlib colormap name can be used. For all available colormaps, see https://matplotlib.org/stable/users/explain/colors/colormaps.html Popular choices are ‘autumn_r’, ‘RdBu_r’, ‘viridis’, and ‘viridis_r’. (default is ‘hot_r’)
- transparent_modebool, optional
If True, the figures are saved with a transparent background. (default is False)
- showitbool, optional
If True, the figures are shown interactively. (default is True)
- figsizetuple, optional
Dimension of the figure size. (default figure size is (4, 6)).
- top_termint, optional
The number of terms in the barplot. (default is 10)
- dpiint, optional
Resolution in dots per inch for saving the figure. Default is 300.
- object_for_colorstr, optional
The dataFrame column for plotting the color (default is ‘Adjusted P-value’)
- object_for_xaxisstr, optional
The dataFrame column for plotting the xaxis (default is ‘Odds Ratio’)
- object_for_yaxisstr, optional
The dataFrame column for plotting the yaxis (default is ‘Term’)
- object_for_sortingstr, optional
The dataFrame column used for sorting; its values are transformed to -log10 and the top terms are plotted in the barplot (default is ‘Adjusted P-value’)
- barplot_edgecolorstr, optional
The color of barplot edge (default is ‘black’)
- barplot_linewidthfloat, optional
The linewidth of barplot edge (default is 0.5)
- barplot_ascending_orderbool, optional
Order the y-axis in barplot (default is True).
- barplot_colorbar_length_shrinkfloat, optional
Length of colorbar in barplot (default is 0.5).
- barplot_log10_pvalue_roundoffint, optional
Number of decimal places to which the p-value is rounded (default is 2)
- dotplot_x_order, dotplot_y_order, dotplot_xticklabels_rot, dotplot_yticklabels_rot, dotplot_marker, dotplot_show_ring
Passed to gseapy.dotplot. For details, see https://gseapy.readthedocs.io/en/latest/run.html (the default values are False, False, None, None, ‘o’, and False, respectively)
Outputs
The pathway figures are saved in “./nico_out/covariations_R*_F*/Pathway_figures/”.
Notes General
In the sheet names, ‘i’ corresponds to the factor ID.
Columns include factors representing all cell types.
For each factor, genes are sorted based on their association with the factor ID corresponding to the respective sheet.
- A color-coding scheme is used to distinguish genes:
Ligands are depicted in blue.
Receptors are depicted in red.
Genes with both ligand and receptor functions are depicted in magenta.
Notes Enrichr
The information below is taken from the Enrichr documentation; for the original reference, see https://maayanlab.cloud/Enrichr/help#background&q=4
- Enrichr implements four scores to report enrichment results:
p-value
q-value
rank (Z-score) also called Odds Ratio
combined score
The columns are: Term, Overlap, P-value, Odds Ratio, Combined Score, Adjusted P-value, Genes.
The p-value is computed using a standard statistical method used by most enrichment analysis tools: Fisher’s exact test or the hypergeometric test.
This is a binomial proportion test that assumes a binomial distribution and independence for probability of any gene belonging to any set.
The q-value is an adjusted p-value using the Benjamini-Hochberg method for correction for multiple hypotheses testing. You can read more about this method and why it is needed here [https://www.jstor.org/stable/2346101].
- The odds ratio is computed from the following contingency table:
                | In Query     | Not in Query | Row Total
In Gene Set     | a (or x)     | b            | a + b (or m)
Not in Gene Set | c            | d            | c + d (or n)
Column Total    | a + c (or k) | b + d        | a + b + c + d (bg or Annotation Database Total)
oddsRatio = (1.0 * a * d) / Math.max(1.0 * b * c, 1)
- where:
a (x) is the number of overlapping genes,
b (m-x) is the number of genes in the annotated set minus the overlapping genes,
c (k-x) is the number of genes in the input set minus the overlapping genes,
d (bg-m-k+x) is the 20,000 genes (or total genes in the background) minus the genes in the annotated set, minus the genes in the input set, plus the overlapping genes.
- The combined score is a combination of the p-value and odds ratio calculated by multiplying the two scores as follows:
c = -log(p) * oddsRatio
Where c is the combined score, p is the p-value computed using Fisher’s exact test, and oddsRatio is the odds ratio.
The combined score provides a compromise between both methods, and several benchmarks show that it reports the best rankings when compared with the other scoring schemes.
Enrichr provides all four options for sorting enriched terms.
- Python vs R differences (see page 68 of the GSEApy documentation for a more detailed description: https://readthedocs.org/projects/gseapy/downloads/pdf/latest/)
- scipy.stats.hypergeom.sf(k, M, n, N, loc=0):
M: the total number of objects,
n: the total number of Type I objects,
N: the number of objects drawn without replacement from the total population,
k: the random variate, i.e. the number of Type I objects among the N drawn.
R: > phyper(x-1, m, n, k, lower.tail=FALSE)
Python: > hypergeom.sf(x-1, m+n, m, k)
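The scores in these notes can be worked through with a small standard-library sketch. It is an illustration only: the function names are hypothetical, `hypergeom_sf` is a hand-written stand-in that mirrors the argument order of hypergeom.sf(x-1, m+n, m, k), the natural logarithm in the combined score is an assumption, and the counts are made up.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from the 2x2 table, with the denominator floored at 1."""
    return (1.0 * a * d) / max(1.0 * b * c, 1)


def hypergeom_sf(x_minus_1, M, n, N):
    """P(X > x_minus_1) for M objects, n of Type I, and N draws."""
    lower = max(x_minus_1 + 1, 0)
    upper = min(n, N)
    favourable = sum(math.comb(n, i) * math.comb(M - n, N - i)
                     for i in range(lower, upper + 1))
    return favourable / math.comb(M, N)


def combined_score(p, ratio):
    """c = -log(p) * oddsRatio (natural logarithm assumed)."""
    return -math.log(p) * ratio


# Made-up example: 5 overlapping genes, a 20-gene set, a 50-gene query,
# and a background of 20,000 genes.
a, b, c = 5, 15, 45
d = 20000 - 20 - 50 + 5          # bg - m - k + x
m, n, k = a + b, c + d, a + c    # gene set size, genes outside the set, query size

ratio = odds_ratio(a, b, c, d)
p_value = hypergeom_sf(a - 1, m + n, m, k)   # same arguments as hypergeom.sf(x-1, m+n, m, k)
score = combined_score(p_value, ratio)
```

The p-value is the probability of seeing at least `a` overlapping genes by chance, which is why the survival function is evaluated at x-1 rather than x.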
Example
For other available databases, check for species ‘Human,’ ‘Mouse,’ ‘Yeast,’ ‘Fly,’ ‘Fish,’ and ‘Worm’ in the following way:
>>> import gseapy as gp
>>> mouse = gp.get_library_name(organism='Mouse')
>>> human = gp.get_library_name(organism='Human')
- nico_covariations.Covariations.plot_all_ct(CTname, PP, cellsinCT, ax, mycelltype, Fa, cmap, ms, msna)[source]
Visualize factor values in UMAP for all cell types.
This helper function is used for visualizing factor values in UMAP, showing the distribution of cells across different cell types and highlighting specific cell types of interest.
Parameters:
- CTnamelist of str
List of cell type names.
- PPnp.ndarray
UMAP embedding coordinates for all cells.
- cellsinCTdict
Dictionary where keys are cell type names and values are lists of cell indices corresponding to each cell type.
- axmatplotlib.axes.Axes
Matplotlib Axes object where the UMAP plot will be drawn.
- mycelltypelist of str
List of cell types to highlight.
- Fanp.ndarray
Array of factor values corresponding to each cell.
- cmapstr or matplotlib.colors.Colormap
Colormap used for plotting the factor values.
- msint
Marker size for the highlighted cell types.
- msnaint
Marker size for the non-highlighted (NA) cell types.
- nico_covariations.Covariations.plot_cosine_and_spearman_correlation_to_factors(input, choose_celltypes=[], NOG_Fa=30, saveas='pdf', transparent_mode=False, showit=True, dpi=300, figsize=(15, 10))[source]
Plots cosine and Spearman correlation to factors for given cell types.
Parameters:
- inputobject
The main input object containing the output from gene_covariation_analysis.
- choose_celltypeslist, optional
The cell types for which you want to inspect the covariation pattern. If the list is empty, the output will be generated for all cell types. Default is [].
- NOG_Faint, optional
Number of genes to visualize for each factor. Default is 30.
- saveasstr, optional
Format to save the figures. Options are ‘pdf’ or ‘png’. If ‘png’ is chosen, dpi is set to 300. Default is ‘pdf’.
- transparent_modebool, optional
Whether to save the figure with a transparent background. Default is False.
- showitbool, optional
Whether to display the plot interactively. Default is True.
- figsizetuple, optional
Dimensions of the figure size. Default is (15, 10).
Outputs:
The output NMF plots are saved in ./<output_nico_dir>/covariations_R*_F*/NMF_output.
- nico_covariations.Covariations.plot_feature_matrices(input, showit=True, saveas='pdf', transparent_mode=False, dpi=300, figsize=(10, 10))[source]
Plots feature vectors of the spatial factors from the regression step.
- nico_covariations.Covariations.plot_ligand_receptor_in_interacting_celltypes(CC_celltype_name, NC_celltype_name, logRegScore, pc1, pc2, ridgeRegScore, pvalue, Found1, Found2, saveLRplots, LR_plot_Exp_thres, saveas, transparent_mode, showit, figsize, flag, dpi)[source]
Plot ligand-receptor interactions for interacting cell types.
This helper function is used in find_LR_interactions_in_interacting_cell_types to plot rectangle p-value figures representing ligand-receptor interactions between two cell types.
The Y-axis shows the central cell type factors, and the X-axis shows the colocalized neighborhood cell type factors. The circle size denotes the p-value and scales with significance.
Parameters:
- CC_celltype_namestr
Name of the central cell type.
- NC_celltype_namestr
Name of the neighborhood cell type.
- logRegScorefloat
Logistic regression score.
- pc1int
Principal component for the central cell type.
- pc2int
Principal component for the neighborhood cell type.
- ridgeRegScorefloat
Ridge regression score.
- pvaluefloat
P-value for the interaction.
- Found1list
List of found ligand-receptor interactions where the ligand is in the central cell type.
- Found2list
List of found ligand-receptor interactions where the ligand is in the neighborhood cell type.
- saveLRplotsstr
Directory to save the ligand-receptor plots.
- LR_plot_Exp_thresfloat
Expression threshold for plotting.
- saveasstr
File format to save the plots (e.g., ‘png’, ‘pdf’).
- transparent_modebool
Whether to save the plots with a transparent background.
- showitbool
Whether to display the plots.
- figsizetuple
Size of the figure.
- flagstr
Flag indicating which interactions to plot (‘First’, ‘Second’, ‘Both’).
- nico_covariations.Covariations.plot_significant_regression_covariations_as_circleplot(input, choose_celltypes=[], saveas='pdf', pvalue_cutoff=0.05, mention_pvalue=True, transparent_mode=False, showit=True, dpi=300, figsize=(6, 1.25))[source]
Plot significant regression covariations as a circle plot.
This function visualizes the significant regression covariations identified in the gene covariation analysis.
Parameters
- inputobject
The main input is the output from gene_covariation_analysis.
- choose_celltypeslist, optional
The cell type(s) for which you want to inspect the covariation pattern. If the list is empty, the output will be generated for all cell types. Default is [].
- saveasstr, optional
Format to save the figures in, either ‘pdf’ or ‘png’ (dpi for PNG format is 300). Default is ‘pdf’.
- pvalue_cutofffloat, optional
The p-value cutoff used to print the -log10(pvalue) on top of the circle. Default is 0.05.
- mention_pvaluebool, optional
Whether to highlight the p-value on the circle plot. If False, it will not be shown. Default is True.
- transparent_modebool, optional
Background color in the figures. If True, the background will be transparent. Default is False.
- showitbool, optional
Whether to display the plot. If False, the plot will be saved but not shown. Default is True.
- figsizetuple, optional
Dimension of the figure size. Default is (6, 1.25).
Outputs
The regression figures are saved in ‘./nico_out/covariations_R*_F*/Regression_outputs/’.
Notes
The main input is the output from gene_covariation_analysis.
The output directory for saving the figures should exist prior to running this function.
- nico_covariations.Covariations.plot_significant_regression_covariations_as_heatmap(input, choose_celltypes=[], saveas='pdf', transparent_mode=False, showit=True, dpi=300, figsize=(6, 10))[source]
Plot significant regression covariations as a heatmap.
This function visualizes the significant regression covariations from the gene covariation analysis as a heatmap.
Parameters
- inputobject
The main input is the output from the gene_covariation_analysis.
- choose_celltypeslist, optional
The cell types for which you want to inspect the covariation regression pattern. If the list is empty, the output will be generated for all cell types. Default is an empty list [].
- saveasstr, optional
Format to save the figures in, either ‘pdf’ or ‘png’ (dpi for PNG format is 300). Default is ‘pdf’.
- transparent_modebool, optional
Background color in the figures. If True, the background will be transparent. Default is False.
- showitbool, optional
Whether to display the plot. If False, the plot will be saved but not shown. Default is True.
- figsizetuple, optional
Dimension of the figure size. Default is (6, 10).
Outputs
The regression heatmap figures are saved in the format given by saveas. The default save location is ./nico_out/covariations_R*_F*/Regression_outputs/
Notes
Ensure the input contains the necessary data from the gene covariation analysis.
Ensure the output directory exists and is writable before running this function.
- nico_covariations.Covariations.plot_top_genes_for_a_given_celltype_from_all_factors(input, choose_celltypes=[], top_NOG=20, rps_rpl_mt_genes_included=True, correlation_with_spearman=True, saveas='pdf', transparent_mode=False, showit=True, dpi=300, figsize=(12, 10))[source]
Visualize top genes associated with given cell types across all three factors.
This function generates plots of the top N genes associated with specified cell types from the input data. The associations can be visualized using either Spearman correlation coefficient or cosine similarity. Optionally, the visualization can include rps, rpl, and mt genes.
Parameters:
- inputdict
The main input is the output from gene_covariation_analysis.
- choose_celltypeslist, optional
The cell type for which the gene-factor associations should be displayed. If the list is empty, the output will be generated for all the cell types. (default is [])
- top_NOGint, optional
Number of genes to visualize. (default is 20)
- rps_rpl_mt_genes_includedbool, optional
For pathway analysis, decide whether to include rps, rpl, and mt genes. If True, they are included. (default is True)
- correlation_with_spearmanbool, optional
If True, visualize gene-factor association obtained as Spearman correlation coefficient; otherwise, cosine similarity is displayed. (default is True)
- saveasstr, optional
Save the figures in PDF or PNG format (dpi for PNG format is 300). (default is ‘pdf’)
- transparent_modebool, optional
Background color of the figures. If True, the background will be transparent. (default is False)
- showitbool, optional
Whether to display the plot or not. (default is True)
- figsizetuple, optional
Dimension of the figure size. (default is (12, 10))
Outputs:
The gene visualization figures are saved in ./nico_out/covariations_R*_F*/dotplots/*
Example:
>>> input_data = load_data_from_analysis()  # hypothetical function to load data
>>> plot_top_genes_for_a_given_celltype_from_all_factors(input_data, choose_celltypes=['CellType1'], top_NOG=10)
- nico_covariations.Covariations.plot_top_genes_for_pair_of_celltypes_from_two_chosen_factors(input, choose_interacting_celltype_pair, visualize_factors_id, top_NOG=20, dpi=300, rps_rpl_mt_genes_included=True, correlation_with_spearman=True, saveas='pdf', transparent_mode=False, showit=True, figsize=(5, 8))[source]
Visualize top genes associated with a pair of cell types from chosen factors.
This function generates plots of the top genes (top_NOG, 20 by default) in the factors associated with the specified cell types, using either the Spearman correlation coefficient or cosine similarity. The visualizations compare the chosen factors for each cell type.
Parameters:
- inputobject
The main input is the output from gene_covariation_analysis, which includes factor and expression data.
- choose_interacting_celltype_pairlist
Define the cell type pairs for visualization. The first entry is the central cell type, and the second is the niche cell type.
- visualize_factors_idlist
Define the factor IDs for visualization. The first entry is the factor ID of the central cell type, and the second is the factor ID of the niche cell type.
- top_NOGint, optional
Number of genes to visualize. (default is 20)
- rps_rpl_mt_genes_includedbool, optional
For pathway analysis, decide whether to include rps, rpl, and mt genes. If True, they are included. (default is True)
- correlation_with_spearmanbool, optional
If True, visualize gene-factor association obtained as Spearman correlation coefficient; otherwise, cosine similarity is displayed. (default is True)
- saveasstr, optional
Save the figures in PDF or PNG format (dpi for PNG format is 300). (default is ‘pdf’)
- transparent_modebool, optional
Background color of the figures. If True, the background will be transparent. (default is False)
- showitbool, optional
Whether to display the plot or not. (default is True)
- figsizetuple, optional
Dimension of the figure size. (default is (5, 8))
Outputs:
The gene visualization figures are saved in ./nico_out/covariations_R*_F*/dotplots/*
Example:
>>> scov.plot_top_genes_for_pair_of_celltypes_from_two_chosen_factors(cov_out, choose_interacting_celltype_pair=['Stem/TA','Paneth'], visualize_factors_id=[1,1], top_NOG=20,saveas=saveas,transparent_mode=transparent_mode)
- nico_covariations.Covariations.read_LigRecDb(contdb)[source]
Reads the ligand-receptor database file.
This function processes a database of ligand-receptor pairs, identifying unique ligands, receptors, and elements that act as both. The input should be a list of strings, where each string represents a line from the database file.
Parameters:
- contdblist of str
A list of strings, where each string is a line from the ligand-receptor database file. Each line contains a ligand and a receptor separated by whitespace.
Example:
>>> contdb = [
...     "LIG1 REC1",
...     "LIG2 REC2",
...     "REC1 LIG1",  # REC1 is both a receptor and a ligand
...     "LIG3 REC3"
... ]
>>> totalLRpairs, ligand, receptor, either = read_LigRecDb(contdb)
>>> print(totalLRpairs)
[['LIG1', 'REC1'], ['LIG2', 'REC2'], ['REC1', 'LIG1'], ['LIG3', 'REC3']]
>>> print(ligand)
{'LIG2': 1, 'LIG3': 1}
>>> print(receptor)
{'REC2': 1, 'REC3': 1}
>>> print(either)
{'LIG1': 1, 'REC1': 1}
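The classification described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming each line holds a ligand followed by a receptor separated by whitespace; the function name classify_ligand_receptor_pairs is hypothetical and this is not the NiCo implementation.

```python
def classify_ligand_receptor_pairs(lines):
    """Split 'ligand receptor' lines into pairs and three role dictionaries."""
    # Keep only lines that contain at least a ligand and a receptor.
    pairs = [line.split()[:2] for line in lines if len(line.split()) >= 2]
    ligands = {lig for lig, _ in pairs}
    receptors = {rec for _, rec in pairs}
    both = ligands & receptors  # genes that appear in both roles
    ligand_only = {gene: 1 for gene in sorted(ligands - both)}
    receptor_only = {gene: 1 for gene in sorted(receptors - both)}
    either = {gene: 1 for gene in sorted(both)}
    return pairs, ligand_only, receptor_only, either


example_lines = ["LIG1 REC1", "LIG2 REC2", "REC1 LIG1", "LIG3 REC3"]
pairs, ligand_only, receptor_only, either = classify_ligand_receptor_pairs(example_lines)
```

Set intersection is used so that a gene listed in both columns is reported once under either, rather than under both ligand and receptor.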
- nico_covariations.Covariations.read_spatial_data(clusterFilename, celltypeFilename)[source]
Read the cluster information for spatial data.
This helper function is used within the gene_covariation_analysis function to read the cluster and cell type information from the specified files.
Parameters
- clusterFilenamestr
The file path of the cluster information file.
- celltypeFilenamestr
The file path of the cell type information file.
- nico_covariations.Covariations.remove_extra_character_from_name(name)[source]
Remove special characters from cell type names to avoid errors while saving figures.
This function replaces certain special characters in the input name with underscores or other appropriate characters to ensure the name is safe for use as a filename.
Parameters
- namestr
The original cell type name that may contain special characters.
Returns
- str
The modified cell type name with special characters removed or replaced.
Example
>>> name = 'T-cell (CD4+)/CD8+'
>>> clean_name = remove_extra_character_from_name(name)
>>> print(clean_name)
Tncell_CD4p_CD8p
Notes
The following replacements are made:
‘/’ is replaced with ‘_’
‘ ‘ (space) is replaced with ‘_’
‘”’ (double quote) is removed
“’” (single quote) is removed
‘)’ is removed
‘(’ is removed
‘+’ is replaced with ‘p’
‘-’ is replaced with ‘n’
‘.’ (dot) is removed
These substitutions help in creating filenames that do not contain characters that might be problematic for file systems or software.
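The replacement rules listed in the Notes can be expressed as an ordered table of substitutions. The sketch below is an illustration built only from that list; the names REPLACEMENTS and make_filename_safe are hypothetical, and the actual function may be written differently.

```python
# Replacement rules taken from the Notes, applied in the listed order.
REPLACEMENTS = [
    ("/", "_"),
    (" ", "_"),
    ('"', ""),
    ("'", ""),
    (")", ""),
    ("(", ""),
    ("+", "p"),
    ("-", "n"),
    (".", ""),
]


def make_filename_safe(name):
    """Apply each substitution in turn and return the cleaned name."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name


stem_name = make_filename_safe("Stem/TA")
tcell_name = make_filename_safe("CD8+ T cell")
```

Keeping the rules in a list makes the order explicit and makes it easy to see which characters end up in saved figure names.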
- nico_covariations.Covariations.run_ridge_regression(input, saveoutname, ylabelname, target, neighborhoodClass, shap_cluster_cutoff)[source]
Helper function for model_linear_regression to perform ridge regression per cell type.
This function performs ridge regression for each target variable (central cell factors) using the neighborhood average spatial cell factors as predictors. It normalizes the data, fits the regression model, and computes various statistics including p-values and explained variance.
Parameters
- inputobject
An object containing the following attributes:
  shap_analysis : bool, whether to perform SHAP analysis.
  regression_outdir : str, directory to save regression outputs.
  lambda_c : list, list of regularization parameters for RidgeCV.
  no_of_pc : int, number of principal components.
- saveoutnamestr
The name to save the output of the regression results.
- ylabelnamelist
List of feature names for the predictors.
- targetarray
The target values (central cell factors).
- neighborhoodClassarray
The neighborhood average spatial cell factors.
- shap_cluster_cutofffloat
The cutoff value for clustering in SHAP analysis.
Returns
- coefarray
Coefficients of the ridge regression model.
- interceptarray
Intercepts of the ridge regression model.
- lambda_clist
List of regularization parameters used in the ridge regression model.
- percent_variance_explainedlist
Percentage of variance explained by the model.
- residual_variance_explainedfloat
Residual variance explained by the model (currently set to 0).
- pvarray
P-values of the regression coefficients.
Notes
The function normalizes the predictors and target variables.
It fits a ridge regression model for each target variable and computes various statistics.
If SHAP analysis is enabled, it performs SHAP analysis and saves the plots.
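To make the normalize-then-fit steps in the Notes concrete, here is a minimal NumPy sketch of ridge regression on standardized data. It uses the closed-form ridge solution with a single fixed regularization value instead of the cross-validated RidgeCV mentioned for lambda_c, it runs on synthetic data, and the names zscore and ridge_fit are hypothetical. It is not the NiCo implementation and it omits the p-value and SHAP steps.

```python
import numpy as np


def zscore(a):
    """Column-wise standardization (assumes no column has zero variance)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def ridge_fit(predictors, target, lam):
    """Closed-form ridge fit; returns one coefficient row per target column."""
    X = zscore(predictors)
    Y = zscore(target)
    n_features = X.shape[1]
    coef = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)
    residual = Y - X @ coef
    explained = 100.0 * (1.0 - residual.var(axis=0) / Y.var(axis=0))
    return coef.T, explained


rng = np.random.default_rng(0)
neighborhood = rng.normal(size=(200, 4))  # synthetic stand-in for neighborhood factors
true_coef = np.array([[1.0, 0.0], [0.0, -1.0], [0.5, 0.5], [0.0, 0.0]])
target = neighborhood @ true_coef + 0.1 * rng.normal(size=(200, 2))
coef, explained = ridge_fit(neighborhood, target, lam=1.0)
```

Each row of coef holds the coefficients for one target column, and explained gives the percentage of variance captured for that column.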
- nico_covariations.Covariations.save_LR_interactions_in_excelsheet_and_regression_summary_in_textfile_for_interacting_cell_types(input, pvalueCutoff=0.05, correlation_with_spearman=True, LR_plot_NMF_Fa_thres=0.2, LR_plot_Exp_thres=0.2, number_of_top_genes_to_print=20)[source]
Save ligand-receptor (LR) interactions in an Excel sheet and regression summary in a text file for interacting cell types.
This function processes the output from gene_covariation_analysis to identify significant LR interactions and saves the results in an Excel sheet and a text file.
Parameters:
- inputobject
The main input is the output from gene_covariation_analysis.
- pvalueCutofffloat, optional
The cutoff used to select the significant central cell type factor and niche cell type factor covariations. Default is 0.05.
- correlation_with_spearmanbool, optional
If True, genes factor correlations are computed as Spearman correlation coefficient; otherwise, cosine similarities are computed. Default is True.
- LR_plot_NMF_Fa_thresfloat, optional
Only ligands or receptors are retained that exhibit a correlation to the respective factors higher than this cutoff. Default is 0.2.
- LR_plot_Exp_thresfloat, optional
Only ligands or receptors are retained that are expressed in a fraction of cells of the respective cell types exceeding this cutoff. Default is 0.2.
- number_of_top_genes_to_printint, optional
The number of top correlating genes to print in the regression summary text file. Default is 20.
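The two thresholds LR_plot_NMF_Fa_thres and LR_plot_Exp_thres act as a joint filter on each ligand or receptor. The sketch below illustrates that filter on made-up values; the function retain_gene and the candidate genes are hypothetical and only mirror the parameter descriptions above.

```python
def retain_gene(factor_correlation, expressing_fraction,
                LR_plot_NMF_Fa_thres=0.2, LR_plot_Exp_thres=0.2):
    """Keep a gene only if it passes both the correlation and the expression cutoff."""
    return (factor_correlation > LR_plot_NMF_Fa_thres
            and expressing_fraction > LR_plot_Exp_thres)


# Made-up values: gene -> (correlation to the factor, fraction of expressing cells)
candidates = {
    "LigA": (0.45, 0.60),  # passes both cutoffs
    "LigB": (0.10, 0.80),  # correlation too low
    "RecC": (0.35, 0.05),  # expressed in too few cells
}
kept = [gene for gene, (corr, frac) in candidates.items() if retain_gene(corr, frac)]
```

Raising either cutoff shortens the list of reported ligand-receptor pairs.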
Outputs:
An Excel sheet with ligand-receptor interaction information for easy access. The columns are structured as follows in the sheets:
A. ID of the cell type-cell type interaction
B, C. Interacting cell types A and B
D. Normalized interaction scores from the logistic regression classifier
E, F. NMF factor IDs (metagenes) in cell types A and B
G. Ridge regression coefficient indicating the factors’ covariation
H. Ligand in cell type A
I. Receptor in cell type B
J, K. Pearson correlation of ligand and receptor genes in cell types A and B with the corresponding factors
L, M. Average expression of ligands and receptors in cell types A and B
N, O. Fraction of cells expressing these genes with counts greater than zero in cell types A and B
A regression summary text file with the following structure:
First row: CC-Fa(i), CC (cell type), niche_score (from classifier), NC-Fa*, NC (cell type), RegCoeff (covariation score), p-value on normal scale, p-value on -log10 scale
Second row: Top 20 (number_of_top_genes_to_print) genes correlated to Fa(i) of central cell type, with genes and their factor ID indicated in the pair
Third row: Top 20 (number_of_top_genes_to_print) genes correlated to Fa(j) of niche cell type, with genes and their factor ID indicated in the pair
Notes
Our analysis accounts for bidirectional cellular crosstalk interactions of ligands and receptors in cell types A and B.
The ligand can be expressed on cell type A and signal to the receptor detected on cell type B, or vice versa.
Both ligand-receptor plots and Excel sheets profile bidirectional cellular crosstalk of ligand and receptor in cell types A and B.
Each central cell type is represented in a separate Excel sheet, while the LR enrichment sheet aggregates all interactions across central cell types.
- nico_covariations.Covariations.sort_index_in_right_order(correct, wrong)[source]
Sorts the ‘wrong’ array to match the order of the ‘correct’ array based on the first column values.
This function reorders the rows of the ‘wrong’ array to match the order of the ‘correct’ array based on the values in the first column. It is a helper function used in visualizing cell type annotations.
Parameters:
- correctndarray
An array with the correct order of elements. The sorting is based on the values in the first column.
- wrongndarray
An array that needs to be reordered to match the ‘correct’ array. The sorting is based on the values in the first column.
Returns:
- rightndarray
The ‘wrong’ array reordered to match the order of the ‘correct’ array based on the first column values.
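The reordering can be sketched with a lookup from first-column value to row position. This is a minimal illustration that assumes every identifier occurs exactly once in both arrays; the function name reorder_rows_like and the sample arrays are hypothetical.

```python
import numpy as np


def reorder_rows_like(correct, wrong):
    """Reorder the rows of `wrong` so its first column follows that of `correct`."""
    # Map each identifier in `wrong` to the row where it occurs.
    position = {key: row for row, key in enumerate(wrong[:, 0])}
    order = [position[key] for key in correct[:, 0]]
    return wrong[order]


correct = np.array([["c3", "x"], ["c1", "y"], ["c2", "z"]])
wrong = np.array([["c1", "B"], ["c2", "T"], ["c3", "M"]])
right = reorder_rows_like(correct, wrong)
```

Here right carries the rows of wrong in the order c3, c1, c2, so the two arrays can be compared row by row.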
- nico_covariations.Covariations.sorting_of_factors_for_showing_the_value_in_excelsheet(CC_corr, no_of_pc, gene, genenames)[source]
Sort factor values for displaying in an Excel sheet.
This helper function is used in make_excel_sheet_for_gene_correlation to sort the factor values from correlation data. It separates the sorted values into all genes and a subset of common genes.
Parameters:
- CC_corrnumpy.ndarray
Array of correlation values, where rows represent genes and columns represent principal components.
- no_of_pcint
Number of principal components.
- genelist of str
List of gene names corresponding to the rows in CC_corr.
- genenameslist of str
List of gene names to be included in the common subset.
- nico_covariations.Covariations.top_genes_in_correlation_list_without(genename, corr_NMFfactors_genes, n_top_words)[source]
Identify top genes associated with NMF factors, excluding duplicates.
This helper function sorts the factor values and selects the top genes associated with each factor. It is used in plot_cosine_and_spearman_correlation_to_factors.
Parameters
- genenamenumpy.ndarray or pandas.Series
Array or Series containing gene names.
- corr_NMFfactors_genesnumpy.ndarray or pandas.DataFrame
The matrix representing the correlation values between genes and NMF factors. Each row corresponds to a gene, and each column corresponds to an NMF factor.
- n_top_wordsint
The number of top genes to retrieve for each NMF factor.
Returns
- gnamenumpy.ndarray
Array containing the names of the top genes associated with the NMF factors.
- matnumpy.ndarray
Matrix containing the correlation values of the top genes associated with the NMF factors. Each row corresponds to a selected top gene, and each column corresponds to an NMF factor.
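The selection of top genes without duplicates can be sketched as follows. This illustration assumes that genes are ranked by the absolute value of their correlation within each factor, which is an assumption and not stated in the description above; the function name top_genes_without_duplicates and the sample data are hypothetical.

```python
import numpy as np


def top_genes_without_duplicates(gene_names, corr, n_top):
    """Take the n_top genes per factor by |correlation|, keeping each gene once."""
    gene_names = np.asarray(gene_names)
    corr = np.asarray(corr, dtype=float)
    selected = []
    for factor in range(corr.shape[1]):
        ranked = np.argsort(-np.abs(corr[:, factor]), kind="stable")[:n_top]
        for idx in ranked:
            if idx not in selected:  # skip genes already chosen for another factor
                selected.append(int(idx))
    return gene_names[selected], corr[selected, :]


genes = np.array(["g1", "g2", "g3", "g4"])
corr = np.array([[0.9, 0.8], [-0.8, 0.2], [0.1, 0.7], [0.0, -0.95]])
gname, mat = top_genes_without_duplicates(genes, corr, n_top=2)
```

In this example g1 ranks in the top two for both factors but appears only once in gname, so mat has three rows instead of four.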
- nico_covariations.Covariations.triangulation_for_triheatmap(M, N)[source]
Create triangulation for plotting a ligand-receptor map.
This helper function generates the triangulation needed to draw the four triangular faces of each rectangle in the plot_ligand_receptor_in_interacting_celltypes function. It constructs the vertices and triangles required for visualizing the ligand-receptor interactions on a heatmap.
Parameters:
- Mint
Number of columns in the heatmap.
- Nint
Number of rows in the heatmap.
Returns:
list of matplotlib.tri.Triangulation
- nico_covariations.Covariations.visualize_factors_in_scRNAseq_umap(input, choose_interacting_celltype_pair, visualize_factors_id, umap_tag='X_umap', msna=0.1, ms=5, cmap='viridis', dpi=300, saveas='pdf', transparent_mode=False, showit=True, figsize=(8, 3.5))[source]
Visualize factors in scRNAseq UMAP embedding.
This function visualizes the factors in single-cell RNA sequencing (scRNAseq) UMAP embeddings. It highlights the interactions between specified cell type pairs and their corresponding factor IDs.
Parameters:
- inputobject
The primary input is the output from gene_covariation_analysis.
- choose_interacting_celltype_pairlist
List defining a single cell type or a pair of cell types to visualize. The user must provide at least one cell type.
- visualize_factors_idlist
List defining a single factor ID or a pair of factor IDs to visualize in the UMAP. The chosen factors correspond, in order, to the cell types defined in choose_interacting_celltype_pair.
- umap_tagstr, optional
The UMAP embedding tag in the .obsm field of the AnnData object (default is ‘X_umap’).
- msnafloat, optional
The marker size for non-selected (NA) cell types (default is 0.1).
- msint, optional
The marker size for selected cell types (default is 5).
- cmapstr, optional
Colormap for visualizing factors (default is ‘viridis’).
- saveasstr, optional
Format to save the figures (‘pdf’ or ‘png’) (default is ‘pdf’).
- transparent_modebool, optional
Whether to save the figures with a transparent background (default is False).
- showitbool, optional
Whether to display the figures (default is True).
- figsizetuple, optional
Size of the figure (default is (8, 3.5)).
Outputs:
The factor visualization in scRNAseq embedding is saved in “./nico_out/covariations_R*_F*/scRNAseq_factors_in_umap”.
- nico_covariations.Covariations.visualize_factors_in_spatial_umap(input, choose_interacting_celltype_pair, visualize_factors_id, umap_tag='X_umap', quepath='./inputQuery/', msna=0.1, ms=5, cmap='viridis', dpi=300, saveas='pdf', transparent_mode=False, showit=True, figsize=(8, 3.5))[source]
Visualize factors in spatial UMAP for cell type interactions.
This function is used to visualize the factors of interacting cell types in a spatial UMAP embedding. It generates and saves plots showing the distribution and factor values of cells.
Parameters:
- inputobject
The primary input is the output from gene_covariation_analysis.
- choose_interacting_celltype_pairlist of str
Define a single cell type or a pair of cell types for visualization in the spatial UMAP. Example: choose_interacting_celltype_pair=['CentralCellType', 'NicheCellType']
- visualize_factors_idlist of int
Define a single factor ID or a pair of factor IDs for visualization in the spatial UMAP. Example: visualize_factors_id=[1, 3]
- umap_tagstr, optional
Slot for UMAP embedding in the AnnData object. Default is ‘X_umap’.
- quepathstr, optional
Path to the directory containing the query spatial count matrix, normalized in a scTransform-like manner in the common gene space. The filename should be sct_spatial.h5ad. Default is ‘./inputQuery/’.
- msnafloat, optional
Marker size for not selected (NA) cell types. Default is 0.1.
- msfloat, optional
Marker size for selected cell types. Default is 5.
- cmapstr or matplotlib.colors.Colormap, optional
Colormap used for visualizing the factor values. Default is ‘viridis’.
- saveasstr, optional
Format to save the figures, either ‘pdf’ or ‘png’. Default is ‘pdf’.
- transparent_modebool, optional
Background color of the figures. If True, the figures have a transparent background. Default is False.
- showitbool, optional
If True, the figures will be displayed. Default is True.
- figsizetuple of float, optional
Dimension of the figure size. Default is (8, 3.5).
Output:
The output figure will be saved in nico_out/covariations_R*_F*/spatial_factors_in_umap*.