| Title: | Explain Gene Expression with Boolean Rules of Chromatin States |
|---|---|
| Description: | Infers Boolean rules among cis-regulatory regions using paired chromatin accessibility and gene expression data at bulk and single-cell levels. Links regulatory regions to target genes, providing insights into gene regulation mechanisms. |
| Authors: | Seyed Amir Malekpour [aut, cre] |
| Maintainer: | Seyed Amir Malekpour <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-11 06:47:20 UTC |
| Source: | https://github.com/compbioipm/ocrrbbr |
This function estimates the effective sample size (ESS) of single-cell RNA-seq data by accounting for correlation among cells within the same cell type.
ESS(rnaseq_data, cell_type, verbose = FALSE)ESS(rnaseq_data, cell_type, verbose = FALSE)
rnaseq_data |
A numeric matrix of RNA-seq expression values.
Rows correspond to genes and columns correspond to cells.
Expression values are assumed to be normalized (e.g., |
cell_type |
A data frame containing cell-type information.
Row names must match the column names of |
verbose |
Logical. If TRUE, progress messages are printed. |
A numeric value representing the effective sample size (ESS), adjusted for within-cell-type correlation.
Human ATAC-seq data
human_atacseq_datahuman_atacseq_data
An object of class dgCMatrix with 43 rows and 9834 columns.
Human metadata
human_meta_datahuman_meta_data
An object of class matrix (inherits from array) with 9834 rows and 3 columns.
Human peaks GRanges
human_peaks_grhuman_peaks_gr
An object of class GRanges of length 108377.
Human RNA-seq data
human_rnaseq_datahuman_rnaseq_data
An object of class dgCMatrix with 2 rows and 9834 columns.
This function identifies ATAC-seq peaks that are located within a specified window around the transcription start sites (TSS) of genes, and links those peaks to the respective genes.
link_peaks_to_tss(gtf_file, peaks_gr, gene_list = NA, tss_window = NA)link_peaks_to_tss(gtf_file, peaks_gr, gene_list = NA, tss_window = NA)
gtf_file |
A character string specifying the path to a GTF file containing gene annotations. |
peaks_gr |
A |
gene_list |
A character vector of gene names to filter for. Only peaks within the window around TSS of these genes will be considered. Default is |
tss_window |
An integer specifying the window size around the TSS. Default is |
A data frame containing the following columns:
peak |
The genomic coordinates of the peaks. |
peak_id |
The unique ID of each peak. |
gene_id |
The gene ID associated with the peak. |
gene_name |
The gene name associated with the peak. |
transcript_id |
The transcript ID associated with the peak. |
gene_type |
The type of the gene (e.g., protein-coding). |
distance |
The distance from the peak center to the TSS. |
# Load bulk mouse dataset data(multiome_human_mouse) # This will load atacseq_data, rnaseq_data, peaks_gr # Path to the GTF file in the package gtf_file <- system.file("extdata", "gencode.vM25.annotation.sample.gtf", package = "ocrRBBR") # Example usage for linking peaks to TSS linked_peaks <- link_peaks_to_tss( gtf_file = gtf_file, peaks_gr = mouse_peaks_gr, gene_list = c("Rag2"), tss_window = 100000 ) # Filter results for a specific gene linked_peaks_gene <- linked_peaks[linked_peaks$gene_name == "Rag2", ]# Load bulk mouse dataset data(multiome_human_mouse) # This will load atacseq_data, rnaseq_data, peaks_gr # Path to the GTF file in the package gtf_file <- system.file("extdata", "gencode.vM25.annotation.sample.gtf", package = "ocrRBBR") # Example usage for linking peaks to TSS linked_peaks <- link_peaks_to_tss( gtf_file = gtf_file, peaks_gr = mouse_peaks_gr, gene_list = c("Rag2"), tss_window = 100000 ) # Filter results for a specific gene linked_peaks_gene <- linked_peaks[linked_peaks$gene_name == "Rag2", ]
Mouse ATAC-seq data
mouse_atacseq_datamouse_atacseq_data
An object of class data.frame with 23 rows and 85 columns.
Mouse peaks GRanges
mouse_peaks_grmouse_peaks_gr
An object of class GRanges of length 512595.
Mouse RNA-seq data
mouse_rnaseq_datamouse_rnaseq_data
An object of class data.frame with 2 rows and 85 columns.
This function predicts Boolean rule sets for a given gene using bulk-level multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.
ocrRBBR_bulk( rnaseq_data, atacseq_data, gene_name, peak_ids, max_feature = NA, slope = NA, num_cores = NA, verbose = FALSE )ocrRBBR_bulk( rnaseq_data, atacseq_data, gene_name, peak_ids, max_feature = NA, slope = NA, num_cores = NA, verbose = FALSE )
rnaseq_data |
A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns correspond to cell types or samples.
**Note:** |
atacseq_data |
A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks, columns correspond to cell types or samples. Column names must match those of |
gene_name |
A character string specifying the gene for which to infer Boolean rules. |
peak_ids |
A vector of peak identifiers corresponding to rows in |
max_feature |
An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3. |
slope |
The slope parameter for the sigmoid activation function. Default is 10. |
num_cores |
Number of parallel workers to use for computation. Adjust according to your system. Default is NA (automatic selection). |
verbose |
Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE. |
A list containing predicted Boolean rules and associated metrics for the input gene.
# Load bulk mouse dataset data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr # Example usage: peak_ids <- c(278352, 278362, 278381, 278384) boolean_rules <- ocrRBBR_bulk(mouse_rnaseq_data, mouse_atacseq_data, "Rag2", peak_ids = peak_ids, max_feature = 3, slope = 10, num_cores = 1)# Load bulk mouse dataset data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr # Example usage: peak_ids <- c(278352, 278362, 278381, 278384) boolean_rules <- ocrRBBR_bulk(mouse_rnaseq_data, mouse_atacseq_data, "Rag2", peak_ids = peak_ids, max_feature = 3, slope = 10, num_cores = 1)
This function predicts Boolean rule sets for a given gene using single-cell multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.
ocrRBBR_single_cell( rnaseq_data, atacseq_data, gene_name, peak_ids, max_feature = NA, slope = NA, num_cores = NA, ESS = NA, meta_data, verbose = FALSE )ocrRBBR_single_cell( rnaseq_data, atacseq_data, gene_name, peak_ids, max_feature = NA, slope = NA, num_cores = NA, ESS = NA, meta_data, verbose = FALSE )
rnaseq_data |
A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns
correspond to cells or samples. RNA-seq values are assumed to be normalized using Seurat's
|
atacseq_data |
A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks,
columns correspond to cells or samples. ATAC-seq counts are assumed to be normalized
by the
|
gene_name |
A character string specifying the gene for which to infer Boolean rules. |
peak_ids |
A vector of peak identifiers corresponding to rows in |
max_feature |
An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3. |
slope |
The slope parameter for the sigmoid activation function. Default is 10. |
num_cores |
Number of parallel workers to use for computation. Adjust according to your system. Default is NA. |
ESS |
Effective sample size of the single-cell data after accounting for noise and cell-to-cell correlation. |
meta_data |
A numeric matrix or data.frame containing additional per-cell covariates (rows = cells, columns = covariates), such as: **nCount_RNA**: The total number of RNA molecules (unique molecular identifiers, UMIs) detected per cell, calculated as the sum of UMI counts across all genes. **nFeature_RNA**: The number of genes detected per cell (genes with at least one UMI). **Mitochondrial percentage**: The percentage of reads that map to mitochondrial genes, which can be used to assess the quality of the sample. This information is typically stored as columns in the |
verbose |
Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE. |
A list containing predicted Boolean rules and associated metrics for the input gene.
# Load single-cell human dataset data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr, meta_data # Example usage: peak_ids <- c(83456, 83458, 83460) boolean_rules <- ocrRBBR_single_cell(human_rnaseq_data, human_atacseq_data, "CD74", peak_ids = peak_ids, max_feature = 3, slope = 6, num_cores = 1, ESS = 261, meta_data = human_meta_data)# Load single-cell human dataset data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr, meta_data # Example usage: peak_ids <- c(83456, 83458, 83460) boolean_rules <- ocrRBBR_single_cell(human_rnaseq_data, human_atacseq_data, "CD74", peak_ids = peak_ids, max_feature = 3, slope = 6, num_cores = 1, ESS = 261, meta_data = human_meta_data)