Package 'ocrRBBR'

Title: Explain Gene Expression with Boolean Rules of Chromatin States
Description: Infers Boolean rules among cis-regulatory regions using paired chromatin accessibility and gene expression data at bulk and single-cell levels. Links regulatory regions to target genes, providing insights into gene regulation mechanisms.
Authors: Seyed Amir Malekpour [aut, cre]
Maintainer: Seyed Amir Malekpour <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2026-05-11 06:47:20 UTC
Source: https://github.com/compbioipm/ocrrbbr

Help Index


Estimate Effective Sample Size from Single-Cell RNA-seq Data

Description

This function estimates the effective sample size (ESS) of single-cell RNA-seq data by accounting for correlation among cells within the same cell type.

Usage

ESS(rnaseq_data, cell_type, verbose = FALSE)

Arguments

rnaseq_data

A numeric matrix of RNA-seq expression values. Rows correspond to genes and columns correspond to cells. Expression values are assumed to be normalized (e.g., Seurat LogNormalize with scale.factor = 1e4).

cell_type

A data frame containing cell-type information. Row names must match the column names of rnaseq_data. Must include a column named cell_type specifying cell types.

verbose

Logical. If TRUE, progress messages are printed.

Value

A numeric value representing the effective sample size (ESS), adjusted for within-cell-type correlation.


Human ATAC-seq data

Description

Human ATAC-seq data

Usage

human_atacseq_data

Format

An object of class dgCMatrix with 43 rows and 9834 columns.


Human metadata

Description

Human metadata

Usage

human_meta_data

Format

An object of class matrix (inherits from array) with 9834 rows and 3 columns.


Human peaks GRanges

Description

Human peaks GRanges

Usage

human_peaks_gr

Format

An object of class GRanges of length 108377.


Human RNA-seq data

Description

Human RNA-seq data

Usage

human_rnaseq_data

Format

An object of class dgCMatrix with 2 rows and 9834 columns.


Mouse ATAC-seq data

Description

Mouse ATAC-seq data

Usage

mouse_atacseq_data

Format

An object of class data.frame with 23 rows and 85 columns.


Mouse peaks GRanges

Description

Mouse peaks GRanges

Usage

mouse_peaks_gr

Format

An object of class GRanges of length 512595.


Mouse RNA-seq data

Description

Mouse RNA-seq data

Usage

mouse_rnaseq_data

Format

An object of class data.frame with 2 rows and 85 columns.


Predicts OCR-driven Boolean rules for a gene

Description

This function predicts Boolean rule sets for a given gene using bulk-level multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.

Usage

ocrRBBR_bulk(
  rnaseq_data,
  atacseq_data,
  gene_name,
  peak_ids,
  max_feature = NA,
  slope = NA,
  num_cores = NA,
  verbose = FALSE
)

Arguments

rnaseq_data

A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns correspond to cell types or samples. **Note:** ocrRBBR was tested using **quantile-normalized RNA-seq data**, but it should also work equally well on **TPM-normalized RNA-seq datasets**, provided the data is appropriately scaled across samples.

atacseq_data

A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks, columns correspond to cell types or samples. Column names must match those of rnaseq_data. **Note:** Similar to RNA-seq data, ocrRBBR is tested using **quantile-normalized ATAC-seq data** but is expected to work with other normalization methods, as long as the data distributions are comparable across samples.

gene_name

A character string specifying the gene for which to infer Boolean rules.

peak_ids

A vector of peak identifiers corresponding to rows in atacseq_data to be used as candidate regulatory regions for gene_name.

max_feature

An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3.

slope

The slope parameter for the sigmoid activation function. Default is 10.

num_cores

Number of parallel workers to use for computation. Adjust according to your system. Default is NA (automatic selection).

verbose

Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE.

Value

A list containing predicted Boolean rules and associated metrics for the input gene.

Examples

# Load bulk mouse dataset
data(multiome_human_mouse)  # loads atacseq_data, rnaseq_data, peaks_gr

# Example usage:
peak_ids <- c(278352, 278362, 278381, 278384)

boolean_rules <- ocrRBBR_bulk(mouse_rnaseq_data, mouse_atacseq_data, "Rag2",
    peak_ids = peak_ids, max_feature = 3, slope = 10, num_cores = 1)

Predicts OCR-driven Boolean rules for a gene

Description

This function predicts Boolean rule sets for a given gene using single-cell multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.

Usage

ocrRBBR_single_cell(
  rnaseq_data,
  atacseq_data,
  gene_name,
  peak_ids,
  max_feature = NA,
  slope = NA,
  num_cores = NA,
  ESS = NA,
  meta_data,
  verbose = FALSE
)

Arguments

rnaseq_data

A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns correspond to cells or samples. RNA-seq values are assumed to be normalized using Seurat's LogNormalize method with a scale factor of 10,000: NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 1e4).

atacseq_data

A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks, columns correspond to cells or samples. ATAC-seq counts are assumed to be normalized by the ReadsInTSS metric on a per-cell basis, where raw peak counts are divided by the number of Tn5 insertions falling within transcription start site (TSS) regions for each cell. This normalization corrects for differences in sequencing depth and chromatin accessibility signal across cells.

ReadsInTSS values are typically obtained from ArchR and applied as column-wise scaling factors to the ATAC-seq count matrix prior to downstream analysis.

gene_name

A character string specifying the gene for which to infer Boolean rules.

peak_ids

A vector of peak identifiers corresponding to rows in atacseq_data to be used as candidate regulatory regions for gene_name.

max_feature

An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3.

slope

The slope parameter for the sigmoid activation function. Default is 10.

num_cores

Number of parallel workers to use for computation. Adjust according to your system. Default is NA.

ESS

Effective sample size of the single-cell data after accounting for noise and cell-to-cell correlation.

meta_data

A numeric matrix or data.frame containing additional per-cell covariates (rows = cells, columns = covariates), such as:

**nCount_RNA**: The total number of RNA molecules (unique molecular identifiers, UMIs) detected per cell, calculated as the sum of UMI counts across all genes.

**nFeature_RNA**: The number of genes detected per cell (genes with at least one UMI).

**Mitochondrial percentage**: The percentage of reads that map to mitochondrial genes, which can be used to assess the quality of the sample.

This information is typically stored as columns in the meta_data object, which is associated with each cell in the dataset.

verbose

Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE.

Value

A list containing predicted Boolean rules and associated metrics for the input gene.

Examples

# Load single-cell human dataset
data(multiome_human_mouse)  # loads atacseq_data, rnaseq_data, peaks_gr, meta_data

# Example usage:
peak_ids <- c(83456, 83458, 83460)

boolean_rules <- ocrRBBR_single_cell(human_rnaseq_data, human_atacseq_data,
       "CD74", peak_ids = peak_ids, max_feature = 3, slope = 6,
        num_cores = 1, ESS = 261, meta_data = human_meta_data)