HTGenomeAnalysisUnit/eQTL_luthier

eQTL_luthier

Make nicely crafted violins to represent the effect of an SNV on gene expression.

Installation

For the moment, just source (or import) the eQTL_luthier.R script.

Usage

The main function is eqtl_plot(), which takes care of fetching the data, plotting it, and saving the output if desired. The function always returns a ggplot object invisibly that you can further customize if needed.

NB: at the moment, the function only supports celltype_2 cell types.

eqtl_plot <- function(genotype_db, gene_exp_parquet, gene_exp_type, cohort, gene_id, snp_id, celltype, plot_filename = NULL, width = 10, height = 7, ...)
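A single call might look like the sketch below. It assumes that eQTL_luthier.R sits in the working directory, that the Lustre path is reachable from your session, and that gene_exp_parquet can be left out because the data source is selected automatically (the other examples in this README omit it too). The block does nothing if either file is missing.

```r
# Assumptions: eQTL_luthier.R is in the working directory, the Lustre path is
# reachable, and gene_exp_parquet can be omitted (data source auto-selected).
genotype_db <- "/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb"

if (file.exists("eQTL_luthier.R") && file.exists(genotype_db)) {
  source("eQTL_luthier.R")

  # The plot is returned invisibly, so assign it to keep a handle on it
  p <- eqtl_plot(
    genotype_db = genotype_db,
    gene_exp_type = "sc",
    cohort = "ukb",
    gene_id = "ENSG00000187608",
    snp_id = "1:996120:C:T",
    celltype = "T_CD4_CM",
    plot_filename = NULL  # NULL: draw on the active graphics device
  )

  # Customise further with regular ggplot2 layers, then draw explicitly
  p <- p + ggplot2::labs(title = "ENSG00000187608 / 1:996120:C:T (ukb, T_CD4_CM)")
  print(p)
}
```

Because the return value is invisible, nothing is printed at the console unless you assign the result and print it yourself.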

To generate plots for a specific cohort, gene, and SNP combination across cell types, use the multi_celltype_eqtl_plot function. Similarly, to generate plots for a specific cell type, gene, and SNP combination across cohorts, use the multi_cohorts_eqtl_plot function. Both return a patchwork object containing the individual plots.

  p <- multi_celltype_eqtl_plot(
    genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
    gene_exp_type = 'sc',
    cohort = 'ukb',
    gene_id = 'ENSG00000187608',
    snp_id = '1:996120:C:T',
    celltypes = c('T_CD4_CM', 'T_CD8_EM', 'T_CD4_naive'),
    plot_filename = "test_multi_celltypes.pdf",
    exclude_zeros = TRUE,
    device = 'pdf')

  p <- multi_cohorts_eqtl_plot(
    genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
    gene_exp_type = 'sc',
    cohorts = c('ukb', 'gh', 'pak', 'bang'),
    gene_id = 'ENSG00000187608',
    snp_id = '1:996120:C:T',
    celltype = 'T_CD4_CM',
    plot_filename = "test_multi_cohorts.pdf",
    exclude_zeros = TRUE,
    device = 'pdf',
    ncol=2)

Function Arguments

  • genotype_db (string): Path to the DuckDB database file containing genotype data.
  • gene_exp_parquet (string): Base directory path of the gene expression parquet files.
  • gene_exp_type (string): The type of gene expression data (e.g., "sc", "pseudobulk").
  • cohort (string): The cohort to filter donors by. One of "ukb", "gh", "pak", or "bang".
  • gene_id (string): The ENSEMBL ID of the gene to query (e.g., "ENSG00000119147").
  • snp_id (string): The SNP identifier (e.g., "2:106071035:C:T").
  • celltype (string): The name of the cell type to filter for (e.g., "T_CD4_EM").
  • plot_filename (string, optional): Path and name of the output file. If NULL, the plot is displayed in the active R graphics device. File format is inferred from the extension (e.g., "png", "pdf", "svg").
  • width (numeric): The width of the saved plot in inches.
  • height (numeric): The height of the saved plot in inches.
  • ... (various): Additional arguments passed to make_eqtl_violin and ggsave. Optional: show_trend, exclude_zeros, device.

You can pass show_trend = TRUE to add a trend line to the violin plot connecting the median points, and exclude_zeros = TRUE to exclude cells with zero expression from the plot.
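To make the two options concrete, here is a rough sketch of what they amount to, written with plain ggplot2 on simulated data. It is not the code inside make_eqtl_violin. The data frame, the genotype labels ("C/C", "C/T", "T/T") and the styling are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the plotting data; NOT real CARDINAL data.
# Assumption: genotype is stored as an allele-pair label.
df <- data.frame(
  genotype = factor(
    sample(c("C/C", "C/T", "T/T"), 300, replace = TRUE),
    levels = c("C/C", "C/T", "T/T")
  ),
  gene_expression = pmax(rnorm(300, mean = 1, sd = 1), 0)  # some exact zeros
)

# In the spirit of exclude_zeros = TRUE: drop rows with zero expression
df_nz <- df[df$gene_expression > 0, ]

# In the spirit of show_trend = TRUE: one median per genotype, joined by a line
med <- aggregate(gene_expression ~ genotype, data = df_nz, FUN = median)

p <- ggplot(df_nz, aes(x = genotype, y = gene_expression)) +
  geom_violin(fill = "grey85") +
  geom_point(data = med, size = 2) +
  geom_line(data = med, aes(group = 1))  # group = 1 connects discrete x values
```

Dropping zeros changes the shape of each violin considerably for sparse single-cell data, so it is worth stating in the figure legend whether they were excluded.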

Access the data

If you want to make a customised plot, use the fetch_eqtl_data function to get the data frame used for plotting. It returns a data frame with the following columns:

  • gene_expression is the expression of the specified gene in the specified cell type.
  • genotype is the genotype at the specified SNP.
  • sample_id / cell_id is the record identifier (sample_id for pseudobulk data, cell_id for single-cell data).
  • celltype is the selected cell type.

df <- fetch_eqtl_data(
  genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
  gene_exp_type = 'sc',
  cohort = 'ukb',
  gene_id = 'ENSG00000187608',
  snp_id = '1:996120:C:T',
  celltype = 'T_CD4_CM'
)
head(df)
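With the data frame in hand, per-genotype summaries are straightforward in base R. The sketch below uses a simulated data frame with the column names listed above. The values and the genotype coding are assumptions, so swap in the real output of fetch_eqtl_data when running it on the cluster.

```r
set.seed(42)
# Simulated stand-in with the documented columns; NOT real CARDINAL data.
# Assumption: genotype is stored as an allele-pair label.
df <- data.frame(
  cell_id = sprintf("cell_%03d", 1:200),
  celltype = "T_CD4_CM",
  genotype = sample(c("C/C", "C/T", "T/T"), 200, replace = TRUE),
  gene_expression = pmax(rnorm(200, mean = 0.8, sd = 1), 0)
)

# One row per genotype: record count, median expression, fraction of non-zero records
summary_df <- do.call(rbind, lapply(split(df, df$genotype), function(g) {
  data.frame(
    genotype = g$genotype[1],
    n = nrow(g),
    median_expr = median(g$gene_expression),
    frac_nonzero = mean(g$gene_expression > 0)
  )
}))

print(summary_df)
```

A table like this is a quick sanity check before plotting: a genotype group with very few records or a very low non-zero fraction will produce an unreliable violin.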

Data sources

Genotype database

  • genotype_db: /lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb

Gene expression data

You don't need to supply these paths yourself: the correct data source for CARDINAL Freeze3 is selected automatically based on the cohort and gene_exp_type arguments.

  • gene expression data
    • UKB
      • single-cell: /lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/merged_h5ad/ducklake/ukb-ContaminationFree-noDoublets-annotated-QCed-pflog1ppf
      • pseudobulk: /lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/pseudobulk/celltype_2/ukb-qced-cells/parquet_dataset_long
    • GH
      • single-cell: /lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/merged_h5ad/ducklake/gh-ContaminationFree-noDoublets-annotated-QCed-pflog1ppf
      • pseudobulk: /lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/pseudobulk/celltype_2/gh-qced-cells/parquet_dataset_long
