Make nicely crafted violins to represent the effect of SNVs on gene expression.
For the moment, just source or import the `eQTL_luthier.R` script.
The main function is `eqtl_plot()`, which takes care of fetching the data, plotting it and, if desired, saving the output. It always returns the ggplot object invisibly, so you can assign it and customize it further if needed.
NB: at the moment the function only supports `celltype_2` cell types.
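Because the plot is returned invisibly, calling the function at the console does not print the object; assign the result to keep working with it. The sketch below illustrates that pattern with a hypothetical stand-in, `toy_eqtl_plot()`, and simulated data (the genotype labels and values are assumptions, not real output), so it runs without the script or the cluster paths.

```r
library(ggplot2)

# Hypothetical stand-in for eqtl_plot(): builds a violin plot and, like the
# real function, returns the ggplot object invisibly
toy_eqtl_plot <- function(df) {
  p <- ggplot(df, aes(x = genotype, y = gene_expression)) +
    geom_violin()
  invisible(p)
}

# Simulated data using the column names described in this README
set.seed(1)
df <- data.frame(
  genotype = factor(rep(c("CC", "CT", "TT"), each = 25)),
  gene_expression = rnorm(75, mean = 2, sd = 0.5)
)

# Assign the invisible return value, then keep adding ggplot2 layers and themes
p <- toy_eqtl_plot(df)
p_custom <- p +
  labs(title = "ENSG00000187608 ~ 1:996120:C:T", x = "Genotype", y = "Expression") +
  theme_minimal()
```

The same `+` customization should apply to whatever `eqtl_plot()` returns, since the README states it is a ggplot object.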
```r
eqtl_plot <- function(genotype_db, gene_exp_parquet, gene_exp_type, cohort,
                      gene_id, snp_id, celltype, plot_filename = NULL,
                      width = 10, height = 7, ...)
```

If you need plots for one cohort, gene and SNP combination across several cell types, use `multi_celltype_eqtl_plot()`. Similarly, for one cell type, gene and SNP combination across several cohorts, use `multi_cohorts_eqtl_plot()`. Both return a patchwork object containing the individual plots.
```r
# Same cohort, gene and SNP across three cell types
p <- multi_celltype_eqtl_plot(
  genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
  gene_exp_type = 'sc',
  cohort = 'ukb',
  gene_id = 'ENSG00000187608',
  snp_id = '1:996120:C:T',
  celltypes = c('T_CD4_CM', 'T_CD8_EM', 'T_CD4_naive'),
  plot_filename = "test_multi_celltypes.pdf",
  exclude_zeros = TRUE,
  device = 'pdf'
)

# Same cell type, gene and SNP across four cohorts
p <- multi_cohorts_eqtl_plot(
  genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
  gene_exp_type = 'sc',
  cohorts = c('ukb', 'gh', 'pak', 'bang'),
  gene_id = 'ENSG00000187608',
  snp_id = '1:996120:C:T',
  celltype = 'T_CD4_CM',
  plot_filename = "test_multi_cohorts.pdf",
  exclude_zeros = TRUE,
  device = 'pdf',
  ncol = 2
)
```

`eqtl_plot()` arguments:

- `genotype_db` (string): Path to the DuckDB database file containing genotype data.
- `gene_exp_parquet` (string): Base directory path of the gene expression parquet files.
- `gene_exp_type` (string): The type of gene expression data (e.g., `"sc"`, `"pseudobulk"`).
- `cohort` (string): The cohort to filter donors by. One of `"ukb"`, `"gh"`, `"pak"` or `"bang"`.
- `gene_id` (string): The Ensembl ID of the gene to query (e.g., `"ENSG00000119147"`).
- `snp_id` (string): The SNP identifier (e.g., `"2:106071035:C:T"`).
- `celltype` (string): The name of the cell type to filter for (e.g., `"T_CD4_EM"`).
- `plot_filename` (string, optional): Path and name of the output file. If `NULL`, the plot is displayed in the active R graphics device. The file format is inferred from the extension (e.g., `"png"`, `"pdf"`, `"svg"`).
- `width` (numeric): The width of the saved plot in inches.
- `height` (numeric): The height of the saved plot in inches.
- `...` (various): Additional arguments passed to `make_eqtl_violin` and `ggsave`. Optional: `show_trend`, `exclude_zeros`, `device`.
You can pass `show_trend = TRUE` to add a trend line connecting the median points of the violins, and `exclude_zeros = TRUE` to exclude cells with zero expression from the plot.
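To make those two options concrete, here is a base-R sketch of the underlying computations on simulated data shaped like the output of `fetch_eqtl_data()`: drop zero-expression cells, then take one median per genotype, which are the points a trend line would connect. It assumes zeros are removed before the medians are computed; the genotype labels and the count-like values are also assumptions, and this is not the script's own code.

```r
# Simulated single-cell data with the columns returned by fetch_eqtl_data()
set.seed(1)
df <- data.frame(
  cell_id = sprintf("cell_%03d", 1:90),
  genotype = factor(rep(c("CC", "CT", "TT"), each = 30)),
  gene_expression = c(rpois(30, 1), rpois(30, 2), rpois(30, 3)),
  celltype = "T_CD4_CM"
)

# exclude_zeros = TRUE (sketch): keep only cells with non-zero expression
nonzero <- df[df$gene_expression != 0, ]

# show_trend = TRUE (sketch): one median per genotype; a line through these
# points, in genotype order, is the trend
medians <- aggregate(gene_expression ~ genotype, data = nonzero, FUN = median)
medians
```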
In case you want to make a customised plot, you can use the `fetch_eqtl_data()` function to get the data frame used for plotting. It returns a data frame where:

- `gene_expression` is the expression of the specified gene in the specified cell type
- `genotype` is the genotype at the specified SNP
- `sample_id` / `cell_id` is the record identifier (depending on whether pseudobulk or single-cell data is used)
- `celltype` is the selected cell type
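With a data frame of that shape, a custom violin plot is ordinary ggplot2 code. The sketch below uses simulated data as a stand-in for the output of `fetch_eqtl_data()` (values and genotype labels are assumptions) and connects the per-genotype medians with `stat_summary()`, similar in spirit to `show_trend = TRUE` but not necessarily how the script draws it.

```r
library(ggplot2)

# Simulated stand-in for fetch_eqtl_data() output: same columns, made-up values
set.seed(42)
df <- data.frame(
  cell_id = sprintf("cell_%03d", 1:150),
  genotype = factor(rep(c("CC", "CT", "TT"), times = c(70, 60, 20))),
  gene_expression = c(rnorm(70, 1.0, 0.4), rnorm(60, 1.4, 0.4), rnorm(20, 1.8, 0.4)),
  celltype = "T_CD4_CM"
)

p <- ggplot(df, aes(x = genotype, y = gene_expression)) +
  geom_violin(aes(fill = genotype), trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  # Join the per-genotype medians with a single line across the x axis
  stat_summary(fun = median, geom = "line", aes(group = 1)) +
  labs(title = unique(df$celltype), x = "Genotype", y = "Expression") +
  theme_classic() +
  theme(legend.position = "none")
```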
```r
df <- fetch_eqtl_data(
  genotype_db = '/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb',
  gene_exp_type = 'sc',
  cohort = 'ukb',
  gene_id = 'ENSG00000187608',
  snp_id = '1:996120:C:T',
  celltype = 'T_CD4_CM'
)
head(df)
```

Data sources:

- `genotype_db`: `/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/genotypes/genotypes_db/genotypes.duckdb`
You don't need to input the gene expression paths below: the correct data source for CARDINAL Freeze3 is selected automatically based on the `cohort` and `gene_exp_type` arguments.
- gene expression data
  - UKB
    - single-cell: `/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/merged_h5ad/ducklake/ukb-ContaminationFree-noDoublets-annotated-QCed-pflog1ppf`
    - pseudobulk: `/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/pseudobulk/celltype_2/ukb-qced-cells/parquet_dataset_long`
  - GH
    - single-cell: `/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/merged_h5ad/ducklake/gh-ContaminationFree-noDoublets-annotated-QCed-pflog1ppf`
    - pseudobulk: `/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3/pseudobulk/celltype_2/gh-qced-cells/parquet_dataset_long`
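The automatic selection can be pictured as a lookup keyed on `cohort` and `gene_exp_type`. The helper below, `select_gene_exp_source()`, is hypothetical (it is not part of `eQTL_luthier.R`) and only knows the UKB and GH paths listed above, building them with `file.path()` from the shared Freeze3 root; any other cohort raises an error rather than guessing a path.

```r
# Shared root of the Freeze3 gene expression data listed above
freeze3_root <- "/lustre/scratch124/humgen/projects_v2/cardinal_analysis/analysis/core_dataset/freeze3"

# Hypothetical helper: map (cohort, gene_exp_type) to a gene expression source
select_gene_exp_source <- function(cohort, gene_exp_type) {
  # Only the cohorts with paths documented in this README are accepted
  cohort <- match.arg(cohort, c("ukb", "gh"))
  gene_exp_type <- match.arg(gene_exp_type, c("sc", "pseudobulk"))
  switch(gene_exp_type,
    sc = file.path(freeze3_root, "merged_h5ad", "ducklake",
                   paste0(cohort, "-ContaminationFree-noDoublets-annotated-QCed-pflog1ppf")),
    pseudobulk = file.path(freeze3_root, "pseudobulk", "celltype_2",
                           paste0(cohort, "-qced-cells"), "parquet_dataset_long")
  )
}

select_gene_exp_source("gh", "pseudobulk")
```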
