YeoLab/encode_analysis

This repository contains the analysis for the ENCODE 2025 paper.

Setting up the Environment

To create and activate the Conda environment, run the following commands:

conda env create -f env.yaml
conda activate my_metadensity2

Setting up pre-commit for auto-linting

pre-commit install

Pre-commit is a framework for managing and maintaining Git pre-commit hooks. It automatically runs checks (like code formatting, linting, or other validations) on your code before you commit it to the repository. This ensures code quality and consistency across the project.

  • To lint a Jupyter notebook inline, run %load_ext lab_black in a notebook cell.

Large files are tracked with git-lfs.
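
Because large files live in git-lfs, a fresh clone without LFS content can leave small text stubs (pointer files) where the real data should be. The sketch below is one way to spot such stubs before running an analysis. It relies only on the fact that a Git LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1". The function names and the example directory are hypothetical and are not part of this repository.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an unfetched Git LFS pointer stub."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched(root, pattern="*"):
    """List files under `root` matching `pattern` that are still LFS pointers."""
    return sorted(
        p for p in Path(root).rglob(pattern) if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # "tables" is used here only as an example directory to scan.
    for stub in find_unfetched("tables"):
        print(f"still an LFS pointer: {stub}")
```

If any stubs are reported, running git lfs pull from the repository root is the usual way to fetch the real files.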

Where are things?

  • scripts: Snakemake commands for running large-scale analyses, including launching Skipper, overlap analysis, etc.
  • configs: Configuration files for Skipper.
  • tables: Intermediate tables generated during analysis.

Folder descriptions for figures and text:

  • 0_RBP_annotation: Counting eCLIPs, annotating GO/PPI/domains, aligning IDs, and processing other datasets.
  • 1_eCLIP_QC: Quality control of eCLIP data.
  • 2_eCLIP_clusters: Clustering and t-SNE plots (Figure 1).
  • 2_eCLIP_motifs: Motif analysis using HOMER, SELEX/RBNS, and RBPNet seqlets.
  • 2_eCLIP_overlap: Overlapping binding sites across eCLIP peaks.
  • 2_RBP_mutation: gnomAD analysis on RBPs, including o/e ratio calculations (Figure 2).
  • 3_RNAseq: Legacy folder (RNAseq is not included in this paper).
  • 4_CLIP_ML_RBPNet: Training, evaluation, model comparison, and benchmarking (Figures 3 and 4).
  • 5_gnomAD_reference_scaling: Scaling o/e and MAPS ratios to reproduce past gnomAD results (not a main finding).
  • 6_gnomAD_popgen_analysis: Selection analysis in binding sites (Figure 5).
  • 7_ClinVar_analysis: Pathogenic variant analysis in binding sites (Figure 6).
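
Several folders above refer to o/e (observed/expected) ratios. As a reminder of what that quantity is, here is a minimal sketch that pools observed and expected variant counts over a set of sites and divides the totals. It is an illustration only, not the code used in this repository: the function names and the example counts are made up, and how the expected counts are derived is not shown.

```python
def oe_ratio(observed, expected):
    """Observed/expected ratio for a single pair of counts."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def pooled_oe(sites):
    """Pool (observed, expected) pairs, then take the ratio of the totals."""
    total_observed = sum(obs for obs, _ in sites)
    total_expected = sum(exp for _, exp in sites)
    return oe_ratio(total_observed, total_expected)


# Hypothetical counts: (observed variants, expected variants) per site group.
example_sites = [(3, 5.0), (4, 7.5), (5, 7.5)]

if __name__ == "__main__":
    # 12 observed over 20.0 expected; a ratio below 1 means fewer variants than expected.
    print(pooled_oe(example_sites))
```

Pooling before dividing, rather than averaging per-site ratios, keeps sites with very small expected counts from dominating the result.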

About

ENCODE paper analysis 2025 (Charlene)
