h5ad clean and annotate

A small tool for cleaning, subsetting and annotating h5ad files efficiently.

The tool relies on the AlphaSC package from Bioturing for efficient processing; no GPU is required.

Usage

usage: clean_and_annotate.py [-h] --h5ad H5AD --out OUT --config CONFIG

Clean, annotate, subset and other useful operations on h5ad files.

options:
  -h, --help       show this help message and exit
  --h5ad H5AD      h5ad input file
  --out OUT        Output h5ad file
  --config CONFIG  JSON file defining configuration for cleaning and processing
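
As a rough illustration of the invocation above, the sketch below writes a small config file and assembles the matching command line. The file names `input.h5ad` and `cleaned.h5ad` are placeholders, and it is an assumption (not stated in this README) that a config may contain only a subset of the supported keys. Python's `json` module rejects `//` comments, so the sketch writes comment-free JSON.

```python
import json
import sys
import tempfile
from pathlib import Path

# Hypothetical minimal config; assumes unused keys can simply be omitted.
config = {
    "clean_index": True,
    "rename_columns": {"old_name1": "new_name1"},
}

# Write the config to a temporary directory as plain JSON (no comments).
workdir = Path(tempfile.mkdtemp())
config_path = workdir / "config.json"
config_path.write_text(json.dumps(config, indent=2))

# Assemble the command using the three required options from the help text.
command = [
    sys.executable, "clean_and_annotate.py",
    "--h5ad", "input.h5ad",      # placeholder input file
    "--out", "cleaned.h5ad",     # placeholder output file
    "--config", str(config_path),
]
print(" ".join(command))
# To actually run the tool: subprocess.run(command, check=True)
```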

JSON config file

The operations to perform are defined in a JSON file. A template structure is shown below and is also available in the config_template.json file.

{
	// Template to construct new cell IDs from obs columns. 
	// Any obs column can be accessed using curly brackets, e.g. {mycolumn}.
	// Use {index} to access the current obs index value
	"new_cell_id": "{tranche.id}--{tranche.name}--{index}", 
	
	// If true, the index will be cleaned by removing any --[0-9]+$ suffix.
	"clean_index": true,
	
	// A list of obs columns to retain
	"select_obs_columns": [
		"col1",
		"col2"
	],
	
	// A list of obs columns to remove
	"exclude_obs_columns": [
		"col1",
		"col2"
	],

	// If true, the obs column names will be sanitized by removing any special characters and spaces. 
	// >=  and <= will be replaced with greaterthan and lessthan strings
	"sanitize_obs_column_names": true,
	
	// Path to a text file containing barcodes to include in the output h5ad file.
	"subset_bc": "subset_barcodes.txt",

	// Subset the data to keep only cells where obs[column] is in values list
	"subset_on_obs": [
		{
			"column": "tranche.id",
			"values": ["T1","T2","T3"],
			"values_file": "subset_values.txt" // Optional file with one value per line to use as values list
		}
	],

	// List of paths to TSV file(s) with 2 or more columns: cell_id and annotations.
	// Annotation columns will be added to obs as new columns using the cell_id as the key.
	"annot_bc": ["cell_annotations_1.tsv", "cell_annotations_2.tsv"],
	
	// A layer to be used as the default X in the output h5ad file.
	"X_layer": "layer_name",
	"keys": [
		"uns",
		"obsm",
		"raw/X",
		"layers/layer1"
	],

	// A dictionary of "old_name": "new_name" pairs to rename obs columns.
	"rename_columns": {
		"old_name1": "new_name1",
		"old_name2": "new_name2"
	},

	// A list of dictionaries defining TSV files that can be used to annotate samples.
	// The tool will annotate obs with the table_annotation_columns from the TSV file
	// Merging keys in the input table and obs are defined by table_key_column and obs_key_column, respectively.
	"sample_annotations": [
		{
			"filename": "sample_annotations.tsv",
			"table_key_column": "sample_id",
			"obs_key_column": "sample_id",
			"table_annotation_columns": [
				"tissue",
				"treatment"
			],
			"annotation_name": "myanno1"
		},
		{
			"filename": "another_annotation.tsv",
			"table_key_column": "sample_id",
			"obs_key_column": "sample_id",
			"table_annotation_columns": [
				"ancestry"
			],
			"annotation_name": "myanno2"
		}
	]
}
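
To make the `clean_index` and `new_cell_id` options more concrete, here is a minimal sketch of the behaviour their comments describe. It is not the tool's actual code: the function names `clean_index` and `make_cell_id`, the example barcode, and the example row are all hypothetical. Column names such as `tranche.id` contain a dot, so the sketch substitutes placeholders with a regular expression rather than `str.format`, which would treat the dot as attribute access.

```python
import re

SUFFIX = re.compile(r"--[0-9]+$")         # trailing "--<digits>" suffix
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")  # "{column}" or "{index}"

def clean_index(cell_id):
    """Remove a trailing --[0-9]+ suffix, as described for clean_index."""
    return SUFFIX.sub("", cell_id)

def make_cell_id(template, index, row):
    """Fill {column} placeholders from row and {index} from the index value."""
    def lookup(match):
        name = match.group(1)
        return str(index if name == "index" else row[name])
    return PLACEHOLDER.sub(lookup, template)

# Hypothetical obs row and barcode, for illustration only.
row = {"tranche.id": "T1", "tranche.name": "pilot"}
index = clean_index("AAACCTG-1--12")
new_id = make_cell_id("{tranche.id}--{tranche.name}--{index}", index, row)
print(new_id)  # T1--pilot--AAACCTG-1
```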

Order of operations

Clean index
Make new cell ID
Add barcode-based annotations
Rename columns using the rename map provided
Create new columns based on existing ones as defined in make_columns
Add column-based annotations
Filter obs columns based on include/exclude lists
Sanitize obs columns
Filter barcodes based on the subset_bc list
Filter based on obs column values from subset_on_obs
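
The order above has a practical consequence: because renaming runs before column filtering and value-based subsetting, later steps presumably need to refer to the renamed column names. The sketch below illustrates this on plain Python dictionaries standing in for obs rows. The helper functions and the example data are hypothetical and only mimic three of the listed steps; they are not the tool's implementation.

```python
def rename_columns(rows, mapping):
    """Rename keys according to an old_name -> new_name mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

def select_columns(rows, keep):
    """Retain only the listed columns."""
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

def subset_on_obs(rows, column, values):
    """Keep rows whose value in column is in the values list."""
    allowed = set(values)
    return [row for row in rows if row[column] in allowed]

# Hypothetical obs table with two cells.
obs = [
    {"old_name1": "T1", "qc": "pass"},
    {"old_name1": "T4", "qc": "pass"},
]

# Same relative order as the list above: rename, filter columns, subset.
obs = rename_columns(obs, {"old_name1": "tranche"})
obs = select_columns(obs, ["tranche"])           # uses the new name
obs = subset_on_obs(obs, "tranche", ["T1", "T2", "T3"])
print(obs)  # [{'tranche': 'T1'}]
```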
