Usage¶
Currently this module only supports genotyping and merging small variants (SNVs and INDELs).
This is done through the small_variants command-line submodule, which has the following sub-commands:
- generate: To run GetBaseCountsMultiSample (GBCMS) on the given BAM files.
- merge: To merge MAF files with respect to the counts generated by the generate command.
- all: To run both of the sub-commands above, generate and merge, together.
- multiple-samples: To run the all sub-command for each of the patients listed in the provided metadata file.
generate¶
To use small_variants generate via the command line, here are the options:
> genotype_variants small_variants generate --help
Usage: genotype_variants small_variants generate [OPTIONS]

  Command that helps to generate genotyped MAF, the output file will be
  labelled with patient identifier as prefix

Options:
  -i, --input-maf PATH            Full path to small variants input file in
                                  MAF format [required]
  -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                  [required]
  -p, --patient-id TEXT           Alphanumeric string indicating patient
                                  identifier [required]
  -b, --standard-bam PATH         Full path to standard bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                  executable with fragment support [required]
  -fd, --filter-duplicate INTEGER
                                  Filter duplicate parameter for
                                  GetBaseCountMultiSample
  -fc, --fragment-count INTEGER   Fragment Count parameter for
                                  GetBaseCountMultiSample
  -mapq, --mapping-quality INTEGER
                                  Mapping quality for GetBaseCountMultiSample
  -t, --threads INTEGER           Number of threads to use for
                                  GetBaseCountMultiSample
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG
  --help                          Show this message and exit.

genotype_variants small_variants generate \
-i /path/to/input_maf \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample \
-p patient_id \
-b standard_bam \
-d duplex_bam \
-s simplex_bam
Expected Output¶
If the above command is executed, you will find the following files in the current working directory:
- patient_id-STANDARD_genotyped.maf
- patient_id-DUPLEX_genotyped.maf
- patient_id-SIMPLEX_genotyped.maf
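The relationship between the generate options and the output names listed above can be sketched in Python. This is only an illustration, not part of genotype_variants: the helper names build_generate_command and expected_generate_outputs are hypothetical, the flags are copied from the help text, and the sketch assumes that one genotyped MAF is written for each BAM that is supplied.

```python
import shlex


def build_generate_command(input_maf, reference_fasta, gbcms_path, patient_id,
                           standard_bam=None, duplex_bam=None, simplex_bam=None):
    """Return the argv list for `small_variants generate` (hypothetical helper)."""
    cmd = [
        "genotype_variants", "small_variants", "generate",
        "-i", input_maf,
        "-r", reference_fasta,
        "-g", gbcms_path,
        "-p", patient_id,
    ]
    # Only pass the BAM flags that were actually provided.
    for flag, bam in (("-b", standard_bam), ("-d", duplex_bam), ("-s", simplex_bam)):
        if bam is not None:
            cmd += [flag, bam]
    return cmd


def expected_generate_outputs(patient_id, standard_bam=None, duplex_bam=None,
                              simplex_bam=None):
    """List the documented output names, assuming one file per supplied BAM."""
    labelled = (("STANDARD", standard_bam), ("DUPLEX", duplex_bam),
                ("SIMPLEX", simplex_bam))
    return [f"{patient_id}-{label}_genotyped.maf"
            for label, bam in labelled if bam is not None]


cmd = build_generate_command(
    "/path/to/input_maf", "/path/to/reference_fasta",
    "/path/to/GetBaseCountsMultiSample", "patient_id",
    standard_bam="standard_bam", duplex_bam="duplex_bam", simplex_bam="simplex_bam",
)
print(shlex.join(cmd))  # shell-quoted command line
print(expected_generate_outputs("patient_id", "standard_bam", "duplex_bam", "simplex_bam"))
```

The returned list can be passed to subprocess.run(cmd, check=True) to actually launch the command.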
merge¶
To use small_variants merge via the command line, here are the options:
> genotype_variants small_variants merge --help
Usage: genotype_variants small_variants merge [OPTIONS]

  Given original input MAF used as an input for GBCMS along with GBCMS
  generated output MAF for standard_bam, duplex_bam or simplex bam, Merge
  them into a single output MAF format. If both duplex_bam and simplex_bam
  based MAF are provided the program will generate merged genotypes as well.
  The output file will be based on the give alphanumeric patient identifier
  as suffix.

Options:
  -i, --input-maf PATH            Full path to small variants input file in
                                  MAF format used for input to GBCMS for
                                  generating genotypes
  -std, --input-standard-maf PATH
                                  Full path to small variants input file in
                                  MAF format generated by GBCMS for
                                  standard_bam
  -d, --input-duplex-maf PATH     Full path to small variants input file in
                                  MAF format generated by GBCMS for duplex_bam
  -s, --input-simplex-maf PATH    Full path to small variants input file in
                                  MAF format generated by GBCMS for
                                  simplex_bam
  -p, --patient-id TEXT           Alphanumeric string indicating patient
                                  identifier [required]
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG
  --help                          Show this message and exit.

genotype_variants small_variants merge \
-i /path/to/input_maf \
-std /path/to/standard_bam_genotyped_maf \
-d /path/to/duplex_bam_genotyped_maf \
-s /path/to/simplex_bam_genotyped_maf \
-p patient_id
Expected Output¶
If the above command is executed, you will find the following file in the current working directory:
- patient_id-ORG-STD-SIMPLEX-DUPLEX_genotyped.maf
If only input_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:
- patient_id-ORG-SIMPLEX-DUPLEX_genotyped.maf
If only standard_bam_genotyped_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:
- patient_id-STD-SIMPLEX-DUPLEX_genotyped.maf
If only duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:
- patient_id-SIMPLEX-DUPLEX_genotyped.maf
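The four naming cases above follow one pattern, which the following Python sketch reproduces. The function merged_output_name is a hypothetical helper written for illustration; it covers only the documented combinations, in which the duplex and simplex genotyped MAFs are always both present, and says nothing about how the tool names its output for other combinations.

```python
def merged_output_name(patient_id, has_input_maf, has_standard_maf):
    """Merged MAF name when both duplex and simplex genotyped MAFs are given."""
    labels = []
    if has_input_maf:
        labels.append("ORG")  # original input MAF (-i)
    if has_standard_maf:
        labels.append("STD")  # standard_bam genotyped MAF (-std)
    labels += ["SIMPLEX", "DUPLEX"]  # present in every documented case
    return f"{patient_id}-" + "-".join(labels) + "_genotyped.maf"


# Print the name for each of the four documented input combinations.
for has_org in (True, False):
    for has_std in (True, False):
        print(has_org, has_std, merged_output_name("patient_id", has_org, has_std))
```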
all¶
To use small_variants all via the command line, here are the options:
> genotype_variants small_variants all --help
Usage: genotype_variants small_variants all [OPTIONS]

  Command that helps to generate genotyped MAF and merge the genotyped MAF.
  the output file will be labelled with patient identifier as prefix

Options:
  -i, --input-maf PATH            Full path to small variants input file in
                                  MAF format [required]
  -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                  [required]
  -p, --patient-id TEXT           Alphanumeric string indicating patient
                                  identifier [required]
  -b, --standard-bam PATH         Full path to standard bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                  option assumes that the .bai file is present
                                  at same location as the bam file
  -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                  executable with fragment support [required]
  -fd, --filter-duplicate INTEGER
                                  Filter duplicate parameter for
                                  GetBaseCountMultiSample
  -fc, --fragment-count INTEGER   Fragment Count parameter for
                                  GetBaseCountMultiSample
  -mapq, --mapping-quality INTEGER
                                  Mapping quality for GetBaseCountMultiSample
  -t, --threads INTEGER           Number of threads to use for
                                  GetBaseCountMultiSample
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG
  --help                          Show this message and exit.

genotype_variants small_variants all \
-i /path/to/input_maf \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample \
-p patient_id \
-b standard_bam \
-d duplex_bam \
-s simplex_bam
Expected Output¶
Please refer to the generate and merge usage for the expected output.
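Both generate and all assume that each BAM has its .bai index at the same location as the BAM. A quick pre-flight check along the following lines may help catch a missing index before a run. This is a sketch under assumptions: find_bam_index and bams_missing_index are hypothetical helpers, and the two index naming conventions tried here (sample.bam.bai and sample.bai) are common ones, not something this page specifies.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the .bai file found next to bam_path, or None if there is none."""
    bam = Path(bam_path)
    # Assumed naming conventions: sample.bam.bai first, then sample.bai.
    candidates = (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai"))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def bams_missing_index(bam_paths):
    """Return the BAM paths (as strings) that have no index beside them."""
    return [str(path) for path in bam_paths
            if path is not None and find_bam_index(path) is None]


# None stands for an optional BAM that was not supplied and is skipped.
print(bams_missing_index(["standard_bam", None, "duplex_bam", "simplex_bam"]))
```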
multiple-samples¶
To use small_variants multiple-samples via the command line, here are the options:
> genotype_variants small_variants multiple-samples --help
Usage: genotype_variants small_variants multiple-samples [OPTIONS]

  Command that helps to generate genotyped MAF and merge the genotyped MAF
  for multiple patients. the output file will be labelled with sample
  identifier as prefix

  Expected header of metadata_file in any order: sample_id maf standard_bam
  duplex_bam simplex_bam

  For maf, standard_bam, duplex_bam and simplex_bam please include full path
  to the file.

Options:
  -i, --input-metadata PATH       Full path to metadata file in TSV/EXCEL
                                  format, with following headers: sample_id,
                                  maf, standard_bam, duplex_bam, simplex_bam.
                                  Make sure to use full paths inside the
                                  metadata file [required]
  -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                  [required]
  -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                  executable with fragment support [required]
  -fd, --filter-duplicate INTEGER
                                  Filter duplicate parameter for
                                  GetBaseCountMultiSample
  -fc, --fragment-count INTEGER   Fragment Count parameter for
                                  GetBaseCountMultiSample
  -mapq, --mapping-quality INTEGER
                                  Mapping quality for GetBaseCountMultiSample
  -t, --threads INTEGER           Number of threads to use for
                                  GetBaseCountMultiSample
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG
  --help                          Show this message and exit.

genotype_variants small_variants multiple-samples \
-i /path/to/input_metadata \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample
Expected Output¶
Please refer to the generate and merge usage for the expected output.
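The metadata file expected by multiple-samples can be prepared and sanity-checked with a few lines of Python. The sketch below writes a TSV with the five documented headers and reports missing columns or paths that are not full paths. The helpers write_metadata and check_metadata are hypothetical, the /data/... paths are placeholders, the check assumes POSIX-style paths, and it skips empty cells because this page does not say whether a BAM column may be left blank.

```python
import csv
import io
from pathlib import PurePosixPath

REQUIRED_COLUMNS = ("sample_id", "maf", "standard_bam", "duplex_bam", "simplex_bam")
PATH_COLUMNS = REQUIRED_COLUMNS[1:]  # every column except sample_id holds a path


def write_metadata(rows, handle):
    """Write rows (dicts keyed by REQUIRED_COLUMNS) as a tab-separated file."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def check_metadata(handle):
    """Return a list of problems; an empty list means nothing was flagged."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    # Header order does not matter, only that every column is present.
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS
                if name not in header]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):
        for column in PATH_COLUMNS:
            value = row[column]
            if value and not PurePosixPath(value).is_absolute():
                problems.append(f"line {line_no}: {column} is not a full path: {value}")
    return problems


buffer = io.StringIO()
write_metadata([{
    "sample_id": "patient_1",                        # placeholder values
    "maf": "/data/patient_1.maf",
    "standard_bam": "/data/patient_1-standard.bam",
    "duplex_bam": "/data/patient_1-duplex.bam",
    "simplex_bam": "/data/patient_1-simplex.bam",
}], buffer)
buffer.seek(0)
print(check_metadata(buffer))
```

To produce a real file, open it with open(path, "w", newline="") and pass the handle to write_metadata.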
To use genotype_variants in a project:
import genotype_variants
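This page does not describe the package's Python API beyond the import itself, so the following sketch only checks that the installation is usable from a project: that the module can be found by the interpreter and that the genotype_variants command is on PATH. The helper installation_status is hypothetical and uses only the standard library.

```python
import importlib.util
import shutil


def installation_status(name="genotype_variants"):
    """Report whether the module is importable and the CLI is on PATH."""
    return {
        "module_importable": importlib.util.find_spec(name) is not None,
        "cli_on_path": shutil.which(name) is not None,
    }


print(installation_status())
```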