Usage

Currently, this module only supports genotyping and merging of small variants (SNVs and INDELs).

This is provided by the command line submodule small_variants, which has the following sub-commands:

  • generate: To run GetBaseCountMultiSample on given BAM files
  • merge: To merge MAF format files w.r.t. the counts generated by the generate command.
  • all: This will run both of the sub-commands above, generate and merge, together.
  • multiple-samples: This will run the all sub-command for multiple patients in the provided metadata file.

generate

To use small_variants generate via the command line, here are the options:

> genotype_variants small_variants generate --help
Usage: genotype_variants small_variants generate [OPTIONS]

Command that helps to generate genotyped MAF, the output file will be
labelled with  patient identifier as prefix

Options:
-i, --input-maf PATH            Full path to small variants input file in
                                MAF format  [required]
-r, --reference-fasta PATH      Full path to reference file in FASTA format
                                [required]
-p, --patient-id TEXT           Alphanumeric string indicating patient
                                identifier  [required]
-b, --standard-bam PATH         Full path to standard bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                executable with fragment support  [required]
-fd, --filter-duplicate INTEGER
                                Filter duplicate parameter for
                                GetBaseCountMultiSample
-fc, --fragment-count INTEGER   Fragment Count parameter for
                                GetBaseCountMultiSample
-mapq, --mapping-quality INTEGER
                                Mapping quality for GetBaseCountMultiSample
-t, --threads INTEGER           Number of threads to use for
                                GetBaseCountMultiSample
-v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                DEBUG
--help                          Show this message and exit.

genotype_variants small_variants generate \
-i /path/to/input_maf \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample \
-p patient_id \
-b standard_bam \
-d duplex_bam \
-s simplex_bam

Expected Output

If the above command is executed, you will find the following files in the current working directory:

  • patient_id-STANDARD_genotyped.maf
  • patient_id-DUPLEX_genotyped.maf
  • patient_id-SIMPLEX_genotyped.maf
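
The naming pattern above can be checked programmatically after a run. The following is a minimal sketch, not part of genotype_variants: the helper name expected_generate_outputs and the BAM_LABELS mapping are hypothetical, and it assumes that one file is written per BAM type supplied, following the names listed above.

```python
from pathlib import Path

# Labels copied from the output file names listed above.
BAM_LABELS = {"standard": "STANDARD", "duplex": "DUPLEX", "simplex": "SIMPLEX"}


def expected_generate_outputs(patient_id, bam_types, workdir="."):
    """Return the genotyped MAF paths expected for the supplied BAM types.

    Assumption: one output file per BAM type passed to `generate`.
    """
    return [
        Path(workdir) / f"{patient_id}-{BAM_LABELS[bam_type]}_genotyped.maf"
        for bam_type in bam_types
    ]


paths = expected_generate_outputs("patient_id", ["standard", "duplex", "simplex"])
# Report any expected file that is not present in the working directory.
missing = [path.name for path in paths if not path.exists()]
print("expected:", [path.name for path in paths])
print("missing:", missing)
```

A check like this can be useful in a pipeline step that should fail early when a genotyped MAF was not produced.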

merge

To use small_variants merge via the command line, here are the options:

> genotype_variants small_variants merge --help
Usage: genotype_variants small_variants merge [OPTIONS]

Given original input MAF used as an input for GBCMS along with  GBCMS
generated output MAF for standard_bam, duplex_bam or simplex bam,  Merge
them into a single output MAF format.  If both duplex_bam and simplex_bam
based MAF are provided the program will generate merged genotypes as well.
The output file will be based on the give alphanumeric patient identifier
as suffix.

Options:
-i, --input-maf PATH            Full path to small variants input file in
                                MAF format used for input to GBCMS for
                                generating genotypes
-std, --input-standard-maf PATH
                                Full path to small variants input file in
                                MAF format generated by GBCMS for
                                standard_bam
-d, --input-duplex-maf PATH     Full path to small variants input file in
                                MAF format generated by GBCMS for duplex_bam
-s, --input-simplex-maf PATH    Full path to small variants input file in
                                MAF format generated by GBCMS for
                                simplex_bam
-p, --patient-id TEXT           Alphanumeric string indicating patient
                                identifier  [required]
-v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                DEBUG
--help                          Show this message and exit.

genotype_variants small_variants merge \
-i /path/to/input_maf \
-std /path/to/standard_bam_genotyped_maf \
-d /path/to/duplex_bam_genotyped_maf \
-s /path/to/simplex_bam_genotyped_maf \
-p patient_id

Expected Output

If the above command is executed, you will find the following file in the current working directory:

  • patient_id-ORG-STD-SIMPLEX-DUPLEX_genotyped.maf

If only input_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, then the output file will be:

  • patient_id-ORG-SIMPLEX-DUPLEX_genotyped.maf

If only standard_bam_genotyped_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, then the output file will be:

  • patient_id-STD-SIMPLEX-DUPLEX_genotyped.maf

If only duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, then the output file will be:

  • patient_id-SIMPLEX-DUPLEX_genotyped.maf
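
The four cases above follow one pattern: an optional ORG tag, an optional STD tag, then SIMPLEX-DUPLEX. The sketch below reproduces only those documented cases. The function merged_maf_name is hypothetical and not part of genotype_variants, and it assumes both the duplex and simplex MAFs are supplied, since other combinations are not described here.

```python
def merged_maf_name(patient_id, has_input_maf=False, has_standard_maf=False):
    """Build the merged MAF file name for the documented input combinations.

    Assumption: duplex and simplex genotyped MAFs are both provided.
    """
    tags = []
    if has_input_maf:
        tags.append("ORG")  # original input MAF was given
    if has_standard_maf:
        tags.append("STD")  # standard BAM genotyped MAF was given
    tags.append("SIMPLEX-DUPLEX")
    return f"{patient_id}-{'-'.join(tags)}_genotyped.maf"


print(merged_maf_name("patient_id", has_input_maf=True, has_standard_maf=True))
print(merged_maf_name("patient_id", has_input_maf=True))
print(merged_maf_name("patient_id", has_standard_maf=True))
print(merged_maf_name("patient_id"))
```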

all

To use small_variants all via the command line, here are the options:

> genotype_variants small_variants all --help
Usage: genotype_variants small_variants all [OPTIONS]

Command that helps to generate genotyped MAF and merge the genotyped MAF.
the output file will be labelled with patient identifier as prefix

Options:
-i, --input-maf PATH            Full path to small variants input file in
                                MAF format  [required]
-r, --reference-fasta PATH      Full path to reference file in FASTA format
                                [required]
-p, --patient-id TEXT           Alphanumeric string indicating patient
                                identifier  [required]
-b, --standard-bam PATH         Full path to standard bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                option assumes that the .bai file is present
                                at same location as the bam file
-g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                executable with fragment support  [required]
-fd, --filter-duplicate INTEGER
                                Filter duplicate parameter for
                                GetBaseCountMultiSample
-fc, --fragment-count INTEGER   Fragment Count parameter for
                                GetBaseCountMultiSample
-mapq, --mapping-quality INTEGER
                                Mapping quality for GetBaseCountMultiSample
-t, --threads INTEGER           Number of threads to use for
                                GetBaseCountMultiSample
-v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                DEBUG
--help                          Show this message and exit.

genotype_variants small_variants all \
-i /path/to/input_maf \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample \
-p patient_id \
-b standard_bam \
-d duplex_bam \
-s simplex_bam

Expected Output

Please refer to the generate and merge usage for the expected output.
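
When the all sub-command is launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags shown in the help text above; the helper build_all_command is hypothetical and not part of genotype_variants, and the paths in the usage lines are placeholders.

```python
import shlex


def build_all_command(input_maf, reference_fasta, gbcms_path, patient_id,
                      standard_bam=None, duplex_bam=None, simplex_bam=None,
                      threads=None):
    """Assemble the argument list for `genotype_variants small_variants all`."""
    cmd = [
        "genotype_variants", "small_variants", "all",
        "-i", input_maf,
        "-r", reference_fasta,
        "-g", gbcms_path,
        "-p", patient_id,
    ]
    # BAM options are not marked [required], so add only those supplied.
    for flag, value in (("-b", standard_bam), ("-d", duplex_bam), ("-s", simplex_bam)):
        if value is not None:
            cmd += [flag, value]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = build_all_command(
    "/path/to/input_maf",
    "/path/to/reference_fasta",
    "/path/to/GetBaseCountsMultiSample",
    "patient_id",
    duplex_bam="duplex_bam",
    simplex_bam="simplex_bam",
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.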

multiple-samples

To use small_variants multiple-samples via the command line, here are the options:

> genotype_variants small_variants multiple-samples --help
Usage: genotype_variants small_variants multiple-samples [OPTIONS]

Command that helps to generate genotyped MAF and  merge the genotyped MAF
for multiple patients. the output file will be labelled with sample
identifier as prefix

Expected header of metadata_file in any order: sample_id maf standard_bam
duplex_bam simplex_bam

For maf, standard_bam, duplex_bam and simplex_bam please include full path
to the file.

Options:
-i, --input-metadata PATH       Full path to metadata file in TSV/EXCEL
                                format, with following headers: sample_id,
                                maf, standard_bam, duplex_bam, simplex_bam.
                                Make sure to use full paths inside the
                                metadata file  [required]
-r, --reference-fasta PATH      Full path to reference file in FASTA format
                                [required]
-g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                executable with fragment support  [required]
-fd, --filter-duplicate INTEGER
                                Filter duplicate parameter for
                                GetBaseCountMultiSample
-fc, --fragment-count INTEGER   Fragment Count parameter for
                                GetBaseCountMultiSample
-mapq, --mapping-quality INTEGER
                                Mapping quality for GetBaseCountMultiSample
-t, --threads INTEGER           Number of threads to use for
                                GetBaseCountMultiSample
-v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                DEBUG
--help                          Show this message and exit.

genotype_variants small_variants multiple-samples \
-i /path/to/input_metadata \
-r /path/to/reference_fasta \
-g /path/to/GetBaseCountsMultiSample

Expected Output

Please refer to the generate and merge usage for the expected output.
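
The metadata file described in the help text above can be produced with Python's csv module. This is a minimal sketch, assuming a tab-separated layout with the five documented headers; the helper metadata_tsv, the sample identifier and all paths are hypothetical placeholders.

```python
import csv
import io

# Column names taken from the multiple-samples help text.
HEADER = ["sample_id", "maf", "standard_bam", "duplex_bam", "simplex_bam"]


def metadata_tsv(rows):
    """Render a list of per-sample dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADER, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "sample_id": "sample_1",  # placeholder identifier
        "maf": "/full/path/to/sample_1.maf",
        "standard_bam": "/full/path/to/sample_1_standard.bam",
        "duplex_bam": "/full/path/to/sample_1_duplex.bam",
        "simplex_bam": "/full/path/to/sample_1_simplex.bam",
    },
]
print(metadata_tsv(rows), end="")
```

The returned text can be written to a file and passed to multiple-samples with -i.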

To use genotype_variants in a project:

import genotype_variants