=====
Usage
=====

Currently this module only supports genotyping and merging small variants (SNVs and INDELs).
This is done through the command line submodule **small_variants**, which has the following sub-commands:

* `generate`_: Runs GetBaseCountsMultiSample on the given BAM files.
* `merge`_: Merges MAF format files with respect to the counts generated by the ``generate`` command.
* `all`_: Runs both of the sub-commands above, ``generate`` and ``merge``, together.
* `multiple-samples`_: Runs the ``all`` sub-command for multiple patients listed in the provided metadata file.

generate
--------

To use ``small_variants generate`` via the command line, here are the options::

    > genotype_variants small_variants generate --help
    Usage: genotype_variants small_variants generate [OPTIONS]

      Command that helps to generate genotyped MAF, the output file will be
      labelled with patient identifier as prefix

    Options:
      -i, --input-maf PATH            Full path to small variants input file in
                                      MAF format  [required]
      -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                      [required]
      -p, --patient-id TEXT           Alphanumeric string indicating patient
                                      identifier  [required]
      -b, --standard-bam PATH         Full path to standard bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                      executable with fragment support  [required]
      -fd, --filter-duplicate INTEGER
                                      Filter duplicate parameter for
                                      GetBaseCountMultiSample
      -fc, --fragment-count INTEGER   Fragment Count parameter for
                                      GetBaseCountMultiSample
      -mapq, --mapping-quality INTEGER
                                      Mapping quality for GetBaseCountMultiSample
      -t, --threads INTEGER           Number of threads to use for
                                      GetBaseCountMultiSample
      -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                      DEBUG
      --help                          Show this message and exit.

.. code-block:: console

    genotype_variants small_variants generate \
    -i /path/to/input_maf \
    -r /path/to/reference_fasta \
    -g /path/to/GetBaseCountsMultiSample \
    -p patient_id \
    -b standard_bam \
    -d duplex_bam \
    -s simplex_bam

Expected Output
"""""""""""""""

After the command above has run, the current working directory will contain the following files:

* patient_id-STANDARD_genotyped.maf
* patient_id-DUPLEX_genotyped.maf
* patient_id-SIMPLEX_genotyped.maf

merge
-----

To use ``small_variants merge`` via the command line, here are the options::

    > genotype_variants small_variants merge --help
    Usage: genotype_variants small_variants merge [OPTIONS]

      Given original input MAF used as an input for GBCMS along with GBCMS
      generated output MAF for standard_bam, duplex_bam or simplex bam, Merge
      them into a single output MAF format. If both duplex_bam and simplex_bam
      based MAF are provided the program will generate merged genotypes as
      well. The output file will be based on the give alphanumeric patient
      identifier as suffix.

    Options:
      -i, --input-maf PATH            Full path to small variants input file in
                                      MAF format used for input to GBCMS for
                                      generating genotypes
      -std, --input-standard-maf PATH
                                      Full path to small variants input file in
                                      MAF format generated by GBCMS for
                                      standard_bam
      -d, --input-duplex-maf PATH     Full path to small variants input file in
                                      MAF format generated by GBCMS for duplex_bam
      -s, --input-simplex-maf PATH    Full path to small variants input file in
                                      MAF format generated by GBCMS for
                                      simplex_bam
      -p, --patient-id TEXT           Alphanumeric string indicating patient
                                      identifier  [required]
      -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                      DEBUG
      --help                          Show this message and exit.

.. code-block:: console

    genotype_variants small_variants merge \
    -i /path/to/input_maf \
    -std /path/to/standard_bam_genotyped_maf \
    -d /path/to/duplex_bam_genotyped_maf \
    -s /path/to/simplex_bam_genotyped_maf \
    -p patient_id

Expected Output
"""""""""""""""

After the command above has run, the current working directory will contain the following file:

* patient_id-ORG-STD-SIMPLEX-DUPLEX_genotyped.maf

If only input_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:

* patient_id-ORG-SIMPLEX-DUPLEX_genotyped.maf

If only standard_bam_genotyped_maf, duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:

* patient_id-STD-SIMPLEX-DUPLEX_genotyped.maf

If only duplex_bam_genotyped_maf and simplex_bam_genotyped_maf are given, the output file will be:

* patient_id-SIMPLEX-DUPLEX_genotyped.maf

all
---

To use ``small_variants all`` via the command line, here are the options::

    > genotype_variants small_variants all --help
    Usage: genotype_variants small_variants all [OPTIONS]

      Command that helps to generate genotyped MAF and merge the genotyped
      MAF. the output file will be labelled with patient identifier as prefix

    Options:
      -i, --input-maf PATH            Full path to small variants input file in
                                      MAF format  [required]
      -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                      [required]
      -p, --patient-id TEXT           Alphanumeric string indicating patient
                                      identifier  [required]
      -b, --standard-bam PATH         Full path to standard bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -d, --duplex-bam PATH           Full path to duplex bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -s, --simplex-bam PATH          Full path to simplex bam file, Note: This
                                      option assumes that the .bai file is present
                                      at same location as the bam file
      -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                      executable with fragment support  [required]
      -fd, --filter-duplicate INTEGER
                                      Filter duplicate parameter for
                                      GetBaseCountMultiSample
      -fc, --fragment-count INTEGER   Fragment Count parameter for
                                      GetBaseCountMultiSample
      -mapq, --mapping-quality INTEGER
                                      Mapping quality for GetBaseCountMultiSample
      -t, --threads INTEGER           Number of threads to use for
                                      GetBaseCountMultiSample
      -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                      DEBUG
      --help                          Show this message and exit.

.. code-block:: console

    genotype_variants small_variants all \
    -i /path/to/input_maf \
    -r /path/to/reference_fasta \
    -g /path/to/GetBaseCountsMultiSample \
    -p patient_id \
    -b standard_bam \
    -d duplex_bam \
    -s simplex_bam

Expected Output
"""""""""""""""

Please refer to the `generate`_ and `merge`_ usage for the expected output.

multiple-samples
----------------

To use ``small_variants multiple-samples`` via the command line, here are the options::

    > genotype_variants small_variants multiple-samples --help
    Usage: genotype_variants small_variants multiple-samples [OPTIONS]

      Command that helps to generate genotyped MAF and merge the genotyped MAF
      for multiple patients. the output file will be labelled with sample
      identifier as prefix

      Expected header of metadata_file in any order:
      sample_id
      maf
      standard_bam
      duplex_bam
      simplex_bam

      For maf, standard_bam, duplex_bam and simplex_bam please include full
      path to the file.

    Options:
      -i, --input-metadata PATH       Full path to metadata file in TSV/EXCEL
                                      format, with following headers: sample_id,
                                      maf, standard_bam, duplex_bam, simplex_bam.
                                      Make sure to use full paths inside the
                                      metadata file  [required]
      -r, --reference-fasta PATH      Full path to reference file in FASTA format
                                      [required]
      -g, --gbcms-path PATH           Full path to GetBaseCountMultiSample
                                      executable with fragment support  [required]
      -fd, --filter-duplicate INTEGER
                                      Filter duplicate parameter for
                                      GetBaseCountMultiSample
      -fc, --fragment-count INTEGER   Fragment Count parameter for
                                      GetBaseCountMultiSample
      -mapq, --mapping-quality INTEGER
                                      Mapping quality for GetBaseCountMultiSample
      -t, --threads INTEGER           Number of threads to use for
                                      GetBaseCountMultiSample
      -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                      DEBUG
      --help                          Show this message and exit.

.. code-block:: console

    genotype_variants small_variants multiple-samples \
    -i /path/to/input_metadata \
    -r /path/to/reference_fasta \
    -g /path/to/GetBaseCountsMultiSample

Expected Output
"""""""""""""""

Please refer to the `generate`_ and `merge`_ usage for the expected output.

To use genotype_variants in a project::

    import genotype_variants
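
The metadata file expected by ``multiple-samples`` can also be prepared programmatically.
The following is a minimal sketch and not part of genotype_variants itself: the helper names
``write_metadata`` and ``build_command`` and every file path are hypothetical. Only the
column names and the ``-i``, ``-r`` and ``-g`` options are taken from the help text above.

.. code-block:: python

    import csv
    from pathlib import Path

    # Column names as listed in the `multiple-samples` help text.
    METADATA_COLUMNS = ["sample_id", "maf", "standard_bam", "duplex_bam", "simplex_bam"]


    def write_metadata(rows, out_path):
        """Write one tab-separated row per sample, preceded by the header row.

        Hypothetical helper: `rows` is an iterable of dicts keyed by
        METADATA_COLUMNS, and every path value should be a full path.
        """
        out_path = Path(out_path)
        with out_path.open("w", newline="") as handle:
            writer = csv.DictWriter(
                handle,
                fieldnames=METADATA_COLUMNS,
                delimiter="\t",
                lineterminator="\n",
            )
            writer.writeheader()
            for row in rows:
                writer.writerow(row)
        return out_path


    def build_command(metadata_path, reference_fasta, gbcms_path):
        """Return the argument list for `small_variants multiple-samples`.

        Hypothetical helper: it only assembles the required options and does
        not run anything.
        """
        return [
            "genotype_variants", "small_variants", "multiple-samples",
            "-i", str(metadata_path),
            "-r", str(reference_fasta),
            "-g", str(gbcms_path),
        ]

Once the paths point at real files, the list returned by ``build_command`` can be passed to
``subprocess.run(cmd, check=True)`` to launch the run.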