A high-throughput genotyping-by-sequencing (GBS) pipeline integrating DeepVariant and GLnexus for accurate variant calling in plant genomics.
DeepFastGBS is an enhanced version of the FastGBS pipeline that integrates Google's DeepVariant to improve variant calling accuracy and GLnexus to merge per-sample calls into a cohort. The pipeline processes GBS data end to end, from raw reads to filtered and imputed variant calls.
- Complete GBS data processing pipeline
- Integration with DeepVariant for accurate variant calling
- GLnexus-based cohort variant calling
- Support for both single-end and paired-end sequencing
- Automatic handling of ILLUMINA and IONTORRENT data
- Parallel processing capabilities
- Comprehensive logging and quality control
- Automated sample filtering based on read depth
- Built-in imputation using BEAGLE 5.0
- Linux operating system
- Singularity (for running DeepVariant and GLnexus containers)
- Required software modules:
  - sabre (v1.000)
  - cutadapt (v3.2)
  - bwa (v0.7.17)
  - samtools (v1.8)
  - vcftools (v0.1.16)
  - java (v1.8.0)
  - beagle (v5.0)
  - python (v3.7)
  - htslib (v1.8)
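
Before a run it can help to confirm that the command-line tools above are reachable. The following is a minimal sketch, not part of the pipeline: the `REQUIRED_TOOLS` list and the `missing_tools` helper are hypothetical, and the sketch assumes each tool is installed under its usual executable name. BEAGLE is distributed as a jar and htslib as a library, so neither is checked here.

```python
import shutil

# Executable names are assumptions; adjust them to match your modules.
REQUIRED_TOOLS = ["sabre", "cutadapt", "bwa", "samtools", "vcftools", "java", "singularity"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found on PATH.")
```

On clusters that use environment modules, load the modules first so that the executables are on `PATH` when the check runs.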
- Clone the repository:
    git clone https://github.com/yourusername/FastGBS-DV.git
    cd FastGBS-DV
- Make the scripts executable:
    chmod +x fastgbs_dv.sh
    chmod +x Summary4VCF.py
- Configure your parameters in parameters_V2.txt:
    ; Edit parameters according to your data
    LOGFILE=logfile_fastgbs.log
    FLOWCELL=your_flowcell_id
    ...
- Run the pipeline using SLURM:
    sbatch SLURM_GBS.sh
  Or run directly:
    ./fastgbs_dv.sh parameters_V2.txt

- Demultiplexing (sabre)
- Adapter trimming (cutadapt)
- Read alignment (BWA-MEM)
- BAM processing (samtools)
- Variant calling (DeepVariant)
- Variant merging (GLnexus)
- Variant filtering and imputation (vcftools, BEAGLE)
- Summary statistics generation
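
To make the alignment, calling, and merging steps above more concrete, here is a rough sketch of the kind of commands involved, built as argument lists without executing anything. It is not taken from `fastgbs_dv.sh`: the file naming, the image path `deepvariant.sif`, the `WGS` model type, and the helper names are all assumptions for illustration. The tool flags shown (`bwa mem -t`, `samtools sort -@ -o`, `run_deepvariant --model_type/--ref/--reads/--output_vcf/--output_gvcf/--num_shards`, `glnexus_cli --config DeepVariant`) are standard options of those tools.

```python
import shlex


def build_sample_commands(sample, ref, threads, model_type, dv_image):
    """Return the per-sample commands as argument lists (nothing is executed)."""
    fastq = f"{sample}.fq.gz"      # hypothetical naming of the trimmed reads
    bam = f"{sample}.sorted.bam"   # hypothetical naming of the sorted BAM
    return {
        # bwa mem writes SAM to stdout; it would be piped into samtools sort.
        "align": ["bwa", "mem", "-t", str(threads), ref, fastq],
        "sort": ["samtools", "sort", "-@", str(threads), "-o", bam, "-"],
        "index": ["samtools", "index", bam],
        # DeepVariant run from a Singularity image (image path is an assumption).
        "call": [
            "singularity", "exec", dv_image,
            "/opt/deepvariant/bin/run_deepvariant",
            f"--model_type={model_type}",
            f"--ref={ref}",
            f"--reads={bam}",
            f"--sample_name={sample}",
            f"--output_vcf={sample}.vcf.gz",
            f"--output_gvcf={sample}.g.vcf.gz",
            f"--num_shards={threads}",
        ],
    }


def build_merge_command(gvcfs, threads):
    """GLnexus joint call over per-sample gVCFs; glnexus_cli writes BCF to stdout."""
    return ["glnexus_cli", "--config", "DeepVariant", "--threads", str(threads), *gvcfs]


def render(cmd):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


commands = build_sample_commands("sample1", "ref.fa", 4, "WGS", "deepvariant.sif")
for step in ("align", "sort", "index", "call"):
    print(step, "->", render(commands[step]))
print("merge ->", render(build_merge_command(["sample1.g.vcf.gz", "sample2.g.vcf.gz"], 4)))
```

The gVCF output matters here because GLnexus merges per-sample gVCFs rather than plain VCFs. Which DeepVariant model type suits a given GBS dataset is set through the pipeline parameters and is not something this sketch decides.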
- Demultiplexed and trimmed FASTQ files
- Aligned BAM files
- Variant calls in VCF format
- Imputed genotypes
- Summary statistics for variants and samples
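
As an illustration of what per-sample summary statistics can look like, the sketch below computes call rate and mean depth from the lines of an uncompressed VCF, then lists samples under a depth cutoff. It is not the logic of `Summary4VCF.py` or of the pipeline's own depth filter; the function names and the cutoff value are hypothetical, and the sketch assumes the `GT` and `DP` FORMAT fields are present.

```python
def sample_summary(vcf_lines):
    """Return {sample: {"call_rate": float, "mean_depth": float}} from VCF text lines."""
    samples, called, depth_sum, depth_n = [], {}, {}, {}
    n_sites = 0
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            called = {s: 0 for s in samples}
            depth_sum = {s: 0 for s in samples}
            depth_n = {s: 0 for s in samples}
            continue
        n_sites += 1
        keys = fields[8].split(":")
        gt_i = keys.index("GT") if "GT" in keys else None
        dp_i = keys.index("DP") if "DP" in keys else None
        for name, cell in zip(samples, fields[9:]):
            values = cell.split(":")
            # A genotype counts as called when no allele is missing (".").
            if gt_i is not None and gt_i < len(values) and "." not in values[gt_i]:
                called[name] += 1
            if dp_i is not None and dp_i < len(values) and values[dp_i].isdigit():
                depth_sum[name] += int(values[dp_i])
                depth_n[name] += 1
    return {
        s: {
            "call_rate": called[s] / n_sites if n_sites else 0.0,
            "mean_depth": depth_sum[s] / depth_n[s] if depth_n[s] else 0.0,
        }
        for s in samples
    }


def low_depth_samples(summary, min_depth):
    """List samples whose mean depth falls below min_depth (cutoff is hypothetical)."""
    return [s for s, stats in summary.items() if stats["mean_depth"] < min_depth]


example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:8\t./.:0",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:12\t0/0:2",
]
summary = sample_summary(example_vcf)
print(summary)
print(low_depth_samples(summary, min_depth=5))
```

For a compressed `.vcf.gz` file, the same function can be fed lines from `gzip.open(path, "rt")`.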
The pipeline is configured through the parameters_V2.txt file. Key parameters include:
- Sequencing technology (ILLUMINA/IONTORRENT)
- Sequence type (SE/PE)
- Reference genome
- Processing threads
- DeepVariant model type
- GLnexus settings
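
The parameter file shown in the setup steps uses `KEY=VALUE` lines with `;` marking comments. A small sketch of reading such a file into a dictionary is shown below; it is an illustration only, the helper names are hypothetical, and only the `LOGFILE` and `FLOWCELL` keys are taken from this README.

```python
from pathlib import Path


def parse_parameters(text):
    """Parse KEY=VALUE lines, skipping blank lines and ';' comment lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if "=" not in line:
            continue  # ignore anything that is not a KEY=VALUE pair
        key, value = line.split("=", 1)  # split on the first '=' only
        params[key.strip()] = value.strip()
    return params


def load_parameters(path):
    """Read and parse a parameter file such as parameters_V2.txt."""
    return parse_parameters(Path(path).read_text())


example = """\
; Edit parameters according to your data
LOGFILE=logfile_fastgbs.log
FLOWCELL=your_flowcell_id
"""
params = parse_parameters(example)
print(params["LOGFILE"])
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, which can matter for paths or option strings.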