Snakemake workflow: BuscoClade

Description

Pipeline to construct species phylogenies using BUSCO.

Alignment: PRANK, MAFFT.
Trimming: GBlocks, TrimAl.
Phylogenetic tree construction: IQTree, MrBayes, ASTRAL III, RapidNJ, PHYLIP.
Visualization: Etetoolkit, Matplotlib.

Usage

Step 1. Deploy workflow

To use this workflow, you can either download and extract the latest release or clone the repository:

git clone https://github.com/tomarovsky/BuscoClade.git

Step 2. Add species genomes

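Place your uncompressed FASTA genome assemblies in the genomes/ directory. Each file must have a .fasta extension. Keep in mind that the file prefixes (file names without the .fasta extension) are used as species names throughout the workflow (e.g. Ailurus_fulgens.fasta is processed as Ailurus_fulgens), so they will influence the output phylogeny.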
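Because the file prefix doubles as the species name, it can help to check the genomes/ directory before launching the workflow. The Python sketch below lists the species names that would be derived, assuming the workflow simply strips the .fasta extension; the function name species_from_genomes is hypothetical and not part of BuscoClade.

```python
from pathlib import Path
from typing import List


def species_from_genomes(genome_dir: str = "genomes") -> List[str]:
    """Return species names derived from *.fasta file prefixes (hypothetical helper).

    Assumes the prefix is the species name, e.g. Ailurus_fulgens.fasta -> Ailurus_fulgens.
    """
    names = []
    for path in sorted(Path(genome_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        if path.suffix == ".fasta":
            names.append(path.stem)
        else:
            # .fa, .fna or compressed files such as .fasta.gz are not picked up
            print(f"warning: skipping {path.name} (expected a .fasta extension)")
    return names
```

For example, a directory holding Ailurus_fulgens.fasta and Canis_lupus.fasta.gz yields only Ailurus_fulgens, plus a warning about the compressed file.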

Step 3. Configure workflow

To set up the workflow, edit config/default.yaml. I recommend copying the config file and making all modifications in the copy. Some options (all non-nested options in default.yaml) can also be set on the command line with the --config flag. The config file has the following sections:

Pipeline Configuration: This section defines which steps of the workflow are run. By default, it includes alignment of nucleotide sequences followed by filtration, and all phylogeny reconstruction tools except MrBayes (it is recommended to run a GPU-compiled version of MrBayes separately). To disable a tool, set its value to False or comment out the corresponding line.
Tool Parameters: Specify parameters for each tool. To run BUSCO, you must specify:
- busco_dataset_path: Download the BUSCO dataset beforehand and specify its path here.
- busco_params: Include the --offline flag and set --download_path to the path of the busco_downloads/ directory.
Directory structure: Defines the layout of output files in the results/ directory. It is recommended to leave it unchanged.
Resources: Specify Slurm queue, threads, memory, and runtime for each tool.
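For the non-nested options, a small Python sketch can show how overrides map onto Snakemake's `--config KEY=VALUE` syntax. Here defaults stands for the parsed config/default.yaml (for instance loaded with PyYAML's yaml.safe_load); the helper config_args and the key names in the usage example (iqtree, mrbayes, resources) are hypothetical placeholders, so check your copy of the config for the real ones. Nested sections are rejected, since only non-nested options can be set from the command line.

```python
from typing import Any, Dict, List


def config_args(overrides: Dict[str, Any], defaults: Dict[str, Any]) -> List[str]:
    """Turn top-level overrides into snakemake '--config KEY=VALUE ...' arguments.

    defaults is the parsed config file; only keys that exist there and are
    not nested sections are accepted (hypothetical helper).
    """
    pairs = []
    for key, value in overrides.items():
        if key not in defaults:
            raise KeyError(f"unknown config option: {key}")
        if isinstance(defaults[key], dict):
            # Nested sections must be edited in your copy of the config file.
            raise ValueError(f"{key} is a nested section, edit it in the config file")
        pairs.append(f"{key}={value}")
    # --config accepts several KEY=VALUE pairs after a single flag.
    return ["--config", *pairs] if pairs else []
```

For example, with defaults = {"iqtree": True, "mrbayes": False, "resources": {...}}, calling config_args({"mrbayes": True}, defaults) returns ["--config", "mrbayes=True"]; how the value strings are typed is left to Snakemake's own parsing.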

Step 4. Execute workflow

Install snakemake:

mamba create -c conda-forge -c bioconda -c nodefaults -n snakemake snakemake snakemake-executor-plugin-cluster-generic
mamba activate snakemake

For a dry run:

snakemake --profile profile/slurm/ --configfile config/default.yaml --dry-run

Snakemake will list all the jobs that would be executed. Remove --dry-run to start the actual run.
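The dry-run-then-run sequence can also be scripted. This Python sketch assembles the same command line from the profile and config paths and only calls subprocess.run when explicitly asked; the helper names snakemake_command and run and their parameters are illustrative assumptions, not part of the workflow.

```python
import shlex
import subprocess
from typing import List


def snakemake_command(configfile: str = "config/default.yaml",
                      profile: str = "profile/slurm/",
                      dry_run: bool = True) -> List[str]:
    """Assemble the snakemake invocation used in this README."""
    cmd = ["snakemake", "--profile", profile, "--configfile", configfile]
    if dry_run:
        # Only report the jobs that would run; nothing is submitted to Slurm.
        cmd.append("--dry-run")
    return cmd


def run(cmd: List[str], execute: bool = False) -> str:
    """Return the shell form of cmd; launch it only if execute is True."""
    if execute:
        # check=True raises CalledProcessError if snakemake exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

Calling run(snakemake_command(), execute=True), inspecting the job list, and then repeating with dry_run=False mirrors the manual steps described here.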

How do I run the workflow if I already have completed BUSCO runs?

First, move the genome assemblies to the genomes/ directory or create empty files with the corresponding names. Then, create a results/busco/ directory and move the BUSCO output directories into it. Note that the BUSCO output must follow the layout expected by the workflow. For example, for Ailurus_fulgens.fasta the BUSCO output should look like this:

results/
    busco/
        Ailurus_fulgens/
            busco_sequences/
                fragmented_busco_sequences/
                multi_copy_busco_sequences/
                single_copy_busco_sequences/
            hmmer_output/
            logs/
            metaeuk_output/
            full_table_Ailurus_fulgens.tsv
            missing_busco_list_Ailurus_fulgens.tsv
            short_summary_Ailurus_fulgens.txt
            short_summary.json
            short_summary.specific.mammalia_odb10.Ailurus_fulgens.json
            short_summary.specific.mammalia_odb10.Ailurus_fulgens.txt
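To check that reused BUSCO results match this layout before starting the workflow, the Python sketch below compares a species directory against the paths listed for Ailurus_fulgens and can create the empty placeholder genome file. The exact nesting (the three *_busco_sequences folders inside busco_sequences/, the files directly inside the species folder) is inferred from that listing, the lineage name mammalia_odb10 is only this README's example, and all function names are hypothetical.

```python
from pathlib import Path
from typing import List


def expected_busco_layout(species: str, lineage: str = "mammalia_odb10") -> List[str]:
    """Relative paths expected inside results/busco/<species>/ (inferred layout)."""
    seq = "busco_sequences"
    return [
        f"{seq}/fragmented_busco_sequences",
        f"{seq}/multi_copy_busco_sequences",
        f"{seq}/single_copy_busco_sequences",
        "hmmer_output",
        "logs",
        "metaeuk_output",
        f"full_table_{species}.tsv",
        f"missing_busco_list_{species}.tsv",
        f"short_summary_{species}.txt",
        "short_summary.json",
        f"short_summary.specific.{lineage}.{species}.json",
        f"short_summary.specific.{lineage}.{species}.txt",
    ]


def missing_busco_paths(species: str, busco_dir="results/busco",
                        lineage: str = "mammalia_odb10") -> List[str]:
    """List expected paths that do not exist for this species."""
    root = Path(busco_dir) / species
    return [p for p in expected_busco_layout(species, lineage) if not (root / p).exists()]


def add_placeholder_genome(species: str, genome_dir="genomes") -> Path:
    """Create an empty <species>.fasta, as suggested for precomputed BUSCO runs."""
    path = Path(genome_dir) / f"{species}.fasta"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path
```

Set lineage to the dataset you actually used (the one given in busco_dataset_path); an empty list from missing_busco_paths means every listed entry is present.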

Contact

Please email me at andrey.tomarovsky@gmail.com with any questions or feedback.
