allpaths-lg

Summary

Software Description

ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.

General Linux

To run this software interactively in a Linux environment, run the following commands:

module load allpathslg
PrepareAllPathsInputs.pl DATA_DIR=/path/to/data
RunAllPathsLG PRE=$pre DATA_SUBDIR=$data RUN=$run REFERENCE_NAME=$ref

Note:

The PrepareAllPathsInputs.pl script requires one parameter, the path to the directory containing the input data. $pre is the root directory ALLPATHS-LG will use. $data is the subdirectory containing the input data. $run is the directory used for assembly pre-processing. $ref is the organism or reference genome name.
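
As a rough illustration of how these four values fit together, the sketch below assumes that ALLPATHS-LG nests them as PRE/REFERENCE_NAME/DATA_SUBDIR/RUN. This matches the example job script on this page, where the data directory is $PWD/test.genome/data. The helper name allpaths_run_dir and the path /scratch/user are hypothetical and serve only as an example. Check the user manual before relying on this layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of ALLPATHS-LG): prints the run directory
# implied by the four RunAllPathsLG arguments.
# Assumed layout: PRE/REFERENCE_NAME/DATA_SUBDIR/RUN
allpaths_run_dir() {
    pre=$1    # root directory (PRE)
    ref=$2    # organism or reference genome name (REFERENCE_NAME)
    data=$3   # data subdirectory (DATA_SUBDIR)
    run=$4    # run directory (RUN)
    printf '%s/%s/%s/%s\n' "$pre" "$ref" "$data" "$run"
}

# Example with made-up values
allpaths_run_dir /scratch/user test.genome data run
```

With the made-up values above, the helper prints /scratch/user/test.genome/data/run.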

ALLPATHS-LG is composed of a number of modules, each of which performs a step in the assembly process. While each module can be run individually, ALLPATHS-LG provides a module that controls the entire assembly pipeline, called RunAllPathsLG. In addition, before ALLPATHS-LG can be used, data must be converted using the Perl script PrepareAllPathsInputs.pl.

ALLPATHS-LG has a specific requirement for paired-end read libraries: the paired reads must be interleaved, that is, the two reads of each pair must appear consecutively in a single file.
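
If your paired reads are stored in two separate FASTQ files, one possible way to interleave them is sketched below. It is a minimal example, not a tool shipped with ALLPATHS-LG. The function name interleave_fastq and the file names in the usage comment are hypothetical. The sketch assumes that each FASTQ record is exactly four lines long and that both files list the pairs in the same order.

```shell
#!/bin/sh
# Hypothetical helper: interleave two FASTQ files record by record.
# Assumes 4-line FASTQ records and identical read order in both files.
interleave_fastq() {
    awk -v r2="$2" '
        { print }                          # copy the current line of the first file
        NR % 4 == 0 {                      # a full record from the first file is done
            for (i = 0; i < 4; i++)        # append the matching record from the second file
                if ((getline line < r2) > 0)
                    print line
        }
    ' "$1"
}

# Usage (file names are placeholders):
#   interleave_fastq reads_1.fastq reads_2.fastq > reads_interleaved.fastq
```

The output alternates records: first read of pair 1, second read of pair 1, first read of pair 2, and so on. The sketch does not verify that read names match between the two files, so check that separately if the files may be out of sync.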

A more detailed discussion of the PRE, DATA_SUBDIR, RUN, and REFERENCE_NAME directories, as well as a list of other command-line arguments, is available in the user manual. Other ALLPATHS-LG utilities may be found in the directory:

/common/software/install/migrated/allpathslg/VER/bin

where VER is the version of ALLPATHS-LG you are using. An example Slurm script for submitting ALLPATHS-LG jobs to the queue is shown below:

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem=1gb
#SBATCH --time=4:00:00
#SBATCH --partition=msismall

module load allpaths-lg

# Prepare input data
mkdir -p test.genome/data
PrepareAllPathsInputs.pl DATA_DIR=$PWD/test.genome/data

# Assemble data
RunAllPathsLG \
PRE=$PWD \
DATA_SUBDIR=data \
RUN=run \
REFERENCE_NAME=test.genome