Introduction

Musket is a well-established next-generation sequencing read error correction algorithm targeting Illumina sequencing data. It employs the k-mer spectrum approach and introduces three correction techniques in a multistage workflow. Our performance evaluation, in terms of both correction quality and de novo genome assembly measures, shows that Musket is consistently one of the top-performing correctors of substitution errors. In addition, Musket is multi-threaded using a master-slave model. It shows better parallel scalability than all other evaluated correctors and a highly competitive overall execution time.
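The k-mer spectrum idea mentioned above can be illustrated with a small sketch. Every k-mer in the reads is counted. K-mers whose multiplicity reaches a cutoff are treated as trusted, and the rest are treated as likely to contain sequencing errors. This is not Musket's implementation. The function names, the use of canonical k-mers (the smaller of a k-mer and its reverse complement), and the fixed cutoff are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative sketch of a k-mer spectrum, not Musket's code.
// Reverse complement of a DNA string; characters other than A/C/G/T become N.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Count canonical k-mers over all reads, skipping k-mers that contain
// anything other than A/C/G/T. The resulting counts are the k-mer spectrum.
std::unordered_map<std::string, std::size_t>
count_kmers(const std::vector<std::string>& reads, std::size_t k) {
    assert(k > 0);
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& read : reads) {
        if (read.size() < k) continue;
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            std::string kmer = read.substr(i, k);
            if (kmer.find_first_not_of("ACGT") != std::string::npos) continue;
            std::string rc = reverse_complement(kmer);
            ++counts[kmer < rc ? kmer : rc];
        }
    }
    return counts;
}

// Hypothetical rule: a k-mer is trusted if its canonical form occurs
// at least `cutoff` times in the read set.
bool is_trusted(const std::unordered_map<std::string, std::size_t>& counts,
                const std::string& kmer, std::size_t cutoff) {
    std::string rc = reverse_complement(kmer);
    auto it = counts.find(kmer < rc ? kmer : rc);
    return it != counts.end() && it->second >= cutoff;
}
```

Under these assumptions, a base whose covering k-mers are all untrusted becomes a candidate for correction. The actual correction techniques are those described in the Musket papers.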


Downloads

  1. Latest source code (release 1.1)

    More details about the changes in this version are available in the changelog.

  2. Other tools
    • memusage: a program for peak virtual and resident memory calculation on Linux.
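The two quantities reported by a tool like memusage are exposed on Linux in the /proc/<pid>/status file. VmPeak holds the peak virtual memory size and VmHWM holds the peak resident set size, both in kB. The sketch below only shows how these fields can be read. It is an assumption for illustration, since this page does not say how memusage itself works, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper, not memusage's code: extract "<Field>:  <value> kB"
// from the text of /proc/<pid>/status. Returns -1 if the field is absent
// or its value cannot be parsed.
long status_field_kb(const std::string& status_text, const std::string& field) {
    std::istringstream in(status_text);
    std::string line;
    const std::string key = field + ":";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream value(line.substr(key.size()));
            long kb = 0;
            if (!(value >> kb)) return -1;
            return kb;
        }
    }
    return -1;
}

// Read a status file (e.g. "/proc/self/status") into a string;
// yields an empty string if the file cannot be read.
std::string read_status(const std::string& path) {
    std::ifstream file(path);
    std::stringstream buf;
    buf << file.rdbuf();
    return buf.str();
}
```

For example, status_field_kb(read_status("/proc/self/status"), "VmHWM") gives the current process's peak resident memory in kB.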

Citation

Other related papers


Parameters

Basic:

Advanced:


Installation and Usage

Compilation

  1. Type "make" in the root directory of the software to compile it.

Paired-end correction

Musket works with the FASTA and FASTQ file formats, including gzip-compressed files.
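A reader that accepts all of these inputs has to tell them apart. One common approach, sketched here as an assumption rather than a description of Musket's reader, inspects the first bytes of the file. Gzip streams begin with the magic bytes 0x1f 0x8b. After decompression, FASTA records start with '>' and FASTQ records start with '@'. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical format tags for this sketch.
enum class SeqFormat { Fasta, Fastq, Gzip, Unknown };

// Guess the input format from the first bytes of a file (illustrative only).
SeqFormat detect_format(const std::string& head) {
    // gzip magic number: 0x1f 0x8b
    if (head.size() >= 2 &&
        static_cast<unsigned char>(head[0]) == 0x1f &&
        static_cast<unsigned char>(head[1]) == 0x8b) {
        return SeqFormat::Gzip;  // decompress, then inspect the decompressed bytes
    }
    if (!head.empty() && head[0] == '>') return SeqFormat::Fasta;
    if (!head.empty() && head[0] == '@') return SeqFormat::Fastq;
    return SeqFormat::Unknown;
}
```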

We have simplified paired-end read correction since release 1.0.5: only two options, "-omulti" and "-inorder", are required.

  1. The option "-omulti" gives each input file its own output file, whose name begins with the prefix specified by this option.
  2. The option "-inorder" ensures that the reads of every input file are written out in the same order as in the input.
  3. For example, the command
    • musket -omulti myout -inorder infile_1.fa infile_2.fa infile2_1.fa infile2_2.fa
    will create 4 output files named "myout.0", "myout.1", "myout.2" and "myout.3". File "myout.0" corresponds to "infile_1.fa", "myout.1" corresponds to "infile_2.fa", and so on.
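The naming rule in this example can be written down as a small helper. The i-th input file, counting from 0 in command-line order, maps to "<prefix>.<i>". The function is hypothetical and only restates the documented mapping. It is not taken from the Musket source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper restating the documented "-omulti <prefix>" naming:
// input file i (0-based, command-line order) is written to "<prefix>.<i>".
std::vector<std::string> omulti_output_names(const std::string& prefix,
                                             const std::vector<std::string>& inputs) {
    assert(!prefix.empty());
    std::vector<std::string> names;
    names.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        names.push_back(prefix + "." + std::to_string(i));
    }
    return names;
}
```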

For old versions (<= 1.0.4), please refer to the separate manual for paired-end correction.

Maximum sequence length

  1. In the Makefile, the macro "MAX_SEQ_LENGTH" sets the maximum supported sequence length. The default value is currently 200. To support longer sequences, change the value of this macro and recompile the code.
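A plausible reason for making this limit a compile-time macro is to store reads in fixed-size buffers, which avoids per-read heap allocation. This is our assumption for illustration only. The sketch further assumes that the Makefile passes the macro to the compiler as -DMAX_SEQ_LENGTH=<n>. The struct and function names are hypothetical, and this page does not say how Musket handles reads longer than the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: the build passes -DMAX_SEQ_LENGTH=<n>; 200 is the documented default.
#ifndef MAX_SEQ_LENGTH
#define MAX_SEQ_LENGTH 200
#endif

// Hypothetical fixed-size read record (illustrative, not Musket's type).
struct Read {
    char bases[MAX_SEQ_LENGTH + 1];  // +1 for the terminating '\0'
    std::size_t length;
};

// Copy a sequence into a Read; returns false if it exceeds MAX_SEQ_LENGTH.
bool load_read(const std::string& seq, Read& out) {
    if (seq.size() > MAX_SEQ_LENGTH) return false;
    std::memcpy(out.bases, seq.data(), seq.size());
    out.bases[seq.size()] = '\0';
    out.length = seq.size();
    return true;
}
```

Because the buffer size is fixed when the program is compiled, raising the limit requires recompiling, which matches the instruction above.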

Maximum k-mer size

  1. In the Makefile, the macro "MAX_KMER_SIZE" sets the maximum supported k-mer size. The default value is currently 28. To support larger k-mer sizes, change the value of this macro and recompile the code.
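A common reason for bounding the k-mer size at compile time is that k-mers are packed at 2 bits per base into a fixed number of 64-bit words derived from the maximum size. We assume this design purely for illustration. Musket's actual k-mer representation is not described on this page, and all names in the sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: the build passes -DMAX_KMER_SIZE=<n>; 28 is the documented default.
#ifndef MAX_KMER_SIZE
#define MAX_KMER_SIZE 28
#endif

// With 2 bits per base, one 64-bit word holds up to 32 bases.
constexpr std::size_t kBasesPerWord = 32;
constexpr std::size_t kKmerWords = (MAX_KMER_SIZE + kBasesPerWord - 1) / kBasesPerWord;

// Hypothetical packed k-mer (illustrative, not Musket's type).
struct Kmer {
    std::uint64_t words[kKmerWords] = {};
};

// Map A, C, G, T to 0..3; returns -1 for any other character.
int base_code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Pack seq[0..k) at 2 bits per base; within each word, earlier bases end up
// in higher-order bits. Returns false if k is too large or a base is invalid.
bool encode_kmer(const std::string& seq, std::size_t k, Kmer& out) {
    if (k > MAX_KMER_SIZE || k > seq.size()) return false;
    out = Kmer{};
    for (std::size_t i = 0; i < k; ++i) {
        int code = base_code(seq[i]);
        if (code < 0) return false;
        std::uint64_t& w = out.words[i / kBasesPerWord];
        w = (w << 2) | static_cast<std::uint64_t>(code);
    }
    return true;
}
```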

Typical Commands

  1. musket -k 21 536870912 -p 12 reads.fa reads2.fa -o output.fa
  2. musket -p 12 reads.fa reads2.fa -o output.fa
  3. musket reads.fa reads2.fa -o output.fa -inorder
  4. musket reads_1.fa reads_2.fa reads1_1.fa reads1_2.fa -omulti output -inorder
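In the first command, "-k" is given two values: the k-mer size (21) and 536870912 (that is, 2^29). Assuming the second value is an estimate of the number of k-mers of that size in the input, a rough upper bound can be computed from the read lengths. A read of length L contributes L - k + 1 k-mers when L >= k. The helper below is hypothetical and is not part of Musket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical estimator: total number of k-mer positions in a read set.
// Each read of length L >= k contributes L - k + 1 k-mers; shorter reads
// contribute none. This bounds the number of distinct k-mers from above.
std::uint64_t total_kmers(const std::vector<std::size_t>& read_lengths, std::size_t k) {
    assert(k > 0);
    std::uint64_t total = 0;
    for (std::size_t len : read_lengths) {
        if (len >= k) total += len - k + 1;
    }
    return total;
}
```

For example, 1000 reads of length 100 with k = 21 contain 1000 × 80 = 80000 k-mers.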

Change Log


Contact

If you have any questions or suggestions for improvement, please feel free to contact Yongchao Liu.