fastp: an ultra-fast all-in-one FASTQ preprocessor


DOI:10.1093/bioinformatics/bty560
Corpus ID: 52196534

@article{Chen2018fastpAU, title={fastp: an ultra-fast all-in-one FASTQ preprocessor}, author={Shifu Chen and Yanqing Zhou and Yaru Chen and Jia Gu}, journal={Bioinformatics}, year={2018}, volume={34}, pages={i884 - i890}, url={https://api.semanticscholar.org/CorpusID:52196534}}

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu
Published in bioRxiv 1 March 2018
Computer Science

fastp is an ultra-fast FASTQ preprocessor with quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality cutting, and many other operations in a single scan of the FASTQ data.


10,599 Citations

Highly Influential Citations

1,379

Background Citations

562

Methods Citations

2,831

Results Citations

Topics

Fastp, Adapter Trimming, SOAPnuke, AfterQC, Cutadapt, Adapter Trimmer, FastQC, Adapter Contamination, Base Correction, Adapter Sequences

10,599 Citations

Atria: an ultra-fast and accurate trimmer for adapter and quality trimming

Jiacheng Chuan, Aiguo Zhou, L. Hale, Miao He, Xiang Li

Computer Science, Biology

bioRxiv

2021

Atria matches the adapters in paired reads and finds possible overlapped regions with a super-fast and carefully designed byte-based matching algorithm (O(n) time with O(1) space) that can be used in a broad range of short-sequence matching applications.

20 References

SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data

Yuxin Chen, Yongsheng Chen, ..., Qiang Chen

Computer Science, Biology

GigaScience

2018

SOAPnuke is demonstrated as a tool with abundant functions for a “QC-Preprocess-QC” workflow and MapReduce acceleration framework that enables large scalability to distribute all the processing works to an entire compute cluster.

1,163


Trimmomatic: a flexible trimmer for Illumina sequence data

Anthony M. Bolger, M. Lohse, B. Usadel

Computer Science, Biology

Bioinform.

2014

Trimmomatic is developed as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.

42,425


AfterQC: automatic filtering, trimming, error removing and quality control for fastq data

Shifu Chen, Tanxiao Huang, Yanqing Zhou, Yue Han, Mingyan Xu, Jia Gu

Computer Science

BMC Bioinformatics

2017

Experimental results show that AfterQC can help to eliminate sequencing errors in paired-end sequencing data to provide much cleaner outputs, and consequently help to reduce false-positive variants, especially for low-frequency somatic mutations.

Cutadapt removes adapter sequences from high-throughput sequencing reads

Marcel Martin

Computer Science, Biology

2011

The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.

22,238

Fast gapped-read alignment with Bowtie 2

Ben Langmead, S. Salzberg

Computer Science, Biology

Nature Methods

2012

Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

39,888

SpeedSeq: Ultra-fast personal genome analysis and interpretation

Colby Chiang, Ryan M. Layer, ..., Ira M. Hall

Computer Science, Biology

Nature Methods

2015

The SpeedSeq platform accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement.

The Sequence Alignment/Map format and SAMtools

Heng Li, R. Handsaker, ..., R. Durbin

Computer Science, Biology

Bioinform.

2009

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by…

46,865


UMI-tools: Modelling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

Tom S. Smith, A. Heger, I. Sudbery

Computer Science, Biology

bioRxiv

2016

It is shown that errors in the UMI sequence are common and network-based methods to account for these errors when identifying PCR duplicates are introduced, demonstrating the value of properly accounting for errors in UMIs.

1,219

Detecting ultralow-frequency mutations by Duplex Sequencing

Scott R. Kennedy, Michael W. Schmitt, ..., L. Loeb

Biology

Nature Protocols

2014

A detailed protocol for efficient DS adapter synthesis, library preparation and target enrichment, as well as an overview of the data analysis workflow are provided.

Theoretical and practical advances in genome halving

F. Collyn, L. Guy, M. Marceau, M. Simonet, Claude-Alain H. Roten

Biology

2004

The authors' tighter bounds on genome halving distance yield a new algorithm for reconstructing an ancestral duplicated genome, and a software package GenomeHalving is created based on this new algorithm, identifying a sequence of translocations for halving the yeast genome that is shorter than previously conjectured possible.

28,326


FAQs

Does fastp remove duplicates?

duplication rate and deduplication

For both SE and PE data, fastp supports evaluating the duplication rate and removing duplicated reads/pairs. fastp considers a read duplicated only if all of its base pairs are identical to those of another read.
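
The exact-match rule described above can be illustrated with a minimal Python sketch. This is not fastp's implementation: the `dedup_exact` function, the `(header, sequence, quality)` tuple layout, and the in-memory set are all assumptions made for illustration, and a real tool would need a far more memory-efficient structure.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (header, sequence, quality).
Record = Tuple[str, str, str]


def dedup_exact(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only the first record seen for each distinct sequence."""
    seen = set()
    for header, seq, qual in records:
        if seq in seen:
            continue  # every base matches an earlier read: treat as duplicate
        seen.add(seq)
        yield header, seq, qual


reads = [
    ("@r1", "ACGT", "IIII"),
    ("@r2", "ACGT", "FFFF"),  # same bases as r1, so it is dropped
    ("@r3", "ACGA", "IIII"),  # differs in one base, so it is kept
]
unique_reads = list(dedup_exact(reads))
print([header for header, _, _ in unique_reads])  # ['@r1', '@r3']
```

Note that a read differing by even one base is kept, which is what "all base pairs identical" implies.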


What is the difference between FastQC and fastp?

The second-fastest tool is FastQC, which takes about twice as long as fastp. However, FastQC only performs quality control, while fastp performs quality control (for both pre-filtering data and post-filtering data), data filtering and other operations. The other tools take 3x to 5x as long as fastp.


What is fastp used for?

fastp is a FASTQ data pre-processing tool. It provides quality control, adapter trimming, quality filtering, and read pruning, and it supports multi-threading.


What is the difference between Trimmomatic and fastp?

The fastp-filtered data contains no suspected adapters when four or fewer mismatches are allowed. Compared with fastp-filtered data, Trimmomatic-filtered data contains fewer suspected adapters when five or more mismatches are allowed, but more when four mismatches are allowed.
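
To make "suspected adapters when k mismatches are allowed" concrete, here is a minimal Python sketch that checks whether a read contains an adapter within a given Hamming distance. The source does not describe the exact evaluation procedure, so the function names, the sliding-window approach, and the example sequences are assumptions for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def contains_adapter(read: str, adapter: str, max_mismatches: int) -> bool:
    """Return True if any full-length window of the read is within
    max_mismatches substitutions of the adapter (no indels considered)."""
    n = len(adapter)
    return any(
        hamming(read[i:i + n], adapter) <= max_mismatches
        for i in range(len(read) - n + 1)
    )


# Example sequences chosen for illustration only.
adapter = "AGATCGGAAGAGC"
read = "TTTTAGATCGGTAGAGCTT"  # carries the adapter with one substitution

print(contains_adapter(read, adapter, 0))  # False
print(contains_adapter(read, adapter, 1))  # True
```

Raising the mismatch allowance makes the check more permissive, which is why the counts of suspected adapters reported for each tool depend on the threshold.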


How do I remove duplicates from data preprocessing?

To delete duplicates, use the drop_duplicates function in pandas. Its keep argument controls which copy survives: keep='first' keeps the first record and deletes the other duplicates, keep='last' keeps the last record and deletes the rest, and keep=False deletes every record that has a duplicate.


What is the easiest way to remove duplicates?

Remove duplicate values

1. Select the range of cells that has duplicate values you want to remove. ...
2. Select Data > Remove Duplicates, and then under Columns, check or uncheck the columns where you want to remove the duplicates. ...
3. Select OK.

What is the FASTQ file format?

FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. Each sequence letter and each quality score is encoded with a single ASCII character for brevity.
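
As an illustration of the one-character-per-score encoding, the following minimal Python sketch reads records and decodes quality strings. It assumes the common four-line record layout and a Phred offset of 33; neither is stated in the text above, and older files may use a different offset. The helper names `read_fastq` and `phred_scores` are hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual: str) -> List[int]:
    """Decode each ASCII quality character into an integer score."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))  # read1 ACGT [40, 40, 20, 0]
```

The quality string always has the same length as the sequence, so the two can be zipped together to inspect each base alongside its score.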


How to speed up FastQC?

FastQC can use multiple cores. To do this, pass the additional argument -t with the number of threads; each thread processes one file at a time, so the speed-up applies when several files are analysed in one run. If you are working in an interactive session that was started with only 1 core, exit it and start a new session with more cores first.


What is the difference between fastp and FastQC?

fastp supports duplication level evaluation for both single-end and paired-end data. Unlike FastQC, which uses a hash table to store the duplication keys, fastp stores them in a duplication array D and a counting array C, which provides much faster access.


What is the purpose of FastQC?

FastQC performs quality control checks on raw sequence data coming from high-throughput sequencing pipelines.


What is the purpose of Trimmomatic?

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line.

How to remove duplicates in FASTA?

Remove duplicates from a FASTA file and manipulate names:

Detect and remove duplicated IDs.
Detect and remove duplicated sequences.
Detect and remove duplicated sequences and generate a new ID by pasting together the IDs of the sequences that share the same sequence.
Manipulate the sequence names (eliminate a certain string).

What tool removes duplicates from a list?

List Deduper: Online Tool for Removing Duplicates and Sorting Lists

Grab your list into the buffer.
Paste it into the "source" field below.
Optional: choose what you want to delimit on (default is line feed).
Press the "dedupe" button.
Grab the deduped, sorted list from the "target" field below.
Go about your business!


Do sets remove duplicates?

Sets in Python are unordered collections of unique elements. By their nature, duplicates aren't allowed. Therefore, converting a list into a set removes the duplicates.
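
A short Python sketch of this behaviour follows. The `dict.fromkeys` variant is an addition not mentioned above; it is included only to show one way to drop duplicates while keeping first-seen order, relying on dictionaries preserving insertion order (Python 3.7 and later).

```python
items = ["b", "a", "b", "c", "a"]

# Converting to a set drops duplicates but does not preserve order.
unique_unordered = set(items)

# dict.fromkeys keeps the first occurrence of each element, in order.
unique_ordered = list(dict.fromkeys(items))

print(sorted(unique_unordered))  # ['a', 'b', 'c']
print(unique_ordered)            # ['b', 'a', 'c']
```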

Does Python remove duplicates?

If the order of the elements is not critical, we can remove duplicates by converting to a set or with NumPy's unique() function. To keep the order of the elements, we can use pandas functions, OrderedDict, the reduce() function, a set combined with sort(), or iterative approaches.

