Investigation into the annotation of protocol sequencing steps in the sequence read archive

Jamie Alnasir; Hugh Shanahan

doi:10.1186/s13742-015-0064-7

Research output: Contribution to journal › Article › peer-review

Abstract

Background
The workflow for producing high-throughput sequencing data from nucleic acid samples is complex. Preparing samples for next-generation sequencing involves a series of protocol steps. The bias introduced by several of these steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, has yet to be quantified.
Results
We examined the experimental metadata of the Sequence Read Archive (SRA), a public repository, to ascertain the level of annotation of important sequencing steps in submissions to the database. We ran SQL queries against the SRAdb SQLite database generated by the Bioconductor consortium, searching for keywords that commonly occur in key preparatory protocol steps (fragmentation, ligation and enrichment). Partitioned over studies, 7.10%, 5.84% and 7.57% of all records had at least one keyword for fragmentation, ligation and enrichment, respectively. Only 4.06% of all records, partitioned over studies, had keywords for all three protocol steps (5.58% of all SRA records).
Conclusions
The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on this data will have a source of bias that at present cannot be quantified.
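
The keyword search described in the Results can be sketched roughly as follows. This is a minimal illustration, not the authors' queries: the keyword lists, the table name `experiment` and the column name `protocol_text` are assumptions made for this example, and the sketch computes plain percentages over all records without reproducing the partitioning over studies. It runs against a small in-memory SQLite table rather than the real SRAdb database.

```python
import sqlite3

# Hypothetical keyword stems for each protocol step (not the paper's lists).
STEP_KEYWORDS = {
    "fragmentation": ["fragment", "shear", "sonicat", "nebuli"],
    "ligation": ["ligat", "adapter", "adaptor"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def step_clause(column, keywords):
    """Return an OR-joined LIKE clause and its bound parameters for one step."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in keywords)
    params = [f"%{kw}%" for kw in keywords]
    return f"({clause})", params


def annotation_percentages(conn, table="experiment", column="protocol_text"):
    """Percentage of records with a keyword per step, and for all three steps.

    `table` and `column` are trusted identifiers here (assumed names);
    keyword values are passed as bound parameters.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return {}
    results = {}
    clauses, all_params = [], []
    for step, keywords in STEP_KEYWORDS.items():
        clause, params = step_clause(column, keywords)
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {clause}", params
        ).fetchone()[0]
        results[step] = 100.0 * count / total
        clauses.append(clause)
        all_params.extend(params)
    # A record counts for "all_three" only if every step has a matching keyword.
    count_all = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {' AND '.join(clauses)}", all_params
    ).fetchone()[0]
    results["all_three"] = 100.0 * count_all / total
    return results


def build_demo_db():
    """Create a toy in-memory table standing in for experiment metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiment (accession TEXT, protocol_text TEXT)")
    rows = [
        ("X1", "DNA was sheared by sonication, adapters ligated, then PCR enriched"),
        ("X2", "Fragmented with Covaris"),
        ("X3", None),  # no protocol annotation at all
        ("X4", "Standard Illumina protocol"),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    for step, pct in annotation_percentages(build_demo_db()).items():
        print(f"{step}: {pct:.2f}%")
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so `fragment` also matches `Fragmented`, and records whose protocol field is NULL never match. Substring matching of this kind is crude: it can over-count (for example, a stem appearing in an unrelated word) and under-count (unlisted synonyms), so any real analysis would need a vetted keyword list.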

Original language	English
Pages (from-to)	1-11
Number of pages	11
Journal	GigaScience
Volume	4
Issue number	1
Early online date	9 May 2015
DOIs	https://doi.org/10.1186/s13742-015-0064-7
Publication status	Published - Dec 2015

Keywords

Annotation
Sequencing
Next-generation
Ligation
Fragmentation
Enrichment
Protocol
Metadata
Experiment

Access to Document

10.1186/s13742-015-0064-7 (Licence: CC BY)

1205220503146144_article: Accepted author manuscript, 1.44 MB (Licence: CC BY)
Supplementary information (Licence: CC BY)

Cite this

@article{cc0dc54bf1e345c8b46df849789928d3,

title = "Investigation into the annotation of protocol sequencing steps in the sequence read archive",

abstract = "Background: The work-flow for the production of high-throughput sequencing data from nucleic acid samples is a complex one. There are a series of protocol steps in the preparation of samples for next generation sequencing. The quantification of bias remains to be determined in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment. Results: We examined the experimental metadata of the Sequence Read Archive (SRA), a public repository in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords that commonly occur in key preparatory protocol steps (fragmentation, ligation and enrichment) partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records, respectively, had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three protocol steps (5.58% of all SRA records). Conclusions: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on this data will have a source of bias that at present cannot be quantified.",

keywords = "Annotation, Sequencing, Next-generation, Ligation, Fragmentation, Enrichment, Protocol, Metadata, Experiment",

author = "Jamie Alnasir and Hugh Shanahan",

year = "2015",

month = dec,

doi = "10.1186/s13742-015-0064-7",

language = "English",

volume = "4",

pages = "1--11",

journal = "GigaScience",

issn = "2047-217X",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - Investigation into the annotation of protocol sequencing steps in the sequence read archive

AU - Alnasir, Jamie

AU - Shanahan, Hugh

PY - 2015/12

Y1 - 2015/12

N2 - Background: The work-flow for the production of high-throughput sequencing data from nucleic acid samples is a complex one. There are a series of protocol steps in the preparation of samples for next generation sequencing. The quantification of bias remains to be determined in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment. Results: We examined the experimental metadata of the Sequence Read Archive (SRA), a public repository in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords that commonly occur in key preparatory protocol steps (fragmentation, ligation and enrichment) partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records, respectively, had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three protocol steps (5.58% of all SRA records). Conclusions: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on this data will have a source of bias that at present cannot be quantified.

AB - Background: The work-flow for the production of high-throughput sequencing data from nucleic acid samples is a complex one. There are a series of protocol steps in the preparation of samples for next generation sequencing. The quantification of bias remains to be determined in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment. Results: We examined the experimental metadata of the Sequence Read Archive (SRA), a public repository in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords that commonly occur in key preparatory protocol steps (fragmentation, ligation and enrichment) partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records, respectively, had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three protocol steps (5.58% of all SRA records). Conclusions: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on this data will have a source of bias that at present cannot be quantified.

KW - Annotation

KW - Sequencing

KW - Next-generation

KW - Ligation

KW - Fragmentation

KW - Enrichment

KW - Protocol

KW - Metadata

KW - Experiment

U2 - 10.1186/s13742-015-0064-7

DO - 10.1186/s13742-015-0064-7

M3 - Article

SN - 2047-217X

VL - 4

SP - 1

EP - 11

JO - GigaScience

JF - GigaScience

IS - 1

ER -