BioPerl: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Bluelinking 1 books for verifiability.) #IABot (v2.1alpha3
m →‎Features and examples: {{sxhl|2=perl|}}
 
(25 intermediate revisions by 15 users not shown)
Line 1: Line 1:
{{short description|Collection of Perl modules for bioinformatics}}
{{Infobox software
{{Infobox software
| name = BioPerl
| name = BioPerl
| logo = BioPerlLogo.png
| logo = BioPerlLogo.png
| released = {{Start date|2002|06|11|df=yes}}
| released = {{Start date and age|2002|06|11|df=yes}}
| latest release version = 1.7.2
| latest release version = {{wikidata|property|edit|P548=Q2804309|P348}}
| latest release date = {{Release date and age|2018|11|29|df=yes}}
| latest release date = {{start date and age|{{wikidata|qualifier|P548=Q2804309|P348|P577}}}}
| programming language = [[Perl]]
| programming language = [[Perl]]
| genre = [[Bioinformatics]]
| genre = [[Bioinformatics]]
Line 10: Line 11:
| website = {{URL|bioperl.org}}
| website = {{URL|bioperl.org}}
}}
}}
'''BioPerl'''<ref>{{Cite journal | last1 = Stajich | first1 = J. E. | last2 = Block | first2 = D. | last3 = Boulez | first3 = K. | last4 = Brenner | first4 = S.| authorlink4 = Steven E. Brenner | last5 = Chervitz | first5 = S. | last6 = Dagdigian | first6 = C. | last7 = Fuellen | first7 = G. | last8 = Gilbert | first8 = J. | last9 = Korf | first9 = I. | last10 = Lapp | first10 = H. | last11 = Lehväslaiho | first11 = H. | last12 = Matsalla | first12 = C. | last13 = Mungall | first13 = C. J. | last14 = Osborne | first14 = B. I. | last15 = Pocock | first15 = M. R. | last16 = Schattner | first16 = P. | last17 = Senger | first17 = M. | last18 = Stein | first18 = L. D. | authorlink18 = Lincoln Stein| last19 = Stupka | first19 = E. | last20 = Wilkinson | first20 = M. D. | last21 = Birney | first21 = E. | authorlink21 = Ewan Birney| title = The BioPerl Toolkit: Perl Modules for the Life Sciences | doi = 10.1101/gr.361602 | journal = Genome Research | volume = 12 | issue = 10 | pages = 1611–1618 | year = 2002 | pmid = 12368254 | pmc =187536 }}</ref><ref>{{cite web |url=http://www.bioperl.org/wiki/BioPerl_publications |title=Archived copy |accessdate=2007-01-21 |deadurl=yes |archiveurl=https://web.archive.org/web/20070202113842/http://www.bioperl.org/wiki/BioPerl_publications |archivedate=2007-02-02 |df= }} A complete, up-to-date list of BioPerl references</ref> is a collection of [[Perl]] modules that facilitate the development of Perl scripts for [[bioinformatics]] applications. It has played an integral role in the [[Human Genome Project]].<ref>{{cite journal
'''BioPerl'''<ref>{{Cite journal | last1 = Stajich | first1 = J. E. | last2 = Block | first2 = D. | last3 = Boulez | first3 = K. | last4 = Brenner | first4 = S.| author-link4 = Steven E. Brenner | last5 = Chervitz | first5 = S. | last6 = Dagdigian | first6 = C. | last7 = Fuellen | first7 = G. | last8 = Gilbert | first8 = J. | last9 = Korf | first9 = I. | last10 = Lapp | first10 = H. | last11 = Lehväslaiho | first11 = H. | last12 = Matsalla | first12 = C. | last13 = Mungall | first13 = C. J. | last14 = Osborne | first14 = B. I. | last15 = Pocock | first15 = M. R. | last16 = Schattner | first16 = P. | last17 = Senger | first17 = M. | last18 = Stein | first18 = L. D. | author-link18 = Lincoln Stein| last19 = Stupka | first19 = E. | last20 = Wilkinson | first20 = M. D. | last21 = Birney | first21 = E. | author-link21 = Ewan Birney| title = The BioPerl Toolkit: Perl Modules for the Life Sciences | doi = 10.1101/gr.361602 | journal = Genome Research | volume = 12 | issue = 10 | pages = 1611–1618 | year = 2002 | pmid = 12368254 | pmc =187536 }}</ref><ref>{{cite web |url=http://www.bioperl.org/wiki/BioPerl_publications |title=BioPerl publications - BioPerl |access-date=2007-01-21 |url-status=dead |archive-url=https://web.archive.org/web/20070202113842/http://www.bioperl.org/wiki/BioPerl_publications |archive-date=2007-02-02 }} A complete, up-to-date list of BioPerl references</ref> is a collection of [[Perl]] modules that facilitate the development of Perl scripts for [[bioinformatics]] applications. It has played an integral role in the [[Human Genome Project]].<ref>{{cite journal
| url=http://www.bioperl.org/wiki/How_Perl_saved_human_genome
| url=http://www.bioperl.org/wiki/How_Perl_saved_human_genome
| title=How Perl saved the human genome project
| title=How Perl saved the human genome project
| author=[[Lincoln Stein]]
| author=Lincoln Stein
| year=1996
| year=1996
| volume=1
| volume=1
| issue=2
| issue=2
| journal=The Perl Journal
| journal=The Perl Journal
| accessdate=2009-02-25
| access-date=2009-02-25
| url-status=dead
| deadurl=yes
| archiveurl=https://web.archive.org/web/20070202101624/http://www.bioperl.org/wiki/How_Perl_saved_human_genome
| archive-url=https://web.archive.org/web/20070202101624/http://www.bioperl.org/wiki/How_Perl_saved_human_genome
| archivedate=2007-02-02
| archive-date=2007-02-02
| author-link=Lincoln Stein
| df=
}}</ref>
}}</ref>


==Background==
==Background==


BioPerl is an active [[Open-source software|open source]] software project supported by the [[Open Bioinformatics Foundation]]. The first set of Perl codes of BioPerl was created by [[Tim Hubbard]] and Jong Bhak{{citation needed|date=February 2014}} at [[Medical Research Council (United Kingdom)|MRC]] Centre Cambridge, where the first genome sequencing was carried out by [[Fred Sanger]]. MRC Centre was one of the hubs and birth places of modern bioinformatics as it had a large quantity of DNA sequences and 3D protein structures. Hubbard was using the th_lib.pl Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created jong_lib.pl. Bhak merged the two Perl subroutine libraries into Bio.pl. The name BioPerl was coined jointly by Bhak and [[Steven E. Brenner|Steven Brenner]] at the [[Centre for Protein Engineering]] (CPE). In 1995, Brenner organized a BioPerl session at the [[Intelligent Systems for Molecular Biology]] conference, held in Cambridge. BioPerl had some users in coming months including Georg Fuellen who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl; this was further expanded by others, including Steve Chervitz who was actively developing Perl codes for his yeast genome database. The major expansion came when Cambridge student [[Ewan Birney]] joined the development team.{{citation needed|date=December 2013}}
BioPerl is an active [[Open-source software|open source]] software project supported by the [[Open Bioinformatics Foundation]]. The first set of Perl codes of BioPerl was created by [[Tim Hubbard]] and Jong Bhak{{citation needed|date=February 2014}} at [[Medical Research Council (United Kingdom)|MRC]] Centre Cambridge, where the first genome sequencing was carried out by [[Fred Sanger]]. MRC Centre was one of the hubs and birthplaces of modern bioinformatics as it had a large quantity of DNA sequences and 3D protein structures. Hubbard was using the <code>th_lib.pl</code> Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created <code>jong_lib.pl</code>. Bhak merged the two Perl subroutine libraries into <code>Bio.pl</code>. The name BioPerl was coined jointly by Bhak and [[Steven E. Brenner|Steven Brenner]] at the [[Centre for Protein Engineering]] (CPE). In 1995, Brenner organized a BioPerl session at the [[Intelligent Systems for Molecular Biology]] conference, held in Cambridge. BioPerl had some users in coming months including Georg Fuellen who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl; this was further expanded by others, including Steve Chervitz who was actively developing Perl codes for his yeast genome database. The major expansion came when Cambridge student [[Ewan Birney]] joined the development team.{{citation needed|date=December 2013}}


The first stable release was on 11 June 2002; the most recent stable (in terms of API) release is 1.7.2 from 07 September 2017. There are also developer releases produced periodically. Version series 1.7.x is considered to be the most stable (in terms of bugs) version of BioPerl and is recommended for everyday use.
The first stable release was on 11 June 2002; the most recent stable (in terms of API) release is 1.7.2 from 7 September 2017. There are also developer releases produced periodically. Version series 1.7.x is considered to be the most stable (in terms of bugs) version of BioPerl and is recommended for everyday use.


In order to take advantage of BioPerl, the user needs a basic understanding of the Perl programming language including an understanding of how to use Perl references, modules, objects and methods.
In order to take advantage of BioPerl, the user needs a basic understanding of the Perl programming language including an understanding of how to use Perl references, modules, objects, and methods.

===Influence on the Human Genome Project===
The Human Genome Project faced several challenges during its lifetime. A few of these problems were solved when many of the genomics labs started to use Perl. The process of analyzing all of the DNA sequences was one such problem. Some labs built large monolithic systems with complex relational databases that took forever to debug and implement, and got surpassed by new technologies. Other labs learned to build modular, loosely-coupled systems whose parts could be swapped in and out when new technologies arose. Many of the initial results from all of the labs were mixed. It was eventually discovered that many of the steps could be implemented as loosely coupled programs that were run with a Perl shell script. Another problem that was fixed was interchange of data. Each lab usually had different programs that they ran with their scripts, resulting in several conversions when comparing results. To fix this the labs collectively started using a super-set of data. One script was used to convert from super-set to each lab's set and one was used to convert back. This minimized the number of scripts needed and data exchange became simplified with Perl.


==Features and examples==
==Features and examples==
Line 41: Line 39:
* Accessing [[Nucleotide sequence|nucleotide]] and [[Peptide sequence|peptide]] sequence data from local and remote [[Biological database|databases]]
* Accessing [[Nucleotide sequence|nucleotide]] and [[Peptide sequence|peptide]] sequence data from local and remote [[Biological database|databases]]
'''Example of accessing GenBank to retrieve a sequence:'''
'''Example of accessing GenBank to retrieve a sequence:'''
{{sxhl|2=perl|1=
<pre>
use Bio::DB::GenBank;
use Bio::DB::GenBank;


Line 47: Line 45:


$seq_obj = $db_obj->get_Seq_by_acc( # Insert Accession Number );
$seq_obj = $db_obj->get_Seq_by_acc( # Insert Accession Number );
}}
</pre>
* Transforming [[List of file formats#Biology|formats]] of database/ file records
* Transforming [[List of file formats#Biology|formats]] of database/ file records
'''Example code for transforming formats'''
'''Example code for transforming formats'''
{{sxhl|2=perl|1=
<pre>
use Bio::SeqIO;
use Bio::SeqIO;


Line 65: Line 63:
$seqout->write_seq($inseq);
$seqout->write_seq($inseq);
}
}
}}
</pre>
* Manipulating individual sequences
* Manipulating individual sequences
'''Example of gathering statistics for a given sequence'''
'''Example of gathering statistics for a given sequence'''
{{sxhl|2=perl|1=
<pre>
use Bio::Tools::SeqStats;
use Bio::Tools::SeqStats;
$seq_stats = Bio::Tools::SeqStats->new($seqobj);
$seq_stats = Bio::Tools::SeqStats->new($seqobj);
Line 77: Line 75:
# for nucleic acid sequence
# for nucleic acid sequence
$codon_ref = $seq_stats->count_codons();
$codon_ref = $seq_stats->count_codons();
}}
</pre>
* Searching for similar sequences
* Searching for similar sequences
* Creating and manipulating [[sequence alignment]]s
* Creating and manipulating [[sequence alignment]]s
* Searching for [[gene]]s and other structures on [[Genome|genomic]] DNA
* Searching for [[gene]]s and other structures on [[Genome|genomic]] DNA
* Developing machine readable sequence [[Genome project#Genome annotation|annotations]]
* Developing machine-readable sequence [[Genome project#Genome annotation|annotations]]


==Usage==
==Usage==
In addition to being used directly by end-users,<ref>{{cite book |vauthors=Khaja R, MacDonald J, Zhang J, Scherer S |title=Methods for identifying and mapping recent segmental and gene duplications in eukaryotic genomes |journal=Methods Mol Biol |volume=338 |issue= |pages=9–20 |year=2006 |pmid=16888347 |doi=10.1385/1-59745-097-9:9 |isbn=978-1-59745-097-3 |url-access=registration |url=https://archive.org/details/genemappingdisco00mino }}</ref> BioPerl has also provided the base for a wide variety of bioinformatic tools, including [https://web.archive.org/web/20070202113842/http://www.bioperl.org/wiki/BioPerl_publications amongst others]:
In addition to being used directly by end-users,<ref>{{cite book |vauthors=Khaja R, MacDonald J, Zhang J, Scherer S |chapter=Methods for identifying and mapping recent segmental and gene duplications in eukaryotic genomes |series=Methods Mol Biol |volume=338 |pages=9–20 |year=2006 |pmid=16888347 |doi=10.1385/1-59745-097-9:9 |isbn=978-1-59745-097-3 |chapter-url-access=registration |chapter-url=https://archive.org/details/genemappingdisco00mino |title=Gene Mapping, Discovery, and Expression |publisher=Totowa, N.J. : Humana Press |url=https://archive.org/details/genemappingdisco00mino/page/9 }}</ref> BioPerl has also provided the base for a wide variety of bioinformatic tools, including [https://web.archive.org/web/20070202113842/http://www.bioperl.org/wiki/BioPerl_publications amongst others]:


* SynBrowse<ref>{{Cite journal | last1 = Pan | first1 = X. | last2 = Stein | first2 = L. | authorlink2 = Lincoln Stein| last3 = Brendel | first3 = V. | doi = 10.1093/bioinformatics/bti555 | title = SynBrowse: A synteny browser for comparative sequence analysis | journal = Bioinformatics | volume = 21 | issue = 17 | pages = 3461–3468 | year = 2005 | pmid = 15994196| pmc = }}</ref>
* SynBrowse<ref>{{Cite journal | last1 = Pan | first1 = X. | last2 = Stein | first2 = L. | author-link2 = Lincoln Stein| last3 = Brendel | first3 = V. | doi = 10.1093/bioinformatics/bti555 | title = SynBrowse: A synteny browser for comparative sequence analysis | journal = Bioinformatics | volume = 21 | issue = 17 | pages = 3461–3468 | year = 2005 | pmid = 15994196| doi-access = free }}</ref>
* GeneComber<ref>{{Cite journal | last1 = Shah | first1 = S. P. | last2 = McVicker | first2 = G. P. | last3 = MacKworth | first3 = A. K. | last4 = Rogic | first4 = S. | last5 = Ouellette | first5 = B. F. F. | title = GeneComber: Combining outputs of gene prediction programs for improved results | doi = 10.1093/bioinformatics/btg139 | journal = Bioinformatics | volume = 19 | issue = 10 | pages = 1296–1297 | year = 2003 | pmid = 12835277| pmc = }}</ref>
* GeneComber<ref>{{Cite journal | last1 = Shah | first1 = S. P. | last2 = McVicker | first2 = G. P. | last3 = MacKworth | first3 = A. K. | last4 = Rogic | first4 = S. | last5 = Ouellette | first5 = B. F. F. | title = GeneComber: Combining outputs of gene prediction programs for improved results | doi = 10.1093/bioinformatics/btg139 | journal = Bioinformatics | volume = 19 | issue = 10 | pages = 1296–1297 | year = 2003 | pmid = 12835277| doi-access = free }}</ref>
* TFBS<ref>{{Cite journal | last1 = Lenhard | first1 = B. | last2 = Wasserman | first2 = W. W. | doi = 10.1093/bioinformatics/18.8.1135 | title = TFBS: Computational framework for transcription factor binding site analysis | journal = Bioinformatics | volume = 18 | issue = 8 | pages = 1135–1136 | year = 2002 | pmid = 12176838| pmc = }}</ref>
* TFBS<ref>{{Cite journal | last1 = Lenhard | first1 = B. | last2 = Wasserman | first2 = W. W. | doi = 10.1093/bioinformatics/18.8.1135 | title = TFBS: Computational framework for transcription factor binding site analysis | journal = Bioinformatics | volume = 18 | issue = 8 | pages = 1135–1136 | year = 2002 | pmid = 12176838| doi-access = free }}</ref>
* MIMOX<ref>{{Cite journal | last1 = Huang | first1 = J. | last2 = Gutteridge | first2 = A. | last3 = Honda | first3 = W. | last4 = Kanehisa | first4 = M. | title = MIMOX: A web tool for phage display based epitope mapping | journal = BMC Bioinformatics | volume = 7 | pages = 451 | doi = 10.1186/1471-2105-7-451 | year = 2006 | pmid = 17038191| pmc =1618411 }}</ref>
* MIMOX<ref>{{Cite journal | last1 = Huang | first1 = J. | last2 = Gutteridge | first2 = A. | last3 = Honda | first3 = W. | last4 = Kanehisa | first4 = M. | title = MIMOX: A web tool for phage display based epitope mapping | journal = BMC Bioinformatics | volume = 7 | pages = 451 | doi = 10.1186/1471-2105-7-451 | year = 2006 | pmid = 17038191| pmc =1618411 | doi-access = free }}</ref>
* BioParser<ref>{{Cite journal | last1 = Catanho | first1 = M. | last2 = Mascarenhas | first2 = D. | last3 = Degrave | first3 = W. | last4 = De Miranda | first4 = A. B. ?L. | title = BioParser | doi = 10.2165/00822942-200605010-00007 | journal = Applied Bioinformatics | volume = 5 | issue = 1 | pages = 49–53 | year = 2006 | pmid = 16539538| pmc = }}</ref>
* BioParser<ref>{{Cite journal | last1 = Catanho | first1 = M. | last2 = Mascarenhas | first2 = D. | last3 = Degrave | first3 = W. | last4 = De Miranda | first4 = A. B. ?L. | title = BioParser | doi = 10.2165/00822942-200605010-00007 | journal = Applied Bioinformatics | volume = 5 | issue = 1 | pages = 49–53 | year = 2006 | pmid = 16539538| doi-access = free }}</ref>
* Degenerate primer design<ref>{{Cite journal
* Degenerate primer design<ref>{{Cite journal
| last1 = Wei | first1 = X.
| last1 = Wei | first1 = X.
| last2 = Kuhn | first2 = D. N.
| last2 = Kuhn | first2 = D. N.
| last3 = Narasimhan | first3 = G.
| last3 = Narasimhan | first3 = G.
| title = Degenerate primer design via clustering
| title = Degenerate primer design via clustering
| journal = Proceedings / IEEE Computer Society Bioinformatics Conference. IEEE Computer Society Bioinformatics Conference
| journal = Proceedings. IEEE Computer Society Bioinformatics Conference
| volume = 2
| volume = 2
| pages = 75–83
| pages = 75–83
| year = 2003
| year = 2003
| pmid = 16452781
| pmid = 16452781
}}</ref>
}}</ref>
* Querying the public databases<ref>{{Cite journal | last1 = Croce | first1 = O. | last2 = Lamarre | first2 = M. L. | last3 = Christen | first3 = R. | title = Querying the public databases for sequences using complex keywords contained in the feature lines | journal = BMC Bioinformatics | volume = 7 | pages = 45 | year = 2006 | doi = 10.1186/1471-2105-7-45 | pmid = 16441875| pmc =1403806 }}</ref>
* Querying the public databases<ref>{{Cite journal | last1 = Croce | first1 = O. | last2 = Lamarre | first2 = M. L. | last3 = Christen | first3 = R. | title = Querying the public databases for sequences using complex keywords contained in the feature lines | journal = BMC Bioinformatics | volume = 7 | pages = 45 | year = 2006 | doi = 10.1186/1471-2105-7-45 | pmid = 16441875| pmc =1403806 | doi-access = free }}</ref>
* Current Comparative Table<ref>{{Cite journal | last1 = Landsteiner | first1 = B. R. | last2 = Olson | first2 = M. R. | last3 = Rutherford | first3 = R. | doi = 10.1093/nar/gki432 | title = Current Comparative Table (CCT) automates customized searches of dynamic biological databases | journal = Nucleic Acids Research | volume = 33 | issue = Web Server issue | pages = W770–W773 | year = 2005 | pmid = 15980582| pmc =1160193 }}</ref>
* Current Comparative Table<ref>{{Cite journal | last1 = Landsteiner | first1 = B. R. | last2 = Olson | first2 = M. R. | last3 = Rutherford | first3 = R. | doi = 10.1093/nar/gki432 | title = Current Comparative Table (CCT) automates customized searches of dynamic biological databases | journal = Nucleic Acids Research | volume = 33 | issue = Web Server issue | pages = W770–W773 | year = 2005 | pmid = 15980582| pmc =1160193 }}</ref>


New tools and algorithms from external developers are often integrated directly into BioPerl itself:
New tools and algorithms from external developers are often integrated directly into BioPerl itself:


* Dealing with phylogenetic trees and nested taxa<ref>{{Cite journal | last1 = Llabrés | first1 = M. | last2 = Rocha | first2 = J. | last3 = Rosselló | first3 = F. | last4 = Valiente | first4 = G. | title = On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa | doi = 10.1007/s00285-006-0011-4 | journal = Journal of Mathematical Biology | volume = 53 | issue = 3 | pages = 340–364 | year = 2006 | pmid = 16823581| pmc = | arxiv = cs/0505086 }}</ref>
* Dealing with phylogenetic trees and nested taxa<ref>{{Cite journal | last1 = Llabrés | first1 = M. | last2 = Rocha | first2 = J. | last3 = Rosselló | first3 = F. | last4 = Valiente | first4 = G. | title = On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa | doi = 10.1007/s00285-006-0011-4 | journal = Journal of Mathematical Biology | volume = 53 | issue = 3 | pages = 340–364 | year = 2006 | pmid = 16823581| arxiv = cs/0505086 | s2cid = 1704494 }}</ref>
* FPC Web tools<ref>{{Cite journal | last1 = Pampanwar | first1 = V. | last2 = Engler | first2 = F. | last3 = Hatfield | first3 = J. | last4 = Blundy | first4 = S. | last5 = Gupta | first5 = G. | last6 = Soderlund | first6 = C. | title = FPC Web Tools for Rice, Maize, and Distribution | doi = 10.1104/pp.104.056291 | journal = Plant Physiology | volume = 138 | issue = 1 | pages = 116–126 | year = 2005 | pmid = 15888684| pmc =1104167 }}</ref>
* FPC Web tools<ref>{{Cite journal | last1 = Pampanwar | first1 = V. | last2 = Engler | first2 = F. | last3 = Hatfield | first3 = J. | last4 = Blundy | first4 = S. | last5 = Gupta | first5 = G. | last6 = Soderlund | first6 = C. | title = FPC Web Tools for Rice, Maize, and Distribution | doi = 10.1104/pp.104.056291 | journal = Plant Physiology | volume = 138 | issue = 1 | pages = 116–126 | year = 2005 | pmid = 15888684| pmc =1104167 }}</ref>


Line 114: Line 112:


==Disadvantages==
==Disadvantages==
There are many ways to use BioPerl, from simple scripting to very complex object programming. This makes the language not clear and sometimes hard to understand. For as many modules that BioPerl has, some do not always work the way they are intended.
There are many ways to use BioPerl, from simple scripting to very complex object programming. This makes the language not clear and sometimes hard to understand. For as many modules that BioPerl has, some do not always work the way they are intended.{{Citation needed|date=August 2021}}


==Related libraries in other programming languages==
==Related libraries in other programming languages==
Line 130: Line 128:
{{Perl}}
{{Perl}}


[[Category:Bioinformatics software]]
[[Category:Perl software]]
[[Category:Perl software]]
[[Category:Free bioinformatics software]]
[[Category:Free bioinformatics software]]
[[Category:Bioinformatics software]]

Latest revision as of 05:35, 18 June 2024

BioPerl
Initial release: 11 June 2002
Stable release: 1.7.7 (7 December 2019)
Written in: Perl
Type: Bioinformatics
License: Artistic License and GPL
Website: bioperl.org

BioPerl[1][2] is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project.[3]

Background

BioPerl is an active open source software project supported by the Open Bioinformatics Foundation. The first BioPerl code was written by Tim Hubbard and Jong Bhak[citation needed] at MRC Centre Cambridge, where the first genome sequencing was carried out by Fred Sanger. MRC Centre was one of the hubs and birthplaces of modern bioinformatics, as it had a large quantity of DNA sequences and 3D protein structures. Hubbard was using the th_lib.pl Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created jong_lib.pl and then merged the two Perl subroutine libraries into Bio.pl. The name BioPerl was coined jointly by Bhak and Steven Brenner at the Centre for Protein Engineering (CPE). In 1995, Brenner organized a BioPerl session at the Intelligent Systems for Molecular Biology conference, held in Cambridge. BioPerl gained some users in the following months, including Georg Fuellen, who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl; it was further expanded by others, including Steve Chervitz, who was actively developing Perl code for his yeast genome database. The major expansion came when Cambridge student Ewan Birney joined the development team.[citation needed]

The first stable release was on 11 June 2002; the most recent stable (in terms of API) release is 1.7.2 from 7 September 2017. There are also developer releases produced periodically. Version series 1.7.x is considered to be the most stable (in terms of bugs) version of BioPerl and is recommended for everyday use.

To take advantage of BioPerl, the user needs a basic understanding of the Perl programming language, including how to use Perl references, modules, objects, and methods.
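
As a rough illustration of the Perl features mentioned above, the following sketch uses only core Perl to show a package (module), a hash reference blessed into an object, and method calls made with the arrow operator. The class Local::ToySeq and its methods are hypothetical names chosen for this example; they are not part of BioPerl, although BioPerl sequence objects are used in a broadly similar way.

```perl
use strict;
use warnings;

# Hypothetical toy class for illustration only; it is not part of BioPerl.
package Local::ToySeq;

# Constructor: takes named arguments and returns a blessed hash reference.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => $args{seq} };
    return bless $self, $class;
}

# Accessor methods, called with the arrow operator on the object.
sub id     { my ($self) = @_; return $self->{id}; }
sub seq    { my ($self) = @_; return $self->{seq}; }
sub length { my ($self) = @_; return CORE::length($self->{seq}); }

package main;

# Create an object and keep it in an array that is handled by reference.
my $seq_obj  = Local::ToySeq->new(id => 'demo1', seq => 'ATGGCCATTG');
my $seqs_ref = [ $seq_obj ];

# Dereference the array and call methods on each object.
for my $s (@{$seqs_ref}) {
    printf "%s has %d residues\n", $s->id, $s->length;
}
```

Run as a script, this prints one line per stored object, giving its identifier and the number of residues in its sequence.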

Features and examples

BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include:

  • Accessing nucleotide and peptide sequence data from local and remote databases

Example of accessing GenBank to retrieve a sequence:

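use Bio::DB::GenBank;

# Create a handle for querying the remote GenBank database
my $db_obj = Bio::DB::GenBank->new;

# Replace 'ACCESSION' with the accession number of the sequence to retrieve
my $seq_obj = $db_obj->get_Seq_by_acc('ACCESSION');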
  • Transforming formats of database/file records

Example code for transforming formats

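use strict;
use warnings;
use Bio::SeqIO;

# Sequences are read from standard input; the three arguments are required
my $usage = "Usage: all2y.pl informat outfile outfileformat\n";
my $informat = shift or die $usage;
my $outfile = shift or die $usage;
my $outformat = shift or die $usage;

# Input stream: standard input, parsed in the given input format
my $seqin = Bio::SeqIO->new( -fh => \*STDIN, -format => $informat );
# Output stream: the named file, written in the requested output format
my $seqout = Bio::SeqIO->new( -file => ">$outfile", -format => $outformat );

# Copy every sequence record from the input stream to the output stream
while (my $inseq = $seqin->next_seq)
{
   $seqout->write_seq($inseq);
}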
  • Manipulating individual sequences

Example of gathering statistics for a given sequence

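use Bio::Tools::SeqStats;

# $seqobj is assumed to be an existing sequence object (for example, one read with Bio::SeqIO)
my $seq_stats = Bio::Tools::SeqStats->new(-seq => $seqobj);

# Reference to a two-element array holding the lower and upper bounds of the molecular weight
my $weight = $seq_stats->get_mol_wt();
# Reference to a hash of monomer (residue) counts
my $monomer_ref = $seq_stats->count_monomers();

# for nucleic acid sequence: reference to a hash of codon counts
my $codon_ref = $seq_stats->count_codons();
  • Searching for similar sequences
  • Creating and manipulating sequence alignments
  • Searching for genes and other structures on genomic DNA
  • Developing machine-readable sequence annotations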
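
As a further, hedged illustration of manipulating individual sequences, the sketch below builds a Bio::Seq object in memory and derives a subsequence, the reverse complement, and a protein translation from it. The identifier and the residues are made-up example values, and the script assumes that BioPerl is installed.

```perl
use strict;
use warnings;
use Bio::Seq;

# Build a sequence object in memory; the ID and residues are made-up example values.
my $seq_obj = Bio::Seq->new(
    -id       => 'example_seq',
    -alphabet => 'dna',
    -seq      => 'ATGGCCATTGTAATGGGCCGC',
);

# Basic property of the sequence
print "Length: ", $seq_obj->length, "\n";

# Extract a subsequence using 1-based, inclusive coordinates
print "First codon: ", $seq_obj->subseq(1, 3), "\n";

# Reverse complement; revcom returns a new sequence object
my $rev_obj = $seq_obj->revcom;
print "Reverse complement: ", $rev_obj->seq, "\n";

# Translate to protein; translate also returns a new sequence object
my $prot_obj = $seq_obj->translate;
print "Translation: ", $prot_obj->seq, "\n";
```

Because revcom and translate return new objects rather than modifying the original, the original sequence object can be reused for further analysis, such as the statistics gathered with Bio::Tools::SeqStats.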

Usage

In addition to being used directly by end-users,[4] BioPerl has also provided the base for a wide variety of bioinformatic tools, including amongst others:

  • SynBrowse[5]
  • GeneComber[6]
  • TFBS[7]
  • MIMOX[8]
  • BioParser[9]
  • Degenerate primer design[10]
  • Querying the public databases[11]
  • Current Comparative Table[12]

New tools and algorithms from external developers are often integrated directly into BioPerl itself:

  • Dealing with phylogenetic trees and nested taxa[13]
  • FPC Web tools[14]

Advantages

BioPerl was one of the first biological module repositories, which increased its usability. Its modules are very easy to install, and it has a flexible global repository. BioPerl uses good test modules for a large variety of processes.

Disadvantages

There are many ways to use BioPerl, from simple scripting to very complex object programming. This variety can make BioPerl code unclear and sometimes hard to understand. Although BioPerl has many modules, some do not always work the way they are intended.[citation needed]

Related libraries in other programming languages

Several related bioinformatics libraries implemented in other programming languages exist as part of the Open Bioinformatics Foundation, including:

References

  1. Stajich, J. E.; Block, D.; Boulez, K.; Brenner, S.; Chervitz, S.; Dagdigian, C.; Fuellen, G.; Gilbert, J.; Korf, I.; Lapp, H.; Lehväslaiho, H.; Matsalla, C.; Mungall, C. J.; Osborne, B. I.; Pocock, M. R.; Schattner, P.; Senger, M.; Stein, L. D.; Stupka, E.; Wilkinson, M. D.; Birney, E. (2002). "The BioPerl Toolkit: Perl Modules for the Life Sciences". Genome Research. 12 (10): 1611–1618. doi:10.1101/gr.361602. PMC 187536. PMID 12368254.
  2. "BioPerl publications - BioPerl". Archived from the original on 2007-02-02. Retrieved 2007-01-21. A complete, up-to-date list of BioPerl references
  3. Lincoln Stein (1996). "How Perl saved the human genome project". The Perl Journal. 1 (2). Archived from the original on 2007-02-02. Retrieved 2009-02-25.
  4. Khaja R, MacDonald J, Zhang J, Scherer S (2006). "Methods for identifying and mapping recent segmental and gene duplications in eukaryotic genomes". Gene Mapping, Discovery, and Expression. Methods Mol Biol. Vol. 338. Totowa, N.J.: Humana Press. pp. 9–20. doi:10.1385/1-59745-097-9:9. ISBN 978-1-59745-097-3. PMID 16888347.
  5. Pan, X.; Stein, L.; Brendel, V. (2005). "SynBrowse: A synteny browser for comparative sequence analysis". Bioinformatics. 21 (17): 3461–3468. doi:10.1093/bioinformatics/bti555. PMID 15994196.
  6. Shah, S. P.; McVicker, G. P.; MacKworth, A. K.; Rogic, S.; Ouellette, B. F. F. (2003). "GeneComber: Combining outputs of gene prediction programs for improved results". Bioinformatics. 19 (10): 1296–1297. doi:10.1093/bioinformatics/btg139. PMID 12835277.
  7. Lenhard, B.; Wasserman, W. W. (2002). "TFBS: Computational framework for transcription factor binding site analysis". Bioinformatics. 18 (8): 1135–1136. doi:10.1093/bioinformatics/18.8.1135. PMID 12176838.
  8. Huang, J.; Gutteridge, A.; Honda, W.; Kanehisa, M. (2006). "MIMOX: A web tool for phage display based epitope mapping". BMC Bioinformatics. 7: 451. doi:10.1186/1471-2105-7-451. PMC 1618411. PMID 17038191.
  9. Catanho, M.; Mascarenhas, D.; Degrave, W.; De Miranda, A. B. (2006). "BioParser". Applied Bioinformatics. 5 (1): 49–53. doi:10.2165/00822942-200605010-00007. PMID 16539538.
  10. Wei, X.; Kuhn, D. N.; Narasimhan, G. (2003). "Degenerate primer design via clustering". Proceedings. IEEE Computer Society Bioinformatics Conference. 2: 75–83. PMID 16452781.
  11. Croce, O.; Lamarre, M. L.; Christen, R. (2006). "Querying the public databases for sequences using complex keywords contained in the feature lines". BMC Bioinformatics. 7: 45. doi:10.1186/1471-2105-7-45. PMC 1403806. PMID 16441875.
  12. Landsteiner, B. R.; Olson, M. R.; Rutherford, R. (2005). "Current Comparative Table (CCT) automates customized searches of dynamic biological databases". Nucleic Acids Research. 33 (Web Server issue): W770–W773. doi:10.1093/nar/gki432. PMC 1160193. PMID 15980582.
  13. Llabrés, M.; Rocha, J.; Rosselló, F.; Valiente, G. (2006). "On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa". Journal of Mathematical Biology. 53 (3): 340–364. arXiv:cs/0505086. doi:10.1007/s00285-006-0011-4. PMID 16823581. S2CID 1704494.
  14. Pampanwar, V.; Engler, F.; Hatfield, J.; Blundy, S.; Gupta, G.; Soderlund, C. (2005). "FPC Web Tools for Rice, Maize, and Distribution". Plant Physiology. 138 (1): 116–126. doi:10.1104/pp.104.056291. PMC 1104167. PMID 15888684.