Discussion:
[Bioperl-l] Google Summer of Code student Chase Miller
Mark A. Jensen
2009-05-11 14:31:35 UTC
Hello all,
With great pleasure, I want to introduce Chase Miller, my Google Summer of Code
student from George Washington University, to the community. Chase will be
working with me and Rutger Vos on a BioPerl wrapper for Rutger's Bio::Phylo
package, with a particular emphasis on creating a BioPerl-native way to import
and export the NeXML (http://nexml.org) phylogenetic data format. He wrote a
great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit.
We will be working throughout the summer on the project, and will of course come
to you for sage advice. I know you will welcome him warmly, as you did me.
Cheers,
Mark
Hilmar Lapp
2009-05-11 16:09:20 UTC
Welcome to the fold, Chase, and looking forward to the project! :-)

-hilmar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
Jason Stajich
2009-05-11 16:24:06 UTC
Welcome Chase.

Look forward to the project and helping where needed.

-jason
Jason Stajich
jason at bioperl.org
Albert Vilella
2009-05-14 13:45:17 UTC
Hi all,

In Ensembl, we are interested in providing NeXML dumps for our Comparative
Genomics data. Because our pipeline is written in Perl, I guess most of the
work done here will be of great use to us.

If I could ask for only one feature, it would be to *try* and backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.

Looking forward to hearing from this SoC. Have you got a blog?

Cheers,

Albert.
Chris Fields
2009-05-14 14:16:46 UTC
Albert,

Just to note, I have been using bioperl 1.6.0 with the ensembl API w/o
problems, and Sendu Bala added an ensembl 'wrapper' to bioperl-run.
Do we know precisely what breaks btwn 1.2.3 and 1.6 (and thus leads
everyone to believe 1.2.3 is absolutely required)? The previous
answers have been pretty nebulous and unspecific.

I would have to go on record as being opposed to this. If there is a
true compatibility issue, I would much rather spend the energy and
tuits driving towards ensembl compatibility with the current bioperl
version than backporting to 1.2.3. What about having users popping in
with bug reports on list (here or ensembl) about bioperl versions 5+
years out-of-date? Furthermore, it's a slippery slope; the next thing
will be requests to backport specific bug fixes in the current branch
to 1.2.3.

Who's willing to maintain that branch? We have few devs as it is, so
is someone on the ensembl end willing to take that up?

Perl 5 development has been held up with the same issues, something
they have recently just started digging themselves out of.
Regardless, I think way too many changes have occurred in that
particular code that make such endeavors unrealistic, unfeasible, and
unmaintainable.

chris
Chase Miller
2009-05-14 17:15:26 UTC
Hi all,
Thanks for the warm welcome. I'm really looking forward to working with
everyone.

Albert, I don't have a blog yet. Currently, you can check the project page
for any updates (
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
).

Chase
Chris Fields
2009-05-14 19:33:18 UTC
Welcome to the BioPerl community Chase! Let us know if you need any
help.

chris

(less cranky now I've had my coffee)
Albert Vilella
2009-05-15 17:07:43 UTC
Heh, I understand what you say. I am in a similar position, in that
I would prefer to switch to a more modern bioperl, but the ensembl
comparative genomics code -- ensembl-compara -- relies on the ensembl-core
code, which relies on bioperl 1.2.3. We could all switch to bioperl 1.6, but
I cannot switch the ensembl-compara code if ensembl-core doesn't switch as
well. I haven't been very successful in raising this issue so far, but I can
try again :-p

One of the things that has changed a lot is swissprot support (swiss.pm).
Another object that I am using a lot is SimpleAlign.pm, which in the modern
version has a lot more methods.
Hilmar Lapp
2009-05-15 18:44:50 UTC
Post by Albert Vilella
Heh, I understand what you say. I am in a similar position, in that
I would prefer to switch to a more modern bioperl, but the ensembl
comparative genomics code -- ensembl-compara -- relies on the ensembl-core
code, which relies on bioperl 1.2.3. We could all switch to bioperl 1.6, but
I cannot switch the ensembl-compara code if ensembl-core doesn't switch as
well. I haven't been very successful in raising this issue so far, but I can
try again :-p
One of the things that has changed a lot is swissprot support (swiss.pm).
Another object that I am using a lot is SimpleAlign.pm, which in the modern
version has a lot more methods.

That should be a positive, no?

I understand that there have been (are?) good reasons for inertia on
the Ensembl end - undoubtedly such a switch would require a huge
amount of testing to be sure all the wrinkles have been ironed out. So
the question I'd like to ask is, from an Ensembl perspective what
BioPerl features or functions or other things we can actually control
would make that effort worth it?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
Chris Fields
2009-05-17 22:40:24 UTC
Permalink
Post by Albert Vilella
Heh, I understand what you say. I am in a similar position from the
point that I would prefer to switch to a more modern bioperl but the
ensembl comparative genomics code -- ensembl-compara -- relies on
the ensembl-core code, which relies on bioperl 1.2.3. We could all
switch to bioperl 1.6 but I cannot switch the ensembl-compara code
if code doesn't switch as well. I haven't been very successful in
raising this issue so far, but I can try again :-p
One of the things that has changed a lot is swissprot support
(swiss.pm). Another object that I am using a lot is SimpleAlign.pm,
which in the modern version has a lot more methods.
I understand that the reasoning for requiring 1.2.3 has something to
do with Bio::Annotation being too heavyweight. If that is the only
impediment I think we can work something out.

chris
Smithies, Russell
2009-05-18 04:53:05 UTC
Permalink
Does anyone know of a way to get GI numbers for Uniprot/Swissprot accessions?

Fasta from Uniprot's FTP site doesn't formatdb correctly (with the -o T option) as it's missing the gi number in the fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at 10 queries/second (the limit changed recently) it would take too long.

Any ideas?
Is there a swissprot2gi list somewhere?

Thanx,


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
Cook, Malcolm
2009-05-18 14:34:39 UTC
Permalink
you could:

1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to download the 458,445 GI numbers.
I just did it.

2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
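
As a rough illustration of step 1, the ESearch request can be paged with retstart and retmax. This is only a sketch in Python (not BioPerl): the endpoint URL is the current public E-utilities address rather than anything quoted in this thread, the helper name esearch_urls is made up for the example, and NCBI's limits may differ from what is described here. It builds the URLs without fetching anything.

```python
from urllib.parse import urlencode

# Assumption: current public ESearch endpoint (not quoted in the thread).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Query term as given in the message above.
TERM = '"srcdb swiss prot"[Properties]'


def esearch_urls(total, retmax=100000):
    """Return one ESearch URL per page of `retmax` ids covering `total` hits."""
    urls = []
    for retstart in range(0, total, retmax):
        params = {
            "db": "protein",
            "term": TERM,
            "retstart": retstart,
            "retmax": retmax,
        }
        urls.append(ESEARCH + "?" + urlencode(params))
    return urls


# 458,445 hits at 100000 per request needs five requests.
urls = esearch_urls(458445)
print(len(urls))
```

Each URL would then be fetched in turn (respecting NCBI's request-rate limits) and the returned id lists concatenated.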


Does this work for you?


Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
Chris Fields
2009-05-18 20:44:17 UTC
Permalink
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though you'll have to iterate through them.

The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous
last words):

ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists

chris
Cook, Malcolm
2009-05-18 21:11:40 UTC
Permalink
Chris,

livelists, eh? Cool! So, the gis could be obtained using eutil search, which could be translated to accessions using livelists.

On a side note.... Do you happen to know if livelists includes refseq identifiers/gis?

Thx,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
granjeau
2009-05-18 21:39:07 UTC
Permalink
Maybe you could try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (for example, some Gene Ontology tools) or
even SRS.

I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, ie SP+TrEMBL).
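
To make the one-to-many point concrete, here is a minimal Python sketch that collects every GI seen for each Swiss-Prot accession from nr-style deflines (the kind fastacmd would give you). The defline layout "gi|<number>|sp|<accession>|<name>", the Ctrl-A or " >" separators between merged entries, and the sample records are all assumptions for illustration, not real data.

```python
import re
from collections import defaultdict

# Assumed layout of one defline entry: gi|<number>|sp|<accession>[.version]|<name>
GI_SP = re.compile(r"gi\|(\d+)\|sp\|([A-Z0-9]+)(?:\.\d+)?\|")


def acc_to_gis(deflines):
    """Map each Swiss-Prot accession to the list of GIs found for it."""
    mapping = defaultdict(list)
    for line in deflines:
        # Merged nr entries are assumed to be separated by Ctrl-A or " >".
        for entry in re.split(r"\x01| >", line.lstrip(">")):
            m = GI_SP.match(entry)
            if m:
                gi, acc = m.groups()
                mapping[acc].append(int(gi))
    return dict(mapping)


# Hypothetical deflines, made up for the example.
sample = [
    ">gi|111|sp|P00001|AAA_TEST first record\x01gi|222|ref|NP_000001.1| same sequence",
    ">gi|333|sp|P00001.2|AAA_TEST second record",
    ">gi|444|sp|P00002|BBB_TEST other protein",
]
print(acc_to_gis(sample))
```

Keeping a list per accession avoids silently dropping a GI when the same accession turns up more than once.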

Please let us know your solution.

Regards,
Samuel
Smithies, Russell
2009-05-18 23:11:40 UTC
Permalink
As far as I can see, none of the fasta at ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/ will correctly formatdb with the "-o T" option. This is with the latest version of blast (2.2.20 [Feb-08-2009]).
If you formatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above link, they successfully create the required files but the blast result descriptions are truncated.
NCBI say it's not their fault and EBI don't answer their email.

A quick hack of prepending fake GI numbers to each accession gets the files formatted correctly and allows sequence retrieval but it's not an ideal solution.
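
For anyone wanting to try the same hack, something like the following Python sketch would do it. The exact header layout used above is not given, so the "gi|<n>|" prefix placed right after ">" and the sequential numbering are assumptions, and the function name add_fake_gis is made up for the example.

```python
def add_fake_gis(lines, start=1):
    """Yield FASTA lines with 'gi|<n>|' inserted after '>' on each header line.

    The numbers are placeholders counted up from `start`; they are not real GIs.
    """
    n = start
    for line in lines:
        if line.startswith(">"):
            yield ">gi|%d|%s" % (n, line[1:])
            n += 1
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical records, made up for the example.
fasta = [
    ">sp|P00001|AAA_TEST Example protein one",
    "MKVLAAGIVG",
    ">sp|P00002|BBB_TEST Example protein two",
    "MSTNPKPQRK",
]
for out_line in add_fake_gis(fasta):
    print(out_line)
```

Because the numbers are invented, they should never be mixed with real GI numbers from NCBI.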


--Russell
bill
2009-05-19 00:11:51 UTC
Permalink
The problem is that makeblastdb does not recognize the first block of
the header line as a sequence id. I changed
>UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

to
>sp|P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

and it works!

It seems that prefixing your protein id with 'sp|' right after '>' will work.
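
A small Python sketch of that header rewrite, assuming UniRef-style headers of the form ">UniRef<NN>_<accession> ..." as in the example above. The function name to_sp_header is made up here, and whether 'sp|' is the right tag for every entry is not something this checks.

```python
import re

# Header lines that start with ">UniRef50_", ">UniRef90_", ">UniRef100_", etc.
UNIREF = re.compile(r"^>UniRef\d+_(\S+)")


def to_sp_header(line):
    """Rewrite '>UniRef50_P0C9F1 ...' as '>sp|P0C9F1 ...'; other lines are unchanged."""
    return UNIREF.sub(r">sp|\1", line)


header = ">UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus RepID=1001R_ASFM2"
print(to_sp_header(header))
```

Sequence lines and headers that do not match the pattern pass through untouched, so the function can be applied to every line of the file.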

Good luck!

Bill at genenformics
bill
2009-05-19 00:11:51 UTC
Permalink
The problem is that makeblastdb does not recognize the first block of
UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

to
sp|P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

and it works!

It seems that prefixing your protein id with 'sp|' right after '>' will work.

Good luck!

Bill at genenformics
As far as I can see, none of the fasta at
ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/
will correctly formatdb with the "-o T" option. This is with the latest
version of blast (2.2.20 [Feb-08-2009])
If you fomatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above
link, they successfully create the required files but the blast result
descriptions are truncated.
NCBI say it's not their fault and EBI don't answer their email.
A quick hack of prepending fake GI numbers to each accession gets the
files formatted correctly and allows sequence retrieval but it's not an
ideal solution.
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
Sent: Tuesday, 19 May 2009 9:39 a.m.
Cc: 'BioPerl List'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
May be you try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (as for example some Gene Ontology tools) or
even SRS.
I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, ie SP+TrEMBL).
Answer us your solution.
Regards,
Samuel
Post by Chris Fields
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
bill
2009-05-19 00:11:51 UTC
Permalink
The problem is that makeblastdb does not recognize the first block of
UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

to
sp|P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

and it works!

It seems that prefixing your protein id with 'sp|' right after '>' will work.

Good luck!

Bill at genenformics
As far as I can see, none of the fasta at
ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/
will correctly formatdb with the "-o T" option. This is with the latest
version of blast (2.2.20 [Feb-08-2009])
If you fomatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above
link, they successfully create the required files but the blast result
descriptions are truncated.
NCBI say it's not their fault and EBI don't answer their email.
A quick hack of prepending fake GI numbers to each accession gets the
files formatted correctly and allows sequence retrieval but it's not an
ideal solution.
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
Sent: Tuesday, 19 May 2009 9:39 a.m.
Cc: 'BioPerl List'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
May be you try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (as for example some Gene Ontology tools) or
even SRS.
I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, ie SP+TrEMBL).
Answer us your solution.
Regards,
Samuel
Post by Chris Fields
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
bill
2009-05-19 00:11:51 UTC
Permalink
The problem is that makeblastdb does not recognize the first block of
UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

to
sp|P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus
RepID=1001R_ASFM2

and it works!

It seems that prefixing your protein id with 'sp|' right after '>' will work.

Good luck!

Bill at genenformics
As far as I can see, none of the fasta at
ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/
will correctly formatdb with the "-o T" option. This is with the latest
version of blast (2.2.20 [Feb-08-2009])
If you fomatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above
link, they successfully create the required files but the blast result
descriptions are truncated.
NCBI say it's not their fault and EBI don't answer their email.
A quick hack of prepending fake GI numbers to each accession gets the
files formatted correctly and allows sequence retrieval but it's not an
ideal solution.
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
Sent: Tuesday, 19 May 2009 9:39 a.m.
Cc: 'BioPerl List'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
Maybe you could try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (for example, some Gene Ontology tools) or
even SRS.
I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, i.e. SP+TrEMBL).
Let us know your solution.
Regards,
Samuel
Post by Chris Fields
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous last words):
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Smithies, Russell
2009-05-18 23:11:40 UTC
Permalink
As far as I can see, none of the FASTA files at ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/ will formatdb correctly with the "-o T" option. This is with the latest version of blast (2.2.20 [Feb-08-2009]).
If you formatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above link, the required files are created successfully but the blast result descriptions are truncated.
NCBI say it's not their fault and EBI don't answer their email.

A quick hack of prepending fake GI numbers to each accession gets the files formatted correctly and allows sequence retrieval, but it's not an ideal solution.
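
For illustration, the fake-GI hack could look roughly like the sketch below. The exact header layout used is not given in this message, so the `gi|<n>|` prefix, the sequential numbering from 1, the helper name `add_fake_gi` and the sample records are all assumptions.

```python
from itertools import count

def add_fake_gi(lines, start=1):
    """Yield FASTA lines with 'gi|<n>|' inserted after '>' on each defline."""
    counter = count(start)
    for line in lines:
        if line.startswith(">"):
            # Sequential numbers are fake GIs; they may collide with real ones.
            line = ">gi|%d|%s" % (next(counter), line[1:])
        yield line

records = [
    ">sp|P0C9F1 Protein MGF 100-1R",
    "ACDEFGHIK",                       # placeholder residues
    ">sp|X00000 hypothetical second entry",
    "LMNPQRSTV",                       # placeholder residues
]
out = list(add_fake_gi(records))
print(out[0])
print(out[2])
```

Because the numbers are invented, they are only safe inside a private database and should not be mixed with real NCBI GIs.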


--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
Sent: Tuesday, 19 May 2009 9:39 a.m.
Cc: 'BioPerl List'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
Maybe you could try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (for example, some Gene Ontology tools) or
even SRS.
I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, i.e. SP+TrEMBL).
Let us know your solution.
Regards,
Samuel
Post by Chris Fields
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous last words):
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Cook, Malcolm
2009-05-18 21:11:40 UTC
Permalink
Chris,

LiveLists, eh? Cool! So, the gis could be obtained using eutils search and then translated to accessions using LiveLists.
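
A rough sketch of the paging for the eutils search step, assuming the 458,445 hits and the retmax of 100000 quoted in this thread. It only builds the request URLs and makes no network call. `esearch_urls` is a hypothetical helper, and the server may enforce a lower retmax than the one used here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_urls(term, total, retmax=100000, db="protein"):
    """Return one esearch URL per page of `retmax` IDs; no request is made."""
    urls = []
    for retstart in range(0, total, retmax):
        query = urlencode({"db": db, "term": term,
                           "retstart": retstart, "retmax": retmax})
        urls.append(ESEARCH + "?" + query)
    return urls

# Search term and hit count as quoted in the thread.
urls = esearch_urls('"srcdb swiss prot"[Properties]', total=458445)
for url in urls:
    print(url)
```

Each URL would return one page of GI numbers, which still leaves the accession mapping to be done separately.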

On a side note... Do you happen to know if LiveLists includes refseq identifiers/gis?

Thx,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: Monday, May 18, 2009 3:44 PM
To: Cook, Malcolm
Cc: 'Smithies, Russell'; 'BioPerl List'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
If you need to retain mapping between acc => gi it gets a
little more complicated; most procedures to NCBI return a
'bag' of gi's w/o any relation to their original accession.
You can grab them via esummary, though, but you'll have to
iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous last words):
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer T +64 3 489 9085 E
russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
granjeau
2009-05-18 21:39:07 UTC
Permalink
Maybe you could try the PICR service at EBI
http://www.ebi.ac.uk/Tools/picr/
or some other ID converter (for example, some Gene Ontology tools) or
even SRS.

I think there could be more than one gi per sp (it's not clear to me if
you are looking at SwissProt or UniProtKB, i.e. SP+TrEMBL).

Let us know your solution.

Regards,
Samuel
Post by Chris Fields
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous last words):
ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Chris Fields
2009-05-18 20:44:17 UTC
Permalink
If you need to retain mapping between acc => gi it gets a little more
complicated; most procedures to NCBI return a 'bag' of gi's w/o any
relation to their original accession. You can grab them via esummary,
though, but you'll have to iterate through them.
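
One way to keep the acc => gi relation while iterating over esummary results is to read the gi and the accession out of each DocSum record. A minimal Python sketch, assuming the protein esummary XML has roughly the shape of the hand-written SAMPLE below (an Id element holding the gi and an Item named Caption holding the accession). The sample records and the function name gi_to_accession are made up for illustration and are not real NCBI output:

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed esummary shape (not real NCBI data).
SAMPLE = """<eSummaryResult>
  <DocSum>
    <Id>111</Id>
    <Item Name="Caption" Type="String">P12345</Item>
    <Item Name="Gi" Type="Integer">111</Item>
  </DocSum>
  <DocSum>
    <Id>222</Id>
    <Item Name="Caption" Type="String">Q9XYZ1</Item>
    <Item Name="Gi" Type="Integer">222</Item>
  </DocSum>
</eSummaryResult>"""


def gi_to_accession(xml_text):
    """Return {gi: accession} for every DocSum record in the XML text."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for docsum in root.iter("DocSum"):
        gi = docsum.findtext("Id")
        # Assumption: the accession lives in the Item named "Caption".
        caption = docsum.findtext("Item[@Name='Caption']")
        if gi and caption:
            pairs[int(gi)] = caption
    return pairs


if __name__ == "__main__":
    for gi, acc in sorted(gi_to_accession(SAMPLE).items()):
        print(acc, gi)
```

Keying on the gi keeps each record unambiguous; inverting the dictionary afterwards gives acc => gi, bearing in mind that one accession may map to several gi numbers.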

The other option is LiveLists (has both nuc and protein acc => gi).
I'm assuming this would have the swissprot accessions included (famous
last words):

ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists

chris
Post by Cook, Malcolm
1) Use eutils search with -database protein -term "srcdb swiss
prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to
download the 458,445 ginumbers.
I just did it.
2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
Does this work for you?
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Cook, Malcolm
2009-05-18 14:34:39 UTC
Permalink
you could:

1) Use eutils search with -database protein -term "srcdb swiss prot"[Properties]
If you use a retmax of 100000 it should only take a few seconds to download the 458,445 ginumbers.
I just did it.

2) use fastacmd to extract the fasta from nr for these gis, and parse the defline.
(assuming you have a copy of nr)
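
Step 2 could be sketched roughly as follows in Python, assuming the deflines extracted from nr carry identifier blocks of the form gi|<number>|sp|<accession>|. The sample deflines and the function name swissprot_to_gi are invented for illustration. A set is collected per accession because a Swiss-Prot entry may appear under more than one gi:

```python
import re

# One "gi|<number>|sp|<accession>[.version]|" identifier block.
GI_SP = re.compile(r"gi\|(\d+)\|sp\|([A-Z0-9]+)(?:\.\d+)?\|")


def swissprot_to_gi(fasta_lines):
    """Map Swiss-Prot accession -> set of gi numbers seen in the deflines."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to parse
        # findall scans the whole defline, so merged nr entries are
        # picked up whichever separator sits between them.
        for gi, acc in GI_SP.findall(line):
            mapping.setdefault(acc, set()).add(int(gi))
    return mapping


# Invented deflines in the assumed format (not real nr records).
SAMPLE_FASTA = [
    ">gi|111|sp|P12345.2|AATM_RABIT example protein\x01gi|222|ref|NP_000001.1| other",
    "MALLHSGRVLSGVASAFHPGLAAAASARASS",
    ">gi|333|sp|Q9XYZ1|FAKE_HUMAN hypothetical >gi|444|sp|P12345.1|AATM_RABIT older",
]

if __name__ == "__main__":
    for acc, gis in sorted(swissprot_to_gi(SAMPLE_FASTA).items()):
        print(acc, sorted(gis))
```

Entries from other source databases (the ref block in the first sample defline) are skipped because the pattern requires the sp tag.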


Does this work for you?


Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Smithies, Russell
Sent: Sunday, May 17, 2009 11:53 PM
To: 'BioPerl List'
Subject: [Bioperl-l] Uniprot/Swiss accessions?
Does anyone know of a way to get GI numbers for
Uniprot/Swissprot accessions?
Fasta from Uniprot's FTP site doesn't formatdb correctly
(with the -o T option) as it's missing the gi number in the
fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I
don't want to have to look up all 466,739 of them.
I could use Bio::DB::Eutilities and query each id but even at
10 queries/second (the limit changed recently) it would take too long.
Any ideas?
Is there a swissprot2gi list somewhere?
Thanx,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Smithies, Russell
2009-05-18 04:53:05 UTC
Permalink
Does anyone know of a way to get GI numbers for Uniprot/Swissprot accessions?

Fasta from Uniprot's FTP site doesn't formatdb correctly (with the -o T option) as it's missing the gi number in the fasta header.
NCBI won't let you use SwissProt ids in batch-entrez and I don't want to have to look up all 466,739 of them.
I could use Bio::DB::EUtilities and query each id but even at 10 queries/second (the limit changed recently) it would take too long.
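
To put a number on "too long", here is a back-of-the-envelope sketch in Python. It uses only figures quoted in this thread: 466,739 accessions at 10 queries/second, and, for comparison, the 458,445 gi numbers fetched with a retmax of 100000 in Malcolm Cook's reply. The function names are made up, and network latency is ignored, so the per-id figure is a lower bound:

```python
import math


def one_by_one_hours(n_ids, queries_per_second):
    """Hours needed when every id costs one query at the given rate."""
    return n_ids / queries_per_second / 3600


def batched_requests(n_ids, retmax):
    """Number of requests needed when each one returns up to retmax ids."""
    return math.ceil(n_ids / retmax)


if __name__ == "__main__":
    hours = one_by_one_hours(466_739, 10)
    print(f"one query per id: about {hours:.1f} hours")  # about 13.0 hours
    print("batched search:", batched_requests(458_445, 100_000), "requests")  # 5
```

So per-id lookups cost roughly half a day of continuous querying at best, while a batched search needs only a handful of requests.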

Any ideas?
Is there a swissprot2gi list somewhere?

Thanx,


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
Hilmar Lapp
2009-05-15 18:44:50 UTC
Permalink
Post by Albert Vilella
Heh, I understand what you say. I am in a similar position from the point
that I would prefer to switch to a more modern bioperl but the ensembl
comparative genomics code -- ensembl-compara -- relies on the ensembl-core
code, which relies on bioperl 1.2.3. We could all switch to bioperl 1.6 but
I cannot switch the ensembl-compara code if core doesn't switch as well. I
haven't been very successful in raising this issue so far, but I can try
again :-p
One of the things that has changed a lot is swissprot support (swiss.pm).
Another object that I am using a lot is SimpleAlign.pm, which in the modern
version has a lot more methods.
That should be a positive, no?

I understand that there have been (are?) good reasons for inertia on
the Ensembl end - undoubtedly such a switch would require a huge
amount of testing to be sure all the wrinkles have been ironed out. So
the question I'd like to ask is, from an Ensembl perspective what
BioPerl features or functions or other things we can actually control
would make that effort worth it?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
Chris Fields
2009-05-17 22:40:24 UTC
Permalink
Post by Albert Vilella
Heh, I understand what you say. I am in a similar position from the
point that I would prefer to switch to a more modern bioperl but the
ensembl comparative genomics code -- ensembl-compara -- relies on
the ensembl-core code, which relies on bioperl 1.2.3. We could all
switch to bioperl 1.6 but I cannot switch the ensembl-compara code
if core doesn't switch as well. I haven't been very successful in
raising this issue so far, but I can try again :-p
One of the things that has changed a lot is swissprot support
(swiss.pm). Another object that I am using a lot is SimpleAlign.pm,
which in the modern version has a lot more methods.
I understand that the reasoning for requiring 1.2.3 has something to
do with Bio::Annotation being too heavyweight. If that is the only
impediment I think we can work something out.

chris
Chase Miller
2009-05-14 17:15:26 UTC
Permalink
Hi all,
Thanks for the warm welcome. I'm really looking forward to working with
everyone.

Albert, I don't have a blog yet. Currently, you can check the project page
for any updates (
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
).

Chase
Post by Chris Fields
Albert,
Just to note, I have been using bioperl 1.6.0 with the ensembl API w/o
problems, and Sendu Bala added an ensembl 'wrapper' to bioperl-run. Do we
know precisely what breaks btwn 1.2.3 and 1.6 (and thus leads everyone to
believe 1.2.3 is absolutely required)? The previous answers have been
pretty nebulous and unspecific.
I would have to go on record as being opposed to this. If there is a true
compatibility issue, I would much rather spend the energy and tuits driving
towards ensembl compatibility with the current bioperl version than
backporting to 1.2.3. What about having users popping in with bug reports
on list (here or ensembl) about bioperl versions 5+ years out-of-date?
Furthermore, it's a slippery slope; the next thing will be requests to
backport specific bug fixes in the current branch to 1.2.3.
Who's willing to maintain that branch? We have few devs as it is, so is
someone on the ensembl end willing to take that up?
Perl 5 development has been held up with the same issues, something they
have recently just started digging themselves out of. Regardless, I think
way too many changes have occurred in that particular code that make such
endeavors unrealistic, unfeasible, and unmaintainable.
chris
Hi all,
Post by Albert Vilella
In Ensembl, we are interested in providing NeXML dumps for our Comparative
Genomics data. Because our pipeline is
written in Perl, I guess most of the work done here will be of great use to
us.
If I could ask for only one feature, that would be to *try* and backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.
Looking forward to hearing from this SoC. Have you got a blog?
Cheers,
Albert.
Welcome Chase.
Post by Jason Stajich
Look forward to the project and helping where needed.
-jason
Hello all,
Post by Mark A. Jensen
With great pleasure, I want to introduce Chase Miller, my Google Summer of
Code student from George Washington University, to the community. Chase will
be working with me and Rutger Vos on a BioPerl wrapper for Rutger's
Bio::Phylo package, with a particular emphasis on creating a BioPerl-native
way to import and export the NeXML (http://nexml.org) phylogenetic data
format. He wrote a great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him warmly, as
you did me.
Cheers,
Mark
Jason Stajich
jason at bioperl.org
Albert Vilella
2009-05-15 17:07:43 UTC
Permalink
Heh, I understand what you're saying. I am in a similar position, in that
I would prefer to switch to a more modern bioperl, but the ensembl
comparative genomics code -- ensembl-compara -- relies on the ensembl-core
code, which relies on bioperl 1.2.3. We could all switch to bioperl 1.6, but
I cannot switch the ensembl-compara code if the core code doesn't switch as
well. I haven't been very successful in raising this issue so far, but I can
try again :-p

One of the things that has changed a lot is swissprot support (swiss.pm).
Another object that I am using a lot is SimpleAlign.pm, which in the modern
version has a lot more methods.
Post by Chris Fields
Albert,
Just to note, I have been using bioperl 1.6.0 with the ensembl API w/o
problems, and Sendu Bala added an ensembl 'wrapper' to bioperl-run. Do we
know precisely what breaks btwn 1.2.3 and 1.6 (and thus leads everyone to
believe 1.2.3 is absolutely required)? The previous answers have been
pretty nebulous and unspecific.
I would have to go on record as being opposed to this. If there is a true
compatibility issue, I would much rather spend the energy and tuits driving
towards ensembl compatibility with the current bioperl version than
backporting to 1.2.3. What about having users popping in with bug reports
on list (here or ensembl) about bioperl versions 5+ years out-of-date?
Furthermore, it's a slippery slope; the next thing will be requests to
backport specific bug fixes in the current branch to 1.2.3.
Who's willing to maintain that branch? We have few devs as it is, so is
someone on the ensembl end willing to take that up?
Perl 5 development has been held up with the same issues, something they
have recently just started digging themselves out of. Regardless, I think
way too many changes have occurred in that particular code that make such
endeavors unrealistic, unfeasible, and unmaintainable.
chris
Hi all,
Post by Albert Vilella
In Ensembl, we are interested in providing NeXML dumps for our Comparative
Genomics data. Because our pipeline is
written in Perl, I guess most of the work done here will be of great use to
us.
If I could only ask for only a feature, that would be to *try* and backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.
Looking forward to hearing from this SoC. Have you got a blog?
Cheers,
Albert.
Welcome Chase.
Post by Jason Stajich
Look forward to the project and helping where needed.
-jason
Hello all,
Post by Mark A. Jensen
With great pleasure, I want to introduce Chase Miller, my Google Summer of
Code student from George Washington University, to the community. Chase will
be working with me and Rutger Vos on a BioPerl wrapper for Rutger's
Bio::Phylo package, with a particular emphasis on creating a BioPerl-native
way to import and export the NeXML (http://nexml.org) phylogenetic data
format. He wrote a great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him warmly, as
you did me.
Cheers,
Mark
Jason Stajich
jason at bioperl.org
Chase Miller
2009-05-14 17:15:26 UTC
Permalink
Hi all,
Thanks for the warm welcome. I'm really looking forward to working with
everyone.

Albert, I don't have a blog yet. Currently, you can check the project page
for any updates (
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
).

Chase
Post by Chris Fields
Albert,
Just to note, I have been using bioperl 1.6.0 with the ensembl API w/o
problems, and Sendu Bala added an ensembl 'wrapper' to bioperl-run. Do we
know precisely what breaks btwn 1.2.3 and 1.6 (and thus leads everyone to
believe 1.2.3 is absolutely required)? The previous answers have been
pretty nebulous and unspecific.
I would have to go on record as being opposed to this. If there is a true
compatibility issue, I would much rather spend the energy and tuits driving
towards ensembl compatibility with the current bioperl version than
backporting to 1.2.3. What about having users popping in with bug reports
on list (here or ensembl) about bioperl versions 5+ years out-of-date?
Furthermore, it's a slippery slope; the next thing will be requests to
backport specific bug fixes in the current branch to 1.2.3.
Who's willing to maintain that branch? We have few devs as it is, so is
someone on the ensembl end willing to take that up?
Perl 5 development has been held up with the same issues, something they
have recently just started digging themselves out of. Regardless, I think
way too many changes have occurred in that particular code that make such
endeavors unrealistic, unfeasible, and unmaintainable.
chris
Hi all,
Post by Albert Vilella
In Ensembl, we are interested in providing NeXML dumps for our Comparative
Genomics data. Because our pipeline is
written in Perl, I guess most of the work done here will be of great use to
us.
If I could only ask for only a feature, that would be to *try* and backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.
Looking forward to hearing from this SoC. Have you got a blog?
Cheers,
Albert.
Welcome Chase.
Post by Jason Stajich
Look forward to the project and helping where needed.
-jason
Hello all,
Post by Mark A. Jensen
With great pleasure, I want to introduce Chase Miller, my Google Summer of
Code student from George Washington University, to the community. Chase will
be working with me and Rutger Vos on a BioPerl wrapper for Rutger's
Bio::Phylo package, with a particular emphasis on creating a BioPerl-native
way to import and export the NeXML (http://nexml.org) phylogenetic data
format. He wrote a great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him warmly, as
you did me.
Cheers,
Mark
Jason Stajich
jason at bioperl.org
Chris Fields
2009-05-14 14:16:46 UTC
Permalink
Albert,

Just to note, I have been using bioperl 1.6.0 with the ensembl API without
problems, and Sendu Bala added an ensembl 'wrapper' to bioperl-run.
Do we know precisely what breaks between 1.2.3 and 1.6 (and thus leads
everyone to believe 1.2.3 is absolutely required)? The previous
answers have been pretty nebulous and unspecific.

I would have to go on record as being opposed to this. If there is a
true compatibility issue, I would much rather spend the energy and
tuits driving towards ensembl compatibility with the current bioperl
version than backporting to 1.2.3. What about having users popping in
with bug reports on list (here or ensembl) about bioperl versions 5+
years out-of-date? Furthermore, it's a slippery slope; the next thing
will be requests to backport specific bug fixes in the current branch
to 1.2.3.

Who's willing to maintain that branch? We have few devs as it is, so
is someone on the ensembl end willing to take that up?

Perl 5 development has been held up with the same issues, something
they have recently just started digging themselves out of.
Regardless, I think way too many changes have occurred in that
particular code that make such endeavors unrealistic, unfeasible, and
unmaintainable.

chris
Post by Albert Vilella
Hi all,
In Ensembl, we are interested in providing NeXML dumps for our
Comparative
Genomics data. Because our pipeline is
written in Perl, I guess most of the work done here will be of great use to
us.
If I could only ask for only a feature, that would be to *try* and backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.
Looking forward to hearing from this SoC. Have you got a blog?
Cheers,
Albert.
Post by Jason Stajich
Welcome Chase.
Look forward to the project and helping where needed.
-jason
Hello all,
Post by Mark A. Jensen
With great pleasure, I want to introduce Chase Miller, my Google Summer of
Code student from George Washington University, to the community. Chase will
be working with me and Rutger Vos on a BioPerl wrapper for Rutger's
Bio::Phylo package, with a particular emphasis on creating a
BioPerl-native
way to import and export the NeXML (http://nexml.org) phylogenetic data
format. He wrote a great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him warmly, as
you did me.
Cheers,
Mark
Jason Stajich
jason at bioperl.org
Albert Vilella
2009-05-14 13:45:17 UTC
Permalink
Hi all,

In Ensembl, we are interested in providing NeXML dumps for our Comparative
Genomics data. Because our pipeline is written in Perl, I guess most of the
work done here will be of great use to us.

If I could ask for only one feature, it would be to *try* to backport
the NeXML support to bioperl-1.2.3 --- stress on the *try*. Bioperl 1.2.3 is
the release that Ensembl decided to stick to many years ago, so it's cleaner
for people to use our Perl API with only one version of bioperl as a
dependency.

Looking forward to hearing from this SoC. Have you got a blog?

Cheers,

Albert.
Post by Jason Stajich
Welcome Chase.
Look forward to the project and helping where needed.
-jason
Hello all,
Post by Mark A. Jensen
With great pleasure, I want to introduce Chase Miller, my Google Summer of
Code student from George Washington University, to the community. Chase will
be working with me and Rutger Vos on a BioPerl wrapper for Rutger's
Bio::Phylo package, with a particular emphasis on creating a BioPerl-native
way to import and export the NeXML (http://nexml.org) phylogenetic data
format. He wrote a great proposal, available here:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him warmly, as
you did me.
Cheers,
Mark
Jason Stajich
jason at bioperl.org
Smithies, Russell
2009-05-18 21:52:31 UTC
Permalink
Hi guys,
Thanx for your suggestions.

With the magic of awk and comm, I split the amalgamated accessions and created lists of swissprot IDs for both the file from NCBI and the file from Uniprot.

sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids

* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
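The counts above are consistent with each other (458,282 + 95 = 458,377 and 458,282 + 8,457 = 466,739). A minimal sketch of the same comparison in Python rather than awk and comm, assuming each file holds one accession per line; the function names and the toy accessions are made up for illustration:

```python
def read_ids(path):
    """Read one accession per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def compare_id_lists(ncbi_ids, uniprot_ids):
    """Return (only_ncbi, only_uniprot, shared) as sets of accessions."""
    ncbi = set(ncbi_ids)
    uniprot = set(uniprot_ids)
    return ncbi - uniprot, uniprot - ncbi, ncbi & uniprot


if __name__ == "__main__":
    # Toy accessions for illustration only, not the real lists.
    only_ncbi, only_uniprot, shared = compare_id_lists(
        ["P00001", "P00002", "P00003"],
        ["P00002", "P00003", "P00004", "P00005"],
    )
    print(len(only_ncbi), len(only_uniprot), len(shared))
```

For the real data, something like compare_id_lists(read_ids("sp_ncbi_accessions.txt"), read_ids("sp_uniprot_accessions.txt")) should give the three groups listed above.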

I did a quick random sample of the 8,457 ids unique to Uniprot and none could be found in the "protein" database at NCBI but all were in the "gene" database as "reference sequences that belong to a specific genome build" and all belonged to recently sequenced bacterial genomes. As none are in the "protein" database, they don't have GI numbers.

The 95 ids that were at NCBI but not in Uniprot were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions. Eg http://www.ncbi.nlm.nih.gov/protein/48429254


So what I'll do is use the NCBI database and add in the extra 8,457 ids unique to Uniprot and assign them fake GI numbers so I can formatdb them with the " -o T" option.
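A rough sketch of the fake-GI step in Python, assuming the extra sequences are plain FASTA with UniProt-style deflines. The function name and the starting number 900000001 are placeholders picked for illustration; whatever range is used would need to be checked against real GI numbers to avoid collisions:

```python
def add_fake_gi(fasta_lines, first_gi=900000001):
    """Prefix deflines that lack a GI with a made-up gi|N| identifier.

    first_gi is an arbitrary placeholder, not a value from NCBI.
    Sequence lines and deflines that already start with gi| pass through.
    """
    next_gi = first_gi
    for line in fasta_lines:
        if line.startswith(">") and not line.startswith(">gi|"):
            yield ">gi|%d|%s" % (next_gi, line[1:])
            next_gi += 1
        else:
            yield line


if __name__ == "__main__":
    # The sequence line here is a toy fragment, not the real protein.
    records = [">sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen", "MKFL"]
    for out_line in add_fake_gi(records):
        print(out_line)
```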


Thanx again for your help,



Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz


Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people




=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
bill
2009-05-19 00:19:53 UTC
Permalink
Hi, Smithies,

Using an integral local id should work as well.

A defline will look like '>lcl|12345 ...'

Bill
Post by Smithies, Russell
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and
created lists of swissprot IDs for both the file from NCBI and the file
from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
I did a quick random sample of the 8,457 ids unique to Uniprot and none
could be found in the "protein" database at NCBI but all were in the
"gene" database as "reference sequences that belong to a specific genome
build" and all belonged to recently sequenced bacterial genomes. As none
are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random
sample again) described as "putative protein" (or "very putative protein"
in one case) and are the result of gene predictions. Eg
http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database and add in the extra 8,457 ids
unique to Uniprot and assign them fake GI numbers so I can formatdb them
with the " -o T" option.
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people
Smithies, Russell
2009-05-19 01:44:35 UTC
Permalink
No, that doesn't work :-(
Here's some blast output with the database formatted with local ids:
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2


Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209

===========================================================================

If I tweak the fasta and change the ids from lcl to gi and re-formatdb, all works correctly:

===========================================================================
Query= test
(612 letters)

Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theile... 421 e-117
sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen OS=Theile... 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Probable chemoreceptor glutamine deamidase ... 33 4.2
sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theileria annulata GN=TA08425
PE=3 SV=1
Length = 893

Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 201/209 (96%), Positives = 201/209 (96%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED
Sbjct: 72 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 131

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
QYLA IHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD
Sbjct: 132 QYLARLQQLRQQIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 191

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY
Sbjct: 192 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 251

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
VATIPKLKDFAEPYHPIILDISDIDYVNF
Sbjct: 252 VATIPKLKDFAEPYHPIILDISDIDYVNF 280

============================================================================

To my mind, this is a bug in formatdb but NCBI don't see it that way.
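One way to spot the broken case automatically is to look for alignments that report zero identities despite a significant score, as in the first report above. This is only a sketch that scans the plain-text report for its "Identities = x/y" lines; the function name is made up, and it is not a full BLAST parser:

```python
import re

# Matches lines such as "Identities = 201/209 (96%), Positives = ..."
IDENTITIES = re.compile(r"Identities = (\d+)/(\d+)")


def zero_identity_hsps(report_text):
    """Return (identical, length) pairs for alignments reporting 0 identities."""
    pairs = [
        (int(match.group(1)), int(match.group(2)))
        for match in IDENTITIES.finditer(report_text)
    ]
    return [pair for pair in pairs if pair[0] == 0]


if __name__ == "__main__":
    broken = "Identities = 0/209 (0%), Positives = 0/209 (0%)"
    working = "Identities = 201/209 (96%), Positives = 201/209 (96%)"
    print(zero_identity_hsps(broken))
    print(zero_identity_hsps(working))
```

A non-empty result for a hit with an E-value like e-117 suggests the subject sequence could not be fetched from the formatted database.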

--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 12:20 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
Hi, Smithies,
Using an integral local id should work as well.
A defline will look like '>lcl|12345 ...'
Bill
Post by Smithies, Russell
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and
created lists of swissprot IDs for both the file from NCBI and the file
from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
I did a quick random sample of the 8,457 ids unique to Uniprot and none
could be found in the "protein" database at NCBI but all were in the
"gene" database as "reference sequences that belong to a specific genome
build" and all belonged to recently sequenced bacterial genomes. As none
are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random
sample again) described as "putative protein" (or "very putative protein"
in one case) and are the result of gene predictions. Eg
http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database and add in the extra 8,457 ids
unique to Uniprot and assign them fake GI numbers so I can formatdb them
with the " -o T" option.
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people
bill
2009-05-19 03:13:19 UTC
Permalink
I could not see the difference.

Do you follow the rules for the FASTA defline?

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.table.632

Bill
Post by Smithies, Russell
No, that doesn't work :-(
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2
Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)
Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
===========================================================================
If I tweak the fasta and change the ids from lcl to gi and re-formatdb,
===========================================================================
Query= test
(612 letters)
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Smithies, Russell
2009-05-19 03:43:20 UTC
Permalink
There are no descriptions at the top of the blast output and no accessions in the alignments.
The fasta is coming from UniProt so surely they know how to format files.
And it does match what NCBI require in their defline i.e. sp|accession|entry name
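For checking deflines in bulk, here is a small sketch that tests for the sp|accession|entry name pattern, optionally preceded by a gi|number| prefix. The regular expression is my own reading of that pattern, not something taken from the NCBI table, and the function name is made up:

```python
import re

# Optional "gi|123|" prefix, then "sp|ACCESSION|ENTRY_NAME".
SP_DEFLINE = re.compile(r"^>(?:gi\|\d+\|)?sp\|([^|\s]+)\|(\S+)")


def parse_sp_defline(line):
    """Return (accession, entry_name), or None if the line does not match."""
    match = SP_DEFLINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_sp_defline(">sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen"))
    print(parse_sp_defline(">lcl|12345 some description"))
```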

--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 3:13 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
I could not see the difference.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.table.632
Bill
Post by Smithies, Russell
No, that doesn't work :-(
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2
Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)
Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
===========================================================================
If I tweak the fasta and change the ids from lcl to gi and re-formatdb,
===========================================================================
Query= test
(612 letters)
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Smithies, Russell
2009-05-19 03:54:41 UTC
Permalink
We re-installed blast version 2.2.18 and everything works perfectly.
It formats the Uniprot fasta as it should and retrieves sequences with fastacmd as it should.

I think we'll email NCBI and tell them they broke formatdb in their "upgrade".

--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
Sent: Tuesday, 19 May 2009 3:43 p.m.
To: 'bill at genenformics.com'; 'bioperl-l at lists.open-bio.org'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
There's no descriptions in the top of the blast output and no accessions in the alignments.
The fasta is coming from UniProt so surely they know how to format files.
And it does match what NCBI require in their defline i.e. sp|accession|entry name
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 3:13 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
I could not see the difference.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.table.632
Bill
Post by Smithies, Russell
No, that doesn't work :-(
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2
Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)
Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
===========================================================================
Post by Smithies, Russell
If I tweak the fasta and change the ids from lcl to gi and re-formatdb,
===========================================================================
Post by Smithies, Russell
Query= test
(612 letters)
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Bernd Web
2009-05-20 19:19:25 UTC
Permalink
Hi Russell,

Thanks for posting this issue. I have the same problem with formatdb
from 2.2.19. When I used my older, still-installed 2.2.17, everything was
fine again :)
So you were lucky to be able to revert to 2.2.18.


Regards,
Bernd

On Tue, May 19, 2009 at 5:54 AM, Smithies, Russell
Post by Smithies, Russell
We re-installed blast version 2.2.18 and everything works perfectly.
It formats the Uniprot fasta as it should and retrieves sequences with fastacmd as it should.
I think we'll email NCBI and tell them they broke formatdb in their "upgrade"
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
Sent: Tuesday, 19 May 2009 3:43 p.m.
To: 'bill at genenformics.com'; 'bioperl-l at lists.open-bio.org'
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
There's no descriptions in the top of the blast output and no accessions in
the alignments.
The fasta is coming from UniProt so surely they know how to format files.
And it does match what NCBI require in their defline i.e. sp|accession|entry
name
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 3:13 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
I could not see the difference.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.table.632
Bill
Post by Smithies, Russell
No, that doesn't work :-(
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2
Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)
Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
===========================================================================
Post by Smithies, Russell
If I tweak the fasta and change the ids from lcl to gi and re-formatdb,
===========================================================================
Post by Smithies, Russell
Query= test
(612 letters)
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters
Searching..................................................done
Bernd Web
2009-05-20 19:19:25 UTC
Permalink
Hi Russel,

Thanks for posting this issue. I have the same problem with formatdb
from 2.2.19. When using my older still installed 2.2.17 everything was
fine again :)
So you were lucky to revert from to 2.2.18.


Regards,
Bernd

On Tue, May 19, 2009 at 5:54 AM, Smithies, Russell
Post by Smithies, Russell
We re-installed blast version 2.2.18 and everything works perfectly.
It formats the Uniprot fasta as it should and retrieves sequences with fastacmd as it should.
I think we'll email NCBI and tell them they broke formatdb in their "upgrade"
--Russell
Smithies, Russell
2009-05-19 03:54:41 UTC
Permalink
We re-installed blast version 2.2.18 and everything works perfectly.
It formats the Uniprot fasta as it should and retrieves sequences with fastacmd as it should.

I think we'll email NCBI and tell them they broke formatdb in their "upgrade".

--Russell
Smithies, Russell
2009-05-19 03:43:20 UTC
Permalink
There are no descriptions at the top of the blast output and no accessions in the alignments.
The fasta is coming from UniProt, so surely they know how to format files.
And it does match what NCBI require in their defline, i.e. sp|accession|entry name

--Russell
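
For illustration, the sp|accession|entry name layout mentioned above can be split with a few lines of Python. This is only a sketch: the function name parse_uniprot_defline and the field names are made up for this example, and it assumes the identifier is the first whitespace-delimited token of the defline.

```python
from typing import NamedTuple


class UniprotId(NamedTuple):
    db: str           # database tag, "sp" for Swiss-Prot
    accession: str    # e.g. "Q4U9M9"
    entry_name: str   # e.g. "104K_THEAN"
    description: str  # free text after the first space


def parse_uniprot_defline(line: str) -> UniprotId:
    """Split a defline shaped like '>sp|accession|entry_name description'.

    Hypothetical helper: it only checks for three non-empty,
    pipe-separated fields and does not validate them further.
    """
    text = line.lstrip(">").rstrip("\n")
    identifier, _, description = text.partition(" ")
    parts = identifier.split("|")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a db|accession|entry_name id: {identifier!r}")
    return UniprotId(parts[0], parts[1], parts[2], description)


# Defline text taken from the working blast output in this thread.
hit = parse_uniprot_defline(
    ">sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen "
    "OS=Theileria annulata GN=TA08425 PE=3 SV=1"
)
print(hit.accession, hit.entry_name)
```

A defline such as '>lcl|12345 ...' has only two pipe-separated fields, so this sketch rejects it with a ValueError rather than guessing.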
bill
2009-05-19 03:13:19 UTC
Permalink
I could not see the difference.

Do you follow the rules for the FASTA defline?

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.table.632

Bill
Smithies, Russell
2009-05-19 01:44:35 UTC
Permalink
No, that doesn't work :-(
Here's some blast output with the database formatted with local ids:
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2


Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209

===========================================================================

If I tweak the fasta and change the ids from lcl to gi and re-formatdb, all works correctly:

===========================================================================
Query= test
(612 letters)

Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theile... 421 e-117
sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen OS=Theile... 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Probable chemoreceptor glutamine deamidase ... 33 4.2
sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theileria annulata GN=TA08425
PE=3 SV=1
Length = 893

Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 201/209 (96%), Positives = 201/209 (96%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED
Sbjct: 72 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 131

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
QYLA IHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD
Sbjct: 132 QYLARLQQLRQQIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 191

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY
Sbjct: 192 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 251

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
VATIPKLKDFAEPYHPIILDISDIDYVNF
Sbjct: 252 VATIPKLKDFAEPYHPIILDISDIDYVNF 280

============================================================================

To my mind, this is a bug in formatdb but NCBI don't see it that way.

--Russell
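
The defline tweak described above (giving each record a gi-style id before re-running formatdb) could be scripted roughly as follows. This is a sketch under assumptions, not the exact edit that was made: the function name add_fake_gi is hypothetical, the placeholder numbers are simply a running counter, and the output follows the gi|number|sp|accession|entry_name pattern.

```python
def add_fake_gi(lines, start=1):
    """Yield FASTA lines with a placeholder gi number prefixed to each defline.

    Assumption: every defline starts with '>' followed directly by the
    existing identifier, e.g. '>sp|Q4U9M9|104K_THEAN ...'.
    Sequence lines are passed through unchanged.
    """
    counter = start
    for line in lines:
        if line.startswith(">"):
            yield f">gi|{counter}|{line[1:]}"
            counter += 1
        else:
            yield line


# Toy input: the ids come from this thread, the residues are placeholders.
records = [
    ">sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen\n",
    "ACDEFGHIK\n",
    ">sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen\n",
    "LMNPQRSTV\n",
]
for out_line in add_fake_gi(records):
    print(out_line, end="")
```

If the rewritten records are later merged with genuine NCBI entries, start would need to be chosen so that the placeholder numbers cannot collide with real GI numbers.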
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 12:20 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
Hi, Smithies,
Using an integral local id should work as well.
A defline will look like '>lcl|12345 ...'
Bill
Post by Smithies, Russell
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and
created lists of swissprot IDs for both the file from NCBI and the file
from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
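
The three groups listed above are what comm prints for two sorted id lists. A rough Python equivalent, offered only as a sketch: the names read_ids and compare_accessions are made up here, the input files are assumed to hold one accession per line, and the toy membership below is invented purely for demonstration.

```python
def read_ids(path):
    """Read one accession per line, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_accessions(ncbi_ids, uniprot_ids):
    """Return ids unique to each list and ids shared by both."""
    ncbi, uniprot = set(ncbi_ids), set(uniprot_ids)
    return {
        "ncbi_only": sorted(ncbi - uniprot),
        "uniprot_only": sorted(uniprot - ncbi),
        "both": sorted(ncbi & uniprot),
    }


# Toy example; which list each id sits in is invented for the demo.
groups = compare_accessions(
    ncbi_ids=["Q4U9M9", "P15711", "PLACEHOLDER1"],
    uniprot_ids=["Q4U9M9", "P15711", "Q2SPQ2"],
)
for name, ids in groups.items():
    print(name, len(ids), ids)

# With the real files this would be something like:
# groups = compare_accessions(read_ids("sp_ncbi_accessions.txt"),
#                             read_ids("sp_uniprot_accessions.txt"))
```

As a consistency check on the reported counts, the shared ids plus each unique group add up to the list sizes: 458,282 + 95 = 458,377 and 458,282 + 8,457 = 466,739.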
I did a quick random sample of the 8,457 ids unique to Uniprot and none
could be found in the "protein" database at NCBI but all were in the
"gene" database as "reference sequences that belong to a specific genome
build" and all belonged to recently sequenced bacterial genomes. As none
are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random
sample again) described as "putative protein" (or "very putative protein"
in one case) and are the result of gene predictions. Eg
http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database and add in the extra 8,457 ids
unique to Uniprot and assign them fake GI numbers so I can formatdb them
with the " -o T" option.
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people
Smithies, Russell
2009-05-19 01:44:35 UTC
Permalink
No, that doesn't work :-(
Here's some blast output with the database formatted with local ids:
=====================================================================
Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN Unknown 421 e-117
sp|P15711|104K_THEPA Unknown 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Unknown 33 4.2


Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 0/209 (0%), Positives = 0/209 (0%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209

===========================================================================

If I tweak the fasta and change the ids from lcl to gi and re-formatdb, all works correctly:

===========================================================================
Query= test
(612 letters)

Database: uniprot_sprot.fasta
466,739 sequences; 165,389,953 total letters

Searching..................................................done



Score E
Sequences producing significant alignments: (bits) Value

sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theile... 421 e-117
sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen OS=Theile... 265 6e-70
sp|Q2SPQ2|CHED_HAHCH Probable chemoreceptor glutamine deamidase ... 33 4.2
sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen OS=Theileria annulata GN=TA08425
PE=3 SV=1
Length = 893

Score = 421 bits (1083), Expect = e-117, Method: Compositional matrix adjust.
Identities = 201/209 (96%), Positives = 201/209 (96%)

Query: 1 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 60
VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED
Sbjct: 72 VHKVVEGDIVIWENEEMPLYTCAIVTQNEVPYMAYVELLEDPDLIFFLKEGDQWAPIPED 131

Query: 61 QYLAXXXXXXXXIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 120
QYLA IHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD
Sbjct: 132 QYLARLQQLRQQIHTESFFSLNLSFQHENYKYEMVSSFQHSIKMVVFTPKNGHICKMVYD 191

Query: 121 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 180
KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY
Sbjct: 192 KNIRIFKALYNEYVTSVIGFFRGLKLLLLNIFVIDDRGMIGNKYFQLLDDKYAPISVQGY 251

Query: 181 VATIPKLKDFAEPYHPIILDISDIDYVNF 209
VATIPKLKDFAEPYHPIILDISDIDYVNF
Sbjct: 252 VATIPKLKDFAEPYHPIILDISDIDYVNF 280

============================================================================

To my mind, this is a bug in formatdb but NCBI don't see it that way.

--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of bill at genenformics.com
Sent: Tuesday, 19 May 2009 12:20 p.m.
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
Hi, Smithies,
Using an integral local id should work as well.
A defline will look like '>lcl|12345 ...'
Bill
Post by Smithies, Russell
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and
created lists of swissprot IDs for both the file from NCBI and the file
from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
I did a quick random sample of the 8,457 ids unique to Uniprot and none
could be found in the "protein" database at NCBI but all were in the
"gene" database as "reference sequences that belong to a specific genome
build" and all belonged to recently sequenced bacterial genomes. As none
are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random
sample again) described as "putative protein" (or "very putative protein"
in one case) and are the result of gene predictions. Eg
http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database and add in the extra 8,457 ids
unique to Uniprot and assign them fake GI numbers so I can formatdb them
with the " -o T" option.
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
bill
2009-05-19 00:19:53 UTC
Permalink
Hi, Smithies,

Using an integral local id should work as well.

A defline will look like '>lcl|12345 ...'
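
As a rough illustration of the header shape suggested here, the following Python sketch prefixes each FASTA header line with a sequential integer local id. The function name `add_local_ids`, the starting number, and the dummy residues in the second record are assumptions made for the example, not anything from the thread. It only rewrites the text; whether formatdb then indexes such ids correctly is a separate question (the follow-up in this thread reports that it did not).

```python
def add_local_ids(lines, start=1):
    """Yield FASTA lines with '>lcl|N ' prepended to every header line.

    `start` is an arbitrary first id chosen for this sketch.
    """
    next_id = start
    for line in lines:
        if line.startswith(">"):
            # Keep the original header text after the new local id.
            yield f">lcl|{next_id} {line[1:].rstrip()}\n"
            next_id += 1
        else:
            yield line


# Two toy records; the second sequence is made-up placeholder residues.
fasta = [
    ">sp|Q4U9M9|104K_THEAN 104 kDa microneme/rhoptry antigen\n",
    "VHKVVEGDIV\n",
    ">sp|P15711|104K_THEPA 104 kDa microneme/rhoptry antigen\n",
    "ACDEFGHIKL\n",
]
renamed = list(add_local_ids(fasta))
```

Sequence lines pass through untouched, so the same generator can be run over a file handle and written straight back out.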

Bill
Post by Smithies, Russell
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and
created lists of swissprot IDs for both the file from NCBI and the file
from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
I did a quick random sample of the 8,457 ids unique to Uniprot and none
could be found in the "protein" database at NCBI but all were in the
"gene" database as "reference sequences that belong to a specific genome
build" and all belonged to recently sequenced bacterial genomes. As none
are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random
sample again) described as "putative protein" (or "very putative protein"
in one case) and are the result of gene predictions. Eg
http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database and add in the extra 8,457 ids
unique to Uniprot and assign them fake GI numbers so I can formatdb them
with the " -o T" option.
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Jason Stajich
2009-05-11 16:24:06 UTC
Permalink
Welcome Chase.

Look forward to the project and helping where needed.

-jason
Post by Mark A. Jensen
Hello all,
With great pleasure, I want to introduce Chase Miller, my Google
Summer of Code student from George Washington University, to the
community. Chase will be working with me and Rutger Vos on a BioPerl
wrapper for Rutger's Bio::Phylo package, with a particular emphasis
on creating a BioPerl-native way to import and export the NeXML (http://nexml.org
) phylogenetic data format. He wrote a great proposal, available
here: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit
.
We will be working throughout the summer on the project, and will of
course come to you for sage advice. I know you will welcome him
warmly, as you did me.
Cheers,
Mark
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Jason Stajich
jason at bioperl.org
Smithies, Russell
2009-05-18 21:52:31 UTC
Permalink
Hi guys,
Thanx for your suggestions.

With the magic of awk and comm, I split the amalgamated accessions and created lists of swissprot IDs for both the file from NCBI and the file from Uniprot.

sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids

* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
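
The comparison above was done with awk and comm; the same three-way split can be sketched in Python with set operations. This is an illustrative equivalent only, not the commands actually used. The function name `compare_id_lists` and the toy id `X00001` are invented for the example; the other ids are taken from the BLAST hits elsewhere in this thread.

```python
def compare_id_lists(ncbi_ids, uniprot_ids):
    """Split two accession lists into NCBI-only, Uniprot-only and shared ids."""
    ncbi = set(ncbi_ids)
    uniprot = set(uniprot_ids)
    return {
        "ncbi_only": ncbi - uniprot,      # like `comm -23` on sorted files
        "uniprot_only": uniprot - ncbi,   # like `comm -13`
        "both": ncbi & uniprot,           # like `comm -12`
    }


# Toy input; X00001 is a hypothetical accession used only for illustration.
ncbi_ids = ["Q4U9M9", "P15711", "X00001"]
uniprot_ids = ["Q4U9M9", "P15711", "Q2SPQ2"]
result = compare_id_lists(ncbi_ids, uniprot_ids)

# Consistency check on the counts reported above: subtracting each file's
# unique ids from its total should give the same shared count.
shared_from_ncbi = 458_377 - 95
shared_from_uniprot = 466_739 - 8_457
```

Both subtractions give 458,282, which matches the number of ids reported as appearing on both lists.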

I did a quick random sample of the 8,457 ids unique to Uniprot and none could be found in the "protein" database at NCBI but all were in the "gene" database as "reference sequences that belong to a specific genome build" and all belonged to recently sequenced bacterial genomes. As none are in the "protein" database, they don't have GI numbers.

The 95 ids that were at NCBI but not in Uniprot were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions. Eg http://www.ncbi.nlm.nih.gov/protein/48429254


So what I'll do is use the NCBI database and add in the extra 8,457 ids unique to Uniprot and assign them fake GI numbers so I can formatdb them with the " -o T" option.
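
A minimal sketch of that plan, assuming the NCBI-derived records already start with `>gi|` and the extra Uniprot records start with `>sp|`. The function name `add_fake_gi` and the starting value 900000000 are arbitrary choices for the example; a real run would need a range known not to clash with genuine GI numbers. The gi value and sequences in the toy records are made up.

```python
def add_fake_gi(lines, first_fake_gi=900_000_000):
    """Yield FASTA lines, adding 'gi|N|' to headers that lack a GI number.

    `first_fake_gi` is a hypothetical starting point, not a reserved range.
    """
    next_gi = first_fake_gi
    for line in lines:
        if line.startswith(">") and not line.startswith(">gi|"):
            # Prepend a fake GI, keeping the rest of the header unchanged.
            yield f">gi|{next_gi}|{line[1:]}"
            next_gi += 1
        else:
            yield line


# Toy records: the first already has a (made-up) gi, the second does not.
records = [
    ">gi|111|sp|P15711|104K_THEPA example record\n",
    "ACDEFGHIKL\n",
    ">sp|Q4U9M9|104K_THEAN example record\n",
    "VHKVVEGDIV\n",
]
patched = list(add_fake_gi(records))
```

The combined file could then be indexed with something like `formatdb -i combined.fasta -p T -o T`, where `combined.fasta` is a placeholder file name.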


Thanx again for your help,



Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz


Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people




=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
Smithies, Russell
2009-05-18 21:52:31 UTC
Permalink
Hi guys,
Thanks for your suggestions.

With the magic of awk and comm, I split the amalgamated accessions and created lists of swissprot IDs for both the file from NCBI and the file from Uniprot.

sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids

* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
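
The comparison above was done with awk and comm; as a rough equivalent, the same three counts could be produced with Python sets. This is only a sketch, assuming each file holds one accession per line. The function names load_ids and compare_ids are made up for illustration.

```python
def load_ids(path):
    """Read one accession per line, skipping blank lines (assumed file layout)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_ids(ncbi_ids, uniprot_ids):
    """Return three sets: only in NCBI, only in Uniprot, and in both."""
    only_ncbi = ncbi_ids - uniprot_ids
    only_uniprot = uniprot_ids - ncbi_ids
    shared = ncbi_ids & uniprot_ids
    return only_ncbi, only_uniprot, shared


if __name__ == "__main__":
    # File names are the ones mentioned in the post.
    ncbi = load_ids("sp_ncbi_accessions.txt")
    uniprot = load_ids("sp_uniprot_accessions.txt")
    only_ncbi, only_uniprot, shared = compare_ids(ncbi, uniprot)
    print(len(only_ncbi), "ids only in the NCBI list")
    print(len(only_uniprot), "ids only in the Uniprot list")
    print(len(shared), "ids in both lists")
```

As a sanity check on the reported numbers: 95 + 458,282 = 458,377 and 8,457 + 458,282 = 466,739, which match the two file totals.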

I checked a quick random sample of the 8,457 ids unique to Uniprot. None could be found in the "protein" database at NCBI, but all were in the "gene" database as "reference sequences that belong to a specific genome build", and all belonged to recently sequenced bacterial genomes. Because none are in the "protein" database, they don't have GI numbers.

The 95 ids that were at NCBI but not in Uniprot were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions, e.g. http://www.ncbi.nlm.nih.gov/protein/48429254


So I'll use the NCBI database, add in the extra 8,457 ids unique to Uniprot, and assign them fake GI numbers so I can run formatdb on them with the "-o T" option.
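
One way the fake GI numbers might be assigned is sketched below in Python. It assumes the extra Uniprot records are in FASTA format with headers like ">sp|P12345|NAME_ORG description", and that prefixing "gi|<number>|" gives a defline formatdb can index. Both are assumptions, not something confirmed in this thread. The function name add_fake_gis and the starting number are hypothetical; the starting number would need to be higher than any real GI in the NCBI file so the two cannot collide.

```python
def add_fake_gis(fasta_lines, start_gi):
    """Yield FASTA lines with 'gi|N|' prepended to each header.

    N starts at start_gi (a hypothetical, caller-chosen value) and
    increases by one per record. Sequence lines pass through unchanged.
    """
    next_gi = start_gi
    for line in fasta_lines:
        if line.startswith(">"):
            yield f">gi|{next_gi}|{line[1:]}"
            next_gi += 1
        else:
            yield line


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = [">sp|P12345|TEST_ECOLI Hypothetical protein\n", "MKVLAA\n"]
    for out_line in add_fake_gis(example, start_gi=900000000):
        print(out_line, end="")
```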


Thanks again for your help,



Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz


Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people



