Sources and Issues Next Release
Revision as of 10:52, 6 June 2011
This is a planning template for the next release. It does not correspond to a released product. See http://irefindex.uio.no/ for the most recent release and related documentation. This page can be used to create the sources page for a new release: check for xxx placeholders before cutting and pasting the content to the appropriate sources page for the new release. Do not edit the xxx placeholders on this page; leave this page as a template.
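Before copying this template, the remaining xxx placeholders can be listed mechanically. The following is a minimal sketch under stated assumptions: the function name `find_placeholders` is hypothetical, and only the "xxx" convention comes from this page. A plain substring test is used rather than a word-boundary regex, because placeholders such as `Statistics_iRefIndex_xxx` follow an underscore and would not match `\bxxx\b`.

```python
# Sketch: list every line of a page that still contains the "xxx" placeholder.
# The function name is hypothetical; only the "xxx" convention comes from this page.


def find_placeholders(text: str, marker: str = "xxx") -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    page = (
        "Applies to iRefIndex release: xxx\n"
        "Last edited: 2011-06-06\n"
        "http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx\n"
    )
    for number, line in find_placeholders(page):
        print(f"line {number}: {line}")
```

An empty result means the copied page has no unfilled xxx fields left.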
Last edited: 2011-06-06
Applies to iRefIndex release: xxx
Release date: xxx
Authors: Ian Donaldson, Sabry Razick and Paul Boddie
Database: iRefIndex (http://irefindex.uio.no)
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)
Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex. Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.
- For statistics on the full public dataset, please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx
- For statistics on the public dataset distributed on the FTP site, please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_xxx
Issues
Hard release date: July 1st.
Yeast taxon ID changes
See http://www.uniprot.org/news/2011/05/03/release
New databases (to be discussed)
BioGRID interaction record IDs (pre-build issue; to be done)
Capture BioGRID interaction record IDs so that iRefWeb can link out to BioGRID.
- The only interaction ID available in the BioGRID files is already being used and is already present in iRefWeb, e.g.
<primaryRef db="grid" id="103" refType="identity" refTypeAc="MI:0356" dbAc="MI:0463" />
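To check which BioGRID ("grid") identifiers are present in a file, they can be pulled out of the PSI-MI 2.5 XML directly. This is a minimal sketch, assuming that the reference sits in each interaction's own xref element (as in the example above) and that the file uses the PSI-MI 2.5 default namespace; `grid_interaction_ids`, `local_name` and `SAMPLE` are hypothetical names, and the sample is a hand-written fragment, not an excerpt of a real BioGRID file.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any XML namespace, e.g. '{net:sf:psidev:mi}interaction' -> 'interaction'."""
    return tag.rsplit("}", 1)[-1]


def grid_interaction_ids(psi_xml: str) -> list[str]:
    """Collect 'grid' identity references from each interaction's own xref element."""
    root = ET.fromstring(psi_xml)
    ids = []
    for interaction in root.iter():
        if local_name(interaction.tag) != "interaction":
            continue
        for xref in interaction:  # direct children only, so participant xrefs are ignored
            if local_name(xref.tag) != "xref":
                continue
            for ref in xref:
                if (
                    local_name(ref.tag) in ("primaryRef", "secondaryRef")
                    and ref.get("db") == "grid"
                    and ref.get("refType") == "identity"
                ):
                    ids.append(ref.get("id"))
    return ids


# Hand-written, illustrative PSI-MI 2.5-style fragment.
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry>
    <interactionList>
      <interaction id="1">
        <xref>
          <primaryRef db="grid" id="103" refType="identity"
                      refTypeAc="MI:0356" dbAc="MI:0463"/>
        </xref>
      </interaction>
    </interactionList>
  </entry>
</entrySet>"""

if __name__ == "__main__":
    print(grid_interaction_ids(SAMPLE))  # ['103']
```

Matching on local tag names keeps the sketch independent of the exact namespace URI a given file declares.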
RIGID recalculation (pre-build issue)
See bug 242. Either modify the existing RIGID table or lose continuity of iRIGIDs with the last release.
Taxon-specific MITAB files (post-processing issue)
Taxon-specific files should contain an interaction ONLY if one or both of taxA and taxB have the appropriate taxon, regardless of what the source database gave as the interaction taxon. Update the README accordingly. For an example, see PMID 12565857 (http://wodaklab.org/iRefWeb/pubReport/detail?pubmed=12565857+): a "mouse" interaction from HPRD lists only human interactors (the paper is about mouse, and HPRD transferred the interaction to human without noting this). As a result, this human interaction ends up in the mouse MITAB file because HPRD says it is a mouse interaction. BioGRID correctly curates the paper as being about mouse.
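The rule above (select on the interactors' taxa, never on the source database's interaction taxon) can be sketched as a filter over MITAB lines. This assumes the standard PSI-MITAB 2.5 layout, where columns 10 and 11 hold taxid A and taxid B as values such as `taxid:10090(Mus musculus)`, possibly several separated by `|`. `rows_for_taxon`, `taxa_in` and `make_row` are hypothetical helpers, and the sample rows are illustrative, not taken from a real release.

```python
# 0-based positions of the "taxid A" and "taxid B" columns (columns 10 and 11)
# in the standard PSI-MITAB 2.5 layout.
TAXID_A_COL = 9
TAXID_B_COL = 10


def taxa_in(field: str) -> set[str]:
    """Parse a MITAB taxid field such as 'taxid:10090(Mus musculus)' into {'10090'}."""
    taxa = set()
    for value in field.split("|"):
        if value.startswith("taxid:"):
            taxa.add(value[len("taxid:"):].split("(", 1)[0])
    return taxa


def rows_for_taxon(lines, taxon: str):
    """Yield MITAB lines where interactor A or B (not the interaction) has the taxon."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXID_B_COL:
            continue  # skip blank or malformed lines
        if taxon in taxa_in(cols[TAXID_A_COL]) | taxa_in(cols[TAXID_B_COL]):
            yield line


def make_row(tax_a: str, tax_b: str, source: str) -> str:
    """Build an illustrative 15-column MITAB row; only taxids and source are filled."""
    cols = ["-"] * 15
    cols[TAXID_A_COL] = tax_a
    cols[TAXID_B_COL] = tax_b
    cols[12] = source  # column 13: source database
    return "\t".join(cols)


if __name__ == "__main__":
    human = "taxid:9606(Homo sapiens)"
    mouse = "taxid:10090(Mus musculus)"
    # The HPRD row has two human interactors, so it must not reach the mouse file.
    rows = [make_row(human, human, "hprd"), make_row(mouse, mouse, "grid")]
    for row in rows_for_taxon(rows, "10090"):
        print(row.split("\t")[12])  # prints: grid
```

Under this rule the HPRD example from PMID 12565857 would land in the human file instead of the mouse file.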
CORUM methods (code change implemented)
Ensure that all CORUM methods (with MI terms) are parsed.
- This is now fixed (Sabry); please get the latest mapper from CVS.
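A quick sanity check after updating the mapper is to list every interaction detection method in the CORUM PSI-MI 2.5 file together with its PSI-MI term, and flag any that lack one. This is a hypothetical check, not the mapper's own code. It assumes that each method is an `interactionDetectionMethod` element with a `shortLabel` and a `primaryRef db="psi-mi"`. The sample is hand-written; MI:0018 is the PSI-MI term for "two hybrid".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Drop any XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def detection_methods(psi_xml: str) -> list[tuple[str, Optional[str]]]:
    """Return (shortLabel, PSI-MI id or None) for each interactionDetectionMethod."""
    root = ET.fromstring(psi_xml)
    methods = []
    for elem in root.iter():
        if local_name(elem.tag) != "interactionDetectionMethod":
            continue
        label, mi_id = "", None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "shortLabel" and not label:
                label = (child.text or "").strip()
            elif name == "primaryRef" and child.get("db") == "psi-mi":
                mi_id = child.get("id")
        methods.append((label, mi_id))
    return methods


# Hand-written, illustrative fragment: one method with an MI term, one without.
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry><experimentList>
    <experimentDescription id="1"><interactionDetectionMethod>
      <names><shortLabel>two hybrid</shortLabel></names>
      <xref><primaryRef db="psi-mi" id="MI:0018"/></xref>
    </interactionDetectionMethod></experimentDescription>
    <experimentDescription id="2"><interactionDetectionMethod>
      <names><shortLabel>method without xref</shortLabel></names>
    </interactionDetectionMethod></experimentDescription>
  </experimentList></entry>
</entrySet>"""

if __name__ == "__main__":
    missing = [label for label, mi_id in detection_methods(SAMPLE) if mi_id is None]
    print("methods without an MI term:", missing)
```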
Repeated lines (post-processing issue)
Some lines are repeated many times. These appear to arise from the BIND 3DBP division (see, for example, lines 5, 13, 117 and 125 in the E. coli MITAB file, and others arising from BIND ID 92720, which has 44 pieces of experimental evidence and 5 PMIDs), because the accessions for the different experimental forms are not present in MITAB. See Antonio and bug 245. This could be handled as a post-processing step on the MITAB file that keeps only the unique set of lines.
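The proposed post-processing step amounts to keeping the first occurrence of each distinct MITAB line. Here is a minimal, order-preserving sketch; `unique_lines` is a hypothetical name, and it treats lines as duplicates only when they are byte-for-byte identical apart from a trailing newline.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")  # a final line without a newline counts as equal
        if key not in seen:
            seen.add(key)
            yield line


if __name__ == "__main__":
    mitab = ["a\tb\tbind\n", "c\td\tbind\n", "a\tb\tbind\n", "a\tb\tbind"]
    print(list(unique_lines(mitab)))  # ['a\tb\tbind\n', 'c\td\tbind\n']
```

Keeping the original order leaves the header line first and keeps line numbers comparable with the existing files. The price is memory proportional to the number of distinct lines. `sort -u` would give the same set of lines but reorders them.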
MITAB/iRefScape canonicalization (post-processing issue)
Change this to choose the canonical sequence rather than the longest sequence (mapping score L). For example, for GeneIDs 84148 and 512564 the current approach unnecessarily separates BioGRID interaction data from the interaction data of other databases.
Decision: the L method will not be changed. Instead:
Resolve this by distributing the non-canonicalized data as before AND a canonicalized MITAB file with complete provenance information. The canonicalized file will become the main MITAB file we release and will support PSICQUIC services; the non-canonicalized version will be dropped in future releases. Also canonicalize the iRefScape data and include provenance data for interactors in the edge attribute viewer.
Requires review of current MITAB file format by Ian.
Other issues
- Discuss how to include I2D: decided that I2D will not be included.
- Parse all new datasets into a temporary database and test them before homogenizing: not required, as there are no new data sources.
- Decide whether to use both BIND text and BIND_Translation, or only one of them.
- The default output data will be the canonical form. The MITAB file will have the canonical accession as UIDA and UIDB, with new columns beforeCanonicalizationReferenceA and beforeCanonicalizationReferenceB. The aliases and alternative identifiers will be those of the canonical group, not of a specific protein. With the new columns and all the references, it must be tested whether the row width exceeds any thresholds (e.g. the MySQL maximum row width); I assume this will not be a problem.
- For iRefScape, once canonicalization is performed there will be no "uncanonicalize" option (currently there is an option to use canonical expansion).
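The planned canonical output can be sketched per row. The sketch assumes three things: a ready-made mapping from each accession to its canonical-group accession, UIDA and UIDB in the first two MITAB columns, and the two beforeCanonicalization columns appended at the end of the row. The mapping, the column placement and the function names are illustrative assumptions, not the decided file format. The sketch does not rewrite aliases or alternative identifiers. It also reports the widest row in bytes, so the row-width question can be checked against whatever limit the loading database imposes.

```python
from typing import Dict, List, Sequence


def canonicalize_row(cols: Sequence[str], canonical: Dict[str, str]) -> List[str]:
    """Swap UIDA/UIDB (columns 1 and 2) for canonical accessions and keep the
    originals in two trailing columns (beforeCanonicalizationReferenceA/B)."""
    uid_a, uid_b = cols[0], cols[1]
    row = list(cols)
    row[0] = canonical.get(uid_a, uid_a)  # unmapped identifiers pass through unchanged
    row[1] = canonical.get(uid_b, uid_b)
    row.extend([uid_a, uid_b])
    return row


def max_row_bytes(rows: Sequence[Sequence[str]]) -> int:
    """Widest tab-joined row in UTF-8 bytes, to compare against a storage limit."""
    return max((len("\t".join(r).encode("utf-8")) for r in rows), default=0)


if __name__ == "__main__":
    # Illustrative identifiers and mapping only.
    canonical = {"uniprotkb:EXAMPLE1-2": "uniprotkb:EXAMPLE1"}
    row = canonicalize_row(["uniprotkb:EXAMPLE1-2", "uniprotkb:EXAMPLE2", "-"], canonical)
    print(row)
    print("widest row (bytes):", max_row_bytes([row]))
```

Appending the original references at the end leaves the positions of the existing columns unchanged for current readers of the file.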
Source | Format | Location | Version (date)
BIND | Tab-delimited text file | ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/ (no longer available; see below): 20050525.complex2refs.txt, 20050525.ints.txt, 20050525.refs.txt, 20050525.complexes.txt, 20050525.labels.txt, 20050525.complex2subunits.txt. These files are no longer available via FTP but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/ for the BIND archive. For historical purposes, a snapshot of the Blueprint web site may be viewed via the Internet Archive at http://web.archive.org/web/20050204013426/www.blueprint.org/index.html | 25th May, 2005
BINDTranslate | PSI-MI 2.5 | http://download.baderlab.org/BINDTranslation/release1_0/BINDTranslation_v1_xml_AllSpecies.tar.gz | Version 1.0 (December 15th, 2010)
BioGRID | PSI-MI 2.5 | http://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.1.77/BIOGRID-ALL-3.1.77.psi25.zip | Version 3.1.77 (June 1st, 2011)
CORUM | PSI-MI 2.5 | http://mips.gsf.de/genre/proj/corum/index.html http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip | December 2nd, 2009
DIP | PSI-MI 2.5 | http://dip.doe-mbi.ucla.edu/dip/Download.cgi | October 10th, 2010
HPRD | PSI-MI 2.5 | http://www.hprd.org/download/ HPRD_PSIMI_041310.tar.gz | Release 9 (April 13th, 2010)
IntAct | PSI-MI 2.5 | ftp://ftp.ebi.ac.uk/pub/databases/intact/2011-05-23/psi25/pmidMIF25.zip | May 25th, 2011
MINT | PSI-MI 2.5 | ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/ | December 21st, 2010
MPACT | PSI-MI 2.5 | ftp://ftpmips.gsf.de/yeast/PPI/ mpact-complete.psi25.xml.gz | January 10th, 2008
MPPI | PSI-MI 1.0 | http://mips.gsf.de/proj/ppi/data/mppi.gz | June 1st, 2004 (from archive)
OPHID | PSI-MI 1.0 | http://ophid.utoronto.ca/ophid/downloads.html (This service is no longer available; please refer to http://ophid.utoronto.ca/ophidv2.201/) | July 7th, 2006
New for this release
InnateDB | PSI-MI 2.5 | http://www.innatedb.com/download.jsp (Curated InnateDB Data) | 2011-03-06
MPIDB | MITAB format file | http://www.jcvi.org/mpidb http://www.jcvi.org/mpidb/download.php | Downloaded on ***?***
MatrixDB | PSI-MI 2.5 | http://matrixdb.ibcp.fr/ MatrixDB_20100826.xml.zip | August 26th, 2010 (timestamp)
Sequence related resources
Source | Format | Location | Version (date)
SEGUID | Tab-delimited text | ftp://bioinformatics.anl.gov/seguid/ seguidannotation | July 24th, 2007 (timestamp)
UniProt | Text | http://www.uniprot.org/downloads UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz) | UniProt Knowledgebase Release 2010_12 (downloaded 21-December-2010): UniProtKB/Swiss-Prot, UniProtKB/TrEMBL (from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt)
UniProt | Text | http://www.uniprot.org/downloads UniProtKB/TrEMBL (uniprot_trembl.dat.gz) | (as above)
UniProt, IsoForms | FASTA | http://www.uniprot.org/downloads uniprot_sprot_varsplic.fasta.gz | (as above)
UniProt, SGD | Tab-delimited text file | http://www.expasy.org/cgi-bin/lists?yeast.txt Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD | (as above)
UniProt, FLY | Tab-delimited text file | http://www.expasy.org/cgi-bin/lists?fly.txt Drosophila: entries, gene names and cross-references to FlyBase | (as above)
NCBI, RefSeq | GenPept | ftp://ftp.ncbi.nih.gov/refseq/release/complete (see *.protein.gpff.gz files) | Release 44 (downloaded on December 21st, 2010, from http://www.ncbi.nlm.nih.gov/refseq/)
NCBI, MMDB/PDB | Tab-delimited text | ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table | Downloaded on December 21st, 2010
NCBI, PDB sequences | FASTA | ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz | Downloaded on December 21st, 2010
NCBI Gene2Refseq | Tab-delimited text | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ gene2refseq.gz | Downloaded on December 21st, 2010