Sources and Issues Next Release
Note |
This is a planning template for the next release. It does not correspond to a released product. See http://irefindex.uio.no/ for the most recent release and related documentation. This page can be used to create the sources page. Check for xxx before copying and pasting to the appropriate sources page for the new release. Do not edit xxx in this page. Leave this page as a template. After making a new release page, update the general Sources for iRefIndex redirect page. |
Last edited: 2013-03-15
Applies to iRefIndex release: xxx
Release date: xxx
Authors: Ian Donaldson, Sabry Razick and Paul Boddie
Database: iRefIndex (http://irefindex.uio.no)
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)
Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex. Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.
- For statistics on the full public dataset, please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx
- For statistics on the public dataset distributed on the FTP site, please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_xxx
Issues
BioGRID interaction record ids (pre-build issue)
Capture BioGRID interaction record ids so iRefWeb can link out to BioGRID.
The only interaction ids available in the BioGRID files are already being used and are also present in iRefWeb, for example:
<primaryRef db="grid" id="103" refType="identity" refTypeAc="MI:0356" dbAc="MI:0463" />
See Bugzilla:250.
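As a rough aid for checking which ids a BioGRID file actually exposes, the sketch below collects the `id` attribute of every `primaryRef` element with `db="grid"`. The wrapper elements in `SAMPLE` and the function name `grid_ids` are illustrative assumptions; real BioGRID PSI-MI 2.5 files are namespaced and much larger, so the element lookup would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built around the primaryRef element quoted above.
# The enclosing <interaction>/<xref> elements are assumed for illustration.
SAMPLE = """
<interaction>
  <xref>
    <primaryRef db="grid" id="103" refType="identity" refTypeAc="MI:0356" dbAc="MI:0463" />
  </xref>
</interaction>
"""

def grid_ids(xml_text):
    """Return the id attribute of every primaryRef whose db is 'grid'."""
    root = ET.fromstring(xml_text)
    return [ref.get("id") for ref in root.iter("primaryRef") if ref.get("db") == "grid"]

print(grid_ids(SAMPLE))
```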
MITAB/iRefScape canonicalization
Change this to choose the canonical sequence rather than the longest sequence (mapping score L). Examples: for GeneID 84148 and 512564, the current choice unnecessarily separates BioGRID interaction data from interaction data from other databases.
See Bugzilla:255.
PDB identifiers
In previous releases, we replaced the pipe character (|) in PDB identifiers with an underscore character (_). In this release, this is only done when there are multiple database:accession entries in a field; otherwise, the pipe character (|) is maintained as part of the PDB identifier. This is a regression and will be corrected in a future release.
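The intended behaviour (always replacing the pipe, whether a field holds one entry or several) might be sketched as follows. This is not the build code; `join_entries` and the example accessions are hypothetical and only illustrate the rule described above.

```python
def join_entries(entries):
    """Join database:accession entries into one MITAB field.

    Any pipe inside an entry is replaced by an underscore first, so that
    the pipe only ever acts as the separator between entries. This is done
    for single-entry fields as well as multi-entry fields.
    """
    cleaned = [entry.replace("|", "_") for entry in entries]
    return "|".join(cleaned) if cleaned else "-"

# Example accessions are illustrative, not taken from a release file.
print(join_entries(["pdb:1ABC|A"]))
print(join_entries(["pdb:1ABC|A", "uniprotkb:P12345"]))
```

Applying the replacement before joining avoids the special case that caused the regression, because the number of entries in the field no longer matters.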
IMEx identifiers
IMEx identifiers should be present in column 52 but appear to be missing. This is a regression and will be corrected in a future release. There are 6004 lines in release 10 with imex:... This number needs to be cross-checked before the next release.
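A cross-check of this kind could be done with a small script along the following lines. It is only a sketch: it assumes a tab-delimited MITAB file with an optional header line starting with `#`, and the function name `count_imex` and the sample rows are made up for illustration.

```python
def count_imex(lines, column=52):
    """Count data lines whose given 1-based column contains an 'imex:' entry."""
    total = 0
    for line in lines:
        if line.startswith("#"):  # skip a header line, if present
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and "imex:" in fields[column - 1]:
            total += 1
    return total

# Synthetic rows for demonstration; the identifier is made up.
row = ["-"] * 52
row[51] = "imex:IM-00000"
sample = ["#header\n", "\t".join(row) + "\n", "\t".join(["-"] * 52) + "\n"]
print(count_imex(sample))
```

To check a release file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.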
Compatibility with Java PSI parser needs to be improved
The Java parser from psimi is available at https://code.google.com/p/psimi/downloads/detail?name=psimitab-1.8.3-distribution.zip. There are at least a few examples where the files do not follow the specification:
- Reserved characters are not quoted. For instance, in the file for human:
taxid:11706(HIV-1 M:B_HXB2R) taxid:10299(Herpes simplex virus (type 1 / strain 17)) go:GO:0005783|rigid:d//bz+DaMrbuxGA3i1Xe4hqlrXI|edgetype:X
To conform to the standard, a controlled term should look like this:
psi-mi:"MI:0496"(bait)
- Empty columns need to be consistently filled with '-'. For example, column 15 in the human file.
- Dates should be represented as yyyy/mm/dd but look like yyyy-mm-dd.
Thanks to Thomas Schmitt for pointing out these problems.
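The three points above could be addressed by helpers along these lines. This is a minimal sketch, not the export code: the set of reserved characters is an assumption that should be checked against the PSI-MITAB specification, all function names are hypothetical, and embedded double quotes are not handled.

```python
from datetime import datetime

# Assumed set of MITAB reserved characters; verify against the specification.
RESERVED = set("|():\t")

def quote(value):
    """Wrap value in double quotes if it contains a reserved character."""
    if any(ch in RESERVED for ch in value):
        return '"' + value + '"'
    return value

def format_entry(db, identifier, description=None):
    """Build db:identifier(description), quoting parts where needed."""
    entry = f"{db}:{quote(identifier)}"
    if description is not None:
        entry += f"({quote(description)})"
    return entry

def fill_empty(fields):
    """Replace empty or whitespace-only columns with '-'."""
    return [field if field.strip() else "-" for field in fields]

def fix_date(value):
    """Convert a yyyy-mm-dd date to yyyy/mm/dd."""
    return datetime.strptime(value, "%Y-%m-%d").strftime("%Y/%m/%d")

print(format_entry("psi-mi", "MI:0496", "bait"))
print(format_entry("taxid", "10299", "Herpes simplex virus (type 1 / strain 17)"))
print(fill_empty(["a", "", "b"]))
print(fix_date("2011-10-04"))
```

Quoting is applied per component (identifier and description separately) so that the colon between database and identifier and the parentheses around the description remain unquoted delimiters.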
Build issues
Source | Format | Location | Version (date) |
BIND | Tab-delimited text file. | ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/ (no longer available - see below). 20050525.complex2refs.txt 20050525.ints.txt 20050525.refs.txt 20050525.complexes.txt 20050525.labels.txt 20050525.complex2subunits.txt These files are no longer available via FTP but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/ For historical purposes, a snapshot of the Blueprint web-site may be viewed via the Internet Archive at http://web.archive.org/web/20050204013426/www.blueprint.org/index.html | 2005-05-25 |
BIND Translation | PSI-MI 2.5 | http://download.baderlab.org/BINDTranslation/release1_0/BINDTranslation_v1_xml_AllSpecies.tar.gz | Version 1.0 (2010-12-15) |
BioGRID | PSI-MI 2.5 | http://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.1.81/BIOGRID-ALL-3.1.81.psi25.zip | Version 3.1.81 (2011-10-01) |
CORUM | PSI-MI 2.5 | http://mips.gsf.de/genre/proj/corum/index.html http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip | 2009-12-02 |
DIP | PSI-MI 2.5 | http://dip.doe-mbi.ucla.edu/dip/Download.cgi | 2010-10-10 |
HPRD | PSI-MI 2.5 | http://www.hprd.org/download HPRD_PSIMI_041310.tar.gz | Release 9 (2010-04-13) |
IntAct | PSI-MI 2.5 | ftp://ftp.ebi.ac.uk/pub/databases/intact/2011-09-29/psi25/pmidMIF25.zip | 2011-09-29 |
MINT | PSI-MI 2.5 | ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/ | 2010-12-21 |
MPACT | PSI-MI 2.5 | ftp://ftpmips.gsf.de/yeast/PPI/mpact-complete.psi25.xml.gz | 2008-01-10 |
MPPI | PSI-MI 1.0 | http://mips.gsf.de/proj/ppi/data/mppi.gz | 2004-06-01 (from archive) |
OPHID | PSI-MI 1.0 | http://ophid.utoronto.ca/ophid/downloads.html (This service no longer available, please refer to http://ophid.utoronto.ca/ophidv2.201/) | 2006-07-07 |
New for this release | |||
InnateDB | PSI-MI 2.5 | http://www.innatedb.com/download.jsp Curated InnateDB Data | 2011-03-06 |
MPIDB | MITAB format file | http://www.jcvi.org/mpidb (information) http://www.jcvi.org/mpidb/download.php (general downloads) | Downloaded on 2011-10-03 |
MatrixDB | PSI-MI 2.5 | http://matrixdb.ibcp.fr/ MatrixDB_20100826.xml.zip | 2010-08-26 (timestamp) |
Source | Format | Location | Version (date) |
SEGUID | Tab-delimited text | ftp://bioinformatics.anl.gov/seguid/ seguidannotation | 2007-07-24 (timestamp) |
UniProt | Text | http://www.uniprot.org/downloads UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz) | UniProt Knowledgebase Release 2011_09 (2011-09-21) (Downloaded on 2011-10-04): UniProtKB/Swiss-Prot UniProtKB/TrEMBL (from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt) |
UniProt | Text | http://www.uniprot.org/downloads UniProtKB/TrEMBL (uniprot_trembl.dat.gz) | |
UniProt, IsoForms | FASTA | http://www.uniprot.org/downloads uniprot_sprot_varsplic.fasta.gz | |
UniProt, SGD | Tab-delimited text file. | http://www.expasy.org/cgi-bin/lists?yeast.txt Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD | |
UniProt, FLY | Tab-delimited text file. | http://www.expasy.org/cgi-bin/lists?fly.txt Drosophila: entries, gene names and cross-references to FlyBase. | |
NCBI, RefSeq | GenPept | ftp://ftp.ncbi.nih.gov/refseq/release/complete see *.protein.gpff.gz files | Release 49 (2011-09-09) (Downloaded on 2011-10-04) (from http://www.ncbi.nlm.nih.gov/refseq/) |
NCBI, MMDB/PDB | Tab-delimited text | ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table | (Downloaded on 2011-10-04) |
NCBI, PDB sequences | FASTA | ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz | (Downloaded on 2011-10-03) |
NCBI Gene2Refseq | Tab-delimited text | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ gene2refseq.gz | (Downloaded on 2011-10-04) |