Difference between revisions of "Sources and Issues Next Release"

From irefindex

Revision as of 11:30, 26 May 2011

This is a planning template for the next release.  It does not correspond to a released product.
See http://irefindex.uio.no/ for the most recent release and related documentation.
This page can be used to create the sources page.  
Check for xxx before cutting and pasting to the appropriate sources page for the new release.
Do not edit xxx in this page.  Leave this page as a template.

Last edited: 2011-05-26

Applies to iRefIndex release: xxx

Release date: xxx

Authors: Ian Donaldson, Sabry Razick and Paul Boddie

Database: iRefIndex (http://irefindex.uio.no)

Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)

Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex. Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.

Issues

Hard Release date: July 1st.

Yeast taxon id changes

See http://www.uniprot.org/news/2011/05/03/release

New databases

BioGRID interaction record ids (pre-build issue)

Capture BioGRID interaction record ids so iRefWeb can link out to BioGRID.

RIGID recalculation (pre-build issue)

See bug 242.

Taxon specific MITAB files (post-processing issue)

Taxon specific files should contain an interaction ONLY if one or both of taxa and taxb have the appropriate taxon (regardless of what the source database said the interaction taxon was). Change README.

Example: see http://wodaklab.org/iRefWeb/pubReport/detail?pubmed=12565857+ (PMID 12565857). A "mouse" interaction from HPRD lists only human interactors (the paper is about mouse, and the curators have transferred the interaction to human without noting what they have done). As a result, this human interaction ends up in the mouse MITAB (because HPRD says it was mouse). BioGRID correctly curates the paper as about mouse.
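The rule above could be applied as a post-processing filter along the following lines. This is only a sketch: it assumes a tab-delimited MITAB file in which taxa and taxb are the 10th and 11th columns and hold values such as taxid:10090(Mus musculus), and that header lines start with "#". The function names are hypothetical and are not part of the iRefIndex code.

```python
import re

# Assumed 0-based positions of the taxa and taxb columns in a MITAB line.
TAXA_COL, TAXB_COL = 9, 10

# Matches the numeric part of values such as "taxid:10090(Mus musculus)".
_TAXID = re.compile(r"taxid:(-?\d+)")


def line_taxids(line):
    """Return the set of taxon ids found in the taxa and taxb columns."""
    fields = line.rstrip("\r\n").split("\t")
    ids = set()
    for col in (TAXA_COL, TAXB_COL):
        if col < len(fields):
            ids.update(_TAXID.findall(fields[col]))
    return ids


def filter_by_taxon(lines, taxon_id):
    """Yield header lines plus lines where taxa or taxb matches taxon_id.

    The interaction-level taxon reported by the source database is
    deliberately ignored.
    """
    wanted = str(taxon_id)
    for line in lines:
        if line.startswith("#") or wanted in line_taxids(line):
            yield line
```

With this filter, the HPRD record in the example would be left out of the mouse file because neither interactor carries the mouse taxon id.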

CORUM methods (code change implemented)

Ensure that all CORUM methods (with MI terms) are parsed.

Repeated lines (post-processing issue)

There are multiple lines that are repeated many times. These appear to arise from the BIND 3DBP division (see, for example, lines 5, 13, 117 and 125 in the E. coli MITAB, and others arising from BIND ID 92720, which has 44 pieces of experimental evidence and 5 PMIDs) because the accessions for the different experimental forms are not present in MITAB. See Antonio and bug 245. This could be handled as a post-processing step on MITAB that takes the unique set of all MITAB lines.
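A minimal sketch of such a post-processing step is shown here. It assumes the MITAB file is small enough for its distinct lines to be held in memory, and it keeps the first occurrence of each line so the original order is preserved. The function name and the file names in the usage comment are hypothetical.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        # Compare lines without their line terminators so that a final
        # line lacking a newline still counts as a repeat.
        key = line.rstrip("\r\n")
        if key not in seen:
            seen.add(key)
            yield line


# Example usage (file names are placeholders):
# with open("in.mitab.txt") as src, open("out.mitab.txt", "w") as dst:
#     dst.writelines(unique_lines(src))
```

Sorting the file and dropping adjacent duplicates would give the same set of lines but would change their order, which is why a set of already-seen lines is used here.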

MITAB/irefscape canonicalization (post-processing issue)

Change the resolution of ambiguous gene ids to choose the canonical sequence rather than the longest sequence (mapping score L). Examples: GeneID 84148 and 512564, where the current choice unnecessarily separates Grid interaction data from interaction data from other databases.

Decided not to change the L method. Instead:

Resolve by distributing the non-canonicalized data as before AND a canonicalized MITAB file with complete provenance info. The canonicalized file will become the main MITAB file we release and will support PSICQUIC services; the non-canonicalized version will be dropped in future releases. Also, canonicalize irefscape data and include provenance data for interactors in the edge attribute viewer.

Requires review of current MITAB file format by Ian.
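The idea of canonicalizing while keeping provenance can be illustrated as follows. This is a sketch only: the mapping from interactor identifiers to canonical identifiers, the identifier strings and the field names are all hypothetical, and the actual MITAB columns are still subject to the format review mentioned above.

```python
def canonicalize_interaction(id_a, id_b, canonical_of):
    """Map both interactors to canonical ids and keep the originals.

    canonical_of is a hypothetical dict from an interactor id to the id of
    its canonical representative; ids without an entry map to themselves.
    """
    return {
        "uidA": canonical_of.get(id_a, id_a),
        "uidB": canonical_of.get(id_b, id_b),
        # Provenance: the identifiers as they were before canonicalization.
        "originalA": id_a,
        "originalB": id_b,
    }


# Two records that differ only in which sequence form was reported end up
# on the same canonical pair while still recording where they came from.
mapping = {"isoform-2": "canonical-1"}
first = canonicalize_interaction("isoform-2", "partner", mapping)
second = canonicalize_interaction("canonical-1", "partner", mapping)
```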

Interaction related resources

Source | Format | Location | Version (date)
BIND | Tab-delimited text file | ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/; 20050525.complex2refs.txt; 20050525.ints.txt; 20050525.refs.txt; 20050525.complexes.txt; 20050525.labels.txt; 20050525.complex2subunits.txt | 25th May, 2005
BioGRID | PSI-MI 2.5 | http://www.thebiogrid.org/downloads.php; /Current Release/BIOGRID-ALL-2.0.61.psi.zip | Version 2.0.61 (January 31st, 2010)
CORUM | PSI-MI 2.5 | http://mips.gsf.de/genre/proj/corum/index.html; http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip | December 2nd, 2009
DIP | PSI-MI 2.5 | http://dip.doe-mbi.ucla.edu/dip/Download.cgi; dip20091230.mif25 | December 30th, 2009
HPRD | PSI-MI 2.5 | http://www.hprd.org/download/; HPRD_SINGLE_PSIMI_070609.xml.tar.gz | Release 8. July 6th, 2009
IntAct | PSI-MI 2.5 | ftp://ftp.ebi.ac.uk/pub/databases/intact/2010-01-22/psi25/pmidMIF25.zip | January 22nd, 2010
MINT | PSI-MI 2.5 | ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/ | November 11th, 2009
MPACT | PSI-MI 2.5 | ftp://ftpmips.gsf.de/yeast/PPI/; mpact-complete.psi25.xml.gz | January 10th, 2008
MPPI | PSI-MI 1.0 | http://mips.gsf.de/proj/ppi/data/mppi.gz | June 1st, 2004 (from archive)
I2D | PSI-MI 2.5 | http://ophid.utoronto.ca/ophidv2.201/downloads.jsp | Downloaded on February 8th, 2010

Note on BIND: These files are no longer available via FTP but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/

For historical purposes, a snapshot of the Blueprint web-site may be viewed at http://web.archive.org/web/20050204013426/www.blueprint.org/index.html via the Internet Archive at http://web.archive.org/web/*/http://www.blueprint.org

Sequence related resources

Source | Format | Location | Version (date)
SEGUID | Tab-delimited text | ftp://bioinformatics.anl.gov/seguid/; seguidannotation | August 7th, 2007 (server gives "08/07/107")
UniProt | Text | http://www.uniprot.org/downloads; UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz) | UniProt Knowledgebase Release 15.14: UniProtKB/Swiss-Prot Release 57.14 (09-Feb-2010); UniProtKB/TrEMBL Release 40.14 (09-Feb-2010)
UniProt | Text | http://www.uniprot.org/downloads; UniProtKB/TrEMBL (uniprot_trembl.dat.gz) |
UniProt, IsoForms | FASTA | http://www.uniprot.org/downloads; uniprot_sprot_varsplic.fasta.gz |
UniProt, SGD | Tab-delimited text file | http://www.expasy.org/cgi-bin/lists?yeast.txt; Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD |
UniProt, FLY | Tab-delimited text file | http://www.expasy.org/cgi-bin/lists?fly.txt; Drosophila: entries, gene names and cross-references to FlyBase |
NCBI, RefSeq | GenPept | ftp://ftp.ncbi.nih.gov/refseq/release/complete; see *.protein.gpff.gz files | Release 39 (January 30th, 2010)
NCBI, MMDB/PDB | Tab-delimited text | ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table | Downloaded on February 8th, 2010
NCBI, PDB sequences | FASTA | ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz | Downloaded on February 8th, 2010
NCBI Gene2Refseq | Tab-delimited text | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/; gene2refseq.gz | Downloaded on February 8th, 2010

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).