Difference between revisions of "iRefIndex Testing 7.0"

Revision as of 11:04, 11 December 2010

The testing procedure for iRefIndex

1 Cross check with output of element counter
- 1.1 Program to use : biotek.uio.no.XML.Element_Counter (SaxValidator package)
2 Check SEGUID. Check one record each to verify the process worked
- 2.1 Test SEGUID updating process
- 2.2 UID overlap testing
3 Check five records each from all data sources
4 Check the legacy data-sources
5 Check SQL tables

Cross check with output of element counter

Program to use : biotek.uio.no.XML.Element_Counter (SaxValidator package)

For each interaction source, the </interactor> count should match the UID count in int_object (select (select name from int_db where int_db.id=source) as intSource, count(uid) from int_object group by source;).
For each interaction source, the </interaction> count should match the UID count in int_source (select (select name from int_db where int_db.id=source) as intSource, count(uid) from int_source group by source;).
When </interactor> cannot be used to count distinct objects (for example, when it occurs inside an interaction and is repeated in the interactorList), some other suitable element has to be used (e.g. </participant>).
Why count the closing elements in the above cases (e.g. </interactor>) instead of the opening elements (e.g. <interactor>)? The reason is that opening elements may have attributes, and a match on the start of an element name may be ambiguous because other element names can begin with the same text. This program uses text matching (to be independent of any XML parsing).
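The reasoning above can be illustrated with a minimal sketch in Python. It is not the Element_Counter program; the function name count_closing_tags and the sample XML are assumptions made only for this example.

```python
def count_closing_tags(text: str, element: str) -> int:
    """Count literal occurrences of </element> by plain text matching."""
    return text.count(f"</{element}>")


# Hypothetical sample: the opening tags carry attributes, and the element
# name interactorList starts with the same text as interactor.
sample = (
    '<interactorList>'
    '<interactor id="1"><names/></interactor>'
    '<interactor id="2"><names/></interactor>'
    '</interactorList>'
)

# Closing tags are unambiguous: exactly one per interactor.
closing = count_closing_tags(sample, "interactor")

# A naive match on the opening text also hits <interactorList>.
naive_opening = sample.count("<interactor")

print(closing, naive_opening)
```

In this sample the closing-tag count equals the number of interactors, while the naive opening-tag match is one too high.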

Check SEGUID. Check one record each to verify the process worked

Test SEGUID updating process

*SQL query = select orid, count(distinct rog) as rog_C from seguid where orid<0 group by orid;

orid	Record_count
-30	16983
-26	2
-24	78
-23	14
-22	1043258
-21	669761
-12	2679
-11	1665
-8	6525
-7	6547
-6	5235
-5	50305
-3	10853842
-2	11972291

All entries with orid<0 were altered during the update. All entries with orid>=0 are original entries from the SEGUID annotation file.

ORID	Description
-30	This is an iRefIndex Complex (RIGID used as ROGID), included in a previous process
-26	Is a dead OLN acc yeast_acc mapped using UniProt cross reference
-25	Is a dead SGD acc yeast_acc mapped using UniProt cross reference
-24	Is a dead fly_acc mapped using UniProt cross reference
-23	Is a dead PDB
-22	Is a dead RefSeq
-21	Is a dead UniProtKB
-12	Added to SEGUID from original sequence record (N-Scores) in a previous process
-11	Added to SEGUID using Eutils in a previous process
-8	Is a live OLN acc yeast_acc mapped using UniProt cross reference
-7	Is a live SGD acc yeast_acc mapped using UniProt cross reference
-6	Is a live fly_acc mapped using UniProt cross reference
-5	Is a live PDB
-3	Is a live RefSeq
-2	Is a live UniProtKB
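One way to use the two listings above together is to compare the orid codes returned by the query with the documented codes. The sketch below is only an illustration: it uses Python's sqlite3 module with an in-memory toy table in place of the MySQL database, and the function name check_orids and the toy rows are assumptions.

```python
import sqlite3

# Codes taken from the ORID description table above.
DOCUMENTED_ORIDS = {-30, -26, -25, -24, -23, -22, -21,
                    -12, -11, -8, -7, -6, -5, -3, -2}

# The query from this section, unchanged apart from the trailing semicolon.
QUERY = ("select orid, count(distinct rog) as rog_C "
         "from seguid where orid<0 group by orid")


def check_orids(conn):
    """Return (codes in the data but not documented, documented codes with no rows)."""
    observed = {row[0] for row in conn.execute(QUERY)}
    return (sorted(observed - DOCUMENTED_ORIDS),
            sorted(DOCUMENTED_ORIDS - observed))


# Toy data (assumed two-column schema; the real seguid table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("create table seguid (orid integer, rog integer)")
conn.executemany(
    "insert into seguid values (?, ?)",
    [(-2, 1), (-2, 2), (-3, 3), (-99, 4), (7, 5)],
)

unknown, missing = check_orids(conn)
print("undocumented:", unknown)
print("no rows:", missing)
```

A documented code with no rows is not necessarily an error (in the counts listed above, -25 does not appear), but an undocumented negative code is worth investigating.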

UID overlap testing

After parsing, it is important to make sure that no UID is shared between tables. Each of the following queries should return an empty set:

select * from int_object where int_object.uid in (select uid from int_source)
select * from int_object where int_object.uid in (select uid from int_experiment)
select * from int_source where int_source.uid in (select uid from int_experiment)
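The three checks can be run in one pass and reported together. The following is a minimal sketch, assuming Python's sqlite3 module as a stand-in for the MySQL connection; the function name find_overlaps and the single-column toy tables are hypothetical.

```python
import sqlite3

# The three overlap queries from this section.
OVERLAP_QUERIES = [
    "select * from int_object where int_object.uid in (select uid from int_source)",
    "select * from int_object where int_object.uid in (select uid from int_experiment)",
    "select * from int_source where int_source.uid in (select uid from int_experiment)",
]


def find_overlaps(conn):
    """Return {query: row count} for every query that is NOT empty."""
    failures = {}
    for query in OVERLAP_QUERIES:
        rows = conn.execute(query).fetchall()
        if rows:
            failures[query] = len(rows)
    return failures


# Toy tables with only a uid column (assumption; the real tables are wider).
conn = sqlite3.connect(":memory:")
for table in ("int_object", "int_source", "int_experiment"):
    conn.execute(f"create table {table} (uid integer)")
conn.executemany("insert into int_object values (?)", [(1,), (2,)])
conn.executemany("insert into int_source values (?)", [(3,), (4,)])
conn.executemany("insert into int_experiment values (?)", [(5,)])

clean = find_overlaps(conn)  # no UID is shared yet

# Introduce a collision between int_source and int_experiment.
conn.execute("insert into int_experiment values (3)")
dirty = find_overlaps(conn)

print(clean)
print(dirty)
```

An empty result means the test passed; any entry names the query that found shared UIDs.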

Check five records each from all data sources

Check with the file
Check with the website if available

The method is to find the UID range for the source from the int_source2object table, e.g. for IntAct:

Select max(sourceid) as max_id , min(sourceid) as min_id from int_source2object where source=5;
List the first interaction (min_id), the last (max_id) and a few from the middle.
To list node attributes, use the int_xref and int_name tables with the objectid.

  e.g select * from int_xref where int_xref.uid = <the objectid from int_source2object table>

To get the interaction attributes, use the int_xref and int_name tables with the sourceid.
To get experiment attributes, first get the experiment uid from the int_experiment table using the sourceid from the int_source2object table. Then, for the attributes, use the int_xref and int_name tables with the uid from the int_experiment table.
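Picking the first, the last and a few middle interactions can be sketched as follows. This is an illustration under assumptions: sqlite3 stands in for MySQL, the toy int_source2object table has only the three columns named in this section, and the function name sample_source_ids is hypothetical. It samples from the ids that actually exist, since the range between min_id and max_id may have gaps.

```python
import sqlite3


def sample_source_ids(conn, source, n=5):
    """Pick n interaction ids for one source: first, last and evenly spaced middle ones."""
    ids = [row[0] for row in conn.execute(
        "select distinct sourceid from int_source2object "
        "where source=? order by sourceid", (source,))]
    if len(ids) <= n:
        return ids
    # Evenly spaced positions; index 0 is min_id and the last index is max_id.
    step = (len(ids) - 1) / (n - 1)
    return [ids[round(i * step)] for i in range(n)]


# Toy data: source 5 has interactions 100..120, source 6 has a single one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table int_source2object (source integer, sourceid integer, objectid integer)"
)
conn.executemany(
    "insert into int_source2object values (?, ?, ?)",
    [(5, sid, sid + 1000) for sid in range(100, 121)] + [(6, 500, 1)],
)

picked = sample_source_ids(conn, 5)
print(picked)
```

Each returned sourceid can then be looked up in int_xref and int_name as described above.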

Check the legacy data-sources

These are data sources where the source data has not changed.

Verify the reasons for any differences in numbers.

Check SQL tables

The following tables should be checked for:

The expected number of rows and columns
Null values (these include MySQL NULL, 0, -1, -8 and -10). Null values are sometimes allowed; the aim here is to verify that there is no systematic error.
Reserved characters in PSI-MI Tab and XML.
Problems in character encoding.
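The null-value check in the list above can be sketched as a per-column count of NULL and sentinel values. This is a minimal illustration using Python's sqlite3 module; the column listing via "pragma table_info" is SQLite-specific (on MySQL the column names would come from elsewhere, e.g. information_schema), and the table name demo, its columns and the function name null_like_counts are assumptions.

```python
import sqlite3

# Values the list above treats as null, in addition to SQL NULL.
SENTINELS = (0, -1, -8, -10)


def null_like_counts(conn, table):
    """Return {column: number of rows holding NULL or one of the sentinel values}."""
    columns = [row[1] for row in conn.execute(f"pragma table_info({table})")]
    placeholders = ",".join("?" * len(SENTINELS))
    counts = {}
    for col in columns:
        sql = (f"select count(*) from {table} "
               f"where {col} is null or {col} in ({placeholders})")
        counts[col] = conn.execute(sql, SENTINELS).fetchone()[0]
    return counts


# Hypothetical toy table; not one of the iRefIndex tables.
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (uid integer, taxid integer, name text)")
conn.executemany(
    "insert into demo values (?, ?, ?)",
    [(1, 9606, "a"), (2, -1, None), (3, 0, "b"), (4, None, "c")],
)

counts = null_like_counts(conn, "demo")
print(counts)
```

A column where nearly every row is counted is a candidate for a systematic error; a handful of hits may be legitimate, as noted above.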

Table name	Check	What to expect
acc_multiples	NO
addeds	NO
arbitrary	NO
colon_patch	NO
colon_patch_bk	NO
config	NO
cy_edgeatrib	YES	This is a denormalized table with all interaction attributes. It is used when making the RIGID-centric TAB file and when making iRefScape data. Blank values are "-". No fields should contain NULL. Check for columns with only "-" as value.
cy_nodeatrib	YES	This is a denormalized table with all interactor attributes. It is used when making the ROGID-centric TAB file and when making iRefScape data. Blank values are "-". No fields should contain NULL. Check for columns with only "-" as value. Only ROGIDs used in interactions will appear here.
equa_score_multiple	NO
equa_score_multiple_reset	NO
eutils	Yes	This table contains deprecated protein sequences (removed from current RefSeq, UniProt or other databases archived by Entrez). The row count in this table should not be significantly different from the previous release. The SEGUID column should be checked to make sure the Eutils web service client has performed as expected. This is also a good point to check the "E" scores in the int_xref_mod table. Also cross check with the SEGUID table (entries here should also appear in the SEGUID table if they have a valid SEGUID).
gene_acc	YES
gene2refseq	YES
geneinfo	YES
int_category	YES
int_db	YES
int_deleted	YES
int_experiment	YES
int_generation	YES
int_name	YES
int_object	YES
int_objecttype	YES
int_participants	YES
int_proteinUIDs	YES
int_recordtype	YES
int_seguerror	YES
int_sequence	YES
int_source	YES
int_source2object	YES
int_xref	YES
int_xref_mod	YES
intacc2rig	YES
ipi2seq	YES
ipi2xref	YES
maxvals	YES
none_prots	YES
pdb	YES
pdb_mmdb	YES
pluses	YES
pmid2int	YES
pmid2rig	YES
PPI_sourceid	YES
ref_main	YES
ref_xref	YES
refseq	YES
rig2rigid	YES
rig2rog	YES
risg2risgid	YES
rog_found	YES
rog_mult	YES
rog_multiple	YES
rog_reset	YES
rog2rig	YES
rog2rogid	YES
score_multiple	YES
segu2seq	YES
seguid	YES
seguid_aded	YES
seguid_complex	YES
seguid_gbnk	YES
seguid_pdbd	YES
seguid_refs	YES
seguid_remv	YES
seguid_rest	YES
seguid_unip	YES
sha_seguid	YES
sha_seguid_redund	YES
summary_rig	YES
summary_rog	YES
summary_score	YES
tmp_orphaned_interaction_experiment	YES
tmp_orphaned_interaction_name	YES
tmp_orphaned_interaction_source	YES
tmp_orphaned_interaction_xref	YES
tmp_orphaned_interactions	YES
tmp_orphaned_interactor_name	YES
tmp_orphaned_interactor_object	YES
tmp_orphaned_interactor_xref	YES
tmp_orphaned_interactors	YES
uid2rig	YES
uid2rog	YES
uniprot_fly_acc	YES
uniprot_isoforms	YES
uniprot_main	YES
uniprot_ref	YES
uniprot_sequence	YES
uniprot_yeast_acc	YES
unique_rigids	YES
unique_rogs	YES
used_rogs	YES
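The cy_edgeatrib and cy_nodeatrib rows above ask for two checks: no field contains NULL, and no column holds only "-". A minimal sketch of both checks follows, assuming Python's sqlite3 module in place of MySQL; the function name audit_dash_columns, the table name cy_demo and its columns are hypothetical.

```python
import sqlite3


def audit_dash_columns(conn, table):
    """Return (columns where every row is '-', columns containing NULL)."""
    columns = [row[1] for row in conn.execute(f"pragma table_info({table})")]
    total = conn.execute(f"select count(*) from {table}").fetchone()[0]
    dash_only, with_null = [], []
    for col in columns:
        dashes = conn.execute(
            f"select count(*) from {table} where {col} = '-'").fetchone()[0]
        nulls = conn.execute(
            f"select count(*) from {table} where {col} is null").fetchone()[0]
        if total and dashes == total:
            dash_only.append(col)
        if nulls:
            with_null.append(col)
    return dash_only, with_null


# Hypothetical toy table standing in for a denormalized attribute table.
conn = sqlite3.connect(":memory:")
conn.execute("create table cy_demo (rigid text, method text, note text)")
conn.executemany(
    "insert into cy_demo values (?, ?, ?)",
    [("r1", "-", "x"), ("r2", "-", None)],
)

dash_only, with_null = audit_dash_columns(conn, "cy_demo")
print(dash_only, with_null)
```

Here the method column is flagged because it never carries a value, and the note column is flagged because it contains NULL where "-" was expected.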

Follow this link for a listing of all iRefIndex related pages (archived and current).

@@ Line 134: / Line 134: @@
 |  config                              ||NO||
 |-
-|  cy_edgeatrib                        ||YES||This table is a denormalized table with all interaction attributes. Used when making the RIGID centric TAB file. This is also used when making iRefScape data. Blank values are \\'-\\'. No fileds should contain NULL.Chack for columns with only \'-\' as value.
+|  cy_edgeatrib                        ||YES||This table is a denormalized table with all interaction attributes. Used when making the RIGID centric TAB file. This is also used when making iRefScape data. Blank values are "-". No fileds should contain NULL.Chack for columns with only "-" as value.
 |-
-|  cy_nodeatrib                        ||YES||This table is a denormalized table with all interactor attributes. Used when making the ROGID centric TAB file. This is also used when making iRefScape data.Blank values are \'-\'. No fileds should contain NULL.Chack for columns with only \'-\' as value. Oly ROGIDs used in interactions will apear here
+|  cy_nodeatrib                        ||YES||This table is a denormalized table with all interactor attributes. Used when making the ROGID centric TAB file. This is also used when making iRefScape data.Blank values are "-". No fileds should contain NULL.Chack for columns with only "-" as value. Oly ROGIDs used in interactions will apear here
 |-
 |  equa_score_multiple                 ||NO||
@@ Line 142: / Line 142: @@
 |  equa_score_multiple_reset           ||NO||
 |-
-|  eutils                              ||This table contains sequences for deprecated protein sequences.(removed from current RefSeq, UniProt or other databases archived by Entrez. Row count in this table should not be significantly different from the previous release. The SEGUID column should be checked and should make sure the Eutil web service client has performed as expected. This is also a good point to check the "E" scores in the int_xref_mod table. Also cross check with the SEGUID table (entries here should also appear in the SEGUID table if they have a valid SEGUID)  ||
+|  eutils                              ||Yes||This table contains sequences for deprecated protein sequences.(removed from current RefSeq, UniProt or other databases archived by Entrez. Row count in this table should not be significantly different from the previous release. The SEGUID column should be checked and should make sure the Eutil web service client has performed as expected. This is also a good point to check the "E" scores in the int_xref_mod table. Also cross check with the SEGUID table (entries here should also appear in the SEGUID table if they have a valid SEGUID)
 |-
 |  gene_acc                            ||YES||
