[Funcoup] GIN and DOM

Andrey Alexeyenko andrej.alekseenko at scilifelab.se
Wed Oct 19 09:57:22 CEST 2011


Please do not forget that the GIN data were not available (to me) as gene
pairs; they were just Pearson coefficients, which makes them not the same
as PPI data. If I had had the original table, I would of course have put
that number on the web page, and would not have raised the issue...

A
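
To make concrete why a table of Pearson coefficients is not the same as a list of PPI pairs: each correlation only contributes through the bin it falls into. Below is a minimal sketch in Python that uses the yeast PPI bin borders and LLRs quoted further down in this thread. The function names, the convention that a value equal to an upper border belongs to that bin, and the toy correlation values are assumptions made for illustration; this is not the FunCoup code.

```python
from bisect import bisect_left

# Upper bin borders and per-bin LLRs as quoted in this thread (yeast, PPI).
UPPER_BORDERS = [-0.1075, 0.1385, 0.1905, 0.257, 0.3905, 1000.835]
BIN_LLRS = [
    -0.915041861965188,
    -0.0173913346260002,
    0.645635223480877,
    1.69915508802254,
    2.78104091819998,
    4.60004121876386,
]


def bin_number(correlation):
    """Return the 1-based bin for a correlation (assumed: value <= upper border)."""
    index = bisect_left(UPPER_BORDERS, correlation)
    # Clamp anything above the last border into the last bin.
    return min(index, len(UPPER_BORDERS) - 1) + 1


def llr_for_correlation(correlation):
    """Look up the LLR of the bin that the correlation falls into."""
    return BIN_LLRS[bin_number(correlation) - 1]


# Hypothetical correlation values, NOT taken from the real GIN table.
toy_correlations = [-0.2, 0.1, 0.15, 0.2, 0.3, 0.45]
informative = [r for r in toy_correlations if llr_for_correlation(r) > 0.5]
print(len(informative), "of", len(toy_correlations), "toy pairs have LLR > 0.5")
```

With the real table in place of the toy list, the same loop would give the number of pairs that actually carry evidence, as opposed to the total number of rows.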

On 2011-10-19 09:39, Erik Sonnhammer wrote:
> Hi, my point is that since genes are the currency of FunCoup, it makes
> sense to convert all other currencies to that currency.  Surely we are
> not making models of domain pairs, but of gene pairs.
>
> BTW, the PPI nr seems to be converted from actual datapoints to gene
> pairs.  I don't think there are >500000 datapoints for human PPI.
>
> Ideally we should present both the 'raw' nr of datapoints and the nr of
> gene pairs they correspond to for each datatype.
>
> /Erik
>
> On 10/18/2011 07:44 PM, Thomas Schmitt wrote:
>> I haven't followed the whole conversation, but in my opinion
>> both GIN and DOM fall into the same category of pairwise
>> data as PPI. (Although PPI isn't really pairwise, because interactions
>> are defined for more than two genes.)
>> I think we should therefore report the number of unique pairs
>> for both: for GIN the number of gene pairs,
>> and for DOM the number of domain pairs, because that's what the data is about.
>>
>> /Thomas
>>
>> On Oct 18, 2011, at 4:23 PM, Erik Sonnhammer wrote:
>>
>>> GIN: Again, how were the 4880 defined?  Surely we can't just pretend
>>> some part of the input wasn't there just because their LLRs were low.
>>> Look at MIR, those numbers are even much higher.  I think we should
>>> write 255178 for GIN.
>>>
>>> DOM: I could go for Ngenes for DOM as well I guess.  But Ndomains is
>>> like Nlocations, pretty meaningless.
>>>
>>> /Erik
>>>
>>> On 10/18/2011 04:04 PM, Andrey Alexeyenko wrote:
>>>>> GIN: What is 4880 then and why did you not write 255178?
>>>> I could agree, 255178 might look informative...
>>>> But as we know, it includes mostly pairs that do not deserve LLR>0.5, so
>>>> it is misleading if one keeps PPI data in mind.
>>>>
>>>>>
>>>>> DOM: Still don't get it. Maybe a toy example will help. If domains A and
>>>>> B interact, 5 genes have A and 5 have B, then there are 25 gene pairs.
>>>>> Why is this impossible to compute?
>>>> It is possible, I agree, but it is the same as squaring N genes on a MA
>>>> - while for those we provide just Nconditions (or Ngenes for SCL).
>>>>
>>>> A
>>>>
>>>>>
>>>>> /Erik
>>>>>
>>>>> On 10/18/2011 03:20 PM, Andrey Alexeyenko wrote:
>>>>>> On 2011-10-18 15:07, Erik Sonnhammer wrote:
>>>>>>> GIN: What was the lowest PLC (i.e. the cutoff)? Note that in total,
>>>>>>> Sanjit had 255.178 gene pairs with correlations. AFAICR he used all of
>>>>>>> them.
>>>>>> The whole table contains 255178 pairs. Min=0.1
>>>>>> At min=0.2 and 0.3 we get 18210 and 4184 pairs, respectively.
>>>>>> Bin borders (PPI, yeast) are
>>>>>>
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1
>>>>>> upper -0.1075
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2
>>>>>> upper 0.1385
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3
>>>>>> upper 0.1905
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4
>>>>>> upper 0.257
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5
>>>>>> upper 0.3905
>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6
>>>>>> upper 1000.835
>>>>>>
>>>>>> and LLRs, respectively:
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1
>>>>>> -0.915041861965188
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2
>>>>>> -0.0173913346260002
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3
>>>>>> 0.645635223480877
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4
>>>>>> 1.69915508802254
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5
>>>>>> 2.78104091819998
>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6
>>>>>> 4.60004121876386
>>>>>>
>>>>>>>
>>>>>>> Incidentally, the originator of the data, Charlie Boone, is talking
>>>>>>> here
>>>>>>> tomorrow at 10. Seems they used a cutoff of PLC>0.2, but I can't find
>>>>>>> how many links that amounts to.
>>>>>>>
>>>>>>> DOM: "how do we define a set of input genes?" Isn't this simply the nr
>>>>>>> of genes containing interacting domain pairs? At the very least we
>>>>>>> should say what versions of Pfam and UniDomInt were used, but I don't
>>>>>>> really see why it is so impossible to convert to gene pair counts.
>>>>>>
>>>>>> Because while we can count such genes, we cannot anticipate _PAIRS_
>>>>>> where both genes have mutually interacting domains. This requires
>>>>>> re-running everything. I think the plain Pfam domain list is more
>>>>>> informative.
>>>>>>
>>>>>> A
>>>>>>
>>>>>>>
>>>>>>> /Erik
>>>>>>>
>>>>>>> On 10/18/2011 01:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>> GIN: I just realised that this data type is not described in the
>>>>>>>>> Methods
>>>>>>>>> section, which it should be as it is new. Could you please provide a
>>>>>>>>> section? I'm surprised it's only 4880 interactions - above what
>>>>>>>>> cutoff
>>>>>>>>> was that?
>>>>>>>> Its header is under
>>>>>>>> "***: distinct genes/domains", i.e. I just counted how many unique
>>>>>>>> genes occurred in the Pearson table.
>>>>>>>> This is also the reason that I do not know what to say about preparing
>>>>>>>> the table and hence - cannot describe it in the Methods. I used just
>>>>>>>> the
>>>>>>>> Pearson linear correlation values in the standard form.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> DOM: I'm very unhappy with this just saying 3563 for all species -
>>>>>>>>> this
>>>>>>>>> is almost meaningless. Are you saying that the mapping to genes was
>>>>>>>>> only done on the fly? Could I perhaps ask Dimitri to try to extract
>>>>>>>>> the
>>>>>>>>> actual gene pairs numbers?
>>>>>>>> That would be the hard part: how do we define a set of input genes?..
>>>>>>>> And I do not think it is crucial. Indeed, the pivotal dataset is
>>>>>>>> UniDomInt: as soon as we have an extra gene with Pfam domains in
>>>>>>>> it, we
>>>>>>>> can check it in FunCoup with this data. For comparison, in MEX, PEX or
>>>>>>>> PPI the gene IDs are pivotal, that's why we count them in the table.
>>>>>>>>
>>>>>>>>> I also see that we don't describe the UniDomInt usage in the Methods
>>>>>>>>> section - do we need to? Was some cutoff or other parameter used?
>>>>>>>> I think I just employed the old procedure developed for those old
>>>>>>>> Rhodes
>>>>>>>> data. And I ignored lines that had UniDomInt score 0.
>>>>>>>>
>>>>>>>> A
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, could I please ask everybody to go through the paper looking
>>>>>>>>> for
>>>>>>>>> omissions, unclear parts, and other bugs.
>>>>>>>>>
>>>>>>>>> /Erik
>>>>>>>>>
>>>>>>>>> On 10/18/2011 12:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>>> http://funcoup.sbc.su.se/statistics_2.0.html fixed.
>>>>>>>>>>
>>>>>>>>>> BUT differently (see the page), as it was (close to) impossible to
>>>>>>>>>> calculate the exact numbers:
>>>>>>>>>>
>>>>>>>>>> - in GIN: because I do not have the original pairwise file;
>>>>>>>>>>
>>>>>>>>>> - in DOM: because we store just the domain pairs, and answering
>>>>>>>>>> exactly
>>>>>>>>>> would take re-running the whole thing in the debug mode and
>>>>>>>>>> looking at
>>>>>>>>>> variable values...
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>> On 2011-09-28 12:02, Erik Sonnhammer wrote:
>>>>>>>>>>> Great
>>>>>>>>>>>
>>>>>>>>>>> I guess you mean GIN. How about simply the nr of interactions
>>>>>>>>>>> (above the
>>>>>>>>>>> cutoff whatever it was)?
>>>>>>>>>>>
>>>>>>>>>>> for DOM there should be a nr of interactions for each species while
>>>>>>>>>>> GIN
>>>>>>>>>>> is only in yeast.
>>>>>>>>>>>
>>>>>>>>>>> /Erik
>>>>>>>>>>>
>>>>>>>>>>> On 09/28/2011 11:56 AM, Andrey Alexeyenko wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I updated statistics_2.0.html,
>>>>>>>>>>>> except the columns DOM and INT where I do not know what to count.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>> On 2011-09-27 11:42, Erik Sonnhammer wrote:
>>>>>>>>>>>>> Here is a list:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Put 2.0 on home page
>>>>>>>>>>>>>
>>>>>>>>>>>>> Update release notes (text file fine) with Input dataset sizes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Erik's desktop (ubuntu), under Edge Categories, Species, “fly”
>>>>>>>>>>>>> becomes “...”
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have KEGG pathway memberships and subcellular localisations been
>>>>>>>>>>>>> updated?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why does fly FBpp0289426 (NBS) not align with its ortholog human
>>>>>>>>>>>>> NBN?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Option to turn on debugging info
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And some suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Change to “(out of # at pfc>0.1):” under 'Network
>>>>>>>>>>>>> edges'. >0.25, >0.5,
>>>>>>>>>>>>> 0.75 is a bit too coarse anyway and may not match the query.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Update example queries(?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Add “maximum pfc” cutoff to the query – to look for novel links.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fewer areas on the webpage. Similar options should be grouped in
>>>>>>>>>>>>> one
>>>>>>>>>>>>> area instead. A few areas with clear headers about what they
>>>>>>>>>>>>> contain is
>>>>>>>>>>>>> preferable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Erik
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/27/2011 11:03 AM, Thomas Schmitt wrote:
>>>>>>>>>>>>>> Awesome! What are the issues with the website that you want to
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>> fixed?
>>>>>>>>>>>>>> Is this something that we should do asap?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Funcoup mailing list
>>>>>>>>> Funcoup at sbc.su.se
>>>>>>>>> https://mail.sbc.su.se/mailman/listinfo/funcoup
>>>>>>>
>>>>>
>>>
>>
>
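
The toy example discussed in the thread (domains A and B interact, 5 genes have A and 5 have B, hence 25 gene pairs) can be sketched as follows. This is only an illustration under stated assumptions: the gene names, the `gene_domains` mapping and the function `gene_pairs_from_domain_pairs` are hypothetical, and the real computation would need the actual Pfam annotations and UniDomInt pairs that FunCoup uses.

```python
from collections import defaultdict
from itertools import product


def gene_pairs_from_domain_pairs(domain_pairs, gene_domains):
    """Return the set of unordered gene pairs linked by an interacting domain pair."""
    # Index genes by the domains they contain.
    genes_by_domain = defaultdict(set)
    for gene, domains in gene_domains.items():
        for domain in domains:
            genes_by_domain[domain].add(gene)

    pairs = set()
    for dom1, dom2 in domain_pairs:
        for gene1, gene2 in product(genes_by_domain[dom1], genes_by_domain[dom2]):
            if gene1 != gene2:  # skip self-pairs
                # frozenset makes (g1, g2) and (g2, g1) the same pair.
                pairs.add(frozenset((gene1, gene2)))
    return pairs


# Hypothetical toy input: 5 genes carry domain A, 5 other genes carry domain B.
gene_domains = {}
for i in range(1, 6):
    gene_domains["geneA%d" % i] = {"A"}
    gene_domains["geneB%d" % i] = {"B"}

toy_pairs = gene_pairs_from_domain_pairs([("A", "B")], gene_domains)
print(len(toy_pairs))  # 25 for this toy input
```

Because the pairs are collected in a set, a gene pair supported by several domain pairs is counted once, which is what a "number of gene pairs" column would need.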

