[Funcoup] GIN and DOM

Erik Sonnhammer Erik.Sonnhammer at sbc.su.se
Wed Oct 19 11:29:30 CEST 2011


But it was Pearson coefficients of gene pairs, so what is the problem?

BTW, again, how was the 4880 defined?

/Erik

On 10/19/2011 09:57 AM, Andrey Alexeyenko wrote:
> Please do not forget that the GIN data were not available (to me) as gene
> pairs; they were just Pearson coefficients, which makes them not the same
> as PPI data. If I had the original table, I would of course have put that
> number on the web page and would not have raised the issue...
>
> A
>
> On 2011-10-19 09:39, Erik Sonnhammer wrote:
>> Hi, my point is that since genes are the currency of FunCoup, it makes
>> sense to convert all other currencies to that currency.  Surely we are
>> not making models of domain pairs, but of gene pairs.
>>
>> BTW, the PPI nr seems to be converted from actual datapoints to gene
>> pairs.  I don't think there are >500000 datapoints for human PPI.
>>
>> Ideally we should present both the 'raw' nr of datapoints and the nr of
>> gene pairs they correspond to for each datatype.
>>
>> /Erik
>>
>> On 10/18/2011 07:44 PM, Thomas Schmitt wrote:
>>> I haven't followed the whole conversation, but in my opinion
>>> both GIN and DOM fall into the same category of pairwise
>>> data as PPI. (Although PPI isn't really pairwise, because interactions
>>> are defined for more than two genes.)
>>> I think we should therefore report the number of unique pairs
>>> for both: for GIN the number of gene pairs, and for DOM the number
>>> of domain pairs, because that's what the data is about.
>>>
>>> /Thomas
>>>
>>> On Oct 18, 2011, at 4:23 PM, Erik Sonnhammer wrote:
>>>
>>>> GIN: Again, how were the 4880 defined?  Surely we can't just pretend
>>>> some part of the input wasn't there just because their LLRs were low.
>>>> Look at MIR, those numbers are even much higher.  I think we should
>>>> write 255178 for GIN.
>>>>
>>>> DOM: I could go for Ngenes for DOM as well I guess.  But Ndomains is
>>>> like Nlocations, pretty meaningless.
>>>>
>>>> /Erik
>>>>
>>>> On 10/18/2011 04:04 PM, Andrey Alexeyenko wrote:
>>>>>> GIN: What is 4880 then and why did you not write 255178?
>>>>> I could agree, 255178 might look informative...
>>>>> But as we know it includes mostly pairs that do not deserve LLR>0.5, it
>>>>> is misleading if one keeps PPI data in mind.
>>>>>
>>>>>>
>>>>>> DOM: Still don't get it. Maybe a toy example will help. If domains A and
>>>>>> B interact, 5 genes have A and 5 have B, then there are 25 gene pairs.
>>>>>> Why is this impossible to compute?
>>>>> It is possible, I agree, but it is the same as squaring N genes on a MA
>>>>> - while for those we provide just Nconditions (or Ngenes for SCL).
>>>>>
>>>>> A
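
A minimal sketch of the toy example discussed above, assuming a hypothetical mapping from Pfam domains to the genes that carry them. The function name, the gene identifiers, and the choice to skip self-pairs are all assumptions made for illustration; this is not the FunCoup pipeline.

```python
from itertools import product


def count_gene_pairs(domain_pairs, genes_by_domain):
    """Count unique unordered gene pairs implied by interacting domain pairs.

    domain_pairs:    iterable of (domain_a, domain_b) tuples
    genes_by_domain: dict mapping a domain name to the set of genes carrying it
    """
    gene_pairs = set()
    for dom_a, dom_b in domain_pairs:
        genes_a = genes_by_domain.get(dom_a, ())
        genes_b = genes_by_domain.get(dom_b, ())
        for g1, g2 in product(genes_a, genes_b):
            if g1 != g2:  # assumption: a gene paired with itself is not counted
                # frozenset makes (g1, g2) and (g2, g1) the same pair
                gene_pairs.add(frozenset((g1, g2)))
    return len(gene_pairs)


# Toy data: 5 hypothetical genes with domain A, 5 with domain B.
genes_by_domain = {
    "A": {f"geneA{i}" for i in range(1, 6)},
    "B": {f"geneB{i}" for i in range(1, 6)},
}
toy_count = count_gene_pairs([("A", "B")], genes_by_domain)
print(toy_count)  # 25
```

Because pairs are stored in a set, a gene pair supported by several domain pairs is counted once, so the total can be smaller than the plain product when genes share domains.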
>>>>>
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>> On 10/18/2011 03:20 PM, Andrey Alexeyenko wrote:
>>>>>>> On 2011-10-18 15:07, Erik Sonnhammer wrote:
>>>>>>>> GIN: What was the lowest PLC (i.e. the cutoff)? Note that in total,
>>>>>>>> Sanjit had 255.178 gene pairs with correlations. AFAICR he used all of
>>>>>>>> them.
>>>>>>> The whole table contains 255178 pairs. Min=0.1
>>>>>>> At min=0.2 and 0.3 we get 18210 and 4184 pairs, respectively.
>>>>>>> Bin borders (PPI, yeast) are
>>>>>>>
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1 upper -0.1075
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2 upper 0.1385
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3 upper 0.1905
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4 upper 0.257
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5 upper 0.3905
>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6 upper 1000.835
>>>>>>>
>>>>>>> and LLRs, respectively:
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1 -0.915041861965188
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2 -0.0173913346260002
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3 0.645635223480877
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4 1.69915508802254
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5 2.78104091819998
>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6 4.60004121876386
>>>>>>>
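
To make the relation between the bin borders and the LLRs listed above concrete, here is a small sketch that looks up the bin and LLR for a given Pearson correlation. The border and LLR values are copied from the listing; the function name and the rule that a value exactly on a border falls into the lower bin are assumptions, since the thread does not say how ties are handled.

```python
from bisect import bisect_left

# Upper bin borders and LLRs copied from the listing above (yeast GIN vs. PPI, bins 1-6).
UPPER_BORDERS = [-0.1075, 0.1385, 0.1905, 0.257, 0.3905, 1000.835]
LLRS = [
    -0.915041861965188,
    -0.0173913346260002,
    0.645635223480877,
    1.69915508802254,
    2.78104091819998,
    4.60004121876386,
]


def llr_for_correlation(r):
    """Return (bin number, LLR) for a Pearson correlation r.

    Assumption: a value equal to a border belongs to the lower bin.
    """
    idx = bisect_left(UPPER_BORDERS, r)  # index of first upper border >= r
    if idx >= len(UPPER_BORDERS):
        raise ValueError(f"correlation {r} lies above the last bin border")
    return idx + 1, LLRS[idx]


for r in (0.1, 0.2, 0.3):
    print(r, llr_for_correlation(r))
```

Under these borders a correlation of 0.1 lands in bin 2 with a slightly negative LLR, while 0.2 and 0.3 land in bins 4 and 5.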
>>>>>>>>
>>>>>>>> Incidentally, the originator of the data, Charlie Boone, is talking
>>>>>>>> here
>>>>>>>> tomorrow at 10. Seems they used a cutoff of PLC>0.2, but I can't find
>>>>>>>> how many links that amounts to.
>>>>>>>>
>>>>>>>> DOM: "how do we define a set of input genes?" Isn't this simply the nr
>>>>>>>> of genes containing interacting domain pairs? At the very least we
>>>>>>>> should say what versions of Pfam and UniDomInt were used, but I don't
>>>>>>>> really see why it is so impossible to convert to gene pair counts.
>>>>>>>
>>>>>>> Because while we can count such genes, we cannot anticipate _PAIRS_
>>>>>>> where both genes have mutually interacting domains. This requires
>>>>>>> re-running everything. I think the plain Pfam domain list is more
>>>>>>> informative.
>>>>>>>
>>>>>>> A
>>>>>>>
>>>>>>>>
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>> On 10/18/2011 01:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>>> GIN: I just realised that this data type is not described in the
>>>>>>>>>> Methods
>>>>>>>>>> section, which it should be as it is new. Could you please provide a
>>>>>>>>>> section? I'm surprised it's only 4880 interactions - above what
>>>>>>>>>> cutoff
>>>>>>>>>> was that?
>>>>>>>>> Its header is under
>>>>>>>>> "***: distinct genes/domains", i.e. I just counted how many unique
>>>>>>>>> genes occurred in the Pearson table.
>>>>>>>>> This is also the reason that I do not know what to say about preparing
>>>>>>>>> the table and hence cannot describe it in the Methods. I just used the
>>>>>>>>> Pearson linear correlation values in the standard form.
>>>>>>>>>
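
The difference between counting distinct genes and counting gene pairs above a cutoff can be sketched as follows. The table layout (gene, gene, Pearson r), the gene names, the values, and the inclusive cutoff are all hypothetical; the real correlation file is not reproduced in this thread.

```python
def summarize_pairs(pairs, cutoffs):
    """Summarize a pairwise correlation table.

    pairs:   iterable of (gene_a, gene_b, pearson_r) tuples
    cutoffs: iterable of correlation cutoffs
    Returns (number of distinct genes, total number of pairs,
             dict mapping each cutoff to the number of pairs with r >= cutoff).
    """
    pairs = list(pairs)
    distinct_genes = {gene for a, b, _ in pairs for gene in (a, b)}
    # Assumption: the cutoff is inclusive (r >= cutoff).
    pairs_at_cutoff = {c: sum(1 for _, _, r in pairs if r >= c) for c in cutoffs}
    return len(distinct_genes), len(pairs), pairs_at_cutoff


# Hypothetical toy table, not real data.
toy_table = [
    ("g1", "g2", 0.12),
    ("g1", "g3", 0.25),
    ("g2", "g3", 0.35),
    ("g3", "g4", 0.18),
]
n_genes, n_pairs, by_cutoff = summarize_pairs(toy_table, cutoffs=(0.2, 0.3))
print(n_genes, n_pairs, by_cutoff)
```

The three numbers answer different questions, which is why a distinct-gene count and a pair count for the same table are not directly comparable.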
>>>>>>>>>>
>>>>>>>>>> DOM: I'm very unhappy with this just saying 3563 for all species;
>>>>>>>>>> this is almost meaningless. Are you saying that the mapping to genes
>>>>>>>>>> was only done on the fly? Could I perhaps ask Dimitri to try to
>>>>>>>>>> extract the actual gene pair numbers?
>>>>>>>>> That would be the hard part: how do we define a set of input genes?..
>>>>>>>>> And I do not think it is crucial. Indeed, the pivotal dataset is
>>>>>>>>> UniDomInt: as soon as we have an extra gene with Pfam domains in
>>>>>>>>> it, we
>>>>>>>>> can check it in FunCoup with this data. For comparison, in MEX, PEX or
>>>>>>>>> PPI the gene IDs are pivotal, that's why we count them in the table.
>>>>>>>>>
>>>>>>>>>> I also see that we don't describe the UniDomInt usage in the Methods
>>>>>>>>>> section - do we need to? Was some cutoff or other parameter used?
>>>>>>>>> I think I just employed the old procedure developed for those old
>>>>>>>>> Rhodes
>>>>>>>>> data. And I ignored lines that had UniDomInt score 0.
>>>>>>>>>
>>>>>>>>> A
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, could I please ask everybody to go through the paper looking
>>>>>>>>>> for
>>>>>>>>>> omissions, unclear parts, and other bugs.
>>>>>>>>>>
>>>>>>>>>> /Erik
>>>>>>>>>>
>>>>>>>>>> On 10/18/2011 12:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>>>> http://funcoup.sbc.su.se/statistics_2.0.html fixed.
>>>>>>>>>>>
>>>>>>>>>>> BUT differently (see the page), as it was (close to) impossible to
>>>>>>>>>>> calculate the exact numbers:
>>>>>>>>>>>
>>>>>>>>>>> - in GIN: because I do not have the original pairwise file;
>>>>>>>>>>>
>>>>>>>>>>> - in DOM: because we store just the domain pairs, and answering
>>>>>>>>>>> exactly would take re-running the whole thing in debug mode and
>>>>>>>>>>> looking at variable values...
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>> On 2011-09-28 12:02, Erik Sonnhammer wrote:
>>>>>>>>>>>> Great
>>>>>>>>>>>>
>>>>>>>>>>>> I guess you mean GIN. How about simply the nr of interactions
>>>>>>>>>>>> (above the
>>>>>>>>>>>> cutoff whatever it was)?
>>>>>>>>>>>>
>>>>>>>>>>>> for DOM there should be a nr of interactions for each species while
>>>>>>>>>>>> GIN
>>>>>>>>>>>> is only in yeast.
>>>>>>>>>>>>
>>>>>>>>>>>> /Erik
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/28/2011 11:56 AM, Andrey Alexeyenko wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated statistics_2.0.html,
>>>>>>>>>>>>> except the columns DOM and INT where I do not know what to count.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2011-09-27 11:42, Erik Sonnhammer wrote:
>>>>>>>>>>>>>> Here is a list:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Put 2.0 on home page
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Update release notes (text file fine) with Input dataset sizes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Erik's desktop (Ubuntu), under Edge Categories, Species, “fly”
>>>>>>>>>>>>>> becomes “...”
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have KEGG pathway memberships and subcellular localisations been
>>>>>>>>>>>>>> updated?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why does fly FBpp0289426 (NBS) not align with its human ortholog
>>>>>>>>>>>>>> NBN?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Option to turn on debugging info
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And some suggestions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Change to “(out of # at pfc>0.1):” under 'Network edges'.
>>>>>>>>>>>>>> The set >0.25, >0.5, >0.75 is a bit too coarse anyway and may not
>>>>>>>>>>>>>> match the query.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Update example queries(?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Add “maximum pfc” cutoff to the query – to look for novel links.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fewer areas on the webpage. Similar options should be grouped in
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>> area instead. A few areas with clear headers about what they
>>>>>>>>>>>>>> contain is
>>>>>>>>>>>>>> preferable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /Erik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09/27/2011 11:03 AM, Thomas Schmitt wrote:
>>>>>>>>>>>>>>> Awesome! What are the issues with the website that you want to
>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>> fixed?
>>>>>>>>>>>>>>> Is this something that we should do asap?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Funcoup mailing list
>>>>>>>>>> Funcoup at sbc.su.se
>>>>>>>>>> https://mail.sbc.su.se/mailman/listinfo/funcoup
>>>>>>>>
>>>>>>
>>>>
>>>
>>
