[Funcoup] GIN and DOM

Erik Sonnhammer Erik.Sonnhammer at sbc.su.se
Wed Oct 19 11:43:12 CEST 2011


This is correct.  Good, we have resolved GIN.

What about DOM?

/Erik

On 10/19/2011 11:36 AM, Andrey Alexeyenko wrote:
>> But it was pearson coefficients of gene pairs, so what is the problem?
> As I always understood it, the correlations could be (and were)
> calculated between all genes tested in SGA, irrespective of being tested
> directly against each other. Is it wrong? Then I simply put the 255178
> number.
>
>>
>> BTW Again, how were the 4880 defined?
> Just the number of distinct genes in the 255178-table.
>
> A
>
>>
>> /Erik
>>
>> On 10/19/2011 09:57 AM, Andrey Alexeyenko wrote:
>>> Pls do not forget that GIN data were not available (to me) as gene
>>> pairs, it was just Pearson coefficients, which makes it not the same as
>>> PPI data. If I had the original table, I'd of course have introduced that
>>> number into the web page, and would not have raised the issue...
>>>
>>> A
>>>
>>> On 2011-10-19 09:39, Erik Sonnhammer wrote:
>>>> Hi, my point is that since genes are the currency of FunCoup, it makes
>>>> sense to convert all other currencies to that currency.  Surely we are
>>>> not making models of domain pairs, but of gene pairs.
>>>>
>>>> BTW, the PPI nr seems to be converted from actual datapoints to gene
>>>> pairs.  I don't think there are >500000 datapoints for human PPI.
>>>>
>>>> Ideally we should present both the 'raw' nr of datapoints and the nr of
>>>> gene pairs they correspond to for each datatype.
>>>>
>>>> /Erik
>>>>
>>>> On 10/18/2011 07:44 PM, Thomas Schmitt wrote:
>>>>> I haven't followed the whole conversation, but in my opinion
>>>>> both GIN and DOM fall into the same category of pairwise
>>>>> data as PPI. (Although PPI isn't really pairwise because interactions
>>>>> are defined for more than two genes.)
>>>>> I think we should therefore report for both the number
>>>>> of unique pairs. Namely for GIN the number of gene pairs
>>>>> and for DOM the number of domain pairs because that's what the data is about.
>>>>>
>>>>> /Thomas
>>>>>
>>>>> On Oct 18, 2011, at 4:23 PM, Erik Sonnhammer wrote:
>>>>>
>>>>>> GIN: Again, how were the 4880 defined?  Surely we can't just pretend
>>>>>> some part of the input wasn't there just because their LLRs were low.
>>>>>> Look at MIR, those numbers are even much higher.  I think we should
>>>>>> write 255178 for GIN.
>>>>>>
>>>>>> DOM: I could go for Ngenes for DOM as well I guess.  But Ndomains is
>>>>>> like Nlocations, pretty meaningless.
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>> On 10/18/2011 04:04 PM, Andrey Alexeyenko wrote:
>>>>>>>> GIN: What is 4880 then and why did you not write 255178?
>>>>>>> I could agree, 255178 might look informative...
>>>>>>> But as we know it includes mostly pairs that do not deserve LLR>0.5, it
>>>>>>> is misleading if one keeps PPI data in mind.
>>>>>>>
>>>>>>>>
>>>>>>>> DOM: Still don't get it. Maybe a toy example will help. If domains A and
>>>>>>>> B interact, 5 genes have A and 5 have B, then there are 25 gene pairs.
>>>>>>>> Why is this impossible to compute?
>>>>>>> It is possible, I agree, but it is the same as squaring N genes on a MA
>>>>>>> - while for those we provide just Nconditions (or Ngenes for SCL).
>>>>>>>
>>>>>>> A
>>>>>>>
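The toy example in the exchange above (domains A and B interact, 5 genes carry A, 5 carry B, giving 25 gene pairs) can be written out as a small sketch. This is only an illustration, not the FunCoup procedure: the function name, the gene labels a1..a5 and b1..b5, and the choice to count unordered pairs and skip self-pairs are all assumptions made here.

```python
from itertools import product


def gene_pairs_from_domain_pairs(domain_pairs, genes_by_domain):
    """Expand interacting domain pairs into the set of unordered gene pairs.

    domain_pairs: iterable of (domain, domain) tuples that interact.
    genes_by_domain: dict mapping a domain name to the set of genes carrying it.
    """
    pairs = set()
    for dom_a, dom_b in domain_pairs:
        genes_a = genes_by_domain.get(dom_a, ())
        genes_b = genes_by_domain.get(dom_b, ())
        for g1, g2 in product(genes_a, genes_b):
            if g1 != g2:  # assumption: a gene paired with itself is not counted
                pairs.add(frozenset((g1, g2)))  # unordered, so (x, y) == (y, x)
    return pairs


# Hypothetical gene labels for the toy example: 5 genes with A, 5 with B.
toy_genes = {
    "A": {f"a{i}" for i in range(1, 6)},
    "B": {f"b{i}" for i in range(1, 6)},
}
toy_pairs = gene_pairs_from_domain_pairs([("A", "B")], toy_genes)
print(len(toy_pairs))  # 25
```

If some genes carry both domains, the count drops below the simple product, which is why the set of input genes has to be fixed before a number can be reported.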
>>>>>>>>
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>> On 10/18/2011 03:20 PM, Andrey Alexeyenko wrote:
>>>>>>>>> On 2011-10-18 15:07, Erik Sonnhammer wrote:
>>>>>>>>>> GIN: What was the lowest PLC (i.e. the cutoff)? Note that in total,
>>>>>>>>>> Sanjit had 255.178 gene pairs with correlations. AFAICR he used all of
>>>>>>>>>> them.
>>>>>>>>> The whole table contains 255178 pairs. Min=0.1
>>>>>>>>> At min=0.2 and 0.3 we get 18210 and 4184 pairs, respectively.
>>>>>>>>> Bin borders (PPI, yeast) are
>>>>>>>>>
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1
>>>>>>>>> upper -0.1075
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2
>>>>>>>>> upper 0.1385
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3
>>>>>>>>> upper 0.1905
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4
>>>>>>>>> upper 0.257
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5
>>>>>>>>> upper 0.3905
>>>>>>>>> data bin sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6
>>>>>>>>> upper 1000.835
>>>>>>>>>
>>>>>>>>> and LLRs, respectively:
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 1
>>>>>>>>> -0.915041861965188
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 2
>>>>>>>>> -0.0173913346260002
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 3
>>>>>>>>> 0.645635223480877
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 4
>>>>>>>>> 1.69915508802254
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 5
>>>>>>>>> 2.78104091819998
>>>>>>>>> prob ll sce__geni_(simple_sgadata_correlations_100308.txt)_ ppi_mt 6
>>>>>>>>> 4.60004121876386
>>>>>>>>>
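The bin borders and LLRs listed above can be read as a lookup table from a correlation value to a score. A minimal sketch, assuming each "upper" value is the inclusive upper border of its bin and that a pair receives the LLR of the first bin whose border is not below its Pearson coefficient. The function name llr_for_plc is made up for illustration; how FunCoup itself treats values exactly on a border is not stated in the thread.

```python
from bisect import bisect_left

# Upper bin borders and LLRs copied from the listing above (yeast, PPI).
UPPER_BORDERS = [-0.1075, 0.1385, 0.1905, 0.257, 0.3905, 1000.835]
LLRS = [
    -0.915041861965188,
    -0.0173913346260002,
    0.645635223480877,
    1.69915508802254,
    2.78104091819998,
    4.60004121876386,
]


def llr_for_plc(plc):
    """Return the LLR of the bin that a Pearson coefficient falls into."""
    # Index of the first border >= plc (upper border inclusive: an assumption).
    index = bisect_left(UPPER_BORDERS, plc)
    if index == len(UPPER_BORDERS):
        raise ValueError(f"value {plc} lies above the last bin border")
    return LLRS[index]


for value in (0.1, 0.2, 0.3):
    print(value, llr_for_plc(value))
```

Under these assumptions a pair at 0.1 lands in bin 2 with a slightly negative LLR, while pairs at 0.2 and 0.3 land in bins 4 and 5.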
>>>>>>>>>>
>>>>>>>>>> Incidentally, the originator of the data, Charlie Boone, is talking
>>>>>>>>>> here
>>>>>>>>>> tomorrow at 10. Seems they used a cutoff of PLC>0.2, but I can't find
>>>>>>>>>> how many links that amounts to.
>>>>>>>>>>
>>>>>>>>>> DOM: "how do we define a set of input genes?" Isn't this simply the nr
>>>>>>>>>> of genes containing interacting domain pairs? At the very least we
>>>>>>>>>> should say what versions of Pfam and UniDomInt were used, but I don't
>>>>>>>>>> really see why it is so impossible to convert to gene pair counts.
>>>>>>>>>
>>>>>>>>> Because while we can count such genes, we cannot anticipate _PAIRS_
>>>>>>>>> where both genes have mutually interacting domains. This requires
>>>>>>>>> re-running everything. I think the plain Pfam domain list is more
>>>>>>>>> informative.
>>>>>>>>>
>>>>>>>>> A
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> /Erik
>>>>>>>>>>
>>>>>>>>>> On 10/18/2011 01:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>>>>> GIN: I just realised that this data type is not described in the
>>>>>>>>>>>> Methods
>>>>>>>>>>>> section, which it should be as it is new. Could you please provide a
>>>>>>>>>>>> section? I'm surprised it's only 4880 interactions - above what
>>>>>>>>>>>> cutoff
>>>>>>>>>>>> was that?
>>>>>>>>>>> Its header is under
>>>>>>>>>>> "***: distinct genes/domains", i.e. I just counted how many unique
>>>>>>>>>>> genes occurred in the Pearson table.
>>>>>>>>>>> This is also the reason that I do not know what to say about preparing
>>>>>>>>>>> the table and hence - cannot describe it in the Methods. I used just
>>>>>>>>>>> the
>>>>>>>>>>> Pearson linear correlation values in the standard form.
>>>>>>>>>>>
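The two kinds of numbers debated in this thread (distinct genes versus gene pairs, optionally above a correlation cutoff) can both be derived from one pass over a pairwise table. A sketch under stated assumptions: the table is taken to be rows of (gene, gene, Pearson coefficient), the cutoff is taken as "greater than or equal", and the rows below are invented toy data, not values from the SGA file.

```python
def summarize_pair_table(rows, cutoffs=(0.2, 0.3)):
    """Count total pairs, distinct genes, and pairs at or above each cutoff.

    rows: iterable of (gene_1, gene_2, pearson_coefficient) tuples.
    """
    distinct_genes = set()
    total_pairs = 0
    pairs_at_cutoff = {cutoff: 0 for cutoff in cutoffs}
    for gene_1, gene_2, plc in rows:
        total_pairs += 1
        distinct_genes.update((gene_1, gene_2))
        for cutoff in cutoffs:
            if plc >= cutoff:  # assumption: the cutoff itself is included
                pairs_at_cutoff[cutoff] += 1
    return total_pairs, len(distinct_genes), pairs_at_cutoff


# Invented toy rows with hypothetical gene labels.
toy_rows = [
    ("g1", "g2", 0.12),
    ("g1", "g3", 0.25),
    ("g2", "g3", 0.31),
    ("g4", "g5", 0.19),
]
total, n_genes, above = summarize_pair_table(toy_rows)
print(total, n_genes, above)
```

On the real table the first two numbers would correspond to the 255178 pairs and 4880 distinct genes mentioned in the thread.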
>>>>>>>>>>>>
>>>>>>>>>>>> DOM: I'm very unhappy with this just saying 3563 for all species -
>>>>>>>>>>>> this
>>>>>>>>>>>> is almost meaningless. Are you saying that the mapping to genes was
>>>>>>>>>>>> only done on the fly? Could I perhaps ask Dimitri to try to extract
>>>>>>>>>>>> the
>>>>>>>>>>>> actual gene pairs numbers?
>>>>>>>>>>> That would be the hard part: how do we define a set of input genes?..
>>>>>>>>>>> And I do not think it is crucial. Indeed, the pivotal dataset is
>>>>>>>>>>> UniDomInt: as soon as we have an extra gene with Pfam domains in
>>>>>>>>>>> it, we
>>>>>>>>>>> can check it in FunCoup with this data. For comparison, in MEX, PEX or
>>>>>>>>>>> PPI the gene IDs are pivotal, that's why we count them in the table.
>>>>>>>>>>>
>>>>>>>>>>>> I also see that we don't describe the UniDomInt usage in the Methods
>>>>>>>>>>>> section - do we need to? Was some cutoff or other parameter used?
>>>>>>>>>>> I think I just employed the old procedure developed for those old
>>>>>>>>>>> Rhodes
>>>>>>>>>>> data. And I ignored lines that had UniDomInt score 0.
>>>>>>>>>>>
>>>>>>>>>>> A
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, could I please ask everybody to go through the paper looking
>>>>>>>>>>>> for
>>>>>>>>>>>> omissions, unclear parts, and other bugs.
>>>>>>>>>>>>
>>>>>>>>>>>> /Erik
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/18/2011 12:33 PM, Andrey Alexeyenko wrote:
>>>>>>>>>>>>> http://funcoup.sbc.su.se/statistics_2.0.html fixed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> BUT differently (see the page), as it was (close to) impossible to
>>>>>>>>>>>>> calculate the exact numbers:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - in GIN: because I do not have the original pairwise file;
>>>>>>>>>>>>>
>>>>>>>>>>>>> - in DOM: because we store just the domain pairs, and answering
>>>>>>>>>>>>> exactly
>>>>>>>>>>>>> would take re-running the whole thing in the debug mode and
>>>>>>>>>>>>> looking at
>>>>>>>>>>>>> variable values...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2011-09-28 12:02, Erik Sonnhammer wrote:
>>>>>>>>>>>>>> Great
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I guess you mean GIN. How about simply the nr of interactions
>>>>>>>>>>>>>> (above the
>>>>>>>>>>>>>> cutoff whatever it was)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for DOM there should be a nr of interactions for each species while
>>>>>>>>>>>>>> GIN
>>>>>>>>>>>>>> is only in yeast.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /Erik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09/28/2011 11:56 AM, Andrey Alexeyenko wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I updated statistics_2.0.html,
>>>>>>>>>>>>>>> except the columns DOM and INT where I do not know what to count.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2011-09-27 11:42, Erik Sonnhammer wrote:
>>>>>>>>>>>>>>>> Here is a list:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Put 2.0 on home page
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Update release notes (text file fine) with Input dataset sizes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Erik's desktop (ubuntu), under Edge Categories, Species, “fly”
>>>>>>>>>>>>>>>> becomes “...”
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have KEGG pathway memberships and subcellular localisations been
>>>>>>>>>>>>>>>> updated?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why does not fly FBpp0289426 (NBS) align with its ortholog human
>>>>>>>>>>>>>>>> NBN?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Option to turn on debugging info
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And some suggestions:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Change to “(out of # at pfc>0.1):” under 'Network edges'.
>>>>>>>>>>>>>>>> The >0.25, >0.5, >0.75 steps are a bit too coarse anyway and
>>>>>>>>>>>>>>>> may not match the query.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Update example queries(?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Add “maximum pfc” cutoff to the query – to look for novel links.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fewer areas on the webpage. Similar options should be grouped in
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> area instead. A few areas with clear headers about what they
>>>>>>>>>>>>>>>> contain is
>>>>>>>>>>>>>>>> preferable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Erik
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09/27/2011 11:03 AM, Thomas Schmitt wrote:
>>>>>>>>>>>>>>>>> Awesome! What are the issues with the website that you want to
>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>> fixed?
>>>>>>>>>>>>>>>>> Is this something that we should do asap?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Funcoup mailing list
>>>>>>>>>>>> Funcoup at sbc.su.se
>>>>>>>>>>>> https://mail.sbc.su.se/mailman/listinfo/funcoup


