Choosing background list in protein set overlap analysis

Proton Member
Posts: 1
Joined: Wed Sep 02, 2015 6:46 am


Postby dnand » Wed Sep 02, 2015 7:07 am

I am new to proteomics and have a question about testing for overlap between protein lists.

I have 2 protein lists I would like to compare: a subset (n) of the list of proteins (N) that I generated by experiment, and a list of proteins from the literature belonging to a specific category (R).

I would like to know whether my subset list of proteins (n) is enriched for the proteins in the list from the literature compared to other subsets from my experiment. I tried to use the hypergeometric test to determine the significance of the overlap between the 2 sets.

I am not sure what to use as the background list. I thought of using the total proteins identified (N) in the experiment as the background; however, I realized that about 50% of the proteins in the list from the literature (R) were not identified in my experiment. So obviously they cannot be present in my subset list, which is the list I want to test for overlap with the literature list.

Would it be acceptable for me to filter the literature list (R) for only those proteins that were identified in my experiment (r) and then compare my subset list (n) with the subset literature protein list (r) and use my total proteins identified as the background?
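
To make the proposed setup concrete, here is a minimal sketch of the hypergeometric test with the literature list filtered to the identified proteins. It assumes the background is the N identified proteins, r is the part of R found in that background, and the p-value is the one-sided probability of seeing an overlap at least as large as the observed one. The protein identifiers and the function name `hypergeom_upper_tail` are hypothetical, chosen only for illustration.

```python
from math import comb

def hypergeom_upper_tail(k, N, r, n):
    """P(overlap >= k) when n proteins are drawn without replacement
    from a background of N proteins, r of which are in the literature list."""
    upper = min(r, n)
    total = comb(N, n)
    favourable = sum(comb(r, i) * comb(N - r, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical identifiers, for illustration only.
identified = {f"P{i}" for i in range(1, 21)}         # background, N = 20
subset = {"P1", "P2", "P3", "P4", "P5"}              # subset of interest, n = 5
literature = {"P1", "P2", "P3", "P10", "Q1", "Q2"}   # R; Q1 and Q2 were never identified

# Filter R down to the proteins that could have been observed at all (r).
literature_in_background = literature & identified
k = len(subset & literature_in_background)           # observed overlap

p_value = hypergeom_upper_tail(
    k, len(identified), len(literature_in_background), len(subset)
)
print(f"N={len(identified)} r={len(literature_in_background)} "
      f"n={len(subset)} k={k} p={p_value:.4f}")
```

In this sketch, proteins from R that were never identified drop out before the test, so they neither count toward the overlap nor inflate the background.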

E. Coli Lysate Member
Posts: 107
Joined: Wed Dec 21, 2011 8:22 pm

Postby Infinity » Wed Oct 07, 2015 10:43 am

Hi, I'm not a statistician, but here is what I would do. As I understand it, you identified a total of N proteins, of which n were "regulated", and you want to test the overlap between those n and the list from the literature, R; let's call the size of that overlap "k". You could use the entire proteome as the background, but that would not be fair, since MS is biased towards identifying high-abundance proteins. Instead, what if you randomly choose n proteins from your list of N and calculate their overlap with R? This gives you an overlap that would be obtained by chance (x). Repeat this step many times (~1,000) to get x1-x1000; this would allow you to assess both the "average random overlap" and its variance. Then you would use a variation of the 1-sample t-test to see whether your real overlap k is significantly different from what would be obtained randomly.
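
The resampling idea above can be sketched roughly as follows. This is only an illustration under assumptions: the protein identifiers and function names are hypothetical, and instead of a t-test the last step here uses a plain empirical p-value (the fraction of random draws whose overlap is at least as large as the real one), which is a swapped-in but common way to read a resampled null distribution.

```python
import random
from statistics import mean, pvariance

def random_overlap_distribution(background, literature, n, draws=1000, seed=0):
    """Draw n proteins at random from the background `draws` times and
    record how many of them fall in the literature list each time."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    pool = sorted(background)   # random.sample needs a sequence, not a set
    lit = set(literature)
    return [len(lit.intersection(rng.sample(pool, n))) for _ in range(draws)]

def empirical_p_value(observed, null_overlaps):
    """Fraction of random overlaps >= observed, with a +1 correction
    so the estimate is never exactly zero."""
    hits = sum(1 for x in null_overlaps if x >= observed)
    return (hits + 1) / (len(null_overlaps) + 1)

# Hypothetical identifiers, for illustration only.
identified = {f"P{i}" for i in range(1, 21)}         # N = 20
subset = {"P1", "P2", "P3", "P4", "P5"}              # n = 5
literature = {"P1", "P2", "P3", "P10", "Q1", "Q2"}   # R

k = len(subset & literature)                         # real overlap
null = random_overlap_distribution(identified, literature, len(subset))
p = empirical_p_value(k, null)

print(f"real overlap k={k}")
print(f"random overlap: mean={mean(null):.2f}, variance={pvariance(null):.2f}")
print(f"empirical p-value={p:.4f}")
```

Because the random draws come only from the N identified proteins, literature proteins that were never identified can never be hit, so the abundance bias of the MS experiment is built into the null distribution.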
Hope this helps.
