A 2 by 2 contingency table is constructed with cluster membership (inside / outside a given cluster) as the row variable and pathway membership (inside / outside the pathway) as the column variable. Because the normal approximation is inappropriate in the pathway case (the contingency table often contains a cell with an expected count below 5), Fisher's exact test is performed instead of the chi-square test. The p-value is defined as the sum of the probabilities of all tables with the same marginal totals whose probabilities are less than or equal to that of the observed table.
To deal with the multiple testing problem, the q-value is calculated following Storey's scheme. Whereas the p-value measures significance in terms of the false positive rate, the q-value measures it in terms of the false discovery rate (FDR). The FDR here is the expected proportion of false positives among all rejected hypotheses, multiplied by the probability of making at least one rejection. First, the proportion of truly null hypotheses is estimated from the given list of p-values. Second, the overall FDR at a given p-value threshold is estimated as the expected number of false positives divided by the expected number of significant results. Last, the q-value of a hypothesis is taken as the minimum FDR over all thresholds at which that hypothesis is called significant.
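The cluster-by-pathway test described above can be illustrated with a minimal sketch. It assumes the table is given as four counts and sums the hypergeometric probabilities of every table that shares the observed margins and is no more probable than the observed one. The function name `fisher_exact_two_sided` and the example counts are hypothetical and are not taken from the analysis itself.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 by 2 table [[a, b], [c, d]].

    Rows: in cluster / not in cluster.
    Columns: in pathway / not in pathway.
    """
    row1, row2, col1 = a + b, c + d, a + c
    # Range of the top-left cell that keeps all four cells non-negative
    # while preserving the observed row and column totals.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weight of each possible table.
    # Integer arithmetic keeps ties with the observed table exact.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    # Sum over tables that are no more probable than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())


# Hypothetical counts: 8 of 10 cluster genes and 1 of 6 other genes lie in the pathway.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 400/11440, about 0.035
```

With large gene counts the binomial coefficients become very large integers; Python handles them exactly, but a library routine may be preferable for speed.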
Storey, J.D., Tibshirani, R. (2003). Statistical significance for genomewide studies. PNAS, vol. 100, no. 16, 9440-9445.
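Assuming that null p-values are uniformly distributed, the proportion of truly null hypotheses, $\pi_0$, can be conservatively estimated from the height of the flat region of the p-value density above a certain threshold value $\lambda$ (16). Since most p-values near 1 will be null,
$$\hat{\pi}_0(\lambda) = \frac{\#\{p_i > \lambda;\ i = 1, \ldots, m\}}{m(1 - \lambda)},$$
where $m$ is the total number of hypotheses, will be a reasonable, conservative estimator of $\pi_0$, with a bias that shrinks as $\lambda$ approaches 1. To achieve the smallest bias,
$$\hat{\pi}_0 = \lim_{\lambda \to 1} \hat{\pi}_0(\lambda)$$
is used. Graphically, after a natural cubic spline is fitted to the plot of $\hat{\pi}_0(\lambda)$ vs. $\lambda$, the limiting plateau value is selected. Then the FDR at a p-value threshold $t$ is calculated as the estimated number of false positive hypotheses divided by the number of significant hypotheses, using $\hat{\pi}_0$:
$$\widehat{\mathrm{FDR}}(t) = \frac{\hat{\pi}_0 \, m \, t}{\#\{p_i \le t\}}.$$
Finally, the q-value for the $i$th hypothesis is defined as the minimum FDR over all thresholds at which that hypothesis is called significant:
$$\hat{q}(p_i) = \min_{t \ge p_i} \widehat{\mathrm{FDR}}(t).$$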
Whereas the p-value of a test is the minimum false positive rate at which the test is called significant, its q-value is the minimum false discovery rate at which the test is called significant. The detailed algorithm can be found in Storey and Tibshirani (2003).
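The q-value steps described in this section can be sketched as follows. This is a simplified illustration, not the full procedure: as an assumption, the proportion of null hypotheses is estimated at a single fixed threshold (`lam = 0.5`) instead of from the spline fit, and the names `estimate_pi0` and `qvalues` and the example p-values are hypothetical.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam.

    Simplification (assumption): a single fixed lam replaces the spline fit.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    # Null p-values are uniform, so a fraction (1 - lam) of them exceeds lam.
    return min(1.0, above / (m * (1.0 - lam)))


def qvalues(pvalues, pi0):
    """Return q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # q-value is the minimum estimated FDR over all thresholds t >= p_i.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr = pi0 * m * pvalues[i] / rank  # pi0 * m * t / #{p_j <= t}
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


# Hypothetical p-values, one per tested pathway.
pvals = [0.001, 0.01, 0.02, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
pi0 = estimate_pi0(pvals)
for p, q in zip(pvals, qvalues(pvals, pi0)):
    print(p, round(q, 4))
```

Walking from the largest p-value downward makes the q-values monotone in the p-values, so a hypothesis can never receive a larger q-value than one with a larger p-value. If no p-value exceeds `lam`, the estimate of the null proportion is 0 and every q-value collapses to 0, so a real analysis would need to handle that case.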