On this page I share some of the results and tools I have developed for the analysis of gene expression data. I have developed the R package SAGx (Statistical Analysis of the GeneChip). Note that the package has moved from CRAN to Bioconductor. The latest Bioconductor release of SAGx is here. For those who want to take advantage of the latest changes, the version in the development branch is recommended.
Any comments are appreciated; just send me an e-mail. The ideas behind the samroc function in SAGx are explained in this document. A simulation script for testing statistical methods for identifying differentially expressed genes is also provided. Additionally, the function pava.fdr, which calculates an estimate of the FDR using isotonic regression, is explained in this article; see also the deposited manuscript.
Comments on Statistical methods for ranking differentially expressed genes
In the article, a goodness criterion C for a ranking is introduced (see the reference). This criterion was chosen because it is increasing in both the false positive and false negative rates and because, in a small set of simulations, it turned out to be easier to estimate than, e.g., the sum of the false positive and false negative rates. Nevertheless, the above-mentioned software outputs this sum, which makes it possible for the analyst to choose the size of the top list so that the sum reaches its minimum: rank the genes by p-value and then choose the cut-off where ‘error’ reaches its minimum. Typically, the sum decreases initially as one goes down the ranked gene list and then starts to grow until it reaches p0, the proportion of unchanged genes.
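The cut-off rule can be illustrated with a minimal Python sketch. It is not the samroc implementation: the function names error_sum_curve and best_top_list_size are hypothetical, p0 is assumed to be supplied by the user, and the false positive and false negative rates are approximated with a simple plug-in estimate (false positives among all genes taken as p0 times the p-value cut-off) that may differ from what SAGx computes.

```python
def error_sum_curve(pvalues, p0):
    """Estimated FP + FN rate for every top-list size 1..m.

    Assumption (not taken from SAGx): with cut-off alpha equal to the
    k-th smallest p-value, the share of all genes that are false
    positives is estimated as p0 * alpha.
    """
    p_sorted = sorted(pvalues)
    m = len(p_sorted)
    curve = []
    for k, alpha in enumerate(p_sorted, start=1):
        selected = k / m                      # share of genes on the top list
        fp = p0 * alpha                       # estimated false positives
        tp = max(selected - fp, 0.0)          # estimated true positives
        fn = max((1.0 - p0) - tp, 0.0)        # changed genes left off the list
        curve.append(fp + fn)
    return curve


def best_top_list_size(pvalues, p0):
    """Return the top-list size that minimises the estimated error sum."""
    curve = error_sum_curve(pvalues, p0)
    best_index = min(range(len(curve)), key=curve.__getitem__)
    return best_index + 1, curve


# Toy example: three small p-values followed by two large ones.
size, curve = best_top_list_size([0.9, 0.001, 0.5, 0.003, 0.002], p0=0.4)
print(size, curve)
```

Under this estimate the curve starts near 1 - p0, falls while changed genes enter the list, and rises towards p0 as the cut-off approaches 1, which matches the typical shape described above.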
Comments on A comparative review of estimates of the proportion unchanged genes and the false discovery rate
The Averaging Theorem implies that any estimate of the local false discovery rate (LFDR) implicitly defines an estimate of the FDR. I therefore calculated SEP.FDR by integrating (in practice, summing) the SEP LFDR estimate. Note that the function twilight presents an FDR estimate based on qvalue. However, under the SEP heading I present SEP.FDR, not qvalue, since I want to keep the issues separate so that one can reach conclusions about the underlying ideas.
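The step from LFDR to FDR can be sketched in a few lines of Python. This is only an illustration of the averaging idea, not the code used for SEP.FDR: the function name fdr_from_lfdr is hypothetical, and the input is assumed to be LFDR estimates already ordered from the most to the least significant gene.

```python
def fdr_from_lfdr(lfdr_ranked):
    """FDR estimate for each top-list size as the running mean of LFDR.

    lfdr_ranked: LFDR estimates ordered by significance (assumed input).
    The k-th output is the average LFDR over the k top-ranked genes.
    """
    running_total = 0.0
    fdr = []
    for k, value in enumerate(lfdr_ranked, start=1):
        running_total += value
        fdr.append(running_total / k)
    return fdr


# Toy example with four genes.
print(fdr_from_lfdr([0.0, 0.1, 0.2, 0.7]))
```

Because each value is an average over the genes selected so far, the resulting FDR estimate for a top list never exceeds the largest LFDR on that list.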
To retrieve my PubMed entries from the last ten years, click here.