Affymetrix raw data is considerably denser than spotted-array data, which enables a deeper investigation. We present several types of plots here, each showing a different aspect of bias. The first plot (upper left of the figure) shows the raw intensity data with brightness mapped to a logarithmic scale. This brings out detail in the low range (intensity values typically between 50 and 150), which typically contains more than half the probes on a chip. On a linear scale these values would be compressed into nearly the same color, but here they can be told apart. Such a plot often shows striations where probes of similar sequence are constructed in rows.
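The compression effect of a linear color scale can be checked with a few numbers. The following is a minimal sketch, not SmudgeMiner's rendering code: the helper name `to_gray_levels`, the 256-level palette and the clipping floor of 1.0 are hypothetical choices made for illustration. It maps raw intensities to gray levels on either a linear or a log2 scale.

```python
import numpy as np

def to_gray_levels(intensities, log_scale=True, n_levels=256):
    """Map raw probe intensities to integer gray levels 0..n_levels-1.

    Hypothetical display helper: the clipping floor of 1.0 and the
    256-level palette are illustrative choices, not SmudgeMiner settings.
    Works on 1-D lists or 2-D chip images alike.
    """
    x = np.asarray(intensities, dtype=float)
    if log_scale:
        # Clip so that log2 is defined for zero or negative raw values.
        x = np.log2(np.clip(x, 1.0, None))
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(x.shape, dtype=int)
    scaled = (x - lo) / (hi - lo)  # rescale to [0, 1]
    # The brightest value would map to n_levels, so cap it at the top level.
    return np.minimum((scaled * n_levels).astype(int), n_levels - 1)

raw = [50, 100, 150, 20000]
print(to_gray_levels(raw, log_scale=False).tolist())  # [0, 0, 1, 255]
print(to_gray_levels(raw).tolist())                   # [0, 29, 46, 255]
```

With a bright probe at 20000 setting the top of the scale, intensities of 50, 100 and 150 fall into gray levels 0 to 1 on the linear scale but spread over roughly 0 to 46 on the log scale.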
Even more than with spotted-array ratios, it is difficult to see subtle spatial patterns on an Affymetrix chip image, because neighboring probes have widely different intensities. To make biases visible, we would ideally compare each chip to a standard that represents a good, uniform hybridization, and ideally that standard would be the average of many replicate chips. In practice we don't have that many replicates, so we construct a reference for the Affymetrix chips (hereafter called the 'standard' chip) by taking the 20%-trimmed mean of each probe's log2 intensity across all the chips from the same tissue in the experimental series. This ideally represents the probe intensities of a 'typical' sample in the series: a sample of the same tissue type whose expression values are intermediate among all the samples in the experiment. We then plot the differences between the log2 values on each chip and the standard chip:
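(2)  $d_{i,j} = \log_2(\mathrm{Int}_{i,j}) - \operatorname{trim}_k \log_2(\mathrm{Int}_{i,k})$
where $i$ indexes the probe, $j$ indexes the chip, and $k$ runs over all chips from the same tissue, so the second term is the standard chip's value for probe $i$; $\mathrm{Int}_{i,j}$ is the intensity of probe $i$ on chip $j$. A plot of $d_{i,j}$ is shown at upper right.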
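Equation (2) and the standard chip can be sketched in a few lines of NumPy. This is an illustrative sketch, not the tool's own code. It assumes the raw intensities for one tissue are held in an array of shape (probes, chips), and it assumes "20%-trimmed mean" means dropping 20% of the values from each end; the function names are hypothetical.

```python
import numpy as np

def trimmed_mean(x, prop=0.2, axis=-1):
    """Mean after dropping the lowest and highest `prop` fraction along `axis`."""
    x = np.sort(np.asarray(x, dtype=float), axis=axis)
    n = x.shape[axis]
    cut = int(prop * n)
    kept = np.take(x, np.arange(cut, n - cut), axis=axis)
    return kept.mean(axis=axis)

def differences_from_standard(intensities):
    """intensities: (n_probes, n_chips) raw values for chips of one tissue.

    Returns (standard, d) with standard[i] = trim_k log2 Int[i, k]
    and d[i, j] = log2 Int[i, j] - standard[i], as in equation (2).
    """
    log_int = np.log2(np.asarray(intensities, dtype=float))
    standard = trimmed_mean(log_int, 0.2, axis=1)  # trimmed mean over chips k
    d = log_int - standard[:, np.newaxis]           # broadcast across chips j
    return standard, d

ints = np.array([[100, 100, 100, 100, 1e6],
                 [200, 400, 800, 1600, 3200]])
standard, d = differences_from_standard(ints)
```

Trimming is what keeps the standard "typical": for the first probe, the single outlying chip at 1e6 is discarded, so the standard equals log2(100); for the second probe the standard is log2(800), and the last chip's difference is log2(3200/800) = 2.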
Affymetrix's greater probe density lets us investigate in more detail how the ratios in one region of the chip differ from those in another. If we plot the intensity of each probe in a sub-region against the corresponding probe intensity on the standard, we would hope to get a straight line with an intercept near 0 and a slope near 1. A change in the intercept would represent a change in local background, and a change in the slope would represent a local scale factor. In practice we estimate differences between the chip and the standard on a log2 scale. There, an additive background shift has its largest effect on the dimmest probes, while a multiplicative scale factor shifts every log2 difference by the same amount and is best measured on the brightest probes, where background is negligible. To estimate the local background of a region, we therefore first select those probes whose standard intensities lie in the lowest fifth of probe intensities for the chip as a whole. Then we compute the 20%-trimmed mean of the differences between the log2 intensities of the selected probes on the chip and the log2 intensities of the corresponding probes on the standard:
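(3)  $P_i = \operatorname{trim}_k \log_2(\mathrm{Int}_{i,k})$
(4)  $\mathrm{bg} = \operatorname{trim}\big(\log_2(\mathrm{Int}_{i,j}) - P_i \mid i \in R,\ P_i < q_{P,0.2}\big)$
Here $\operatorname{trim}(x \mid S)$ represents the 20%-trimmed mean of the variable $x$ restricted to the set $S$; $\mathrm{Int}_{i,j}$ is the intensity of probe $i$ on chip $j$; $P_i$ is the log2 intensity of probe $i$ on the standard chip; $R$ is the set of probes in the region; and $q_{P,a}$ is the $a$-th quantile of probe intensities on the standard chip.
To compute the scale factor, we take the 20%-trimmed mean of the same differences among the probes in the region whose standard intensities lie in the upper 20%:
(5)  $S = \operatorname{trim}\big(\log_2(\mathrm{Int}_{i,j}) - P_i \mid i \in R,\ P_i > q_{P,0.8}\big)$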
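Equations (4) and (5) for a single region can be sketched as follows. This is an illustrative sketch under stated assumptions, not SmudgeMiner's implementation. The function names are hypothetical. A region is represented as a boolean mask over probes. The 0.2 and 0.8 quantiles are taken over the whole standard chip. An empty selection yields NaN.

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """20%-trimmed mean (prop dropped from each end); NaN if x is empty."""
    x = np.sort(np.asarray(x, dtype=float))
    if x.size == 0:
        return np.nan
    cut = int(prop * x.size)
    return x[cut:x.size - cut].mean()

def region_background_and_scale(log_chip, standard, in_region):
    """Local log2 background and scale factor for one region of one chip.

    log_chip:  log2 intensities of one chip, one entry per probe
    standard:  log2 standard value of each probe (trimmed mean over chips)
    in_region: boolean mask selecting the probes of the region
    """
    # Quantiles are taken over the whole standard chip, not just the region.
    q20, q80 = np.quantile(standard, [0.2, 0.8])
    diff = log_chip - standard
    bg = trimmed_mean(diff[in_region & (standard < q20)])     # equation (4)
    scale = trimmed_mean(diff[in_region & (standard > q80)])  # equation (5)
    return bg, scale

# Simulated check: add a constant background of 100 to every probe.
P = np.linspace(5, 15, 1001)      # standard log2 intensities
chip = np.log2(2.0 ** P + 100)    # same probes with background added
everywhere = np.ones(P.size, dtype=bool)
bg, scale = region_background_and_scale(chip, P, everywhere)
```

In this simulation the added background raises the log2 differences of the dim probes by more than 1, while the bright probes move by less than 0.02, so the two estimates separate as the text describes.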
We then construct heat maps of the log2 background factor and the log2 scale factor over all regions of the chip. Putting these plots side by side (bottom left and bottom right) lets us see regions where the background is raised but the scale factor is unaffected, and vice versa.
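The heat maps can be sketched by applying the per-region estimates over a grid. This is a minimal sketch, assuming that regions are equal rectangular blocks of the chip image and that the log2 chip and standard values are laid out as 2-D arrays in chip coordinates. The function name `region_heat_maps` and the default 8 x 8 grid are hypothetical choices, not the tool's settings.

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """20%-trimmed mean (prop dropped from each end); NaN if x is empty."""
    x = np.sort(np.ravel(x))
    if x.size == 0:
        return np.nan
    cut = int(prop * x.size)
    return x[cut:x.size - cut].mean()

def region_heat_maps(log_chip, standard, n_blocks=(8, 8)):
    """Per-region log2 background and scale factors for one chip.

    log_chip, standard: 2-D arrays (rows x cols) of log2 intensities laid
    out in chip coordinates. The chip is cut into an n_blocks grid of
    rectangular regions (a hypothetical choice of region shape).
    """
    q20, q80 = np.quantile(standard, [0.2, 0.8])  # chip-wide thresholds
    diff = log_chip - standard
    row_edges = np.linspace(0, standard.shape[0], n_blocks[0] + 1).astype(int)
    col_edges = np.linspace(0, standard.shape[1], n_blocks[1] + 1).astype(int)
    bg_map = np.full(n_blocks, np.nan)
    scale_map = np.full(n_blocks, np.nan)
    for r in range(n_blocks[0]):
        for c in range(n_blocks[1]):
            block = (slice(row_edges[r], row_edges[r + 1]),
                     slice(col_edges[c], col_edges[c + 1]))
            d, p = diff[block], standard[block]
            bg_map[r, c] = trimmed_mean(d[p < q20])     # dim probes: background
            scale_map[r, c] = trimmed_mean(d[p > q80])  # bright probes: scale
    return bg_map, scale_map

# Simulated chip: background of 200 added only in the top-left quadrant.
rng = np.random.default_rng(0)
standard = rng.uniform(5, 15, size=(40, 40))
raw = 2.0 ** standard
raw[:20, :20] += 200
bg_map, scale_map = region_heat_maps(np.log2(raw), standard, n_blocks=(2, 2))
```

The two returned arrays can then be drawn side by side, for example with matplotlib's `imshow` in two subplots. In the simulation, the top-left cell of `bg_map` stands out (above 1) while the same cell of `scale_map` stays near 0: a raised background with an essentially unaffected scale factor.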
SmudgeMiner was developed by the Genomics and Bioinformatics Group, Laboratory of Molecular Pharmacology (LMP), Center for Cancer Research (CCR), National Cancer Institute (NCI). If you have any problems, questions, or feedback on the tool, please email us.