A Little Analysis on the MemShrink Project

1 A Little Analysis on the MemShrink Project

As the website ( https://wiki.mozilla.org/Performance/MemShrink ) describes, MemShrink is a project to reduce Firefox memory consumption. The summary (taken from the webpage) is:

Speed. Firefox will be faster due to less cache pressure, less paging, and fewer/smaller GC and CC pauses. Changes that reduce memory consumption but make Firefox slower are not desirable.

Stability. Firefox will suffer fewer aborts/crashes due to virtual or physical memory exhaustion. The former is mostly a problem on 32-bit Windows builds with a 2GB or 4GB virtual memory limit, the latter is mostly a problem on mobile devices that lack swap space.

The engineers working on MemShrink asked the Metrics team to help discover and quantify which variables affect the measurements related to MemShrink. Key among these is RESIDENT_MEMORY, the resident memory that Firefox occupies. For a given installation, multiple measurements are taken before the data is submitted. The data for a given installation is recorded as a histogram (so we don't have serial correlations between observations …), and the final value used in modeling is the weighted mean.
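
As a minimal sketch of that last step, the weighted mean of a histogram can be computed as below. The bucket values and counts are made up for illustration; the actual Telemetry bucket layout is not shown in this post.

```r
# Hypothetical histogram for one installation: a representative value for
# each bucket and the number of measurements that fell into it.
# All numbers here are invented for illustration.
bucket_value <- c(64, 128, 256, 512)
bucket_count <- c(2, 10, 6, 2)

# Weighted mean: every bucket value is weighted by its count.
resident_memory <- weighted.mean(bucket_value, bucket_count)
resident_memory
```

Each installation therefore contributes a single number to the model, regardless of how many measurements went into its histogram.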

1.1 The Data Set

We already have some variables at our disposal, namely:

  • the version of Firefox (10,11 or 12)
  • number of addons
  • does the installation have Firebug? (Yes/No)
  • memory size
  • Number of CPUs
  • RESIDENT_MEMORY

The data set was a 70% sample taken from the HBase Telemetry table from 2012-01-01 to 2012-01-31. With those variables, we first looked at QQ plots ( see http://en.wikipedia.org/wiki/Q-Q_plot ) to examine the distributions and decide on variable transformations (e.g. truncation? log transformation?). After that, we drew 1000 random 12.5% subsamples of the 70% sample. ANOVA was run on each of these subsamples, and the quantiles of the residuals, the parameter estimates and the adjusted R-squared values were recorded.
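
The repeated-subsample procedure can be sketched locally in plain R. This is only an illustration under stated assumptions: the data frame `d` is simulated, the helper `fit_one` is hypothetical, and `lm` stands in for the RHIPE job that was actually run on the cluster.

```r
set.seed(1)

# Simulated stand-in for the Telemetry sample (all values are made up).
n <- 4000
d <- data.frame(
  version = factor(sample(c("10", "11", "12"), n, replace = TRUE)),
  addon   = rpois(n, 3),
  fb      = runif(n) < 0.1
)
d$log_memres <- 17 + 0.1 * (d$version == "12") +
  0.25 * log(d$addon + 1, 2) + 0.05 * d$fb + rnorm(n, sd = 0.8)

# Fit the linear model on one random 12.5% subsample and return the
# coefficients together with the adjusted R squared.
fit_one <- function(data, frac = 0.125) {
  idx <- sample(nrow(data), size = floor(frac * nrow(data)))
  fit <- lm(log_memres ~ version + log(addon + 1, 2) + fb, data = data[idx, ])
  c(coef(fit), adj.r.squared = summary(fit)$adj.r.squared)
}

# 200 replicates keep the sketch fast; the analysis itself used 1000.
fits <- t(replicate(200, fit_one(d)))

# Summarise every coefficient (and the adjusted R squared) across replicates.
summary_table <- t(apply(fits, 2, quantile, probs = c(0.025, 0.5, 0.975)))
summary_table
```

Looking at the spread of each estimate across subsamples shows how stable the coefficients are, which is a separate question from how much variance the model explains.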

1.2 Results

The following diagrams are scatter plots of some of the covariates vs. RESIDENT_MEMORY. They are used to consider interactions or transformations. In the following graphs, Y vs. X means Y on the vertical axis and X on the horizontal. 'log(A,2)' is the log of A with respect to base 2.

http://blog.mozilla.com/metrics/files/2012/02/addon.png

1(a) log(RESIDENT_MEMORY in bytes,2) vs. log(# of Addons,2) by Version

http://blog.mozilla.com/metrics/files/2012/02/memsize.png

1(b) log(RESIDENT_MEMORY in bytes,2) vs. log(Memsize in bytes,2) by Version

http://blog.mozilla.com/metrics/files/2012/02/hasfb.png

1(c) RESIDENT_MEMORY vs. presence of Firebug by Version

http://blog.mozilla.com/metrics/files/2012/02/resmem.png

1(d) Distribution of RESIDENT_MEMORY by Version

Observations:

  1. There doesn't seem to be an interaction with version, except for a feeble one in 1(c).
  2. 1(d) is also called the Empirical Cumulative Distribution Function. If the distribution were uniform, it would be a diagonal at 45 degrees. The panel for '12' indicates that 60% of the observations are less than 2^18 bytes.
  3. In summary, the data set is extremely noisy. Though the assumptions of modeling are met (independence etc.), the variables explain only 35% of the variation (not even 50%), as seen from the ANOVA R^2.
  4. 1(a): definite increasing trend with log(addon+1), but there is so much variation, and no difference across versions.
  5. 1(b): same as (2), except that in 12 the flattening happens earlier, though it flattens at a higher value than in 10 and 11. I expect some flattening to happen: why would RESIDENT_MEMORY increase continuously just because memory size is bigger?
  6. 1(c): marginal difference, but again so much noise.
  7. 1(d): the distribution of RESIDENT_MEMORY is almost the same for 10, 11 and 12, with a slight upward shift for 12.
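
To make the reading of an Empirical Cumulative Distribution Function concrete, here is a small sketch with simulated uniform data (not Telemetry data). The name `Fn` is only illustrative.

```r
set.seed(2)

# Simulated values on [0, 1]; these are not Telemetry measurements.
u <- runif(10000)

# ecdf() returns a step function: Fn(x) is the fraction of observations
# that are less than or equal to x.
Fn <- ecdf(u)

# For a uniform sample Fn(x) stays close to x, which is the 45-degree
# diagonal mentioned in the observations.
x <- seq(0.1, 0.9, by = 0.1)
max_gap <- max(abs(Fn(x) - x))
max_gap

# Reading the curve: its height at x is the share of observations at or
# below x.
Fn(0.6)
```

A curve that sits to the right of another one in such a plot belongs to a distribution shifted towards larger values.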

We cannot expect a great fit with these variables. Next is a panel display of QQ plots of the residuals from 100 randomly chosen ANOVA fits. Statistical inference in this particular case study requires that the residuals be Gaussian distributed (here 'normal' means the Normal distribution). They aren't perfectly Normal, but the departure looks acceptable. Not included are other displays such as Scale-Location plots, though upon inspection they show no relation between scale and location (i.e. the variance does not depend on the mean).
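
For readers who want to reproduce this kind of residual check on their own data, the following sketch uses a simulated regression (made-up data, base R only). The correlation at the end is just a rough numeric companion to the plot, not something used in the analysis above.

```r
set.seed(3)

# Simulated regression with Gaussian noise (made-up data).
x <- runif(500)
y <- 1 + 2 * x + rnorm(500, sd = 0.5)
fit <- lm(y ~ x)
res <- residuals(fit)

# Normal QQ plot: sample quantiles of the residuals against theoretical
# Normal quantiles. Points close to the reference line suggest Normality.
qq <- qqnorm(res, main = "Normal QQ plot of residuals")
qqline(res, col = "red")

# Scale-Location plot: sqrt(|standardized residuals|) against fitted values.
# A flat trend suggests that the variance does not depend on the mean.
plot(fit, which = 3)

# Correlation between theoretical and sample quantiles; values near 1 go
# with a QQ plot that hugs the reference line.
qq_cor <- cor(qq$x, qq$y)
qq_cor
```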

http://blog.mozilla.com/metrics/files/2012/02/qq1.png

1.3 Interpretation

Averaging the coefficients of the 1000 regressions produces the following parameter table. Though the estimates are precise, the R squared itself is not high, so a lot remains to be explained. An explanation of the table follows below it. The model is \(\log(RESIDENT\_MEMORY,2) \sim version + \log(addon+1,2) + hasFirebug(fb) + \log(memorysize,2)\)

The first column of numbers is the lower end of the 95% confidence interval of the coefficient, the second column is the mean, and the third column is the upper end of the 95% CI.

                          2.5%         Mean        97.5%
(Intercept)       12.782759636 12.832504483 12.882249330
version11          0.109000871  0.121434363  0.133867855
version12          0.353176829  0.373191172  0.393205515
fbTRUE             0.039786606  0.064000099  0.088213592
log(addon + 1, 2)  0.261556671  0.265001297  0.268445923
addonlargeTRUE    -0.553224345 -0.476251602 -0.399278860
memreslargeTRUE   -0.219969543 -0.204470629 -0.188971716
log(cpu, 2)       -0.006946532 -0.001578957  0.003788619
memsize            0.386554181  0.391241738  0.395929294

What this means is that, 'keeping everything else fixed' (a coefficient b on the log base 2 scale multiplies RESIDENT_MEMORY by 2^b):

  • version 11 increases memory consumption by about 9% over v.10 (on average, but keep in mind there is a lot of variation)
  • version 12 ups it by about 30% over v.10 (see the distribution at top of log of RESIDENT_MEMORY by version)
  • the presence of the Firebug extension causes a slight increase (about 4.5% on average)
  • if one doubles (number of addons + 1), RESIDENT_MEMORY increases by approximately 20%
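
Because the response is log(RESIDENT_MEMORY,2), a coefficient can be turned into a percentage change by raising 2 to that coefficient. The sketch below does this for the Mean column of the table; the vector name `beta` is only illustrative, and the numbers are copied from the table rather than recomputed from the data.

```r
# Mean coefficient estimates copied from the parameter table.
beta <- c(version11           = 0.121434363,
          version12           = 0.373191172,
          fbTRUE              = 0.064000099,
          "log(addon + 1, 2)" = 0.265001297)

# Adding beta on the log base 2 scale multiplies RESIDENT_MEMORY by 2^beta.
# For the log(addon + 1, 2) term this is the effect of doubling (addon + 1),
# since doubling raises the log base 2 by exactly 1.
pct_change <- 100 * (2^beta - 1)
round(pct_change, 1)
```

The same transformation applied to the 2.5% and 97.5% columns gives an interval for each percentage.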

1.4 Future

The thing with large data is that even the smallest difference can be called 'significant'. What is true is that 12 seems to use more memory, and that the difference between using Firebug or not decreases for 12. But equally importantly, the collection of variables explains only 36% of the variance, so though the estimates are precise, there is a lot of variance around any estimate. Results should be taken with caution.

All the computation and analysis was done using RHIPE (see https://github.com/saptarshiguha/RHIPE ). We have only covered some variables we thought might affect RESIDENT_MEMORY. There are, however, 215 variables captured by the Telemetry project, and we could well benefit from doing some tree-based analysis (random forests, anyone?) to discover useful variables (that affect RESIDENT_MEMORY). More on this later.

1.5 Sample RHIPE Code

This is the code producing marginal plots.

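## Quantiles of memres for every (version, var) combination.
## location: HDFS output directory; var: name of the covariate;
## N: number of digits memres is rounded to before counting.
yy <- function(location,var,N=3){
  ## Map: drop rows with missing or -Inf values, then emit one
  ## (version, covariate value, rounded memres) key per row with count 1.
  m <- ewrap({
    Var <- unserialize(charToRaw(Sys.getenv("varname")))
    N <- unserialize(charToRaw(Sys.getenv("N")))
    r <- r[!is.na(r$memres) & !is.na(r$memsize) & r$memres>-Inf & r$memsize>-Inf,]
    Y <- r[,"memres"];X <- r[,Var]; V <- as.character(r[,"version"])
    for(i in 1:nrow(r)){
      rhcollect(list(version=V[i],var=X[i],value=round(Y[i],N)), 1)
    }})
  ## var and N are passed to the map tasks as serialized job parameters.
  mapred <- list(varname=rawToChar(serialize(var,NULL,ascii=TRUE)),
                 N=rawToChar(serialize(N,NULL,ascii=TRUE)))
  ## summer is not defined in this listing; it is assumed to be a
  ## reducer that sums the counts for each key.
  reduce <- summer
  z <- rhmr2(m,reduce=reduce, combine=TRUE
           ,ifo="/user/sguha/telemetry/samples/1/p*",of=location
           ,mapred=mapred)
  rhstatus(rhex(z,async=TRUE),mon.sec=5)
  ## Read the (key, count) pairs back and flatten them into a data frame.
  z <- rhread(sprintf("%s/p*",location))
  library(Hmisc)
  version <- unlist(lapply(z,function(r) r[[1]][[1]]))
  varble <- unlist(lapply(z,function(r) r[[1]][[2]]))
  value <- unlist(lapply(z,function(r) r[[1]][[3]]))
  count <- unlist(lapply(z,function(r) r[[2]][[1]]))
  z1 <- data.frame(version=version,var=varble,value=value, count=count,
                   stringsAsFactors=FALSE)
  colnames(z1) <- c("version",var,"value","count")
  z1 <- z1[order(z1$version,z1[,var],z1$value),]
  ## For each (version, var) cell, compute the 5%, 10%, ..., 95%
  ## quantiles of memres, weighting every value by its count.
  z2 <- do.call('rbind',lapply(split(z1,list(z1$version,z1[,var])),function(r){
    P <- 1:19/20
    data.frame(version=r[1,"version"],var=r[1,var], p=P, q= wtd.quantile(r$value,r$count, P),stringsAsFactors=FALSE)
  }))
  colnames(z2) <- c("version",var,"p","q")
  z2
}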

library(lattice)
library(latticeExtra)

###############################
## Plot of memres vs addons
###############################
addons <- yy("/user/sguha/telemetry/tmp/b","addon")
pdf("~/addons.pdf")
asTheEconomist(
              xyplot(q~log(addon,2)|version, type='p',
              main='Quantiles of Log(RESIDENT_MEMORY,2) vs. Log(addon,2)'
                     ,data=addons,pch=1, cex=0.3,col='#00000030'
                     ,layout=c(3,1),aspect=1,ylab='Log(RESIDENT_MEMORY,2)'
                     ,panel=function(x,y,subscripts,...){
                       panel.grid(h=-1,v=-1,lwd=0.5)
                       panel.xyplot(x,y,...)
                       panel.loess(x,y,col='red',lwd=0.7,type='l')
                     })
               ,type='p',pch=16, cex=0.7,col='#00000060')
dev.off() ##cut at 6
#############################
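
The quantile step at the end of yy can be tried without a cluster. The sketch below is an illustration only: the (value, count) pairs are made up, and base R's quantile() on count-expanded data is swapped in for Hmisc's wtd.quantile, which is what the listing above uses.

```r
# Made-up (value, count) pairs for one version/covariate cell, in the shape
# the reduce step produces: a rounded response value and how often it
# occurred.
cell <- data.frame(value = c(16.5, 17.0, 17.5, 18.0, 18.5),
                   count = c(5, 20, 40, 25, 10))

# With integer counts, one simple way to obtain count-weighted quantiles is
# to repeat each value count times and call quantile() on the result.
P <- 1:19 / 20
expanded <- rep(cell$value, times = cell$count)
q <- quantile(expanded, probs = P, names = FALSE)

data.frame(p = P, q = q)
```

Expanding the data is only practical for small tables; working on the counts directly, as the listing does, avoids materialising every observation.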

Date: 2012-02-21 14:40:08 PST

Author: Saptarshi Guha

