R benchmarking of Spectre and Meltdown mitigations on Windows

Like many, I suspect, I was a little concerned about whether the recent Windows updates addressing Meltdown and Spectre would significantly increase calculation time in R. I do quite a lot of survival analysis on large(ish) data sets, which already takes a long time. Unfortunately, this only occurred to me after updating my main PC, so I decided to do a little benchmarking on an old one, both before and after the relevant Windows update. The results were quite interesting.

      test replications elapsed relative user.self sys.self
Pre-Update          100  287.31        1    281.41     5.68
PostUpdate          100  334.71        1    290.42    44.27
PostUpdate          100  338.90        1    294.22    44.50

The increase in elapsed time is irritating but manageable. What is really striking is that the slowdown is almost entirely due to a nearly 8-fold increase in system time. Is this expected? Has anyone else done a similar exercise? It would be interesting to know the range of impact others are experiencing, since I only tested one particular type of calculation that is a bottleneck in some of my work, and other calculations may suffer more. I appreciate that this is pushing the boundaries of the type of question Stack Overflow is intended for, but since the updates are likely to affect almost everyone, answers sharing experience could be useful.
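For concreteness, the ratios behind that "nearly 8-fold" figure, computed straight from the table above:

    # Post-update (first run) vs. pre-update, from the benchmark table
    334.71 / 287.31   # elapsed:   ~1.16, i.e. about 16% slower overall
    44.27 / 5.68      # sys.self:  ~7.8,  the near 8-fold system-time increase
    290.42 / 281.41   # user.self: ~1.03, user time barely moves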

A bit of background context: I ran these tests with as few other processes running as possible.

  • Processor: i5-2500 (no microcode update obviously!)
  • OS: Windows 7, pre and post update KB4056897
  • R version 3.4.3
  • And just for completeness, the code run is:

    library("survival")
    library("rbenchmark")
    set.seed(42)
    n = 1e6
    censTimes <- seq(from = 0, to = 1, length.out = n)
    failTimes <- rweibull(n, 1, -1/log(0.9))
    Event <- failTimes < censTimes
    obsTimes <- ifelse(Event, failTimes, censTimes)
    survObj <- Surv(obsTimes, Event)
    Group <- rbinom(n, 4, 0.5)
    benchmark(coxph(survObj ~ Group))
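
If others want to share comparable numbers, here is a minimal sketch of how a run could be recorded for later comparison; the label value and file name are just illustrative:

    res <- benchmark(coxph(survObj ~ Group))
    res$label <- "PostUpdate"             # or "Pre-Update" before patching
    res$R_version <- R.version.string     # record the interpreter version too
    write.csv(res, "meltdown_bench.csv", row.names = FALSE)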
    

Thanks for getting some real numbers, including a specific CPU model and benchmark code.

Yes, it makes sense that most of the impact shows up as system time; that is where the Meltdown mitigation costs land. With kernel page-table isolation, every kernel/userspace transition has to switch page tables, resulting in TLB invalidation. The kernel probably touches more distinct pages than R does, because R is probably mostly working within a few large allocations, not chasing variables and data structures scattered all over the address space.
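
One way to see that split on your own machine is to compare a syscall-heavy loop against a computation that never leaves user space. This is just a rough illustration with arbitrary workloads, not something from the original benchmark:

    library("rbenchmark")
    tmp <- tempfile()   # arbitrary path; file.exists() issues a stat-style syscall
    benchmark(
        syscall_heavy = for (i in 1:1e4) file.exists(tmp),   # kernel entry per iteration
        user_only = sum(sqrt(seq_len(1e6))),                 # stays entirely in user space
        replications = 20
    )
    # Post-update, the syscall_heavy row should show a much larger sys.self
    # penalty than user_only, mirroring the pattern in the question's table.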

If Windows is doing Spectre mitigation too, it might be doing something quite slow. I don't know; I haven't looked into how OSes try to mitigate Spectre other than with a retpoline for every indirect branch (intentionally causing a mispredict to a known location using the return-address predictor, instead of being left open to branch-target injection by malicious code that primes the predictors).


(Although I don't know enough about R to say why it makes enough system calls to account for a noticeable fraction of the time even pre-update.)
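
One plausible contributor, offered as a guess rather than anything from the original answer: memory allocation. When R grows a large vector, the freshly mapped pages are zero-filled by the kernel on first touch, and those page faults are billed as system time. A quick way to watch that happen (the size is arbitrary):

    system.time({
        x <- numeric(1e8)   # ~800 MB of doubles; pages are committed lazily
        x[] <- 1            # touching every page forces the kernel to fault them in
    })
    # Much of the sys component here is the kernel mapping and zeroing pages.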

TLB invalidation when returning from the kernel to user space also adds to user time, but user time was large anyway. (And as I said, R quite likely doesn't touch many different pages, so it only needs a few TLB misses, and thus page walks, to get back "up to speed" after a system call or interrupt.)

Related: there are more microarchitectural details about Meltdown elsewhere, including why that CPU design made sense before anyone thought of the Meltdown attack.
