Baron Schwartz giving an Epic Talk on Benchmarking

June 26, 2012 By Tad Reeves

Baron Schwartz from Percona gave an amazing talk on benchmarking. As someone who has always loved reading about benchmarks but has been pretty terrible at producing them myself, I found this talk fascinating, especially after my recent attempt to compare JBoss 4.2 and JBoss 5.1 performance, which produced a bunch of inconclusive benchmarks.

Put simply, Baron Schwartz is a benchmarking GOD. Listen to what he says.  Read his blog.  This guy is benchmarking sanity personified.

Bullets from his talk:

  • It’s important to establish the goals of a benchmark and the reasons for running it, and to report the legend, distribution, response time, etc., not just throughput
  • One needs a lot of info to think clearly about a benchmark
  • Ideal benchmark report:
    • Clear benchmark goals:
      • Validating hardware config (disk / cpu / etc) – see if it matches expectations
      • Compare two systems
      • Checking for regressions
      • Capacity planning (how will it perform at higher load than you have?)
      • Reproduce bad behaviour to solve it
        • You don’t want to push most systems all the way to their max throughput, because at that point you’re beyond the threshold of “good behaviour”.
      • Stress test to find bottlenecks
    • Get specs:
      • Get specs for CPU, disk, memory, network, including makes/models/etc.
      • SSDs are EXTREMELY tricky to benchmark
      • Versions of all software
      • RAID controller / filesystem
      • Disk queue scheduler:
        • A lot of Linux defaults are geared toward desktop use. CFQ is the standard disk scheduler (desktop-oriented, perf sucks) instead of noop or others
      • Generate some plots to summarize
    • Better Aggregate Measurement:
      • Average / Percentiles
      • Observation duration
      • 95th percentile = you can throw away the worst 1/20 of your day. That means you can throw away more than an hour of data per day (72 minutes), i.e. your system can be performing at rock bottom for over an hour a day. Not so good for establishing an SLA or SLO (service level objective).
      • Scatter graphs can be much more telling than a single point, as you can see whether your performance is all over the map or returns a stable figure. E.g. SSDs have performance all over the map, and have very different performance characteristics when empty vs. full, or at the start vs. the end of the benchmark.
    • Performance:
      • Two metrics: throughput and response time (tasks per time or time per task)
      • They are not reciprocals
      • Resource consumption (e.g. CPU%, load average, etc.) is NOT a good measure of performance. These are indicators. They are not the goal.
      • Be very careful with tools that report utilization.  At 100% utilization many systems are not actually saturated.
      • Try pt-diskstats from Percona
    • What is a system’s actual capacity?
      • Max throughput at max achievable concurrency while still delivering acceptable performance (response time).
    • Recap:
      • Most benchmarks reveal little
    • If 1/20 of the work is serialized, you’ll never get more than a 20x speedup from going parallel (Amdahl’s law).
    • Isolating bottlenecks or iteratively optimizing them is one way – but don’t optimize things that don’t matter.  Don’t try to optimize little things.
    • Little’s law:  concurrency = throughput * response time
      • This holds regardless of queuing, arrival rate distribution, response time distribution, etc.
    • Utilization law:
      • Utilization = service time * throughput
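
To make the arithmetic in the bullets above a bit more concrete, here is a minimal Python sketch of the percentile, serialization, Little’s law and utilization law calculations. The function names and sample numbers are my own illustrations, not something from the talk, and the percentile uses the simple nearest-rank method (other definitions interpolate and give slightly different values).

```python
import math

# All names below are hypothetical helpers for illustration only.

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def ignored_minutes_per_day(pct):
    """Minutes per day that a pct-th percentile target says nothing about."""
    return (100 - pct) * 24 * 60 / 100

def amdahl_speedup(serial_fraction, workers):
    """Speedup from running the parallelizable part on `workers` workers
    while `serial_fraction` of the work stays serialized."""
    return 1 / (serial_fraction + (1 - serial_fraction) / workers)

def littles_law_concurrency(throughput, response_time):
    """Little's law: concurrency = throughput * response time."""
    return throughput * response_time

def utilization(service_time, throughput):
    """Utilization law: utilization = service time * throughput."""
    return service_time * throughput

if __name__ == "__main__":
    # Made-up latencies in ms: 19 fast requests and one terrible one.
    latencies = [10] * 19 + [900]
    print("mean:", sum(latencies) / len(latencies))
    print("p95 :", percentile(latencies, 95))
    print("minutes/day outside p95:", ignored_minutes_per_day(95))
    print("speedup, 1/20 serial, 20 workers:", amdahl_speedup(1 / 20, 20))
    print("concurrency at 100 req/s, 50 ms:", littles_law_concurrency(100, 0.05))
    print("utilization at 2 ms service, 400 req/s:", utilization(0.002, 400))
```

With the made-up latencies above, the 95th percentile is 10 ms even though one request took 900 ms and the mean is 54.5 ms, which is exactly why a single aggregate number can hide the bad behaviour. Likewise, with 1/20 of the work serialized, 20 workers only get you a bit over a 10x speedup, and no number of workers gets you past 20x.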