# IMG_20141117_202519~2

Notes from Theo Schlossnagle's talk at Velocity Conf EU 2014:

* The problem is how to pick out meaningful signals. The answer is to use maths!

* Don't track rates

* Track totals, at regular intervals

** Derive rates from that
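A minimal sketch of deriving rates from totals, assuming each sample is a `(timestamp_seconds, running_total)` pair. The function name `rates_from_totals` and the choice to skip counter resets are illustrative assumptions, not something stated in the talk.

```python
from typing import List, Tuple


def rates_from_totals(samples: List[Tuple[float, float]]) -> List[float]:
    """Derive per-second rates from consecutive (timestamp, total) samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = c1 - c0
        if delta < 0:
            continue  # assumption: a falling total means a counter reset, so skip it
        rates.append(delta / dt)
    return rates


# Totals collected once a minute: 600 new events, then 300, then none.
samples = [(0, 1000), (60, 1600), (120, 1900), (180, 1900)]
print(rates_from_totals(samples))  # [10.0, 5.0, 0.0]
```

Because the totals are kept, a missed sample only coarsens the rate for that interval; the events in between are still counted.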

* For disk space, you typically have two questions:

** How full is it?

** How quickly is it changing over time? (capacity planning)
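Both questions can be answered from two samples of used space. The sketch below is illustrative only: `disk_report` and its parameters are hypothetical names, and the time-to-full figure assumes growth stays linear, which real disks rarely honour for long.

```python
def disk_report(capacity, t0, used0, t1, used1):
    """Return (fullness fraction, growth per second, seconds until full or None)."""
    fullness = used1 / capacity
    growth = (used1 - used0) / (t1 - t0)  # units of space per second
    if growth > 0:
        seconds_to_full = (capacity - used1) / growth  # linear extrapolation
    else:
        seconds_to_full = None  # not growing, so no projected fill time
    return fullness, growth, seconds_to_full


# Capacity 1000 units; usage rose from 600 to 700 over 100 seconds.
print(disk_report(1000, 0, 600, 100, 700))  # (0.7, 1.0, 300.0)
```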

* Collecting once per minute can be too infrequent: you won't be able to react quickly enough.

** So add more nodes, so you have more data

** Or increase frequency

*** This can be hard to do with cron, which fires on the minute
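One way around cron's one-minute granularity is a long-running collector that sleeps between samples. This is a sketch under assumptions: `sample_every` is a hypothetical helper, and the injectable `clock` and `sleep` parameters exist only so the loop can be exercised without waiting.

```python
import shutil
import time
from typing import Callable, List, Tuple


def sample_every(interval_s: float, collect: Callable[[], float], count: int,
                 clock=time.monotonic, sleep=time.sleep) -> List[Tuple[float, float]]:
    """Call collect() `count` times, aiming for one call every interval_s seconds."""
    samples = []
    next_tick = clock()
    for _ in range(count):
        samples.append((clock(), collect()))
        # Schedule against the previous tick, not "now", so delays do not accumulate.
        next_tick += interval_s
        delay = next_tick - clock()
        if delay > 0:
            sleep(delay)
    return samples


if __name__ == "__main__":
    # Three quick samples of used disk space; a real collector would use e.g. 10 s.
    for ts, used in sample_every(0.01, lambda: shutil.disk_usage(".").used, 3):
        print(ts, used)
```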

* Doing the maths, you can use:

** Exponentially Weighted Mean

** Sliding Window Mean

* Combining these can make them more powerful, and more memory-efficient too
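A minimal sketch of the two means, written as streaming updaters. The class names and the `alpha` and `size` parameters are illustrative assumptions; the notes do not record how the talk combined the two.

```python
from collections import deque


class EWMA:
    """Exponentially weighted mean: keeps a single number of state."""

    def __init__(self, alpha: float):
        self.alpha = alpha  # weight given to the newest sample, 0 < alpha <= 1
        self.value = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # seed with the first sample
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value


class SlidingWindowMean:
    """Mean of the last `size` samples: keeps `size` numbers of state."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, x: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # value about to be evicted
        self.window.append(x)
        self.total += x
        return self.total / len(self.window)


ewma = EWMA(alpha=0.5)
print([ewma.update(x) for x in [10, 20, 20]])       # [10, 15.0, 17.5]
window = SlidingWindowMean(size=3)
print([window.update(x) for x in [1, 2, 3, 4]])     # [1.0, 1.5, 2.0, 3.0]
```

The memory difference is visible in the state each class holds: one value for the exponentially weighted mean, `size` values for the window.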

* They currently use CUSUM (en.wikipedia.org/wiki/CUSUM), but are evaluating Tukey's range test.

* This gives a strong statistical model for alerting
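For illustration, here is a sketch of a one-sided (upper) CUSUM detector in its common textbook form. It is not necessarily the variant used in the talk, and `target`, `slack` and `threshold` are hypothetical tuning parameters you would have to choose for your own data.

```python
def cusum_upper(values, target, slack, threshold):
    """Return the indices where the upper cumulative sum exceeds `threshold`.

    The sum accumulates how far each value sits above target + slack and is
    clamped at zero, so small fluctuations around the target never add up.
    """
    s = 0.0
    alarms = []
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # assumption: restart accumulation after each alarm
    return alarms


# A series sitting at 10 that shifts upwards for three samples.
series = [10, 10, 11, 10, 13, 14, 13, 10]
print(cusum_upper(series, target=10, slack=1, threshold=4))  # [5]
```

No single value here is alarming on its own threshold; the alarm comes from the sustained shift, which is the point of using a cumulative statistic.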

* So, for high-velocity streams:

** summarise over a minute

** extract useful, lower-dimensional characteristics

** apply CUSUM or Tukey
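The first two steps above can be sketched as follows, assuming events arrive as `(timestamp_seconds, value)` pairs. The choice of count, mean and max as the per-minute characteristics is an assumption for illustration; the talk may have used different summaries.

```python
from collections import defaultdict


def summarise_per_minute(events):
    """Group (timestamp_seconds, value) events by minute and summarise each group."""
    buckets = defaultdict(list)
    for ts, value in events:
        minute_start = int(ts // 60) * 60
        buckets[minute_start].append(value)
    return {
        minute: {"count": len(vs), "mean": sum(vs) / len(vs), "max": max(vs)}
        for minute, vs in sorted(buckets.items())
    }


events = [(0, 10.0), (30, 20.0), (59, 30.0), (60, 5.0), (119, 15.0)]
for minute, summary in summarise_per_minute(events).items():
    print(minute, summary)
# 0 {'count': 3, 'mean': 20.0, 'max': 30.0}
# 60 {'count': 2, 'mean': 10.0, 'max': 15.0}
```

Each summary field then forms a regular one-value-per-minute series, which is the kind of input a change-detection method such as CUSUM expects.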

* See Brendan Gregg's mvalue for another useful approach

* Throughout, try to reframe the problem statement so that you can apply meaningful maths to it. (That sounds a lot like algorithms, where people like Steven Skiena advise framing your problem as a well-known one that already has an efficient algorithm.)