IMG_20141117_202635~2 | by James Abley


My notes from Arun Kejariwal's talk at Velocity Conf EU 2014:


* Twitter collect millions of time-series

* Need to detect anomalies in a performant manner

** Analyse engagement

** Detect DoS attacks

* There are 50 years of existing literature in this area

* A lot of it assumes normally distributed data, though

** with mean μ and standard deviation σ, μ ± 3σ covers about 99.7% of the data

* Much of their data is multi-modal, so those normal-distribution assumptions don't hold
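Below is a minimal Python sketch (standard library only) of the 3σ rule and of why it struggles with multi-modal data. The function names and the bimodal series are hypothetical illustrations, not from the talk. With two clusters, the mean lands in the empty gap between them, so a value sitting in that gap looks perfectly ordinary to a 3σ test.

```python
import math
import statistics


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within mean ± k standard deviations.

    For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


def three_sigma_outliers(values: list[float]) -> list[float]:
    """Flag values outside mean ± 3 * (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


# Hypothetical bimodal series: clusters around 10 and 100, plus one
# suspicious value (55) in the otherwise empty gap between them.
bimodal = [10.0] * 50 + [100.0] * 50 + [55.0]

print(round(normal_coverage(3), 4))   # 0.9973
print(three_sigma_outliers(bimodal))  # [] (the gap value is not flagged)
```

Here the mean is exactly 55 and σ is roughly 45, so the one genuinely odd point sits right on the mean and nothing gets flagged.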


1. Extract the seasonal component using STL (Seasonal and Trend decomposition using Loess) in R

2. Residual = Raw - Seasonal - Median

3. Run ESD (Extreme Studentized Deviate test) / MAD (Median Absolute Deviation), etc.
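Here is a rough Python sketch of steps 1 and 2. The talk used STL in R, but this sketch assumes something different so that it stays self-contained. The hypothetical `seasonal_component` helper swaps in a much cruder estimate: the median of each phase of the cycle. The period is also assumed to be known. For real data, R's `stl()` or `statsmodels.tsa.seasonal.STL` in Python would be the obvious substitutes.

```python
import statistics


def seasonal_component(raw: list[float], period: int) -> list[float]:
    """Crude stand-in for STL (hypothetical helper, not the talk's method).

    Each phase of the cycle is estimated by the median of all observations
    at that phase, expressed as an offset from the overall median.
    """
    overall = statistics.median(raw)
    phase_medians = [statistics.median(raw[p::period]) for p in range(period)]
    return [phase_medians[i % period] - overall for i in range(len(raw))]


def residual(raw: list[float], period: int) -> list[float]:
    """Step 2 from the notes: Residual = Raw - Seasonal - Median."""
    seasonal = seasonal_component(raw, period)
    med = statistics.median(raw)
    return [r - s - med for r, s in zip(raw, seasonal)]


# Hypothetical series: a cycle of length 4 repeated 5 times, spike at index 9.
raw = [10.0, 20.0, 30.0, 20.0] * 5
raw[9] = 80.0

print(residual(raw, period=4))  # all 0.0 except 60.0 at index 9
```

Using medians rather than means keeps the spike itself from shifting the estimated baseline. The phase containing the 80 still gets a seasonal level of 20.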




If they decompose the time-series:

The residual becomes uni-modal

So they can apply a vast array of existing techniques in an efficient manner
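Once the residual is roughly uni-modal, even a simple robust test does the job. Below is a minimal Python sketch of the MAD side of step 3, using the modified z-score. Two conventions here are assumptions rather than anything stated in the talk: the 0.6745 scaling and the 3.5 cutoff. The generalized ESD test needs Student's t critical values and is not shown. The residual values are made up for illustration.

```python
import statistics


def mad_outliers(residual: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median (0.6745 makes MAD comparable to a
    standard deviation for normal data).
    """
    med = statistics.median(residual)
    mad = statistics.median([abs(x - med) for x in residual])
    if mad == 0:
        # More than half the points equal the median, so treat any
        # deviation from it as anomalous instead of dividing by zero.
        return [i for i, x in enumerate(residual) if x != med]
    return [i for i, x in enumerate(residual)
            if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical residual: small noise around zero plus one large spike.
resid = [0.5, -0.3, 0.1, -0.2, 0.4, -0.1, 0.2, 12.0, -0.4, 0.3]
print(mad_outliers(resid))  # [7]
```

Unlike the mean and standard deviation, the median and MAD barely move when the spike is present, which is why this test still works with a single large anomaly in a small sample.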


Once again, it's about using maths to turn the problem into one that has an efficient solution.

Taken on November 17, 2014