My notes from Arun Kejariwal's talk at Velocity Conf EU 2014:

 

* Twitter collects millions of time series

* Need to detect anomalies in a performant manner

** Analyse engagement

** Detect DOS

* There are 50 years of existing literature in this area

* A lot of it only applies to normally distributed data, though

** with a mean and standard deviation, mean ± 3 standard deviations gives 99.7% coverage

* Much of their data is multi-modal
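
To make the last two bullets concrete, here is a rough sketch of my own (not from the talk) using only the Python standard library. It checks the 99.7% figure for a normal distribution, then shows the same 3-sigma rule missing an obvious oddity in a bimodal series. The numbers in `bimodal` are made up for illustration, and the function names are mine.

```python
from statistics import NormalDist, mean, pstdev


def three_sigma_coverage() -> float:
    """Fraction of a normal distribution within 3 standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(3.0) - std_normal.cdf(-3.0)


def three_sigma_outliers(values):
    """Return the values lying more than 3 standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


print(round(three_sigma_coverage(), 4))  # 0.9973

# Hypothetical bimodal series: quiet periods near 10, busy periods near 100,
# plus one reading of 55 that belongs to neither mode.
bimodal = [10, 11, 9, 10, 12] * 10 + [100, 101, 99, 100, 102] * 10 + [55]
print(three_sigma_outliers(bimodal))  # []
```

The two modes inflate the standard deviation so much that the 3-sigma band covers everything, so the reading of 55 is never flagged.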

 

1. Extract seasonal component using STL in R

2. Residual = Raw - Seasonal - Median

3. Run ESD (extreme Studentized deviate), MAD (median absolute deviation), etc. on the residual
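
The three steps can be sketched roughly as below. This is my own illustration, not Twitter's code. The talk used STL in R, whereas this stand-in estimates the seasonal component as a per-phase median, which is much cruder. The function names, the sample series, the fixed `period`, and the threshold of 3 scaled MADs are all assumptions for the example.

```python
from statistics import median


def seasonal_component(series, period):
    """Per-phase median, centred on zero. A crude stand-in for STL, not STL itself."""
    overall = median(series)
    phase_medians = [median(series[p::period]) - overall for p in range(period)]
    return [phase_medians[i % period] for i in range(len(series))]


def residuals(series, period):
    """Step 2: Residual = Raw - Seasonal - Median."""
    seasonal = seasonal_component(series, period)
    overall = median(series)
    return [x - s - overall for x, s in zip(series, seasonal)]


def mad_anomalies(resid, threshold=3.0):
    """Step 3: indices whose residual is more than `threshold` scaled MADs from the median."""
    centre = median(resid)
    mad = median(abs(r - centre) for r in resid)
    if mad == 0:
        # Degenerate case: at least half the residuals are identical.
        return [i for i, r in enumerate(resid) if r != centre]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, r in enumerate(resid) if abs(r - centre) / scale > threshold]


# Made-up series with a period of 4: two low phases, two high phases, small wobble.
pattern = [10, 12, 100, 98]
series = [pattern[i % 4] + (2 * i) % 5 - 2 for i in range(24)]
series[9] = 60  # injected anomaly in a phase that normally sits near 12

print(mad_anomalies(residuals(series, period=4)))  # [9]
```

In the raw series, 60 sits between the low and high modes and looks unremarkable. Once the seasonal pattern is removed, it stands out as a residual of 48 against a background of roughly ±2.5.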

 

BUT

 

If they decompose the time-series:

The residual becomes uni-modal

So they can apply a vast array of existing techniques in an efficient manner

 

Once again, it's about using maths to turn the problem into one that has an efficient solution.

Taken on November 17, 2014