Notes on outlier detection

In signal processing and detection, one of the main issues is outlier detection. One application is motion detection; another is beat detection (I'm currently working on a laser-based communication system that uses this). Basically, given some data coming from a signal (such as a photosensor), you need to identify the values that fall outside the expected range (e.g. unusually high motion caused by a human subject).

There are numerous ways to identify outliers. We name a few here:

  • take the max value from a calibration sample: everything above that max is an outlier (drawback: if someone passes in front of the sensors while you calculate this, the threshold ends up too high and you're fucked)
  • calculate the interquartile range (IQR) from a sample: everything below Q1 - k*IQR or above Q3 + k*IQR is an outlier (use k=1.5, or k=3 for extreme outliers) (drawback: you need to keep the sample values in RAM; 100 samples seems to be fair)
  • estimate the interquartile range from the mean and variance (e.g. as in this beat detection algorithm) (advantage: you can compute mean/variance as the algorithm runs, i.e. without using much RAM; drawback: less precise than the above)
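
To make the last two approaches concrete, here is a minimal Python sketch. The names iqr_fences and RunningFences are made up for illustration. The running version uses Welford's online update for mean and variance, and it assumes the signal is roughly normally distributed so that the quartiles sit at mean ± 0.6745 standard deviations. That assumption is mine; the beat detection algorithm mentioned above may derive its threshold differently.

```python
import math
import statistics


def iqr_fences(samples, k=1.5):
    """Return (low, high) fences computed from stored samples.

    Values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers.
    """
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


class RunningFences:
    """Estimate the same fences from a running mean and variance.

    Only three numbers are kept in memory, whatever the sample count.
    Assumption: the data is roughly normal, so Q1 and Q3 lie at
    mean -/+ 0.6745 * sigma.
    """

    Z_QUARTILE = 0.6745

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's online update: one pass, no stored samples.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; needs at least two values.
        return self.m2 / (self.n - 1)

    def fences(self, k=1.5):
        sigma = math.sqrt(self.variance())
        q1 = self.mean - self.Z_QUARTILE * sigma
        q3 = self.mean + self.Z_QUARTILE * sigma
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, fences):
    low, high = fences
    return value < low or value > high
```

In use, you would call iqr_fences once on a stored calibration buffer, or call RunningFences.update on every incoming reading and compare each new value against fences(). The running version trades accuracy for memory: if the signal is skewed or heavy-tailed, its estimated fences will drift away from the true quartile-based ones.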

Other references: