Do you know Benford's law? It is one of those really irritating theories that you can't explain but that work very well (you'll find plenty of information and tools about it on the net). In a nutshell, it says that in everyday life a number is much more likely to start with a "1" than with a "2", with a "2" than with a "3", and so on. It is actually a logarithmic progression that has been empirically verified in many situations (some people have even become obsessed with it) and found to hold consistently for 'natural' numbers such as the lengths of rivers, amounts of money, sizes of objects… that is to say, 99% of the figures we use daily. Although it has been proved that if such a law exists it must follow this logarithmic progression, the law itself has never really been understood. Some people (like Stanislas Dehaene) believe that it is somehow linked to our brain's capacity to grasp small numbers directly (generally up to 3) without having to count or even 'think' (another amazing capacity called subitizing).
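For reference, the logarithmic progression is usually written P(d) = log10(1 + 1/d), where d is the first significant digit (1 to 9). The short Python sketch below is our own illustration, not part of the original study; it simply tabulates these nine probabilities. They add up to exactly 1 because the product of the (d + 1)/d terms telescopes to 10.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that a number's first significant digit is d, per Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected share of each leading digit, from about 30.1% for "1" down to about 4.6% for "9".
BENFORD = {d: benford_probability(d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in BENFORD.items():
        print(f"{d}: {p:.1%}")
```

Running it prints 30.1% for "1", 17.6% for "2", 12.5% for "3" and so on, down to 4.6% for "9".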
In economics, Benford's law is used for very serious matters such as fraud detection and auditing (if you are ever tempted to evade taxes, you had better become an expert first).
So we decided to test the relevance of Benford's law to Wan governance, Wan optimization and enterprise application delivery.
We used three data sources:
- Some Ipanema configuration files. These are machine-generated files that contain IP addresses, access bandwidth values, natural names and many other technical parameters;
- The Ipanema User Manual, a text document of nearly 500 pages that describes this Autonomic networking system in full detail;
- … and of course this Wan governance blog since its beginning (excluding this post, to avoid self-reference).
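The post does not say how the numbers were extracted from these sources. One plausible approach, sketched below in Python as a hypothetical illustration, is to scan the text for integers and simple decimals and keep the first non-zero digit of each. The regular expression `NUMBER_RE` and the helper names are our own assumptions. Zero values are ignored since they have no significant digit, and dotted strings such as IP addresses get split into fragments (e.g. "10.0.0.1" yields "10.0" and "0.1"), which is one way machine-generated files can skew the counts.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical tokenizer: an integer or a simple decimal such as "3.5" or "0.05".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def leading_digit(token: str) -> Optional[int]:
    """First significant (non-zero) digit of a numeric token, or None for zero values."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None


def leading_digit_counts(text: str) -> Counter:
    """Count the first significant digit of every number found in the text."""
    counts = Counter()
    for token in NUMBER_RE.findall(text):
        d = leading_digit(token)
        if d is not None:
            counts[d] += 1
    return counts


def count_files(paths: Iterable[str]) -> Counter:
    """Aggregate leading-digit counts over several text files (undecodable bytes ignored)."""
    total = Counter()
    for p in paths:
        total += leading_digit_counts(Path(p).read_text(errors="ignore"))
    return total
```

For example, `leading_digit_counts("Access 2 Mbps, 10 sites, 0.5 s timeout, 256 kbps")` finds 2, 10, 0.5 and 256, and so counts the digit "2" twice and the digits "1" and "5" once each.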
The measured statistics are shown in the graph beside, alongside the values predicted by Benford's law. The law holds quite well, with two notable exceptions:
- Numbers in the configuration files are much more likely to start with a "1" than expected; this may be due to their machine-generated nature, with many Booleans (0 or 1) and a relatively low proportion of natural data and names;
- The User Manual has a strange propensity to contain figures starting with a "2" (we even double-checked to make sure). Is this due to a mysterious characteristic of the Ipanema system, a psychological quirk of the manual's writer, or have we finally found a notable exception to Benford's law?
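Whether a deviation like these is significant can be quantified rather than judged by eye. A common choice is Pearson's chi-square goodness-of-fit test against the Benford proportions, with 8 degrees of freedom (nine digits minus one). The Python sketch below is our own illustration of that test, not the method behind the graph, and the sample counts are made up: a perfectly Benford-like sample passes, while adding 300 extra "1"s (think Boolean flags) pushes the statistic far past the 5% critical value of about 15.51. Keep in mind that with very large samples the chi-square test flags even tiny deviations, which is why some practitioners prefer other conformity measures such as the mean absolute deviation.

```python
import math
from typing import Mapping

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_chi_square(counts: Mapping[int, int]) -> float:
    """Pearson chi-square statistic of observed leading-digit counts vs Benford's law."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("no leading digits to test")
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        observed = counts.get(d, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2


def conforms_to_benford(counts: Mapping[int, int]) -> bool:
    """True if the deviation is not significant at the 5% level (a coarse check)."""
    return benford_chi_square(counts) <= CHI2_CRITICAL_8DF_5PCT


if __name__ == "__main__":
    # Made-up sample of 1000 numbers that follows Benford's law almost exactly.
    benford_like = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
    # Same sample plus 300 extra numbers starting with "1".
    flag_heavy = {**benford_like, 1: 601}
    print(conforms_to_benford(benford_like))  # True
    print(conforms_to_benford(flag_heavy))    # False
```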
This should be an interesting topic for further investigation...