Benford's Law

Benford's law, in its simplest form, says that the first digits of the numbers in a large enough data set have a decreasing, logarithmic distribution. The digit 1 appears much more often (about 30%) than the other digits, and 9 is the least probable (about 4.6%). At the last digit position, on the other hand, all digits appear with roughly the same probability. If the data set is artificially constructed, the law will usually not be satisfied, which means that a check of Benford's law can be used to detect fraudulently manipulated data. The author of this book seems to have made it his life's purpose to explore and exploit this law (see Forensic Benford), and he has also applied for a patent.

The book consists of a collection of 147 relatively short texts (from half a page to a dozen pages), called chapters, which are organized into 7 sections. It is surprising to find sections subdivided into chapters, since it is usually the other way around.

The first section illustrates the appearance of Benford's law with many examples and gives some historical comments. The law was discovered by Simon Newcomb in 1881 and independently by Frank Benford in 1938, because they observed that in old books of logarithm tables the pages for the lower leading digits were more worn than the pages for the higher digits. Benford formulated the law that the probability that the first digit is d is given by log10(d+1) - log10(d) = log10(1 + 1/d). Later this was generalized: the probability that the first two digits are p and q (i.e., the two-digit number 10p + q) is log10(1 + 1/(10p + q)), and so on. Furthermore, it was found that the data set needs to span a sufficiently large range (several orders of magnitude), excluding long thin tails.
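As a quick check of these formulas, here is a minimal Python sketch (the function names are illustrative, not taken from the book) that evaluates the first-digit and first-two-digit probabilities. The sums telescope: adding log10(1 + 1/n) = log10(n+1) - log10(n) over all possible leading digits gives log10(10) - log10(1) = 1, and adding the two-digit probabilities over q recovers the first-digit probability of p.

```python
import math

def first_digit_prob(d: int) -> float:
    """Benford probability log10(1 + 1/d) that the first digit equals d (1..9)."""
    return math.log10(1 + 1 / d)

def first_two_digits_prob(p: int, q: int) -> float:
    """Probability that the first two digits are p (1..9) and q (0..9), i.e. the number 10p + q."""
    return math.log10(1 + 1 / (10 * p + q))

# Print the first-digit law as percentages
for d in range(1, 10):
    print(d, f"{100 * first_digit_prob(d):.1f}%")
```

Rounded to one decimal, the printed values are 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6% for the digits 1 to 9.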

Section 2 investigates how fraudulent data sets can be detected. Of course, when the law is not satisfied, great care must be taken not to draw the wrong conclusion, since there may be other reasons for the deviation. For example, a compliance test (section 3) should verify whether the data are expected to obey Benford's law at all. Measures of the deviation from the law, together with statistical and other tests, should then identify which digits are off and by how much.
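The review does not say which tests the book uses. As one standard possibility, here is a minimal sketch of Pearson's chi-square test against the first-digit law that also reports, per digit, the observed minus the expected proportion. The helper names and the 5% critical value 15.507 (chi-square with 8 degrees of freedom) are assumptions of this sketch; powers of 2 serve as test data because they are known to follow the law closely.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% significance level
CHI2_CRIT_8DF_5PCT = 15.507

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number (sign ignored)."""
    # Scientific notation with 15 significant digits, e.g. 0.0042 -> '4.20000000000000e-03'
    return int(f"{abs(x):.14e}"[0])

def benford_first_digit_test(data):
    """Return the chi-square statistic and the per-digit deviations from Benford's law."""
    digits = [leading_digit(x) for x in data if x != 0]
    n = len(digits)
    counts = Counter(digits)  # missing digits count as 0
    chi2 = 0.0
    deviations = {}
    for d in range(1, 10):
        p = math.log10(1 + 1 / d)                     # expected proportion of digit d
        chi2 += (counts[d] - n * p) ** 2 / (n * p)    # Pearson term (observed - expected)^2 / expected
        deviations[d] = counts[d] / n - p             # which digit is off, and by how much
    return chi2, deviations

chi2, dev = benford_first_digit_test([2 ** k for k in range(1, 1001)])
print(f"chi-square = {chi2:.2f}, reject Benford at 5%: {chi2 > CHI2_CRIT_8DF_5PCT}")
```

A large statistic only says that the data deviate from the law; it does not by itself prove manipulation. For very large data sets the chi-square test also flags tiny deviations that may be practically irrelevant.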

Section 4 is called "conceptual and mathematical foundation". However, as a statistician or a mathematician you may be somewhat disappointed. There are no strong theorems or formal proofs, scarcely a formula, and no systematic, structured approach to the subject; just the same kind of seemingly unrelated topics and ideas that were encountered so far. A first issue is to find a model that really fits the ideal Benford's law, i.e., one in which the first digits from 1 to 9 appear with probabilities 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6%. The problem with large data sets is that the samples come from many different distributions, so that one could think of distributions of distributions of distributions..., but a definitive solution never pops up in this avalanche of ideas and variations on the theme. Other considerations of possible distributions of the digits are given, and tests to verify whether the data satisfy Benford's law, or should satisfy it, are discussed. Is uniformity of the mantissa (the fractional part) of the logarithm of the data a guarantee for the law to hold, and which distributions imply this uniform behavior? Does scaling of the data influence the validity of the law? It can be shown that data sampled from an interval [a,b], with a and b integer powers of 10, according to the pdf f(x) = k/x (k a normalizing constant) satisfy Benford's law exactly. However, this is mainly of theoretical interest, since most real-life data sets vaguely resemble something like a normal or lognormal distribution. On the other hand, the exponential and the lognormal distributions with appropriate parameters approximate the k/x distribution well enough. Also, a deterministic geometric sequence with ratio r > 1 will generically satisfy Benford's law (namely whenever log10(r) is irrational). Since the frequency of the leading digit strongly depends on the interval in which the data fall, it is important to subdivide the whole range into intervals whose boundaries are integer powers of ten.
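The two positive claims above are easy to check numerically. If X has density k/x on [10^m, 10^n], then log10(X) is uniform on [m, n], so its mantissa is uniform on [0, 1) and the first digit is d with probability log10(d+1) - log10(d). The following Python sketch compares a k/x sample and a geometric sequence with the exact Benford probabilities; the interval [1, 10^4], the ratio r = 1.07, the sample size and the random seed are arbitrary illustrative choices, not taken from the book.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number (sign ignored)."""
    return int(f"{abs(x):.14e}"[0])

def sample_reciprocal(a: float, b: float, n: int, rng: random.Random) -> list:
    """Draw n samples from the density k/x on [a, b] by inverse-transform sampling."""
    # CDF: F(x) = ln(x/a) / ln(b/a), hence F^{-1}(u) = a * (b/a)**u
    return [a * (b / a) ** rng.random() for _ in range(n)]

def first_digit_freqs(values: list) -> dict:
    """Relative frequency of each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(0)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
reciprocal = first_digit_freqs(sample_reciprocal(1.0, 1e4, 100_000, rng))  # a, b powers of ten
geometric = first_digit_freqs([1.07 ** k for k in range(1, 5001)])          # log10(1.07) is irrational

# Columns: digit, Benford probability, k/x sample, geometric sequence
for d in range(1, 10):
    print(d, f"{benford[d]:.3f}", f"{reciprocal[d]:.3f}", f"{geometric[d]:.3f}")
```

Both empirical columns agree with the Benford column to within about a percentage point; choosing [a, b] that are not powers of ten for the k/x sample would break this agreement.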

Section 5 is about Benford's law in physics, chemistry, biology, the social sciences, etc. This relatively short section illustrates with several examples that data with the appropriate distributions do occur frequently, so that Benford's law shows up in many applications. The next section, which collects diverse topics, is also relatively short. It considers the distribution of the second digit and of the first two digits, other distributions and chains of distributions that may lead to Benford's law, the fact that certain ratios in a geometric series may cause cycles in the sequence of mantissas and thus break the law, and many other topics.
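Both the second-digit law and the cycling effect can be illustrated in a few lines. The probability that the second digit equals q (0 to 9) follows by summing the two-digit probabilities log10(1 + 1/(10p + q)) over p = 1, ..., 9. For a geometric sequence, the mantissa of log10(r^k) is k*log10(r) mod 1; when log10(r) is rational it takes only finitely many values, so the leading digits cycle instead of following Benford. In this sketch the helper names are hypothetical, r = 10^(1/4) is just an example, and the mantissa is computed in exact rational arithmetic to avoid floating-point drift.

```python
import math
from collections import Counter
from fractions import Fraction

def second_digit_prob(q: int) -> float:
    """Benford probability that the second digit equals q (0..9)."""
    # Sum the first-two-digit probabilities log10(1 + 1/(10p + q)) over first digits p = 1..9
    return sum(math.log10(1 + 1 / (10 * p + q)) for p in range(1, 10))

def geometric_leading_digits(log10_ratio: Fraction, n: int) -> Counter:
    """Count leading digits of r**k, k = 0..n-1, for a ratio r with rational log10(r)."""
    counts = Counter()
    for k in range(n):
        mantissa = (k * log10_ratio) % 1          # exact fractional part of log10(r**k)
        counts[int(10 ** float(mantissa))] += 1   # 10**mantissa lies in [1, 10)
    return counts

for q in range(10):
    print(q, f"{second_digit_prob(q):.4f}")

cycle = geometric_leading_digits(Fraction(1, 4), 400)   # r = 10**(1/4)
print(sorted(cycle.items()))
```

The second-digit distribution is much flatter than the first-digit one (about 12.0% for 0 down to about 8.5% for 9). For r = 10^(1/4) the mantissa only takes the values 0, 1/4, 1/2 and 3/4, so the 400 terms produce 200 leading ones, 100 threes, 100 fives and no other leading digit.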

The last section, section 7, is again somewhat longer. The idea here is to place Benford's law in a much broader context. The attention shifts from the relative frequencies of digits to the relative frequencies of quantities: are there more small quantities than large ones? In other words, is Benford's law a general law of nature that does not depend on our man-made number system? By classifying the data in bins with histograms and assuming that the quantities follow the k/x law, the result can be translated into any number system, where it again yields Benford's law.
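Under the k/x assumption the argument does not depend on base 10: if log_b(X) is uniform over a whole number of orders of magnitude in base b, the first digit d (1 to b-1) in base b has probability log_b(1 + 1/d). This sketch checks that for base 8 with a simulated k/x sample on [1, 8^3]; the helper names, base, range, sample size and seed are illustrative assumptions, not taken from the book.

```python
import math
import random
from collections import Counter

def benford_base(base: int) -> dict:
    """First-digit probabilities log_base(1 + 1/d) for digits d = 1..base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

def leading_digit_base(x: float, base: int) -> int:
    """Leading digit of x > 0 written in the given base."""
    mantissa = math.log(x, base) % 1.0              # fractional part of log_base(x)
    return min(int(base ** mantissa), base - 1)     # guard against rounding up to `base`

base = 8
rng = random.Random(1)
# log_8(X) = 3U is uniform on [0, 3), so X has density proportional to 1/x on [1, 8**3]
values = [base ** (3 * rng.random()) for _ in range(100_000)]
counts = Counter(leading_digit_base(v, base) for v in values)
expected = benford_base(base)

for d in range(1, base):
    print(d, f"{expected[d]:.3f}", f"{counts[d] / len(values):.3f}")
```

For base 10 the formula reduces to the usual law, and for base 2 the leading digit is always 1.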

What I knew about Benford's law before was that it was a surprising but rather simple fact that could be explained and justified by some simple arguments and formulas. After reading this book, I am totally confused. It provides some 650 pages of ideas, examples, guesses, half-truths and conjectures, but presented in an unsystematic and, in my view, mathematically unprofessional way. Granted, the author has the ambition to make the material accessible to the non-professional, and hence most arguments are verbal and formulas are avoided in most chapters, although some chapters in the last section have a high density of formulas. Unfortunately, mathematicians are spoiled by the wonderful typesetting of formulas with LaTeX, built on Knuth's TeX, and the formulas in this book will look horrible to them. There are some interesting ideas hidden in the book, but one needs to dig them out of the debris. The material needs to be much more boiled down, purified and organized to make this an advisable read for professionals.

Adhemar Bultheel
Book details

This is a collection of ideas concerning Benford's law, which says that, under certain conditions, the first digits in a large data set are not equally frequent: the smaller digits appear much more frequently than the larger ones, following a decreasing, logarithmic distribution.



978-981-4583-68-8 (hbk)
£ 102.00 (hbk)
