What is the Pareto Distribution?

What is an intuitive example of the Pareto Distribution?

The Pareto distribution is a probability distribution that seeks to describe quantities which have a particular property: namely, that a few items account for a lot of it and a lot of items account for a little of it.

This is vague, so let us consider a concrete scenario where we think of the “quantity” as the total income in a country and the “items” as the people in the country. That is, a small fraction of the people in a country (the few richest ones) tend to account for a large fraction of total income, and a large fraction of the people in a country (the many “poorer” ones) tend to account for a small fraction of total income. (In fact, this scenario is exactly what the famous economist Pareto had in mind when he developed his eponymous distribution.)

This effect is familiar to most people in the context of income and wealth inequality; perhaps the most salient example in recent memory is the Occupy movement and its slogans about “the 99 percent” and “the 1 percent”. The Occupiers were upset because the 1 percent (the top 1 percent of people by earnings in the US) accounted for far more than 1 percent of total income, while the 99 percent (the bottom 99 percent of people by earnings in the US) accounted for much less than 99 percent of it. In other words, they felt that the distribution of income in the US was too “harsh” a Pareto distribution, that is, too heavily concentrated at the top.

Another commonly quoted figure, that 20 percent of the people control 80 percent of the wealth (a statement that is itself sometimes called the Pareto principle), expresses the same idea: the allocation of wealth could be Pareto distributed. (For those who care, these particular figures correspond to a Pareto distribution with shape parameter $\alpha = \log_4 5 \approx 1.161$.)
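For readers who want to check that figure, here is a minimal sketch. It relies on a standard property of the Pareto distribution (valid for shape parameter above 1): the top fraction p of the population holds a share p^(1 - 1/α) of the total. The function names `top_share` and `alpha_for_rule` are made up for this illustration and do not come from any library.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a Pareto(alpha) model, alpha > 1."""
    return p ** (1.0 - 1.0 / alpha)

def alpha_for_rule(p, share):
    """Shape parameter for which the top fraction p holds the given share of the total."""
    # Solve p ** (1 - 1/alpha) = share for alpha.
    return 1.0 / (1.0 - math.log(share) / math.log(p))

alpha_80_20 = alpha_for_rule(0.2, 0.8)
print(round(alpha_80_20, 3))                  # 1.161
print(round(top_share(0.2, alpha_80_20), 3))  # 0.8
# Share held by the top 1 percent under the same (assumed) model:
print(round(top_share(0.01, alpha_80_20), 3))
```

Under this idealized model the same shape parameter also implies that the top 1 percent hold roughly half of the total, which shows how quickly the concentration compounds. Real income and wealth data only follow the model approximately.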

Although the concentration of wealth across people in a population is the canonical empirical example of the Pareto distribution, many other real life processes follow the “a few items account for a lot of it and a lot of items account for a little of it” Pareto paradigm.

Some examples:

business: the top few customers account for the bulk of the profits
insurance: a small number of big claims account for the bulk of the payments
settlement: a small number of cities hold the bulk of the population
languages: a small number of very frequent words account for most of the words in books
gardening: a select few pods account for most of the peas
software: a small number of bugs account for most of the crashes

Since the person who asked this question included the topic Machinery, perhaps the example the asker is seeking comes from industry: the failure rate of machines. In this case, the interpretation of the Pareto distribution is that the “quantity” is the chance of machine failure and the “item” is time: the earliest moments after a machine starts running are the ones in which it is most vulnerable to failure. Put another way, the longer the machine runs without failing, the smaller the chance it will fail NOW. In statistics, we would describe this phenomenon by stating that the Pareto distribution has a decreasing hazard function.

Note that the Pareto distribution is just a model. It is not obvious that a machine would exhibit this behavior: it is perfectly possible that a machine becomes more likely to fail as time goes on, in which case the Pareto distribution would be a poor choice of model for its performance.
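To make the decreasing hazard concrete, here is a small sketch. The hazard function is the density divided by the survival probability, and for a Pareto distribution it simplifies to α / t. The function name `pareto_hazard`, the minimum time `t_min = 1.0`, and the shape `alpha = 2.0` are arbitrary choices for illustration, not values taken from any real machine.

```python
def pareto_hazard(t, alpha, t_min=1.0):
    """Hazard rate (density / survival) of a Pareto distribution, for t >= t_min."""
    if t < t_min:
        raise ValueError("t must be at least t_min")
    density = alpha * t_min ** alpha / t ** (alpha + 1)
    survival = (t_min / t) ** alpha   # probability of lasting beyond time t
    return density / survival         # algebraically equal to alpha / t

# The instantaneous failure rate falls the longer the machine has survived.
for t in (1, 2, 5, 10):
    print(t, round(pareto_hazard(t, alpha=2.0), 3))
# hazard values: 2.0, 1.0, 0.4, 0.2
```

A machine that wears out over time would show the opposite pattern, a hazard that rises with t, which is exactly the case where the Pareto model fits poorly.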