Big Data's Big 5

Deciding when and how to make changes to a successful business or popular website can be a huge risk. Get things right and - at best - nobody notices. Get things wrong, however, and you run the risk of losing business and suffering a damaged reputation.
A good recent example is that of film and TV service Netflix, whose fluffed introduction of separate DVD and streaming charges in 2011 saw its previously high-flying chief executive Reed Hastings forced into a grovelling apology.
Change can have anticipated as well as unforeseen consequences. Understanding the risk involved is essential. If only there were some structured way to account for risk in your analysis and decision making...
Happily, there is. Let’s start with a class of problem and then show you a way to solve it.
Take the website change example. You know that your customers fall into five age groups:
- Over 35
They also fall into four classes in terms of how long they have been using the site:
- New and keen
- Old and jaded
- Old and still keen
- Super cool
You have six classes of web page:
You also have different classes of advertisements that appeal, in different ways, to the different age groups and classes of users.
You know the numbers of different customer age groups and classifications and also how each combination reacts to your different classes of web pages and advertisements.
You are asked whether the site will make more money if you add more information pages - which the “old and jaded” hate but the “new and keen” love - at the expense of maps that the “super cools” love (even though they would never admit it) but both “old” groups detest.
In other words, you are being asked to predict what effect changing the balance of the web page classes will have. The “classic” way of solving this type of problem is to calculate how a set of probabilities would interact.
You are probably (he typed hopefully) familiar with the concept of probabilities and how independent variables interact. Suppose that you and I drink at the same watering hole from time to time; you have a 50 per cent chance of being there on Friday night and I a 70 per cent chance. We can calculate the probability of meeting there simply by multiplying the probabilities (expressed, not as percentages, but as values between zero and one), so 0.5 x 0.7 = 0.35 meaning we have a 35 per cent chance of meeting.
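The bar example can be checked in a couple of lines. This is a minimal sketch of the "independent events multiply" rule, using the probabilities from the paragraph above:

```python
# Probability that two independent drinkers both turn up on Friday night.
p_you = 0.5   # you: 50 per cent chance of being there
p_me = 0.7    # me: 70 per cent chance

# Independent probabilities multiply.
p_both = p_you * p_me
print(f"Chance of meeting: {p_both * 100:.0f} per cent")  # 35 per cent
```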
We can combine probability calculations with combinatorial ones. Suppose we have a mutual friend, Fred, who has a 20 per cent chance of turning up. There are now eight possible present/absent combinations; the four that count as a meeting each have a probability:
| Combination  | Chance (%) |
|--------------|------------|
| You and me   | 28         |
| You and Fred | 3          |
| Fred and me  | 7          |
| All of us    | 7          |
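These figures can be generated mechanically by enumerating every present/absent combination and multiplying the appropriate probabilities. A sketch, using the three drinkers from the example:

```python
from itertools import product

# Independent probabilities that each drinker turns up.
probs = {"you": 0.5, "me": 0.7, "Fred": 0.2}

# Enumerate all 2^3 = 8 present/absent combinations and their probabilities.
results = {}
for presence in product([True, False], repeat=len(probs)):
    p = 1.0
    here = []
    for (name, prob), is_here in zip(probs.items(), presence):
        # Multiply by the chance of turning up, or of staying away.
        p *= prob if is_here else 1 - prob
        if is_here:
            here.append(name)
    results[" and ".join(here) or "nobody"] = p

for combo, p in results.items():
    print(f"{combo}: {p * 100:.0f}%")
```

Running this reproduces the table ("you and me" comes out at 28 per cent, "you and me and Fred" at 7 per cent), and the eight probabilities sum to 1 - a useful sanity check. It also shows why the classic approach gets tedious: the loop body is trivial, but the number of combinations doubles with every extra person.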
The problem is that the number of combinations grows very quickly; ten topers would produce 1,024 combinations. So trying to calculate all the combinations of website customers, pages, advertisements and so on rapidly becomes very tedious. And that’s where another technique becomes useful: the Monte Carlo simulation.
Let’s take a real example (one that you will probably never want to solve yourself!) that illustrates the advantage of a Monte Carlo simulation beautifully and also introduces you to its inventor.
Stanislaw Marcin Ulam was a mathematician who worked on the Manhattan Project in Los Alamos at the invitation of John von Neumann, so we can assume he was reasonably good at difficult sums. In 1946 he was laid up in hospital and bored, so he started playing Canfield Solitaire. He fell to wondering how often it could be expected to "come out."
According to Ulam:
> After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than ‘abstract thinking’ might not be to lay it out say one hundred times and simply observe and count the number of successful plays.
>
> This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations. Later [in 1946], I described the idea to John von Neumann, and we began to plan actual calculations.
And there you have the whole concept of Monte Carlo simulations in a nutshell. Here was a problem where performing the necessary calculations defeated a world-class mathematician - but the problem was perfectly amenable to being solved using a Monte Carlo simulation.
The Monte Carlo simulation in this case could be a computer program that simply shuffled (randomised) the cards, laid them out, and then played to the rules. Of course, the next time it played, the order of the cards would be different (because of the randomisation during shuffling), so each game could have a different outcome. All the computer has to do is to play a hundred (or ten thousand) games and count the number of times it came out. At the end of it, we would have a very close approximation to the number we seek.
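The shape of such a program is simple. Canfield's full rules would swamp the example, so this sketch substitutes a toy stand-in "game" with a known answer - the game "comes out" if, after shuffling, an ace sits on top of the deck (true probability 4/52) - while keeping exactly the shuffle-play-count structure Ulam describes:

```python
import random

def play_once(rng):
    """Shuffle the deck and 'play' one game of our toy stand-in."""
    deck = list(range(52))   # cards 0-3 stand for the four aces
    rng.shuffle(deck)        # the randomisation step
    return deck[0] < 4       # success: an ace on top

def monte_carlo(trials, seed=0):
    """Play many games and return the fraction that came out."""
    rng = random.Random(seed)
    wins = sum(play_once(rng) for _ in range(trials))
    return wins / trials

estimate = monte_carlo(100_000)
print(f"Estimated: {estimate:.4f}  (exact: {4 / 52:.4f})")
```

With 100,000 trials the estimate lands within a fraction of a per cent of the exact value 4/52 ≈ 0.0769. Swap `play_once` for a function that actually plays Canfield to the rules and the same loop answers Ulam's original question - no combinatorial heroics required.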