extracting unexpected hidden patterns from data.” It is hard to see any analogous
connection between either data exploration or data mining and metaphorical worms. As
for automatically extracting hidden and unexpected patterns, there is some analogous truth
to that statement. The real problem is that it gives no flavor for what goes into finding
those hidden patterns, why you would look for them, nor any idea of how to practically use
them when they are found. As a statement, it makes data mining appear to exist in a world
where such things happen by themselves. This leads to “the expectation of magic” from
data mining: wave a magic wand over the data and produce answers to questions you
didn’t even know you had!
Without question, effective data exploration provides a disciplined approach to identifying
business problems and gaining an understanding of data to help solve them. Absolutely
no magic used, guaranteed.
Identifying Problems
The data exploration process starts by identifying the right problems to solve. This is not
as easy as it seems. In one instance, a major telecommunications company insisted that
they had already identified their problem. They were quite certain that the problem was
churn. They listened patiently to the explanation of the data exploration methodology, and
then, deciding it was irrelevant in this case (since they were sure they already understood
the problem), requested a model to predict churn. The requested churn model was duly
built, and most effective it was too. The company’s previous methods yielded about a 50%
accurate prediction model. The new model raised the accuracy of the churn predictions to
more than 80%. Based on this result, they developed a major marketing campaign to
reduce churn in their customer base. The company spent vast amounts of money
targeting at-risk customers with very little impact on churn and a disastrous impact on
profitability. (Predicting churn and stopping it are different things entirely. For instance, the
amazing discovery was made that unemployed people over 80 years old had a most
regrettable tendency to churn. They died, and no incentive program has much impact on
death!)
Fortunately they were persuaded by the apparent success, at least of the predictive
model, to continue with the project. After going through the full data exploration process,
they ultimately determined that the problem that should have been addressed was
improving return from underperforming market segments. When appropriate models were
built, the company was able to create highly successful programs to improve the value
that their customer base yielded to them, instead of fighting the apparent dragon of churn.
Finding and solving the appropriate problem was worth literally millions of
dollars, and the difference between profit and loss, to this company.
Precise Problem Definition
So how is an appropriate problem discovered? There is a methodology for doing just this.