Most of the action happens in evaluate(). Inside, the RecommenderEvaluator handles splitting the data into a training and test set, builds a new training DataModel and Recommender to test, and compares its estimated preferences to the actual test data.
Note that we don't pass a Recommender to this method. This is because, inside, the method will need to build a Recommender around a newly created training DataModel. So we must provide an object that can build a Recommender from a DataModel: a RecommenderBuilder. Here, it builds the same implementation that we tried in the first chapter.
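For reference, the following is a minimal sketch of what such a RecommenderBuilder and the surrounding evaluation call can look like, assuming the user-based recommender and intro.csv data file from the first chapter (import statements are omitted; the complete listing appears earlier in this chapter):

RandomUtils.useTestSeed();  // force repeatable random choices (examples and tests only)
DataModel model = new FileDataModel(new File("intro.csv"));
RecommenderEvaluator evaluator =
    new AverageAbsoluteDifferenceRecommenderEvaluator();

RecommenderBuilder builder = new RecommenderBuilder() {
  public Recommender buildRecommender(DataModel model) throws TasteException {
    // Same user-based recommender used in chapter 1
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    return new GenericUserBasedRecommender(model, neighborhood, similarity);
  }
};

// Train on 70% of the data, test against the remaining 30%,
// and use 100% of the input data
double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);
System.out.println(score);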
2.3.3 Assessing the result
This program prints the result of the evaluation: a score indicating how well the Recommender performed. In this case you should simply see 1.0. Even though a lot of randomness is used inside the evaluator to choose test data, the result should be consistent because of the call to RandomUtils.useTestSeed(), which forces the same random choices each time. This is only used in examples like this one, and in unit tests, to guarantee repeatable results. Don't use it in your real code.
What this value means depends on the evaluator implementation we used: here, AverageAbsoluteDifferenceRecommenderEvaluator. A result of 1.0 from this implementation means that, on average, the recommender estimates a preference that deviates from the actual preference by 1.0.
A value of 1.0 is not great, on a scale of 1 to 5, but there is very little data here to begin with. Without the fixed seed from RandomUtils.useTestSeed(), your results would differ from run to run, because the data set is split randomly, and so the training and test sets would differ with each run.
This technique can be applied to any Recommender and DataModel. To use root-mean-square scoring instead, replace AverageAbsoluteDifferenceRecommenderEvaluator with the implementation RMSRecommenderEvaluator.
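Only the evaluator construction needs to change; as a sketch, reusing the builder and model variables from the earlier listing:

RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
// Same call as before, now scored with root-mean-square error
double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);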
Also, the null parameter to evaluate() could instead be an instance of DataModelBuilder, which can be used to control how the training DataModel is created from the training data. Normally the default is fine; it may not be if you are using a specialized implementation of DataModel in your deployment. A DataModelBuilder is how you would inject it into the evaluation process.
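As a rough sketch, a DataModelBuilder that simply wraps the training preferences in a GenericDataModel, much like the default behavior, would look like the following; a specialized deployment would return its own DataModel implementation instead:

DataModelBuilder modelBuilder = new DataModelBuilder() {
  public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
    // Substitute a specialized DataModel implementation here if needed
    return new GenericDataModel(trainingData);
  }
};

double score = evaluator.evaluate(builder, modelBuilder, model, 0.7, 1.0);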
The 1.0 parameter at the end controls how much of the overall input data is used. Here it means "100%." This can be used to produce a quicker, if less accurate, evaluation by using only a little of a potentially huge data set. For example, 0.1 would mean 10% of the data is used and 90% is ignored. This is quite useful when rapidly testing small changes to a Recommender.
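For instance, this sketch, again reusing the variables from the earlier listing, evaluates against only 10% of the data:

// Quicker, rougher evaluation: use only 10% of the input data
double quickScore = evaluator.evaluate(builder, null, model, 0.7, 0.1);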
2.4 Evaluating precision and recall
We could also take a broader view of the recommender problem: we don't have to estimate preference values in order to produce recommendations. It's not always necessary to present estimated preference values to users; in many cases, all we want is an ordered list of recommendations, from best to worst. In fact, in some cases we don't care much about the exact ordering of the list: a set of a few good recommendations is fine.
Taking this more general view, we could also apply classic information retrieval metrics to evaluate recommenders: precision and recall. These terms are typically applied to things like search engines, which return some set of best results for a query out of many possible results.