    builder, null, model, 0.7, 1.0);   C
System.out.println(score);

A Used only in examples to get repeatable results
B Builds the same Recommender as above
C Use 70% of the data to train; test with the other 30%
Most of the action happens in evaluate(). Inside, the RecommenderEvaluator handles splitting the data into a training set and a test set, builds a new training DataModel and Recommender to test, and compares its estimated preferences to the actual test data. Note that no Recommender is passed to this method. That's because, internally, the method needs to build a Recommender around a newly created training DataModel. So the caller must instead provide an object that can build a Recommender from a DataModel: a RecommenderBuilder. Here, it builds the same implementation that was tried earlier in this chapter.
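To make that concrete, here is a minimal sketch of how such a builder and the evaluation call fit together. It assumes model is the DataModel built earlier in the chapter and rebuilds the same user-based Recommender tried there; adjust the similarity and neighborhood size to match your own setup.

RecommenderBuilder builder = new RecommenderBuilder() {
  @Override
  public Recommender buildRecommender(DataModel model) throws TasteException {
    // Rebuild the same user-based recommender used earlier in the chapter
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    return new GenericUserBasedRecommender(model, neighborhood, similarity);
  }
};

RecommenderEvaluator evaluator =
    new AverageAbsoluteDifferenceRecommenderEvaluator();
double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);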
2.3.3 Assessing the result
This program prints the result of the evaluation: a score indicating how well the Recommender performed. In this case you should simply see: 1.0. Even though a lot of randomness is used inside the evaluator to choose test data, the result should be consistent because of the call to RandomUtils.useTestSeed(), which forces the same random choices each time. This is used only in such examples, and in unit tests, to guarantee repeatable results. Don't use it in your real code.
What this value means depends on the evaluator implementation used, here AverageAbsoluteDifferenceRecommenderEvaluator. A result of 1.0 from this implementation means that, on average, the recommender's estimated preference deviates from the actual preference by 1.0.

A value of 1.0 isn't great on a scale of 1 to 5, but there is very little data here to begin with. Your results may differ because the data set is split randomly, and hence the training and test sets may differ with each run.
This technique can be applied to any Recommender and DataModel. To use root-mean-square scoring instead, replace AverageAbsoluteDifferenceRecommenderEvaluator with the implementation RMSRecommenderEvaluator.
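The swap is a one-line change; assuming the same builder and model as before, the rest of the evaluate() call stays as it is:

// Root-mean-square scoring penalizes large estimation errors more heavily
RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
double rmse = evaluator.evaluate(builder, null, model, 0.7, 1.0);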
Also, the null parameter to evaluate() could instead be an instance of DataModelBuilder, which can be used to control how the training DataModel is created from the training data. Normally the default is fine; it may not be if you are using a specialized implementation of DataModel in your deployment. A DataModelBuilder is how you would inject it into the evaluation process.
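Such a builder is small. A sketch might look like the following, where MyDataModel is a hypothetical stand-in for your specialized DataModel implementation:

DataModelBuilder modelBuilder = new DataModelBuilder() {
  @Override
  public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
    // Wrap the training preferences in the specialized DataModel;
    // by default the evaluator would use a GenericDataModel here
    return new MyDataModel(trainingData);  // hypothetical implementation
  }
};

double score = evaluator.evaluate(builder, modelBuilder, model, 0.7, 1.0);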
The 1.0 parameter at the end controls how much of the overall input data is used. Here it means 100%. This can be used to produce a quicker, if less accurate, evaluation by using only a little of a potentially huge data set. For example, 0.1 would mean 10% of the data is used and 90% is ignored. This is quite useful when rapidly testing small changes to a Recommender.
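For instance, a quick check on a tenth of the data only changes that last argument; the 70/30 training/test split within the sampled data stays the same:

// Use only 10% of the data for a faster, rougher evaluation
double quickScore = evaluator.evaluate(builder, null, model, 0.7, 0.1);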
2.4 Evaluating precision and recall
We could also take a broader view of the recommender problem: it's not strictly necessary to estimate preference values in order to produce recommendations. It's not always essential to present estimated preference values to users. In many cases, just an ordered list of recommendations, from best to worst, is sufficient. In fact, in some cases the exact ordering of the list doesn't matter much; a set of a few good recommendations is fine.
Taking this more general view, we could also apply classic information retrieval metrics to evaluate recommenders: precision and recall. These terms are typically applied to things like search engines, which return some set of best results for a query out of many possible results.

A search engine should not return irrelevant results among its top results, although it should strive to return as many relevant results as possible. "Precision" is the proportion of top results that are relevant, for some definition of relevant. "Precision at 10" would be this proportion judged from the top 10 results. "Recall" is the proportion of all relevant results included in the top results. See figure 2.3 for a visualization of these ideas.
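Stated as simple ratios over the top N results, for whatever definition of relevance you choose:

precision at N = (relevant results among the top N) / N
recall at N = (relevant results among the top N) / (all relevant results)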