2009-05-08 17:37:18,218 INFO Generator - Generator: Selecting best-scoring urls due for fetch.
2009-05-08 17:37:18,625 INFO Generator - Generator: starting
2009-05-08 17:37:18,937 INFO Generator - Generator: segment: 20090508/segments/20090508173137
2009-05-08 17:37:19,468 INFO Generator - Generator: filtering: true
2009-05-08 17:37:22,312 INFO Generator - Generator: topN: 50
2009-05-08 17:37:51,203 INFO Generator - Generator: jobtracker is 'local', generating exactly one partition.
2009-05-08 17:39:57,609 INFO JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 17:40:05,234 WARN JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 17:40:05,406 WARN JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2009-05-08 17:40:05,437 INFO FileInputFormat - Total input paths to process : 1
2009-05-08 17:40:06,062 INFO FileInputFormat - Total input paths to process : 1
2009-05-08 17:40:06,109 INFO MapTask - numReduceTasks: 1
(Plugin loading log entries omitted...)
2009-05-08 17:40:06,312 INFO Configuration - found resource crawl-urlfilter.txt at file:/D:/work/workspace/nutch_crawl/bin/crawl-urlfilter.txt
2009-05-08 17:40:06,343 INFO FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2009-05-08 17:40:06,343 INFO AbstractFetchSchedule - defaultInterval=2592000
2009-05-08 17:40:06,343 INFO AbstractFetchSchedule - maxInterval=7776000
2009-05-08 17:40:06,343 INFO MapTask - io.sort.mb = 100
2009-05-08 17:40:06,437 INFO MapTask - data buffer = 79691776/99614720
2009-05-08 17:40:06,437 INFO MapTask - record buffer = 262144/327680
2009-05-08 17:40:06,453 WARN RegexURLNormalizer - can't find rules for scope 'partition', using default
2009-05-08 17:40:06,453 INFO MapTask - Starting flush of map output
2009-05-08 17:40:06,625 INFO MapTask - Finished spill 0
2009-05-08 17:40:06,640 INFO TaskRunner - Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
2009-05-08 17:40:06,640 INFO LocalJobRunner - file:/D:/work/workspace/nutch_crawl/20090508/crawldb/current/part-00000/data:0+143
2009-05-08 17:40:06,640 INFO TaskRunner - Task 'attempt_local_0003_m_000000_0' done.
2009-05-08 17:40:06,656 INFO LocalJobRunner -
2009-05-08 17:40:06,656 INFO Merger - Merging 1 sorted segments
2009-05-08 17:40:06,656 INFO Merger - Down to the last merge-pass, with 1 segments left of total size: 78 bytes
2009-05-08 17:40:06,656 INFO LocalJobRunner -
(Plugin loading log entries omitted...)