"Training an N-gram Language Model with Spark RDDs: Implementation and Application Cases"

Updated 2024-04-12 · PDF, 1.46 MB
The document "藏经阁-Custom applications.pdf" presents a talk by Tejas Patil of Facebook on building custom applications with Spark's Resilient Distributed Datasets (RDDs). The presentation covers the use case, real-world applications, previous solutions, the Spark version, data skew, and performance evaluation. A specific example is training an N-gram language model, which predicts a word from the N-1 words that precede it. Real-world applications of this technology include auto-subtitling Facebook videos and detecting low-quality places, such as non-public places (e.g., "home sweet home") and non-real places (e.g., apartments). The document shows how custom applications built on RDDs can handle large-scale data processing and analysis in these scenarios, and it highlights performance evaluation as a key part of measuring whether such applications succeed.
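To make the N-gram training step concrete, here is a minimal single-machine sketch in Python. It is not code from the talk: it assumes whitespace tokenization, hypothetical `<s>`/`</s>` sentence-boundary markers, and unsmoothed maximum-likelihood estimates, and every function name is illustrative. The counting phase follows the usual RDD pattern: a `flatMap` from sentences to n-grams, then a `reduceByKey` that sums the counts for each n-gram. On Spark, `extract_ngrams` could be passed to `rdd.flatMap(...)` and followed by `.map(lambda g: (g, 1)).reduceByKey(operator.add)`.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Ngram = Tuple[str, ...]


def extract_ngrams(sentence: str, n: int) -> List[Ngram]:
    """'flatMap' step: split one sentence into its n-grams.

    Assumption: lowercase whitespace tokens, padded with hypothetical
    <s> (start) and </s> (end) markers so every word has a full context.
    """
    tokens = ["<s>"] * (n - 1) + sentence.lower().split() + ["</s>"]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences: Iterable[str], n: int) -> Counter:
    """'map' + 'reduceByKey' step: emit each n-gram once and sum per key."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(extract_ngrams(sentence, n))
    return counts


def build_model(sentences: Iterable[str], n: int) -> Dict[Ngram, Dict[str, float]]:
    """Maximum-likelihood estimate P(w | context) = c(context, w) / c(context)."""
    ngram_counts = count_ngrams(sentences, n)
    # Total count of each (n-1)-word context, summed over all following words.
    context_totals: Counter = Counter()
    for gram, c in ngram_counts.items():
        context_totals[gram[:-1]] += c
    model: Dict[Ngram, Dict[str, float]] = defaultdict(dict)
    for gram, c in ngram_counts.items():
        model[gram[:-1]][gram[-1]] = c / context_totals[gram[:-1]]
    return dict(model)


def predict_next(model: Dict[Ngram, Dict[str, float]],
                 context: List[str], n: int) -> Optional[str]:
    """Return the most probable next word given the last n-1 context words,
    or None if that context never appeared in training (no smoothing)."""
    padded = ["<s>"] * (n - 1) + [w.lower() for w in context]
    key = tuple(padded[len(padded) - (n - 1):])
    candidates = model.get(key)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

For example, a bigram model (`n = 2`) built from `["the cat sat on the mat", "the cat ate", "a dog sat on the rug"]` gives P(cat | the) = 0.5, so `predict_next(model, ["the"], 2)` returns `"cat"`. A production model typically adds smoothing or backoff for unseen contexts, where this sketch simply returns `None`.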