"Training an N-gram Language Model with Spark RDDs: Implementation and Application Cases"
Updated 2024-04-12
1.46 MB PDF
The document "藏经阁-Custom applications.pdf" is a presentation by Tejas Patil of Facebook on building custom applications on top of Spark's Resilient Distributed Datasets (RDDs). It covers the use case, real-world applications, previous solutions, the Spark version, data skew, and performance evaluation. One specific example is training an N-gram language model, which predicts a word from the words that precede it. Real-world applications named in the presentation include auto-subtitling for Facebook videos and detecting low-quality places, such as non-public places, "home sweet home" entries, and non-real places like apartments. Performance evaluation is highlighted as the way to measure how well these custom applications work.
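To make the N-gram idea concrete, here is a minimal single-machine sketch in plain Python. It is not the presentation's implementation and does not use Spark; the corpus and the function names `ngrams`, `train`, and `predict` are made up for illustration. On Spark, the counting step would roughly correspond to emitting `((context, word), 1)` pairs from each sentence and summing them with `reduceByKey`.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def train(sentences, n=2):
    """Count, for each (n-1)-word context, how often each next word follows it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for gram in ngrams(tokens, n):
            context, word = gram[:-1], gram[-1]
            model[context][word] += 1
    return model


def predict(model, context):
    """Return the most frequent next word for a context, or None if unseen."""
    counts = model.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train(corpus, n=2)

print(predict(model, ["the"]))    # most frequent word after "the"
print(predict(model, ["sat"]))    # most frequent word after "sat"
print(predict(model, ["zebra"]))  # unseen context
```

Keeping the counts keyed by context is a deliberate choice for the sketch: prediction becomes a single lookup, and the same `(context, word)` keying is what a distributed count-by-key would aggregate on.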