Spark Summit 2013: Migrating Hadoop Streaming to Spark Without Code Changes to Speed Up User Model Updates

PDF (3.93 MB), updated 2024-07-23
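At Spark Summit 2013, Gavin Li, Jaebong Kim, and Andy Feng of Yahoo presented the migration of the AEX (Audience Expansion) pipeline from Hadoop Streaming to Spark. Their goal was to refresh user models at least twice as fast, and the talk focused on how they carried out the migration without any code changes.

They first introduced Yahoo Audience Expansion, the application being moved to Spark. It uses machine learning to expand a target audience: models are trained on a sample of users and then used to find potential users whose behavior resembles that of the sample. The system depends on large-scale machine learning, including logistic regression, and processes terabytes of data, with very large input and intermediate datasets. In the Hadoop pipeline this process ran more than 30,000 map tasks and 2,000 reduce tasks across about 20 Hadoop Streaming jobs and took as long as 16 hours.

The main reasons for moving to Spark were lower latency and lower cost. The Hadoop Streaming architecture had performance bottlenecks, especially in IO-intensive steps such as the labeling stage, which took 6-7 hours and generated 17 TB of intermediate IO. Spark's higher concurrency and in-memory computation substantially reduced the cost of these steps, so the whole process could finish in much less time.

In Spark they implemented feature extraction, which pulls useful information out of raw events and is the CPU-intensive part of the model-training stage. Logistic regression served as the main machine-learning model, and this step also ran efficiently on Spark. Spark's parallel processing markedly improved the efficiency of the whole AEX pipeline, letting even large-scale processing complete sooner and making the move from Hadoop Streaming to Spark pay off.

In short, Li, Kim, and Feng showed how Yahoo used a migration strategy built on Spark's scalability, performance, and in-memory computing to optimize its audience-growth engine, speeding up data processing and lowering operating costs. The talk illustrated Spark's potential for large-scale, complex data analysis and its rise toward becoming a new standard in big data processing.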
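The summary does not describe the mechanism behind the zero-code-change migration. One mechanism that could make it possible is Spark's `RDD.pipe`, which feeds each partition's elements, one per line, to a forked external process and returns that process's output lines as a new RDD. This is the same line-oriented stdin/stdout contract a Hadoop Streaming mapper already follows. The sketch below is an assumption, not the speakers' implementation: it emulates that contract locally with Python's `subprocess` module so it runs without a Spark cluster, and the word-count `MAPPER_SCRIPT` and the helpers `pipe_partition` and `run_local` are hypothetical stand-ins.

```python
import subprocess
import sys

# Hypothetical existing Hadoop Streaming mapper: reads lines on stdin and
# emits "key\tvalue" lines on stdout. It is reused without modification.
MAPPER_SCRIPT = r'''
import sys
for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
'''


def pipe_partition(lines, command):
    """Emulate RDD.pipe for one partition: write each record as a line to the
    command's stdin and return its stdout lines as the new records."""
    result = subprocess.run(
        command,
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        check=True,  # raise if the external script exits with an error
    )
    return result.stdout.splitlines()


def run_local(partitions, command):
    """Apply the same external command to every partition. Spark would do this
    in parallel on executors; here it runs sequentially for illustration."""
    return [pipe_partition(p, command) for p in partitions]


if __name__ == "__main__":
    mapper_cmd = [sys.executable, "-c", MAPPER_SCRIPT]
    partitions = [["a b a"], ["b c"]]
    for i, out in enumerate(run_local(partitions, mapper_cmd)):
        print(i, out)
```

On a real cluster the analogous PySpark call would be roughly `sc.textFile(input_path).pipe("python mapper.py")`, where `input_path` and `mapper.py` are placeholders and the script must be present on every worker node. Reusing a streaming reducer the same way would additionally require grouping and sorting records by key before piping, because Hadoop's shuffle used to provide that ordering.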

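The summary names logistic regression over features extracted from raw events as the core model. As a minimal single-machine sketch of what that training step computes (not Yahoo's code), the snippet below fits a logistic regression by batch gradient descent on the mean log-loss. The feature vectors, labels, learning rate `lr`, and `epochs` are made-up illustration values. In a distributed job, the per-example gradient sum is the part that would be spread across partitions.

```python
import math


def sigmoid(z):
    # Logistic function written to avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on mean log-loss.
    xs: list of feature vectors; ys: labels in {0, 1}."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        # Accumulate the per-example gradient (p - y) * x; this sum is what a
        # distributed implementation would aggregate across partitions.
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical per-user features, e.g. [visited_topic_pages, clicked_ads].
    xs = [[1, 1], [1, 0], [0, 1], [0, 0]]
    ys = [1, 1, 0, 0]  # 1 = behaves like the sample users
    w, b = train_logreg(xs, ys)
    print(w, b, predict_proba(w, b, [1, 0]))
```

In PySpark, the inner loop would typically become a map over examples followed by an aggregation such as `RDD.treeAggregate`, with the weight update kept on the driver. Spark's MLlib also provides a ready-made `LogisticRegression` estimator, though the summary does not say which approach the team used.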
Shifts in China’s Rural and Urban Population: 2000-2020

The bar chart clearly reveals that from 2000 to 2020, while China’s total population grew moderately from 1.25 billion to 1.41 billion, its urban and rural populations shifted dramatically in opposite directions. The urban population rose from 450 million in 2000 to 670 million in 2010 and 900 million in 2020; the rural population, by contrast, fell from 800 million in 2000 to 680 million in 2010 and 510 million in 2020. By 2020, city dwellers therefore outnumbered rural residents, reversing the gap of 2000.

This shift resulted largely from the joint effects of urbanization, unequal economic opportunities in rural and urban areas, and the expansion of higher education. In the first place, cities sprawled on a large scale during this period. Places that had been part of the vast countryside were incorporated into cities, so that hundreds of millions of rural dwellers were passively reclassified as urban residents. What’s more, while urban living standards improved greatly in these years, rural areas gained few economic opportunities and most peasant families remained near the poverty line. Poverty created a demand for change, leading large numbers of healthy young peasants to leave their hometowns and flock to cities in search of a better living. Last but not least, China’s higher education grew at an unprecedented rate in these years. More high school graduates than ever before entered colleges and universities, and most of them preferred to stay in urban areas after graduation for their personal development.

The increase in urban population was a clear indication of China’s economic and educational achievements. It benefited the country in many ways: it relieved the labor shortage in cities, eased peasants’ burden of supporting their families, and gave young people from rural areas more opportunities to display their talents. However, the migration of rural residents into urban areas inevitably brought drawbacks as well. Some of them, such as the waste of arable land and left-behind children in the countryside, as well as traffic congestion and soaring housing prices in cities, have already drawn the government’s attention, and corresponding measures have begun to take effect. Others, especially the difficulty many peasants face in integrating into urban life because of their limited education and unfamiliarity with urban norms, have long been neglected. In this sense, we should not be satisfied with the superficially optimistic figures in the chart. Instead, we should foster the integration of these newcomers by giving them adequate educational and cultural support, so that they can gain easier access to the prosperity and convenience of urban life and contribute more fully to the development of cities.

(Request: translate this into an English article of about 200 words.)

Uploaded 2023-02-21