Large-Scale Machine Learning
- Experiments and Practice with Spark MLlib
Lessons learned:
Challenges:
100-million-dimensional feature space
• Too many RDD unions >> StackOverflowError
• Driver out of memory >> spark.driver.maxResultSize
• Model AUC=0.5 >> lower learning rate
• Size exceeds Integer.MAX_VALUE >> keep each partition under 2 GB
• Shuffle fetch failed >> spark.local.dir
• Shuffle fetch failed >> JVM GC tuning
• Shuffle fetch failed >> spark.network.timeout
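The fixes above mostly come down to a few configuration keys and a partition-count calculation. The following is a minimal sketch in plain Python (no Spark installation needed) of how they might be written down. The configuration keys are real Spark settings, but every value, the directory paths, and the helper names `min_partitions` and `to_submit_args` are assumptions for illustration, not settings taken from the slides.

```python
import math

# A single Spark block cannot exceed Integer.MAX_VALUE bytes (about 2 GiB).
MAX_BLOCK_BYTES = 2**31 - 1


def min_partitions(total_bytes: int, target_bytes: int = 128 * 1024**2) -> int:
    """Partition count that keeps the *average* partition at or below target_bytes.

    Hypothetical helper: skewed data can still push one partition past the
    limit, so the target is kept well below 2 GiB.
    """
    if not 0 < target_bytes <= MAX_BLOCK_BYTES:
        raise ValueError("target_bytes must be in (0, Integer.MAX_VALUE]")
    return max(1, math.ceil(total_bytes / target_bytes))


# Illustrative values only; tune them for the actual cluster.
SPARK_SETTINGS = {
    # Driver out of memory when collecting large results
    "spark.driver.maxResultSize": "4g",
    # Shuffle fetch failed: spread shuffle files over several local disks
    "spark.local.dir": "/data1/spark,/data2/spark",
    # Shuffle fetch failed: tolerate long GC pauses and slow peers
    "spark.network.timeout": "600s",
    # Shuffle fetch failed: one possible GC adjustment
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
}


def to_submit_args(settings: dict) -> list:
    """Turn a settings dict into spark-submit style ``--conf key=value`` arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    one_tib = 1024**4
    print(min_partitions(one_tib))
    print(" ".join(to_submit_args(SPARK_SETTINGS)))
```

The result of `min_partitions` would typically be passed to `repartition` or used as the shuffle partition count. For the union problem, one commonly used workaround is to combine all RDDs in a single `SparkContext.union(rdds)` call instead of chaining pairwise `union` calls, which keeps the lineage shallow; the slides do not say which fix was applied.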
Huge parameter matrix
• Memory overhead
• Network overhead
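A back-of-the-envelope calculation shows why a parameter matrix at this scale is costly. The sketch below assumes dense double-precision weights and a deliberately simplified communication model (one full copy of the weights sent to each executor and one gradient of the same size sent back per iteration). The function names and the executor count are hypothetical, and the model ignores sparsity, compression, and tree aggregation.

```python
BYTES_PER_DOUBLE = 8


def dense_param_bytes(num_features: int, num_outputs: int = 1) -> int:
    """Memory needed for a dense float64 parameter matrix."""
    return num_features * num_outputs * BYTES_PER_DOUBLE


def per_iteration_traffic_bytes(param_bytes: int, num_executors: int) -> int:
    """Simplified model: weights go out to every executor, a gradient comes back."""
    return 2 * param_bytes * num_executors


if __name__ == "__main__":
    params = dense_param_bytes(100_000_000)            # 10^8 features, one output
    traffic = per_iteration_traffic_bytes(params, 100)  # assumed 100 executors
    print(f"parameters: {params / 1e9:.1f} GB")
    print(f"traffic per iteration: {traffic / 1e9:.1f} GB")
```

Under these assumptions a single weight vector already takes 0.8 GB on the driver and on every executor, and each iteration moves on the order of 160 GB across the network.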