Traceback (most recent call last):
  File "/home/atguigu/tmdb/analyst.py", line 151, in <module>
    save(base + m[k]["path"], json.dumps(f()))
  File "/home/atguigu/tmdb/analyst.py", line 29, in countByGenres
    res = countByJson("genres").collect()
  File "/home/atguigu/tmdb/analyst.py", line 23, in countByJson
    return mdf.select(field).filter(mdf[field] != ' ').rdd.flatMap(lambda g: [(v, 1) for v in map(lambda x: x['name'], json.loads(g[field]))]).repartition(1).reduceByKey(lambda x, y: x + y)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1625, in reduceByKey
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1853, in combineByKey
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2263, in _defaultReducePartitions
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 391, in getNumPartitions
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'Invalid start or len parameter'
Posted: 2023-08-16 16:07:32 Views: 124
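This is an error raised while running reduceByKey on an RDD in PySpark. The traceback shows that it does not come from the Python lambdas themselves: reduceByKey calls combineByKey, which calls getNumPartitions to choose a default partition count, and that call crosses into the JVM through py4j, where it fails with IllegalArgumentException: 'Invalid start or len parameter'. The message suggests that an invalid argument is being passed somewhere upstream of this call. Check that the operations that build this RDD are correct, for example by filtering out invalid rows or preprocessing the data before parsing it. Work through the chain on line 23 of analyst.py one step at a time (select, filter, flatMap, repartition, reduceByKey) to find the stage that triggers the failure, and rule out causes one by one.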
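As one way to act on the advice about filtering out invalid data, the per-row parsing can be moved out of the inline lambda into a small function that tolerates blank or malformed cells. This is only a sketch: `extract_names` is a hypothetical helper name, the sample rows are made up for illustration, and nothing here is confirmed to resolve the IllegalArgumentException itself. It only makes the parsing step easy to test in plain Python without Spark.

```python
import json
from collections import Counter


def extract_names(raw):
    """Return [(name, 1), ...] for one raw JSON cell, or [] if the cell is unusable.

    Hypothetical helper: stands in for the inline lambda passed to flatMap.
    """
    # Skip missing or whitespace-only cells instead of handing them to json.loads.
    if raw is None or not str(raw).strip():
        return []
    try:
        items = json.loads(raw)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return []
    if not isinstance(items, list):
        return []
    # Keep only well-formed entries that actually carry a "name" key.
    return [(item["name"], 1) for item in items
            if isinstance(item, dict) and "name" in item]


# Made-up sample cells: two valid, one blank, one missing, one malformed.
rows = [
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
    ' ',
    None,
    'not json',
    '[{"id": 28, "name": "Action"}]',
]

# Local stand-in for reduceByKey(lambda x, y: x + y).
counts = Counter()
for raw in rows:
    for name, n in extract_names(raw):
        counts[name] += n

print(dict(counts))  # {'Action': 2, 'Adventure': 1}
```

In the original pipeline the same function could be passed as `flatMap(lambda g: extract_names(g[field]))`, which would leave the rest of the chain unchanged while ruling out bad rows as the cause.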