An e-commerce site has a dataset of users' item favorites that records the id of each favorited item and the date it was favorited. The dataset is named buyer_favorite1. It contains three fields: buyer id, item id, and favorite date. Fields are separated by spaces. Sample data and format:

10181 1000481 2010-04-04 16:54:31
20001 1001597 2010-04-07 15:07:52
20001 1001560 2010-04-07 15:08:27
20042 1001368 2010-04-08 08:20:30
20067 1002061 2010-04-08 16:45:33
20056 1003289 2010-04-12 10:50:55
20056 1003290 2010-04-12 11:57:35
20056 1003292 2010-04-12 12:05:29
20054 1002420 2010-04-14 15:24:12
20055 1001679 2010-04-14 19:46:04
20054 1010675 2010-04-14 15:23:53
20054 1002429 2010-04-14 17:52:45
20076 1002427 2010-04-14 19:35:39
20054 1003326 2010-04-20 12:54:44
20056 1002420 2010-04-15 11:24:49
20064 1002422 2010-04-15 11:35:54
20056 1003066 2010-04-15 11:43:01
20056 1003055 2010-04-15 11:43:06
20056 1010183 2010-04-15 11:45:24
20056 1002422 2010-04-15 11:45:49
20056 1003100 2010-04-15 11:45:54
20056 1003094 2010-04-15 11:45:57
20056 1003064 2010-04-15 11:46:04
20056 1010178 2010-04-15 16:15:20
20076 1003101 2010-04-15 16:37:27
20076 1003103 2010-04-15 16:37:05
20076 1003100 2010-04-15 16:37:18
20076 1003066 2010-04-15 16:37:31
20054 1003103 2010-04-15 16:40:14
20054 1003100 2010-04-15 16:40:16

Task: using the material above, write a MapReduce program that reports which items each user has favorited. Expected output (the literal text means "user 10181 favorited 1000481"):

用户10181收藏了1000481
用户20001收藏了1001597,1001560
Time: 2024-02-15 22:03:38
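First, the data has to be preprocessed: the buyer id and item id are extracted from each line and used as the input to the MapReduce job. Note that the favorite date itself contains a space between the date and the time, so each line splits into four whitespace-separated tokens, of which only the first two are needed.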
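The map function takes one line of data as input and emits a (key, value) pair, where the key is the buyer id and the value is the item id. The reduce function receives (key, list(values)), where the key is a buyer id and list(values) holds every item id that buyer has favorited. It emits (key, str(values)), where str(values) is those item ids joined into a single comma-separated string.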
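As a rough illustration of the data flow just described, the map, shuffle, and reduce steps can be sketched in a single Python process. This is only a minimal sketch, not the streaming scripts themselves: the function names `map_record`, `shuffle`, and `reduce_group` are hypothetical, and the in-memory dictionary stands in for the grouping that a MapReduce framework would perform between the two phases.

```python
from collections import defaultdict

def map_record(line):
    """Map step: turn one raw record into a (buyer_id, item_id) pair."""
    fields = line.split()
    return fields[0], fields[1]

def shuffle(pairs):
    """Shuffle step (assumed stand-in): group values by key in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(buyer_id, item_ids):
    """Reduce step: join all item ids of one buyer with commas."""
    return buyer_id, ",".join(item_ids)

# Three records taken from the sample data in the question.
sample = [
    "10181 1000481 2010-04-04 16:54:31",
    "20001 1001597 2010-04-07 15:07:52",
    "20001 1001560 2010-04-07 15:08:27",
]

grouped = shuffle(map_record(line) for line in sample)
results = [reduce_group(buyer_id, items) for buyer_id, items in grouped.items()]
for buyer_id, items in results:
    print(buyer_id, items)
```

Because the dictionary keeps insertion order, the items of each buyer appear here in input order rather than sorted order.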
The code for the MapReduce program is as follows:
Mapper:
```python
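import sys

for line in sys.stdin:
    fields = line.split()
    # Each record is "buyer_id item_id date time"; skip blank or malformed lines.
    if len(fields) < 2:
        continue
    buyer_id, item_id = fields[0], fields[1]
    # Emit a tab-separated key/value pair keyed by buyer id.
    print(buyer_id + '\t' + item_id)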
```
Reducer:
```python
import sys

prev_buyer_id = None
item_ids = []

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    buyer_id, item_id = line.split('\t')
    # Input is sorted by key, so a new buyer id means the previous group is complete.
    if prev_buyer_id is not None and prev_buyer_id != buyer_id:
        print(prev_buyer_id + '\t' + ','.join(item_ids))
        item_ids = []
    prev_buyer_id = buyer_id
    item_ids.append(item_id)

# Flush the group of the last buyer.
if prev_buyer_id is not None:
    print(prev_buyer_id + '\t' + ','.join(item_ids))
```
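Run the MapReduce program locally from the command line, with `sort` standing in for the shuffle between the map and reduce steps: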
```bash
cat buyer_favorite1 | python mapper.py | sort | python reducer.py > output.txt
```
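The `sort` stage in this pipeline matters because the reducer only compares each key with the previous one, so all lines for a buyer must be adjacent. The sketch below illustrates that dependency under some assumptions: `reduce_stream` is a hypothetical helper that uses `itertools.groupby` in place of the reducer's manual bookkeeping, and the three input lines are taken from the sample data.

```python
from itertools import groupby

def reduce_stream(lines):
    """Group consecutive lines by buyer id, as a streaming reducer would."""
    pairs = (line.split("\t") for line in lines)
    return [
        (buyer_id, ",".join(item for _, item in group))
        for buyer_id, group in groupby(pairs, key=lambda pair: pair[0])
    ]

# Mapper output in input order: buyer 20054 is interrupted by buyer 20055.
mapper_output = ["20054\t1002420", "20055\t1001679", "20054\t1010675"]

unsorted_result = reduce_stream(mapper_output)
sorted_result = reduce_stream(sorted(mapper_output))

print(unsorted_result)  # buyer 20054 is split into two separate groups
print(sorted_result)    # buyer 20054 forms a single group
```

Sorting whole lines also orders the item ids within each buyer, which is why the final output lists items in ascending order rather than in the order they were favorited.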
Output:
```
10181 1000481
20001 1001560,1001597
20042 1001368
20054 1002420,1002429,1003100,1003103,1003326,1010675
20055 1001679
20056 1002420,1002422,1003055,1003064,1003066,1003094,1003100,1003289,1003290,1003292,1010178,1010183
20064 1002422
20067 1002061
20076 1002427,1003066,1003100,1003101,1003103
```
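The question asks for lines of the form 用户10181收藏了1000481, whereas the reducer prints the buyer id and the item list separated by a tab. One possible way to bridge the gap is a small post-processing step. This is a sketch only: `format_line` is a hypothetical helper, and it assumes each reducer line contains exactly one tab.

```python
def format_line(reducer_line):
    """Turn 'buyer_id<TAB>items' into the wording requested in the question."""
    buyer_id, items = reducer_line.rstrip("\n").split("\t")
    return "用户" + buyer_id + "收藏了" + items

# Two lines as the reducer would print them (tab-separated).
reducer_output = ["10181\t1000481", "20001\t1001560,1001597"]

formatted = [format_line(line) for line in reducer_output]
for text in formatted:
    print(text)
```

The item order for buyer 20001 differs from the expected output in the question (1001597,1001560) because the `sort` step orders the items as well. The same string could instead be built directly in the reducer's `print` calls.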