Write code that crawls a website's hot search terms and stores them in HBase
Date: 2024-02-29 10:53:09
Sure. Below is a sample script that crawls Baidu's hot search terms and stores them in HBase:
```python
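import requests
from bs4 import BeautifulSoup
import happybase

# Connect to HBase through its Thrift server (HappyBase speaks Thrift; 9090 is the default port)
connection = happybase.Connection('localhost', port=9090)
table_name = 'hot_search'
# connection.tables() returns table names as bytes, so compare against the encoded name
if table_name.encode('utf-8') not in connection.tables():
    connection.create_table(table_name, {'data': dict()})
table = connection.table(table_name)

# Send the HTTP request and parse the page
url = 'https://www.baidu.com/'
# A browser-like User-Agent is sent because the default requests one may receive a reduced page
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the hot search entries. The class name depends on Baidu's current markup and may
# change; it is matched by class only, since the element carrying it may not be an <a> tag.
hot_words = soup.find_all(class_='title-content-title')
for i, word in enumerate(hot_words):
    # Store each term in HBase; row key, column name, and value are sent as bytes
    word_text = word.text.strip()
    table.put(str(i).encode('utf-8'), {b'data:word': word_text.encode('utf-8')})

# Close the connection
connection.close()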
```
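The loop above uses the list index as the row key, so every run writes to the same rows as the previous one, and because HBase orders rows lexicographically by key, `10` sorts before `2`. One possible alternative is sketched below, assuming a key built from the fetch time plus a zero-padded rank. The function name `make_row_key` and the key layout are illustrative assumptions, not part of the original answer.

```python
from datetime import datetime


def make_row_key(fetched_at: datetime, rank: int) -> bytes:
    """Build a row key such as b'20240229105309-003' (hypothetical layout)."""
    # Fixed-width fields make byte-wise (lexicographic) order match
    # chronological order first and rank order second.
    return f"{fetched_at:%Y%m%d%H%M%S}-{rank:03d}".encode("utf-8")


if __name__ == "__main__":
    fetched_at = datetime(2024, 2, 29, 10, 53, 9)
    keys = [make_row_key(fetched_at, rank) for rank in (10, 2, 1)]
    # Sorting the keys as bytes returns them in rank order.
    print(sorted(keys))
    # [b'20240229105309-001', b'20240229105309-002', b'20240229105309-010']
```

With a key like this, the `table.put` call could take `make_row_key(fetched_at, i)` in place of the plain index, so that each crawl keeps its own set of rows.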
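Note that before running this code you need to install and configure HBase and start its Thrift server, since HappyBase connects through Thrift (port 9090 here). You also need to install the happybase, requests, and beautifulsoup4 libraries.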
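Once everything is installed and the script has run, one way to check the result is to scan the table and print what was stored. The following is a minimal sketch under the same assumptions as the script (table `hot_search`, column `data:word`, Thrift server on `localhost:9090`); the helper names `read_hot_words` and `dump_hot_words` are hypothetical.

```python
def read_hot_words(table, limit=50):
    """Return (row_key, word) pairs from a table that behaves like happybase.Table."""
    results = []
    # Table.scan yields (row_key, data) tuples; keys and values arrive as bytes.
    for row_key, data in table.scan(columns=[b"data:word"], limit=limit):
        word = data.get(b"data:word", b"").decode("utf-8")
        results.append((row_key.decode("utf-8"), word))
    return results


def dump_hot_words(host="localhost", port=9090, table_name="hot_search"):
    """Print the stored terms; requires a running HBase Thrift server."""
    import happybase  # imported lazily so read_hot_words can be used without it

    connection = happybase.Connection(host, port=port)
    try:
        for key, word in read_hot_words(connection.table(table_name)):
            print(key, word)
    finally:
        connection.close()
```

Calling `dump_hot_words()` should list the stored rows. An empty result usually means either that the page markup did not match the selector or that the write step failed.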