Python code to scrape a NetEase Youdao Dictionary wordbook and save it to Excel
Posted: 2024-02-06 08:01:59
Below is Python code that scrapes a NetEase Youdao Dictionary wordbook and saves it to an Excel file:
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import openpyxl

# Scrape one wordbook page and return a list of [word, meaning] pairs
def spider(url):
    # Fetch the page
    response = requests.get(url)
    response.raise_for_status()
    # Parse the HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Word entries (skip the first item, which is the list header)
    words = soup.select('.wordbook-wordlist .wordbook-word-item .wordbook-word')[1:]
    # Part-of-speech and definition entries
    meanings = soup.select('.wordbook-wordlist .wordbook-word-item .wordbook-word-item-meanings')[1:]
    # Pair each word with its meaning
    wordbook = []
    for word_tag, meaning_tag in zip(words, meanings):
        word = word_tag.get_text().strip()
        meaning = meaning_tag.get_text().strip()
        wordbook.append([word, meaning])
    return wordbook

# Save the scraped pairs to an Excel file
def save_to_excel(wordbook, filename):
    # Create a new workbook
    workbook = openpyxl.Workbook()
    # Use the first (active) sheet
    sheet = workbook.active
    # Write the header row
    sheet.cell(row=1, column=1, value='单词')
    sheet.cell(row=1, column=2, value='词性和释义')
    # Write one row per word
    for i, (word, meaning) in enumerate(wordbook):
        sheet.cell(row=i + 2, column=1, value=word)
        sheet.cell(row=i + 2, column=2, value=meaning)
    # Save the workbook to disk
    workbook.save(filename)

if __name__ == '__main__':
    wordbook_url = 'http://dict.youdao.com/wordbook/wordlist'
    response = requests.get(wordbook_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Collect links to the individual wordbooks
    wordbook_links = soup.select('.wordbook-wrap .wordbook-title .wordbook-title-name a')
    for link in wordbook_links:
        # Links may be relative, so resolve them against the page URL
        href = urljoin(wordbook_url, link['href'])
        title = link.get_text().strip()
        # Scrape this wordbook
        wordbook = spider(href)
        # Save it to its own Excel file
        save_to_excel(wordbook, f'{title}.xlsx')
```
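The selector-and-slice logic at the heart of `spider` can be tried offline against a small HTML fragment. The class names below (`wordbook-wordlist`, `wordbook-word`, etc.) are taken from the code above and are assumed to match the live page; the sketch only demonstrates how `select` plus the `[1:]` slice pairs words with meanings:

```python
from bs4 import BeautifulSoup

# A hand-written fragment mimicking the assumed wordbook markup;
# the first item plays the role of the list header that [1:] skips.
html = """
<div class="wordbook-wordlist">
  <div class="wordbook-word-item">
    <span class="wordbook-word">单词</span>
    <span class="wordbook-word-item-meanings">词性和释义</span>
  </div>
  <div class="wordbook-word-item">
    <span class="wordbook-word">apple</span>
    <span class="wordbook-word-item-meanings">n. 苹果</span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Same selectors as spider(), header item dropped by the slice
words = soup.select(".wordbook-wordlist .wordbook-word-item .wordbook-word")[1:]
meanings = soup.select(".wordbook-wordlist .wordbook-word-item .wordbook-word-item-meanings")[1:]
wordbook = [[w.get_text().strip(), m.get_text().strip()]
            for w, m in zip(words, meanings)]
```

Here `wordbook` ends up as `[['apple', 'n. 苹果']]`: the header row is excluded and each word is aligned with its meaning by position.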
This script scrapes the words, along with their parts of speech and definitions, from each Youdao Dictionary wordbook page, then saves them to Excel files. To use it, copy the code into a Python file, set `wordbook_url` to the URL of your wordbook page, and run the script. Make sure the required libraries (`requests`, `beautifulsoup4`, and `openpyxl`) are installed.
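To check the Excel layout that `save_to_excel` produces, you can write a tiny wordbook and read it back with `openpyxl.load_workbook`. This is a minimal sketch using a temporary file path; the sample entries are made up for illustration:

```python
import os
import tempfile
import openpyxl

# A tiny sample wordbook (made-up entries for the demo)
wordbook = [["apple", "n. 苹果"], ["run", "v. 跑"]]
path = os.path.join(tempfile.gettempdir(), "wordbook_demo.xlsx")

# Write it the same way save_to_excel above does
wb = openpyxl.Workbook()
sheet = wb.active
sheet.cell(row=1, column=1, value="单词")
sheet.cell(row=1, column=2, value="词性和释义")
for i, (word, meaning) in enumerate(wordbook):
    sheet.cell(row=i + 2, column=1, value=word)
    sheet.cell(row=i + 2, column=2, value=meaning)
wb.save(path)

# Read it back: row 1 is the header, the rest are word entries
rows = list(openpyxl.load_workbook(path).active.values)
```

After the round trip, `rows[0]` is the header tuple `('单词', '词性和释义')` and `rows[1:]` holds one tuple per word, which confirms the file opens cleanly in Excel with one word per row.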