Code to scrape all the data from https://yuedu.163.com/book/rank/sell/week/p/s20 using requests and re
Date: 2023-06-15 18:03:54
Below is code that uses requests and re to scrape all the data from https://yuedu.163.com/book/rank/sell/week/p/s20:
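```python
import re

import requests

url = 'https://yuedu.163.com/book/rank/sell/week/p/s20'
# Some sites reject requests that lack a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
response.encoding = response.apparent_encoding  # avoid garbled Chinese text
html = response.text

# re.S lets "." match newlines, so one <tr> row may span several lines.
# The class names (ranknum, title, author) are assumptions about the page markup.
pattern = re.compile(
    r'<tr>.*?<td class="ranknum">(.*?)</td>'
    r'.*?<td class="title"><a href="(.*?)".*?>(.*?)</a></td>'
    r'.*?<td class="author">(.*?)</td>.*?</tr>',
    re.S,
)

# Each match is a tuple of the four capture groups, in order.
for rank_num, book_url, book_name, author in pattern.findall(html):
    print(rank_num.strip(), book_url, book_name.strip(), author.strip())
```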
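If the page's HTML matches this pattern, running the code prints the rank, link, title, and author of every book on that page. If the markup differs, `findall` returns an empty list and nothing is printed.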
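To see what the pattern captures without hitting the live site, the sketch below runs the same regular expression against a small hand-written HTML fragment. The fragment, its URLs, and the helper name `parse_rank_rows` are hypothetical; the class names only mirror what the pattern assumes and are not taken from the real page, whose markup may differ.

```python
import re

# Same pattern as in the answer; re.S lets "." match newlines.
PATTERN = re.compile(
    r'<tr>.*?<td class="ranknum">(.*?)</td>'
    r'.*?<td class="title"><a href="(.*?)".*?>(.*?)</a></td>'
    r'.*?<td class="author">(.*?)</td>.*?</tr>',
    re.S,
)

# Hypothetical markup shaped to fit the pattern (not copied from the real site).
SAMPLE_HTML = """
<table>
<tr>
  <td class="ranknum">1</td>
  <td class="title"><a href="/source/abc" target="_blank">Book A</a></td>
  <td class="author">Author A</td>
</tr>
<tr>
  <td class="ranknum">2</td>
  <td class="title"><a href="/source/def" target="_blank">Book B</a></td>
  <td class="author">Author B</td>
</tr>
</table>
"""


def parse_rank_rows(html):
    """Return one dict per <tr> that matches PATTERN, with whitespace stripped."""
    rows = []
    for rank, href, title, author in PATTERN.findall(html):
        rows.append({
            "rank": rank.strip(),
            "url": href.strip(),
            "title": title.strip(),
            "author": author.strip(),
        })
    return rows


for row in parse_rank_rows(SAMPLE_HTML):
    print(row["rank"], row["url"], row["title"], row["author"])
```

The non-greedy `.*?` pieces keep each match inside a single row; `.*?>` after the `href` skips any extra attributes such as `target`.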
Related questions
Code to scrape all the data from https://yuedu.163.com/book/rank/sell/week/p1/s20
Below is Python 3 code that scrapes all the data from https://yuedu.163.com/book/rank/sell/week/p1/s20:
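```python
import requests
from bs4 import BeautifulSoup

url = 'https://yuedu.163.com/book/rank/sell/week/p1/s20'
headers = {'User-Agent': 'Mozilla/5.0'}  # browser-like UA; some sites block the default
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
response.encoding = response.apparent_encoding  # avoid garbled Chinese text
soup = BeautifulSoup(response.text, 'html.parser')


def text_of(parent, name, **attrs):
    """Return the stripped text of the first matching child, or '' if it is missing."""
    node = parent.find(name, **attrs)
    return node.get_text().strip() if node else ''


# Each book is assumed to sit in a <div class="book-info"> block.
for book in soup.find_all('div', class_='book-info'):
    print(text_of(book, 'h3'))                        # title
    print(text_of(book, 'p', class_='author'))        # author
    print(text_of(book, 'p', class_='category'))      # category
    print(text_of(book, 'p', class_='desc'))          # description
    print('-' * 50)
```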
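The code sends a GET request with the requests library to fetch the page, parses the HTML with BeautifulSoup, and locates every `div.book-info` block. It then prints each book's title, author, category, and description. The selectors assume this markup; if the page is structured differently, `find_all` returns an empty list and nothing is printed.
Note that the BeautifulSoup code only fetches the first page. To scrape more pages, change the page-number parameter in the URL (the `p1` segment appears to be the page number).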
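The BeautifulSoup answer depends on a third-party package. As a swapped-in alternative (not what the answer uses), the same extraction can be sketched with the standard library's `html.parser.HTMLParser`. The class names `book-info`, `author`, `category`, and `desc` come from the answer and are assumptions about the page; the class name `BookInfoParser` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class BookInfoParser(HTMLParser):
    """Collect h3 / p.author / p.category / p.desc text inside each div.book-info."""

    FIELDS = {"author", "category", "desc"}

    def __init__(self):
        super().__init__()
        self.books = []      # finished records
        self.current = None  # record being filled, or None outside div.book-info
        self.div_depth = 0   # <div> nesting depth inside the current book-info
        self.field = None    # field whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self.current is None and "book-info" in classes:
                self.current = {"title": "", "author": "", "category": "", "desc": ""}
                self.div_depth = 1
            elif self.current is not None:
                self.div_depth += 1
        elif self.current is not None:
            if tag == "h3":
                self.field = "title"
            elif tag == "p":
                match = self.FIELDS.intersection(classes)
                self.field = match.pop() if match else None

    def handle_endtag(self, tag):
        if self.current is None:
            return
        if tag in ("h3", "p"):
            self.field = None
        elif tag == "div":
            self.div_depth -= 1
            if self.div_depth == 0:  # closed the book-info block itself
                self.books.append({k: v.strip() for k, v in self.current.items()})
                self.current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. an <a> in the <h3>) is appended too.
        if self.current is not None and self.field:
            self.current[self.field] += data


# Hypothetical markup in the shape the answer assumes.
SAMPLE_HTML = """
<div class="book-info">
  <h3><a href="#">Book A</a></h3>
  <p class="author">Author A</p>
  <p class="category">Fiction</p>
  <p class="desc">A short blurb.</p>
</div>
"""

parser = BookInfoParser()
parser.feed(SAMPLE_HTML)
for book in parser.books:
    print(book["title"], book["author"], book["category"], book["desc"], sep="\n")
    print("-" * 50)
```

Tracking `<div>` depth lets the parser tolerate nested `<div>` elements inside a book block; BeautifulSoup handles this for you, which is the main reason to prefer it when installing a package is acceptable.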
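A minimal sketch of looping over several pages follows, assuming that `p{n}` in the path is the page number and that `s20` can be left unchanged (neither is confirmed by the site). It uses the standard library's `urllib.request` instead of requests, assumes UTF-8 pages, and the function names `page_urls`, `fetch`, and `crawl` are hypothetical. The fetch function is a parameter so the loop can be exercised without network access.

```python
import time
import urllib.request

# Assumption: "p{page}" is the page number; "s20" is kept as in the original URL.
URL_TEMPLATE = "https://yuedu.163.com/book/rank/sell/week/p{page}/s20"


def page_urls(first, last):
    """Build the (assumed) URL for each page from first to last, inclusive."""
    return [URL_TEMPLATE.format(page=n) for n in range(first, last + 1)]


def fetch(url, timeout=10):
    """Download one page; urlopen raises HTTPError on 4xx/5xx responses."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the pages are UTF-8 encoded.
        return resp.read().decode("utf-8", errors="replace")


def crawl(first, last, fetch_page=fetch, delay=1.0):
    """Fetch each page in turn, pausing between requests; returns {url: html}."""
    pages = {}
    for url in page_urls(first, last):
        pages[url] = fetch_page(url)
        time.sleep(delay)  # be polite to the server
    return pages
```

For example, `crawl(1, 3)` would download pages 1 to 3, and each HTML string could then be passed to either parsing step above.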