Python crawler: scraping the links of every element marked class="title" in a Douban page's source
Time: 2024-12-13 11:14:46
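In Python, the requests library can fetch a Douban page and BeautifulSoup can parse it to pull out specific information. The basic steps are as follows:
First, install the required libraries if you have not done so already, by running this on the command line: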
```bash
pip install beautifulsoup4 requests
```
Then write the Python code as follows:
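```python
import requests
from bs4 import BeautifulSoup


def get_links(url):
    """Return the href of every link that carries, or wraps, class="title"."""
    headers = {
        # A browser-like User-Agent; some sites reject the default requests one.
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        print(f"Request failed, status code: {response.status_code}")
        return []

    # html.parser ships with Python; 'lxml' would need a separate `pip install lxml`.
    soup = BeautifulSoup(response.text, 'html.parser')
    links = []
    for tag in soup.find_all(class_='title'):
        # Assumption: the tag is either an <a> itself or (e.g. a <span>) nested in one.
        a = tag if tag.name == 'a' else tag.find_parent('a')
        if a and a.get('href') and a['href'] not in links:
            links.append(a['href'])  # keep page order, skip duplicates
    return links


url_to_crawl = 'https://movie.douban.com/top250/'  # the page to crawl
result_links = get_links(url_to_crawl)
for link in result_links:
    print(link)
```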
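Because the live page can change or block requests, it may help to check the extraction step offline first. The sketch below runs the same idea against a small hand-written HTML fragment. The fragment, the example.com URLs, and the function name `extract_title_links` are all hypothetical and only illustrate the cases one might meet: class="title" on a `<span>` inside a link, on the `<a>` itself, and on an element with no link at all. Real Douban markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment written for this example (not copied from Douban).
SAMPLE_HTML = """
<div class="hd">
  <a href="https://example.com/subject/1/">
    <span class="title">First film</span>
    <span class="title">&nbsp;/&nbsp;Alternate name</span>
  </a>
</div>
<div class="hd">
  <a class="title" href="https://example.com/subject/2/">Second film</a>
</div>
<p class="title">A heading with no link around it</p>
"""


def extract_title_links(html):
    """Collect the href of each <a> that has, or encloses, class="title"."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for tag in soup.find_all(class_='title'):
        # Use the tag itself if it is a link, otherwise its nearest <a> ancestor.
        anchor = tag if tag.name == 'a' else tag.find_parent('a')
        if anchor is None or not anchor.get('href'):
            continue  # a class="title" element with no usable link
        if anchor['href'] not in links:
            links.append(anchor['href'])  # keep order, skip duplicates
    return links


sample_links = extract_title_links(SAMPLE_HTML)
for link in sample_links:
    print(link)
```

Two `<span class="title">` elements inside the same link yield that link only once, and the unlinked `<p>` is skipped, so the result contains one URL per linked item.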