Python web scraper for movie data: source code
Date: 2023-08-25 08:15:52
Sure! Here is a simple example that uses Python's `requests` and `BeautifulSoup` libraries to scrape movie information:
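```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Placeholder headers: some sites reject the default requests User-Agent
HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; movie-scraper-demo)'}


def fetch_soup(url):
    """Download a page and parse it, raising an exception on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')


def get_movie_links(url):
    """Return absolute, de-duplicated URLs of links whose href contains 'movie'."""
    soup = fetch_soup(url)
    links = []
    for link in soup.find_all('a', href=True):
        href = link['href']
        # urljoin handles relative, root-relative and absolute hrefs
        full_url = urljoin(url, href)
        if 'movie' in href and full_url not in links:
            links.append(full_url)
    return links


def find_text(soup, *args, **kwargs):
    """Return the stripped text of the first matching tag, or None if it is missing."""
    tag = soup.find(*args, **kwargs)
    return tag.text.strip() if tag is not None else None


def get_movie_info(url):
    """Extract title, rating and description (the selectors are site-specific)."""
    soup = fetch_soup(url)
    return {
        'title': find_text(soup, 'h1'),
        'rating': find_text(soup, 'span', class_='rating'),
        'description': find_text(soup, 'div', class_='description'),
    }


if __name__ == '__main__':
    base_url = 'https://www.example.com/movies/'
    for movie_url in get_movie_links(base_url):
        print(get_movie_info(movie_url))
```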
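Building each detail-page URL by concatenating `base_url` with the raw `href` only works when the href is a plain relative path. The standard library's `urllib.parse.urljoin` resolves links the way a browser does. The sketch below compares the two approaches; the href values are illustrative assumptions, not taken from any real site, and `naive_join`/`resolve` are hypothetical helper names.

```python
from urllib.parse import urljoin

base_url = 'https://www.example.com/movies/'

# Illustrative href values a listing page might contain
hrefs = [
    'movie/123',                          # relative to the current directory
    '/movie/456',                         # relative to the site root
    'https://www.example.com/movie/789',  # already absolute
]


def naive_join(base, href):
    """Plain string concatenation."""
    return base + href


def resolve(base, href):
    """Resolve href against base the way a browser would."""
    return urljoin(base, href)


for href in hrefs:
    print(naive_join(base_url, href))
    print('  ->', resolve(base_url, href))
```

Only the first href gives the same result both ways. For the other two, concatenation produces broken URLs such as `https://www.example.com/movies//movie/456` and `https://www.example.com/movies/https://www.example.com/movie/789`, while `urljoin` returns `https://www.example.com/movie/456` and `https://www.example.com/movie/789`.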
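Note that `base_url` in the code above must be replaced with the actual URL of the movie site you want to scrape, and the `'movie'` link filter and the `h1`, `span.rating` and `div.description` selectors will almost certainly need to be adapted to that site's HTML. Also make sure you follow the site's crawling rules (such as its robots.txt and terms of service) and all applicable laws.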
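To act on a site's crawling rules, Python's standard library provides `urllib.robotparser.RobotFileParser`. The following is a minimal sketch under a few assumptions: the robots.txt content and the user-agent name `movie-scraper-demo` are hypothetical, and the rules are parsed from a string so the example runs offline (against a real site you would call `set_url()` and `read()` instead). It filters candidate URLs with `can_fetch()` and reads the site's `Crawl-delay`, falling back to an assumed 1-second pause.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = 'movie-scraper-demo'  # hypothetical user-agent name

# Hypothetical robots.txt content, parsed from a string so this runs offline
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def filter_allowed(rp, urls):
    """Keep only the URLs that robots.txt allows USER_AGENT to fetch."""
    return [u for u in urls if rp.can_fetch(USER_AGENT, u)]


def delay_seconds(rp, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = rp.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


if __name__ == '__main__':
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    candidates = [
        'https://www.example.com/movies/1',
        'https://www.example.com/private/2',
    ]
    print(filter_allowed(rules, candidates))  # ['https://www.example.com/movies/1']
    print(delay_seconds(rules))               # 2.0
```

In a scraping loop, you would pass the collected movie links through `filter_allowed()` and call `time.sleep(delay_seconds(rules))` between requests so the crawler does not overload the site.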