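Write a piece of code: use Python to scrape the title, screenwriter, lead actors, and genre of every film in the Douban Movie Top 250 and save them to a CSV file; it must include measures against anti-scraping
Posted: 2024-06-10 22:07:25 Views: 116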
Sure, here is the code:
import csv
import time

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent instead of the default python-requests one.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}


def get_movie(url):
    """Scrape one list page and append its rows to top250.csv."""
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, 'html.parser')
    rows = []
    for item in soup.find_all('div', class_='item'):
        title = item.find('span', class_='title')
        name = title.get_text(strip=True) if title else 'N/A'
        # The list page shows only the director and lead actors; the
        # screenwriter is not on it, so this column stays a placeholder.
        screenwriter = 'N/A'
        actor = 'N/A'
        movie_type = 'N/A'
        # Assumed layout of the item's first <p>, two lines of text:
        #   "导演: ... 主演: ..." and "year / country / genres".
        info = item.find('p')
        if info:
            text = info.get_text().replace('\xa0', ' ')
            lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
            if lines and '主演:' in lines[0]:
                actor = lines[0].split('主演:', 1)[1].strip() or 'N/A'
            if len(lines) > 1:
                movie_type = lines[1].split('/')[-1].strip() or 'N/A'
        info_list = [name, screenwriter, actor, movie_type]
        print(info_list)
        rows.append(info_list)
    with open('top250.csv', 'a', encoding='utf-8-sig', newline='') as f:
        csv.writer(f).writerows(rows)


def main():
    urls = ['https://movie.douban.com/top250?start={}'.format(i)
            for i in range(0, 250, 25)]
    for url in urls:
        get_movie(url)
        time.sleep(2)  # pause between pages to keep the request rate low


if __name__ == '__main__':
    main()
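The code sends a browser-like User-Agent header so that requests are not rejected as coming from an obvious script, and it waits 2 seconds after each page to avoid hitting the site too frequently. Running it appends one row per film (title, screenwriter, lead actors, genre) to top250.csv. Note that the Top 250 list pages show only the director and lead actors, so the screenwriter can only be read from each film's detail page.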
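The two measures described above (a browser-like User-Agent and a fixed pause) can be extended with a rotating User-Agent and retries with randomized backoff. The following is a minimal sketch under stated assumptions, not a guaranteed way around blocking: `polite_get`, `USER_AGENTS`, the retry count, and the delay values are all illustrative names and numbers that do not appear in the original answer. `fetch` is any callable that behaves like `requests.get`, which lets the helper be tried without network access.

```python
import random
import time

# Hypothetical pool of browser User-Agent strings (illustrative examples,
# not a vetted list).
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/14.0 Safari/605.1.15',
]


def polite_get(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url, headers=..., timeout=...) with a random User-Agent.

    Retries on exceptions or non-200 status codes. Between attempts it
    waits base_delay * 2**attempt seconds plus up to 1 second of jitter.
    `fetch` is assumed to behave like requests.get; retries must be >= 1.
    """
    last_error = None
    for attempt in range(retries):
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        try:
            res = fetch(url, headers=headers, timeout=10)
            if res.status_code == 200:
                return res
            last_error = RuntimeError('HTTP {}'.format(res.status_code))
        except Exception as exc:  # e.g. connection errors from fetch
            last_error = exc
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt + random.random())
    raise last_error
```

With `requests` installed, `polite_get(requests.get, url)` could stand in for the direct `requests.get(url, headers=headers)` call in `get_movie`.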
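Because the Top 250 list pages do not show screenwriters, that field has to come from each film's detail page. The sketch below shows one way this could be parsed with BeautifulSoup. The markup it expects is an assumption that has not been verified against the live site: a `<div id="info">` in which labels sit in `<span class="pl">`, names in a following `<span class="attrs">`, and genres in `<span property="v:genre">`. The function name `parse_detail` and the returned keys are hypothetical.

```python
from bs4 import BeautifulSoup

# Labels on the (assumed) detail page mapped to output keys.
LABELS = {'编剧': 'screenwriter', '主演': 'actor'}


def parse_detail(html):
    """Extract screenwriters, lead actors, and genres from detail-page HTML.

    Returns a dict with 'N/A' for any field that cannot be found, so a
    layout change degrades to placeholders instead of raising.
    """
    result = {'screenwriter': 'N/A', 'actor': 'N/A', 'movie_type': 'N/A'}
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('div', id='info')
    if info is None:
        return result
    for label in info.find_all('span', class_='pl'):
        # Labels may or may not carry a trailing colon.
        key = LABELS.get(label.get_text(strip=True).rstrip(':：'))
        attrs = label.find_next_sibling('span', class_='attrs')
        if key and attrs:
            names = [a.get_text(strip=True) for a in attrs.find_all('a')]
            if names:
                result[key] = ' / '.join(names)
    genres = [g.get_text(strip=True)
              for g in info.find_all('span', attrs={'property': 'v:genre'})]
    if genres:
        result['movie_type'] = ' / '.join(genres)
    return result
```

To use it, the detail URL would have to be taken from each list item (for instance from the item's first link, which is also an assumption about the page) and fetched separately. That adds 250 requests on top of the 10 list pages, so the pause between requests matters even more.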