Please write Python code that crawls the 500 most popular recent movies on Douban in real time and stores them in a SQLite3 database
Date: 2023-06-07 07:09:05
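Sure. The Python code below scrapes movie information from Douban and stores it in a SQLite3 database. Note that it reads the Douban Top 250 list (`https://movie.douban.com/top250`), which contains 250 movies, rather than a list of 500 recently popular ones: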
```python
import requests
from bs4 import BeautifulSoup
import sqlite3
# Fetch one list page and extract the movie details from it
def get_movie_info(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    response = requests.get(url, headers=headers, timeout=10)
    # Fail loudly if the request was blocked or the page is missing
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    items = soup.find_all('div', class_='info')
    movies = []
    for item in items:
        movie = {}
        # The title text sits in <span class="title">; the <a> tag has no title attribute
        movie['title'] = item.find('span', class_='title').text
        movie['score'] = item.find('span', class_='rating_num').text
        # The first description line is assumed to look like "导演: ... 主演: ...";
        # the second line holds year, country and genre, not the cast
        credits = item.find('div', class_='bd').p.text.strip().split('\n')[0]
        director, _, actors = credits.partition('主演:')
        movie['director'] = director.replace('导演:', '').split()
        movie['actors'] = actors.split()
        quote = item.find('span', class_='inq')
        movie['quote'] = quote.text if quote else ''
        movies.append(movie)
    return movies


# Store the movies in a SQLite3 database
def save_movies_to_database(movies):
    conn = sqlite3.connect('movies.db')
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS movies
                      (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       title TEXT NOT NULL,
                       score FLOAT NOT NULL,
                       director TEXT NOT NULL,
                       actors TEXT NOT NULL,
                       quote TEXT)''')
    for movie in movies:
        # director and actors are lists of name tokens, stored as comma-separated text
        cursor.execute('INSERT INTO movies (title, score, director, actors, quote) VALUES(?,?,?,?,?)',
                       (movie['title'], float(movie['score']), ','.join(movie['director']), ','.join(movie['actors']),
                        movie['quote']))
    conn.commit()
    conn.close()


# Crawl the Douban Top 250 list pages
def crawl_movies():
    movies = []
    # The list has 10 pages of 25 movies each; offsets of 250 and above return no items
    for i in range(0, 250, 25):
        url = 'https://movie.douban.com/top250?start={}'.format(i)
        movies += get_movie_info(url)
    save_movies_to_database(movies)


if __name__ == '__main__':
    crawl_movies()
```
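The code above uses the requests and BeautifulSoup libraries to scrape movie information from Douban's movie pages and store it in a SQLite3 database. It is a good idea to print each page's results before storing them: this confirms that the scraper is running correctly and lets you inspect the key fields it extracted.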
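The suggestion to print each page before storing it could look roughly like the following. This is a minimal sketch, not part of the original answer: `format_movie` and `crawl_with_preview` are hypothetical helper names, and the fetch and save steps are passed in as functions so the loop can be tried with stand-ins before touching the network or the database.

```python
def format_movie(movie):
    """Return a one-line summary of a movie dict for quick inspection."""
    return '{title} | {score} | {director}'.format(
        title=movie['title'],
        score=movie['score'],
        director=','.join(movie['director']),
    )


def crawl_with_preview(fetch_page, save_page, offsets, preview=3):
    """Fetch each page, print a few rows, then store that page.

    fetch_page(url) must return a list of movie dicts and save_page(movies)
    must store them; both are assumptions about the caller's functions.
    Returns the number of movies handed to save_page.
    """
    total = 0
    for start in offsets:
        url = 'https://movie.douban.com/top250?start={}'.format(start)
        movies = fetch_page(url)
        if not movies:
            # An empty page usually means the end of the list or a blocked request
            print('No movies parsed from', url)
            break
        for movie in movies[:preview]:
            print(format_movie(movie))
        save_page(movies)
        total += len(movies)
    return total
```

With the functions from the answer, this could be called as `crawl_with_preview(get_movie_info, save_movies_to_database, range(0, 250, 25))`. Saving page by page also means that an error on a later page does not discard the pages already stored.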