使用BeautifulSoup新闻的标题、内容、时间并保存在sqlite数据库文件中
时间: 2024-05-08 13:16:35 浏览: 16
以下是一个示例代码,用于从新闻网站中提取标题、内容、时间并将其保存在sqlite数据库文件中。在此示例中,我们将从https://news.yahoo.com/网站中提取新闻。
首先,我们导入必要的库:
```python
import requests
from bs4 import BeautifulSoup
import sqlite3
```
然后,我们将创建一个名为“news.db”的sqlite数据库文件,并在其中创建一个名为“news”的表格,该表格将包含标题、内容和时间的列:
```python
conn = sqlite3.connect('news.db')
c = conn.cursor()
c.execute('''CREATE TABLE news
(title text, content text, time text)''')
```
接下来,我们将使用requests库获取网页的HTML内容,并使用BeautifulSoup库解析HTML内容:
```python
url = 'https://news.yahoo.com/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
```
我们可以使用BeautifulSoup选择器来提取所需的元素。例如,以下代码将提取新闻标题、内容和时间,并将其插入到“news”表格中:
```python
for article in soup.select('div.js-stream-content > ul > li > div > div > h3'):
title = article.get_text().strip()
content = ""
time = ""
link = article.find('a')['href']
article_response = requests.get(link)
article_soup = BeautifulSoup(article_response.content, 'html.parser')
for p in article_soup.select('div.caas-body > div > p'):
content += p.get_text().strip() + "\n"
time = article_soup.select('div.auth-prov-soc > div > span')[0].get_text().strip()
c.execute("INSERT INTO news VALUES (?, ?, ?)", (title, content, time))
```
最后,我们将提交更改并关闭数据库连接:
```python
conn.commit()
conn.close()
```
完整代码如下:
```python
import requests
from bs4 import BeautifulSoup
import sqlite3
conn = sqlite3.connect('news.db')
c = conn.cursor()
c.execute('''CREATE TABLE news
(title text, content text, time text)''')
url = 'https://news.yahoo.com/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
for article in soup.select('div.js-stream-content > ul > li > div > div > h3'):
title = article.get_text().strip()
content = ""
time = ""
link = article.find('a')['href']
article_response = requests.get(link)
article_soup = BeautifulSoup(article_response.content, 'html.parser')
for p in article_soup.select('div.caas-body > div > p'):
content += p.get_text().strip() + "\n"
time = article_soup.select('div.auth-prov-soc > div > span')[0].get_text().strip()
c.execute("INSERT INTO news VALUES (?, ?, ?)", (title, content, time))
conn.commit()
conn.close()
```