Scraping post titles and content from the Hupu Lakers community in PyCharm and importing them into a MongoDB database
Time: 2023-11-07 07:50:57
Sure, you can do it with the following steps:
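1. Install the pymongo, requests, and beautifulsoup4 libraries (the code below also imports BeautifulSoup from the `bs4` package) with the following command:
```
pip install pymongo requests beautifulsoup4
```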
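PyCharm often creates a separate virtual environment for a new project, so packages installed from a system terminal may not be visible to the project interpreter. As a quick sanity check, the following sketch uses only the standard library to report which of the three packages the current interpreter cannot find. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the original. Note that the pip package `beautifulsoup4` is imported as `bs4`.

```python
import importlib.util

# pip package name -> module name you actually import
# (beautifulsoup4 is imported as bs4)
REQUIRED = {
    'requests': 'requests',
    'beautifulsoup4': 'bs4',
    'pymongo': 'pymongo',
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be
    found by the interpreter running this script."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == '__main__':
    missing = missing_packages()
    if missing:
        print('Missing, install with: pip install ' + ' '.join(missing))
    else:
        print('All required packages are installed')
```

Run it inside PyCharm so that it checks the project interpreter rather than a system-wide one.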
2. Open PyCharm and create a new Python project.
3. In the project, create a file named `spider.py`.
4. Import the required libraries:
```python
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient
```
5. Connect to the MongoDB database:
```python
client = MongoClient('mongodb://localhost:27017/')
db = client['hupu']
collection = db['lakers']
```
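The connection string above assumes a local MongoDB server without authentication. If your server has access control enabled (an assumption; the original does not use it), the username and password go into the URI, and pymongo's documentation asks for them to be percent-escaped with `urllib.parse.quote_plus`. A minimal sketch with a hypothetical `build_mongo_uri` helper:

```python
from urllib.parse import quote_plus


def build_mongo_uri(host='localhost', port=27017, username=None, password=None):
    """Build a mongodb:// connection string (hypothetical helper).

    Credentials are percent-escaped with quote_plus, because characters
    such as '@', ':' or '/' in a password would otherwise break the URI.
    """
    if username is None:
        # No credentials: same form as the original connection string
        return f'mongodb://{host}:{port}/'
    return (f'mongodb://{quote_plus(username)}:{quote_plus(password or "")}'
            f'@{host}:{port}/')


if __name__ == '__main__':
    print(build_mongo_uri())
    print(build_mongo_uri(username='spider', password='p@ss:word'))
```

For example, `MongoClient(build_mongo_uri(username='spider', password='p@ss:word'))` would connect as a placeholder user `spider`; substitute your own credentials.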
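6. Scrape the titles and content of the Hupu Lakers community posts:
```python
from urllib.parse import urljoin

url = 'https://bbs.hupu.com/lakers'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
# Fetch the board page; fail early on HTTP errors instead of parsing an error page
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# The CSS selectors depend on Hupu's page layout and may need updating
titles = soup.select('.titlelink > a')
for title in titles:
    # Post links may be relative, so resolve them against the board URL
    href = urljoin(url, title['href'])
    title_text = title.get_text(strip=True)
    # Fetch each post page and extract its body text
    content_response = requests.get(href, headers=headers, timeout=10)
    content_soup = BeautifulSoup(content_response.text, 'html.parser')
    content = content_soup.select('.quote-content')
    if content:
        post = {
            'title': title_text,
            'content': content[0].get_text(strip=True),
        }
        collection.insert_one(post)
```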
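Step 6 depends on the selector `.titlelink > a` matching Hupu's markup and on each `href` being turned into a full URL before the second request. To check that logic offline, here is a standard-library sketch that swaps BeautifulSoup for `html.parser.HTMLParser` and applies the same "direct child `<a>` of a `.titlelink` element" rule. The `TitleLinkParser` class and the sample HTML are made up for illustration; Hupu's real markup may differ, and the parser assumes reasonably well-nested tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they must not be pushed on the stack
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class TitleLinkParser(HTMLParser):
    """Collect (title, absolute_url) for every <a href> that is a direct
    child of an element whose class list contains 'titlelink'
    (the same match as the CSS selector '.titlelink > a')."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stack = []        # one flag per open element: is it .titlelink?
        self.current = None    # [href, text parts] for the <a> being read
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Start capturing if the parent element is a .titlelink element
        if tag == 'a' and self.stack and self.stack[-1] and 'href' in attrs:
            self.current = [attrs['href'], []]
        if tag not in VOID_TAGS:
            classes = (attrs.get('class') or '').split()
            self.stack.append('titlelink' in classes)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current is not None:
            href, parts = self.current
            # Resolve relative links such as '/12345.html' to full URLs
            self.links.append((''.join(parts).strip(),
                               urljoin(self.base_url, href)))
            self.current = None
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.current is not None:
            self.current[1].append(data)


if __name__ == '__main__':
    sample = '''
    <ul>
      <li class="titlelink box"><a href="/12345.html">Sample post title</a></li>
      <li><a href="/99999.html">Not a title link</a></li>
    </ul>
    '''
    parser = TitleLinkParser('https://bbs.hupu.com/lakers')
    parser.feed(sample)
    parser.close()
    print(parser.links)
    # → [('Sample post title', 'https://bbs.hupu.com/12345.html')]
```

The same `urljoin` call is what keeps `requests.get(href, ...)` from failing when the board page uses relative links.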
7. Run the code to scrape the data and import it into the MongoDB database.
The complete code is as follows:
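```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

# Connect to the local MongoDB server; the "hupu" database and "lakers"
# collection are created automatically on the first insert
client = MongoClient('mongodb://localhost:27017/')
db = client['hupu']
collection = db['lakers']

url = 'https://bbs.hupu.com/lakers'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Fetch the board page and collect the post links
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# The CSS selectors depend on Hupu's page layout and may need updating
titles = soup.select('.titlelink > a')

for title in titles:
    # Post links may be relative, so resolve them against the board URL
    href = urljoin(url, title['href'])
    title_text = title.get_text(strip=True)

    # Fetch each post page and extract its body text
    content_response = requests.get(href, headers=headers, timeout=10)
    content_soup = BeautifulSoup(content_response.text, 'html.parser')
    content = content_soup.select('.quote-content')
    if content:
        post = {
            'title': title_text,
            'content': content[0].get_text(strip=True),
        }
        collection.insert_one(post)
```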
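`insert_one` always adds a new document, so running the script twice stores every post twice. One way to avoid that (an assumption, not part of the original) is to key each post by its URL and write it with `update_one(..., upsert=True)`, which updates the matching document or inserts a new one if none matches. The `save_post` helper and the `url` field are hypothetical:

```python
def save_post(collection, title, content, url):
    """Insert or update one post, keyed by its URL (hypothetical helper),
    so that re-running the spider does not create duplicate documents.

    Returns True if a new document was inserted, False if an existing
    one was matched and updated.
    """
    result = collection.update_one(
        {'url': url},                                    # match on the post URL
        {'$set': {'title': title, 'content': content}},  # fields to write
        upsert=True,                                     # insert if nothing matches
    )
    # upserted_id is only set when the upsert created a new document
    return result.upserted_id is not None
```

In the loop, you would call `save_post(collection, ...)` with the post's title, content, and `href` in place of `collection.insert_one(post)`. Creating a unique index once, with `collection.create_index('url', unique=True)`, additionally lets MongoDB reject duplicate URLs at the database level.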
Note: before running the code, make sure your MongoDB service is running.
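If the MongoDB service is not running, `MongoClient` itself still succeeds, but the first database operation (here, `insert_one`) raises an error after waiting for a server selection timeout. A small standard-library sketch checks up front whether anything is listening on MongoDB's default port 27017; the `port_is_open` helper is a hypothetical name:

```python
import socket


def port_is_open(host='localhost', port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This only proves that some process is listening on the port; it does
    not prove that the process is MongoDB.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == '__main__':
    if port_is_open():
        print('Port 27017 is open; MongoDB is probably running')
    else:
        print('Nothing is listening on localhost:27017; start MongoDB first')
```

This only checks the TCP port. With pymongo itself, `client.admin.command('ping')` asks the server to respond and raises an exception if it cannot be reached.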