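I need detailed code for the whole process, please
Time: 2024-04-27 13:24:35 Views: 134
Sure. Here is a detailed implementation covering the crawler, MongoDB storage, and a Flask web application.
1. Scrape the 蔚蓝书店 book data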
```python
import requests
from bs4 import BeautifulSoup


def get_books():
    """Fetch the book list page and return one dict per table row."""
    url = 'https://www.weilan-shudian.com/book'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    books = []
    # Skip the header row; every other row is expected to hold five cells
    for book in soup.select('table tr')[1:]:
        if len(book.select('td')) < 5:
            continue  # ignore empty or malformed rows
        title = book.select_one('td:nth-of-type(1)').text.strip()
        author = book.select_one('td:nth-of-type(2)').text.strip()
        price = book.select_one('td:nth-of-type(3)').text.strip()
        publisher = book.select_one('td:nth-of-type(4)').text.strip()
        publish_date = book.select_one('td:nth-of-type(5)').text.strip()
        books.append({
            'title': title,
            'author': author,
            'price': price,
            'publisher': publisher,
            'publish_date': publish_date
        })
    return books
```
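The crawler stores `price` as the raw cell text. If numeric prices are wanted for sorting or filtering, a minimal sketch along the following lines could normalise them before saving. The `parse_price` helper and the sample strings are assumptions made for illustration; the answer does not show how the real page formats its prices.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None."""
    # Drop thousands separators, then look for digits with an optional fraction.
    match = re.search(r'\d+(?:\.\d+)?', text.replace(',', ''))
    return Decimal(match.group()) if match else None


# Hypothetical price strings; the actual site may format prices differently.
for raw in ['¥45.00', '1,280.5 元', 'N/A']:
    print(repr(raw), '->', parse_price(raw))
```

`Decimal` is used rather than `float` so that prices keep their exact decimal value; a cell with no digits yields `None` instead of raising.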
2. Store the data in MongoDB
```python
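import pymongo

# Connect to a local MongoDB instance and select the database and collection
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['bookstore']
collection = db['books']


def save_books(books):
    """Insert new books or update existing ones, matching on title."""
    for book in books:
        collection.update_one({'title': book['title']}, {'$set': book}, upsert=True)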
```
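With `upsert=True`, `update_one` inserts a document when no existing one matches the `title` filter and otherwise updates the match, so repeated crawls should not create duplicates. The sketch below imitates that behaviour with a plain dictionary so it can be run without a MongoDB server. `upsert_by_title` and the sample records are hypothetical stand-ins, not part of pymongo.

```python
def upsert_by_title(store, book):
    """Imitate update_one({'title': ...}, {'$set': book}, upsert=True) on a dict."""
    existing = store.setdefault(book['title'], {})
    existing.update(book)  # like $set: overwrite the given fields, keep the rest


store = {}
# Made-up records: the third call sees an existing title with a new price.
upsert_by_title(store, {'title': 'Book A', 'price': '30.00'})
upsert_by_title(store, {'title': 'Book B', 'price': '25.00'})
upsert_by_title(store, {'title': 'Book A', 'price': '28.00'})

print(len(store))                # two entries, not three
print(store['Book A']['price'])  # the later price replaced the earlier one
```

One consequence of matching on `title` alone is that two different editions with the same title would be merged into one record; matching on title plus publisher or date may be safer if that matters.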
3. Flask web application
```python
from flask import Flask, render_template, jsonify
import pymongo

app = Flask(__name__)
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['bookstore']
collection = db['books']


@app.route('/')
def index():
    # Render every stored book with templates/index.html
    books = list(collection.find())
    return render_template('index.html', books=books)


@app.route('/progress')
def progress():
    # Report how many books are currently stored
    books_count = collection.count_documents({})
    return jsonify({'progress': f'{books_count} books have been saved.'})


if __name__ == '__main__':
    app.run(debug=True)
```
4. Program entry point
```python
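import threading
import time
from flask import Flask, render_template, jsonify
import pymongo
# Assumption (not in the original answer): the code from steps 1 and 2
# is saved as crawler.py and storage.py next to this file
from crawler import get_books
from storage import save_books

app = Flask(__name__)
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['bookstore']
collection = db['books']


@app.route('/')
def index():
    books = list(collection.find())
    return render_template('index.html', books=books)


@app.route('/progress')
def progress():
    books_count = collection.count_documents({})
    return jsonify({'progress': f'{books_count} books have been saved.'})


def crawl_loop():
    while True:
        save_books(get_books())
        time.sleep(60)  # wait 60 seconds before crawling again


if __name__ == '__main__':
    # Crawl in a background thread so the loop does not block the web server
    threading.Thread(target=crawl_loop, daemon=True).start()
    # Disable the reloader, which would otherwise start a second crawler process
    app.run(debug=True, use_reloader=False)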
```
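The crawl loop pauses for a fixed 60 seconds between runs. A slightly more flexible pattern, sketched here with only the standard library, takes the interval as a parameter, survives a failed run, and can be stopped cleanly. `run_periodically` is a hypothetical helper, and the task used in the demo is a placeholder rather than the real crawler.

```python
import threading
import time


def run_periodically(task, interval, stop_event):
    """Call `task` repeatedly, waiting `interval` seconds between calls,
    until `stop_event` is set."""
    while not stop_event.is_set():
        try:
            task()
        except Exception as exc:  # keep the loop alive if one run fails
            print(f'task failed: {exc}')
        # Unlike time.sleep(), wait() returns early once the event is set
        stop_event.wait(interval)


calls = []  # placeholder task: just record that it ran
stop = threading.Event()
worker = threading.Thread(
    target=run_periodically,
    args=(lambda: calls.append(1), 0.05, stop),
    daemon=True,
)
worker.start()
time.sleep(0.3)         # let the worker run a few times
stop.set()              # ask it to stop
worker.join(timeout=1)
print(f'task ran {len(calls)} times')
```

In the program above, the task would be something like `lambda: save_books(get_books())` with an interval of 60.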
5. Template file
```html
<!-- index.html -->
<!DOCTYPE html>
<html>
<head>
    <title>Bookstore</title>
    <style>
        table {
            border-collapse: collapse;
            margin: 20px 0;
        }
        table th, table td {
            padding: 10px;
            border: 1px solid #ddd;
            text-align: center;
        }
        table th {
            background: #f2f2f2;
        }
    </style>
</head>
<body>
    <h1>Bookstore</h1>
    <table>
        <thead>
            <tr>
                <th>Title</th>
                <th>Author</th>
                <th>Price</th>
                <th>Publisher</th>
                <th>Publish Date</th>
            </tr>
        </thead>
        <tbody>
            {% for book in books %}
            <tr>
                <td>{{ book.title }}</td>
                <td>{{ book.author }}</td>
                <td>{{ book.price }}</td>
                <td>{{ book.publisher }}</td>
                <td>{{ book.publish_date }}</td>
            </tr>
            {% endfor %}
        </tbody>
    </table>
    <p id="progress"></p>
    <script>
        function getProgress() {
            fetch('/progress')
                .then(response => response.json())
                .then(data => {
                    document.querySelector('#progress').innerText = data.progress;
                });
        }
        setInterval(getProgress, 1000);
    </script>
</body>
</html>
```
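This is a complete program skeleton that you can modify and extend as needed. Note that the code crawls the data automatically every 60 seconds; adjust that interval to suit your requirements.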