A complete walkthrough, with Python code, of reverse-analyzing a page to scrape data. The goals are: 1. get the book titles under "新书推荐" (New Book Recommendations); 2. get each book's ID; 3. get each book's photo URL; 4. save the data into a Word document.
Date: 2024-06-12 10:03:24 · Views: 12
Since no specific URL was provided, the code below is a sample that demonstrates how to use Python to reverse-analyze a page, extract data, and store it in a Word document.
1. Getting the "新书推荐" (New Book Recommendations) book titles
First, we need to locate the HTML elements that contain the "新书推荐" book titles and extract their text. This can be done with the requests and BeautifulSoup libraries.
```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; replace with the real page address
url = 'https://www.example.com/new-books'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

book_names = []
# Find the HTML elements containing the book titles
# (the 'book-name' class is an assumed selector; adjust to the real page)
book_elements = soup.find_all('div', {'class': 'book-name'})
# Extract the text from each element
for book_element in book_elements:
    book_name = book_element.text.strip()
    book_names.append(book_name)
print(book_names)
```
2. Getting the book IDs
Similarly, we can locate the HTML elements that carry the book IDs and extract that information.
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com/new-books'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

book_ids = []
# Find the HTML elements that carry the book IDs
# (the 'book-info' class and 'data-id' attribute are assumed)
book_elements = soup.find_all('div', {'class': 'book-info'})
# Extract the ID from each element, skipping any without the attribute
for book_element in book_elements:
    book_id = book_element.get('data-id')
    if book_id:
        book_ids.append(book_id)
print(book_ids)
```
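As a dependency-free check of the same attribute-extraction idea, the standard library's `html.parser` can pull `data-id` values out of hypothetical sample markup (the HTML below is made up for illustration):

```python
from html.parser import HTMLParser

# Hypothetical sample markup mimicking the assumed page structure
SAMPLE_HTML = """
<div class="book-info" data-id="101">Book A</div>
<div class="book-info" data-id="102">Book B</div>
"""

class BookIDParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.book_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Collect data-id only from <div class="book-info"> elements
        if tag == 'div' and attrs.get('class') == 'book-info' and 'data-id' in attrs:
            self.book_ids.append(attrs['data-id'])

parser = BookIDParser()
parser.feed(SAMPLE_HTML)
print(parser.book_ids)  # ['101', '102']
```

This is only a sketch against assumed markup; BeautifulSoup remains the more convenient tool once the real selectors are known.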
3. Getting the book photo URLs
In the same way, we can locate the HTML elements containing the book photo URLs and extract them.
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com/new-books'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

book_photos = []
# Find the HTML elements containing the photo URLs
# (the 'book-photo' class is an assumed selector)
book_elements = soup.find_all('div', {'class': 'book-photo'})
# Extract each <img> element's src attribute
for book_element in book_elements:
    book_photo = book_element.find('img')['src']
    book_photos.append(book_photo)
print(book_photos)
```
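Note that `src` attributes are often relative paths rather than full URLs. Before downloading, `urllib.parse.urljoin` can normalize each one against the page URL (the example paths below are hypothetical):

```python
from urllib.parse import urljoin

# Hypothetical page URL and extracted src values
page_url = 'https://www.example.com/new-books'
srcs = ['/images/book1.jpg', 'https://cdn.example.com/book2.jpg']

# urljoin resolves relative paths and leaves absolute URLs unchanged
absolute = [urljoin(page_url, src) for src in srcs]
print(absolute)
# ['https://www.example.com/images/book1.jpg', 'https://cdn.example.com/book2.jpg']
```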
4. Storing the data in a Word document
Finally, we can use the python-docx library to write the collected data into a Word document.
```python
import requests
from io import BytesIO
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches

url = 'https://www.example.com/new-books'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

book_names = []
book_ids = []
book_photos = []
# Collect the name, ID, and photo URL from each book block
# (all four class names and the 'data-id' attribute are assumed selectors)
book_elements = soup.find_all('div', {'class': 'book-block'})
for book_element in book_elements:
    book_name = book_element.find('div', {'class': 'book-name'}).text.strip()
    book_id = book_element.find('div', {'class': 'book-info'})['data-id']
    book_photo = book_element.find('div', {'class': 'book-photo'}).find('img')['src']
    book_names.append(book_name)
    book_ids.append(book_id)
    book_photos.append(book_photo)

# Write the data to a Word document
document = Document()
document.add_heading('New Books', 0)

# Add a table with a header row, then one row per book
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Book Name'
hdr_cells[1].text = 'Book ID'
hdr_cells[2].text = 'Book Photo'
for i in range(len(book_names)):
    row_cells = table.add_row().cells
    row_cells[0].text = book_names[i]
    row_cells[1].text = book_ids[i]
    # A table cell has no add_picture method; embed the image through
    # a run inside the cell's first paragraph
    img_data = BytesIO(requests.get(book_photos[i]).content)
    row_cells[2].paragraphs[0].add_run().add_picture(img_data, width=Inches(1))
document.save('new_books.docx')
```
This script writes the collected book names, IDs, and photos into a Word document, using a table to lay out the data. To embed each image, the downloaded bytes are wrapped in a BytesIO stream and inserted through a run in the table cell's first paragraph, with the display width set via Inches.