Scraping multiple pages of a website with Python and saving them to a txt file
You can use Python's requests and BeautifulSoup libraries to scrape website content and save it to a txt file. The steps are as follows:
1. Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
```
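If these libraries are not installed yet, `pip install requests beautifulsoup4 lxml` will pull them in; note that the 'lxml' parser used below is a separate package from BeautifulSoup itself.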
2. Define a function to fetch the content of each page
```python
def get_page_content(url):
    response = requests.get(url)
    if response.status_code == 200:
        html = response.text
        soup = BeautifulSoup(html, 'lxml')
        return soup.get_text()
    else:
        return None
```
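In practice a bare requests.get call can hang or get blocked. The sketch below is one way to harden it with a timeout, a browser-like User-Agent header (the 'Mozilla/5.0' value is just an example), basic error handling, and an encoding guess; treat it as an optional variant, not a required part of the recipe:
```python
def get_page_content(url):
    # Identify as a browser; some sites reject requests' default User-Agent
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        return None  # covers timeouts, DNS failures, refused connections
    if response.status_code == 200:
        # Guess the encoding from the page body to avoid garbled text
        response.encoding = response.apparent_encoding
        soup = BeautifulSoup(response.text, 'lxml')
        return soup.get_text()
    return None
```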
3. Define a function to save the content to a txt file
```python
def save_to_txt(content, filename):
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(content)
```
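Note that mode 'a' appends, so repeated runs keep growing the same file, and consecutive pages run together with no boundary. If you want a visible separator, a small variation is:
```python
def save_to_txt(content, filename):
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(content)
        # Mark the end of each page so entries are easy to tell apart
        f.write('\n' + '=' * 40 + '\n')
```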
4. Loop over the pages, fetching each one and saving it to the txt file
```python
for i in range(1, 11):  # fetch 10 pages as an example
    url = f'https://www.example.com/page/{i}'
    content = get_page_content(url)
    if content:
        save_to_txt(content, 'example.txt')
```
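To avoid hammering the server (and to reduce the chance of being rate-limited or banned), it is common to pause between requests, for example:
```python
import time

for i in range(1, 11):  # fetch 10 pages as an example
    url = f'https://www.example.com/page/{i}'
    content = get_page_content(url)
    if content:
        save_to_txt(content, 'example.txt')
    time.sleep(1)  # wait a second between requests to be polite
```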
The complete code:
```python
import requests
from bs4 import BeautifulSoup

def get_page_content(url):
    response = requests.get(url)
    if response.status_code == 200:
        html = response.text
        soup = BeautifulSoup(html, 'lxml')
        return soup.get_text()
    else:
        return None

def save_to_txt(content, filename):
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(content)

for i in range(1, 11):  # fetch 10 pages as an example
    url = f'https://www.example.com/page/{i}'
    content = get_page_content(url)
    if content:
        save_to_txt(content, 'example.txt')
```
Replace 'https://www.example.com/page/{i}' in the code above with the URL pattern of the site you want to scrape.
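One caveat: soup.get_text() dumps all visible text on the page, including menus, ads, and footers. If you only want the main body, extract a specific element instead; the CSS selector below ('div.article-content') is purely hypothetical, so inspect the target page in your browser to find the right one:
```python
def get_article_text(url):
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return None
    soup = BeautifulSoup(response.text, 'lxml')
    # 'div.article-content' is a placeholder selector; replace it with
    # whatever wraps the main content on the site you are scraping
    node = soup.select_one('div.article-content')
    return node.get_text(strip=True) if node else None
```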