Web crawler Python code

Date: 2024-08-16 21:08:26

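In Python, web crawlers are usually built on a few libraries. The most popular choices are `requests` combined with `BeautifulSoup`, or the `Scrapy` framework. The basic steps for scraping data from a web page with `requests` and `BeautifulSoup` are:

1. **Import the required libraries**:

   ```python
   import requests
   from bs4 import BeautifulSoup
   ```

2. **Send the HTTP request**:

   ```python
   url = "http://example.com"  # URL of the site to scrape
   response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
   response.raise_for_status()  # raise an exception on 4xx/5xx responses
   ```

3. **Parse the HTML content**:

   ```python
   soup = BeautifulSoup(response.text, 'html.parser')
   ```

4. **Find and extract the data**:

   ```python
   # Choose the tag and attributes to match the page you are scraping
   data_elements = soup.find_all('div', class_='example')
   data = [element.text for element in data_elements]
   ```

5. **Process the data or save it to a file**:

   ```python
   for item in data:
       print(item)

   # Or save the data to a file (CSV, JSON, etc.)
   with open('output.txt', 'w', encoding='utf-8') as f:
       f.write('\n'.join(data))
   ```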
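Step 5 mentions saving to CSV or JSON but only shows a plain text file. The following is a minimal sketch of those two formats using only the standard library. The helper names `save_as_json` and `save_as_csv`, the `header` parameter, and the single-column layout are assumptions made for illustration; they are not part of `requests` or `BeautifulSoup`.

```python
import csv
import json


def save_as_json(items, path):
    """Write a list of strings to `path` as a UTF-8 JSON array (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text readable in the output file
        json.dump(items, f, ensure_ascii=False, indent=2)


def save_as_csv(items, path, header="text"):
    """Write one item per row under a single column header (hypothetical helper)."""
    # newline="" lets the csv module manage line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        writer.writerows([item] for item in items)
```

With the `data` list from step 4, calling `save_as_json(data, "output.json")` or `save_as_csv(data, "output.csv")` would write the extracted strings to disk. The `csv` module quotes any item containing commas or newlines, which a manual `'\n'.join(data)` would not handle.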
