Python Web Crawler Example Code
Date: 2023-07-12 14:02:41
Below is a simple Python web crawler example:
```python
import requests
from bs4 import BeautifulSoup
# Target site URL
url = "https://www.example.com"

# Fetch the site's HTML with the requests library
response = requests.get(url)
html = response.text

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Locate the target element and extract its text
# (guard against find() returning None when the element is missing)
target_element = soup.find("div", {"class": "target-class"})
if target_element is not None:
    print(target_element.text)
```
Note that this is only a minimal example; a real crawler needs more processing and, in particular, exception handling. Also make sure your crawler complies with the target site's terms of service, to avoid legal and ethical problems.
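The exception-handling caveat above can be sketched as a small retry helper. Everything here is illustrative: `fetch_with_retries`, the stub `flaky` fetcher, and the parameter names are assumptions, not part of any library. Passing the fetch function in as a callable also makes the logic easy to test without touching the network:

```python
import time

def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    # Call fetch(url) up to `retries` times, pausing between attempts.
    # `fetch` is any callable that returns the page text or raises on
    # failure, e.g. lambda u: requests.get(u, timeout=10).text
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as error:
            last_error = error
            time.sleep(delay)
    raise last_error

# Demonstrate with a stub fetcher that fails once, then succeeds
calls = []
def flaky(url):
    calls.append(url)
    if len(calls) < 2:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"

print(fetch_with_retries(flaky, "https://www.example.com", delay=0))
```

In real use you would pass a `requests`-based callable with a timeout; the retry loop itself stays the same.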
Related Questions
Python web page crawler example code
Here are two Python crawler project examples for your reference:
1. Scrape movie titles and ratings from Douban Movie Top250
```python
import requests
from bs4 import BeautifulSoup
url = 'https://movie.douban.com/top250'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
movie_list = []
for i in range(0, 250, 25):
    req = requests.get(url + '?start=' + str(i), headers=headers)
    soup = BeautifulSoup(req.text, 'html.parser')
    movies = soup.find_all('div', {'class': 'hd'})
    for movie in movies:
        title = movie.a.span.text.strip()
        rating = movie.parent.find('span', {'class': 'rating_num'}).text.strip()
        movie_list.append(title + ' ' + rating)
print('\n'.join(movie_list))
```
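Since Douban's markup can change, it helps to verify the extraction logic against a static snippet before running the full loop. The HTML below is a hypothetical fragment mimicking the shape of one Top250 entry, not the site's actual markup:

```python
from bs4 import BeautifulSoup

# Made-up fragment shaped like one list entry: a "hd" div with the
# title inside a > span, and the rating in a sibling "rating_num" span
snippet = """
<div class="item">
  <div class="hd"><a href="#"><span class="title">肖申克的救赎</span></a></div>
  <span class="rating_num">9.7</span>
</div>
"""

soup = BeautifulSoup(snippet, "html.parser")
movie = soup.find("div", {"class": "hd"})
title = movie.a.span.text.strip()
rating = movie.parent.find("span", {"class": "rating_num"}).text.strip()
print(title, rating)  # 肖申克的救赎 9.7
```

This is the same `find`/`parent` navigation as the loop above, just applied to a string instead of a live response.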
2. Scrape a question and its answers from Zhihu
```python
import requests
from bs4 import BeautifulSoup
url = 'https://www.zhihu.com/question/37787176'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
question = ''
answers = []
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.text, 'html.parser')
question = soup.find('h1', {'class': 'QuestionHeader-title'}).text.strip()
answer_list = soup.find_all('div', {'class': 'List-item'})
for answer in answer_list:
    content = answer.find('div', {'class': 'RichContent-inner'}).text.strip()
    answers.append(content)
print('Question: ' + question)
print('Answers:\n' + '\n\n'.join(answers))
```
Note that Zhihu loads much of its answer content via JavaScript and may require login, so a plain `requests` fetch like this can come back with few or no answers, and class names such as `QuestionHeader-title` change as the site is updated.
Python crawler example code in 300 lines
A Python crawler does not have to be exactly 300 lines; the length depends on your requirements and the structure of the target site. Below is a much shorter example of a few dozen lines:
```python
import requests
from bs4 import BeautifulSoup
# Target URL
url = "http://example.com"

# Send an HTTP request and get the page content
response = requests.get(url)
content = response.content

# Parse the page content
soup = BeautifulSoup(content, "html.parser")

# Find and print all links
links = soup.find_all("a")
for link in links:
    print(link.get("href"))

# Get the page title
title = soup.title.string
print("Page title: " + title)

# Find a specific element (find() returns None if it is missing)
target_element = soup.find("div", class_="target-class")
if target_element is not None:
    print(target_element.get_text())

# Extract tabular data
data = []
rows = soup.find_all("tr")
for row in rows:
    cols = row.find_all("td")
    row_data = [col.get_text() for col in cols]
    data.append(row_data)

# Save the data to a file
with open("data.csv", "w") as file:
    for row_data in data:
        file.write(",".join(row_data) + "\n")
```
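One fragile spot in the save step above: the manual `",".join(row_data)` produces a broken file if a cell itself contains a comma or quote. A sketch of the same step using the stdlib `csv` module, which applies proper quoting automatically (the sample rows are made up):

```python
import csv
import io

# Cells containing commas or quotes break naive ",".join() output;
# csv.writer quotes them correctly
rows = [["Alice", "1,200"], ["Bob", 'say "hi"']]

buf = io.StringIO()  # in-memory buffer; use open("data.csv", "w", newline="") for a file
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())
```

With a real file, pass `newline=""` to `open()` as the `csv` docs recommend, so the module controls line endings itself.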
In practice, the line count grows with your requirements: more page-parsing logic, data processing, storage, and so on. This is just a simple example; a more sophisticated crawler naturally needs more code.
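One common extension of the parsing logic: the link-printing loop in the last example emits `href` values as-is, so relative paths are not directly usable for follow-up requests. A sketch of resolving them with the stdlib `urllib.parse.urljoin` (the base URL and HTML fragment below are made up):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical page fragment; real pages mix absolute and relative links
base_url = "http://example.com/articles/"
html = '<a href="/about">About</a> <a href="page2.html">Next</a>'

soup = BeautifulSoup(html, "html.parser")
# urljoin resolves each href against the page it was found on
links = [urljoin(base_url, a.get("href")) for a in soup.find_all("a")]
print(links)  # ['http://example.com/about', 'http://example.com/articles/page2.html']
```

A crawler that follows links would feed these absolute URLs back into its fetch queue.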