Scraping Baidu search result titles and real URLs with Python
Date: 2023-09-10 21:02:14 Views: 88
You can scrape the titles and URLs of Baidu search results with Python in the following steps:
1. Import the required libraries: requests, bs4, and urllib.parse:
```python
import requests
from bs4 import BeautifulSoup
import urllib.parse
```
2. Build the search URL, using the quote function from urllib.parse to URL-encode the search keyword:
```python
keyword = "Python爬虫"
encoded_keyword = urllib.parse.quote(keyword)
url = f"https://www.baidu.com/s?wd={encoded_keyword}"
```
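As a variation on the step above, the query string can also be assembled with `urllib.parse.urlencode`, which encodes every parameter in one call. The sketch below is illustrative only: the helper name `build_search_url` is hypothetical, and the `pn` parameter is assumed to be Baidu's result offset at 10 results per page.

```python
from urllib.parse import urlencode


def build_search_url(keyword, page=0):
    """Build a Baidu search URL; `page` is zero-based (hypothetical helper)."""
    params = {"wd": keyword}
    if page > 0:
        # Assumption: `pn` is the result offset, with 10 results per page
        params["pn"] = page * 10
    # urlencode percent-encodes values and turns spaces into "+"
    return "https://www.baidu.com/s?" + urlencode(params)


print(build_search_url("Python爬虫"))
print(build_search_url("Python 爬虫", page=2))
```

Unlike `quote`, `urlencode` encodes spaces as `+`, which is the usual form for query strings.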
3. Send the HTTP request and read the page content:
```python
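# A browser-like User-Agent (an assumed value) makes it less likely that Baidu returns a verification page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=headers, timeout=10)
content = response.text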
```
4. Parse the page with BeautifulSoup and extract each result's title and link:
```python
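soup = BeautifulSoup(content, "html.parser")
# Each result is assumed to sit in a <div> whose class list includes "result"
results = soup.find_all("div", class_="result")
for result in results:
    if result.h3 is None or result.a is None:
        continue  # skip blocks without a title or a link
    title = result.h3.get_text(strip=True)
    link = result.a.get("href")  # typically a Baidu redirect link, not the final URL
    print(f"Title: {title}\nURL: {link}\n")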
```
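The `href` values on a Baidu results page are typically redirect links of the form `https://www.baidu.com/link?url=...` rather than the destination itself. One possible way to recover the real URL is to request the link without following redirects and read the `Location` header. The sketch below assumes Baidu answers such a request with an HTTP redirect; the function names are hypothetical, and `session` is expected to behave like a `requests.Session`.

```python
from urllib.parse import urljoin, urlparse


def is_baidu_redirect(link):
    """Return True if the link looks like a baidu.com/link?url=... redirect."""
    parts = urlparse(link)
    return parts.netloc.endswith("baidu.com") and parts.path == "/link"


def resolve_real_url(link, session, timeout=10):
    """Return the redirect target of a Baidu result link, or the link unchanged.

    `session` is assumed to expose head() like requests.Session does.
    """
    if not is_baidu_redirect(link):
        return link  # already a direct URL
    # Do not follow the redirect; only read where it points
    response = session.head(link, allow_redirects=False, timeout=timeout)
    location = response.headers.get("Location")
    # Fall back to the original link if no redirect target was returned
    return urljoin(link, location) if location else link
```

With requests, this could be called as `resolve_real_url(link, requests.Session())` for each extracted link. Each call costs one extra request, so a short delay between calls is advisable.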
The complete code is as follows:
```python
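import requests
from bs4 import BeautifulSoup
import urllib.parse

# Build the search URL with a percent-encoded keyword
keyword = "Python爬虫"
encoded_keyword = urllib.parse.quote(keyword)
url = f"https://www.baidu.com/s?wd={encoded_keyword}"

# A browser-like User-Agent (an assumed value) makes a verification page less likely
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=headers, timeout=10)
content = response.text

# Each result is assumed to sit in a <div> whose class list includes "result"
soup = BeautifulSoup(content, "html.parser")
results = soup.find_all("div", class_="result")
for result in results:
    if result.h3 is None or result.a is None:
        continue  # skip blocks without a title or a link
    title = result.h3.get_text(strip=True)
    link = result.a.get("href")  # typically a Baidu redirect link, not the final URL
    print(f"Title: {title}\nURL: {link}\n")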
```
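To check the parsing logic without sending requests to Baidu, it can be run against a static HTML snippet. The sample markup below is hand-written and only mimics the assumed result layout; the function name `extract_results` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that imitates the assumed structure of a results page
SAMPLE_HTML = """
<div class="result c-container">
  <h3><a href="https://www.baidu.com/link?url=abc">First title</a></h3>
</div>
<div class="result-op c-container">
  <h3><a href="https://www.baidu.com/link?url=def">Special card</a></h3>
</div>
<div class="result c-container">
  <p>No heading here</p>
</div>
"""


def extract_results(html):
    """Return (title, link) pairs for each div whose class list contains "result"."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for block in soup.find_all("div", class_="result"):
        heading = block.find("h3")
        anchor = heading.find("a", href=True) if heading else None
        if anchor is None:
            continue  # no title link in this block
        pairs.append((heading.get_text(strip=True), anchor["href"]))
    return pairs


print(extract_results(SAMPLE_HTML))
```

Only the first block is returned: `class_="result"` matches a whole class token, so the `result-op` block is ignored, and the third block has no `h3`.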
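Following these steps, the script prints the title and link of each Baidu search result. The link taken from the results page is typically a Baidu redirect address, so getting the real destination URL takes one additional request per result.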