import requests import re ur1='https://bj.lianjia.com/zufang/' header={'User-Agent':'Mozilla/5.0(Windows NT 6.1;Win64;x64)AppleWebKit/537.36 (KHTML,like Gecko) Chrome/74.0.3729.169 Safari/537.36'} response = requests.get(url,headers=header) html=response.text Where is the error, and how do I fix it?
Time: 2024-04-26 07:21:47
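The statements are all written on a single line. Python needs each statement on its own line, so `import requests import re` is a syntax error; adding a space does not help. Put `import requests` and `import re` (and every following statement) on separate lines.
In addition, the URL is assigned to `ur1` (with the digit 1) but read as `url` in `requests.get(url, ...)`, which raises a `NameError`. Rename the variable to `url`.
The corrected code: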
```python
import requests
import re
url = 'https://bj.lianjia.com/zufang/'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
response = requests.get(url, headers=header)
html = response.text
```
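Both problems can be reproduced without any network access. The sketch below uses Python's built-in `compile()` to check syntax (nothing is imported or executed) and `exec()` to replay the `ur1`/`url` mix-up; the helper names `compiles` and `typo_raises_name_error` are illustrative only.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (nothing is executed)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


def typo_raises_name_error() -> bool:
    """Define `ur1` but read `url`, mirroring the typo in the question."""
    snippet = "ur1 = 'https://bj.lianjia.com/zufang/'\nprint(url)"
    try:
        exec(snippet, {})
        return False
    except NameError:
        return True


print(compiles("import requests import re"))   # False: two statements on one line
print(compiles("import requests\nimport re"))  # True: one statement per line
print(typo_raises_name_error())                # True: `url` was never defined
```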
Related questions
import requests import re url = 'https://jn.lianjia.com/zufang/pg1/#contentList' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299' } # 获取总页数 response = requests.get(url, headers=headers) html = response.text total_page = re.search(r'data-totalpage="(\d+)"', html).group(1) print(total_page) How do I convert total_page to an int?
`group(1)` returns a string, so wrap it in `int()` to turn `total_page` into an integer.
The modified example:
```python
import requests
import re
url = 'https://jn.lianjia.com/zufang/pg1/#contentList'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299'
}
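# Get the total number of pages
response = requests.get(url, headers=headers)
html = response.text
match = re.search(r'data-totalpage="(\d+)"', html)
if match:
    # group(1) is a string such as '100'; int() converts it to an integer
    total_page = int(match.group(1))
    print(total_page)
else:
    # re.search returns None when nothing matches, so calling .group() directly would fail
    print('data-totalpage not found in the page')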
```
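This example reads the total number of pages of Lianjia's rental listings for Jinan and converts it to an integer. The same pattern can be adapted to extract other fields.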
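The parsing step can be tested on its own, separate from the network request. This is a minimal sketch: `parse_total_page` is a hypothetical helper, and the sample HTML is made up for illustration; it has not been checked against the live Lianjia markup.

```python
import re
from typing import Optional


def parse_total_page(html: str) -> Optional[int]:
    """Return the data-totalpage value as an int, or None if it is absent."""
    match = re.search(r'data-totalpage="(\d+)"', html)
    if match is None:
        return None
    return int(match.group(1))  # group(1) is a str; int() converts it


# Hypothetical markup, used only to exercise the function
sample = '<div class="content__pg" data-totalpage="100" data-curpage="1"></div>'
print(parse_total_page(sample))         # 100
print(parse_total_page("<div></div>"))  # None
```

Returning `None` instead of raising keeps the "not found" case explicit for the caller.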
import requests from lxml import etree url = 'https://jn.lianjia.com/zufang/' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36' } response = requests.get(url, headers=headers) html = response.text selector = etree.HTML(html) total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage') if len(total_page_list) > 0: total_page = total_page_list[0] print('总页数为:', total_page) else: print('未匹配到结果') This raises an error.
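As posted, this code contains no syntax error once each statement is on its own line with proper indentation. If it falls through to the `else` branch, the XPath matched nothing in the HTML that was returned; one possible cause is that the page served to the script differs from what a browser shows (for example, a verification page). Note also that the code only reads the page count and never crawls the pages themselves. A version that loops over every page: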
```python
import requests
from lxml import etree

url = 'https://jn.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
selector = etree.HTML(html)
total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
if len(total_page_list) > 0:
    total_page = total_page_list[0]
    print('Total pages:', total_page)
    # Crawl the rental listings on each page
    for num in range(1, int(total_page) + 1):
        page_url = 'https://jn.lianjia.com/zufang/pg{}/#contentList'.format(num)
        # Send the request for this page
        page_html = requests.get(page_url, headers=headers).text
        # ... process page_html here
else:
    print('No match found')
```
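This code first requests the Jinan Lianjia rental page, then uses the `etree` module from the lxml library to parse the HTML into a queryable tree, and uses XPath to locate the element that carries the total page count. If a match is found, it prints the total and loops over every page, using `.format()` to build each page's URL before requesting and processing it. If nothing matches, it prints a message saying so.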
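The paging loop described above can be sketched so it runs offline by passing the download function in as a parameter. `crawl_pages`, `fake_fetch`, and the `delay` pause between requests are hypothetical additions for illustration, not part of the original answer.

```python
import time
from typing import Callable, Dict

# Page URL template copied from the loop above
PAGE_URL = 'https://jn.lianjia.com/zufang/pg{}/#contentList'


def crawl_pages(total_page: int, fetch: Callable[[str], str], delay: float = 0.0) -> Dict[int, str]:
    """Fetch pages 1..total_page with `fetch` and return {page_number: html}."""
    pages = {}
    for num in range(1, total_page + 1):
        page_url = PAGE_URL.format(num)  # build each page's URL dynamically
        pages[num] = fetch(page_url)
        if delay:
            time.sleep(delay)  # optional pause between requests (assumption)
    return pages


def fake_fetch(url: str) -> str:
    """Hypothetical offline stand-in for requests.get(url, headers=headers).text."""
    return '<html>{}</html>'.format(url)


pages = crawl_pages(3, fake_fetch)
print(len(pages))  # 3
print(pages[2])    # <html>https://jn.lianjia.com/zufang/pg2/#contentList</html>
```

Against the real site, `fetch` could be `lambda u: requests.get(u, headers=headers, timeout=10).text`, with `total_page` converted via `int()` first. Keeping the fetcher injectable lets the loop be tested without touching the network.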