.xpath('.//div[@class="num"]/text()')
Asked: 2023-12-21 07:06:05 · Views: 29
This XPath expression selects the text content of any `div` element whose `class` attribute has the value "num". The leading `.` anchors the search at the current context node; without it, `//` would search from the document root regardless of where the expression is evaluated.
So, if there is an HTML structure like this:
```html
<div class="num">42</div>
```
Then the expression `.//div[@class="num"]/text()` would select the text "42".
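This can be checked directly with lxml, the library used in the code later in this thread. The snippet below parses the one-line HTML example from above and runs the expression against it:

```python
from lxml import etree

# Parse the example markup; etree.HTML wraps fragments in <html><body>.
html = etree.HTML('<div class="num">42</div>')

# xpath() returns a list of matching text nodes.
texts = html.xpath('.//div[@class="num"]/text()')
print(texts)  # ['42']
```

Note that `text()` yields a list of string nodes, so the single match comes back as `['42']`, not `'42'`.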
Related question
```python
with open('./第{}页.txt'.format(i), 'w', encoding='utf-8') as fp:
    for num in dish_urls:
        new_url = f'https://www.haochu.com{num}'  # use each dish's detail URL as the new URL to get its ingredients
        print(new_url)
        response2 = requests.get(url=new_url, headers=headers)
        html_data2 = etree.HTML(response2.text)  # html_data2 for the new URL
        ints_names = html_data2.xpath('//div[@class="ints-cover"]/img/@alt')  # ingredients for the dish
        print(ints_names)
    for dish_name, dish_heat, dish_url in zip(dish_names, dish_heats, dish_urls):
        # fetch the ingredient info from each dish's detail page
        dish_url = f'https://www.haochu.com{dish_url}'
        response3 = requests.get(url=dish_url, headers=headers)
        html_data3 = etree.HTML(response3.text)
        dish_ints_names = html_data3.xpath('//div[@class="ints-cover"]/img/@alt')
        # write the dish name, calories, ingredients, and detail URL to the text file
        dish = f'{dish_name},{dish_heat},{dish_ints_names},https://www.haochu.com{dish_url}\n'
        fp.write(dish)
```
I would like to also include the dish images, `dish_imgs = html_data.xpath('//div[@class="quan_r"]/ul/li/a/img/@src')`, in the output.
You can add the scraped dish-image links to the `dish` variable and write them to the text file along with the rest. The modified code is as follows:
```python
with open('./第{}页.txt'.format(i), 'w', encoding='utf-8') as fp:
    dish_imgs = []  # collect one image link per dish
    for num in dish_urls:
        new_url = f'https://www.haochu.com{num}'
        # use each dish's detail URL as the new URL to get its ingredients
        print(new_url)
        response2 = requests.get(url=new_url, headers=headers)
        html_data2 = etree.HTML(response2.text)  # html_data2 for the new URL
        ints_names = html_data2.xpath('//div[@class="ints-cover"]/img/@alt')  # ingredients for the dish
        imgs = html_data2.xpath('//div[@class="cover-img"]/img/@src')  # dish image on the detail page
        dish_imgs.append(imgs[0] if imgs else '')  # keep lists aligned even when an image is missing
        print(ints_names)
    for dish_name, dish_heat, dish_url, dish_img in zip(dish_names, dish_heats, dish_urls, dish_imgs):
        # fetch the ingredient info from each dish's detail page
        dish_url = f'https://www.haochu.com{dish_url}'
        response3 = requests.get(url=dish_url, headers=headers)
        html_data3 = etree.HTML(response3.text)
        dish_ints_names = html_data3.xpath('//div[@class="ints-cover"]/img/@alt')
        # write the dish name, calories, ingredients, detail URL, and image to the text file
        # (dish_url already carries the https://www.haochu.com prefix at this point)
        dish = f'{dish_name},{dish_heat},{dish_ints_names},{dish_url},{dish_img}\n'
        fp.write(dish)
```
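One thing to watch with this pattern: `zip()` truncates to its shortest input, so if the image list ends up shorter than the name list, trailing dishes are silently dropped from the output file. A small self-contained illustration (the dish names and paths are made up for the example):

```python
from itertools import zip_longest

names = ['tomato soup', 'fried rice', 'dumplings']
imgs = ['/img/a.jpg', '/img/b.jpg']  # one image link missing

# zip() stops at the shortest sequence: the third dish is dropped.
pairs = list(zip(names, imgs))
print(pairs)  # [('tomato soup', '/img/a.jpg'), ('fried rice', '/img/b.jpg')]

# zip_longest keeps every dish, padding missing images with a fill value.
padded = list(zip_longest(names, imgs, fillvalue=''))
print(padded[2])  # ('dumplings', '')
```

That is why the corrected code above appends a placeholder `''` when a detail page has no image: it keeps all four lists the same length.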
```python
import requests
from lxml import etree

url = 'https://jn.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
selector = etree.HTML(html)
total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
if len(total_page_list) > 0:
    total_page = total_page_list[0]
    print('总页数为:', total_page)
else:
    print('未匹配到结果')
```
This code reports an error.
This code does not actually raise an error, but you are missing the loop that scrapes each page of rental listings. You can change it to the following:
```python
import requests
from lxml import etree

url = 'https://jn.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
selector = etree.HTML(html)
total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
if len(total_page_list) > 0:
    total_page = total_page_list[0]
    print('总页数为:', total_page)
    # loop over every page of rental listings
    for num in range(1, int(total_page) + 1):
        url = 'https://jn.lianjia.com/zufang/pg{}/#contentList'.format(num)
        # send the request and process the response
        # ...
else:
    print('未匹配到结果')
This code first sends a request to the Jinan Lianjia rental page, uses the etree module from the lxml library to turn the page into a queryable object, then locates the element holding the total page count with XPath and checks the result. If a match is found, it prints the total page count and loops over every page of rental listings, using `.format()` to build each page's URL dynamically before sending the request and processing the response. If no match is found, it prints a message saying so.
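The page-count extraction and URL generation can be exercised without hitting the live site. The snippet below uses a hand-written stand-in for the pagination markup (it mimics the `content__pg` / `data-totalpage` structure the XPath above targets; it is not the real Lianjia HTML):

```python
from lxml import etree

# Stand-in for the pagination markup described above (assumed structure).
html = '<div class="content__pg"><div data-totalpage="5"></div></div>'
selector = etree.HTML(html)

total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
total_page = int(total_page_list[0]) if total_page_list else 0

# Build the per-page URLs the same way the answer's loop does.
urls = ['https://jn.lianjia.com/zufang/pg{}/#contentList'.format(n)
        for n in range(1, total_page + 1)]
print(urls[0])   # https://jn.lianjia.com/zufang/pg1/#contentList
print(len(urls)) # 5
```

Note that `@data-totalpage` comes back as a string, which is why the answer's code converts it with `int(total_page)` before using it as a `range()` bound.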