.xpath('.//div[@class="num"]/text()')
Asked: 2023-12-21 07:06:05 · Views: 29
This XPath expression selects the text content of any `div` element whose `class` attribute has the value "num". The leading `.` anchors the search at the current context node; without it, `//` would search from the document root regardless of where the expression is evaluated.
So, if there is an HTML structure like this:
```html
<div class="num">42</div>
```
Then the expression `.//div[@class="num"]/text()` would select the text "42".
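This can be checked directly with lxml, the library used in the code later in this thread. The snippet below parses the one-line HTML example from above and runs the expression against it:

```python
from lxml import etree

# Parse the example markup; etree.HTML wraps fragments in <html><body>.
html = etree.HTML('<div class="num">42</div>')

# xpath() returns a list of matching text nodes.
texts = html.xpath('.//div[@class="num"]/text()')
print(texts)  # ['42']
```

Note that `text()` yields a list of string nodes, so the single match comes back as `['42']`, not `'42'`.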
Related question
```python
with open('./第{}页.txt'.format(i), 'w', encoding='utf-8') as fp:
    for num in dish_urls:
        new_url = f'https://www.haochu.com{num}'  # use each dish's detail URL as the new URL to get its ingredients
        print(new_url)
        response2 = requests.get(url=new_url, headers=headers)
        html_data2 = etree.HTML(response2.text)  # html_data2 for the new URL
        ints_names = html_data2.xpath('//div[@class="ints-cover"]/img/@alt')  # ingredients for the dish
        print(ints_names)
    for dish_name, dish_heat, dish_url in zip(dish_names, dish_heats, dish_urls):
        # fetch the ingredient info from each dish's detail page
        dish_url = f'https://www.haochu.com{dish_url}'
        response3 = requests.get(url=dish_url, headers=headers)
        html_data3 = etree.HTML(response3.text)
        dish_ints_names = html_data3.xpath('//div[@class="ints-cover"]/img/@alt')
        # write the dish name, calories, ingredients, and detail URL to the text file
        dish = f'{dish_name},{dish_heat},{dish_ints_names},https://www.haochu.com{dish_url}\n'
        fp.write(dish)
```
I would like to also include the dish images, `dish_imgs = html_data.xpath('//div[@class="quan_r"]/ul/li/a/img/@src')`, in the output.
You can add the scraped dish-image links to the `dish` variable and write them to the text file along with the rest. The modified code is as follows:
```python
with open('./第{}页.txt'.format(i), 'w', encoding='utf-8') as fp:
    dish_imgs = []  # collect one image link per dish
    for num in dish_urls:
        new_url = f'https://www.haochu.com{num}'
        # use each dish's detail URL as the new URL to get its ingredients
        print(new_url)
        response2 = requests.get(url=new_url, headers=headers)
        html_data2 = etree.HTML(response2.text)  # html_data2 for the new URL
        ints_names = html_data2.xpath('//div[@class="ints-cover"]/img/@alt')  # ingredients for the dish
        imgs = html_data2.xpath('//div[@class="cover-img"]/img/@src')  # dish image on the detail page
        dish_imgs.append(imgs[0] if imgs else '')  # keep lists aligned even when an image is missing
        print(ints_names)
    for dish_name, dish_heat, dish_url, dish_img in zip(dish_names, dish_heats, dish_urls, dish_imgs):
        # fetch the ingredient info from each dish's detail page
        dish_url = f'https://www.haochu.com{dish_url}'
        response3 = requests.get(url=dish_url, headers=headers)
        html_data3 = etree.HTML(response3.text)
        dish_ints_names = html_data3.xpath('//div[@class="ints-cover"]/img/@alt')
        # write the dish name, calories, ingredients, detail URL, and image to the text file
        # (dish_url already carries the https://www.haochu.com prefix at this point)
        dish = f'{dish_name},{dish_heat},{dish_ints_names},{dish_url},{dish_img}\n'
        fp.write(dish)
```
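One thing to watch with this pattern: `zip()` truncates to its shortest input, so if the image list ends up shorter than the name list, trailing dishes are silently dropped from the output file. A small self-contained illustration (the dish names and paths are made up for the example):

```python
from itertools import zip_longest

names = ['tomato soup', 'fried rice', 'dumplings']
imgs = ['/img/a.jpg', '/img/b.jpg']  # one image link missing

# zip() stops at the shortest sequence: the third dish is dropped.
pairs = list(zip(names, imgs))
print(pairs)  # [('tomato soup', '/img/a.jpg'), ('fried rice', '/img/b.jpg')]

# zip_longest keeps every dish, padding missing images with a fill value.
padded = list(zip_longest(names, imgs, fillvalue=''))
print(padded[2])  # ('dumplings', '')
```

That is why the corrected code above appends a placeholder `''` when a detail page has no image: it keeps all four lists the same length.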
```python
import requests
from lxml import etree

url = 'https://jn.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
selector = etree.HTML(html)
total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
if len(total_page_list) > 0:
    total_page = total_page_list[0]
    print('总页数为:', total_page)
else:
    print('未匹配到结果')
```
This code reports an error.
This code does not actually raise an error, but you are missing the loop that scrapes each page of rental listings. You can change it to the following:
```python
import requests
from lxml import etree

url = 'https://jn.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
selector = etree.HTML(html)
total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
if len(total_page_list) > 0:
    total_page = total_page_list[0]
    print('总页数为:', total_page)
    # loop over every page of rental listings
    for num in range(1, int(total_page) + 1):
        url = 'https://jn.lianjia.com/zufang/pg{}/#contentList'.format(num)
        # send the request and process the response
        # ...
else:
    print('未匹配到结果')
This code first sends a request to the Jinan Lianjia rental page, uses the etree module from the lxml library to turn the page into a queryable object, then locates the element holding the total page count with XPath and checks the result. If a match is found, it prints the total page count and loops over every page of rental listings, using `.format()` to build each page's URL dynamically before sending the request and processing the response. If no match is found, it prints a message saying so.
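The page-count extraction and URL generation can be exercised without hitting the live site. The snippet below uses a hand-written stand-in for the pagination markup (it mimics the `content__pg` / `data-totalpage` structure the XPath above targets; it is not the real Lianjia HTML):

```python
from lxml import etree

# Stand-in for the pagination markup described above (assumed structure).
html = '<div class="content__pg"><div data-totalpage="5"></div></div>'
selector = etree.HTML(html)

total_page_list = selector.xpath('//div[@class="content__pg"]/div/@data-totalpage')
total_page = int(total_page_list[0]) if total_page_list else 0

# Build the per-page URLs the same way the answer's loop does.
urls = ['https://jn.lianjia.com/zufang/pg{}/#contentList'.format(n)
        for n in range(1, total_page + 1)]
print(urls[0])   # https://jn.lianjia.com/zufang/pg1/#contentList
print(len(urls)) # 5
```

Note that `@data-totalpage` comes back as a string, which is why the answer's code converts it with `int(total_page)` before using it as a `range()` bound.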