```python
def get_content(html_url):
    response = get_response(html_url=html_url)
    selector = parsel.Selector(response.text)
    title = selector.css('.grap--h2 ::text').get()
    content_list = selector.css('.grap ::text').getall()
    content = ''.join(content_list)
    return title, content

def main():
    url = f'https://hongloumeng.5000yan.com/'
    name, link_list = get_novel_info(novel_url=url)
    for link in link_list:
        title, content = get_content(html_url=link)
        save(name, title, content)

word = []
wordlist = []

# Define a function that appends to the list here; the idea is for the
# Entry widget's content to take part in the computation as a variable
def add_to_list():
    text = entry2.get()
    words.append(text)
    wordlist = []
    for word in sWords:
        if word in words:
            wordlist.append(word)

def function():
    f = open('D:\Python文档保存\红楼梦.txt', 'r', encoding='utf-8')
    text = f.read()
    plt.rcParams['font.sans-serif'] = 'SimHei'
    sWords = jieba.lcut(text)
    #wordlist = []
    #words = ['刘姥姥', '贾', '王夫人']
    #for word in sWords:
    #    if word in words:
    #        wordlist.append(word)
    word_counts = collections.Counter(wordlist)
    print(wordlist)
    print(word_counts)

# Button 3
button3 = tk.Button(root, text='词频统计图', bg="#FFB6C1", command=func())
button3.place(x=320, y=165)
```

I want to bind the preceding code to this button. How should I change the code?
This is a snippet of Python code defining four functions (`get_content`, `main`, `add_to_list`, `function`) and two module-level lists (`word`, `wordlist`).

The `get_content(html_url)` function fetches the page at the given `html_url`, extracts its title and body text, and returns both as strings.

The `main()` function calls `get_novel_info()` to obtain the novel's name and its list of chapter links, then iterates over the links, calls `get_content()` for each one, and saves the novel name, chapter title, and chapter text together.

The module-level `word` and `wordlist` variables are never populated in this snippet; they appear to be placeholders for later code.
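The actual question, binding the word-count code to `button3`, mostly comes down to passing the function object instead of calling it: `command=func()` runs the function once while the button is being created and binds its return value (`None`). Below is a minimal runnable sketch of the fix, assuming `jieba` and `matplotlib` are installed and the text file from the question exists; the `Entry` placement is a placeholder, and the duplicated filtering in `add_to_list` is folded into `function` so the count always reflects the current word list:

```python
import collections
import tkinter as tk

import jieba
import matplotlib.pyplot as plt

root = tk.Tk()

entry2 = tk.Entry(root)
entry2.place(x=100, y=100)

words = []  # target words, filled from the Entry widget

def add_to_list():
    # record the Entry content as another target word
    words.append(entry2.get())

def function():
    # segment the novel and count only the target words
    with open('D:\\Python文档保存\\红楼梦.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    plt.rcParams['font.sans-serif'] = 'SimHei'  # render Chinese labels
    sWords = jieba.lcut(text)
    wordlist = [w for w in sWords if w in words]
    word_counts = collections.Counter(wordlist)
    print(word_counts)

# Button 3: pass the function itself (no parentheses) so it runs on click
button3 = tk.Button(root, text='词频统计图', bg='#FFB6C1', command=function)
button3.place(x=320, y=165)

root.mainloop()
```

A second button (or binding `<Return>` on the Entry) would hook up `add_to_list` the same way, again passing the function object rather than its result.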
Related questions
```python
selector = parsel.Selector(html_data)
lis = selector.css('.list_item h2 a::attr(href)').getall()
details = []
for li in lis:
    detail_url = 'https://travel.qunar.com' + li
    detail_html = requests.get(detail_url).text
    details.append(detail_html)

# print each site's html
for detail_html in details:
    print(detail_html)

detail_html = requests.get(detail_url).text
# parse the data
detail_selector = parsel.Selector(detail_html)
title = detail_selector.css('.b_crumb_cont *:nth-child(3)::text').get()
comment = detail_selector.css('.title white::text').get()
view_count = detail_selector.css('.view_count::text').get()
date = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.when > p > span.data::text').get()
days = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.howlong > p > span.data::text').get()
money = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.howmuch > p > span.data::text').get()
character = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.who > p > span.data::text').get()
play_list = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.how > p > span.data > span::text').getall()
for play in play_list:
    print(play)
print(title, comment, view_count, date, days, money, character, play_list, detail_url)
time.sleep(random.randint(3,5))
```

This code only outputs the last site's information. How should I change it?
This code only outputs the last site's information because the parsing block sits after the loop: `detail_url` and `detail_html` are overwritten on every iteration, so only the data from the final request is ever parsed. Store what each request returns in a list and process the entries together afterwards.

One possible modification:
```python
import random
import time

import parsel
import requests

# html_data: the listing page's HTML, fetched earlier
selector = parsel.Selector(html_data)
lis = selector.css('.list_item h2 a::attr(href)').getall()

details = []
for li in lis:
    detail_url = 'https://travel.qunar.com' + li
    detail_html = requests.get(detail_url).text
    details.append((detail_url, detail_html))  # keep each page together with its URL
    time.sleep(random.randint(3, 5))  # pause after each request to avoid getting blocked

for detail_url, detail_html in details:
    # parse the data
    detail_selector = parsel.Selector(detail_html)
    title = detail_selector.css('.b_crumb_cont *:nth-child(3)::text').get()
    comment = detail_selector.css('.title.white::text').get()
    view_count = detail_selector.css('.view_count::text').get()
    date = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.when > p > span.data::text').get()
    days = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.howlong > p > span.data::text').get()
    money = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.howmuch > p > span.data::text').get()
    character = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.who > p > span.data::text').get()
    play_list = detail_selector.css('#js_mainleft > div.b_foreword > ul > li.f_item.how > p > span.data > span::text').getall()
    for play in play_list:
        print(play)
    print(title, comment, view_count, date, days, money, character, play_list, detail_url)
```
This way each page (together with its URL) is stored in the `details` list as it is fetched, and all of them are parsed afterwards in a single pass. Pausing after each request also reduces the chance of being blocked for requesting too frequently.
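An alternative, sketched here under the same assumptions (`html_data` already holds the listing page's HTML), is to parse each detail page inside the loop and keep only the structured fields, so the raw HTML of every page never has to sit in memory at once:

```python
results = []
for li in lis:
    detail_url = 'https://travel.qunar.com' + li
    detail_selector = parsel.Selector(requests.get(detail_url).text)
    results.append({
        'url': detail_url,
        'title': detail_selector.css('.b_crumb_cont *:nth-child(3)::text').get(),
        'view_count': detail_selector.css('.view_count::text').get(),
    })
    time.sleep(random.randint(3, 5))
```

Only two of the fields are shown; the remaining selectors from the answer above slot into the dictionary the same way.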
```python
def parse_book_info(self, url):
    response = self.get_response(url=url)
    selector = Selector(text=response.text)
    page = selector.xpath("//div[@class='paginator']/a[last()]/text()").get()
```
This code defines a method named `parse_book_info` for parsing book information. It takes one parameter, `url`, the address of the page to parse.

First, it calls the previously defined `get_response` method to send an HTTP request and fetch the response. It then parses the response's HTML with the `parsel` library by creating a `Selector` object.

Next, an XPath expression selects the page's pagination control and extracts the last page number, i.e. the maximum page number. That value is used by a subsequent loop.

In short, the method determines the maximum page number of the given page and stores it in the variable `page`.
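As a rough standalone sketch of how that `page` value might drive the later loop (the `get_response` helper, the User-Agent header, and the fallback to page 1 are assumptions, not taken from the original class):

```python
import requests
from parsel import Selector

def get_response(url):
    # hypothetical stand-in for the class's get_response method
    return requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

def parse_max_page(url):
    # standalone version of the parsing step from the question
    response = get_response(url=url)
    selector = Selector(text=response.text)
    page = selector.xpath("//div[@class='paginator']/a[last()]/text()").get()
    # single-page listings may have no paginator at all
    return int(page) if page else 1
```

A loop over `range(1, parse_max_page(list_url) + 1)` could then fetch and parse each page in turn.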