Help me translate the following code: `value = soup.find_all('input', id="pagecount")` `page.append(re.findall(r'[0-9]+', value.__str__()))`
Posted: 2024-01-30 19:04:01 Views: 141
Sure! Here's the translation of the code:
```
value = soup.find_all('input', id="pagecount")
page.append(re.findall(r'[0-9]+', value.__str__()))
```
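The first line finds every element whose tag name is `input` and whose `id` attribute is `pagecount`, and assigns the result to `value`. `find_all()` returns a list-like `ResultSet`, so `value.__str__()` (the same as `str(value)`) is the text of that whole list, tags and attributes included. The second line uses `re.findall()` to collect every run of consecutive digits in that text as a list of strings, then appends that list to `page` as a single element, so `page` ends up as a list of lists.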
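As a rough illustration of the regex step only, the sketch below runs the same pattern on a plain string. `value_str` is a hypothetical stand-in for `str(value)` and `digit_runs` is a helper name invented here; the real text depends on the page being parsed.

```python
import re

def digit_runs(text):
    """Return every run of consecutive ASCII digits in text, as strings."""
    return re.findall(r'[0-9]+', text)

# Hypothetical stand-in for str(value): a one-element result printed as a list.
value_str = '[<input id="pagecount" type="hidden" value="12"/>]'

page = []
page.append(digit_runs(value_str))  # appends the whole list as one element
print(page)  # [['12']]

# Digits anywhere in the markup are matched, not only those in the value attribute.
print(digit_runs('[<input id="pagecount2" maxlength="3" value="12"/>]'))  # ['2', '3', '12']
```

Because the pattern scans the whole tag text, reading the attribute directly, for example `tag.get('value')` on a single tag, is usually less fragile when only the page count is wanted.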
Related questions
```
import requests
from bs4 import BeautifulSoup
import threading
import time

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)'
                  'Chrome/90.0.4430.212 Safari/537.36'
}

def download(url):
    start_time = time.time()  # record the start time
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, features='lxml')
    src = soup.find_all('img')
    imagesrc = soup.find_all('img', width="100")
    for s in imagesrc:
        with open("{}.jpg".format(s.get('alt')), 'wb') as file:
            image = requests.get(s.get('src')).content
            file.write(image)
        print("Downloading " + s.get('alt') + '.jpg')
    end_time = time.time()  # record the end time
    print("Thread {} ran for {} seconds".format(threading.current_thread().name, end_time - start_time))

threads = []
for x in range(10):
    url = "https://movie.douban.com/top250?start={}&filter=".format(x * 25)
    thread = threading.Thread(target=download, args=(url,), name="Thread-{}".format(x+1))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
```
Change this to single-threaded.
```
import requests
from bs4 import BeautifulSoup
import time

headers = {
    # Each piece ends with a space so the concatenated string is a valid User-Agent.
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/90.0.4430.212 Safari/537.36'
}

def download(url):
    start_time = time.time()  # record the start time
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, features='lxml')
    # Poster thumbnails on the page are the <img> tags with width="100".
    imagesrc = soup.find_all('img', width="100")
    for s in imagesrc:
        # Save each image under its alt text (the film title).
        with open("{}.jpg".format(s.get('alt')), 'wb') as file:
            image = requests.get(s.get('src')).content
            file.write(image)
        print("Downloading " + s.get('alt') + '.jpg')
    end_time = time.time()  # record the end time
    print("Ran for {} seconds".format(end_time - start_time))

# Build the ten page URLs (25 films per page), then fetch them one after another.
url_list = []
for x in range(10):
    url = "https://movie.douban.com/top250?start={}&filter=".format(x * 25)
    url_list.append(url)

for url in url_list:
    download(url)
```
```
data = soup.find('div', {'class': 'report-list report-list__item'}).find_all('div')[1].find_all('table')[0].find_all('tr')[1:-1]
AttributeError: 'NoneType' object has no attribute 'find_all'
```
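This error usually means that a `find()` call matched nothing and returned `None`; `None` has no `find_all()` method, so chaining `.find_all()` onto it raises `AttributeError`. In this expression the only `find()` is the first call, so no `div` matched the class filter. Note that a class string containing a space, such as `'report-list report-list__item'`, only matches when the tag's `class` attribute is exactly that string. (`find_all()` itself never returns `None`; when nothing matches it returns an empty list, and indexing that list raises `IndexError` instead.)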
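To see the failure in isolation, the minimal sketch below uses a plain `None` in place of a `find()` result that matched nothing. `call_find_all` is a hypothetical helper written for this illustration, not part of BeautifulSoup.

```python
def call_find_all(node):
    """Call node.find_all('div'); return the error message if the call fails."""
    try:
        return node.find_all('div')
    except AttributeError as exc:
        return str(exc)

missing = None  # stands in for the result of a find() that matched nothing
print(call_find_all(missing))  # 'NoneType' object has no attribute 'find_all'
```

The message is the same one shown in the traceback, which confirms that the object on the left of `.find_all` was `None`.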
To avoid the error, check that each intermediate result is not `None` before calling further methods on it. For example:
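```
data = []  # stays empty if any step of the lookup fails
data_div = soup.find('div', {'class': 'report-list report-list__item'})
if data_div is not None:
    inner_divs = data_div.find_all('div')
    if len(inner_divs) > 1:  # make sure index 1 exists before using it
        # find() returns None when there is no <table>, so it can be checked
        data_table = inner_divs[1].find('table')
        if data_table is not None:
            # skip the first and last rows, as in the original expression
            data = data_table.find_all('tr')[1:-1]
```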
This avoids the error caused by calling `find_all()` on a `None` object.