Use the regular expression `<dd>.*?board-index.*?>(.*?).*?data-src="(.*?)".*?name.*?a.*?>(.*?)</a>.*?star.*?>(.*?).*?releasetime.*?>(.*?).*?integer.*?>(.*?).*?fraction.*?>(.*?).*?</dd>` to scrape all information for the Maoyan Movies TOP100. URL: https://www.maoyan.com/board/4

Time: 2023-07-17 12:09:00 Views: 130

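The following regular expression can be used to scrape the Maoyan Movies TOP100 information:

```python
import re
import requests

# Every capture group is closed by a literal tag (</i>, </p>, </a>).
# A lazy (.*?) followed directly by another .*? always captures an empty string.
# Assumption: the closing tags match the board's <i>/<p>/<a> markup.
pattern = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>'
    r'.*?data-src="(.*?)"'
    r'.*?name.*?a.*?>(.*?)</a>'
    r'.*?star.*?>(.*?)</p>'
    r'.*?releasetime.*?>(.*?)</p>'
    r'.*?integer.*?>(.*?)</i>'
    r'.*?fraction.*?>(.*?)</i>'
    r'.*?</dd>',
    re.S,
)

url = "https://www.maoyan.com/board/4"
# Assumption: the site may reject the default requests User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}

# Assumption: 10 films per page, paged via ?offset=0, 10, ..., 90.
for offset in range(0, 100, 10):
    # Request the page
    response = requests.get(url, params={"offset": offset}, headers=headers, timeout=10)
    html = response.text

    # Extract and print the fields
    for match in pattern.findall(html):
        index = match[0]
        image = match[1]
        title = match[2]
        actor = match[3].strip()[3:]         # drop the 3-character "主演：" prefix
        release_time = match[4].strip()[5:]  # drop the 5-character "上映时间：" prefix
        score = match[5] + match[6]          # integer part + fractional part
        print("Rank:", index)
        print("Poster URL:", image)
        print("Title:", title)
        print("Starring:", actor)
        print("Release date:", release_time)
        print("Score:", score)
        print("--------------------")
```

Note: when scraping with regular expressions, a change in the page structure can make the match fail, so adjust the pattern to the actual markup. A more robust and convenient library (such as BeautifulSoup or Scrapy) is also recommended for web scraping.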
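How a lazy group behaves can be checked offline, without touching the site. A capture group written as `(.*?)` and followed directly by another `.*?` always captures the empty string, so each group needs a literal anchor right after it. The sketch below compares the two forms on a hand-written fragment. `SAMPLE_HTML`, `UNANCHORED`, `ANCHORED`, and `parse_movies` are hypothetical names introduced here, and the fragment is only modeled on the class names the pattern targets; the live page's markup may differ.

```python
import re

# Hypothetical markup for one board entry (an assumption, not the live page).
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/1" class="image-link">
    <img data-src="https://example.com/poster1.jpg" alt="Movie A" class="board-img" />
  </a>
  <p class="name"><a href="/films/1" title="Movie A">Movie A</a></p>
  <p class="star">
      主演：Actor X,Actor Y
  </p>
  <p class="releasetime">上映时间：1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
"""

# Groups 4-7 are each followed directly by another lazy ".*?".
UNANCHORED = re.compile(
    r'<dd>.*?board-index.*?>(\d+).*?data-src="(.*?)".*?name.*?a.*?>(.*?)</a>'
    r'.*?star.*?>(.*?).*?releasetime.*?>(.*?).*?integer.*?>(.*?)'
    r'.*?fraction.*?>(.*?).*?</dd>',
    re.S,
)

# Same pattern with a literal closing tag after every group.
ANCHORED = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>'
    r'.*?data-src="(.*?)"'
    r'.*?name.*?a.*?>(.*?)</a>'
    r'.*?star.*?>(.*?)</p>'
    r'.*?releasetime.*?>(.*?)</p>'
    r'.*?integer.*?>(.*?)</i>'
    r'.*?fraction.*?>(.*?)</i>'
    r'.*?</dd>',
    re.S,
)


def parse_movies(html):
    """Return one dict per <dd> entry matched by ANCHORED."""
    movies = []
    for rank, image, title, star, release, integer, fraction in ANCHORED.findall(html):
        movies.append({
            "rank": rank,
            "image": image,
            "title": title,
            "actors": star.strip()[3:],           # drop the "主演：" prefix
            "release_time": release.strip()[5:],  # drop the "上映时间：" prefix
            "score": integer + fraction,
        })
    return movies


# The last four groups of the unanchored pattern come back empty.
print(UNANCHORED.findall(SAMPLE_HTML)[0][3:])
print(parse_movies(SAMPLE_HTML))
```

Testing the pattern against a saved fragment like this makes it easier to notice when a change in the page structure breaks a group, before any requests are sent.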
