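Parse the URL https://www.u9seo.com/sitemap_index.xml and determine whether it is a sitemapindex. If it is, first parse the child XML files and then extract url and lastmod from them. If it is not a sitemapindex, parse the XML file directly and extract url and lastmod. Implement this in Python.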

Time: 2023-04-02 12:04:06 Views: 85

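Hello, this is a technical question that I can answer. For the URL you provided, Python's xml.etree.ElementTree module can parse either a sitemap index or a regular sitemap file. First determine which of the two it is by checking the root element: if the root is sitemapindex, fetch and parse each child XML file it lists; otherwise parse the file directly. Two details matter in practice. Sitemap files normally declare the namespace http://www.sitemaps.org/schemas/sitemap/0.9, so ElementTree reports tags as '{namespace}name' and a plain find('loc') returns None. Also, lastmod is optional, so it may be missing from an entry. The implementation can follow this code, which assumes the site uses the standard namespace:

    import requests
    import xml.etree.ElementTree as ET

    # Namespace used by standard sitemap files (assumed for this site)
    NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

    def fetch_root(url):
        # Download an XML file and return its root element
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return ET.fromstring(response.content)

    def print_entries(root):
        # Print loc and lastmod for every <url> entry; lastmod may be None
        for entry in root.findall('sm:url', NS):
            loc = entry.findtext('sm:loc', namespaces=NS)
            lastmod = entry.findtext('sm:lastmod', namespaces=NS)
            print(loc, lastmod)  # do something with url and lastmod

    url = 'https://www.u9seo.com/sitemap_index.xml'
    root = fetch_root(url)

    # The tag looks like '{namespace}sitemapindex', so compare the suffix
    if root.tag.endswith('sitemapindex'):
        for sitemap in root.findall('sm:sitemap', NS):
            sub_url = sitemap.findtext('sm:loc', namespaces=NS)
            print_entries(fetch_root(sub_url))
    else:
        print_entries(root)

I hope this helps you solve the problem.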
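The root-element check can also be tried offline, without downloading anything. The following is a minimal sketch rather than a definitive implementation: the helper names `local_name` and `parse_sitemap` and the example.com sample data are hypothetical and used only for illustration. It strips the namespace prefix from each tag so that it works whether or not the file declares the standard sitemap namespace, and it returns `None` when an entry has no `lastmod`.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without a '{namespace}' prefix, if present."""
    return tag.rsplit('}', 1)[-1]


def parse_sitemap(xml_text):
    """Parse sitemap XML given as a string (hypothetical helper).

    Returns (kind, entries): kind is the root element name, e.g.
    'sitemapindex' or 'urlset'; entries is a list of (loc, lastmod)
    tuples, where lastmod is None if the entry does not provide it.
    """
    root = ET.fromstring(xml_text)
    kind = local_name(root.tag)
    entries = []
    for child in root:
        # Map each sub-element's bare name to its stripped text
        fields = {local_name(el.tag): (el.text or '').strip() for el in child}
        if 'loc' in fields:
            entries.append((fields['loc'], fields.get('lastmod')))
    return kind, entries


# Made-up sample documents for illustration only
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2023-04-01T08:00:00+00:00</lastmod>
  </sitemap>
</sitemapindex>"""

SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/a</loc>
  </url>
</urlset>"""

if __name__ == '__main__':
    for sample in (SAMPLE_INDEX, SAMPLE_URLSET):
        kind, entries = parse_sitemap(sample)
        print(kind, entries)
```

When `kind` is `sitemapindex`, each `loc` is the address of a child sitemap; its content would be downloaded (for example with `requests.get`) and passed to `parse_sitemap` again to obtain the page URLs and their `lastmod` values.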