Using Python to fetch the "时政要闻" (current affairs news) section of https://www.ptpress.com.cn/ via AJAX (XHR) requests and store the results in a list

Time: 2024-10-16 12:16:47 Views: 41

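In Python, you can use the `requests` library together with `BeautifulSoup` to fetch a page and extract its content. Note that a plain GET only returns the HTML the server sends; anything the page loads afterwards through XHR has to be requested from its own endpoint. The basic steps are as follows.

First, make sure the required libraries are installed:

```bash
pip install requests beautifulsoup4
```

Then you can write a function that fetches and parses the page:

```python
import requests
from bs4 import BeautifulSoup


def fetch_and_parse(url):
    # Send a GET request with a browser-like User-Agent
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    response = requests.get(url, headers=headers, timeout=10)

    # Check whether the request succeeded
    if response.status_code != 200:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
        return []

    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate the part that holds the news section; these selectors are
    # placeholders and must be checked against the real page structure
    news_content_container = soup.find('div', class_='news-content') or soup.find('section', id='content')
    if news_content_container is None:
        print("News container not found. Check the selectors against the real page.")
        return []

    # Extract the title text of each item; this assumes every news item is an li element
    return [item.get_text(strip=True) for item in news_content_container.select('li.news-item')]


# Call the function and handle the result.
# Note: verify the URL path and the actual page structure. Appending the
# section name "时政要闻" to the base URL does not yield a valid page address.
articles_list = fetch_and_parse("https://www.ptpress.com.cn/")
if articles_list:
    print("Extracted article list:")
    for article in articles_list:
        print(article)
```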
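Since the question asks for the XHR route specifically, here is a minimal sketch of that variant. It assumes you have opened the browser's developer tools, filtered the Network panel by XHR/Fetch, and found the request that returns the news list as JSON. The endpoint in `XHR_URL` and the field names `data` and `title` are placeholders, not the site's real API, so replace them with what you actually observe. The sketch uses only the standard library (`urllib.request`) so that it runs without extra installs; `requests.get(url, headers=...).json()` would do the same job.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the XHR URL shown in the Network panel.
XHR_URL = "https://www.ptpress.com.cn/REPLACE/WITH/REAL/XHR/PATH"


def fetch_json(url, timeout=10):
    """Request an endpoint the way a browser XHR would and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            # Conventional marker for AJAX calls; some servers check for it.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload, list_key="data", title_key="title"):
    """Collect the title of every item in the response into a list.

    list_key and title_key are assumed field names; adjust them to the
    structure of the real response.
    """
    items = payload.get(list_key) or []
    return [item[title_key].strip() for item in items if item.get(title_key)]


# Offline demonstration with a made-up payload of the assumed shape.
sample = {"data": [{"title": " First headline "}, {"title": "Second headline"}, {"id": 3}]}
titles = extract_titles(sample)
print(titles)
```

Once the real endpoint and field names are known, the list is obtained with `titles = extract_titles(fetch_json(XHR_URL))`. Keeping the parsing separate from the network call makes it easy to check the extraction logic against a saved response before hitting the site.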
