Python code to scrape the latest 100 news items from the main news column of the Xi'an Jiaotong University news site (http://news.xjtu.edu.cn/zyxw.htm) and save them to news.txt


Posted: 2024-02-26 18:53:30 Views: 96

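This scraping task can be done with Python's requests and BeautifulSoup libraries. Here is the code:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Request headers: send a browser-like User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Fetch the listing page
url = 'http://news.xjtu.edu.cn/zyxw.htm'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML and locate the news list items
soup = BeautifulSoup(response.content, 'html.parser')
news_list = soup.find_all('div', class_='newslist')[0].find_all('li')

# Fetch up to 100 of the listed news items
news_content = []
for news in news_list[:100]:
    link = news.find('a')
    if link is None or not link.get('href'):
        continue  # skip list items that carry no link
    # hrefs may be relative, so resolve them against the listing URL
    news_url = urljoin(url, link['href'])
    news_title = link.text.strip()
    span = news.find('span')
    news_time = span.text.strip() if span else ''
    news_response = requests.get(news_url, headers=headers, timeout=10)
    news_soup = BeautifulSoup(news_response.content, 'html.parser')
    body = news_soup.find('div', class_='wp_articlecontent')
    news_content.append({
        'title': news_title,
        'time': news_time,
        'content': body.text.strip() if body else '',
    })

# Save to file
with open('news.txt', 'w', encoding='utf-8') as f:
    for news in news_content:
        f.write(f"{news['title']}\n{news['time']}\n{news['content']}\n\n")
```

The code first requests the HTML of the news listing page and parses it with BeautifulSoup to find the title, publication time, and link of each item in the list. It then requests each link in turn to get the article body, and finally writes the title, publication time, and body of every item to news.txt.

Two caveats. The class names `newslist` and `wp_articlecontent` are assumptions about the page markup and should be checked against the live HTML; if `newslist` is absent, the `[0]` lookup raises an `IndexError`. The script also reads only the first listing page, so it returns fewer than 100 items whenever that page lists fewer than 100.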
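Reaching a full 100 items will usually mean following the listing's pagination, since a single listing page may hold fewer entries. The sketch below shows one possible way to structure that loop. It is not taken from the answer above: `collect_news`, `fetch`, and `parse` are hypothetical names, and it assumes that `parse` can extract both the items and the href of a "next page" link from each listing page (how that link is marked up on news.xjtu.edu.cn is not known here). Only the standard library is used, so the page-fetching and HTML-parsing steps are passed in as functions.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# A parsed listing page: (items on the page, href of the next page or None).
ParsedPage = Tuple[List[Dict[str, str]], Optional[str]]


def collect_news(
    start_url: str,
    fetch: Callable[[str], str],
    parse: Callable[[str], ParsedPage],
    limit: int = 100,
    max_pages: int = 50,
) -> List[Dict[str, str]]:
    """Follow 'next page' links until `limit` items have been collected."""
    items: List[Dict[str, str]] = []
    visited = set()
    url: Optional[str] = start_url
    # Stop on a missing next link, a pagination cycle, or the page cap.
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page_items, next_href = parse(fetch(url))
        for item in page_items:
            # Resolve relative article links against the current page URL.
            items.append(dict(item, url=urljoin(url, item["url"])))
            if len(items) >= limit:
                return items
        # The next-page href may be relative as well.
        url = urljoin(url, next_href) if next_href else None
    return items
```

In use, `fetch` could wrap `requests.get(url, headers=headers, timeout=10).text`, and `parse` could run the BeautifulSoup lookups from the answer and return a list of `{'title': ..., 'time': ..., 'url': ...}` dictionaries together with the next-page href. The `visited` set and `max_pages` cap guard against pagination links that loop back on themselves.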


