Python code to scrape the latest 100 news items from the main news section of the Xi'an Jiaotong University news site (http://news.xjtu.edu.cn/zyxw.htm) and save them to news.txt

Posted: 2024-02-26 07:53:44 | Views: 83


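Sure. Below is Python code that scrapes the latest news items from the Xi'an Jiaotong University news site and saves up to 100 of them to news.txt:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'http://news.xjtu.edu.cn/zyxw.htm'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(response.content, 'html.parser')

# The class names 'l-list' and 'item' are assumptions about the page markup;
# check them against the live page in the browser's developer tools.
news_list = soup.find('div', class_='l-list')
if news_list is None:
    raise SystemExit('News list container not found; check the selectors.')
news_items = news_list.find_all('div', class_='item')

with open('news.txt', 'w', encoding='utf-8') as f:
    for news_item in news_items[:100]:
        link_tag = news_item.find('a')
        time_tag = news_item.find('span')
        if link_tag is None or not link_tag.get('href'):
            continue  # skip entries without a usable link
        news_title = link_tag.text.strip()
        news_link = urljoin(url, link_tag['href'])  # make relative links absolute
        news_time = time_tag.text.strip() if time_tag else ''
        f.write(news_title + '\n' + news_link + '\n' + news_time + '\n\n')
```

How the code works:

1. It imports `requests` and `BeautifulSoup`, which send the HTTP request and parse the HTML respectively, plus `urljoin` to turn relative links into absolute URLs.
2. It sets a `User-Agent` request header so that the request looks like it comes from a browser.
3. It sends a GET request and parses the returned page.
4. It locates the HTML element that holds the news list and iterates over each news entry in it.
5. It writes each entry's title, link, and publication date to `news.txt`, separating entries with a blank line.

Note that `news_items[:100]` only caps the number of entries. If the listing page shows fewer than 100 items, only the items on that page are saved.

Make sure `requests` and `beautifulsoup4` are installed before running the script. You can install them with:

```
pip install requests beautifulsoup4
```

After the script runs, a file named `news.txt` is created in the current directory, containing the title, link, and publication date of each news item found (at most 100).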
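A single listing page may hold fewer than 100 entries, so reaching 100 usually means following the "next page" links. The following is a minimal sketch of that loop, not a tested scraper for this site. The names `collect_news`, `format_news`, `fetch_page`, and `parse_page` are hypothetical helpers introduced here for illustration. `fetch_page` would wrap `requests.get`, and `parse_page` would wrap the BeautifulSoup selectors and return the page's items together with the next-page link. How the real site marks its next-page link is not known from the answer above and must be checked on the live page.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# One news entry as (title, link, date), the same fields the script writes out.
NewsItem = Tuple[str, str, str]
# parse_page(html, page_url) -> (items on that page, next-page href or None)
PageParser = Callable[[str, str], Tuple[List[NewsItem], Optional[str]]]


def collect_news(start_url: str,
                 fetch_page: Callable[[str], str],
                 parse_page: PageParser,
                 limit: int = 100,
                 max_pages: int = 50) -> List[NewsItem]:
    """Follow next-page links from start_url until `limit` items are collected."""
    items: List[NewsItem] = []
    seen_urls = set()
    url: Optional[str] = start_url
    while url and len(items) < limit and len(seen_urls) < max_pages:
        if url in seen_urls:  # guard against pagination loops
            break
        seen_urls.add(url)
        page_items, next_href = parse_page(fetch_page(url), url)
        items.extend(page_items)
        # Resolve a relative next-page href against the current page URL.
        url = urljoin(url, next_href) if next_href else None
    return items[:limit]


def format_news(items: List[NewsItem]) -> str:
    """Render items as title, link, date, then a blank line per entry."""
    return "".join(f"{title}\n{link}\n{date}\n\n" for title, link, date in items)


if __name__ == "__main__":
    # In-memory stand-in for the site so the sketch runs without network access.
    fake_pages = {
        "http://example.com/news/list.htm":
            ([("First", "http://example.com/1", "2024-02-26")], "list2.htm"),
        "http://example.com/news/list2.htm":
            ([("Second", "http://example.com/2", "2024-02-25")], None),
    }
    news = collect_news("http://example.com/news/list.htm",
                        fetch_page=lambda u: u,  # the "HTML" is just the URL here
                        parse_page=lambda html, u: fake_pages[u])
    print(format_news(news), end="")
```

Passing the fetch and parse steps in as functions keeps the pagination logic independent of the site's markup, so only `parse_page` needs to change if the selectors turn out to be different. The `max_pages` cap and the visited-URL check keep the loop from running indefinitely if the next-page links cycle.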
