
Code for scraping data from the Newsela website

Date: 2023-09-14 09:13:47 Views: 155

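Below is example code that uses Python and the BeautifulSoup library to scrape article data from the Newsela website:

```python
import requests
from bs4 import BeautifulSoup

# URL of the Newsela article
url = 'https://newsela.com/read/ancient-rome-overview/id/24207/'

# Send a GET request to fetch the page; stop early on an HTTP error
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the page content
soup = BeautifulSoup(response.content, 'html.parser')


def text_of(tag):
    """Return the stripped text of a tag, or '' if the tag was not found."""
    return tag.get_text(strip=True) if tag else ''


# Get the article title (class names may differ from the site's current markup)
title = text_of(soup.find('h1', {'class': 'js-headline-text'}))

# Get the article author
author = text_of(soup.find('a', {'class': 'js-author-link'}))

# Get the article body paragraphs
paragraphs = soup.find_all('p', {'class': 'js-tiptext-text'})

# Join the body paragraphs into the full article text, one per line
article = '\n'.join(p.get_text() for p in paragraphs)

# Print the article title, author, and body
print('Title:', title)
print('Author:', author)
print('Article:', article)
```

Note that this example only retrieves the data for a single article. Downloading Newsela articles in bulk requires more complex code. In addition, be sure to follow ethical guidelines and the website's terms of use, and do not abuse scraping.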
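As a starting point for the bulk case, here is a minimal sketch of a batch loop. It assumes the single-article steps have been wrapped in a function that takes a URL and returns a dict. The names `scrape_many`, `scrape_one`, and `delay_seconds` are hypothetical and do not come from the original answer, and the two-second default pause is an arbitrary choice, not a value Newsela documents.

```python
import time
from typing import Callable, Dict, Iterable, List


def scrape_many(
    urls: Iterable[str],
    scrape_one: Callable[[str], Dict[str, str]],
    delay_seconds: float = 2.0,
) -> List[Dict[str, str]]:
    """Run scrape_one on each URL, pausing between requests.

    scrape_one is a hypothetical callable (an assumption of this sketch)
    that fetches one article and returns a dict such as
    {'title': ..., 'author': ..., 'article': ...}.
    """
    results: List[Dict[str, str]] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Space out requests so the site is not flooded
            time.sleep(delay_seconds)
        try:
            record = scrape_one(url)
        except Exception as exc:
            # One failed article should not abort the whole batch
            print(f'Skipping {url}: {exc}')
            continue
        # Keep the source URL alongside the scraped fields
        results.append({'url': url, **record})
    return results
```

Passing the scraping function in as an argument keeps the loop independent of how a single page is fetched and parsed, so it can be tried with a stub before any real request is sent.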
