News-scraping code in PyCharm
Date: 2024-12-19 16:11:52
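PyCharm is a full-featured Python integrated development environment (IDE) that can be used to write web-scraping scripts, including ones that collect news data. Below is a simple Python scraper example that uses the requests library to send an HTTP request for the news page and the BeautifulSoup library to parse the returned HTML: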
```python
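import requests
from bs4 import BeautifulSoup

# URL of the target news site
url = "https://example.com/news"  # Replace with the news site you want to scrape


def get_news(url):
    try:
        # A timeout keeps the request from hanging indefinitely
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise an exception for 4xx/5xx responses
        soup = BeautifulSoup(response.text, 'html.parser')
        # Find the blocks that hold each headline and its link
        news_blocks = soup.find_all('div', class_='news-item')  # The class name varies by site
        for block in news_blocks:
            anchor = block.select_one('h2 a')
            if anchor is None or not anchor.has_attr('href'):
                continue  # Skip blocks that lack the expected <h2><a href=...> structure
            title = anchor.get_text(strip=True)  # Headline
            link = anchor['href']  # Link
            print(f'Title: {title}\nLink: {link}\n---')
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")


get_news(url)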
```
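The `href` values read from a page are often relative paths rather than full URLs, so a scraper usually needs to resolve them against the page address before following them. The following is a minimal sketch of that step using the standard library's `urllib.parse.urljoin`; the helper name `resolve_links` and the sample `href` values are hypothetical and only serve as illustration.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Turn href values into absolute URLs, resolved against base_url."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical href values, as they might appear in a page's <a> tags
hrefs = ["/news/123.html", "story.html", "https://other.example.org/a"]

absolute_links = resolve_links("https://example.com/news", hrefs)
for link in absolute_links:
    print(link)
```

Note that a bare relative path such as `story.html` is resolved against the directory of the base URL. Because `https://example.com/news` has no trailing slash, it resolves to `https://example.com/story.html`, so it is safest to pass the exact URL the page was fetched from.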