
Python Web Crawler Data Collection Example

Date: 2024-05-15 22:11:01

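A Python web crawler is a script that automates the collection of data from websites. Here is a simple example: suppose we want to collect the news titles and links from a site. We can do this with Python's requests and BeautifulSoup libraries.

First, we use requests to send a request to the site and retrieve the page's HTML. Then we use BeautifulSoup to parse that HTML and extract the data we need. The example code follows:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://www.example.com/news'  # URL of the site to scrape
response = requests.get(url, timeout=10)  # send the request and fetch the HTML
response.raise_for_status()  # stop early if the server returned an HTTP error
soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML

news_list = []  # holds the news titles and links
for news in soup.find_all('div', class_='news-item'):  # find every news item
    anchor = news.find('a', href=True)
    if anchor is None:  # skip items that contain no link
        continue
    title = anchor.get_text(strip=True)  # extract the news title
    link = urljoin(url, anchor['href'])  # extract the link, resolving relative URLs
    news_list.append({'title': title, 'link': link})  # store the title and link

for news in news_list:  # walk the list and print each title and link
    print(news['title'], news['link'])
```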
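The extraction step can also be tried without any network access, which makes it easier to check the selectors before pointing the script at a real site. The sketch below is a minimal illustration under stated assumptions: `SAMPLE_HTML` is made-up markup that imitates the `news-item` structure described above, and `extract_news` is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up sample markup (an assumption) imitating the 'news-item' structure.
SAMPLE_HTML = """
<div class="news-item"><a href="/news/1">First headline</a></div>
<div class="news-item"><a href="https://www.example.com/news/2"> Second headline </a></div>
<div class="news-item"><span>No link here</span></div>
"""


def extract_news(html, base_url):
    """Hypothetical helper: return {'title', 'link'} dicts for each news item."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for block in soup.find_all('div', class_='news-item'):
        anchor = block.find('a', href=True)
        if anchor is None:
            # Skip blocks that contain no usable link.
            continue
        items.append({
            'title': anchor.get_text(strip=True),        # trim surrounding whitespace
            'link': urljoin(base_url, anchor['href']),   # resolve relative URLs
        })
    return items


news_list = extract_news(SAMPLE_HTML, 'https://www.example.com/news')
for news in news_list:
    print(news['title'], news['link'])
```

Keeping the parsing in a function that takes an HTML string means the same code can be fed `response.text` from a live request or a saved page, and the third sample item shows that entries without a link are skipped rather than raising an error.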