
Write a web crawler in Python

Date: 2023-05-20 17:02:54 Views: 44

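Sure, here is a simple example of a crawler:

```
import requests
from bs4 import BeautifulSoup


def get_html(url):
    """Fetch the HTML content of a page; return None if the request fails."""
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        # Guess the encoding from the body so non-ASCII text decodes correctly
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None


def get_content(html):
    """Parse the HTML and extract the data from the page."""
    soup = BeautifulSoup(html, 'html.parser')
    # A CSS selector is used here; pick whichever parsing method suits your needs
    items = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    # List that stores the parsed entries
    news = []
    for item in items:
        news.append({'title': item.get_text(), 'link': item.get('href')})
    return news


def spider():
    """Main function of the crawler."""
    url = "http://www.santostang.com/"
    html = get_html(url)
    if html is None:
        return
    content = get_content(html)
    print(content)


if __name__ == "__main__":
    spider()
```

This crawler is meant to scrape the news titles and links from this site (http://www.santostang.com/). The CSS selector is tied to the page's structure, so an empty result usually means it needs adjusting. Hope this example helps!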
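The `link` values that `get_content` collects come straight from each `href` attribute, so they may be relative paths or missing altogether. Below is a minimal sketch of one way to tidy that list using only the standard library. The helper name `absolutize_links` and the sample entries are made up for illustration and are not part of the original answer.

```python
from urllib.parse import urljoin


def absolutize_links(news, base_url):
    """Return a copy of `news` with titles stripped and links made absolute.

    Hypothetical helper: `news` is assumed to be a list of dicts with
    'title' and 'link' keys, the shape produced by get_content().
    """
    resolved = []
    for entry in news:
        link = entry.get("link")
        resolved.append({
            "title": entry["title"].strip(),
            # urljoin resolves relative paths and leaves absolute URLs unchanged
            "link": urljoin(base_url, link) if link else None,
        })
    return resolved


# Made-up sample data in the shape returned by get_content()
sample = [
    {"title": " First post ", "link": "/?p=1"},
    {"title": "Second post", "link": "http://www.santostang.com/?p=2"},
    {"title": "No link", "link": None},
]

cleaned = absolutize_links(sample, "http://www.santostang.com/")
for entry in cleaned:
    print(entry["title"], entry["link"])
```

In the crawler, this would be called on the result of `get_content(html)` with the same `url` that was fetched, so each stored link can be requested directly later.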