
Python practice: using the Requests library, information markup, information extraction, using the Beautiful Soup library, and regular expressions

Date: 2023-12-04 14:02:06 Views: 112

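Sure. Below are some practice examples covering the Requests library, information markup, information extraction, the Beautiful Soup library, and regular expressions.

1. Sending an HTTP request with the Requests library

```
import requests

# Send a GET request; the timeout keeps the call from hanging indefinitely
response = requests.get('http://www.example.com', timeout=10)
print(response.status_code)  # HTTP status code, e.g. 200 on success
print(response.text)         # response body decoded as text
```

2. Information markup and extraction

Suppose we have the following HTML:

```
<html>
  <head>
    <title>Example Website</title>
  </head>
  <body>
    <h1>Welcome to my website</h1>
    <div class="content">
      <p>This is the first paragraph.</p>
      <p>This is the second paragraph.</p>
    </div>
  </body>
</html>
```

We can use the Beautiful Soup library to parse the HTML and extract information from it:

```
from bs4 import BeautifulSoup

# html is assumed to hold the HTML document shown above as a string
soup = BeautifulSoup(html, 'html.parser')

# Extract the page title
title = soup.title.string
print(title)

# Extract the body paragraphs inside <div class="content">
content_div = soup.find('div', {'class': 'content'})
paragraphs = content_div.find_all('p')
for p in paragraphs:
    print(p.text)
```

3. Extracting information with regular expressions

Suppose we have the following string:

```
text = 'My phone number is 555-1234. Call me at 9:00 am.'
```

We can use a regular expression to extract the phone number:

```
import re

# Three digits, a hyphen, then four digits
phone_pattern = r'\d{3}-\d{4}'
phone_match = re.search(phone_pattern, text)
if phone_match:
    phone_number = phone_match.group(0)
    print(phone_number)
```

The pattern `\d{3}-\d{4}` in this example matches three digits, a hyphen, and then four digits, anywhere in the string. The `re.search()` function scans the given string for the first substring that matches the pattern; it returns a `Match` object if one is found and `None` otherwise. Calling `group(0)` on the `Match` object returns the matched substring.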
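The `re.search()` behaviour described above (a `Match` object on success, `None` otherwise) can be sketched as a pair of small helpers. This is only an illustrative sketch: the names `PHONE_PATTERN`, `find_first_phone`, and `find_all_phones`, as well as the second and third sample strings, are assumptions made for this example and do not come from the original answer. It also shows `re.findall()`, which may be more convenient when a string can contain several numbers.

```python
import re

# Hypothetical helper names; the pattern is the one used in the example above.
PHONE_PATTERN = re.compile(r'\d{3}-\d{4}')


def find_first_phone(text):
    """Return the first NNN-NNNN substring in text, or None if there is none."""
    match = PHONE_PATTERN.search(text)
    # search() returns None when nothing matches, so guard before calling group()
    return match.group(0) if match else None


def find_all_phones(text):
    """Return every non-overlapping NNN-NNNN substring in text as a list."""
    return PHONE_PATTERN.findall(text)


if __name__ == '__main__':
    print(find_first_phone('My phone number is 555-1234. Call me at 9:00 am.'))
    print(find_first_phone('No digits here.'))
    print(find_all_phones('Office: 555-1234, home: 555-9876'))
```

Returning `None` from the helper mirrors what `re.search()` itself does, so callers can test the result with a plain `if` before using it.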
