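Complete Python code for extracting email addresses, URLs, three-letter words, words ending in 'ing', consecutive URLs, URL parameter names and values, and verbs (words ending in -ed or -ing) from English text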
Time: 2024-09-26 13:07:56
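The following code snippets use regular expressions in Python to extract each kind of information. They are meant to run in order: later snippets reuse the `re` import and the `txt` variable defined in the first one.
1. **Extract email addresses**[^1]: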
```python
import re
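txt = (
    "Contact e-mail@example.com or example@domain.com. She visited "
    "https://example.com/search?q=python&page=2 and was reading "
    "http://docs.example.org/a https://example.net/b yesterday."
)
# Inside a character class, | is a literal pipe, so use [A-Za-z].
regex_email = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'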
matches = re.findall(regex_email, txt)
print(matches)
```
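Email patterns of this kind are often written with `[A-Z|a-z]` for the top-level domain. Inside a character class, `|` is a literal character rather than alternation, so that class also accepts a pipe. The sketch below compares the two spellings; the names `loose`, `strict`, and `sample` are illustrative only.

```python
import re

# Same pattern twice; only the final character class differs.
loose = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
strict = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

sample = "bad: user@host.c|m good: user@host.com"

# The loose class lets the pipe through; the strict one rejects it.
print(re.findall(loose, sample))
print(re.findall(strict, sample))
```

Neither pattern validates addresses in the full RFC sense; both are heuristics for pulling likely addresses out of running text.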
2. **Extract URLs**:
```python
# Match http(s):// followed by everything up to whitespace, angle brackets, or a double quote.
regex_url = r'https?://[^\s<>"]+'
urls = re.findall(regex_url, txt)
print(urls)
```
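A URL at the end of a sentence tends to drag the closing punctuation along with it. One possible cleanup, sketched here under the assumption that real URLs in the text do not end in `.,;:!?)`, is to strip those characters after matching. The helper name `find_urls` is hypothetical.

```python
import re

URL_RE = re.compile(r'https?://[^\s<>"]+')

def find_urls(text):
    """Return URLs found in text, with trailing sentence punctuation removed."""
    return [url.rstrip('.,;:!?)') for url in URL_RE.findall(text)]

sample = "See https://example.com/docs, then http://example.org/a?b=1."
print(find_urls(sample))
```

This trades a little recall for cleaner results: a URL that really does end in `)` would lose that character.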
3. **Three-letter words**:
```python
# Letters only: \w would also count digits and underscores as word characters.
regex_3char_word = r'\b[A-Za-z]{3}\b'
matches = re.findall(regex_3char_word, txt)
print(matches)
```
4. **Words ending in 'ing'**:
```python
# Require at least one letter before the suffix so a bare "ing" is not reported.
regex_ing = r'\b[A-Za-z]+ing\b'
matches_ing = re.findall(regex_ing, txt)
print(matches_ing)
```
5. **Consecutive URLs**:
```python
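# Two or more URLs separated only by whitespace; non-capturing group so findall returns whole matches.
regex_consecutive_urls = r'https?://\S+(?:\s+https?://\S+)+'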
matches_consecutive_urls = re.findall(regex_consecutive_urls, txt)
print(matches_consecutive_urls)
```
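When a pattern contains capturing groups, `re.findall` returns the groups instead of the whole match, which is easy to trip over with repeated URL patterns. A minimal sketch that avoids the issue uses a non-capturing group with `re.finditer` and then splits each run into its individual URLs. The helper name `find_url_runs` and the sample hosts are assumptions made for illustration.

```python
import re

# A run is two or more URLs separated only by whitespace.
RUN_RE = re.compile(r'https?://\S+(?:\s+https?://\S+)+')

def find_url_runs(text):
    """Return each run of consecutive URLs as a list of its URLs."""
    return [match.group(0).split() for match in RUN_RE.finditer(text)]

sample = "Mirrors: http://a.example/x https://b.example/y then text https://c.example/z"
print(find_url_runs(sample))
```

The last URL in the sample stands alone, so it is not reported as part of any run.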
6. **URL parameter names and values**:
```python
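# Each match is a (name, value) tuple; stopping at & keeps "a=1&b=2" from being read as one parameter.
regex_url_params = r'[?&]([^=&#\s]+)=([^&#\s]*)'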
params = re.findall(regex_url_params, txt)
print(params)
```
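Regex-based parameter extraction does not decode percent-escapes or `+` signs. As an alternative sketch, the standard library's `urllib.parse` can do the parsing once the URLs have been located. The helper name `extract_params` and the URL pattern used to locate candidates are assumptions of this example.

```python
import re
from urllib.parse import urlsplit, parse_qsl

URL_RE = re.compile(r'https?://[^\s<>"]+')

def extract_params(text):
    """Map each URL found in text to its list of decoded (name, value) pairs."""
    result = {}
    for url in URL_RE.findall(text):
        query = urlsplit(url).query
        # keep_blank_values=True preserves parameters such as "flag=".
        result[url] = parse_qsl(query, keep_blank_values=True)
    return result

sample = "Open https://example.com/search?q=regex+tips&page=2 for details"
print(extract_params(sample))
```

Here `parse_qsl` turns `regex+tips` into `regex tips`, which a plain regular expression would leave encoded.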
7. **Verbs (words ending in -ed or -ing)**:
```python
# Suffix match without a literal hyphen; a heuristic that also catches non-verbs such as "red".
regex_verb = r'\b\w+(?:ed|ing)\b'
matches_verb = re.findall(regex_verb, txt)
print(matches_verb)
```
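Since the question asks for complete code, the seven extractions can also be gathered into one self-contained script. The following is a minimal sketch, not a definitive implementation: the dictionary keys, the `extract_all` helper, the sample sentence, and the exact patterns are this example's own choices.

```python
import re

# One pattern per task; all of them are heuristics for English text.
PATTERNS = {
    "emails": r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
    "urls": r'https?://[^\s<>"]+',
    "three_letter_words": r'\b[A-Za-z]{3}\b',
    "ing_words": r'\b[A-Za-z]+ing\b',
    "url_runs": r'https?://\S+(?:\s+https?://\S+)+',
    "url_params": r'[?&]([^=&#\s]+)=([^&#\s]*)',  # yields (name, value) tuples
    "verbs": r'\b[A-Za-z]+(?:ed|ing)\b',
}

def extract_all(text):
    """Run every pattern over text and return the matches keyed by task name."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}

sample = "Mail bob@example.com. She visited https://example.com/s?q=py&n=2 while reading."
result = extract_all(sample)
for name, matches in result.items():
    print(f"{name}: {matches}")
```

Keeping the patterns in a dictionary makes it straightforward to add, remove, or tune a single extraction without touching the rest.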