Python web scraping: how do I extract this URL from the page source: `<script charset="utf-8" src="https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js" crossorigin></script>`
Posted: 2023-07-02 21:06:19
You can extract this URL with either a regular expression or the BeautifulSoup library.
With a regular expression:
```python
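import re
# Sample page source; the target tag sits between other content
html = 'page source here<script charset="utf-8" src="https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js" crossorigin></script>other content here'
# (.*?) lazily captures everything between src=" and the next "
# Note: this only matches tags whose attributes appear in exactly this order
pattern = r'<script charset="utf-8" src="(.*?)" crossorigin></script>'
result = re.findall(pattern, html)  # list of the captured group for each match
print(result)  # ['https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js']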
```
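The pattern above only matches when the attributes appear in exactly that order with double quotes. A somewhat looser sketch, assuming the `src` value is quoted and does not itself contain the quote character used, captures the `src` of any `<script>` tag regardless of attribute order or quote style. The second sample tag (`/static/app.js`) is hypothetical, added only to show that other scripts on the page are picked up too.

```python
import re

# Hypothetical sample: attributes reordered and single-quoted, which the
# exact-match pattern above would miss, plus a second made-up script tag.
html = (
    "<head>"
    "<script crossorigin src='https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js' charset='utf-8'></script>"
    '<script src="/static/app.js"></script>'
    "</head>"
)

# <script\b      : an opening script tag
# [^>]*?\s       : any other attributes, stopping at whitespace (so data-src is skipped)
# src\s*=\s*     : the src attribute
# (["'])(.*?)\1  : the value in matching double or single quotes
pattern = re.compile(r"""<script\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)

# Group 2 holds the URL; group 1 is just the quote character.
urls = [m.group(2) for m in pattern.finditer(html)]
print(urls)
```

Regular expressions stay brittle for HTML (comments, unquoted attributes, `<script` strings inside inline JavaScript), so a real parser is the safer choice when the page structure varies.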
With BeautifulSoup (the `bs4` package):
```python
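from bs4 import BeautifulSoup
html = 'page source here<script charset="utf-8" src="https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js" crossorigin></script>other content here'
soup = BeautifulSoup(html, 'html.parser')
# First <script> that has any src attribute and charset="utf-8"
script = soup.find('script', {'src': True, 'charset': 'utf-8'})
if script is not None:  # find() returns None when nothing matches
    src = script['src']
    print(src)  # https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js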
```
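Both approaches extract `https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js`. The regular-expression version returns it inside a list, because `re.findall` returns every match, while the BeautifulSoup version returns the string itself.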
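If you would rather not install a third-party package, Python's built-in `html.parser.HTMLParser` can do the same job. The sketch below is one possible approach, not the method from the answer above: `ScriptSrcParser` is a hypothetical helper name, the second script tag and the `https://example.com/page.html` base URL are made up for illustration, and `urljoin` is used on the assumption that you want relative or protocol-relative `src` values turned into absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Hypothetical helper: collect the src of every <script> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag != "script":
            return
        for name, value in attrs:
            # Bare attributes such as `crossorigin` arrive with value None.
            if name == "src" and value:
                # Resolve relative or protocol-relative URLs against the page URL;
                # absolute URLs come back unchanged.
                self.srcs.append(urljoin(self.base_url, value))


html = (
    'page source here<script charset="utf-8" '
    'src="https://g.alicdn.com/mtb/lib-flexible/0.3.2/flexible.js" crossorigin></script>'
    '<script src="//g.alicdn.com/other.js"></script>other content here'
)

parser = ScriptSrcParser(base_url="https://example.com/page.html")
parser.feed(html)
parser.close()
print(parser.srcs)
```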