Implement Python code that scrapes hot data, featured data, and Shanghai/Shenzhen-Hong Kong Stock Connect (沪深港通) data from Eastmoney (东方财富网), using requests, Selenium, and Scrapy respectively.
Below are minimal examples for each combination. One caveat up front: the CSS class names used throughout (hotNews, listSpecial, hs_con) reflect the page structure at the time of the original answer and may no longer match, so verify the selectors in your browser's developer tools first.

1. Scraping Eastmoney hot data with requests:
```python
import requests
from bs4 import BeautifulSoup

url = 'http://finance.eastmoney.com/'
res = requests.get(url, timeout=10)
res.raise_for_status()  # fail fast on HTTP errors
soup = BeautifulSoup(res.text, 'html.parser')

# 'hotNews' is the container class assumed in the original answer
hot_news = soup.find('div', {'class': 'hotNews'})
for title in hot_news.find_all('a'):
    print(title.text.strip())
```
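In practice, Eastmoney (like many sites) may serve a stripped-down page or an error to clients using the default python-requests User-Agent. A minimal hardened variant of the fetch above (the header value is just an example of a browser-like string):

```python
import requests
from bs4 import BeautifulSoup

# Example browser-like User-Agent string; any recent one will do
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
res = requests.get('http://finance.eastmoney.com/', headers=headers, timeout=10)
res.encoding = res.apparent_encoding  # guard against mis-detected encoding on Chinese pages
soup = BeautifulSoup(res.text, 'html.parser')
```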
2. Scraping Eastmoney hot data with Selenium:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://finance.eastmoney.com/'
driver = webdriver.Chrome()
driver.get(url)

# find_element_by_class_name was removed in Selenium 4; use the By locator API
hot_news = driver.find_element(By.CLASS_NAME, 'hotNews')
for title in hot_news.find_elements(By.TAG_NAME, 'a'):
    print(title.text.strip())
driver.quit()
```
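Eastmoney renders parts of its pages with JavaScript, so an element may not yet exist the moment `driver.get()` returns. A sketch using an explicit wait instead of an immediate lookup (the class name is still the assumption from above):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('http://finance.eastmoney.com/')

# Wait up to 10 seconds for the hot-news container to appear
hot_news = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'hotNews'))
)
for title in hot_news.find_elements(By.TAG_NAME, 'a'):
    print(title.text.strip())
driver.quit()
```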
3. Scraping Eastmoney hot data with Scrapy:
```python
import scrapy

class FinanceSpider(scrapy.Spider):
    name = 'finance'
    start_urls = ['http://finance.eastmoney.com/']

    def parse(self, response):
        hot_news = response.css('.hotNews')
        for title in hot_news.css('a::text').getall():
            yield {'title': title.strip()}
```
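To try this spider without creating a full Scrapy project, save it as e.g. finance_spider.py and run `scrapy runspider finance_spider.py -o titles.json`; the `-o` flag writes the yielded items to a JSON feed file.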
4. Scraping Eastmoney featured data with requests:
```python
import requests
from bs4 import BeautifulSoup

url = 'http://finance.eastmoney.com/special/00095/'
res = requests.get(url, timeout=10)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

# 'listSpecial' is the container class assumed in the original answer
special_news = soup.find('div', {'class': 'listSpecial'})
for title in special_news.find_all('a'):
    print(title.text.strip())
```
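To keep the featured titles instead of just printing them, a small follow-on sketch that writes them to a CSV file with the standard library (continuing from the snippet above; the file name is arbitrary):

```python
import csv

# special_news is the container element found in the previous snippet
titles = [a.text.strip() for a in special_news.find_all('a')]

with open('special_titles.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['title'])            # header row
    writer.writerows([t] for t in titles)
```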
5. Scraping Eastmoney featured data with Selenium:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://finance.eastmoney.com/special/00095/'
driver = webdriver.Chrome()
driver.get(url)

special_news = driver.find_element(By.CLASS_NAME, 'listSpecial')
for title in special_news.find_elements(By.TAG_NAME, 'a'):
    print(title.text.strip())
driver.quit()
```
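For unattended runs, for example on a server with no display, Chrome can be started headless; a minimal sketch of the driver setup:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')    # run Chrome without opening a window
options.add_argument('--no-sandbox')  # commonly needed inside containers
driver = webdriver.Chrome(options=options)
```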
6. Scraping Eastmoney featured data with Scrapy:
```python
import scrapy

class FinanceSpecialSpider(scrapy.Spider):
    name = 'finance_special'
    start_urls = ['http://finance.eastmoney.com/special/00095/']

    def parse(self, response):
        special_news = response.css('.listSpecial')
        for title in special_news.css('a::text').getall():
            yield {'title': title.strip()}
```
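Scrapy also lets you attach per-spider settings such as a download delay and a browser-like User-Agent through `custom_settings`; a sketch of a polite variant of the spider above (the class name and values are illustrative):

```python
import scrapy

class FinanceSpecialPoliteSpider(scrapy.Spider):
    name = 'finance_special_polite'
    start_urls = ['http://finance.eastmoney.com/special/00095/']

    # Per-spider settings: throttle requests and present a browser-like UA
    custom_settings = {
        'DOWNLOAD_DELAY': 1.0,
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'ROBOTSTXT_OBEY': True,
    }

    def parse(self, response):
        for title in response.css('.listSpecial a::text').getall():
            yield {'title': title.strip()}
```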
7. Scraping Eastmoney Stock Connect (沪深港通) data with requests:
```python
import requests
from bs4 import BeautifulSoup

url = 'http://finance.eastmoney.com/hsgt/index.html'
res = requests.get(url, timeout=10)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

# 'hs_con' is the container class assumed in the original answer
hs_con = soup.find('div', {'class': 'hs_con'})
for title in hs_con.find_all('a'):
    print(title.text.strip())
```
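One caution: the Stock Connect figures on this page are typically filled in by JavaScript, so the static HTML that requests receives may not contain them at all. In that case, open the browser's network panel, find the JSON endpoint the page actually calls, and fetch that directly. A sketch with a placeholder URL (both the endpoint and the response layout below are hypothetical, not Eastmoney's real API):

```python
import requests

# Hypothetical endpoint: replace with the real URL observed in the network panel
api_url = 'https://example.invalid/api/hsgt/flows'
res = requests.get(api_url, timeout=10)
res.raise_for_status()
data = res.json()

# The payload structure is hypothetical too; inspect the real response first
for row in data.get('items', []):
    print(row)
```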
8. Scraping Eastmoney Stock Connect (沪深港通) data with Selenium:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://finance.eastmoney.com/hsgt/index.html'
driver = webdriver.Chrome()
driver.get(url)

hs_con = driver.find_element(By.CLASS_NAME, 'hs_con')
for title in hs_con.find_elements(By.TAG_NAME, 'a'):
    print(title.text.strip())
driver.quit()
```
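If the figures sit in a table that the page renders client-side, Selenium can read the cells once they appear; a sketch (the `table tr` selector is a guess at the markup):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('http://finance.eastmoney.com/hsgt/index.html')

# Assumed markup: wait for table rows to render, then read their cells
rows = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'table tr'))
)
for row in rows:
    cells = [td.text.strip() for td in row.find_elements(By.TAG_NAME, 'td')]
    if cells:
        print(cells)
driver.quit()
```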
9. Scraping Eastmoney Stock Connect (沪深港通) data with Scrapy:
```python
import scrapy

class FinanceHSGTSpider(scrapy.Spider):
    name = 'finance_hsgt'
    start_urls = ['http://finance.eastmoney.com/hsgt/index.html']

    def parse(self, response):
        hs_con = response.css('.hs_con')
        for title in hs_con.css('a::text').getall():
            yield {'title': title.strip()}
```
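Finally, any of the three Scrapy spiders can be driven from a plain Python script instead of the scrapy CLI; a minimal sketch using CrawlerProcess (the feed file name is arbitrary):

```python
from scrapy.crawler import CrawlerProcess

# FinanceHSGTSpider is the spider class defined above
process = CrawlerProcess(settings={
    'FEEDS': {'hsgt_titles.json': {'format': 'json'}},
})
process.crawl(FinanceHSGTSpider)
process.start()  # blocks until the crawl finishes
```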