Python crawler for the 12306 train ticketing system
Time: 2024-06-17 17:01:41 Views: 28
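12306 is the official website of the China Railway Customer Service Center and provides train ticket booking. Because 12306 applies strict anti-crawling measures, a Python crawler that tries to fetch its data is likely to run into real-time CAPTCHAs, IP-based rate limits, and similar obstacles. In addition, crawling the data directly may violate the site's terms of service.
For a large site like 12306, the operator generally discourages the use of unofficial APIs and instead points users toward official ticket-availability query interfaces (for example through an open platform, where one is offered) or authorized third-party interfaces (such as compliant data providers). If you need train ticket information, obtain it through these official or trusted channels.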
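If you still decide to learn how to write a Python crawler for 12306 that simulates browser behavior and handles CAPTCHAs, the techniques usually involved include:
- **requests and BeautifulSoup**: send HTTP requests and parse the HTML page structure.
- **Selenium**: drives a real browser to simulate user actions, including entering the CAPTCHA answer.
- **OCR** (such as Tesseract): used to process CAPTCHA images.
- **Dynamic page parsing**: because 12306 may generate part of its content with JavaScript, you may need tools such as **Scrapy-Splash** or **Puppeteer** to render it.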
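However, this is a complex task. It requires a solid understanding of web crawling and may raise legal and ethical issues. Unless your purpose is learning and research, and you make sure you comply with all applicable laws and regulations, think carefully before applying it in practice.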
Related questions
Code: Python crawler for scraping 12306 ticket booking
Sure, here is a simple Python crawler that fetches ticket information from 12306:
```python
import requests

# Request URL and headers. The endpoint path may change over time.
url = "https://kyfw.12306.cn/otn/leftTicket/queryZ"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Referer': 'https://kyfw.12306.cn/otn/leftTicket/init'
}

# Request parameters.
# Assumption: the endpoint expects station codes (for example BJP),
# not station names; verify this against the live site.
from_station = input("Departure station: ")
to_station = input("Destination station: ")
train_date = input("Date (format: yyyy-mm-dd): ")
params = {
    'leftTicketDTO.train_date': train_date,
    'leftTicketDTO.from_station': from_station,
    'leftTicketDTO.to_station': to_station,
    'purpose_codes': 'ADULT'
}

# Send the request and parse the JSON response
response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()
result = response.json()

# Print train information. Each record is a '|'-delimited string; the field
# positions below are undocumented and should be checked against a real response.
for item in result['data']['result']:
    train_info = item.split('|')
    print("Train: %s, departs: %s, arrives: %s, duration: %s, first class: %s, second class: %s" % (
        train_info[3], train_info[8], train_info[9], train_info[10], train_info[28], train_info[29]))
```
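This code first sends a GET request to 12306 to fetch the trains for the given date, departure station, and destination. It then parses the returned JSON and prints each train's number, departure time, arrival time, duration, and remaining first-class and second-class seats.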
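To make the record parsing easier to follow, here is a minimal sketch that turns one `|`-delimited record into a dictionary of named fields. The field positions are copied from the snippet above and are not confirmed against 12306; the names `FIELD_INDEX` and `parse_record` and the sample record are made up for illustration.

```python
# Field positions copied from the snippet above; they are not documented
# by 12306 and should be verified against a live response.
FIELD_INDEX = {
    "train_code": 3,
    "depart_time": 8,
    "arrive_time": 9,
    "duration": 10,
    "first_class": 28,
    "second_class": 29,
}


def parse_record(record: str) -> dict:
    """Split one '|'-delimited record into a dict of named fields."""
    parts = record.split("|")
    # Fall back to an empty string when the record is shorter than expected.
    return {
        name: parts[i] if i < len(parts) else ""
        for name, i in FIELD_INDEX.items()
    }


# A made-up record for illustration only (not real 12306 data).
fields = [""] * 30
fields[3], fields[8], fields[9], fields[10] = "G1", "09:00", "13:28", "04:28"
fields[28], fields[29] = "12", "5"
sample = "|".join(fields)

print(parse_record(sample))
```

Naming the positions in one place means that if the response layout changes, only `FIELD_INDEX` needs updating.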
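Python 12306 order steps: a small crawler example for 12306 written in Python
Order steps:
1. Fetch the CAPTCHA and log in to the account, which yields the session cookie
2. Enter the departure station, destination, travel date, and so on to fetch the train list
3. Choose the train and seat type to buy, and fetch the passenger information
4. Submit the order and fetch the order information
5. Confirm the order to complete the purchase
The following is a simple Python crawler example that implements 12306 ticket purchase: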
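The control flow of these steps can be sketched independently of any network calls: run each step in order and stop at the first one that fails. This is only an illustrative sketch; `run_steps` and the stub steps are hypothetical names, and each stub simply returns a fixed result.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, action in steps:
        if not action():
            return name
    return None


# Hypothetical stand-ins for the five steps listed above.
steps = [
    ("log in", lambda: True),
    ("query trains", lambda: True),
    ("select train and passenger", lambda: True),
    ("submit order", lambda: False),  # simulate a failure at step 4
    ("confirm order", lambda: True),
]

failed = run_steps(steps)
print("all steps succeeded" if failed is None else f"stopped at: {failed}")
# prints "stopped at: submit order"
```

Keeping the sequence in a list avoids deeply nested `if` statements and makes it clear which step ended the run.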
```python
import base64
import json
import re
from time import sleep

import requests
# Login URL
login_url = 'https://kyfw.12306.cn/passport/web/login'
# Username and password
username = 'your_username'
password = 'your_password'
# Departure station, destination, and date
from_station = '北京'  # Beijing
to_station = '上海'  # Shanghai
train_date = '2019-07-01'
# Train type and seat type
train_type = 'G'
seat_type = '二等座'  # second class
# Passenger name and ID number (placeholders)
passenger_name = '张三'
passenger_id = '123456789012345678'
# Submit-order URL
submit_order_url = 'https://kyfw.12306.cn/otn/leftTicket/submitOrderRequest'
# Check-order URL
check_order_url = 'https://kyfw.12306.cn/otn/confirmPassenger/checkOrderInfo'
# Confirm-order URL
confirm_order_url = 'https://kyfw.12306.cn/otn/confirmPassenger/confirmSingleForQueue'
# CAPTCHA URL
captcha_url = 'https://kyfw.12306.cn/passport/captcha/captcha-image64'
# Login request headers.
# Content-Length is omitted: requests computes it from the request body.
login_headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Host': 'kyfw.12306.cn',
    'Origin': 'https://kyfw.12306.cn',
    'Referer': 'https://kyfw.12306.cn/otn/resources/login.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
# Order request headers
order_headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Host': 'kyfw.12306.cn',
    'Origin': 'https://kyfw.12306.cn',
    'Referer': 'https://kyfw.12306.cn/otn/leftTicket/init',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
# CAPTCHA request headers
captcha_headers = {
    'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'kyfw.12306.cn',
    'Referer': 'https://kyfw.12306.cn/otn/resources/login.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36'
}
# Login request parameters
login_data = {
    'username': username,
    'password': password,
    'appid': 'otn'
}
# Order request parameters
order_data = {
    'secretStr': '',
    'train_date': train_date,
    'back_train_date': train_date,
    'tour_flag': 'dc',
    'purpose_codes': 'ADULT',
    'query_from_station_name': from_station,
    'query_to_station_name': to_station,
    'undefined': ''
}
def login():
    # Fetch the CAPTCHA image and save it so the user can read it
    captcha_response = session.get(captcha_url, headers=captcha_headers)
    captcha_json = json.loads(captcha_response.text)
    captcha_image_base64 = captcha_json['image']
    with open('captcha.jpg', 'wb') as f:
        f.write(base64.b64decode(captcha_image_base64))
    captcha_code = input('Enter the CAPTCHA: ')
    # Log in with the manually entered CAPTCHA answer
    login_data['answer'] = captcha_code
    response = session.post(login_url, data=login_data, headers=login_headers)
    result = json.loads(response.text)
    if result['result_code'] == 0:
        print('Login succeeded')
        return True
    else:
        print('Login failed')
        return False


def submit_order():
    # Submit the order; secretStr is set by the ticket query in the main block
    order_data['secretStr'] = secretStr
    response = session.post(submit_order_url, data=order_data, headers=order_headers)
    result = json.loads(response.text)
    if result['status']:
        print('Order submitted')
        return True
    else:
        print('Order submission failed')
        return False


def check_order():
    # Check the order. The passenger string layout is taken as-is from the
    # original example and has not been verified against the live interface.
    passengerTicketStr = 'O,0,1,' + passenger_name + ',1,' + passenger_id + ',,N,' + seat_type + ',,'
    oldPassengerStr = passenger_name + ',1,' + passenger_id + ',1_'
    order_data['passengerTicketStr'] = passengerTicketStr
    order_data['oldPassengerStr'] = oldPassengerStr
    order_data['REPEAT_SUBMIT_TOKEN'] = repeat_submit_token
    response = session.post(check_order_url, data=order_data, headers=order_headers)
    result = json.loads(response.text)
    if result['data']['submitStatus']:
        print('Order check passed')
        return True
    else:
        print('Order check failed')
        return False


def confirm_order():
    # Confirm the order. passengerTicketStr, oldPassengerStr, and the tokens
    # are module-level values set in the main block below.
    order_data['passengerTicketStr'] = passengerTicketStr
    order_data['oldPassengerStr'] = oldPassengerStr
    order_data['REPEAT_SUBMIT_TOKEN'] = repeat_submit_token
    order_data['key_check_isChange'] = key_check_isChange
    order_data['leftTicketStr'] = leftTicketStr
    response = session.post(confirm_order_url, data=order_data, headers=order_headers)
    result = json.loads(response.text)
    if result['data']['submitStatus']:
        print('Order confirmed')
        return True
    else:
        print('Order confirmation failed')
        return False


if __name__ == '__main__':
    session = requests.session()
    # Log in, retrying until it succeeds
    while not login():
        pass
    # Query tickets
    query_url = 'https://kyfw.12306.cn/otn/leftTicket/queryZ'
    params = {
        'leftTicketDTO.train_date': train_date,
        'leftTicketDTO.from_station': from_station,
        'leftTicketDTO.to_station': to_station,
        'purpose_codes': 'ADULT'
    }
    response = session.get(query_url, params=params)
    result = json.loads(response.text)
    # This loop assumes each entry is a dict with a 'queryLeftNewDTO' key. The
    # first example on this page instead reads '|'-delimited strings from
    # result['data']['result'], so check which shape the endpoint returns.
    secretStr = None
    for data in result['data']:
        train = data['queryLeftNewDTO']
        if train['station_train_code'].startswith(train_type):
            print(train['station_train_code'], train[seat_type + '_num'])
            # '无' means sold out; '--' means the seat type is not offered
            if train[seat_type + '_num'] not in ('无', '--'):
                secretStr = data['secretStr']
                leftTicketStr = train['ypInfoDetail']
                start_train_date = train['start_train_date']
                train_no = train['train_no']
                train_location = train['location_code']
                break
    if secretStr is None:
        raise SystemExit('No matching train with available seats')
    # Fetch passenger information
    passenger_url = 'https://kyfw.12306.cn/otn/confirmPassenger/getPassengerDTOs'
    data = {
        '_json_att': '',
        'REPEAT_SUBMIT_TOKEN': ''
    }
    response = session.post(passenger_url, data=data, headers=order_headers)
    result = json.loads(response.text)
    for passenger in result['data']['normal_passengers']:
        if passenger['passenger_name'] == passenger_name and passenger['passenger_id_no'] == passenger_id:
            passengerTicketStr = 'O,0,1,' + passenger_name + ',1,' + passenger_id + ',,N,' + seat_type + ',,'
            oldPassengerStr = passenger_name + ',1,' + passenger_id + ',1_'
    # Get REPEAT_SUBMIT_TOKEN and key_check_isChange from the initDc page
    init_dc_url = 'https://kyfw.12306.cn/otn/confirmPassenger/initDc'
    data = {
        '_json_att': ''
    }
    response = session.post(init_dc_url, data=data, headers=order_headers)
    repeat_submit_token = re.findall(r"var globalRepeatSubmitToken = '(.*?)';", response.text)[0]
    key_check_isChange = re.findall(r"key_check_isChange':'(.*?)',", response.text)[0]
    # Place the order
    if submit_order():
        # Wait 5 seconds
        sleep(5)
        # Check the order
        if check_order():
            # Confirm the order
            if confirm_order():
                print('Ticket purchased successfully')
```
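Note that 12306's interfaces can change at any time, so some parameters in the code may need to be adjusted before it runs correctly. The CAPTCHA in this code is also entered manually; if you need it recognized automatically, you can use a third-party CAPTCHA recognition library.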
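Because the interfaces can change, it helps to parse responses defensively instead of indexing straight into them. The following is a minimal sketch using only the standard library; the helper names `decode_json`, `dig`, and `first_match` are hypothetical, and the inputs are made up for illustration.

```python
import json
import re
from typing import Any, Optional


def decode_json(text: str) -> Optional[dict]:
    """Return the decoded JSON object, or None if the body is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def dig(value: Any, *keys: str) -> Any:
    """Follow nested dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group, or None instead of raising IndexError."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Made-up inputs for illustration only.
ok_body = '{"data": {"submitStatus": true}}'
html_body = "<html>error page</html>"
page = "var globalRepeatSubmitToken = 'abc123';"

print(dig(decode_json(ok_body), "data", "submitStatus"))
print(decode_json(html_body))
print(first_match(r"var globalRepeatSubmitToken = '(.*?)';", page))
```

With helpers like these, a changed response shape shows up as a `None` that the caller can report, rather than as a `KeyError` or `IndexError` deep inside the order flow.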