
Write me a crawler for a flight-ticket website; it must be multithreaded.

Time: 2023-07-21 16:46:07

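Sure. Below is a simple example of a multithreaded Python crawler for a flight-ticket website, using Ctrip's flight schedule pages as the target.

```python
import requests
from lxml import etree
from queue import Queue, Empty
from threading import Thread

# Number of worker threads
thread_count = 5

# URLs to crawl
urls = [
    "https://flights.ctrip.com/schedule/sha.bjs.html",
    "https://flights.ctrip.com/schedule/bjs.sha.html",
    "https://flights.ctrip.com/schedule/sha.ctu.html",
    "https://flights.ctrip.com/schedule/ctu.sha.html",
    "https://flights.ctrip.com/schedule/sha.szx.html",
    "https://flights.ctrip.com/schedule/szx.sha.html",
]

# Queue holding the URLs that still need to be crawled
queue = Queue()
for url in urls:
    queue.put(url)

# Worker function executed by each thread
def crawl(queue):
    while True:
        # get_nowait() avoids the race in "while not queue.empty(): queue.get()",
        # where another thread can take the last URL and leave this one blocked forever
        try:
            url = queue.get_nowait()
        except Empty:
            break  # queue is drained, so this thread is done
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            tree = etree.HTML(response.content)
            # Parse `tree` here and store the results in a database or file
            print(f"Crawled {url}")
        except requests.RequestException as exc:
            # A failed request is reported instead of killing the thread
            print(f"Failed {url}: {exc}")

# Create and start the threads
threads = []
for _ in range(thread_count):
    thread = Thread(target=crawl, args=(queue,))
    thread.start()
    threads.append(thread)

# Wait for all threads to finish
for thread in threads:
    thread.join()
```

In the code above, we first define the URLs to crawl and put them into a queue. Next we define the worker function `crawl`, which runs in several threads at once: each thread takes a URL from the queue with `queue.get_nowait()`, downloads the page, and parses it with `etree.HTML`; once the queue is empty, `Empty` is raised and the thread exits. Calling `get_nowait()` directly, rather than checking `queue.empty()` and then calling `queue.get()`, avoids a race in which another thread takes the last URL in between and leaves this thread blocked forever. Request errors are caught per URL, so one failed page does not stop the thread. Note that the example only prints each crawled URL: extracting the flight data and saving it to a database or file still has to be written at the marked comment. Finally, we create and start the threads and wait for all of them to finish before the program exits. Hope this example helps!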
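As an alternative to managing `Queue` and `Thread` objects by hand, the standard library's `concurrent.futures.ThreadPoolExecutor` can spread the URLs over a fixed pool of threads. The sketch below is not part of the original answer: it swaps `requests` for `urllib.request` so it has no third-party dependencies, and the names `fetch` and `crawl_all` are hypothetical. The downloader is passed in as a parameter, so the concurrency logic can be tried without touching the network.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url, timeout=10):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl_all(urls, fetch_fn=fetch, max_workers=5):
    """Fetch all URLs on a thread pool (hypothetical helper).

    Returns a dict mapping each URL to its page bytes, or to the exception
    raised while fetching it, so one failure does not stop the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL and remember which future belongs to which URL
        futures = {pool.submit(fetch_fn, url): url for url in urls}
        # Collect results in completion order, not submission order
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Calling `crawl_all(urls)` with the URL list from the answer returns one entry per URL; the `with` block waits for every submitted task, which replaces the explicit `start()`/`join()` loops.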
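The answer leaves storage as a comment ("store the results in a database or file"). When several threads write to the same file, their writes should be serialized so rows do not interleave. A minimal sketch, assuming results are appended as rows to a single CSV file; the class name `CsvResultWriter` and the row layout are hypothetical, and a `threading.Lock` guards every write.

```python
import csv
import threading


class CsvResultWriter:
    """Append rows to one CSV file from many threads (hypothetical helper)."""

    def __init__(self, path, header):
        self._lock = threading.Lock()
        # newline="" is what the csv module expects for files it writes
        self._file = open(path, "w", newline="", encoding="utf-8")
        self._writer = csv.writer(self._file)
        self._writer.writerow(header)

    def write_row(self, row):
        # Only one thread at a time may write, so rows never interleave
        with self._lock:
            self._writer.writerow(row)

    def close(self):
        with self._lock:
            self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Inside a worker such as `crawl`, each thread would call `writer.write_row([...])` after parsing a page, and the file is closed only after all threads have been joined.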
