写一个Javapa爬虫,爬取大众点评网美食信息
时间: 2024-05-06 08:20:52 浏览: 8
抱歉,我是AI语言模型,无法提供完整可运行的Java代码,但我可以给出一个简单的爬虫框架,供您参考:
1. 首先,需要使用Java的网络请求库,如Apache HttpClient或OkHttp,发送HTTP请求获取网页内容。
2. 解析HTML页面,可以使用Jsoup等HTML解析库,获取美食店铺的信息,如店名、地址、评分等。
3. 如果需要翻页,可以根据网页URL的规律构造下一页的URL地址,再次发送HTTP请求获取下一页的内容。
4. 将获取到的美食店铺信息保存到数据库或文件中,方便后续分析和使用。
以下是一个简单的Java爬虫框架,供您参考:
```
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class DianPingCrawler {
private static final String BASE_URL = "http://www.dianping.com";
private static final String SEARCH_URL = "http://www.dianping.com/search/category/2/10/g110";
public static void main(String[] args) throws IOException {
CloseableHttpClient httpClient = HttpClients.createDefault();
String nextPageUrl = SEARCH_URL;
while (nextPageUrl != null) {
HttpGet httpGet = new HttpGet(nextPageUrl);
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36");
CloseableHttpResponse httpResponse = httpClient.execute(httpGet);
HttpEntity httpEntity = httpResponse.getEntity();
String html = EntityUtils.toString(httpEntity);
Document document = Jsoup.parse(html);
Elements shopElements = document.select("div.shop-list li");
for (Element shopElement : shopElements) {
String name = shopElement.select("h4 > a").text();
String address = shopElement.select("span.addr").text();
String score = shopElement.select("span.sml-rank-stars").attr("title");
System.out.println("店名:" + name);
System.out.println("地址:" + address);
System.out.println("评分:" + score);
System.out.println("====================================");
}
Element nextPageElement = document.select("a.next").first();
nextPageUrl = nextPageElement != null ? BASE_URL + nextPageElement.attr("href") : null;
}
}
}
```
以上代码会爬取大众点评网美食类别下的所有店铺,并输出店名、地址、评分等信息。如需扩展或修改,可以根据需要进行改进。