Java implementation: export everything on this site, from provinces down to village committees, along with their codes, into Excel http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2022/index.html
Posted: 2023-07-08 21:45:09
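To implement this, we first need to parse the web pages and then write the data into an Excel spreadsheet. Jsoup is recommended for parsing the pages, and the Apache POI library for working with Excel.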
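One detail worth checking before crawling: the links on each page appear to be relative to that page, so a child page's address has to be resolved against the URL of the page it was found on, not appended to the index URL. Below is a minimal sketch of that resolution using only `java.net.URI`. The class `LinkResolver`, the method `resolve`, and the sample hrefs `11.html` and `11/1101.html` are illustrative assumptions, not values confirmed from the site.

```java
import java.net.URI;

class LinkResolver {
    // Resolves a relative href against the URL of the page it was found on.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String index = "http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2022/index.html";
        // Assumed sample hrefs, for illustration only.
        String provincePage = resolve(index, "11.html");
        String cityPage = resolve(provincePage, "11/1101.html");
        System.out.println(provincePage);
        System.out.println(cityPage);
    }
}
```

When a page is loaded with `Jsoup.connect(...).get()`, calling `absUrl("href")` on a link element performs the same kind of resolution against the document's base URI.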
The following is a complete Java example:
```java
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WebCrawler {
    private static final String[] HEADER = {
        "City code", "City", "County code", "County",
        "Town code", "Town", "Village code", "Village"
    };

    public static void main(String[] args) {
        String url = "http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2022/index.html";
        String filePath = "output.xlsx";
        try (XSSFWorkbook workbook = new XSSFWorkbook()) {
            Document doc = Jsoup.connect(url).get();
            for (Element province : doc.select("tr.provincetr td a")) {
                // One sheet per province, named after the province.
                Sheet sheet = workbook.createSheet(province.text());
                int rowIndex = 0;
                writeRow(sheet.createRow(rowIndex++), HEADER);
                // Links are relative to the page they appear on, so resolve them
                // with absUrl() instead of appending them to the index URL.
                Document provinceDoc = Jsoup.connect(province.absUrl("href")).get();
                for (Element city : provinceDoc.select("tr.citytr")) {
                    String cityCode = city.child(0).text();
                    String cityName = city.child(1).text();
                    Element cityLink = city.selectFirst("a");
                    if (cityLink == null) continue; // no lower-level page to follow
                    Document cityDoc = Jsoup.connect(cityLink.absUrl("href")).get();
                    for (Element county : cityDoc.select("tr.countytr")) {
                        String countyCode = county.child(0).text();
                        String countyName = county.child(1).text();
                        Element countyLink = county.selectFirst("a");
                        if (countyLink == null) continue; // rows without a link have no towns
                        Document countyDoc = Jsoup.connect(countyLink.absUrl("href")).get();
                        for (Element town : countyDoc.select("tr.towntr")) {
                            String townCode = town.child(0).text();
                            String townName = town.child(1).text();
                            Element townLink = town.selectFirst("a");
                            if (townLink == null) continue;
                            Document townDoc = Jsoup.connect(townLink.absUrl("href")).get();
                            for (Element village : townDoc.select("tr.villagetr")) {
                                String villageCode = village.child(0).text();
                                // The second cell holds the urban-rural classification
                                // code, so the name is in the third cell.
                                String villageName = village.child(2).text();
                                // Write one row per village. Every code on the site is
                                // already a full code, so each level's code gets its own
                                // column; the codes must not be concatenated.
                                writeRow(sheet.createRow(rowIndex++), new String[] {
                                    cityCode, cityName, countyCode, countyName,
                                    townCode, townName, villageCode, villageName
                                });
                            }
                        }
                    }
                }
            }
            // try-with-resources closes the stream even if writing fails.
            try (FileOutputStream outputStream = new FileOutputStream(filePath)) {
                workbook.write(outputStream);
            }
            System.out.println("Data has been written to the Excel file!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Fills a row with string cells so that leading zeros in codes are preserved.
    private static void writeRow(Row row, String[] values) {
        for (int i = 0; i < values.length; i++) {
            row.createCell(i).setCellValue(values[i]);
        }
    }
}
```
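After the code runs, an Excel file named output.xlsx is created in the current directory. It contains the provinces, cities, counties, towns, and village committees scraped from the site, along with their codes.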
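When checking the exported codes, it helps to know how they are laid out. The statistical division codes are understood to be 12 digits long: 2 digits for the province, 2 for the city, 2 for the county, 3 for the town, and 3 for the village. A village code therefore already contains the prefixes of every level above it. The sketch below splits a code on that assumption. The class `DivisionCode`, its methods, and the sample code `110101001001` are illustrative, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DivisionCode {
    // Splits a 12-digit division code into its level segments (assumed 2-2-2-3-3 layout).
    static Map<String, String> split(String code) {
        if (code == null || !code.matches("\\d{12}")) {
            throw new IllegalArgumentException("expected 12 digits: " + code);
        }
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("province", code.substring(0, 2));
        parts.put("city", code.substring(2, 4));
        parts.put("county", code.substring(4, 6));
        parts.put("town", code.substring(6, 9));
        parts.put("village", code.substring(9, 12));
        return parts;
    }

    // Returns the zero-padded 12-digit code of the enclosing unit
    // that shares the first prefixLength digits.
    static String parent(String code, int prefixLength) {
        return (code.substring(0, prefixLength) + "000000000000").substring(0, 12);
    }

    public static void main(String[] args) {
        String sample = "110101001001"; // sample value for illustration
        System.out.println(split(sample));
        System.out.println(parent(sample, 6)); // county-level code
    }
}
```

A quick sanity check on the spreadsheet is that, for each row, `parent(villageCode, 9)` equals the town code in the same row and `parent(villageCode, 6)` equals the county code.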