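Java 8 implementation: use the java-diff-utils library to compare large, same-named files in two folders, and write to a result file whether each file exists and, for files that differ, the differing data. The method must be able to read folders inside a jar, minimize nested loops, and perform well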
Posted: 2024-02-17 09:01:24
First, compare the same-named files in the two folders: the `DiffRowGenerator` class from the java-diff-utils library generates the diff rows, which are then written to the result file. Example code:
```java
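// Assumes java-diff-utils 4.x (Maven: io.github.java-diff-utils:java-diff-utils),
// whose classes live in the com.github.difflib package.
import com.github.difflib.DiffUtils;
import com.github.difflib.patch.AbstractDelta;
import com.github.difflib.patch.Patch;
import com.github.difflib.text.DiffRow;
import com.github.difflib.text.DiffRowGenerator;
import java.io.*;
import java.util.*;
import java.util.stream.Collectors;

public class FolderDiff {
    public static void main(String[] args) throws Exception {
        // Folder paths to compare
        String folder1Path = args[0];
        String folder2Path = args[1];
        // Collect the file names in each folder
        Set<String> names1 = listFileNames(new File(folder1Path));
        Set<String> names2 = listFileNames(new File(folder2Path));
        // Names present in both folders (set intersection, no nested loop)
        Set<String> commonNames = new TreeSet<>(names1);
        commonNames.retainAll(names2);
        // Initialize the diff row generator; empty tags keep inline markup out of the output
        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "")
                .newTag(f -> "")
                .build();
        // Write existence information and per-file differences to the result file
        File resultFile = new File("result.txt");
        try (PrintWriter writer = new PrintWriter(new FileWriter(resultFile))) {
            for (String name : names1) {
                if (!names2.contains(name)) writer.println("Only in " + folder1Path + ": " + name);
            }
            for (String name : names2) {
                if (!names1.contains(name)) writer.println("Only in " + folder2Path + ": " + name);
            }
            for (String name : commonNames) {
                List<String> lines1 = readLines(new File(folder1Path, name));
                List<String> lines2 = readLines(new File(folder2Path, name));
                Patch<String> patch = DiffUtils.diff(lines1, lines2);
                if (patch.getDeltas().isEmpty()) {
                    continue; // identical files produce no output
                }
                writer.println(name);
                for (AbstractDelta<String> delta : patch.getDeltas()) {
                    List<DiffRow> diffRows = generator.generateDiffRows(
                            delta.getSource().getLines(), delta.getTarget().getLines());
                    for (DiffRow row : diffRows) {
                        DiffRow.Tag tag = row.getTag();
                        // A CHANGE row carries both an old and a new line
                        if (tag == DiffRow.Tag.DELETE || tag == DiffRow.Tag.CHANGE) {
                            writer.println("- " + row.getOldLine());
                        }
                        if (tag == DiffRow.Tag.INSERT || tag == DiffRow.Tag.CHANGE) {
                            writer.println("+ " + row.getNewLine());
                        }
                    }
                }
            }
        }
    }

    // List the names of the regular files directly inside a folder
    private static Set<String> listFileNames(File folder) {
        File[] files = folder.listFiles(File::isFile);
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + folder);
        }
        return Arrays.stream(files).map(File::getName).collect(Collectors.toCollection(TreeSet::new));
    }

    // Read all lines of a file into memory
    private static List<String> readLines(File file) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}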
```
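The name-matching step can also be looked at in isolation. The following is a minimal sketch, using only the JDK, of how set operations split two name listings into "only in the first folder", "only in the second folder", and "in both" without a nested loop. The class name `NameClassifier` and its `classify` method are hypothetical names chosen for illustration; they are not part of java-diff-utils.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only
final class NameClassifier {
    /** Names grouped by where they occur; sorted for stable output. */
    static final class Result {
        final SortedSet<String> onlyInFirst = new TreeSet<>();
        final SortedSet<String> onlyInSecond = new TreeSet<>();
        final SortedSet<String> inBoth = new TreeSet<>();
    }

    static Result classify(Collection<String> first, Collection<String> second) {
        // Hash sets give constant-time membership checks on average
        Set<String> a = new HashSet<>(first);
        Set<String> b = new HashSet<>(second);
        Result r = new Result();
        r.inBoth.addAll(a);
        r.inBoth.retainAll(b);        // intersection
        r.onlyInFirst.addAll(a);
        r.onlyInFirst.removeAll(b);   // first minus second
        r.onlyInSecond.addAll(b);
        r.onlyInSecond.removeAll(a);  // second minus first
        return r;
    }
}
```

Calling `NameClassifier.classify(names1, names2)` once gives everything needed for the existence report, and only the `inBoth` names have to be passed on to the line-level diff.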
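Next, reading a folder inside a jar. `ClassLoader.getResourceAsStream` is meant for individual resources and cannot reliably enumerate a directory, so the example below instead opens the jar with `java.util.jar.JarFile`, iterates over its entries, and uses `FileUtils.copyInputStreamToFile` (Apache Commons IO) to copy each file under the requested folder into a temporary directory. The temporary directories are then compared as ordinary folders. Example code:
```java
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarFolderDiff {
    public static void main(String[] args) throws Exception {
        // args[0] = path to the jar; args[1], args[2] = folder paths inside the jar
        File folder1 = extractFolderFromJar(args[0], args[1]);
        File folder2 = extractFolderFromJar(args[0], args[2]);
        try {
            // Run the folder comparison on the extracted copies
            FolderDiff.main(new String[]{folder1.getPath(), folder2.getPath()});
        } finally {
            // Delete the temporary directories
            FileUtils.deleteQuietly(folder1);
            FileUtils.deleteQuietly(folder2);
        }
    }

    // Copy the files directly under folderPath in the jar into a new temporary directory
    private static File extractFolderFromJar(String jarPath, String folderPath) throws Exception {
        String prefix = folderPath.endsWith("/") ? folderPath : folderPath + "/";
        File tempFolder = Files.createTempDirectory("jar-diff-").toFile();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Skip directories, entries outside the folder, and entries in subfolders
                if (entry.isDirectory() || !name.startsWith(prefix)
                        || name.indexOf('/', prefix.length()) >= 0) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    FileUtils.copyInputStreamToFile(in, new File(tempFolder, name.substring(prefix.length())));
                }
            }
        }
        return tempFolder;
    }
}
```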
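Together, these two classes use java-diff-utils to compare the same-named files in two folders, including folders packaged inside a jar, and write the differences to the result file. Matching file names through a set intersection avoids a nested loop over the two listings. Note that each file is read fully into memory before it is diffed, so very large files need a correspondingly large heap.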
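For large files, one inexpensive optimization that the answer's code does not include is to skip the line-level diff entirely when two files are byte-for-byte identical. The sketch below shows one way to do that with the JDK alone: compare sizes first, then stream both files and stop at the first differing byte. `QuickCompare` and `sameContent` are hypothetical names used for illustration, and whether this pre-check pays off depends on how many of the files are actually identical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only
final class QuickCompare {
    /** Returns true if both files contain exactly the same bytes. */
    static boolean sameContent(Path a, Path b) throws IOException {
        // Different sizes can never have identical content
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int x;
            while ((x = inA.read()) != -1) {
                // Stop at the first differing byte
                if (x != inB.read()) {
                    return false;
                }
            }
            // Both streams must be exhausted at the same point
            return inB.read() == -1;
        }
    }
}
```

A caller would invoke `QuickCompare.sameContent(path1, path2)` before reading the lines of a file pair and run the diff only when it returns `false`. Files that differ only in line endings are reported as different by this check and simply fall through to the normal diff.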