Using C#, open a PDF file with iText and read its text

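C is a programming language developed by Dennis Ritchie in 1972. It is a high-level language that can be used to write both system software and applications. Because C is efficient, portable, flexible, and powerful, it is widely used for operating systems, compilers, databases, games, networking, and similar areas. Its core syntax is small and its learning curve is fairly gentle, which makes it suitable for beginners learning the basics of programming. Using C also means learning some common standard library headers, such as `stdio.h`, `stdlib.h`, and `string.h`, which provide input/output, memory management, and string handling.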

Using C sharp, open a PDF file with iText and read its text

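Here is sample code that reads the text of a PDF with C# and the iText library:

```csharp
using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

string filePath = "example.pdf";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(filePath));
string text = "";

// Page numbers in iText start at 1.
for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
{
    PdfPage page = pdfDoc.GetPage(i);

    // A fresh strategy per page, so text from earlier pages is not repeated.
    LocationTextExtractionStrategy extractionStrategy = new LocationTextExtractionStrategy();
    PdfCanvasProcessor parser = new PdfCanvasProcessor(extractionStrategy);
    parser.ProcessPageContent(page);

    text += extractionStrategy.GetResultantText();
}

pdfDoc.Close();
Console.WriteLine(text);
```

First, we import the iText namespaces. Then we open the PDF file with the `PdfDocument` class and, for each `PdfPage`, run a `PdfCanvasProcessor` over the page content. The `LocationTextExtractionStrategy` attached to the processor collects the text of that page, which we append to the string variable `text`. Finally, we close the `PdfDocument` object and write the text to the console. Note that this code only handles the text content of a PDF file. If you need to read other content, such as images or form fields, you need to use different iText classes and methods.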

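Using C sharp, call iText to read a PDF file and extract its text, padding with spaces wherever adjacent pieces of text are more than one character width apart.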

Here is sample code that calls iText from C# to read a PDF file and extract its text:

```csharp
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class PdfTextReader
{
    public static string ExtractTextFromPDF(string filepath)
    {
        StringBuilder sb = new StringBuilder();
        using (PdfReader reader = new PdfReader(filepath))
        using (PdfDocument pdfDoc = new PdfDocument(reader))
        {
            for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
            {
                PdfPage page = pdfDoc.GetPage(i);
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string text = PdfTextExtractor.GetTextFromPage(page, strategy);
                sb.Append(text);
            }
        }
        return sb.ToString();
    }
}
```

To insert a space wherever adjacent text is more than one character width apart, we can add some extra handling to the extraction strategy:

```csharp
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class PdfTextReader
{
    public static string ExtractTextFromPDF(string filepath)
    {
        StringBuilder sb = new StringBuilder();
        using (PdfReader reader = new PdfReader(filepath))
        using (PdfDocument pdfDoc = new PdfDocument(reader))
        {
            for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
            {
                PdfPage page = pdfDoc.GetPage(i);
                // The custom strategy decides where spaces are inserted.
                CustomTextExtractionStrategy strategy = new CustomTextExtractionStrategy();
                string text = PdfTextExtractor.GetTextFromPage(page, strategy);
                sb.Append(text);
            }
        }
        return sb.ToString();
    }
}

public class CustomTextExtractionStrategy : LocationTextExtractionStrategy
{
    // LocationTextExtractionStrategy calls this hook for consecutive chunks on
    // the same line and inserts a space between them when it returns true.
    // Check the exact signature against the iText version you have installed.
    protected override bool IsChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk)
    {
        ITextChunkLocation current = chunk.GetLocation();
        ITextChunkLocation previous = previousChunk.GetLocation();

        // Horizontal distance from the end of the previous chunk to the start of this one.
        float distance = current.DistanceFromEndOf(previous);

        // Treat the width of a space in the current font as "one character width":
        // only a gap larger than that produces a space in the output.
        return distance > current.GetCharSpaceWidth();
    }
}
```
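The answer above inserts a single space per gap. If "padding" is meant literally, so that a wider gap yields more spaces, the arithmetic can be sketched without any PDF library. The sketch below is only an illustration under assumptions: `TextRun` and `GapPadding` are hypothetical names that are not part of iText, the runs are assumed to lie on one line and to be sorted left to right, and `charWidth` stands for whatever width you treat as one character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, library-independent model of one piece of text on a line.
public sealed class TextRun
{
    public string Text { get; }
    public float StartX { get; }
    public float EndX { get; }

    public TextRun(string text, float startX, float endX)
    {
        Text = text;
        StartX = startX;
        EndX = endX;
    }
}

public static class GapPadding
{
    // Joins runs from a single line, adding one space for every full
    // character width of empty distance between neighbouring runs.
    // Gaps no wider than one character width add nothing.
    public static string JoinWithPadding(IEnumerable<TextRun> runs, float charWidth)
    {
        if (charWidth <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(charWidth));
        }

        StringBuilder sb = new StringBuilder();
        bool hasPrevious = false;
        float previousEnd = 0f;

        foreach (TextRun run in runs)
        {
            if (hasPrevious)
            {
                float gap = run.StartX - previousEnd;
                if (gap > charWidth)
                {
                    int spaces = (int)Math.Floor(gap / charWidth);
                    sb.Append(' ', spaces);
                }
            }

            sb.Append(run.Text);
            previousEnd = run.EndX;
            hasPrevious = true;
        }

        return sb.ToString();
    }
}
```

For example, with `charWidth` set to 10, the runs `("Name", 0, 40)`, `("Alice", 100, 150)`, and `("!", 152, 160)` give `"Name"`, six spaces, then `"Alice!"`: the first gap is 60 units wide and the second is only 2. Flooring the ratio keeps columns from drifting to the right; rounding to the nearest integer would be an equally defensible choice.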

