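Using Python to convert between Word and PDF documents (including extracting PDF text content), merge and split PDF files, rotate PDF pages, and add watermarks to PDF pages. Full code.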
Date: 2024-02-27 07:51:35
Sure, here is the complete code:
1. Converting a Word document to PDF
```
from docx2pdf import convert

# Convert a Word document to PDF.
# docx2pdf drives Microsoft Word, so Word must be installed (Windows or macOS).
convert("input.docx", "output.pdf")
```
2. Merging PDF documents
```
# PdfFileMerger was removed in PyPDF2 3.0; PdfMerger is the current name.
from PyPDF2 import PdfMerger

# Merge two PDF documents
merger = PdfMerger()
merger.append("document1.pdf")
merger.append("document2.pdf")
merger.write("merged_document.pdf")
merger.close()
```
3. Splitting a PDF document
```
# PyPDF2 3.x names; PdfFileReader/PdfFileWriter were removed in 3.0.
from PyPDF2 import PdfReader, PdfWriter

# Split the PDF into one file per page
input_pdf = PdfReader("input.pdf")
for i, page in enumerate(input_pdf.pages):
    output_pdf = PdfWriter()
    output_pdf.add_page(page)
    with open(f"output_{i}.pdf", "wb") as output_stream:
        output_pdf.write(output_stream)
```
4. Rotating PDF pages
```
# PyPDF2 3.x names; PdfFileReader/PdfFileWriter were removed in 3.0.
from PyPDF2 import PdfReader, PdfWriter

# Rotate every page of the PDF
input_pdf = PdfReader("input.pdf")
output_pdf = PdfWriter()
for page in input_pdf.pages:
    page.rotate(90)  # rotate 90 degrees clockwise
    output_pdf.add_page(page)
with open("output.pdf", "wb") as output_stream:
    output_pdf.write(output_stream)
```
5. Adding a watermark to PDF pages
```
# PyPDF2 3.x names; PdfFileReader/PdfFileWriter were removed in 3.0.
from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas

# Build a one-page watermark PDF with reportlab
watermark_pdf = canvas.Canvas("watermark.pdf")
watermark_pdf.setFont("Helvetica", 80)
watermark_pdf.setFillGray(0.5)
watermark_pdf.rotate(45)
watermark_pdf.drawString(0, 0, "WATERMARK")
watermark_pdf.save()

# Stamp the watermark onto every page
input_pdf = PdfReader("input.pdf")
output_pdf = PdfWriter()
watermark = PdfReader("watermark.pdf").pages[0]
for page in input_pdf.pages:
    page.merge_page(watermark)
    output_pdf.add_page(page)
with open("output.pdf", "wb") as output_stream:
    output_pdf.write(output_stream)
```
I hope the code above helps.