
How do you correctly install and use the Tesseract OCR library (pytesseract) in Python?

Date: 2024-12-14 16:20:27 Views: 255

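To use Tesseract OCR (Optical Character Recognition) from Python, the usual route is the `pytesseract` module, which wraps the Tesseract OCR engine. The installation and usage steps are:

1. Install the imaging dependency:
   - `pytesseract` relies on Pillow, the maintained fork of `PIL` (Python Imaging Library), to handle image files. Install it with pip:
   ```
   pip install pillow
   ```
2. Install Tesseract:
   - `pytesseract` does not include the OCR engine; it calls the Tesseract executable installed on your system. If Tesseract is not already installed, download and install it first. Windows users can get a Windows build via https://github.com/tesseract-ocr/tesseract; macOS users can install it with Homebrew:
   ```bash
   brew install tesseract
   ```
   Alternatively, on a Mac you can download the source package and install it by following its instructions.
3. Install `pytesseract`:
   - With Tesseract in place, install the wrapper:
   ```
   pip install pytesseract
   ```
4. Configure the module:
   - Import `pytesseract`, point it at the Tesseract executable if it is not on your `PATH`, and choose a language (for example, `'eng'` for English). The language is not a module-level setting; it is passed to each recognition call through the `lang` argument:
   ```python
   import pytesseract

   # Only needed when tesseract is not on PATH; replace with your actual Tesseract path
   pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
   lang = 'eng'  # passed to image_to_string() in the next step
   ```
5. Read and recognize an image:
   - Open the image with Pillow and run recognition on it:
   ```python
   from PIL import Image

   image_path = 'image.jpg'
   text = pytesseract.image_to_string(Image.open(image_path), lang=lang)
   print(text)
   ```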
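A common failure with the steps above is that `pytesseract` cannot find the Tesseract executable. The following is a minimal sketch, not part of `pytesseract` itself, that looks for the executable before running OCR. The helper names `resolve_tesseract_cmd` and `ocr_image` and the `candidates` parameter are hypothetical, introduced here only for illustration; the sketch assumes Pillow and `pytesseract` are installed as described.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates=()):
    """Return a path to the tesseract executable, or None if none is found.

    `candidates` is a hypothetical list of fallback paths to try when
    tesseract is not on PATH (for example, a custom install location).
    """
    found = shutil.which("tesseract")  # search PATH first
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ocr_image(image_path, lang="eng", candidates=()):
    """Run OCR on one image and return the recognized text."""
    # Imported lazily so the path check above can run on its own.
    import pytesseract
    from PIL import Image

    cmd = resolve_tesseract_cmd(candidates)
    if cmd is None:
        raise RuntimeError("tesseract executable not found; install it first")
    pytesseract.pytesseract.tesseract_cmd = cmd

    # The language is passed per call through the `lang` argument.
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

Calling `ocr_image('image.jpg', lang='eng')` would then behave like the example in step 5, but it fails early with a clear message when Tesseract is missing instead of raising an error from inside `pytesseract`.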
