[PHP] How to run image recognition (OCR) with tesseract_ocr
Install tesseract_ocr with Composer
composer require thiagoalessio/tesseract_ocr
Install tesseract-ocr with apt and check the version
sudo apt update
sudo apt install tesseract-ocr
tesseract --version
Install the Japanese language data
wget https://github.com/tesseract-ocr/tessdata/raw/main/jpn.traineddata
sudo mv jpn.traineddata /usr/share/tesseract-ocr/4.00/tessdata/
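The `4.00` segment in the destination path depends on the installed Tesseract version, so the directory may differ on your machine. The following is a minimal sketch, not part of the original steps: `find_tessdata_dir` is a hypothetical helper that prints the first existing directory from a list of candidates. The candidate paths in the usage comment are assumptions about common Debian/Ubuntu layouts.

```shell
# Print the first directory from the argument list that exists.
# Returns 1 if none of the candidates exists.
# (Hypothetical helper for illustration; not part of Tesseract.)
find_tessdata_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (candidate paths are assumptions; adjust for your distribution):
#   dest=$(find_tessdata_dir \
#       /usr/share/tesseract-ocr/5/tessdata \
#       /usr/share/tesseract-ocr/4.00/tessdata \
#       /usr/share/tessdata) && sudo mv jpn.traineddata "$dest/"
#   tesseract --list-langs   # jpn should appear in the list
```

If `jpn` does not show up in `tesseract --list-langs` afterwards, the file most likely landed in a directory that Tesseract does not read.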
Implement the OCR code in PHP. This example downloads an image from a URL and runs OCR on it.
<?php
require 'vendor/autoload.php';
use thiagoalessio\TesseractOCR\TesseractOCR;
// Replace with the path to your Tesseract executable
$tesseractExecutable = '/usr/bin/tesseract';
// Replace with the URL of the image you want to process
$imageUrl = 'https://example.com/path/to/your/image.png';
try {
    // Set the TESSDATA_PREFIX environment variable
    putenv('TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata');
    // Download the image from the URL
    $imageContents = file_get_contents($imageUrl);
    if ($imageContents === false) {
        throw new RuntimeException('Failed to download image: ' . $imageUrl);
    }
    // Create a temporary file to store the downloaded image
    $tempImagePath = tempnam(sys_get_temp_dir(), 'image_');
    file_put_contents($tempImagePath, $imageContents);
    // Create an instance of TesseractOCR for the temporary file
    $tesseract = new TesseractOCR($tempImagePath);
    // Set the Tesseract executable path
    $tesseract->executable($tesseractExecutable);
    // Specify the language (Japanese in this example)
    $tesseract->lang('jpn');
    // Run OCR on the image
    $text = $tesseract->run();
    // Output the detected text
    echo "Detected Text:\n";
    echo $text;
} catch (Exception $e) {
    echo 'Error: ' . $e->getMessage();
} finally {
    // Clean up the temporary image file, even when OCR fails
    if (isset($tempImagePath) && is_file($tempImagePath)) {
        unlink($tempImagePath);
    }
}
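If the PHP script returns empty or unexpected text, it can help to run Tesseract directly on the same image and compare the results. The following is a rough sketch under stated assumptions: `ocr_image` and the `TESSERACT_BIN` override are hypothetical names introduced here, while `stdout` as the output base and the `-l` language option are standard `tesseract` command-line arguments.

```shell
# Run Tesseract on a local image and print the recognized text to stdout.
#   $1: path to the image file
#   $2: language code (defaults to jpn)
# TESSERACT_BIN may be set to point at a specific executable
# (hypothetical override for illustration).
ocr_image() {
    ocr_image_path="$1"
    ocr_image_lang="${2:-jpn}"
    ocr_image_bin="${TESSERACT_BIN:-tesseract}"
    if [ ! -f "$ocr_image_path" ]; then
        echo "image not found: $ocr_image_path" >&2
        return 1
    fi
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file
    "$ocr_image_bin" "$ocr_image_path" stdout -l "$ocr_image_lang"
}

# Usage (the file name is a placeholder):
#   ocr_image ./image.png
#   ocr_image ./image.png eng
```

If the command line reads the image correctly but the PHP script does not, the executable path or `TESSDATA_PREFIX` used in the script is a likely place to look.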