C#: Unable to read text from an image using tessnet2 and Tesseract-OCR

I wrote the .NET code below to read text from an image.

Platform used to write the code:
Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev, and tessnet2

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using tessnet2;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.IO;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var image = new Bitmap(@"D://Python//download.jpg");
            var ocr = new Tesseract();
            ocr.Init(@"C://Program Files (x86)//Tesseract-OCR//tessdata", "eng", false);
            var result = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in result)
            {
                Console.WriteLine(word.Text);
                File.AppendAllText(@"D://Python//writefile.txt", word.Text);
            }
            Console.ReadLine();
        }
    }
}

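I tried both the "Any CPU" and x86 platform targets. I also tried changing the target framework version in the project properties.

However, I get the following error: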

An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll

Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

Edit:
I just added the following to my app.config to get rid of the error. It now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
  </startup>
</configuration>

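I installed tessnet2 through NuGet, following https://www.nuget.org/packages/NuGet.Tessnet2/

I still cannot read the image. The picture is one I downloaded from Google Images, and it contains text.

This is the message I get:

[Screenshot of the message; image not available]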

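Problem solved by downloading the language pack from here: https://github.com/tesseract-ocr/langdata

This is what was missing before. The most important thing for getting tessnet2 to work is having a language pack. Get the language you want here (https://github.com/tesseract-ocr/langdata). For this example, I use English.

Download the language and extract it into the "..\Tesseract-OCR\tessdata" folder.

Note: by default, the language packs do not seem to be placed in tessdata during installation.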
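A missing language pack can be detected before calling Init. The following is only a minimal sketch and is not part of tessnet2: TessdataCheck and HasLanguageData are hypothetical names, and the check assumes that the language files in tessdata are named with the language code as a prefix (for example eng.*).

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of tessnet2.
static class TessdataCheck
{
    // Returns true if tessdataDir exists and contains at least one file
    // matching the pattern "<lang>.*".
    // Assumption: language data files are prefixed with the language code.
    public static bool HasLanguageData(string tessdataDir, string lang)
    {
        if (!Directory.Exists(tessdataDir))
        {
            return false;
        }
        return Directory.EnumerateFiles(tessdataDir, lang + ".*").Any();
    }
}
```

Calling TessdataCheck.HasLanguageData with the same folder and language code that are passed to Init, and printing a clear message when it returns false, would point directly at the missing files.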

Here is my modified code:

using System;
using System.Collections.Generic;
using System.Drawing;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var image = new Bitmap(@"D://Python//download.jpg");
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();
            // The tessdata folder must contain the language pack for "eng".
            ocr.Init(@"C://Program Files (x86)//Tesseract-OCR//tessdata", "eng", false);
            List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in result)
            {
                // Print each recognized word with its confidence value.
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }

            Console.Read();
        }
    }
}

Cheers!!!
