[squeak-dev] Squeak and Tesseract
Edwin Ancaer
2018-11-02 10:43:52 UTC
Hello list,

As I'm looking for a way to automate searching the documents in my humble
administration, I read some articles about OCR. I came across an article
about using Python with Tesseract to transform a scan of a document into
searchable text.

My question now is if I can do something similar with Squeak. To my
inexperienced eye, it seems like I should use FFI to call the functions in
the Tesseract API, but this API is in C++, and I don't know if it is
possible to use FFI to call C++ functions?

Or should I forget the API and use OSProcess to start the tesseract
program?

Could anyone point me in the right direction, or just tell me if the whole
idea is insane?
But if it can be done in Python....

Thanks in advance,

Edwin Ancaer.
Ben Coman
2018-11-02 13:06:16 UTC
Post by Edwin Ancaer
Hello list,
As I'm looking at a way to automate the search of documents in my humble
administration, I read some articles about OCR. I came along an article
about using Python with Tesseract, to transform an scan of a document into
text, that is searchable.
My question now is if I can do something similar with Squeak. To my
inexperienced eye, it seems like I should use FFI to call the functions in
the Tesseract API, but this API is in C++, and I don't know if it is
possible to use FFI to call C++ functions?
You are right, C++ is difficult because of the name mangling of function
symbols,
but by good fortune I notice Tesseract has C bindings...
https://github.com/tesseract-ocr/tesseract#for-developers
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
so it looks like you are in the clear.


Post by Edwin Ancaer
Or should I forget the API and use OSProcess to start the tesseract
program?
FFI will be more flexible.


Post by Edwin Ancaer
Could anyone point me in the right direction, or just tell if the whole
idea is insane?
I think it's a great idea and actually Tesseract FFI is something I've
wanted to play with before but not had the time.
I'd be interested to hear how you go with it.

cheers -ben
Ben Coman
2018-11-04 03:48:40 UTC
While I've done a lot of C programming that is useful for FFI interfacing,
I've not done much C++. So just sharing something new I learnt today to
help with FFI interfacing to combined C/C++ libraries. I thought maybe
others in the same boat could be interested in this.
[Original question asked in squeak-dev, cross-posting to pharo-dev]
Post by Ben Coman
Post by Edwin Ancaer
As I'm looking at a way to automate the search of documents in my humble
administration, I read some articles about OCR. I came along an article
about using Python with Tesseract, to transform an scan of a document into
text, that is searchable.
My question now is if I can do something similar with Squeak. To my
inexperienced eye, it seems like I should use FFI to call the functions in
the Tesseract API, but this API is in C++, and I don't know if it is
possible to use FFI to call C++ functions?
You are right C++ is difficult because of the name mangling of function
symbols,
but good fortune I notice Tesseract has C bindings...
https://github.com/tesseract-ocr/tesseract#for-developers
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h
so it looks like you are in the clear.
Browsing a bit deeper, I got quite confused for a while.
I could see a typedef definition for TessResultRenderer here...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h#L83
"typedef struct TessResultRenderer TessResultRenderer"
which I understood must refer to an *existing* struct, but I couldn't find
the definition of that struct anywhere. In particular...
$ git clone ***@github.com:tesseract-ocr/tesseract.git
$ cd tesseract
$ find . -type f -name "*h" -exec grep -Hn TessResultRenderer {} \;
but didn't find any struct definitions.

I could only find TessResultRenderer as a class definition...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L45-L139
and the only thing that I guessed could possibly make sense was that C++
classes and structs could be used interchangeably. My google-fu failed to
find anything useful, so an experiment...
$ vi test.cpp
    #include <stdio.h>

    class SomeClass {
    public:
        int a;
        int b;
    };

    /* In C++, "struct SomeClass" names the same type as "class SomeClass". */
    typedef struct SomeClass SomeTypeDef;

    int main()
    {
        SomeTypeDef x;
        x.a = 5;
        x.b = 7;
        printf("Answer is %d\n", x.a + x.b);
    }
$ gcc test.cpp
$ ./a.out
Answer is 12

Now I noticed that the TessResultRenderer member variables were private...
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L131-L139
and curious about that I changed my test example from public to private
which somewhat expectedly produced compile errors.

So those TessResultRenderer member variables must only be accessed from a
member function, but how is that C++ member function called from C to
operate on a particular object?
An example is TessResultRendererInsert...
C Declaration:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.h#L135
C Definition:
https://github.com/tesseract-ocr/tesseract/blob/c375f4fbf73b8f761b2e65e0e3ad6776b9fbee78/src/api/capi.cpp#L90-L93
C++ Declaration:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.h#L52
C++ Definition:
https://github.com/tesseract-ocr/tesseract/blob/master/src/api/renderer.cpp#L59-L70

So in the C definition, "the C++ member function insert() is being called via
a function pointer in the struct." (Is that a reasonable way to describe
it?)

In this case, because of the private member variables, our FFI would treat
TessResultRenderer as an opaque object, which simplifies things. I would
guess that direct in-image access to the member variables would need to
account for the offset due to variables holding the function pointers to the
member functions.
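
As a rough illustration of the opaque-object idea, here is a minimal C++
sketch of how a C-callable wrapper can operate on an object whose members
are private. The Renderer class and the RendererCreate, RendererInsert,
RendererNext, RendererId and RendererDelete functions are hypothetical
stand-ins, not Tesseract's actual code. Note that in this sketch the wrapper
simply calls the member function on the pointer it receives; for a
non-virtual member function, no function pointer is stored in the object
itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a class like TessResultRenderer:
// private data, public member functions. Not the real Tesseract class.
class Renderer {
public:
    explicit Renderer(int id) : id_(id), next_(nullptr) {}

    // Append `next` at the end of this renderer's chain.
    void insert(Renderer* next) {
        assert(next != nullptr);
        Renderer* tail = this;
        while (tail->next_ != nullptr) tail = tail->next_;
        tail->next_ = next;
    }

    Renderer* next() const { return next_; }
    int id() const { return id_; }

private:
    int id_;          // not reachable from outside the class
    Renderer* next_;
};

// C-callable wrappers: unmangled names, object passed as the first argument.
// A C header would only declare "typedef struct Renderer Renderer;".
extern "C" {
Renderer* RendererCreate(int id) { return new Renderer(id); }
void RendererDelete(Renderer* r) { delete r; }
void RendererInsert(Renderer* r, Renderer* next) { r->insert(next); }
Renderer* RendererNext(Renderer* r) { return r->next(); }
int RendererId(Renderer* r) { return r->id(); }
}
```

From the FFI side only the pointer returned by RendererCreate is ever seen,
so it could be declared as void* and handed back to the other functions
unchanged.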

cheers -ben


P.S. for Tesseract FFI it might be good to start with reproducing this
example...
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-using-the-c-api-in-a-c-program
Kjell Godo
2018-11-04 05:15:10 UTC
Can I just write a simple C shared library or DLL which calls the C++? So
you are repackaging the C++ as a C library? I can't see how this tack could
fail to work. Just repackage C++ as C.

You would have to come up with a procedural, less OOP-ish API, I guess. You
could have C API functions F which take an Object as F's first input, and in
this way each C++ method becomes a C function. You only need to wrap as much
of the C++ API as you want to use, and each C function just calls its C++
method, so making the wrappers is highly simple and mechanical, I should
think. It could even be automated. But I know some C++ and have never made
anything in it.

I suppose that if Smalltalk cannot contain a C++ Object then you could make
a C struct which can be in Smalltalk, and have the API function copy
this struct into the C++ Object, act on it, then copy the Object data
back into the struct which is in Smalltalk. But that's a lot of work.
Surely you can have a pointer to a C++ Object in Smalltalk.

Maybe it would be better to have a separate C++ program P that you
communicate with by sockets, using Object handles H which are just Integer
indexes into an Array of Objects in P? I suppose there could be a
shared lib L that FFI could call, which could call back program P, if sockets
were too slow or something.
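
The handle idea can be sketched roughly as follows, in C++. Everything here
(Document, HandleTable, doc_create, doc_length, doc_release) is made up for
illustration; a real program P would put its own objects in the table and
expose the functions over a socket or a shared library.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical object that lives only on the C++ side.
struct Document {
    std::string text;
};

// Owns the objects; a handle is just an index into `slots_`.
class HandleTable {
public:
    int create(const std::string& text) {
        slots_.push_back(std::unique_ptr<Document>(new Document{text}));
        return static_cast<int>(slots_.size()) - 1;
    }

    // Returns nullptr for unknown or already released handles.
    Document* lookup(int handle) const {
        if (handle < 0 || handle >= static_cast<int>(slots_.size())) return nullptr;
        return slots_[handle].get();
    }

    void release(int handle) {
        if (lookup(handle) != nullptr) slots_[handle].reset();
    }

private:
    std::vector<std::unique_ptr<Document>> slots_;
};

static HandleTable g_table;

// Plain C-style entry points that only ever exchange integers and C strings.
extern "C" {
int doc_create(const char* text) {
    assert(text != nullptr);
    return g_table.create(text);
}
// Length of the document's text, or -1 if the handle is not valid.
int doc_length(int handle) {
    Document* d = g_table.lookup(handle);
    return d != nullptr ? static_cast<int>(d->text.size()) : -1;
}
void doc_release(int handle) { g_table.release(handle); }
}
```

Because the caller never holds a real pointer, a stale or bogus handle gives
an error value back instead of touching freed memory.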

I guess Dolphin can pass a Smalltalk BlockClosure B into an FFI call to L,
which could pass B on to program P, which could call B to get back into
Dolphin, but I haven't tried it myself.

I guess there is a Smalltalk interface to Python via a socket, and then from
Python to C++ is easy? Seems like a code generator that has all this stuff
figured out could be good. I think VisualWorks is probably good at
connecting to C++ via FFI. What about Chicken Scheme or any of the C-based
Schemes? What about Smalltalk/X?

borgLisp is an idea to make multiple Lisp dialects, each isomorphic to its
target language, like C or C++ or Python or Ruby or Prolog or Java or C# or
Scheme or Rust etc. Any language can have an isomorphic Lisp dialect
targeting it, in order to bind all the languages into a single borgLisp
where you can mix and match all the languages together. Each Lisp
dialect is just a simple Lisp code generator. Once all the languages
are in Lisp, all the Lisp things can be used to mix and match the
languages together, using Nix to set up and configure everything so
that all the different languages work together, one-click like, in an easy
generative format. So every language becomes Lisp and Lisp becomes every
language. Using code generation you could even make a Debugger in Lisp and
Smalltalk which could source debug any language like the Smalltalk debugger
does for Smalltalk.

But I guess this is off topic.
Edwin Ancaer
2018-11-15 13:59:59 UTC
Hello all,

FFI was *a little* more complex than I had thought, and the Tesseract API
was not helping either. But now I think I'm getting closer to making the
example Ben proposed (https://github.com/tesseract-ocr/tesseract/wiki/APIExample,
the C program using the C API) work in Squeak.

Just one thing I cannot find an example for. I have to create the
ExternalStructure classes for the structures PIXCMAP and RGBA_QUAD.
RGBA_QUAD is easy, but the PIXCMAP structure starts with an array of
RGBA_QUADs. RGBA_QUAD[] does not seem to work as a type
specification, and RGBA_QUAD* will reserve space for the first element,
but not for the whole array. Is there an example for such structures?

From the Header file:

struct PixColormap
{
    void     *array;   /* colormap table (array of RGBA_QUAD) */
    l_int32   depth;   /* of pix (1, 2, 4 or 8 bpp) */
    l_int32   nalloc;  /* number of color entries allocated */
    l_int32   n;       /* number of color entries used */
};
typedef struct PixColormap PIXCMAP;

/* Colormap table entry (after the BMP version).
 * Note that the BMP format stores the colormap table exactly
 * as it appears here, with color samples being stored sequentially,
 * in the order (b,g,r,a). */
struct RGBA_Quad
{
    l_uint8 blue;
    l_uint8 green;
    l_uint8 red;
    l_uint8 reserved;
};
typedef struct RGBA_Quad RGBA_QUAD;
Levente Uzonyi
2018-11-15 14:49:44 UTC
Post by Edwin Ancaer
Hello all,
FFI was a little more complex than I had thought, and the Tesseract API
was not helping either. But now I think I'm getting closer to making the
example Ben proposed (https://github.com/tesseract-ocr/tesseract/wiki/APIExample,
the C program using the C API) work in Squeak.
Just one thing I cannot find an example for. I have to create the
ExternalStructure classes for the structures PIXCMAP and RGBA_QUAD.
RGBA_QUAD is easy, but the PIXCMAP structure starts with an array of
RGBA_QUADs. RGBA_QUAD[] does not seem to work as a type
specification, and RGBA_QUAD* will reserve space for the first element,
but not for the whole array. Is there an example for such structures?
RGBA_QUAD* should reserve space for a pointer, not for a whole struct.
Also, if you don't need to access the contents of the struct directly and
you don't have to allocate it yourself, then you can just declare it as void*.
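
To make the pointer-versus-array point concrete, here is a small C++ sketch
built on the two structs from the header. The l_int32 and l_uint8 typedefs
are assumed to be 32-bit and 8-bit integers, and colormap_entry is a
hypothetical helper, not part of Leptonica. It shows that the struct holds
only a pointer, and that the entries are reached by casting that pointer
back to RGBA_QUAD*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: these match the integer typedefs used by the header.
typedef std::int32_t l_int32;
typedef std::uint8_t l_uint8;

struct RGBA_Quad {
    l_uint8 blue;
    l_uint8 green;
    l_uint8 red;
    l_uint8 reserved;
};
typedef struct RGBA_Quad RGBA_QUAD;

struct PixColormap {
    void*   array;   // pointer to the table, not the table itself
    l_int32 depth;
    l_int32 nalloc;
    l_int32 n;
};
typedef struct PixColormap PIXCMAP;

// The first field occupies one pointer, however many entries the table has.
static_assert(sizeof(RGBA_QUAD) == 4, "four one-byte samples");
static_assert(offsetof(PIXCMAP, depth) == sizeof(void*),
              "depth follows directly after the pointer");

// Hypothetical helper: entry i of the table, or nullptr if out of range.
inline const RGBA_QUAD* colormap_entry(const PIXCMAP* cmap, l_int32 i) {
    assert(cmap != nullptr);
    if (i < 0 || i >= cmap->n) return nullptr;
    return static_cast<const RGBA_QUAD*>(cmap->array) + i;
}
```

If the struct is only ever received from the library and passed back to it,
none of this layout matters and a plain void* is enough, as suggested above.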

Levente
Ben Coman
2018-11-15 15:22:53 UTC
Post by Edwin Ancaer
Hello all,
FFI was *a little* more complex than I had thought, and the Tesseract API
was not helping either. But now I think I'm getting closer to making the
example Ben proposed (
https://github.com/tesseract-ocr/tesseract/wiki/APIExample, the C program
using the C API) work in Squeak.
Cool!!!! Is Levente's advice sufficient? Or do you need more?
Do you have a repo where your code is accessible?
(and a few tests interested parties can run to familiarize themselves with the system?)

cheers -ben
Edwin Ancaer
2018-11-15 16:17:33 UTC
Levente, Ben,

RGBA_QUAD* is a pointer to the first element, of course, not the first
element itself. Another blow for the ego...

By a repo, you mean on github? That also is strange territory, so I will
have to ask for some patience.

Thanks,

Edwin
Ben Coman
2018-11-16 03:00:25 UTC
Post by Edwin Ancaer
Levente, Ben,
RGBA_QUAD* is a pointer to the first element, of course, not the first
element itself. Another blow for the ego...
By a repo, you mean on github? That also is strange territory, so I will
have to ask for some patience.
If not git, then www.squeaksource.com or ss3.gemtalksystems.com

cheers -ben

Yoshiki Ohshima
2018-11-15 19:20:18 UTC
Not following the conversation fully, but a practical solution may be
to use a small Python program that calls tesseract and prints things
out to stdout. Then OSProcess can handle it.
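
The same idea, sketched in C++ rather than Python: run a command, capture
what it prints, and hand the text back. run_and_capture and
tesseract_command are hypothetical helper names, popen is assumed to be
available (POSIX), and passing "stdout" as the output base is assumed to make
the tesseract command-line program print the recognized text instead of
writing a file.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>

// Run a shell command and return everything it wrote to its stdout.
// Returns an empty string if the command could not be started.
std::string run_and_capture(const std::string& command) {
    assert(!command.empty());
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) return output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}

// Build the command line for one image. The simple quoting used here
// breaks if the path itself contains a single quote.
std::string tesseract_command(const std::string& image_path) {
    return "tesseract '" + image_path + "' stdout";
}
```

A caller would then do something like
run_and_capture(tesseract_command("scan.png")) and index the returned text,
which is roughly what OSProcess would do from inside the image.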
Sean P. DeNigris
2018-11-03 02:28:01 UTC
Post by Edwin Ancaer
it seems like I should use FFI to call the functions in
the Tesseract API, but this API is in C++, and I don't know if it is
possible to use FFI to call C++ functions?
The typical workaround is to wrap the API you want to use in a C library
that exposes the functions unmangled.
Post by Edwin Ancaer
Or should I forget the API and use OSProcess to start the tesseract
program?
I took this approach for Pharo (https://github.com/seandenigris/Tesseract-St).
There will likely be a lot there that you can use in Squeak.



-----
Cheers,
Sean
tim Rowledge
2018-11-03 03:19:06 UTC
Post by Sean P. DeNigris
Post by Edwin Ancaer
it seems like I should use FFI to call the functions in
the Tesseract API, but this API is in C++, and I don't know if it is
possible to use FFI to call C++ functions?
The typical workaround is to wrap the API you want to use in a C library
that exposes them unmangled.
Also consider the trick used in the Bochs plugin - extern "C" { code....}
See https://stackoverflow.com/questions/1041866/what-is-the-effect-of-extern-c-in-c for a faintly intelligible discussion


tim
--
tim Rowledge; ***@rowledge.org; http://www.rowledge.org/tim
"!" The strange little noise