General OCR is a very involved process, and general OCR via a
video camera even more so. With a scanned page, you can generally
assume that the page is scanned pretty close to perpendicular to the
lens, so there is little or no perspective distortion. With a camera,
you can turn a square into a trapezoid just by rotating the card, so
you need a system that is invariant in 3 dimensions instead of 2.
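
To make the square-to-trapezoid point concrete, here is a minimal
Python sketch. It has nothing to do with QC or Core Image; it just
assumes an ideal pinhole camera and a unit-square card rotated about
its vertical axis. The function names, the distance of 5 units, and
the focal length of 1 are all made up for illustration.

```python
import math

def project_rotated_square(angle_deg, distance=5.0, focal=1.0):
    """Rotate a unit square about the vertical (y) axis, then project it
    through an ideal pinhole camera looking down the z axis."""
    a = math.radians(angle_deg)
    # Corners in order: bottom-left, bottom-right, top-right, top-left.
    corners = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
    projected = []
    for x, y in corners:
        # The rotation moves one vertical edge toward the camera and
        # the other one away from it.
        xr = x * math.cos(a)
        zr = x * math.sin(a) + distance
        # Perspective divide: closer points appear larger.
        projected.append((focal * xr / zr, focal * y / zr))
    return projected

def edge_heights(points):
    """Projected heights of the left and right vertical edges."""
    left = points[3][1] - points[0][1]
    right = points[2][1] - points[1][1]
    return left, right

flat = edge_heights(project_rotated_square(0))
tilted = edge_heights(project_rotated_square(45))
```

With the card facing the camera, both vertical edges project to the
same height. Rotate it 45 degrees and the nearer edge comes out taller
than the farther one, which is the trapezoid a recognizer has to cope
with.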
That's not to say it's not possible. Your license plate is OCR'ed
every time you go through EZ-Pass or an airport parking lot, I
believe. But you're going to quickly get into some very heavy math,
and the naive algorithms (e.g., template matching) will not be very
robust.
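
As a rough illustration of why naive template matching is fragile,
here is a small Python sketch. The 5x5 glyphs, the labels, and the
function names are invented for this example; a real matcher would
work on camera images and normally search over position and scale.

```python
# Hypothetical 5x5 binary glyphs ('1' = ink, '0' = background).
TEMPLATES = {
    "1": ["00100", "00100", "00100", "00100", "00100"],
    "7": ["11111", "00001", "00010", "00100", "00100"],
}

def mismatch(image, template):
    """Count the pixels where two same-sized binary images disagree."""
    return sum(
        a != b
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
    )

def classify(image, templates=TEMPLATES):
    """Return the label whose template has the fewest mismatched pixels."""
    return min(templates, key=lambda label: mismatch(image, templates[label]))

# The same "1", shifted one pixel to the right.
shifted_one = ["00010"] * 5
scores = {label: mismatch(shifted_one, tpl) for label, tpl in TEMPLATES.items()}
```

An exact copy of the "1" is classified correctly, but the one-pixel
shift leaves it no closer to its own template than to the "7" (10
mismatched pixels in both cases). Rotation and perspective make the
problem worse.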
QC's Core Image filters cannot support a classifier or a feature
detection algorithm directly, because they cannot turn an image into
any kind of data other than another image. However, they can be
very good at the pre-processing needed before a classifier or feature
detector is applied. For example, you might want to do adaptive
thresholding, corner detection, edge detection, and/or thinning on
the GPU (all of which I've attempted and put on my blog at
http://www.samkass.com/blog , although it hasn't been updated in a while).
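
To give a rough idea of what one of those pre-processing steps
computes, here is a CPU-side Python sketch of mean-based adaptive
thresholding. It is not the Core Image kernel from the blog; the
function name, the window radius, and the offset are assumptions made
for illustration. Note that each output pixel is computed purely by
sampling nearby input pixels, which is the access pattern an image
filter handles well.

```python
def adaptive_threshold(image, radius=1, offset=10):
    """Mark a pixel as ink (1) when it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighborhood by more than `offset`.

    `image` is a list of rows of grayscale values (0 = black)."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            # Gather the neighborhood, clipping the window at the border.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            local_mean = total / count
            row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(row)
    return result
```

The point of comparing against a local mean rather than one global
cutoff is uneven lighting: a stroke on the bright side of the card can
be lighter than bare paper on the dim side, so no single threshold
separates ink from background.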
The Hough Transform, which you'll find in many texts as an approach
to this kind of OCR, is unfortunately not well-suited to the GPU, and
neither is any other algorithm that uses pixels in the destination
image as a non-spatial accumulator. (In Core Image, you're operating
on the output pixels and sampling the input pixels, so histograms,
Hough transforms, and the like are even less efficient than a standard
CPU computation.) Another approach is to measure large numbers of
features (relative lengths of strokes, distance between corners,
etc.), use a KL (Karhunen-Loeve) transform or the like to pick out the
most "interesting" features, and then apply one of the standard
classifiers you'll find in many pattern recognition books. This has
the advantage of being somewhat self-selecting in finding what to look
for. But now we start to get into the serious math.
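
The accumulator problem is easier to see in code. Below is a minimal
CPU-side Python sketch of a line-detecting Hough transform; the
function names and the one-degree angle resolution are my own choices
for illustration, not taken from any particular text. The thing to
notice is the inner loop: every input point writes to many cells of
an array indexed by (theta, rho) rather than by pixel position, which
is the opposite of computing each output pixel by sampling the input.

```python
import math

def hough_accumulate(points, width, height, n_theta=180):
    """Vote for lines rho = x*cos(theta) + y*sin(theta) through each point."""
    max_rho = int(math.ceil(math.hypot(width, height)))
    # accumulator[t][rho + max_rho] is indexed by line parameters,
    # not by image coordinates.
    accumulator = [[0] * (2 * max_rho + 1) for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            # Scatter write: one input point touches n_theta cells.
            accumulator[t][rho + max_rho] += 1
    return accumulator, max_rho

def strongest_line(accumulator, max_rho):
    """Return (votes, theta in degrees, rho) for the fullest cell."""
    n_theta = len(accumulator)
    votes, t, r = max(
        (votes, t, r)
        for t, row in enumerate(accumulator)
        for r, votes in enumerate(row)
    )
    return votes, 180.0 * t / n_theta, r - max_rho

# Ten edge points along the horizontal line y = 4.
edge_points = [(x, 4) for x in range(10)]
acc, max_rho = hough_accumulate(edge_points, width=10, height=10)
```

Because of the coarse rounding, several neighboring angles can tie for
the peak, so treat the result of strongest_line as approximate.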
In any case, it's a pretty hefty subject. Good luck with it. (I
might instead recommend barcodes or the like that are designed to be
readable by computers.)
--Sam
On Jul 25, 2005, at 11:25 PM, Matthew Williams wrote:
I'm looking for a method to read a simple flash card via iSight.
For example:
A simple card with the number 5 on it would be held up to the camera.
Would there be a way to use any of the filters (edge detection or
something?) to identify that it is in fact a 5?
I was thinking of this more as an implementation in a Cocoa app. Does
anyone know where I would look for an introduction to OCR, or have any
tips?
Thanks :)
-MW
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Quartzcomposer-dev mailing list (Quartzcomposer-email@hidden)
Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/quartzcomposer-dev/samkass%40samkass.com