Seven Segment Optical Character Recognition

image of six seven segment digits
Use ssocr -T to recognize the above image.

Overview

Seven Segment Optical Character Recognition, or ssocr for short, is a program to recognize the digits of a seven segment display. An image of one row of digits is used as input, and the recognized number is written to the standard output. The program runs on GNU/Linux (and Mac OS X, and even on Windows using Cygwin) and uses Imlib2 to access image data.

Download

Source code ssocr-2.16.0.tar.bz2 (licensed under the terms of the GNU GPL version 3 or later).

Algorithm

The image is optionally filtered and then transformed into a monochrome representation with the digits as foreground using some form of thresholding. This image is segmented to find the digits and then each digit is recognized individually.
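
The thresholding step can be illustrated with a minimal Python sketch. It is not taken from the ssocr source: the function name `to_monochrome`, the list-of-rows image representation, and the exact way the threshold is adjusted to the used luminance interval are assumptions made for illustration. Only the defaults (50 percent threshold, black foreground) come from the usage message.

```python
def to_monochrome(gray, threshold_percent=50.0, absolute=False):
    """Return rows of booleans, True meaning foreground (dark) pixel.

    gray is a list of rows of luminance values in the range 0..255.
    """
    low, high = 0, 255
    if not absolute:
        # Assumption: "adjusting" means applying the percentage to the
        # luminance interval actually used in the image.
        values = [v for row in gray for v in row]
        low, high = min(values), max(values)
    cutoff = low + (high - low) * threshold_percent / 100.0
    # Black foreground: anything darker than the cutoff is a set pixel.
    return [[v < cutoff for v in row] for row in gray]


# A low-contrast image: background 220, digit pixels 140 and 150.
image = [[220, 140],
         [150, 220]]
print(to_monochrome(image))                 # adjusted cutoff is 180
print(to_monochrome(image, absolute=True))  # fixed cutoff is 127.5
```

With the adjusted threshold the two darker pixels become foreground; with the absolute threshold nothing does, which shows why the adjustment helps on low-contrast images.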

Segmentation

Starting at the left margin, the image is searched for a column containing some foreground pixels, which marks the start of the first digit. After that, a column containing only background pixels is searched for, which marks the end of the digit's horizontal extent. This process is repeated to find the specified number of digits, or until no more digits are found.

The vertical segmentation works similarly, but gaps in digits are allowed, because in some digits the middle segment is unset.

This segmentation technique works for a single row of digits only.
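
The segmentation described above can be sketched in Python as follows. This is an illustration under assumptions, not the ssocr implementation: the function names, the boolean image representation (True is foreground), and the handling of `ignore_pixels` are hypothetical.

```python
def find_digit_columns(mono, ignore_pixels=0):
    """Return (start, end) column ranges, end exclusive, one per digit."""
    height = len(mono)
    width = len(mono[0]) if height else 0
    spans = []
    start = None
    for x in range(width):
        count = sum(1 for y in range(height) if mono[y][x])
        has_foreground = count > ignore_pixels
        if has_foreground and start is None:
            start = x                    # first column of a digit
        elif not has_foreground and start is not None:
            spans.append((start, x))     # background column ends the digit
            start = None
    if start is not None:                # digit touches the right margin
        spans.append((start, width))
    return spans


def find_vertical_extent(mono, start, end, ignore_pixels=0):
    """Return (top, bottom) rows, bottom exclusive, for columns start..end.

    Empty rows between top and bottom are allowed, so an unset middle
    segment does not split the digit.
    """
    rows = [y for y, row in enumerate(mono)
            if sum(row[start:end]) > ignore_pixels]
    if not rows:
        return None
    return rows[0], rows[-1] + 1


picture = [".........",
           "..##...#.",
           "..#.#..#.",
           ".........",
           "..#.#..#.",
           "..##...#."]
mono = [[c == "#" for c in line] for line in picture]
for start, end in find_digit_columns(mono):
    print((start, end), find_vertical_extent(mono, start, end))
```

Because only completely empty columns separate digits, two rows of digits in one image would be merged into one tall "digit", which is the limitation noted above.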

Character Recognition

Every digit found by the segmentation is classified as follows: A vertical scan is started in the center top pixel of the digit to find the three horizontal segments. Any foreground pixel in the upper third is counted as part of the top segment, those in the second third as part of the middle and those in the last third as part of the bottom segment.

To examine the vertical segments, two horizontal scanlines starting at the left margin of the digit are used. The first is placed a quarter of the digit height from the top, the other a quarter of the digit height from the bottom. Foreground pixels in the left half represent left segments, and foreground pixels in the right half represent right segments.

The recognized segments are then used to identify the displayed digit using a table lookup (implemented as a switch statement).
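
The scanline classification can be sketched in Python. This is a rough illustration, not the ssocr code: the segment names, the function names, and the use of a dictionary in place of the switch statement are assumptions. The table covers decimal digits only, and it omits the digit one, which is handled separately as described below.

```python
def _s(*names):
    return frozenset(names)

# Lookup table from the set of lit segments to the digit.
DIGITS = {
    _s("top", "upper_left", "upper_right",
       "lower_left", "lower_right", "bottom"): "0",
    _s("top", "upper_right", "middle", "lower_left", "bottom"): "2",
    _s("top", "upper_right", "middle", "lower_right", "bottom"): "3",
    _s("upper_left", "upper_right", "middle", "lower_right"): "4",
    _s("top", "upper_left", "middle", "lower_right", "bottom"): "5",
    _s("top", "upper_left", "middle",
       "lower_left", "lower_right", "bottom"): "6",
    _s("top", "upper_right", "lower_right"): "7",
    _s("top", "upper_left", "upper_right", "middle",
       "lower_left", "lower_right", "bottom"): "8",
    _s("top", "upper_left", "upper_right", "middle",
       "lower_right", "bottom"): "9",
}


def find_segments(digit, needed_pixels=1):
    """digit: rows of booleans cropped to the digit's bounding box."""
    height, width = len(digit), len(digit[0])
    counts = {}
    # Vertical scan down the center column finds horizontal segments.
    cx = width // 2
    for y in range(height):
        if digit[y][cx]:
            if y < height / 3:
                name = "top"
            elif y < 2 * height / 3:
                name = "middle"
            else:
                name = "bottom"
            counts[name] = counts.get(name, 0) + 1
    # Two horizontal scans find the vertical segments.
    scans = ((height // 4, "upper_left", "upper_right"),
             (height - 1 - height // 4, "lower_left", "lower_right"))
    for y, left, right in scans:
        for x in range(width):
            if digit[y][x]:
                name = left if x < width / 2 else right
                counts[name] = counts.get(name, 0) + 1
    return frozenset(n for n, c in counts.items() if c >= needed_pixels)


def recognize(digit, needed_pixels=1):
    """Return the digit as a string, or None if the pattern is unknown."""
    return DIGITS.get(find_segments(digit, needed_pixels))


two = [[c == "#" for c in row] for row in
       ["#####", "....#", "....#", "#####", "#....", "#....", "#####"]]
print(recognize(two))
```

Raising `needed_pixels` corresponds to the idea behind the `--number-pixels` option: a few stray foreground pixels on a scanline are then not enough to count as a segment.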

Since the above algorithm cannot recognize the digit one, a digit that has a width of less than one quarter of its height is recognized as a one.

To recognize a decimal point, e.g. of a digital scale, the size of each digit (that was not recognized as a one already) is compared with the maximum digit width and height. If a digit is significantly smaller than that, it is assumed to be a decimal point. The decimal point or thousands separators count towards the number of digits to recognize.

To recognize a minus sign a method similar to recognizing the digit one is used. If a digit is less high than 1/3 of its width, it is considered a minus sign.
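
The three size-based rules (one, decimal point, minus sign) can be combined into a small Python sketch. Everything here is illustrative: the function name, the order of the checks, and in particular `dec_ratio` (what counts as "significantly smaller") are assumptions. The defaults for `one_ratio` and `minus_ratio` are the ones printed in the usage message; the exact comparison and rounding used by ssocr may differ.

```python
def classify_by_shape(width, height, max_width, max_height,
                      one_ratio=3, minus_ratio=2, dec_ratio=5):
    """Classify a bounding box by its proportions alone.

    Returns "1", "." or "-", or None if the box needs the
    scanline-based recognition instead.
    """
    # Tall and narrow: the digit one.
    if height > one_ratio * width:
        return "1"
    # Much smaller than the largest digit in both dimensions
    # (dec_ratio is a hypothetical factor): a decimal point.
    if width * dec_ratio < max_width and height * dec_ratio < max_height:
        return "."
    # Wide and flat: a minus sign.
    if width > minus_ratio * height:
        return "-"
    return None


# Bounding boxes (width, height) as they might come from segmentation.
boxes = [(20, 4), (4, 40), (20, 40), (3, 3), (20, 40)]
max_w = max(w for w, h in boxes)
max_h = max(h for w, h in boxes)
print([classify_by_shape(w, h, max_w, max_h) for w, h in boxes])
```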

Visualization of the Algorithm

graphic illustration of algorithm

In this image the left border of a digit is represented by a red column, the right border as a blue column. Horizontal green lines of digit width show connected vertical digit parts. The gray rectangles represent the digit dimensions.

Pixels found by the vertical scanline are shown in red, green and blue for the top, middle and bottom third. Those found by the horizontal scanlines are shown in red and green for the left and right half of the digit. No scanlines are used to recognize a one.


Manual

Usage

Seven Segment Optical Character Recognition Version 2.16.0
Copyright (C) 2004-2013 by Erik Auerswald <auerswal@unix-ag.uni-kl.de>
This program comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it under the terms
of the GNU GPL (version 3 or later)

Usage: ssocr [OPTION]... [COMMAND]... IMAGE

Options: -h, --help               print this message
         -v, --verbose            talk about program execution
         -V, --version            print version information
         -t, --threshold=THRESH   use THRESH (in percent) to distinguish black
                                  from white
         -a, --absolute-threshold don't adjust threshold to image
         -T, --iter-threshold     use iterative thresholding method
         -n, --number-pixels=#    number of pixels needed to recognize a segment
         -i, --ignore-pixels=#    number of pixels ignored when searching digit
                                  boundaries
         -d, --number-digits=#    number of digits in image (-1 for auto)
         -r, --one-ratio=#        height/width ratio to recognize a 'one'
         -m, --minus-ratio=#      width/height ratio to recognize a minus sign
         -o, --output-image=FILE  write processed image to FILE
         -O, --output-format=FMT  use output format FMT (Imlib2 formats)
         -p, --process-only       do image processing only, no OCR
         -D, --debug-image[=FILE] write a debug image to FILE or testbild.png
         -P, --debug-output       print debug information
         -f, --foreground=COLOR   set foreground color (black or white)
         -b, --background=COLOR   set background color (black or white)
         -I, --print-info         print image dimensions and used lum values
         -g, --adjust-gray        use T1 and T2 as percentages of used values
         -l, --luminance=KEYWORD  compute luminance using formula KEYWORD
                                  use -l help for list of KEYWORDS

Commands: dilation                dilation algorithm (with mask of 1 pixel)
          erosion                 erosion algorithm (with mask of 9 pixels)
          closing [N]             closing algorithm
                                  ([N times] dilation then [N times] erosion)
          opening [N]             opening algorithm
                                  ([N times] erosion then [N times] dilation)
          remove_isolated         remove isolated pixels
          make_mono               make image monochrome
          grayscale               transform image to grayscale
          invert                  make inverted monochrome image
          gray_stretch T1 T2      stretch luminance values
                                  from [T1,T2] to [0,255]
          dynamic_threshold W H   make image monochrome w. dynamic thresholding
                                  with a window of width W and height H
          rgb_threshold           make image monochrome by setting every pixel
                                  with any values of red, green or blue
                                  below the threshold to black
          r_threshold             make image monochrome using only red channel
          g_threshold             make image monochrome using only green channel
          b_threshold             make image monochrome using only blue channel
          white_border [WIDTH]    make border of WIDTH (or 1) of image have
                                  background color
          shear OFFSET            shear image OFFSET pixels (at bottom) to the
                                  right
          rotate THETA            rotate image by THETA degrees
          mirror {horiz|vert}     mirror image horizontally or vertically
          crop X Y W H            crop image with upper left corner (X,Y) with
                                  width W and height H
          set_pixels_filter MASK  set pixels that have at least MASK neighbor
                                  pixels set (including checked position)
          keep_pixels_filter MASK keeps pixels that have at least MASK neighbor
                                  pixels set (not counting the checked pixel)

Defaults: needed pixels          =  1
          ignored pixels         =  0
          no. of digits          =  6
          threshold              = 50.00
          foreground             = black
          background             = white
          luminance              = Rec709
          height/width threshold =  3
          width/height threshold for minus sign =  2

Operation: The IMAGE is read, the COMMANDs are processed in the sequence
           they are given, in the resulting image the given number of digits
           are searched and recognized, after which the recognized number is
           written to STDOUT.
           The recognition algorithm works with set or unset pixels and uses
           the given THRESHOLD to decide if a pixel is set or not.
           Use - for IMAGE to read the image from STDIN.

Exit Codes:  0 if correct number of digits have been recognized
             1 if a different number of digits have been found
             2 if one of the digits could not be recognized
             3 if successful image processing only
            42 if -h, -V, or -l help
            99 otherwise
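
A script that calls ssocr can use the exit codes listed above to decide whether to trust the output. The following Python sketch assumes that an `ssocr` binary is on the PATH; the helper names `run_ssocr` and `describe_exit` and the file name in the usage comment are hypothetical.

```python
import subprocess

# Exit codes as documented in the usage message.
EXIT_MEANINGS = {
    0: "correct number of digits recognized",
    1: "a different number of digits was found",
    2: "at least one digit could not be recognized",
    3: "image processing only, successful",
    42: "help or version information was printed",
    99: "other error",
}


def describe_exit(code):
    """Translate an ssocr exit code into a short message."""
    return EXIT_MEANINGS.get(code, "unexpected exit code")


def run_ssocr(image_path, *args, executable="ssocr"):
    """Run ssocr with options and commands; return (exit code, output)."""
    result = subprocess.run([executable, *args, image_path],
                            capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


# Example (not executed here, requires ssocr and an image file):
#   code, text = run_ssocr("display.png", "-d", "-1")
#   if code == 0:
#       print("recognized:", text)
#   else:
#       print("failed:", describe_exit(code))
```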
  

Options

-h, --help
Prints the usage message, shows default values, describes program operation and shows possible exit codes.
-v, --verbose
Some messages regarding program execution are printed.
-V, --version
Prints version and copyright information.
-t, --threshold=THRESHOLD
Set the luminance threshold used to distinguish black from white. THRESHOLD is interpreted as a percentage.
-a, --absolute-threshold
Use the THRESHOLD value without adjustment. Otherwise the THRESHOLD is adjusted to the luminance interval used in the image.
-T, --iter-threshold
Use an iterative method (one-dimensional k-means clustering) to determine the threshold used to distinguish black from white. A THRESHOLD value given via -t THRESHOLD sets the starting value.
-n, --number-pixels=NUMBER
Set the number of foreground pixels that have to be found in a scanline to recognize a segment. Can be used to ignore some noise in the picture.
-i, --ignore-pixels=NUMBER
Set the number of foreground pixels that are ignored when deciding if a column consists only of background or foreground pixels. Can be used to ignore some noise in the picture.
-d, --number-digits=NUMBER
Specifies the number of digits shown in the image. If NUMBER is -1, the number of digits is detected automatically.
-r, --one-ratio=RATIO
Set the height/width ratio threshold to recognize a digit as a one. RATIO takes integers only.
-m, --minus-ratio=RATIO
Set the width/height ratio to recognize a minus sign. RATIO takes integers only.
-o, --output-image=FILE
Write the processed image to FILE. Normally no image is written to disk. If a standard extension is used it is interpreted as the image format to use.
-O, --output-format=FORMAT
Specify the image format. This format must be recognized by Imlib2; standard filename extensions are used.
-p, --process-only
Process the given commands only, no segmentation or character recognition. Should be used together with --output-image=FILE.
-D, --debug-image[=FILE]
Write a debug image showing the results of thresholding, segmentation and character recognition. The image is written to testbild.png unless a filename FILE is given.
-P, --debug-output
Print information helpful for debugging.
-f, --foreground=COLOR
Specify the foreground color (either black or white). This automatically sets the background color as well.
-b, --background=COLOR
Specify the background color (either black or white). This automatically sets the foreground color as well.
-I, --print-info
Prints image dimensions and range of used luminance values.
-g, --adjust-gray
Use values T1 and T2 given to command gray_stretch as percentages instead of absolute luminance values.
-l, --luminance=KEYWORD
Controls the kind of luminance computation. Using help as KEYWORD prints the list of keywords with a short description of the used formula. The default should work well.
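
The iterative method behind --iter-threshold is described above as one-dimensional k-means clustering. A minimal Python sketch of that general technique follows; it is not the ssocr implementation, and the function name, the stopping tolerance, and the iteration limit are assumptions.

```python
def iterative_threshold(values, start=127.5, max_iter=100):
    """Find a threshold by two-cluster k-means on luminance values.

    The values are split at the current threshold, the mean of each
    cluster is computed, and the threshold moves to the midpoint of
    the two means until it no longer changes noticeably.
    """
    threshold = start
    for _ in range(max_iter):
        dark = [v for v in values if v < threshold]
        bright = [v for v in values if v >= threshold]
        if not dark or not bright:
            break  # one cluster is empty, keep the current threshold
        new = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2
        converged = abs(new - threshold) < 0.5
        threshold = new
        if converged:
            break
    return threshold


# Dark digit pixels around 12, bright background around 210.
print(iterative_threshold([10, 12, 14, 200, 210, 220]))
```

The `start` parameter plays the role of the starting value that can be set via -t THRESHOLD.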

Luminance Keywords

rec601
use gamma corrected RGB values according to ITU-R Rec. BT.601-4:
0.299*R + 0.587*G + 0.114*B
rec709
use linear (or gamma corrected, I found both claims and don't have the spec) RGB values according to ITU-R Rec. BT.709:
0.2125*R + 0.7154*G + 0.0721*B
linear
use (R+G+B)/3 as done by cvtool version 0.0.1
minimum
use min(R,G,B) as done by GNU Ocrad 0.14
maximum
use max(R,G,B)
red
use red value
green
use green value
blue
use blue value
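
The formulas listed above translate directly into Python. The dictionary and function names below are illustrative; the coefficients are the ones given in the keyword list, and the default follows the usage message (Rec709).

```python
# One function per luminance keyword, taking R, G and B values.
LUMINANCE = {
    "rec601": lambda r, g, b: 0.299 * r + 0.587 * g + 0.114 * b,
    "rec709": lambda r, g, b: 0.2125 * r + 0.7154 * g + 0.0721 * b,
    "linear": lambda r, g, b: (r + g + b) / 3,
    "minimum": lambda r, g, b: min(r, g, b),
    "maximum": lambda r, g, b: max(r, g, b),
    "red": lambda r, g, b: r,
    "green": lambda r, g, b: g,
    "blue": lambda r, g, b: b,
}


def luminance(r, g, b, keyword="rec709"):
    """Compute the luminance of one pixel using the named formula."""
    return LUMINANCE[keyword](r, g, b)


# A saturated red pixel, as found on many LED displays.
for name in LUMINANCE:
    print(name, luminance(255, 0, 0, name))
```

For a pure red pixel the weighted formulas give a rather low luminance, while maximum and red give the full value, which is one reason to try a different keyword for colored displays.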

Commands

dilation
Filter image using dilation algorithm. Any pixel with at least one neighbor pixel set in the source image will be set in the filtered image.
erosion
Filter image using erosion algorithm. Any pixel with every neighbor pixel set in the source image will be set in the filtered image.
closing [N]
Filter image using closing algorithm, i.e. dilation and then erosion. If a number N>1 is specified, N times dilation and then N times erosion is executed.
opening [N]
Filter image using opening algorithm, i.e. erosion and then dilation. If a number N>1 is specified, N times erosion and then N times dilation is executed.
remove_isolated
Remove any foreground pixels without neighboring foreground pixels.
make_mono
Convert the image to monochrome using thresholding. The threshold can be specified with option --threshold and is adjusted to the used luminance interval of the image unless option --absolute-threshold is specified.
grayscale
Transform image to gray values using luminance. The formula to compute luminance can be specified using option --luminance.
invert
Set every foreground pixel to background color and vice versa. It is faster to set the foreground color to white than to invert the image.
gray_stretch T1 T2
Transform image so that the luminance interval [T1,T2] is projected to [0,255] with any value below T1 set to 0 and any value above T2 set to 255.
dynamic_threshold W H
Convert the image to monochrome using dynamic thresholding a.k.a local adaptive thresholding. A window of width W and height H around the current pixel is used.
rgb_threshold
Convert the image to monochrome using simple thresholding for every color channel. If any of the red, green or blue values is below the threshold, the pixel is set to black. You should use --luminance=minimum and make_mono or dynamic_threshold instead.
r_threshold
Convert the image to monochrome using simple thresholding. Only the red color channel is used. You should use --luminance=red and make_mono or dynamic_threshold instead.
g_threshold
Convert the image to monochrome using simple thresholding. Only the green color channel is used. You should use --luminance=green and make_mono or dynamic_threshold instead.
b_threshold
Convert the image to monochrome using simple thresholding. Only the blue color channel is used. You should use --luminance=blue and make_mono or dynamic_threshold instead.
white_border [WIDTH]
The border of the image is set to the background color. This border is one pixel wide unless a WIDTH>1 is specified.
shear OFFSET
Shear the image OFFSET pixels to the right. The OFFSET is used at the bottom. Image dimensions do not change; pixels in background color are used for pixels that are outside the image and shifted inside. Pixels shifted out of the image are dropped. Many seven segment displays use slightly skewed digits, and this command can be used to compensate for this.
rotate THETA
Rotate the image THETA degrees clockwise around the center of the image. Image dimensions do not change, pixels rotated out of the image area are dropped, pixels from outside the image rotated into the new image are set to the background color.
mirror { horiz | vert }
Mirror the image either horizontally (horiz) or vertically (vert).
crop X Y W H
Use only the subpicture with upper left corner (X,Y), width W and height H.
set_pixels_filter MASK
Set every pixel in the filtered image that has at least MASK neighbor pixels set in the source image.
keep_pixels_filter MASK
Keep only those foreground pixels in the filtered image that have at least MASK neighbor pixels set in the source image (not counting the checked pixel itself).
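
The morphological commands can be sketched in Python on a boolean image (True is foreground). This is a generic illustration of dilation, erosion, closing and opening, not the ssocr code. It assumes a 3x3 neighborhood that includes the checked pixel and treats pixels outside the image as background; both are assumptions.

```python
def _neighborhood(img, y, x):
    """Yield the 3x3 neighborhood of (y, x); outside counts as unset."""
    height, width = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield 0 <= ny < height and 0 <= nx < width and img[ny][nx]


def dilation(img):
    """Set every pixel that has at least one set pixel nearby."""
    return [[any(_neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def erosion(img):
    """Keep only pixels whose whole neighborhood is set."""
    return [[all(_neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def closing(img, n=1):
    """N times dilation, then N times erosion: fills small gaps."""
    for _ in range(n):
        img = dilation(img)
    for _ in range(n):
        img = erosion(img)
    return img


def opening(img, n=1):
    """N times erosion, then N times dilation: removes small specks."""
    for _ in range(n):
        img = erosion(img)
    for _ in range(n):
        img = dilation(img)
    return img


# A horizontal segment with a one pixel gap, as on many displays.
picture = [".......", ".......", ".##.##.", ".......", "......."]
img = [[c == "#" for c in row] for row in picture]
for row in closing(img):
    print("".join("#" if p else "." for p in row))
```

Closing connects the two halves of the segment, which can help when the gaps between segments would otherwise split a digit during segmentation.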

Bugs

Imlib2 (and therefore ssocr) does not work well with Netpbm images.

Manual Page

Since version 2.8.1 ssocr has a manual page. You can read it online as well.

History

This program was developed as a proof of concept to test the recognition algorithm (this still shows in the source code...).

RSA SecurID Token

SecurID token inside a box
Use ssocr crop 230 195 220 60 -t 20 to get the token from the image above.

Once upon a time a fellow member of the UNIX-AG got issued an RSA SecurID 600 token, but did not want to carry it around all the time. The available general OCR software was not able to recognize the digits, mainly because the segments are not connected. This gap was filled by ssocr.

Since then, a USB camera points to the token inside a cookie box and ssocr is used to get the number into the computer. A script using this info and a password is then used for login.

Security Implications

This setup means that the user has no need to carry the token with him and it can even be easily shared with co-workers. The complicated login procedure requiring a password and in this case two token numbers (which means a one minute wait for the next number) was incentive enough to replace the two factor authentication by traditional authentication measures. The security of the system is determined by the weakest link, which is not the one-time passcode provided by the token.

Version History

Versions 1.x.x

The first versions of ssocr did not contain the image manipulation algorithms. A separate program called ssocrpp (seven segment OCR preprocessor) was used instead. Since this program used Imlib2 as well, an intermediate image file had to be used. To overcome this, versions 2.x.x of ssocr include all functionality of ssocrpp.

Versions 2.x.x

The second major version of ssocr integrated all functionality in one binary. This was the first publicly released ssocr version. Development concentrated on adding image manipulation functions. No external image manipulation programs were needed any more, thus easing use of ssocr on differing Linux distributions. Since version 2.9.0 the image can be read from a pipe, easing the use of external image manipulation programs.
Since version 2.11.0 a decimal point can be detected.
Since version 2.12.0 hexadecimal digits can be detected.
Since version 2.13.0 the number of digits can be determined automatically. Recognition of a decimal point and an arbitrary number of digits has been added to read the display of digital scales.
Version 2.14.0 introduces an alternative version of the digit 9, where the lower horizontal segment is not set.
Version 2.15.0 adds detection of minus signs, thanks to a patch by Cristiano Fontana.
Version 2.16.0 adds a command to mirror the image horizontally or vertically.

Lessons Learned


Links

A similar Project in Perl, published by the German Linux Magazin.

LimID, another project using specialized OCR to read a seven segment display. This actually includes some hardware to push a button on the token.

RoastLogger, another project that uses OCR to read seven segment displays (non-free).

Alex Samorukov blogged about using ssocr for the original use case of reading the number shown by an RSA SecurID token. :-)

Matt Kirchstein imported ssocr version 2.9.7 into a github repository and made two minor changes to compile ssocr on Mac OS X. Equivalent changes have been incorporated into ssocr version 2.13.3. I learned about this from someone trying to use ssocr from that site and having problems, not from Matt. :-(
If you have to change something to make ssocr work for you, please tell me so I can improve ssocr for everyone. Thanks.

The FobCAM shows the RSA SecurID token of someone.
This page is currently (2009-07-17) offline, but you can try the Wayback Machine.

My simple image grabber for linux.

A comparison of formulas to create grayscale images.

Image processing can be done with Netpbm, ImageMagick, GREYC's Magic Image Converter, GraphicsMagick, ExactImage, or cvtool, among others (consider GIMP or ImageJ for interactive use).

The leptonlib, found at Leptonica, is a C library for image processing and analysis. The web site contains some good reading besides library documentation as well. The GD Graphics Library (new site) is a comprehensive graphics library. GEGL is a newcomer from the GIMP community. GFXprim is a recent 2D bitmap graphics library with C and Python APIs. Anyway, I'd like a simple wrapper around the different libraries for working with image formats, to easily load an image into memory and access the individual pixels (and nothing else), but have not found this yet.

There are too many C++ libraries for image processing and computer vision. Some of them are: CImg, FreeImage, GIL (part of Boost), Insight Segmentation and Registration Toolkit (ITK), libCVD, LTI-Lib, Mimas toolkit, NASA Vision Workbench (github repository), OpenCV, OpenImageIO, ORFEO Tool Box (alternative link), VIPS, VSIPL++, VXL.

Marvin seems to be an interesting Java image processing framework.

A few free OCR programs are Clara OCR, Conjecture, Cuneiform (with YAGF GUI), GNU Ocrad, GOCR, ocre, OCRFeeder, ocropus, Tesseract OCR, XPLAB.

unpaper is an interesting program to process scanned paper sheets. Scan Tailor is an interactive program to post-process scanned images.

ssocr uses Imlib2 for image I/O.

