[Tfug] OT? "Hamming distance" in non-binary codes?

Robert Hunter hunter at tfug.org
Wed Feb 5 22:07:41 MST 2014


Hey, Don.

On Wed, Feb 5, 2014 at 5:48 PM, Bexley Hall <bexley401 at yahoo.com> wrote:

> I need to come up with a (large) set of identifiers.
> Ideally, numeric (decimal).  And, as "short" as
> *practical*.
>

Human readable?  Why decimal?  Why not alphanumeric, for instance?


>
> But, a naive implementation (1, 2, 3, 4, 5, ...) is
> prone to problems in transcription, etc.
>
> For example, depending on typeface, an '8' can be
> misread as a '3'.  Or even a '6' or '9'.  Perhaps
> a '6' misread as a '5', etc.
>

> *If* you can choose the graphic representation of
> each symbol (i.e., typeface and size), you can probably
> minimize the chances for these sorts of "misreads".
> However, if you are at the mercy of others in how they
> record and present the data, then there is nothing that
> you can do to prevent the choice of a typeface that
> adds to this ambiguity (and, for hand-written identifiers,
> all bets are off:  "Is that a 4 or a 9?")
>

> In effect, you want to increase the hamming distance between
> "valid" identifiers so these sorts of transcription errors
> are minimized and/or easily recognized ("Um, '33333' is not
> a valid identifier -- are you sure you don't mean '38838'?").
>

Hamming distance is the minimum number of substitutions it takes to mutate
one string into another.  If you're concerned with transcription errors that
are due to misreads (rather than some random process), then you should be
able to state rules that help avoid them, for example: no two identifiers,
M and N, exist such that the i-th position of M is '3' and the i-th
position of N is '8'.
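
As a rough Python sketch of that kind of rule (the confusable pairs here
are only illustrative, not a vetted list):

# Pairs of digits that are easily misread for one another -- illustrative
# only; tune this for the typefaces / handwriting you expect.
CONFUSABLE = {frozenset(p) for p in ("38", "36", "39", "69", "56", "49")}

def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def misreadable_as(a, b):
    """True if a and b differ in exactly one position and the differing
    symbols form a commonly-misread pair."""
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and frozenset(diffs[0]) in CONFUSABLE

def ok_to_issue(candidate, issued):
    """Accept a candidate only if no already-issued identifier could be
    misread as it (and vice versa)."""
    return all(not misreadable_as(candidate, x) for x in issued)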

> Additionally, this helps provide some protection against
> folks "guessing" valid identifiers.  Credit card issuers
> (the phreaking topic reminded me of this) exploit this to
> minimize the chance of a CC account number being "mis-read"
> (or, mis-heard!) orally, etc.
>

The checksum used in credit card numbers protects against accidental
errors; it is not intended for cryptographic use.  See
http://en.wikipedia.org/wiki/Luhn_algorithm.  You could certainly layer
this on top of some robust transcription scheme.
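
For reference, the Luhn check is trivial to compute; a quick Python sketch:

def luhn_valid(number):
    """Return True if the decimal string passes the Luhn check."""
    total = 0
    # Walking from the right, double every second digit; doubled digits
    # of 10 or more contribute their digit sum (equivalently, subtract 9).
    for i, c in enumerate(reversed(number)):
        d = int(c)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# luhn_valid("79927398713") -> True  (a common test number)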


>
> Any suggestions as to how to approach this deterministically?
> Ruling out digits (symbols) that can be misread/miswritten takes
> a huge chunk out of the identifier space (i.e., each "digit place"
> can only represent ~5 different symbols instead of *10*).  OTOH,
> I think smarter coding schemes can provide me with something more
> than this.  E.g., disallow certain combinations of "susceptible
> digits" but still allow their use (trivial example: 83 and 38 are
> valid but 33 and 88 aren't)
>

This seems likely to be the subject of existing research.  I would be
interested to hear if you dig anything up.
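
As a deterministic baseline, though, a simple check digit already buys you
Hamming distance 2 between valid identifiers, so any single-digit misread
lands on an invalid one.  A sketch in Python, using a plain digit-sum check
rather than anything clever:

def add_check_digit(ident):
    """Append a digit so the total digit sum is 0 mod 10."""
    return ident + str(-sum(int(c) for c in ident) % 10)

def looks_valid(ident):
    """True if the digit sum (including the check digit) is 0 mod 10."""
    return sum(int(c) for c in ident) % 10 == 0

# add_check_digit("12345") -> "123455"; changing any single digit of the
# result makes looks_valid() return False.

Unlike Luhn, this does nothing for transposed digits, but it illustrates
the minimum-distance idea.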

-- 
RH