[Tfug] Paying coding gig notice - anybody want a piece of this?

Jim March 1.jim.march at gmail.com
Fri Dec 28 22:41:21 MST 2007


I've finished the program specs for a utility meant to check Diebold
vote totals databases.  It's now time to find somebody interested in
coding this in the tool of your choice.

The short form:

Throughout an election the Diebold central vote tabulator station (the
big one taking in all votes county-wide) writes out databases in
MS-Access.  An election period starts with something called a "logic
and accuracy" (L&A) test which is supposed to make sure that the
elements of the database that control the election (candidate names,
ballot layout and rotation, precincts, parties, etc.) "work" - with
test data.  The test data is blown away and the mail-in vote
processing starts before election day.  More votes come in during and
after election day, and then a final canvass is done.  Then more test
data is run through in a final L&A test, and they call it good.

Between those two L&A tests most of the database is supposed to be
"frozen".  Some tables will of course increment - vote totals, audit
logs, etc.  But most of it is by law supposed to be "frozen" - it's
really bad mojo for example to be swapping candidate IDs around in
mid-stream.

It's childishly easy to do so of course!

During all this, each day's work is saved to a newly-named file.
Sometimes if there's heavy processing there will be more than one save
per day.  In the RTA '06 election, fr'instance, there are about 40
files all told, backing up each step in the election.

The purpose of the software tool we've designed is to check across all
versions of the data in an automated fashion, looking for trouble.

Design specs to follow.  And yes, this is a real paying gig - the
resulting tool will be open-source and available for anybody
nationally (or globally) who wants to check Diebold databases.
Command line usage is good enough for now.  Bidders should list some
of their credentials :).  We're assuming this is a couple grand
minimum, payable via the Pima County Democratic Party...

As a separate thing, comments on the design specs are welcome too.
And yeah, the "title" is just a working gag for now...

Jim March

---------

APPENDIX A: Program Design Specifications for the "Diebold Automated
Manipulation Notary" Tool (DAMN Tool).

PURPOSE AND SCOPE

The purpose is to provide an automated integrity checker program
("tool") for election databases produced with the Diebold GEMS central
tabulator software.  An initial version of the tool can be
command-line driven with the user inputs as command line parameters,
or the whole thing can be menu driven in a graphical user interface.
It should operate in any common language that can read from an Access
database.  It must be completely open source, and its own data and
reports must be readable in free tools such as the OpenOffice database
and spreadsheet or an open-source equivalent.  It should not be
necessary to own a paid-up copy of MS-Access to use the tool.
Microsoft provides a free Access database viewer for Windows, and
equivalents exist that run under Linux.
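
As one hedged illustration of the "no MS-Access required" idea: the
open-source mdbtools package ships command-line utilities such as
mdb-tables, which a small Python wrapper can drive (the filename below
is made up):

    # List the tables in a GEMS .MDB via the open-source mdbtools
    # package; "-1" prints one table name per line.
    import subprocess

    def list_tables(mdb_path):
        out = subprocess.run(["mdb-tables", "-1", mdb_path],
                             capture_output=True, text=True, check=True)
        return [t for t in out.stdout.splitlines() if t]

    print(list_tables("Early day one.mdb"))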

Running under Windows is (sigh) preferred but cross-platform
compatibility would be very welcome.  If we have to use Linux, maybe
building a LiveCD for this purpose would work?

The tool will take all the versions of the data for a given election
from the initial L&A test to the last and perform a series of
automated integrity checks against them.  It will create its own
database, write out its findings to that database, and create a
series of reports based on them.

The tool's purpose isn't to "detect fraud", but rather to detect
places in the data warranting further "human eyeball study".  We do
not think there is an alternative to intelligent human analysis;
rather, the tool is about sorting the "wheat from the chaff" and
assisting an informed human in looking for potential manipulation or
corruption in Diebold GEMS data.

The core purpose of the tool is to make sure that elements of the
databases that are legally not supposed to change through the election
cycle (after the initial Logic & Accuracy "L&A" test) do not in fact
change.  It does so through internal timestamp analysis and
comparisons of table data.  It starts out "knowing" (through user
input) which tables are known to increment in a properly run election
– the audit logs and vote totals for starters, and possibly more we'll
learn as we go – and then looks for changes in what's not supposed to
change according to the law.

Finally, it will take the vote totals tables (the two main ones at a
minimum, the third if possible) and compare them – alerting on any
data iteration that doesn't match.

In some cases we anticipate the tool's reports will be created by
local election integrity activists and then emailed to experts
elsewhere to get a "first impression" as to the data's integrity.

RUNTIME CONDITIONS

1) Any .GBF files from a given election cycle will be converted to
.MDB prior to the execution of the tool by the users - the tool need
only cope with .MDBs.  (Taking apart GBF files would be a "version 2"
feature – since .GBF is Diebold's variant of the old Pkware .ZIP
compression system it should be possible with some additional R&D.)
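
Purely speculative, for that version-2 idea: if .GBF really is a
PKZIP derivative, Python's standard zipfile module might open the
less-mangled ones.  Untested assumption:

    # Speculative probe - only works if the .GBF variant keeps the
    # standard ZIP signature and record layout.
    import zipfile

    def try_unpack_gbf(gbf_path, dest_dir):
        if not zipfile.is_zipfile(gbf_path):
            raise ValueError("not a standard ZIP - needs more R&D")
        with zipfile.ZipFile(gbf_path) as zf:
            zf.extractall(dest_dir)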

2) All of the files for a given election cycle will be loaded into a
given directory ("folder") prior to execution, with nothing else in
that directory.

THE TOOL'S LOGIC

a) First, the tool will be designed so as NOT to alter existing
databases.  It will create its own database to hold the data it
collects and to generate reports from.
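
A minimal sketch of that, assuming SQLite for the tool's own database
(an assumption on my part - any format free tools can read would do):

    # Create the tool's own findings database; SQLite files open in
    # plenty of free viewers.
    import sqlite3

    con = sqlite3.connect("damn_findings.db")
    con.execute("""CREATE TABLE IF NOT EXISTS findings (
                       mdb_file TEXT,   -- which GEMS file
                       tbl      TEXT,   -- which table inside it
                       detail   TEXT)""")
    con.commit()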

b) The tool will accept the target directory of the data as its
initial input (command line or menu driven).

c) The tool will accept the names of tables that are "known to grow"
(audit logs, vote totals, others?) as its second input, command line
or menu driven.  If we learn through running the tool that other
tables also increment through the Diebold process as normal procedure,
we can start over and add the names of those tables to the tool's
input so as to reduce the number of "false positive hits" later.  (In
other words, as we use the tool and analyze the databases with it, we
should learn enough to improve the automated process if need be.)

d) The tool will also accept as user input the known date and time of
the initial L&A test.
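
A command-line sketch covering inputs (b) through (d), assuming
Python's argparse; every flag name below is a placeholder:

    import argparse
    from datetime import datetime

    parser = argparse.ArgumentParser(description="DAMN Tool (sketch)")
    parser.add_argument("data_dir",
                        help="directory holding the election's .MDB files")
    parser.add_argument("--grows", action="append", default=[],
                        help="a table known to grow (repeatable)")
    parser.add_argument("--la-time", required=True,
                        help="initial L&A test time, e.g. 2006-05-01T09:00")
    args = parser.parse_args()
    la_time = datetime.fromisoformat(args.la_time)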

e) The first piece of data recorded from the GEMS databases will be
their filenames and timestamps for file creation, modification and
"last accessed".

f) The second piece of data the tool will extract is the names of
every table.  These will be recorded to the tool's database, and then
each iteration of the GEMS data will be probed to make sure it has all
of those tables and no others – in other words, that all table names
match.
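
A sketch of that table-name match, reusing the mdb-tables wrapper
from the earlier example:

    import subprocess

    def list_tables(mdb):  # same helper as in the earlier sketch
        out = subprocess.run(["mdb-tables", "-1", mdb],
                             capture_output=True, text=True, check=True)
        return [t for t in out.stdout.splitlines() if t]

    def check_table_names(mdb_files):
        # compare every later iteration's table set to the first
        baseline = set(list_tables(mdb_files[0]))
        for mdb in mdb_files[1:]:
            current = set(list_tables(mdb))
            if current != baseline:
                print(mdb, "missing:", sorted(baseline - current),
                           "extra:", sorted(current - baseline))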

g) The third piece of data extracted from the GEMS databases will be
the timestamps for each table in each database.
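
One possible route to those stamps (an assumption needing
verification): Access keeps per-object DateCreate/DateUpdate columns
in its MSysObjects system catalog, which mdb-export may be able to
dump:

    import csv, io, subprocess

    DATEFMT = "%Y-%m-%d %H:%M:%S"   # force a known date format

    def table_timestamps(mdb_path):
        # MSysObjects is Access's system catalog; whether a given
        # mdbtools build will export it needs checking.
        out = subprocess.run(
            ["mdb-export", "-D", DATEFMT, mdb_path, "MSysObjects"],
            capture_output=True, text=True, check=True)
        stamps = {}
        for row in csv.DictReader(io.StringIO(out.stdout)):
            stamps[row["Name"]] = (row.get("DateCreate"),
                                   row.get("DateUpdate"))
        return stamps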

h) At this point, a report can be run listing the data files in their
apparent order per their time and date stamps.  A human can look at
these and make sure they correspond to the filenames ("Early day one",
"early day two" and the like).  If things are out of order, that may
be a clue that poor data management practices (at a minimum) are going
on and (possibly) votehacking.

i) Next, the user can ask for a "table timestamp sanity check" - the
tool takes the names of tables suspected of being frozen (the "known
to change" tables are already in its data and can be excluded) and
reports the time and date stamps of any table in any database that
appear to be after the initial L&A time and date.  Any positive
reports (listed by database name and table) should be checked with a
human eyeball: is this a table that should be frozen post-L&A, and if
so, do the time and date stamps indicate improper tampering?  This is
a reporting function separate from any other report.
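
A sketch of that check, building on the table_timestamps() helper and
DATEFMT from item (g)'s sketch:

    from datetime import datetime

    def timestamp_sanity_check(mdb_files, grows, la_time):
        # la_time is a datetime built from the user's input;
        # grows is the user's "known to grow" table list
        for mdb in mdb_files:
            for name, (_, updated) in table_timestamps(mdb).items():
                if name in grows or not updated:
                    continue
                if datetime.strptime(updated, DATEFMT) > la_time:
                    print("%s: %s updated %s - after L&A"
                          % (mdb, name, updated))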

j) The user now tells the tool to compare all tables suspected of
being the frozen type (in other words, all tables except those the
tool has been told normally change) across time, reporting where
changes occur by filename, table and line number.  This will ID
possible tampering in the "frozen tables".  (If a given table has a
large number of changes, a human should analyze it to determine
whether that's proper GEMS behavior and, if so, add it to the list of
"don't alert" tables.  Once we learn more through the use of the tool
about how GEMS works, the tables excluded from this test can be
hard-coded into the tool.)
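
A line-by-line comparison sketch for one suspect table across all
file versions (treating a CSV record as the "line number", assuming
no embedded newlines):

    import subprocess

    def export_rows(mdb_path, table):
        out = subprocess.run(["mdb-export", mdb_path, table],
                             capture_output=True, text=True, check=True)
        return out.stdout.splitlines()

    def compare_frozen(mdb_files, table):
        baseline = export_rows(mdb_files[0], table)
        for mdb in mdb_files[1:]:
            rows = export_rows(mdb, table)
            for n, (old, new) in enumerate(zip(baseline, rows), 1):
                if old != new:
                    print("%s: %s line %d changed" % (mdb, table, n))
            if len(rows) != len(baseline):
                print("%s: %s row count %d vs %d"
                      % (mdb, table, len(rows), len(baseline)))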

k) Finally, the tool should be pre-programmed with the tables known
to contain vote totals (already understood by the community of people
studying these things) and, for every database iteration, report
whether or not the vote totals match.  Where they don't match, list
the table names and line numbers of each discrepancy for human
review.  (If they don't match at one point in time but do match
later, a human needs to look at it and ask if the database somehow
"broke" and was manually corrected later.  It would also be useful to
make sure any human corrections were accurate, and recorded properly
in the audit log.  If the audit log doesn't show corrections, they
were made illicitly outside of the normal GEMS process - in MS-Access
or equivalent.)
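
A sketch of the per-iteration totals cross-check; every table and
column name below is a placeholder to be filled in from the
community's existing GEMS schema notes:

    import csv, io, subprocess
    from collections import defaultdict

    def totals(mdb_path, table, id_col, count_col):
        # sum the count column per candidate ID
        out = subprocess.run(["mdb-export", mdb_path, table],
                             capture_output=True, text=True, check=True)
        sums = defaultdict(int)
        for row in csv.DictReader(io.StringIO(out.stdout)):
            sums[row[id_col]] += int(row[count_col])
        return sums

    def compare_totals(mdb_path):
        a = totals(mdb_path, "PrecinctTotalsTable", "CandID", "Votes")
        b = totals(mdb_path, "CountyTotalsTable", "CandID", "Votes")
        for cand in sorted(set(a) | set(b)):
            if a.get(cand, 0) != b.get(cand, 0):
                print("%s: candidate %s: %d vs %d"
                      % (mdb_path, cand, a.get(cand, 0), b.get(cand, 0)))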

Note that storing the votes three times violates some very basic data
management practices on Diebold's part and is suspected of being a
deliberate "hack friendly feature".  We know what two of the three
tables do: one reports vote totals if you ask the Diebold software for
a given precinct's vote summary while another is polled for
county-wide totals.  Let's say an honest elections official/staffer
"smells a rat" - he hand-counts the votes for a precinct as a
spot-check and compares it to that precinct's report.  It can come up
good (over and over again) while the county-wide totals are in fact
hacked.  The product manager for this stuff at Global Election Systems
(pre-Diebold) had a prior felony conviction for embezzling over
$425,000, in which he used a rigged accounting computer...kinda makes
us wonder...?



