[Tfug] Let's play "ID this code"! (serious issue actually)

Bexley Hall bexley401 at yahoo.com
Mon Aug 24 07:36:15 MST 2009


> Let's look at where I'm at with this data:
> 
> 1) It came as part of a public records request, so it's 100% legal to
> do anything we want with it.  IF the voting system vendor were to for
> some reason choke on what we've found and went for a court injunction
> trying to "pull it back", they would fail miserably. 
> Copies are now in several states; if they went forward in state court
> somewhere, we'd just examine it in another.  If they tried to get the
> Feds involved...that *might* work, maybe.  Except for point two:
> 
> 2) The county we got it from (in California, I don't want to say which
> one) claims they stripped voting system vendor proprietary crap from
> it, specifically the "schema" which they (jointly, county and voting
> system vendor) claim was vendor trade secrets or other intellectual
> property.  We have that in writing from the county government.  BUT:
> 
> 3) Peering into the raw data with a good text editor I've been able to
> confirm the existence of thousands of lines of MS-SQL source code such
> as the snippets I showed y'all.  While SQL isn't normally my thing,
> I've seen it before and it's not all that hard to understand.  From
> what I can gather so far, and this needs to be confirmed by people
> better at this than me, there is a ton of "logic" here regarding how
> votes are processed.  Which leads to...

From the snippoet you showed, there was nothing that dealt with
"votes".  Rather, it all looked like prep work to order data
for presentation (possibly to the voter at the time the
ballot is cast?  possibly in a "report" shown to polling
officials?  <shrug>  don't know since this isn't my application
domain)
 
> 4) There are RULES about voting systems.  Two of them are "all code
> used needs to be checked out by test labs and then hash-checked to
> make sure the code in the field is the approved crap" (paraphrase
> obviously).  This mess doesn't just violate that rule, it stomps all
> over it.  Because data and program code are mixed into the same file,
> hash-checking is impossible.  It violates two more rules: "no
> machine-modified code" (these databases with the program code embedded
> are created by one of two apps from this vendor) and "no interpreted
> code" - the latter mainly because it's "field modifiable" which is
> supposed to be banned too.

*All* code is "interpreted" by something.  Even "machine level"
code is interpreted by the hardware "machine".  There is nothing
that inherently makes "interpreted code" (in the sense you
intend it here) more vulnerable to tampering than "compiled code"
(again, playing fast and loose with terminology here)

If I write a piece of compiled code that builds a data structure
in memory AT RUN TIME and that data structure governs how some
other piece of code (soon to run) executes, is that interpreted
code?  What if my code goes a step further and builds a piece
of executable (machine language) code and then "jumps to it"?

Clearly, my compiled code could build *different* data structures
or generate different snippets of machine language code -- each
which affects the overall operation of the "program" -- just like
a simple inline *conditional* might!
    if (voter == DEMOCRAT)
       votes_cast[DEMOCRAT]--;
    else
       votes_cast[REPUBLICAN]++;

Even if the code is not (directly) modifiable, it could modify
data structures that *drive* the algorithms.

Naive/simplistic rules like "no interpreted code" do little to
influence the "security" of the code itself.  Just like saying
"no steroids in sports" does little to keep things that BEHAVE
like steroids (but are untestable as such) out of the sport.

> So where does that leave us?  What I want out of the
> data is...
> 
> a) I want to know what the reported vote totals are for this county,
> by precinct, so I can compare it to official numbers. To do this, I
> need readable tables.
> 
> b) I want to do at least a preliminary analysis of the source code, to
> see if it leaves security holes in place and/or whether or not the
> logic present could flip an election if tweaked.  Either (esp. the
> latter) would bolster the case that this junk is illegal top to bottom
> and yet another failure of the oversight/testing process that's failed
> so many times before.  This can actually be done to some degree just
> with the level of access to the files I have now, the ability to page
> through them with a (good) text editor.  (So far "NEdit" dated 2004,
> present in the Ubuntu Karmic64 repos has been the best available...the
> files are filled with null characters and other such problems and
> range in size from 600megs to a gig.  Opening them pushed my poor
> budget 2gig lappy to the max.)
> 
> If ALL we end up with is even a partial source code/security analysis,
> we'll have pushed the ball forward.

The *first* thing YOU (having knowledge of the application domain)
have to do is go through and identify every data type used (in case
there are any "user defined data types) along with every "datum"
(field/column/attribute/whatever-you-want-to-call-it).  So you
know *what* is in the database.  E.g., what is a "contest" (in
the mind of the database's *designer*!)?  Presumably this tells 
you something about what a ContestID is (no doubt a "unique
identifier" assigned -- perhaps automatically by the RDBMS -- to
each specific "Contest"... which are probably tabulated in a
"CONTESTS" table someplace).

Once you know what all the data are and their types, you need to
figure out what each *table* is likely to be used for.  This gives
some meaning to the columns that comprise that table and the
entries/records/rows/whatever within it.

E.g., the CONTESTS table (apologies to SQL-ers for my misuse of
case conventions) might list something like:

ID  STATE  COUNTY   CITY     RACE
1   AZ     Pima     Tucson   Ward 6 Councilperson
2   AZ     Pima     Tucson   Sheriff
3   AZ     Maricopa Feenigs  Mayor
...

This lets you "guess" that the CONTESTS simply enumerate all of the
"races" that are covered in this particular "election".

Somewhere else, there might be a CONTESTANTS table:

ID  FIRST   LAST       PARTY         CONTESTID
85  Joe     Shmoe      Republican    2
...
22  Leroy   Brown      Republican    3
...
15  Joe     Shmoe      Democrat      4
...
96  Bob     Barker     Democrat      2
...

Which tells us that Joe Shmoe (R) is running against Bob
Barker (D) -- and possibly others -- for Sheriff of Tucson
(nonexistent job).  [NB, it also tells us that there are
two Joe Shmoes -- no doubt different individuals? -- running
in different races]

Yet another table might be VOTES:

ID    BALLOT    CONTESTID     VOTE
9345  10277     2             85

(I make no claim that this is a *good* way to do things or
represents how it is actually *done* in the tables!)

Point is, until you have identified all of the fields, you
can't begin to meaningfully put together the tables *or*
fully understand what the SQL is trying to do.


      




More information about the tfug mailing list