[Tfug] Automated security checks

Bexley Hall bexley401 at yahoo.com
Sun Mar 2 14:31:26 MST 2014


Hi John,

[Mmmm.... Sunday lunch.  Finestkind!]

On 3/2/2014 1:50 AM, John Gruenenfelder wrote:
> On Sat, Feb 01, 2014 at 12:30:22PM -0700, Bexley Hall wrote:
>>> Recently, while digging around some of the new packages finding their way into
>>> the Debian archive, I came across a handful of harden-* packages.  These
>>> are metapackages designed to offer suggestions for *helping* you to secure a
>>> system.
>>
>> I think you need to first define the type of system you are trying to
>> secure (i.e., how it *hopes* to be used and how you *fear* it can be
>> ABused).  E.g., a box that (just) accumulates PR's has a different
>> attack surface than one that has to be a "general timesharing system".
>
> Okay, another email thread I neglected for far too long before replying.  :)

You should make a list of stuff that you need to do.  Then,
maybe get some software to *manage* this list of things to do...

<fiendish grin>

> What I was intending by "automated" was a utility that requires
> not-too-extensive configuration while still providing some useful feedback.
> This would most likely mean a distribution provided package as the maintainer
> would have done a lot of the leg work to make it work semi-seamlessly on
> target distro X.

I don't see that as a *practical* solution (YMMV).  It expects the
"packager" to:
- be pretty competent (enough to recognize what messages can *imply*
   from a security context)
- be pretty thorough (to examine all the *potential* messages that
   can be generated)
- have an identically configured system to yours
And, it means your system has to be pretty stagnant (or, the packager
needs to be actively tracking changes to it!)

> In my case, I have several machines configured in a number of different ways
> so, for the Net accessible ones anyway, I don't want a package to require
> hours of configuration work to get it to function properly without multitudes
> of false positives.

Understood.

I take a very different approach to security on my work machines -- I
just don't let them talk to the outside world!  :>  (and, thus, prevent
the outside world from talking to *them*!)

> In particular, I have my "closet server" which nominally runs Debian/testing.
> My main work/development machine (I'm the admin, but it has a small number of
> other users) also nominally runs Debian/testing.  Both of these machines
> occasionally pull from Debian/unstable in case the machine, myself, or a user,
> needs some software that's too new for Debian/stable/testing.  Finally, I have
> another, seldom-used machine (with few tasks other than sitting on the Net and
> serving the random file) that runs Debian/stable.  If you know much about the
> Debian archive, you can see that each machine can be (and likely is) organized
> rather differently or,
> at least, contains vastly different software versions.

So, there is no guarantee that the messages generated by *today's*
mix of software will be the same as those created by *tomorrow's*
mix.  The phrasing of a message can change, the conditions under
which it is generated can change, new messages can be introduced,
etc.

Similarly, the messages that one box experiences IN ONE TYPE OF
USAGE PATTERN may differ from those that another box generates
in a DIFFERENT usage pattern.

> I guess, in a nutshell, I'm looking for a tool that will periodically scan my
> system for questionable security issues *and* make sure I haven't made some
> stupid mistake that could have a real-world security implication.  The tool
> "rkhunter" fits this bill fairly well, I think.  See below for my impressions
> on it.

You are looking for something *proactive* instead of *reactive* (which
was the approach I was suggesting -- let the box tell you what it
encounters and *you* advise it based on those reports)

>>> After a little more checking, I found that logcheck is actually maintained by
>>> the "Debian logcheck Team" and that there is some effort underway to resurrect
>>> and update the package and database.  The webpage they have set up is rather
>>> sparse and I didn't see any information about when they might put out a new
>>> useful version.

I think the problem with anything that is "database/dataset driven"
is that it will inherently be the proverbial dog's tail -- it will
always lag the releases of the software that it monitors, as its
maintainers will not know what to add to the dataset until they
get a chance to see and use the code in question.

My suggestion inherently reflected this -- except it expected *you*
to review the generated messages and instruct it as to which to
ignore (presumably, you will *act* on those that should not be
ignored!)

[I know it's not ideal -- none of the "third party" approaches will be!]

What you really need is a unified approach *across* applications, etc.

E.g., I've taken this approach with my (personal) distribution wrt
configuration issues.  Instead of a hodge-podge of different
configuration files (each with different syntax rules), command line
switches, etc., I've been methodically rewriting every service to
query a centralized database in which the configuration data is
preserved.

This lets me configure *everything* in the system in one place
(what Windows' "registry" tried to do -- poorly and incompletely).
This lets other agents tweak that configuration without having to
be aware of specific syntax requirements.

And, it lets services interact with each other without having to
know the details of their individual protocols.

For example, the ARP/DNS caches are just tables in that dataset.
An application doesn't have to know those individual, DIFFERENT
protocols to gain access to that information.  Instead, it can
just issue a query against the appropriate table to get the
information that it needs (while the appropriate agent/service
is simultaneously *maintaining* that data)
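
To make that concrete, here's a rough sketch of the consumer side (the
database path, table, and column names are invented for illustration --
not what my system actually uses):

    import sqlite3, time

    # Hypothetical schema, maintained by the resolver agent:
    #   dns_cache(name TEXT, address TEXT, expires INTEGER)
    # Consumers never speak the DNS protocol; they just query the table.
    db = sqlite3.connect("/var/db/system.db")

    def lookup(hostname):
        row = db.execute(
            "SELECT address FROM dns_cache WHERE name = ? AND expires > ?",
            (hostname, int(time.time()))).fetchone()
        return row[0] if row else None

    print(lookup("printer.local"))   # an address if cached, else None

The ARP table, the list of exported services, etc. all look the same
way to a client -- one query syntax, one access-control mechanism.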

You'd need a similarly consistent approach to logfile entry generation
that you could "templatize" in order to be able to create simple
recognizers without having to understand the mindset of the various
maintainers of each service over its lifetime.
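
E.g., if every service emitted its log entries through something like
the following (the field names are mine, purely illustrative), a
recognizer only ever has to key off stable identifiers -- never the
free-form prose that a maintainer might reword next release:

    import json, time

    # Hypothetical structured-logging helper shared by all services.
    def log_event(facility, event, severity, **details):
        print(json.dumps({
            "ts": time.time(),
            "facility": facility,   # e.g. "sshd"
            "event": event,         # stable identifier, e.g. "auth.failure"
            "severity": severity,   # "info" | "warn" | "alert"
            "details": details,     # free-form; filters never match on this
        }))

    log_event("sshd", "auth.failure", "warn", user="johng", source="10.0.0.17")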

>> You can probably hack together a comparable system using something like
>> spamassassin (or any other trainable SPAM system) and a bit of your time
>> "feeding it" good/bad log file entries.  This seems more effective
>> (in the long term) than trying to analyze *all* potential log sources
>> and building regex's for each type of message (and, having to repeat the
>> exercise each time you change/upgrade some bit of software)
>>
>> You could conceivably use such a system to process reports from other
>> "security" packages.  Sort of an "intelligent agent".
>
> I can see this potentially working, but I think the needed training set would
> be far larger than the logs I have available on these systems.  Unlike mass
> spam, log entries are just going to be too different, even for valid entries:
> timestamps, hostnames, path names, differing config entries, etc.

I wasn't expecting you to train the system in one shot.  Rather,
let it spit out messages that -- using its current notion of
"what can be ignored" -- it deems important.  Then, you *respond*
to those messages in a manner that the tool can observe and use
to adjust its acceptance criteria.  So, the next time a "similar"
(whatever that means) message is emitted, the tool knows how to
route/handle it.
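
A crude sketch of the sort of thing I have in mind (the normalization
rules below are just examples; a real filter would have more of them,
or learn them):

    import re

    # Strip the bits that vary per host/run so structurally identical
    # messages collapse to the same key.
    def normalize(line):
        line = re.sub(r'^\w{3}\s+\d+ [\d:]{8} \S+ ', '', line)     # syslog date/time/host
        line = re.sub(r'\b\d{1,3}(\.\d{1,3}){3}\b', '<IP>', line)  # IPv4 addresses
        line = re.sub(r'/[^\s]+', '<PATH>', line)                  # file paths
        line = re.sub(r'\b\d+\b', '<N>', line)                     # pids, ports, counts
        return line

    ignorable = set()    # grows as *you* respond "this one is noise"

    def triage(line):
        key = normalize(line)
        if key in ignorable:
            return None            # learned: don't bother the human
        return key                 # surface it; your response updates 'ignorable'

The point isn't the particular rules -- it's that "similar" gets defined
by *your* responses over time, not by someone else's regex library.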

I started work on a Tbird plugin for USENET that would essentially
act as an "NNTP spam filter" -- where the individual user decided
what was "spam" based on which messages (posts) he opted to read
and how he reacted to those, once read.

For example, much of USENET has degenerated into "old farts" who
don't have the balls to go into a bar and engage others in their
rants -- for fear of getting a mouth full of fist!  So, I would
want that filter to read the contents of my portion of the
news feed (this is atypical for NNTP where you normally only
see the contents of messages you explicitly ask to read!) and
*hide* those posts that contain content that I don't want to
be bothered with.

E.g., it's easy to recognize profanity, political rants, etc.
just with short keyword lists.  Then, *learn* which "posters"
are most likely to engage in those behaviors and use that to
further bias the decision *against* showing their posts.  At
the same time, notice which posts I opted to read *and*
reply to, and bias the decision-making process to *favor*
those posts.
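
In rough terms (invented weights and stand-in word lists -- the real
plugin never got this far), the scoring I had in mind was something like:

    from dataclasses import dataclass

    @dataclass
    class PosterHistory:
        killed_ratio: float     # fraction of their posts I skipped/killed
        replied_ratio: float    # fraction of their posts I replied to

    BAD_WORDS = {"idiot", "sheeple"}   # stand-ins for the hand-built keyword lists

    def score(body, history):
        hits = len(set(body.lower().split()) & BAD_WORDS)
        s = 2.0 * hits                      # content looks like a rant
        s += 1.5 * history.killed_ratio     # I usually skip this poster...
        s -= 2.0 * history.replied_ratio    # ...unless I tend to engage with them
        return s                            # above some threshold => hide the post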

As I didn't want to have to dick with the Tbird sources (I read
mail/news on a windows machine and have no desire to engage in
any development under windows!), I just hacked together an
interface whereby I would "reply" to messages in different ways
based on how I wanted to train the recognizer for that "type"
of message content.

The box that was acting as my "USENET agent" (slurp(1)-ing the feed
into an NNTP server that it presented to me) would then parse my
"replies" to decide what to pass to the filtering agent (that it
implemented) and what to pass back to the *real* NNTP service.

I would think something similar could work for log files -- "reply"
to messages in different ways based on whether you think they should
be ignorable in the future *or* "significant".  Then, let the agent
that you "replied to" use that information to tune the filter.

The advantage this has is that it lets you *see* the messages -- which
will be different based on your different machines, software
configurations and *usage* patterns -- and simultaneously tweak
the rules as appropriate for *that* machine/software/usage.  Any
software/configuration changes could cause it to just send you
*advisories*:  "Based on what I had previously learned, I think
I can ignore this message, below.  If you think otherwise, please
tell me!"

> Despite the workload, I do think the extensive regex method is probably
> better, though it has the big disadvantages of requiring a lot of assistance
> from *other* package maintainers along with the need to be frequently
> monitored/updated.
>
> At the very least, this is a lot more work than I'm looking to put into
> this (presently, anyway).

Exactly.  This is the same argument that is used AGAINST commenting
code -- comments mean additional *maintenance* (unmaintained comments
are worse than MISSING comments!)

I try to come up with solutions that fit an existing workflow/pattern
and try to "learn" from it instead of putting a big pile of work in
one place -- esp if that will end up needing to be repeated RSN!  :<

>>> On a somewhat related note, I've been using "denyhosts" for quite some time on
>>> a few different systems.  Denyhosts is an answer to the problem of idiot
>>> crackers trying to get into your system by flooding your box with countless
>>> SSH connection attempts.  They are not trying to exploit flaws in SSH
>>> implementations, rather they appear to have some database of common account
>>> IDs and common poor passwords and operate on the idea that eventually one of
>>> them will work somewhere on the Net.  Denyhosts is a fairly simple
>>> countermeasure which monitors your login attempt logfile and keeps track of
>>> the number of failed attempts from each IP address.  It then adds entries to
>>> the hosts.deny file to keep these people away.
>>
>> Here there be dragons.  Someone can potentially trick a system like
>> this into denying *you* access (by masquerading as your machine and
>> trying to do things that it *knows* you wouldn't like being done;
>> once these things are detected, a rule is dutifully added that locks
>> *you* out!)
>
> I'm not sure I follow.  The tool tracks IP addresses and it blocks users for
> either a) too many failed login attempts, or b) too many connection attempts
> in too short a time.  How might somebody use this against a valid user?  If
> they knew of a username that was valid on the target host they could
> conceivably block access to that user FROM that IP, but how could they deny
> access for that user who will likely be connecting from a different IP?

*Does* the user have access to another IP?  E.g., if I am accessing your
host from my house, I have to change the IP at my house (meaning a call
to my service provider and *hoping* to convince them that my need is
genuine and not just "more work for them") in order to log in.

Imagine you are a business.  I work there.  I *know* "johng" is a valid
user name -- because I send you intraoffice email all the time and
notice that *my* email address happens to coincide with *my* login
name (not unusual).

You piss me off.  Or, I just want to "mess with you".  I walk up to
every workstation in the department (come in early, stay late, at
lunch, etc.) and casually log in as "johng" with the password
"invalid".  Try it three or four times.  Then, walk away.

Chances are, "bobm" (the guy whose workstation I just used) will
*never* need to log in as "johng".  So, he will never be aware that
"johng" is not a valid login from *his* workstation.  Similarly,
*you* will never need to login at bob's workstation so *you*
won't know that you have been denied access from there.

[i.e., you have no advance knowledge that I am undertaking this
attack]

Some time (days, weeks?) later, I have done this from every workstation
*except* yours.  Now, I repeat the exercise there.  Shortly thereafter,
you try to login and can't.  Of course, you don't understand *why*
you are being denied access.

For the hell of it, you try to log in using the workstation of your
office mate -- or the guy across the hall, etc.

And, are denied access from there, as well!

[If you were "surprised" at your initial inability to login from
*your* workstation, by this time, you are probably starting to
wonder if "corporate" has deliberately locked you out -- as is
common just before *firing* someone!  :> ]

Keep in mind the other practical aspects of "actual" network
setups.  In most (all?) cases, your IP is defined by something
*in* your workstation (static assignment and/or dynamic assignment
based on MAC).  It is *not* (usually) defined by a physical location
or a specific "network cable".

So, I could, theoretically, carry out this entire attack from
*my* office just by tweaking the IP/MAC in my "attack machine"
to *masquerade* as each of these other workstations.  I could
methodically harvest all the information that I need ahead of
time and just replay an "automated attack" during a lunch break!

[Sure, this could be "fixed" by visiting the IT staff and
convincing them to look into it, etc.  But, I've now made
*you* more visible -- not me -- in the context of this
"problem"]

> In my
> experience, Denyhosts did its job very well and I only ever had to manually
> purge one entry from the DB when a valid user accidentally failed too many
> times (caps lock key mistake, I think).
>
>>> A new package to appear in the archive is "libpam-shield".  Its description is
>>> quite short, but it does indicate that it is used to lock out remote attackers
>>> trying to brute-force their way in with password guessing and does so using
>>> iptables.  Since it is a PAM module it won't cover as many services as
>>> fail2ban appears to.  I have not yet enabled it, but having looked at its
>>> config file it hits all the important points such as a persistent database of
>>> IPs and automatic timed expiration of entries.  It also supports blocking by
>>> null-routing, using iptables, or by using iptables via the ufw firewall
>>> package.  Has anybody used libpam-shield before?
>
> I eventually decided to go with libpam-shield.  So far, I found it to be
> extremely easy to configure and easy to use since PAM config on Debian is so
> easy.  By occasionally monitoring the logfiles, it appears to be operating as
> it should.
>
> The documentation could be a bit better, though.

FOSS == "the documentation could be a bit better"  :>

(and "a bit" is being exceedingly gracious!)

> For example, as I mentioned
> above, Denyhosts would block when, from a given IP, a valid user account login
> failed too many times *or* if that IP made too many connections in a given
> time span (say, 10 connections in five minutes).  From the logs, I know that
> pam-shield will block based on the second type of attack, but I don't know if
> it also blocks based on the first.

"Use the source, Luke."

> At any rate, I'm happy enough with its performance.  Also, this type of attack
> seems to have fallen out of favor in recent times.  When I first installed
> denyhosts, the machine was literally being hammered all the time by script
> kiddies trying this stupid (and unlikely to ever actually work) method.
> Today, it doesn't seem to happen nearly as often.

Have you been able to gauge what (if any) attacks are being deployed
against you?  I.e., do you even know what the attack surface *is*
that you need to protect/shrink?

> During my investigations, I tried a number of other tools.  In particular, I
> gave the intrusion detection system (IDS) "Samhain" a try.  Based on its
> apt-provided description, I liked its supported features.  Unfortunately, as I
> mentioned above about how my machines are configured, the installed software
> simply changes *far* too often for an IDS to work effectively.  For any IDS to
> work properly, the software, and therefore the computed hash signatures of
> that software, simply can't change too often; otherwise the tool will
> constantly detect changed binaries and complain.

Yup.  And, it doesn't do anything to protect against "live exploits".
I.e., (mis)using a tool in its *original* state to "do evil".

> Samhain *does* have the really nice feature of natively supporting the usage
> of "prelink" which, if you don't know, pre-links binaries and some libraries
> so that there is less work for the linker to do when you initially run a
> program.  Of course, this alters the binary and the hash computed over its
> contents.  Since I do have one machine running Debian/stable and its software
> rarely changes, I may still try using Samhain on that machine and see how it
> goes.
>
> Finally, there is my old standby security tool "rkhunter" that I have been
> using for quite some time.  As its name implies, its primary focus is to scan
> your system for rootkit installations by looking for their telltale
> signatures.  That's good.  It also performs a number of other useful security
> checks for you.  That's also very good, and, in fact, it's the real reason I
> use rkhunter.
>
> So, along with the RK scanning, rkhunter also keeps a properties DB on a small
> handful of files (far smaller than a true IDS) so that it can detect changes
> in them: things like user/owner changes, suid bits, etc.  It can also check
> the hashes of these files, if configured, and also has support for systems
> using prelink.  I'm not using it, but rkhunter can also work with certain
> package managers to, like an IDS, verify that a file's hash matches what the
> package manager thinks it should.  rkhunter is also nice enough to have an
> aptitude hook so that its properties database is updated whenever you finish
> an install/uninstall action with aptitude.

Sounds like it at least *tries* to accommodate reality -- instead of
forcing reality to adapt to *it*!
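
The guts of that sort of properties check are simple enough to roll
yourself if rkhunter ever stops fitting -- a sketch (my own, not
rkhunter's code; the watched list is arbitrary):

    import hashlib, os, stat

    WATCHED = ["/bin/su", "/usr/bin/passwd", "/usr/sbin/sshd"]

    def properties(path):
        st = os.stat(path)
        digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
        return {"uid": st.st_uid, "gid": st.st_gid,
                "mode": oct(st.st_mode),
                "setuid": bool(st.st_mode & stat.S_ISUID),
                "sha256": digest}

    # baseline = {p: properties(p) for p in WATCHED}   # resaved after (un)installs
    def check(baseline):
        for p in WATCHED:
            if properties(p) != baseline.get(p):
                print("CHANGED:", p)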

> Don, thanks for the DDOS documents.  They were interesting, especially the
> DDOS taxonomy list.  I'm glad I don't have to deal with this sort of thing...
> :)

IMO, that is the *real* threat that you (at least, *I*) have to sort out.
AFAICT, there is really no way to guard against it (short of having
a multiplicity of "hidden" interfaces so you can shut down one that
is under attack).

In my case, I can easily recognize attacks as I *know* the traffic
that I expect on each interface at all times.  If I even *see* an
unexpected IP, MAC, port, protocol, etc. on a particular i/f at a
particular time, it is either an attack or something *broken*.
In either case, I can shut down *that* (and ONLY that) i/f and
"shed" the service(s) associated with it (or, the device that
I *assumed* was on the other end).

E.g., "Why is that IP camera (at least, that's what the IP/MAC
is registered as!) trying to connect to port 23?"
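
The "detector" for that is little more than a lookup against what
*should* be there (addresses and ports below are placeholders):

    # Per-interface whitelist of (MAC, IP) -> destination ports they may use.
    EXPECTED = {
        "eth2": {("00:1a:2b:3c:4d:5e", "192.168.7.10"): {80, 554}},   # the "IP camera"
    }

    def shed(iface, why):
        print("downing", iface, "--", why)   # stand-in for actually downing the i/f

    def vet(iface, mac, ip, dst_port):
        allowed = EXPECTED.get(iface, {}).get((mac, ip))
        if allowed is None or dst_port not in allowed:
            shed(iface, f"unexpected {mac}/{ip} -> port {dst_port}")

    vet("eth2", "00:1a:2b:3c:4d:5e", "192.168.7.10", 23)   # "camera" speaking telnet?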



