[Tfug] "Downgrading" ("underclocking?") processors

Fri Feb 21 04:31:15 MST 2014

Hi John,

>> Can I take a box off the shelf, write ANY SOFTWARE I WANT to run
>> on that BARE METAL and be assured that the machine will protect
>> itself *and* guarantee a specific level of performance (if so,
>> what EXACTLY is that?) regardless of temperature?
>
> If it is a modern Intel/AMD based system: yes it will protect itself. I
> believe that all of the protection happens at the BIOS level or below.

It can only do that by shedding its "responsibilities".  I.e.,
by not providing the intended functionality.

>> Hmmm... I think UofA auction was yesterday. So, I'll have to wait
>> two weeks before I'll have a chance to find something "disposable"
>> to experiment on. Or, maybe I'll try WC to see if they have a
>> couple of "scrap" machines that I can toast.
>>
>> See what happens when machine sits idle with no ventilation.
>> Running a full workload on bare metal.
>> Running a "modern OS" (Windows/Linux/*BSD) with the same full workload.
>> Same experiments with fans unplugged (system should be able to *sense*
>> this BEFORE it ever starts to heat up! What will it do to protect
>> itself?)
>>
>> Then, figure out what constraints this imposes on the choice of
>> components that can be stuffed *in* the case. It may be that
>> the server-side of this project is just not well suited to
>> an "open" solution. Maybe just let folks design their own
>> "motes" and "applets" and keep the server's design more "controlled".
>
> What is the level of guarantee needed? Who cares if you don't run the
> user applets and what not? (See more below.)

What aspects are important to the user?  How does he convey those
preferences to the system?  How does he know what portion (percentage)
of the system will be available to him?  And, how much each "aspect"
consumes?

(see what I mean about creating a *bigger* problem?  what if the user
is NOT technically competent?)

>>> Something would have to screw up pretty badly for the machine to melt or
>>> even damage itself. Generally you'd just see performance go down.
>>
>> So, what level does it fall to? Where do you "look up" that detail?
>> If you can only count on 80% of "normal", then why not set the CPU to
>> run at 80% of normal and design the entire system to operate under
>> those conditions -- because it *has* to guarantee that all the
>> intended work actually gets done!
>>
>> Or, do you come up with some scheme for prioritizing which activities
>> can be shed?
>>
>> "Hmmm... maybe I shouldn't worry about monitoring for intruders as it's
>> probably more important to ensure the temperature inside the building
>> stays comfortable for the pets/plants/etc? Or, maybe skip watering
>> the yard in the hope that it rains while I concentrate on watching
>> for burglars? Or, ..."
>
> Or you just give up.

Would you be happy if you discovered that your plants didn't get
watered, the house temperature was exactly equal to the outdoor
temperature, the burglar alarm was turned off, the garage door
didn't open when you drove up (or, was left OPEN!), none of your
incoming phone calls were answered, none of the TV shows you wanted
were recorded, etc.?

> Have you done a hazard analysis to understand what
> happens if something fails? Which failures are acceptable? If someone is
> trusting your $1,000,000,000,000 house to a single computer then they are
> asking for trouble. What if the power goes out? Or the water main
> bursts and the computer is underwater? The machine throttling and/or
> shutting down is just another of these failures. If you really need to
> guarantee that it works then you need a second computer, performing the
> same calculations, and then checks to make sure that both machines got
> the same result.

Do you have two thermostats in your house?  Two furnaces?  Two answering
machines?  Two DVRs?  Two burglar alarms?  etc.

You expect a certain level of availability/reliability from each of
these.  Why would you expect less from an "integrated system"?

What do you do if your TV is underwater?  Or, the power fails and
your answering machine stops taking messages?

I.e., if you don't want to risk your TV being underwater, you don't
locate it where it is likely to find itself in that situation!  If
you don't want to miss messages due to a power outage, you ensure
your answering machine is backed up.

> It sounds like the problem is that the system doesn't fail safely. I'm
> not sure that it needs to but you are talking like it does. If being
> unable to ensure the temperature stays comfortable is a 'serious'
> problem, then you need to evaluate how to guarantee that any problem
> (e.g. power loss) doesn't cripple the system.

What is "safe"?  With a real-time system, you have deadlines.  You
care *when* things get done.  I.e., a machine running at 80% of
capacity conceivably only gets 80% of its work done.  Depending on
the individual needs of those "things", it is possible that *all*
deadlines are missed (unless the system's design knows to accommodate
"operating at 20% derating").
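
A toy sketch of that point. The task names, execution times and
deadlines below are made up for illustration, and each task is checked
in isolation (no contention), which is optimistic. Every task meets its
deadline at full speed with about 10% slack; throttle the CPU to 80%
and every one of them misses.

```python
# Hypothetical task set: (name, execution time at full speed in ms, deadline in ms).
TASKS = [
    ("hvac", 9.0, 10.0),
    ("irrigation", 18.0, 20.0),
    ("alarm", 4.5, 5.0),
]


def missed_deadlines(tasks, speed):
    """Return the names of tasks that miss their deadline at a given CPU speed.

    speed is a fraction of nominal (1.0 = full speed, 0.8 = throttled to 80%).
    Execution time is assumed to scale inversely with speed.
    """
    return [name for name, exec_ms, deadline_ms in tasks
            if exec_ms / speed > deadline_ms]


if __name__ == "__main__":
    print("full speed:", missed_deadlines(TASKS, 1.0))
    print("80% speed: ", missed_deadlines(TASKS, 0.8))
```

A design that budgets for the derated speed up front would simply run
this same check with the lower speed when sizing the task set.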

How do *I* decide what is important to a particular user?  How can
I *ensure* these goals are met, in all cases?

E.g., my "water controller" allows "water consumers" to request
"water allocations".  It acts as a resource allocator, enabling
individual consumers and ensuring that the total resource is not
overconsumed.

For example, if I turn on all of the irrigation valves, the water
pressure to the house will drop to a point where it would be very
noticeable.  So, if I *sense* that someone is taking a shower, I
will defer irrigation activities -- because I can't defer the
*shower*!
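
A minimal sketch of that kind of allocator, not the actual controller
code. The class name, the litres-per-minute units and the rule that a
non-deferrable consumer revokes deferrable grants are all assumptions
made for the example.

```python
class WaterAllocator:
    """Grants flow requests so the total never exceeds a fixed capacity."""

    def __init__(self, capacity_lpm):
        self.capacity_lpm = capacity_lpm
        self.granted = {}  # consumer name -> (flow_lpm, deferrable)

    def available(self):
        """Flow still unallocated, in litres per minute."""
        return self.capacity_lpm - sum(flow for flow, _ in self.granted.values())

    def request(self, consumer, flow_lpm, deferrable=True):
        """Return True if the allocation was granted, False if the caller must wait."""
        if not deferrable:
            # A shower cannot wait: revoke deferrable grants until it fits.
            for name in [n for n, (_, d) in self.granted.items() if d]:
                if flow_lpm <= self.available():
                    break
                del self.granted[name]
        if flow_lpm <= self.available():
            self.granted[consumer] = (flow_lpm, deferrable)
            return True
        return False

    def release(self, consumer):
        """Give back an allocation (no-op if the consumer holds none)."""
        self.granted.pop(consumer, None)


if __name__ == "__main__":
    alloc = WaterAllocator(capacity_lpm=20)
    print(alloc.request("zone1", 8), alloc.request("zone2", 8))
    print(alloc.request("shower", 10, deferrable=False))  # preempts a zone
    print(sorted(alloc.granted))
```

An irrigation zone that loses its grant would re-request later, which is
exactly how a pre-dawn shower can push watering into daylight hours.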

The AI that governs the irrigation controller takes into account
lots of factors that affect the water needs of the plants in the
yard.  Watering is done at night as this cuts down on evaporative
losses.  OTOH, doing so *long* before sunrise results in less
effective uptake of that water by the plants (they are dormant
at night).

If "tomorrow" (i.e., after sunrise) is likely to be very warm/dry,
the watering cycles are increased to make more water available to
each plant (cuz they won't be watered *during* the heat of the day).

But, what happens if SWMBO wakes up early and opts to take a shower
before sunrise?  The irrigation controller dutifully tries not to
compete for water -- running the risk that this "user activity"
causes the irrigation activities to spill over into daylight hours.

Is the system broken?  Unsafe?  Do I blame the system for the *user's*
behavior?

Similarly, the HVAC controller looks at environmental conditions
to determine the most effective way of heating/cooling the house.
On a warm winter day, it may be smarter to crack some doors and
windows than run the heat.  During the summer, it may be wise to
do the same in the wee hours of the morning to take advantage of
cooler outdoor air.

If the expected outdoor temperature/humidity suggests that the
house can be cooled with the swamp cooler instead of the ACbrrr,
what happens if the windows are NOT open/cracked?  Running the
cooler will just make the house "icky".  Has the system failed?
Or, has the *user* (by not opening windows *for* the system to
exploit)?

If windows are open and the outdoor temperature/humidity climb to
a point where *continued* use of the cooler will only INCREASE
the indoor temperature/discomfort level, the cooler should be
switched off, windows closed and ACbrrr turned on (assuming the
user is willing to trade this cost for comfort -- if the house is
empty, it may be better to just leave the cooler off and hope the
indoor temp doesn't rise faster than the cooler would *force* it
to rise).

What happens if these "actions/inactions" are the result of a
lack of computational resources in the system (because it is
operating at reduced capacity *or* "shutdown" due to overheating)?

I deliberately designed everything as soft real-time (which
requires more effort than hard real-time).  This ensures everything
is a "best effort" subject to the limitations in place at the
time the individual tasks present themselves.
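
As an illustration only (not the scheduler described here), "best
effort" under a reduced capacity can be pictured as greedy admission:
take tasks in priority order and shed whatever no longer fits. The task
names, priorities and load percentages are invented for the example.

```python
def admit(tasks, capacity_pct):
    """Greedy best-effort admission.

    tasks: iterable of (name, priority, load_pct); higher priority wins.
    capacity_pct: share of nominal CPU currently available (100 = unthrottled).
    Returns (admitted_names, shed_names).
    """
    admitted, shed, used = [], [], 0
    for name, _priority, load_pct in sorted(tasks, key=lambda t: t[1], reverse=True):
        if used + load_pct <= capacity_pct:
            admitted.append(name)
            used += load_pct
        else:
            shed.append(name)  # does not fit at the current capacity
    return admitted, shed


# Hypothetical workload that exactly fills an unthrottled machine.
WORKLOAD = [
    ("burglar_alarm", 9, 30),
    ("hvac", 7, 30),
    ("irrigation", 5, 20),
    ("dvr", 3, 20),
]

if __name__ == "__main__":
    print(admit(WORKLOAD, 100))
    print(admit(WORKLOAD, 80))  # throttled: the lowest priority task is shed
```

The hard part is not the loop; it is deciding who supplies the
priorities and how a non-technical user would express them.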

When the user can define the nature and frequency of the tasks,
you can't do a schedulability analysis a priori.  How much water
will the plants need *today*?  When will the irrigation system
be done using water?  What TV shows will the user want to record
today?  Will he want to record while watching others?  Or, streaming
from a media tank?  What if someone wants to listen to music at
the same time?

[In school, we used to watch TV with the sound off and listen to
music at the same time]

> At my current job we are using Allen-Bradley PLCs in our Safety System.
> In short the system is aware of all the other pieces and if it doesn't
> get a heartbeat from the kill button saying "I'm not pressed" then it
> kills things.[1] If the amplifiers aren't receiving their enable signal
> from the Interlock Controller, they won't move, and the brakes will be
> de-energized (i.e. clamped shut preventing movement). It's a PITA but
> according to MIL-STD-882 it means we are 'safe'.

But you have a fixed set of tasks/responsibilities that your system
design has declared "important".

Imagine your "code" (ladder rungs) running on a faster/slower PLC.
Or, a different *model* AB PLC.
Or, a Siemens PLC.
Or, a PLC emulator running on a PC.
Or, a PLC emulator in a VM on a PC.
Or, hosting other *applications* (ladder rungs) that the user
injected into the system.

What guarantees do/can you now make?

That's the position I'm in.  Let the user pick his own PC.  Let him
add applications.  Let him (implicitly) decide what his priorities
are.  Then, *hope* my codebase performs as I intend it to...

> In your case, I think that the answer is that if the pet will get
> uncomfortable after more than 4 hours of offline system then someone
> needs to physically go to the premises and make sure that things are OK
> every 4 hours. If the plants will die after 3 days of no water then
> someone needs to physically inspect them every ~2 days.

Or, you design the system so that it *will* reliably prevent these
things from happening.

What prevents your house temperature from dropping to dangerously
cold levels (frozen pipes) when unoccupied in the winter?  Do you
have folks stopping by every few hours to verify the pipes haven't
burst?  Or, do you EXPECT your thermostat to ensure this doesn't
happen?

> When we had a
> water main burst at my office over a long weekend, it was the 'security
> guard' who caught it, and prevented the basement from flooding. We have
> a security guard instead of a bunch of sensors, for exactly that reason.
> The cost of the guard, is a lot less than the value of the stuff being
> 'guarded'. If the pets, and plants, are that important then using /just/
> a typical computer to safeguard them is a bad idea.

I've taken the approach of adding instrumentation.  And, "smarts" to
interpret the signals from that instrumentation.  So, prolonged water
usage when the house is unoccupied or the occupants are asleep causes
the water main to be shut off.  Likewise, a slow, steady "leak" can
be detected and reported:  "one of the toilets probably has a faulty
valve/seal" (something a security guard *won't* catch -- unless it's
really NOISY!)
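
A rough sketch of how those two rules might look. The function name,
the one-reading-per-minute sampling and every threshold are assumptions
for the example, not values from the real system.

```python
def classify_flow(samples_lpm, occupied_awake,
                  burst_lpm=5.0, burst_minutes=10, leak_lpm=0.05):
    """Classify recent water-meter readings.

    samples_lpm: flow readings in litres per minute, one per minute, oldest first.
    occupied_awake: True if someone is home and awake (so usage is plausible).
    """
    # Prolonged heavy usage with nobody around to cause it: shut the main off.
    recent = samples_lpm[-burst_minutes:]
    if (not occupied_awake and len(recent) == burst_minutes
            and all(s >= burst_lpm for s in recent)):
        return "shut_off_main"
    # Flow that stays small but never drops to zero suggests a steady leak.
    if samples_lpm and all(leak_lpm <= s < burst_lpm for s in samples_lpm):
        return "report_slow_leak"
    return "ok"


if __name__ == "__main__":
    print(classify_flow([6.0] * 10, occupied_awake=False))
    print(classify_flow([0.1] * 60, occupied_awake=True))
    print(classify_flow([0.0, 0.1, 0.0], occupied_awake=True))
```

Normal use starts and stops, so any window that includes a zero reading
falls through to "ok"; only an unbroken trickle gets reported.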

> [1] It became particularly apparent that we had a network configuration
> problem when my scp'ing a file off of one machine brought everything to
> its knees. Turns out that my machine, and the machine I was talking to
> were doing so over the PLC/Safety network. The large file that I was
> copying delayed the safety signals and things shutdown. Good news is
> that everything was safe.

I've armored the communication network.  You wouldn't be able
to inject traffic into the system even if you were inside one of
the bedrooms!  (or, the equipment closet itself!)  All
foreign/unauthorized/unexpected traffic is, at best, flagged as
a "system configuration error"; at worst, a deliberate *attack*.
In either case, it is blocked at the switch and each system node.
(imagine a firewall on each network drop *and* in each network
device -- and, that firewall knowing exactly what sort of traffic
from which MACs is expected on that link)
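
The per-link check can be pictured as an allowlist lookup. This is a
sketch under assumptions: the port names, MAC addresses and traffic
"kinds" are hypothetical placeholders, and a real switch would enforce
this in its forwarding path rather than in Python.

```python
# Hypothetical table: switch port -> {source MAC -> traffic kinds expected there}.
EXPECTED = {
    "port1": {"02:00:00:00:00:01": {"sensor_report"}},
    "port2": {"02:00:00:00:00:02": {"valve_command", "heartbeat"}},
}


def filter_frame(port, src_mac, kind):
    """Forward only traffic that is expected on this link from this MAC.

    Anything else is blocked and flagged, whether it turns out to be a
    configuration error or a deliberate attack.
    """
    allowed = EXPECTED.get(port, {}).get(src_mac.lower(), set())
    return "forward" if kind in allowed else "block_and_flag"


if __name__ == "__main__":
    print(filter_frame("port1", "02:00:00:00:00:01", "sensor_report"))
    print(filter_frame("port1", "02:00:00:00:00:01", "scp_transfer"))
    print(filter_frame("port1", "02:00:00:00:00:99", "sensor_report"))
```

Under this scheme the scp transfer in [1] would never have reached the
safety network: it is the wrong kind of traffic for that link.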