[Tfug] "Blade" servers

Bexley Hall bexley401 at yahoo.com
Tue Jul 23 01:54:15 MST 2013


Hi Yan,

On 7/22/2013 10:42 PM, Yan wrote:
>> These are individual *servers*?  Or, blade servers??
>
> We use a server form factor that fits 4 servers in one 2U case. Other than
> being in the same case, though, they aren't related at all (i.e., no central
> power management or some such).

OK.  So, it's no different than a bunch of physically separate
boxes all talking over a shared network.

>> OK, so you're "virtually" doing something akin to my approach
>> (I add actual power control to the process since "half" of the
>> computational power is not collocated with the blade server
>> and I have to ensure the "correct" processors are spun up to
>> make the "necessary" I/O's available for the task set at hand).
>>
>> In my case, processors boot differently depending on their
>> locations, roles, etc.  E.g., one boots off a physical disk;
>> some PXE boot; others boot specialized kernels from ROM/FLASH;
>> etc.  I can't allow the power cycling/bootstrapping to become
>> a visible/perceptible activity (i.e., users would never tolerate
>> waiting seconds for a node/service to come on-line -- so, most
>> nodes boot and load their applications in a fraction of a second).
>
> I think the way we'd handle that in our paradigm is to have some number of
> compute nodes standing by, ready to start computing at any point. That way,
> spinning up of new machines would only happen if we went over that minimal

Yes.  But that's because it is expensive (time) for you to bring a
machine up.  Legacy BIOS-type issues, spinning up disks, etc.  In
my case, it's just a matter of how quickly the power supplies
stabilize.  All of my protocols have been designed with this
sort of "quick startup" in mind.

E.g., a node already knows its IP address, mask, etc. before it
accesses the network hardware.  No waiting for a DHCP server to
reply.  The PLL's used in the time synchronization algorithm
switch to very short time constants so they can quickly (almost
immediately) capture the *precise* current time (think: PTP, not
NTP).  Node specific "software" is already present in ROM/FLASH
so there's no need to request it from a central repository
(TFTP service, etc.).  And, since the node was *intentionally*
powered up by some other node in the system, the decision as to
what the node will be doing has already been made (i.e., which
applets will be pushed to it).  Lastly, applets are *tiny* so
they can be pushed in a couple of large packets and ready to run.
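
(A rough sketch, in C, of what I mean -- every name, address and time
constant below is made up for illustration, none of it lifted from the
actual design:)

/* Hypothetical per-node boot parameters burned into ROM/FLASH at
 * commissioning time, so the node can join the network the instant its
 * supply rails stabilize -- no DHCP exchange, no TFTP fetch.           */
#include <stdint.h>

struct node_boot_params {
    uint32_t ip_addr;        /* statically assigned address, known pre-boot    */
    uint32_t netmask;
    uint32_t coordinator_ip; /* the node that powered us up; it pushes applets */
    uint16_t applet_port;    /* where pushed applets arrive                    */
    uint32_t pll_fast_tc_us; /* short PLL time constant used only at startup   */
                             /* to snap onto the current time (PTP-style)      */
    uint32_t pll_slow_tc_us; /* normal tracking time constant after lock       */
};

/* Lives in a fixed FLASH sector; values differ per node. */
const struct node_boot_params boot_params = {
    .ip_addr        = 0xC0A80142,   /* 192.168.1.66, for example */
    .netmask        = 0xFFFFFF00,   /* /24                       */
    .coordinator_ip = 0xC0A80101,
    .applet_port    = 4242,
    .pll_fast_tc_us = 500,          /* capture quickly...        */
    .pll_slow_tc_us = 50000,        /* ...then track smoothly    */
};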

> capacity. Of course, if you really have non-abstractable differences in
> your nodes, this wouldn't be possible. I tend to think that any such
> differences can be abstracted away using a tradeoff between
> general-purposeness and efficiency. Personally, I think the tradeoff is
> worth it.

I generalize the "compute" resource in each node.  I.e., what sort
of memory and MIPS are available in a node GIVEN THE OTHER FIXED
SERVICES THAT ARE HARD-WIRED TO IT.

My goal has been to produce a set of computationally identical nodes
(at least, all the satellites are the same -- though they may differ
from the nodes in the router) and use differential stuffing options
to tailor specific nodes to specific purposes.  For example, the
node that connects to the irrigation valves needs the ability to
drive lots of solenoids:  high current, reasonably high voltage,
*binary* outputs.  OTOH, the node that interfaces to the weather
station needs lots of low level *inputs* (and *no* outputs).  By
designing a common "core" board, I can save a lot of money (and
time!) on fabrication costs:  "I'd like to order 50 of these and
50 sets of parts, too" vs. "I'd like 13 of these, 10 of those,
5 of this other thing..."

But, the *code* required to interface to a given set of I/O's
varies based on the nature of the I/O's and the roles they play.
Turning on a valve and then twiddling your thumbs for 15 minutes
is a lot less taxing than capturing live video off a camera.

So, I discount the effort required to provide those underlying
services from the resources available on the board.  This can
result in two identical boards appearing to have very different
capabilities.  The workload manager looks at these advertised
capabilities -- updated to reflect any workload already assigned
to the node -- before it decides where a new task should be
deployed.  (or, if existing tasks should be RE-deployed)
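
(Purely illustrative sketch of that "discounted capability" bookkeeping
and how a workload manager might pick a node -- the names and units are
hypothetical, not the real interfaces:)

/* Each node advertises what's left over after its fixed, hard-wired
 * services are accounted for; the manager places a task on the node
 * with the most remaining headroom.                                  */
#include <stddef.h>
#include <stdint.h>

struct node_caps {
    uint32_t mips_total;     /* raw capability of the (identical) core board */
    uint32_t mips_reserved;  /* cost of servicing its hard-wired I/O         */
    uint32_t mips_assigned;  /* workload already deployed to it              */
    uint32_t ram_free_kb;
};

/* Effective compute still available on a node. */
static uint32_t mips_available(const struct node_caps *n)
{
    uint32_t used = n->mips_reserved + n->mips_assigned;
    return (used >= n->mips_total) ? 0 : n->mips_total - used;
}

/* Pick the node with the most headroom that can also satisfy the task's
 * memory needs; returns -1 if nothing fits (power up another node, or
 * re-deploy existing tasks).                                            */
int place_task(const struct node_caps *nodes, size_t count,
               uint32_t task_mips, uint32_t task_ram_kb)
{
    int best = -1;
    uint32_t best_avail = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t avail = mips_available(&nodes[i]);
        if (avail >= task_mips && nodes[i].ram_free_kb >= task_ram_kb &&
            avail > best_avail) {
            best = (int)i;
            best_avail = avail;
        }
    }
    return best;
}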

>> Are they *capable* of being powered down and you just don't
>> take advantage of that (extra complexity?  *you* aren't paying
>> for the electricity?  etc.)  Any idea as to what it costs
>> to idle a processor vs. having it *do* something useful?
>> (I suspect it's not a huge difference given all the other
>> cruft in each box)
>
> I'd say anything is capable. The machines can certainly wake-on-lan and
> start processing, but we don't bother. The actual case is that we don't pay
> the electricity costs :-). I would imagine that idling (including idle
> disks and so forth) modern machines costs significantly less than keeping
> them utilized, but significantly more than keeping them off. No better
> numbers than that here, sorry :-)

Yeah, I think I just have to treat these costs as "variable functions"
that are evaluated in each deployment.  E.g., in my case, if I don't
need the particular I/O's on a given node, I can theoretically
*move* the tasks currently executing on its processor *if* I have
surplus capacity somewhere else.  (But, knowing which nodes will
find themselves in this condition varies from installation to
installation, day to day, etc.)
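
(A minimal, self-contained sketch of that migration test, again with
made-up names: tasks can only be moved off an "unneeded I/O" node if
some other node has the headroom to absorb them:)

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct spare { uint32_t mips; uint32_t ram_kb; };  /* headroom per other node */

/* Can the workload currently running on a node whose I/O's aren't needed
 * be absorbed elsewhere?  If so, that node's processor can be powered
 * down until its I/O's are wanted again.                                */
bool can_vacate(const struct spare *others, size_t count,
                uint32_t load_mips, uint32_t load_ram_kb)
{
    for (size_t i = 0; i < count; i++)
        if (others[i].mips >= load_mips && others[i].ram_kb >= load_ram_kb)
            return true;
    return false;
}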

>> I think businesses have historically been much less concerned with
>> energy consumption.  E.g., PBX's stay lit even when the building
>> is deserted; most places don't even enforce a policy of requiring
>> employees to power down their PC's before leaving for the day, etc.
>>
>> I think businesses have a much higher -- and "narrower" -- peak
>> consumption period than residences.  E.g., ~10 hours (single
>> shift) of very high demand followed by ~14 hours of very little
>> demand.  Contrast this with residences that have a spurt of
>> demand early in the morning, some demand during the day (while
>> some residents are "away at work"), significantly increased
>> demand in the evening (meal prep, entertainment) followed by
>> virtually no demand "while sleeping".
>>
>> And, residents tend to *feel* the cost of the energy that is
>> consumed on their behalf!
>>
>> (Many businesses pay for electricity using different tariffs...
>> power used "off hours" is often "free" -- or, comparatively so
>> when contrasted with "peak" consumption)
>>
>> But, that doesn't mean one should ignore power requirements in
>> a system's design if you have a choice!  Especially if the capability
>> is there (and just isn't *typically* used -- at the present time).
>
> There's definitely a tradeoff here, though. If you spend X dollars more
> designing a custom solution, is it going to be offset by the Y dollars in
> energy savings? I would say the answer could go either way...

You're thinking small.  :>

How much money went into designing WoL in all these new NIC's?
Putting "energy save" mode into LCD (and CRT before them) monitors?
Putting sensors and controls in buildings to shut off the lights
in offices that aren't being used?  Making *ice* overnight so
the air conditioners can be "smaller" and still meet the cooling
needs of the building?

You spend the money/time *once* then reap the savings every time
a device is deployed.  *And*, once a technology is designed and
proven, it costs very little to replicate that technology/feature
in subsequent designs.

> On top of
> that, if you design this thing to be run on general-purpose hardware with a
> general-purpose cloud backing it, it'd *greatly* increase the number of
> people who would understand the components enough to contribute, and could
> positively impact the success of the project.

Most of the work that will go into this after it's released will be:
- supporting new "peripheral devices" (e.g., "I just bought a new
   Model XYZ Demand/Tankless Water Heater.  How do I interface it to
   this system that currently only supports a few *traditional*
   water heaters?")
- developing new applets to provide capabilities built on the
   existing services offered ("Gee, I wish this thing would open
   my garage door a few inches late at night to let the cooler air
   in!")
- developing new AI's to "create knowledge" based on new and existing
   observations from the sensors available ("Hmmm... yesterday, it
   climbed to 103 degrees with an RH of 45% rendering the swamp cooler
   ineffective LATER IN THE DAY.  As a result, I had to resort to
   bringing the ACbrrr on-line.  But, all that EXTRA moisture the
   cooler had put in the air in the morning presented an extra load
   on the ACbrrr decreasing/delaying its effectiveness.  The current
   indoor temperature suggests I turn on the cooler *now*.  But, outside
   conditions resemble what they looked like yesterday at this time.
   What's the relative *co$t* for me to use the ACbrrr, instead, from
   the outset?  And, what degree of confidence do I have in this
   decision?  What's the expected value of each possible course of
   action, *now*?")

[Developing new "peripheral interfaces/handlers" will always require
specialized knowledge.  First, knowledge of the hardware interfaces
required to interact with the device(s) being monitored/controlled.
Second, knowing how those devices *work* and how they can potentially
NOT work.]
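
(A toy sketch of the last bullet above -- the probability and the costs
are entirely made up -- just to show the kind of expected-cost comparison
I have in mind:)

#include <stdio.h>

int main(void)
{
    double p_like_yesterday = 0.7;   /* confidence today resembles yesterday */

    /* Estimated costs (arbitrary units) of each course of action under
     * each outcome.                                                     */
    double cooler_then_ac_if_hot = 9.0;  /* cooler fails late; AC then fights
                                            the extra humidity it added     */
    double cooler_only_if_mild   = 2.0;
    double ac_from_start_if_hot  = 6.0;
    double ac_from_start_if_mild = 5.0;

    double ev_cooler = p_like_yesterday * cooler_then_ac_if_hot
                     + (1.0 - p_like_yesterday) * cooler_only_if_mild;
    double ev_ac     = p_like_yesterday * ac_from_start_if_hot
                     + (1.0 - p_like_yesterday) * ac_from_start_if_mild;

    printf("E[cost | start cooler now] = %.2f\n", ev_cooler);
    printf("E[cost | AC from outset]   = %.2f\n", ev_ac);
    printf("Choose: %s\n", (ev_ac < ev_cooler) ? "AC from the outset"
                                               : "swamp cooler now");
    return 0;
}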

Most (all?) of this can be done without having to look at any of
the details of the implementation.  Just as most iOS developers never
need to see how the OS is implemented, and most User Land developers
are oblivious to kernel implementation issues.  "Follow these rules
and you can expect..."

You don't worry about how the processor on which you are executing
was chosen.  Or, how to open a connection to an application that
is running on some other node.  Or, how "relay(ROSE_BUSHES, on)"
ends up getting routed to the node that actually *controls* that
relay.  You don't worry if you have "permission" to access a
resource.  Etc.

You (developer) concentrate on implementing a particular algorithm.
Not on the trivial minutiae involved with making it happen.  I.e.,
you code a (small) application in a week/weekend because the
system does all the "heavy lifting" for you.  ("Um, you turned on
the valve for the rose bushes 3 hours ago and never turned it off!
No, don't worry; I already took care of it for you...")
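
(Riffing on the relay(ROSE_BUSHES, on) example: a purely illustrative
applet, with stub definitions so it compiles -- relay(), sleep_minutes()
and the enums are stand-ins, not the real system's interfaces:)

#include <stdio.h>
#include <unistd.h>

/* --- stubs standing in for the hypothetical system API --- */
enum relay_id    { ROSE_BUSHES };
enum relay_state { OFF, ON };

static void relay(enum relay_id id, enum relay_state s)
{
    /* In the real system this would be routed, transparently, to whichever
     * node actually controls the relay.  Here we just log it.             */
    printf("relay %d -> %s\n", (int)id, (s == ON) ? "on" : "off");
}

static void sleep_minutes(unsigned minutes)
{
    sleep(minutes * 60);
}

/* The applet itself: placement, routing, permissions -- and cleanup if
 * this code dies before turning the valve off -- are the system's
 * problem, not the applet writer's.                                   */
void water_rose_bushes(void)
{
    relay(ROSE_BUSHES, ON);
    sleep_minutes(15);
    relay(ROSE_BUSHES, OFF);
}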

The goal isn't to "dumb down" the developer's interface but,
rather, remove the cruft that leads to the more insidious bugs,
etc. ("What do you mean, the RPC stalled midstream?")

--don




