Introducing Servers


Servers are the cornerstone of corporate infrastructure, relied upon to
provide the services that employees and customers need to perform
day-to-day operations in a timely and efficient manner. The single
most important attribute of most enterprise-grade servers is
reliability – and a good level of fault tolerance is factored into the
design of most servers in order to increase uptime.

Many readers
run servers in their own home. They are the headless Linux box in the
corner of the study that provides email, web, DNS, routing and
file-sharing services for the home. While these machines still
constitute servers in a raw sense, it would take a brave technology
officer to put their faith in these types of servers to fulfil
the IT requirements of a business.

This
guide demonstrates what differentiates business-class servers from the
typical white-box server that you can build from off-the-shelf
components, and highlights some of the many factors of a server's design
that need to be carefully considered in order to provide reliable
services for business.

Form Factor

Servers
come in all shapes and sizes. The tower server is designed for
organisations or branch offices whose entire infrastructure consists of
a server or two. From the outside, they wouldn't look out of place on
or under someone's desk – but the components that make up the server
are often of a higher build quality than workstation components.
Tower cases are generally designed to minimise cost whilst providing
smaller businesses some sense of familiarity with the design of the
enclosure.

For larger server infrastructures, the rack-mount case is
used to hold a server's components. As the name suggests, rack-mount
servers are almost always installed within racks and located in
dedicated data rooms where power supply, physical access, temperature
and humidity (among other things) can be closely monitored. Rack-mount
servers come in standard sizes – they are 19 inches wide and have
heights in multiples of 1.75 inches, where each multiple is 1 Rack Unit
(RU). They are often designed with flexibility and manageability in
mind.
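
For reference, here is a quick arithmetic sketch of the rack-unit maths
quoted above (my own illustration; the 42RU figure is simply a common
full-height rack size, not something from the specification):

    # Rack-unit arithmetic: 1 RU = 1.75 inches; racks are 19 inches wide.
    def ru_to_inches(rack_units: int) -> float:
        """Height in inches of a chassis occupying `rack_units` RU."""
        return rack_units * 1.75

    # A few common chassis heights, plus a full-height 42RU rack:
    for ru in (1, 2, 4, 42):
        print(f"{ru} RU = {ru_to_inches(ru):.2f} in = {ru_to_inches(ru) * 25.4:.1f} mm")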

Lastly, the blade server is designed for dense server
deployment scenarios. A blade chassis provides the base power,
management, networking and cooling infrastructure for numerous
space-efficient servers. Most of the top 500 supercomputers these days
are made up of clusters of blade servers in large data centre
environments.

Processors

With
the proliferation of quad-core processors in computing's mainstream
performance sector, the main difference between servers and
workstations comes down to support for multiple sockets.

Consumer-class Core 2 and Phenom-based systems
are built around single-socket designs that feature multiple cores
per socket – and cannot be used in multi-socket configurations.

Xeon
and Opteron processors, on the other hand, provide interconnects that
allow processes to be scheduled across separate processors, each
featuring multiple cores that contribute towards the total processing
power of a server. It's not uncommon to see quad-socket, quad-core
configurations in some high-end servers, providing a total of 16
processing cores at upwards of 3.0GHz per core. The scary thing is that
six- and eight-core processors are just around the corner.
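
The core-count arithmetic is simple enough to sketch in a couple of
lines (the six-core figure below just anticipates the upcoming parts
mentioned above):

    # Total processing cores in a multi-socket server.
    def total_cores(sockets: int, cores_per_socket: int) -> int:
        return sockets * cores_per_socket

    print(total_cores(4, 4))   # quad-socket, quad-core: 16 cores
    print(total_cores(4, 6))   # the same chassis with six-core parts: 24 cores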

The
other main difference that you see between consumer and enterprise
processors is the amount of cache. Xeon and Opteron processors often
have significantly larger Level 2 and Level 3 caches, which reduce the
amount of data that has to be shifted to and from main memory,
generally resulting in slightly faster computation times depending on
the application.

A server’s form factor will also have an impact on
the type of processor that can be used. For instance, blade servers
often need more power-efficient, cooler processors due to their
increased deployment density. Similarly, a 4RU server may be able to
run faster and hotter processors than a 1RU server from the same vendor.

Memory

While
the physical RAM modules that you see in today’s servers don’t differ
dramatically from consumer parts, there are numerous subtle differences
that provide additional fault-tolerance features.

Most server memory controllers feature Error Checking and
Correction (ECC) capabilities, and the RAM modules installed in such
servers need to support this feature. Essentially, ECC-capable memory
stores extra parity information alongside each word of data, allowing
the controller to verify that the contents of memory have been read or
written properly and to correct single-bit errors on the fly. This
feature minimises the likelihood of memory corruption.
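
Real ECC DIMMs use a single-error-correct, double-error-detect code over
each 64-bit word; the toy Hamming(7,4) code below (my own sketch, not
what any memory controller actually implements) shows the same principle
of detecting and correcting a single flipped bit:

    # Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
    def encode(d1, d2, d3, d4):
        p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1,3,5,7
        p2 = d1 ^ d3 ^ d4          # parity over positions 2,3,6,7
        p3 = d2 ^ d3 ^ d4          # parity over positions 4,5,6,7
        return [p1, p2, d1, p3, d2, d3, d4]

    def correct(codeword):
        c = list(codeword)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
        if syndrome:
            c[syndrome - 1] ^= 1          # correct the single-bit error
        return [c[2], c[4], c[5], c[6]]

    word = encode(1, 0, 1, 1)
    word[5] ^= 1                          # simulate a single-bit memory error
    print(correct(word))                  # [1, 0, 1, 1] -- data recovered intact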

The other main
difference in memory controller design is how much RAM is supported.
The newest Intel-based servers now use a memory controller built onto
the processor die, as has been the case with AMD-based systems for
years. Even the newest mainstream memory controllers support a maximum
of 16GB of RAM, whereas HP has recently announced a
“virtualisation-ready” Nehalem-based server design that will support
128GB of RAM.

Many
modern servers provide mirrored memory features. A memory mirror
essentially provides RAID-1 functionality for RAM – the contents of
your system memory are written to two separate banks of identical RAM
modules. If one bank develops a fault, it is taken offline and the
second bank is used exclusively. The memory controller of the server
can usually handle this failover without the operating system even
being aware of the change, preventing unscheduled downtime of the
server.
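
The behaviour is easy to picture in code. The sketch below is purely my
own illustration (memory controllers do this in hardware, not Python):
every write lands in two identical banks, and reads fall back to the
mirror if the primary bank is marked faulty.

    class MirroredMemory:
        def __init__(self, size: int):
            self.banks = [bytearray(size), bytearray(size)]
            self.faulty = [False, False]

        def write(self, addr: int, value: int) -> None:
            for bank in self.banks:          # every write is duplicated
                bank[addr] = value

        def read(self, addr: int) -> int:
            for i, bank in enumerate(self.banks):
                if not self.faulty[i]:       # first healthy bank serves the read
                    return bank[addr]
            raise RuntimeError("both memory banks offline")

    mem = MirroredMemory(1024)
    mem.write(0x10, 0xAB)
    mem.faulty[0] = True                     # primary bank develops a fault
    print(hex(mem.read(0x10)))               # still 0xab -- served from the mirror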

Hot-spare memory can also be installed in a spare bank on some
servers. The idea is that if the memory in one bank is determined
to be faulty, the hot-spare bank can be brought online and used in
place of the faulty bank. In this scenario some memory corruption can
occur, depending on the operating system and memory controller
combination in use. The worst-case scenario usually involves a
crash of the server followed by an automated reboot by server recovery
mechanisms. Upon reboot, the memory controller brings the hot-spare
RAM online, limiting downtime.

Hot-swappable
memory is often used in conjunction with both of these features –
giving you the ability to swap out faulty RAM modules without having
to shut down the entire server.

Storage Controllers

Drive
controllers are dramatically different in servers. Forget on-board,
firmware-based SATA RAID controllers that provide RAID 0, 1 and 1+0 and
consume CPU cycles every time data is read from or written to the
array. Server-class controllers have dedicated application-specific
integrated circuits (ASICs) and a bucketful of cache (sometimes as much
as 512MB) in order to boost the performance of the storage subsystem.
These controllers also frequently support advanced RAID levels,
including RAID 5 and 6.

The controller cache can be one of the
most critical components of a server, depending on the application. At
my place of employment we have a large number of servers that capture
video in HD quality in real time. A separate “ingest” server often
pulls this data from the encode server immediately after it has been
captured, for further processing and transcoding. Having 512MB of cache
installed on the drive controller allows data to be pushed out via the
network interface before it has been physically written to disk,
significantly boosting performance. Testing has revealed that if we
reduce the cache size to 64MB, data has to be physically written to
disk and then physically read back when the ingest process takes place,
placing significant additional load on the server. Finally, consider
that most mainstream controllers have no cache whatsoever – the impact
on performance in that scenario would probably prevent us from working
with HD-quality content altogether.
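
What is described above boils down to write-back caching. The sketch
below is a deliberately simplified illustration (not a real controller
driver) of why the ingest server can read data that hasn’t yet touched
the platters:

    import time

    class WriteBackCache:
        def __init__(self):
            self.cache = {}     # dirty blocks waiting to be flushed
            self.disk = {}      # simulated platters

        def write(self, block: int, data: bytes) -> None:
            self.cache[block] = data   # fast: acknowledged once it is in cache

        def read(self, block: int) -> bytes:
            # Reads are served from cache when possible -- this is what lets
            # the ingest server pull data before it ever reaches the disk.
            return self.cache.get(block, self.disk.get(block, b""))

        def flush(self) -> None:
            for block, data in self.cache.items():
                time.sleep(0.001)      # stand-in for slow mechanical I/O
                self.disk[block] = data
            self.cache.clear()

    ctrl = WriteBackCache()
    ctrl.write(0, b"HD frame")
    print(ctrl.read(0))                # available immediately, before flush()
    ctrl.flush()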

But what happens if there is
a power outage and the data in the controller cache has not yet
been written to the disk? In order to prevent data loss, some
controllers feature battery backup units (BBUs) that are capable of
keeping the contents of the cache intact for in excess of 48 hours,
or until power is restored to the server. Once the server is switched
on again, the controller commits the data from the cache to the disk
array before flushing the cache and continuing with the boot process.
No data is lost. BBUs are another feature missing from mainstream
controllers.

External Storage

Any
computer chassis has a physical limit on the number of drives that
you can install. This limitation is overcome in enterprise servers by
connections to Storage Area Networks (SANs). This is typically
accomplished in one of two ways – via Fibre Channel or iSCSI
interfaces.

iSCSI
is generally the cheaper option of the two because data transferred
between the SAN and the server is encapsulated in frames sent over
ubiquitous Ethernet networks, meaning that existing Ethernet
interfaces, cabling and switches can be used (aside from the cost of
the SAN enclosure itself, the only additional costs are generally an
Ethernet interface module for the SAN and software licences).

On
the other hand, Fibre Channel requires its own fibre-optic interfaces,
cabling and switches, which significantly drives up cost. However,
having a dedicated fibre network means that bandwidth isn’t shared with
other Ethernet applications. Fibre Channel presently offers interface
speeds of 4Gb/s, compared to the 1Gb/s often seen in most enterprise
networks, and it also has less overhead than Ethernet, which provides
an additional boost to comparative performance.

Disk Drives

For
years, enterprise servers have utilised SCSI hard disk drives instead
of ATA variants. SCSI allows for up to 15 drives on a single parallel
channel versus the two on a PATA interface; PATA drives ship with the
drive electronics (the circuitry that physically controls the drive)
integrated on the drive (IDE), whereas SCSI controllers perform this
function in a more efficient manner; many SCSI interfaces support
drive hot-swapping, reducing downtime in the event of a drive failure;
and the SCSI interface allows for faster data transfer rates than can
be obtained via PATA, giving better performance, especially in RAID
configurations.

However, over the last year,
Serial Attached SCSI (SAS) drives have all but superseded SCSI in the
server space, in much the same way that SATA drives have replaced their
PATA brethren. The biggest problem with the parallel interface was
synchronising clock rates on the many parallel connections – serial
connections don’t require this synchronisation, allowing clock rates to
be ramped up and increasing the bandwidth of the interface.

SAS
drives are the same as SCSI drives in many ways – the SAS controller
is still responsible for issuing commands to the drive (there is no
IDE), SAS drives are hot-swappable, and data transfer over the
interface is faster than SATA. SAS drives come in both 2.5- and
3.5-inch form factors, with 2.5-inch drives proving popular in servers
as they can be installed vertically in a 2RU enclosure.

In
addition, SAS controllers can support 128 directly attached devices on
a single controller, or in excess of 16,384 devices when the maximum of
128 port expanders are in use (however, the maximum bandwidth that all
devices connected to a port expander can use equals the bandwidth
available between the controller and the port expander). In order to
support this many devices, SAS also uses higher signal voltages than
SATA, which allows the use of 8m cables between controller and device.
Without those higher signal voltages, I’d like to see anyone connect
16,384 devices to a disk controller with a maximum cable length of
1 metre (the current SATA limitation).
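
The 16,384 figure quoted above is simply the implied multiplication of
128 expanders by 128 devices hanging off each expander (the
per-expander fan-out is my reading of the figures, not something spelt
out in the specification text above):

    # Device counts quoted above.
    direct_devices = 128                      # directly attached to one controller
    expanders = 128                           # maximum port expanders per controller
    devices_per_expander = 128
    print(expanders * devices_per_expander)   # 16384 devices in total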

Over
the next few months, another major advantage to using SAS over SATA in
servers will arrive: support for multipath I/O. Suitable dual-port SAS
drives can connect to multiple controllers within a server, which
provides additional redundancy in the event of a controller failure.

GPUs and Video

One of the areas where
enterprise servers are inferior to regular PCs is graphics
acceleration. Personally, I’m yet to see a server in a data centre with
a PCI Express graphics adapter, although that’s not to say it isn’t
possible to install one in an enterprise server. In general, though,
most administrators find the on-board adapters more than adequate for
server operations.

Networking

Modern-day desktops and laptops
feature Gigabit Ethernet adapters, and the base adapters seen on
servers are generally no different. However, like most other components
in servers, there are a few subtle differences that improve performance
in certain scenarios.

In order to provide network fault
tolerance, two or more network adapters are integrated on most server
boards. In most cases these adapters can be teamed. Like RAID
fault-tolerance schemes, there are numerous network fault-tolerance
options available, including the following (a simple failover sketch
in code follows this list):

  • Network Fault Tolerance (NFT) – In this configuration only one
    network interface is active at any given time, with the rest
    remaining in a slave mode. If the link to the active interface is
    severed, a slave interface is promoted to become the active one.
    This provides fault tolerance but does not aggregate bandwidth.
  • Transmit Load Balancing (TLB) – Similar to NFT, but slave
    interfaces are capable of transmitting data, provided that all
    interfaces are in the same broadcast domain. This aggregates
    transmit bandwidth but not receive bandwidth, and also provides
    fault tolerance.
  • Switch-assisted Load Balancing (SLB) and 802.3ad Dynamic – These
    aggregate both transmit and receive bandwidth across all interfaces
    within the team, provided that all interfaces are connected to the
    same switch. They provide fault tolerance on the server side
    (however, if the switch connected to the server fails, you have an
    outage). 802.3ad Dynamic requires a switch that supports the
    802.3ad Link Aggregation Control Protocol (LACP) in order to
    dynamically create teams, whereas SLB must be manually configured
    on both the server and the switch.
  • 802.3ad Dynamic Dual-Channel – This aggregates both transmit and
    receive bandwidth across all interfaces within the team and can
    span multiple switches, provided that they are all in the same
    broadcast domain and that all switches support LACP.
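
As promised above, here is a minimal sketch of the NFT-style failover
behaviour (my own illustration, not any vendor’s teaming driver; the
interface names are placeholders):

    class NftTeam:
        def __init__(self, interfaces):
            self.interfaces = list(interfaces)
            self.link_up = {nic: True for nic in self.interfaces}

        def active(self) -> str:
            for nic in self.interfaces:      # first interface with link wins
                if self.link_up[nic]:
                    return nic
            raise RuntimeError("no link on any team member")

    team = NftTeam(["eth0", "eth1"])
    print(team.active())          # eth0 carries all traffic
    team.link_up["eth0"] = False  # link to the active interface is severed
    print(team.active())          # eth1 promoted -- failover, no aggregation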

Just about all
server network interface cards (NICs) support Virtual Local Area
Network (VLAN) trunking. Imagine that you have two separate networks –
an internal one that connects to all devices on your LAN and an
external one that connects to the Internet, with a router in between.
In conventional networks, the router needs at least two network
interfaces – one dedicated to each physical network.

Provided
that your network equipment and router support VLAN trunking, your two
networks could be set up as separate VLANs. In general, your switch
would keep track of which port is connected to which VLAN (this is
known as a port-based VLAN) and your router would be trunked across
both VLANs using a single NIC (physically, it becomes a
router-on-a-stick). Frames sent between the switch and router are
tagged, so that each device knows which network a frame came from or
is destined for.
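
For illustration, the sketch below builds the four-byte 802.1Q tag that
carries the VLAN ID between the switch and the router-on-a-stick (a toy
example, not a full network stack; the dummy frame contents are
placeholders):

    import struct

    def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
        """Insert an 802.1Q tag into an untagged Ethernet frame."""
        tpid = 0x8100                               # marks the frame as tagged
        tci = (priority << 13) | (vlan_id & 0x0FFF) # priority bits + 12-bit VLAN ID
        tag = struct.pack("!HH", tpid, tci)
        return frame[:12] + tag + frame[12:]        # after dst MAC (6) + src MAC (6)

    untagged = bytes(12) + b"\x08\x00" + b"payload" # dummy MACs + IPv4 EtherType
    tagged = tag_frame(untagged, vlan_id=10)
    print(tagged[12:16].hex())                      # 8100000a -> TPID + VLAN 10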

VLANs operate in the same
manner as physical LANs – but network reconfigurations can be made in
software, as opposed to forcing a network administrator to physically
move equipment.

Because of the sheer amount of data received on Gigabit and
Ten-Gigabit interfaces, it can place a significant burden on the CPU
to process the TCP headers of every Ethernet frame. As a rough rule of
thumb, it takes around 1GHz of processor power to transmit TCP data at
Gigabit Ethernet speeds.

As a result
TCP Offload Engines are often incorporated into server network
adapters. These integrated circuits process TCP headers on the
interface itself instead of pushing each frame off to the CPU for
processing. This has a pronounced effect on overall server performance
in two ways – not only does the CPU benefit from not having to process
this TCP data but less data is transmitted across PCI express lanes
toward the Northbridge of the server. Essentially TCP Offload engines
free up resources in the server so that they can be assigned to other
data transfer and processing needs.
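
Using the rule of thumb quoted above (roughly 1Hz of CPU per bit per
second of TCP throughput with no offload), the potential saving is easy
to estimate:

    # Back-of-the-envelope CPU cost of TCP processing without a TOE,
    # using the ~1GHz-per-Gb/s rule of thumb mentioned earlier.
    def cpu_ghz_needed(link_gbps: float) -> float:
        return link_gbps * 1.0

    for gbps in (1, 10):
        print(f"{gbps} Gb/s line rate needs roughly {cpu_ghz_needed(gbps):.0f} GHz of CPU")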

The final difference that
you see between server NICs and consumer ones is that the buffers on
enterprise-grade cards are usually larger. Part of the reason is the
additional features mentioned above, but there is also a small
performance benefit to be gained in some scenarios (particularly
inter-VLAN routing).

Power Supplies

One of the great
features of ATX power supplies is standardisation: they are always the
same form factor and feature the same types of connectors (even if the
number of those connectors can vary). But while having eight 12-volt
Molex connectors is great in a desktop system, so many connectors are
generally not required in a server, where the cable clutter could cause
cooling problems.

Power
distribution within a server is well thought out by server
manufacturers. Drives are typically powered via a backplane instead of
individual Molex connectors, and fans often plug directly into headers
on the mainboard. Everything else that requires power draws it from
other plugs on the mainboard. Even the power supplies themselves have
PCB-based connectors on them. All of this is designed to help with the
hot-swapping of components in order to minimise downtime.

Most
servers are capable of handling redundant power supplies. The first
advantage here is that if one power supply fails, the redundant supply
can still provide enough juice to keep the server running. Once aware
of the failure, you can generally replace the failed supply while the
server is still running.

The second advantage requires facility
support. Many data centres will supply customer racks with power feeds
on two separate circuits (which are usually connected to isolated power
sources). Having redundant power supplies allows you to connect each
supply to a different power source. If power is cut to one circuit,
your server remains online because it can still be powered by the
redundant circuit.

Server Management

Most servers
support Intelligent Platform Management Interfaces (IPMIs), which allow
administrators to manage aspects of the server and monitor server
health – including when the server is powered off.

For
example, say that you have a remote Linux server that has encountered a
kernel panic – you could access the IPMI on the server and initiate a
reboot instead of having to venture down to the data centre, gain
access and press the power button yourself. Alternatively, say that
your server is regularly switching itself on and off every couple of
minutes – too short a time for you to log in and perform any kind of
troubleshooting. By accessing the IPMI, you could quickly determine
that a fan tray has failed and that the server is automatically
shutting down once temperature thresholds are exceeded. These are two
scenarios where having access to IPMIs has saved my skin.

Many
servers also incorporate watchdog timers. These devices perform regular
checks on whether the operating system on the server is responding,
and will reboot the server if the response time is greater than a
defined threshold (usually 10 minutes). These devices can often
minimise downtime in the event of a kernel panic or blue screen.
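
Conceptually a watchdog timer is very simple – the sketch below (an
illustration, not real baseboard firmware) shows the “kick or be
rebooted” contract between the operating system and the timer:

    import time

    class Watchdog:
        def __init__(self, timeout_s: float):
            self.timeout_s = timeout_s
            self.last_kick = time.monotonic()

        def kick(self) -> None:
            """Called periodically by a healthy operating system."""
            self.last_kick = time.monotonic()

        def expired(self) -> bool:
            return time.monotonic() - self.last_kick > self.timeout_s

    wd = Watchdog(timeout_s=600)       # the 10-minute threshold mentioned above
    wd.kick()
    if wd.expired():                   # hardware would trigger a reset here
        print("OS unresponsive -- rebooting server")
    else:
        print("OS healthy")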

Finally,
most server vendors will also supply additional Simple Network
Management Protocol (SNMP) agents and software that allow
administrators to monitor and manage their servers more closely. The
supplied agents provide just about every detail about the installed
hardware that you could ever want to know – how long a given hard disk
drive has been operating in the server, the temperature within a power
supply, or how many read errors have occurred in a particular stick of
RAM. All of this data can be polled and retrieved with an SNMP
management application (even if your server vendor doesn’t supply one,
there are dozens of GPL packages available that utilise the Net-SNMP
project).
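
As a rough illustration of polling, the sketch below shells out to the
standard snmpget tool from the Net-SNMP project mentioned above. The
host name and community string are placeholders, and the generic
sysUpTime object stands in for the vendor-specific MIB objects that
would report things like drive hours and power-supply temperatures:

    import subprocess

    def snmp_get(host: str, community: str, oid: str) -> str:
        """Wrap Net-SNMP's snmpget; returns the raw textual response."""
        result = subprocess.run(
            ["snmpget", "-v2c", "-c", community, host, oid],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    # sysUpTime.0 is a standard MIB-II object; vendor agents expose far more.
    print(snmp_get("server01.example.com", "public", "1.3.6.1.2.1.1.3.0"))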

The future…

All
of the points detailed in this article highlight the differences
between today’s high-end consumer gear (which is typically used to make
the DIY server) and enterprise-level kit. However, emerging
technologies will continue to have an impact on both the enterprise and
consumer markets.

As the technology becomes more refined,
solid-state drives (SSDs) will start to emerge as a serious alternative
to SAS hard disk drives for some server applications. Initially,
they’ll most likely be deployed where capacity requirements are modest
and low access times are essential (such as database servers). As the
capacity of these drives increases they’ll become more prominent – but
they will probably never replace the hard disk drive for storing large
amounts of data.

The other big advantage to using
SSDs is that RAID 5 failures become less of an issue. RAID 5 arrays can
tolerate the failure of a single drive in the array; if, during the
time that it takes to replace the faulty drive and rebuild the array, a
second drive fails or an unrecoverable read error (URE) occurs on one
of the surviving drives, the rebuild will fail and all data on the
array will be lost. SSDs shouldn’t exhibit UREs – once data is written
to the drive it’s stored physically, not magnetically. A good SSD will
also verify the contents of a block, including whether it can be read,
before the write operation is deemed to have succeeded. Thus, if the
drive can’t write to a specific block, that block should be marked as
bad and a reallocation block brought online to take its place. Your
SNMP agents can then inform you when the drive starts using up its
reallocation blocks, indicating that a drive failure is imminent. In
other words, you’ll be able to predict when an SSD will fail with more
certainty, which could give RAID 5 a new lease of life.
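
To put the URE problem in perspective, the sketch below estimates the
chance that a RAID 5 rebuild trips over an unrecoverable read error,
assuming the commonly quoted specification of one URE per 10^14 bits
read for desktop-class drives (the drive count and capacity are
arbitrary examples, not figures from this article):

    def rebuild_ure_probability(surviving_drives: int, drive_tb: float,
                                ure_rate_bits: float = 1e14) -> float:
        # Every surviving drive is read in full during the rebuild.
        bits_to_read = surviving_drives * drive_tb * 1e12 * 8
        p_per_bit = 1.0 / ure_rate_bits
        return 1.0 - (1.0 - p_per_bit) ** bits_to_read

    # Six surviving 1TB drives: the rebuild reads roughly 48Tbit of data.
    print(f"{rebuild_ure_probability(6, 1.0):.0%} chance of losing the array during rebuild")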

Moving
further forward, the other major break from convention in server
hardware will most likely be the use of more application-specific
processing units instead of the CPU as we know it today. There’s
already some movement in this area – Intel’s Larrabee is an upcoming
example of a CPU/GPU hybrid, and the Cell Broadband Engine Architecture
(otherwise known as the Cell architecture) that is used in Sony’s
PlayStation 3 is also used in IBM’s Roadrunner supercomputer (the first
to sustain performance over the 1 petaFLOPS mark).

Next: Visual tour of the insides of a server.