System Administration
Server Management
Thái Minh Tuấn - minhtuan@ctu.edu.vn
Slides are adapted from:
[1] Slides prepared by Prof. Brian D. Davison (http://www.cse.lehigh.edu/~brian/)
[2] The Practice of System and Network Administration, 3rd Ed., by Limoncelli, Hogan, and Chalup (Addison Wesley, 2017)
[3] Practical Linux System Administration: A Guide to Installation, Configuration, and Management, by Kenneth Hess (O'Reilly Media, 2023)
Workstations vs. Servers
● Is a server different from a desktop? Yes!
○ May serve tens, hundreds, or many thousands of users
○ Different hardware design/OS configurations
○ Requires reliability and high uptime
○ Requires tighter security
○ Often expected to last longer
○ Extra cost is amortized across users, life span
○ Deployment within the data center
○ Remote access
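The amortization point above can be illustrated with a little arithmetic. This is a minimal sketch: the function name and all prices, user counts, and life spans are hypothetical figures chosen only to show the calculation.

```python
def cost_per_user_year(purchase_price: float, users: int, lifespan_years: float) -> float:
    """Spread a one-time purchase price across users and years of service."""
    if users <= 0 or lifespan_years <= 0:
        raise ValueError("users and lifespan_years must be positive")
    return purchase_price / (users * lifespan_years)


# Hypothetical figures, not taken from the slides.
workstation = cost_per_user_year(1500, users=1, lifespan_years=3)
server = cost_per_user_year(15000, users=200, lifespan_years=5)
print(f"workstation: {workstation:.2f} per user-year")
print(f"server:      {server:.2f} per user-year")
```

Under these assumed numbers the server costs ten times as much to buy, yet far less per user per year, because it is shared by many users and kept in service longer.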
Server Hardware Design Differences
● Buy server hardware for servers
○ More CPU performance
○ High performance I/O (both disk and network)
○ Expandability
○ More upgrade options
○ Rack mountable/optimized
○ Front and rear access
○ High-availability options
○ Remote management
● Use vendors known for reliability
○ Your time is valuable
Server OS and Management Differences
● Server hardware runs a server OS
○ Includes additional software for providing services
○ Various defaults are changed to provide better performance for long-running applications
○ Stripped down to the bare essentials
■ Easier to debug and maintain
■ Fewer sources of vulnerabilities
○ Often patched on a different schedule than workstations
■ E.g., weekly during off-hours
■ Or monthly during carefully announced maintenance windows
Data Backups - Separate Administrative Networks
● Servers often hold unique, critical data that must be backed up
○ Clients are often not backed up (most data is on the server)
● Consider separate administrative network
○ Separating administrative traffic from normal service traffic
○ Administrative network is often more stable and static, while the service network is more dynamic
○ Might want to keep bandwidth-hungry backup jobs off the production network
○ Administrative network often has more restrictive firewall policies
○ Connects other administrative devices and control systems
● More details later in the semester
Server Reliability
● Servers often have internal redundancy
○ One part can fail and the system keeps running
● Server hardware should include various high availability options
○ Dual power supplies, RAID, multiple network connections, and hot-swap components.
○ RAM should be error correcting, not just error checking.
● Levels of Redundancy
○ N+0 (no spare capacity)
○ N+1 redundancy
○ N+2 redundancy
○ etc.
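The N+M notation above can be made concrete with a short sketch. It assumes a group of identical units (power supplies, servers, and so on) that each carry the same capacity; the function names and the example wattages are illustrative assumptions, not part of the slides.

```python
import math


def units_needed(load: float, unit_capacity: float) -> int:
    """Return N: the minimum number of identical units that can carry the load."""
    if load <= 0 or unit_capacity <= 0:
        raise ValueError("load and unit_capacity must be positive")
    return math.ceil(load / unit_capacity)


def redundancy_level(installed: int, load: float, unit_capacity: float) -> str:
    """Describe an installation as N+M, where M is the number of spare units."""
    n = units_needed(load, unit_capacity)
    spare = installed - n
    if spare < 0:
        return f"under-provisioned (needs {n}, has {installed})"
    return f"N+{spare}"


def survives(installed: int, failures: int, load: float, unit_capacity: float) -> bool:
    """True if the units left after the failures can still carry the full load."""
    return installed - failures >= units_needed(load, unit_capacity)


# Hypothetical example: a 900 W load fed by 500 W power supplies, so N = 2.
print(redundancy_level(2, 900, 500))   # N+0
print(redundancy_level(3, 900, 500))   # N+1
print(survives(3, 1, 900, 500))        # True
print(survives(3, 2, 900, 500))        # False
```

With N+1, any single unit can fail without an outage; a second failure before the repair is completed takes the system down, which is the case N+2 covers.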
Server Data Integrity - RAID
● Disk drives fail!
○ Often useful to consider RAID for data integrity
● The main system disk is often the most difficult to replace
● Mirrored root disks (two options):
○ Two disks: copy from the working disk to the clone at regular intervals (e.g., once a night)
○ Or use hardware or software RAID 1 to keep both disks continuously in sync
● Additional storage should be RAID 5, 6, or 10
● RAID disks still need to be backed up
○ Why?
○ RAID is not a backup strategy: it survives a disk failure, but it faithfully replicates accidental deletions and corruption
○ RAID and backups are both needed; they are complementary
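The trade-offs between the RAID levels named above can be sketched as a small calculator. This is a simplified model under stated assumptions: all disks are the same size, metadata overhead is ignored, RAID 1 mirrors every disk, and the function name `raid_summary` is hypothetical.

```python
from typing import Tuple


def raid_summary(level: int, disks: int, disk_tb: float) -> Tuple[float, int]:
    """Return (usable TB, disk failures the array is guaranteed to survive)."""
    minimum = {1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 1:
        # Every disk holds a full copy of the data.
        return disk_tb, disks - 1
    if level == 5:
        # One disk's worth of capacity is used for parity.
        return (disks - 1) * disk_tb, 1
    if level == 6:
        # Two disks' worth of capacity is used for parity.
        return (disks - 2) * disk_tb, 2
    # RAID 10: striped mirror pairs. Only one failure is guaranteed to be
    # survivable, because a second failure may hit the same mirror pair.
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (disks // 2) * disk_tb, 1


for level, disks in [(1, 2), (5, 4), (6, 6), (10, 4)]:
    usable, tolerated = raid_summary(level, disks, disk_tb=4.0)
    print(f"RAID {level:>2} with {disks} x 4 TB: {usable} TB usable, "
          f"survives {tolerated} failure(s)")
```

None of these levels protects against a deleted file, a corrupted database, or the loss of the whole machine, which is why the array still needs backups.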
Server Data Integrity - Non-RAID approaches
● Assume data integrity is handled elsewhere
○ Server disks have no RAID protection
○ Failures are handled at another layer of the design
● A group of redundant web servers all contain the same data
○ If a disk fails, the web server is simply shut down for repairs
○ The traffic is divided among the remaining web servers
● Distributed storage systems such as Google’s GFS and Hadoop’s HDFS
○ Store copies of data on many hosts
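The redundant web server case above can be sketched as a round-robin pool that skips servers marked as down. This is a toy model, not how any particular load balancer is implemented; the class name `ServerPool` and the host names are hypothetical.

```python
class ServerPool:
    """Round-robin selection over a group of interchangeable servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        """Take a server out of rotation, e.g. after a disk failure."""
        self._healthy.discard(server)

    def mark_up(self, server):
        """Return a repaired server to rotation."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        if not self._healthy:
            raise RuntimeError("no healthy servers left")
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers left")


pool = ServerPool(["web1", "web2", "web3"])
pool.mark_down("web2")                 # web2 is shut down for repairs
print([pool.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Because every server holds the same data, removing one from the pool costs capacity rather than data, so the failed disk can be repaired later without urgency.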
Redundant Power Supplies
● Power supplies are the second most failure-prone part, after disk drives
● Ideally, servers should have redundant power supplies
○ The server keeps operating if one power supply fails
○ Each power supply should have its own power cord
○ Each should draw power from a different source (e.g., separate UPSes)
Hot-swap Components
● Redundant components should be hot-swappable
○ New components can be added without downtime
○ Failed components can be replaced without outage
● Hot-swap components increase cost
○ But consider cost of downtime
● Always check
○ Does OS fully support hot-swapping components?
○ What parts are not hot-swappable?
○ How long/severe is the service interruption?
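The cost trade-off above can be roughed out numerically. This is a back-of-the-envelope sketch: it assumes hot-swap capability removes the outage entirely, and the function names, failure rate, outage length, and prices are all hypothetical.

```python
def expected_downtime_cost(failures_per_year: float,
                           outage_hours_per_failure: float,
                           cost_per_hour: float) -> float:
    """Expected yearly cost of outages caused by component failures."""
    return failures_per_year * outage_hours_per_failure * cost_per_hour


def hot_swap_pays_off(premium: float, years: float,
                      failures_per_year: float,
                      outage_hours_per_failure: float,
                      cost_per_hour: float) -> bool:
    """True if the downtime avoided over the server's life exceeds the premium.

    Simplifying assumption: with hot-swap parts a failure causes no outage.
    """
    avoided = years * expected_downtime_cost(
        failures_per_year, outage_hours_per_failure, cost_per_hour)
    return avoided > premium


# Hypothetical figures: one failure every two years, 4-hour outage,
# 1000 per hour of downtime, 3000 extra for hot-swap hardware, 3-year life.
print(expected_downtime_cost(0.5, 4, 1000))       # 2000.0
print(hot_swap_pays_off(3000, 3, 0.5, 4, 1000))   # True
```

The answers to the checklist questions feed directly into this estimate: a part that is not hot-swappable, or an OS that needs a reboot to recognize the new part, puts the outage hours back into the calculation.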
Servers in the Data Center
● Servers should be located in computer rooms (data centers)
● Data centers provide
○ Proper power (enough power, conditioned, UPS, maybe generator)
○ Fire protection/suppression
○ Networking
○ Sufficient air conditioning (climate controlled)
○ Physical security
Remote Administration
● Data centers are expensive, and thus often cramped; they are also cold, noisy, and may be far from the admin office
● Servers should not require physical presence at a console
● Typical solution is a console server
○ Eliminate need for keyboard and screen
○ Can see booting, can send special keystrokes
○ Access to console server can be remote (e.g., ssh, rdesktop)
● Power cycling is provided by remotely controlled power strips
● Media insertion and hardware servicing are still problems
Maintenance Contracts and Spare Parts
● All machines eventually break!
● Vendors offer a variety of service contracts (SLAs)
○ On-site with 4-hour, 12-hour, or next-day response
○ Customer-purchased spare parts get replaced when used
● How to select maintenance contract? Determine needs.
○ Non-critical hosts: next-day or two-day response time is likely reasonable, or perhaps no contract
○ Large groups of similar hosts: use the spares approach
○ Controlled model: use only a small set of distinct technologies so that few spare-part kits are needed
○ Critical hosts: stock failure-prone and interchangeable parts (power supplies, hard drives); get a same-day contract for the remainder
○ Large variety of models from the same vendor: sufficiently large sites may opt for a contract with an on-site technician
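Part of the selection guidance above can be written down as a small decision function. This is only a sketch of three of the cases: the function name, the order in which the rules are checked, and the threshold for what counts as a "large group" are assumptions, not rules from the slides.

```python
def suggest_support_plan(critical: bool,
                         similar_hosts: int = 1,
                         spares_threshold: int = 20) -> str:
    """Map a host's profile to one of the approaches listed above.

    spares_threshold is a hypothetical cut-off for a "large group" of
    similar hosts; each site would pick its own value.
    """
    if critical:
        return "stock failure-prone parts on site; same-day contract for the rest"
    if similar_hosts >= spares_threshold:
        return "keep a spares kit instead of per-host contracts"
    return "next-day or two-day contract, or no contract"


print(suggest_support_plan(critical=True))
print(suggest_support_plan(critical=False, similar_hosts=50))
print(suggest_support_plan(critical=False))
```

Writing the policy down this way, even informally, makes it easier to apply the same reasoning to every new purchase instead of deciding case by case.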
Server Hardware Strategies
● The three most common strategies:
○ All eggs in one basket: One machine used for many purposes
○ Beautiful snowflakes: Many machines, each uniquely configured
○ Buy in bulk, allocate fractions: Large machines partitioned into many smaller virtual machines using virtualization or containers
● Alternatives
○ Cloud services: Renting use of someone else’s servers and applications
○ Server appliances: Purpose-built devices, each providing a different service
All Eggs in One Basket
● Purchase a single server and use it for many services
○ E.g., an organization's DNS, DHCP, email, and web servers all on one machine
● If you are going to put all your eggs in one basket, make sure you have a really, really, really strong basket
○ Should buy top-of-the-line hardware and a model that has plenty of expansion slots.
■ This machine will last a long time
● This approach is NOT recommended
○ Any hardware problems the machine has will affect many services
○ Different applications demand different configurations
■ Upgrading individual applications also becomes more perilous (dependency hell)
■ Difficult to schedule downtime for hardware upgrades
Beautiful Snowflakes
● Use a separate machine for each service
○ Each machine is sized for the desired application
● Benefit: each machine is the best possible match for the requirements of its application
● Downside: results in a fleet of unique machines
○ Each is a beautiful, special little snowflake
○ Each new system adds administrative overhead proportionally
■ To install, configure, manage, etc.
● Recommendations:
○ Asset tracking
○ Reducing variations
○ Global optimization
Buy in Bulk, Allocate Fractions
● Buy computing resources in bulk and allocate fractions of them as needed
○ Through virtualization
○ Each application runs on virtual machines (VMs)
● Virtual machines
○ Created in minutes
○ Controlled via a portal or API calls, easy for automation
○ Easy to resize up or down
○ Better isolation
■ Independence: Each VM can run a different operating system
■ Resource isolation: Processes running on one VM can’t access the resources of another
■ Granular security: separate privileges on different VMs
■ Reduced dependency hell: VM can be upgraded independently
○ Live VM migration
Cloud Services
● Do not own any machines or applications at all; instead, rent capacity on someone else’s system
○ Amazon AWS, Microsoft Azure, and Google Cloud, etc.
● Benefits:
○ Less acquisition/purchase cost; reduced operational cost
○ Reduced system management responsibility
○ Pay-per-use billing
○ Effectively unlimited computing power and storage; reliability and continuous availability
● Cloud-Based Compute Services: rent VMs from cloud providers
○ AWS EC2, Google Compute Engine, Azure VMs, etc.
● X-as-a-Service (XaaS):
○ Database-as-a-Service: AWS RDS, MongoDB Atlas
○ Storage-as-a-Service: Google Drive, OneDrive, Dropbox
Hybrid Strategies
● Most organizations actually employ a number of different strategies
○ Small organizations:
■ A few snowflakes, a few eggs in one basket
■ Public cloud services
○ Medium-size organizations:
■ Virtualization cluster
■ Plus a few snowflakes for situations where virtualization would not work
■ Public cloud services
○ Large organizations:
■ Have a little of everything
■ Build a private, in-house cloud
■ Public cloud services
○ Companies that need to get started quickly:
■ Public cloud services