This is a revised version of an older note.
Nowadays most of us prefer a cloud cluster over a self-managed one: less management overhead, high availability, better security, pay-as-you-go billing, and all the other advantages of cloud computing. However, if you happen to own several old computers, don't want to sell or give them away, and don't know what else to do with them, a home-managed cluster is a good choice. There is a lot of fun in running a self-managed cluster.
Architecture
Let's define our architecture here: 4 nodes, 1 master (which can also act as a worker) and 3 workers.
1. Master node
- eth0: connects to the institute network by cable
  - public interface, using DHCP
  - internet downloads and updates
  - user access via ssh/scp
- eth1: connects to the worker nodes
  - private interface, using a static IP
  - communication with the other nodes:
    - ssh/scp
    - data transfer
    - parallel communication
Configure the network interfaces eth0 and eth1:
```bash
# Configure public interface (assumes DHCP from institute network)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
# Request a stable address from our institute's DHCP server if possible
# This makes routing more reliable
DHCP_CLIENT_ID=cluster-master
EOF

# Configure private cluster network
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 << EOF
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
EOF

# Apply new network configuration
systemctl restart network
```
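After the restart, it's worth checking that both interfaces came up as expected; a quick sanity check using standard iproute2 commands:

```bash
# Verify both interfaces are up with the expected addresses
ip addr show eth0   # should show a DHCP-assigned institute address
ip addr show eth1   # should show 192.168.10.1/24

# Confirm the private link is actually up
ip link show eth1 | grep "state UP"
```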
NAT
```bash
# Enable IP forwarding persistently
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# Enable connection tracking timeout optimization for HPC workloads
echo "net.netfilter.nf_conntrack_tcp_timeout_established = 86400" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max = 131072" >> /etc/sysctl.conf

# Apply sysctl changes
sysctl -p

# Set up NAT (masquerade traffic from the private cluster network)
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.10.0/24 -j MASQUERADE
```
The last command is important, as it is what allows return traffic to find its way back to the workers. We will explain this later.
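To confirm the masquerading rule is actually in place, a quick sanity check:

```bash
# List the NAT table; the MASQUERADE rule should appear under POSTROUTING
iptables -t nat -L POSTROUTING -n -v
```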
Set up the firewall:
```bash
# Clear existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP   # We'll explain this choice
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established and related connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from institute network
iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT

# Allow all traffic from the cluster's private network
iptables -A INPUT -i eth1 -s 192.168.10.0/24 -j ACCEPT

# Allow forwarding from cluster to internet
iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT

# Allow HTTP/HTTPS for package downloads
iptables -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 443 -j ACCEPT

# For the local package caching repository we set up later (optional)
iptables -A INPUT -i eth1 -p tcp --dport 80 -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
```
I set the default FORWARD policy to DROP for security reasons:
- It prevents unauthorized traffic from traversing the master node
- It creates a default-deny stance, where only explicitly allowed traffic passes
- It prevents potential lateral movement if one node is compromised
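As a quick check that this default-deny stance is active after applying the rules above:

```bash
# The policy in each chain header should read DROP for INPUT and FORWARD
iptables -L -n -v | head -5

# Inspect the FORWARD chain rules and their packet counters
iptables -L FORWARD -n -v --line-numbers
```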
2. Worker nodes
Configuration
```bash
# Worker node 1 (192.168.10.2)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.2
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=8.8.8.8
ONBOOT=yes
EOF
# GATEWAY automatically generates the default route in the routing table

# Worker node 2 (192.168.10.3): change IPADDR=192.168.10.3
# Worker node 3 (192.168.10.4): change IPADDR=192.168.10.4

# Restart network service on each worker node
systemctl restart network
```
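Once a worker is configured, a quick way to confirm both the private link and the NAT path (assuming the master's NAT and firewall rules from above are already in place):

```bash
# From a worker node: test the private link to the master
ping -c 3 192.168.10.1

# Test internet access through the master's NAT
ping -c 3 8.8.8.8

# Test DNS resolution (uses DNS1=8.8.8.8 from the config above)
ping -c 3 google.com
```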
Routing Configuration for Worker Nodes
Since we use the master node's `eth1` (192.168.10.1) as the gateway for the worker nodes (`GATEWAY=192.168.10.1`), the setting above creates a default route on each worker node that sends all traffic not destined for the local network (192.168.10.0/24) to the master node (192.168.10.1).
```bash
$ route -n
# Result: send all traffic (0.0.0.0/0) to gateway 192.168.10.1
0.0.0.0         192.168.10.1    0.0.0.0         UG    0      0        0 eth0
```
Manual Command
A manual command can achieve the same result:
```bash
route add -net 0.0.0.0 gw 192.168.10.1
```
This manually adds a default route to the current routing table. It has the same immediate effect as the configuration file setting, but it's temporary and will be lost after a reboot or network service restart.
The difference is primarily in persistence and when the configuration happens. Using the network configuration file is the standard way to set up permanent routes in CentOS/RHEL systems.
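Besides the `GATEWAY=` line in the ifcfg file, CentOS/RHEL also reads per-interface `route-<interface>` files on every network restart; a minimal sketch of that alternative:

```bash
# Routes in this file survive reboots and network restarts
cat > /etc/sysconfig/network-scripts/route-eth0 << EOF
default via 192.168.10.1 dev eth0
EOF

# Re-read the configuration
systemctl restart network
```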
Firewall
```bash
# Clear existing rules
iptables -F
iptables -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from cluster nodes only
iptables -A INPUT -p tcp --dport 22 -s 192.168.10.0/24 -j ACCEPT

# HPC/MPI communication - comprehensive approach
# Allow all TCP/UDP between cluster nodes for parallel computing
# (optional if you don't plan to do parallel computing)
iptables -A INPUT -s 192.168.10.0/24 -p tcp -j ACCEPT
iptables -A INPUT -s 192.168.10.0/24 -p udp -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
systemctl enable iptables
```
3. Package management
3.1 Local yum Repo
Because the worker nodes stay inside the private network, they have no direct internet access, so we need a way to handle package installation and updates for them. We also want to control exactly which packages are available, so here we build a local yum repo.
```bash
# On master node
# Install required packages
yum install -y createrepo nginx

# Create repository directory
mkdir -p /var/www/html/centos-repo

# Configure Nginx
cat > /etc/nginx/conf.d/repo.conf << EOF
server {
    listen 80;
    server_name _;
    root /var/www/html;
    location / {
        autoindex on;
    }
}
EOF

# Start and enable Nginx
systemctl enable nginx
systemctl start nginx

# Download packages (with their dependencies) to the repository
yum install -y yum-utils
repotrack -p /var/www/html/centos-repo <package-name>
# Repeat for the packages we need

# Create repository metadata
createrepo /var/www/html/centos-repo

# On each worker node: configure it to use this repository
cat > /etc/yum.repos.d/cluster-local.repo << EOF
[cluster-local]
name=Cluster Local Repository
baseurl=http://192.168.10.1/centos-repo
enabled=1
gpgcheck=0
EOF
```
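On a worker node, installation then goes entirely through the local repo; a usage sketch (the `--disablerepo`/`--enablerepo` flags just make the source explicit):

```bash
# On a worker node: install from the local repo only
yum --disablerepo="*" --enablerepo="cluster-local" install -y <package-name>

# On the master node: after adding new RPMs, refresh the metadata
createrepo --update /var/www/html/centos-repo

# Then on the workers: clear the cache so the change is picked up
yum clean all
```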
3.2 Optional: manual package management
To support worker-node package updates and self-managed packages, we can also use scp to transfer packages from the master node.
```bash
# On master node, download the RPM and transfer it to a worker node
yum install -y yum-utils
yumdownloader <package-name>
scp <package-name>.rpm 192.168.10.2:/tmp/

# On the worker node
sudo rpm -ivh /tmp/<package-name>.rpm
```
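To push the same RPM to all three workers and install it in one go, a small loop like this works (assuming passwordless SSH, which we set up in section 4):

```bash
# Distribute and install an RPM on every worker node
for i in {2..4}; do
  scp <package-name>.rpm 192.168.10.$i:/tmp/
  ssh 192.168.10.$i "rpm -ivh /tmp/<package-name>.rpm"
done
```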
3.3 Directly route worker nodes to the internet
If we don't need high security, we can also open the private cluster to the public internet. This requires configuring the routing table, which we don't discuss here.
4. Other Installations
SSH configuration
```bash
# On master node, generate SSH key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

# Copy the key to all nodes (including itself)
for i in {1..4}; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.10.$i
done

# Do the same on each worker node to allow any-to-any communication
# (Run similar commands on each worker node)
```
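Before moving on, a quick loop to verify that passwordless SSH really works everywhere:

```bash
# Each node should print its hostname without prompting for a password
# (BatchMode makes ssh fail instead of prompting)
for i in {1..4}; do
  ssh -o BatchMode=yes 192.168.10.$i hostname
done
```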
Network Monitor & iptables Log
```bash
# Install tools
yum install -y tcpdump nmap iftop

# Set up automatic monitoring with fail2ban to prevent brute force attacks
yum install -y fail2ban
cat > /etc/fail2ban/jail.local << EOF
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/secure
maxretry = 5
bantime = 3600
EOF

# Start and enable fail2ban
systemctl enable fail2ban
systemctl start fail2ban

# Add logging rules before the final DROP rules
iptables -A INPUT -j LOG --log-prefix "IPTables-Input-Dropped: " --log-level 4
iptables -A FORWARD -j LOG --log-prefix "IPTables-Forward-Dropped: " --log-level 4

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
```
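With the LOG rules in place, dropped packets show up in the kernel log; a quick way to inspect them (the log location assumes the default rsyslog setup on CentOS):

```bash
# Review recently dropped packets
grep "IPTables-Input-Dropped" /var/log/messages | tail -20

# Or follow drops live
tail -f /var/log/messages | grep "IPTables-"
```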
Optional Parallel Computing Configuration
```bash
# On all nodes (master and workers)
# Install OpenMPI
yum install -y openmpi openmpi-devel

# Configure environment in /etc/profile.d/
cat > /etc/profile.d/mpi.sh << EOF
export PATH=\$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
EOF

# Source the new environment
source /etc/profile.d/mpi.sh

# Test MPI communication
# Create a hostfile
cat > /home/username/hostfile << EOF
192.168.10.1 slots=128
192.168.10.2 slots=128
192.168.10.3 slots=128
192.168.10.4 slots=128
EOF

# Run a simple MPI test
mpirun -np 4 --hostfile /home/username/hostfile hostname
```
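Beyond `hostname`, a tiny MPI program is a better end-to-end test, since it exercises the actual MPI communication layer. A minimal sketch; run it from a directory visible on all nodes (e.g. a shared NFS home), or copy the binary to the same path on each node first:

```bash
# Write a minimal MPI hello-world
cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */
    MPI_Get_processor_name(name, &len);    /* which node we run on */
    printf("Hello from rank %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}
EOF

# Compile and run across the cluster
mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 --hostfile /home/username/hostfile ./hello_mpi
```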
Torque/PBS
```bash
# Install Torque on master node
yum install -y torque-server torque-scheduler torque-client

# Configure server nodes file
cat > /var/torque/server_priv/nodes << EOF
192.168.10.1 np=128
192.168.10.2 np=128
192.168.10.3 np=128
192.168.10.4 np=128
EOF

# Start Torque server
systemctl enable pbs_server
systemctl start pbs_server

# Install Torque on worker nodes
for i in {2..4}; do
  ssh 192.168.10.$i "yum install -y torque-mom torque-client; systemctl enable pbs_mom; systemctl start pbs_mom"
done
```
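Once `pbs_mom` is running on the workers, submitting a first trivial job confirms the queue works end to end; a minimal sketch (the queue name `batch` is an assumption, adjust it to whatever `qstat -Q` reports on your server):

```bash
# A minimal PBS job script
cat > test.pbs << 'EOF'
#!/bin/bash
#PBS -N test-job
#PBS -l nodes=1:ppn=1
#PBS -q batch
hostname
EOF

# Submit the job and check its status
qsub test.pbs
qstat
```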
5. Traffic Flow
Forward Chain Traffic Flow in Both Directions
When we create the forward chain rule

```bash
iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT
```

we allow traffic that comes in on `eth1` to be forwarded out through `eth0`: traffic from the worker nodes arrives at the master node, and the master node forwards it on to the institute network. Here comes the question: where is the backward flow?

Let's see how the traffic flows first.
Outbound Traffic (Worker → Internet)
The rule above allows packets from the worker nodes (coming in on `eth1`) to be forwarded out to the institute network (through `eth0`). This handles the first half of any connection: the outbound request.
Return Traffic (Internet → Worker)
For the return traffic, we would typically expect to need a rule like:

```bash
iptables -A FORWARD -i eth0 -d 192.168.10.0/24 -o eth1 -j ACCEPT
```
However, if we look closely at the original configuration, it already contains this command:

```bash
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
```
This rule handles the return traffic, because:
- When a worker node initiates a connection, the outbound packet creates an entry in the connection tracking table (establishing the connection state)
- Any returning packets associated with that connection are marked as ESTABLISHED
- The rule above allows all ESTABLISHED connections through, regardless of interface
This is more secure than explicitly allowing all traffic from `eth0` to `eth1`, because it only permits return traffic for connections that were initiated from inside our cluster.
If this state tracking rule weren't present, we would absolutely need the explicit return-traffic rule above. Without either approach, connections would work one way only: worker nodes could send requests but never receive responses.
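You can watch this connection tracking in action; a quick look at the kernel's tracking table (the `conntrack` tool comes from the conntrack-tools package):

```bash
# Install the userspace tool
yum install -y conntrack-tools

# List tracked connections; worker entries show both the original
# tuple (192.168.10.x) and the NAT-ed reply tuple
conntrack -L | grep 192.168.10.
```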
Connection Flow
Let's trace a web request from a worker node:
- Worker (192.168.10.2) tries to access google.com
- Packet travels: Worker → Master's `eth1`
- Master checks the FORWARD chain, matches the `-i eth1 -s 192.168.10.0/24 -o eth0` rule
- Master performs NAT, changing the source IP to its own public IP
- Packet leaves through `eth0` to the institute network
- Google responds to the master's public IP
- Packet arrives at the master's `eth0`
- Master checks the connection tracking table, sees this is a response
- Packet is marked as ESTABLISHED
- Master checks the FORWARD chain, matches the ESTABLISHED rule
- Master performs reverse NAT, changing the destination to the worker's IP
- Packet leaves through `eth1` to the worker
- Worker receives the response
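You can observe both halves of this trace live with tcpdump on the master node; a sketch assuming the interface names and addresses above:

```bash
# On the private side: see the worker's original source IP
tcpdump -ni eth1 host 192.168.10.2 and port 80

# On the public side: the same flow, now with the master's own IP
# as the source (the NAT rewrite happens in between)
tcpdump -ni eth0 port 80
```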
More details
For more details, you can check the earlier post here.