This is a revised version of an older note.
Nowadays most of us prefer a cloud cluster over a self-managed one: less management overhead, high availability, better security, pay-as-you-go billing, and all the other advantages of cloud computing. However, if you happen to own several old computers, don't want to sell or give them away, and don't know what else to do with them, a home-managed cluster is a good choice. There is a lot of fun in running a self-managed cluster.
Architecture
Let's define our architecture here: 4 nodes, 1 master (which can also act as a worker) and 3 workers.
1. Master node
- eth0: connects to the institute network by cable
  - public interface, using DHCP
  - internet downloads and updates
  - user access via ssh/scp
- eth1: connects to the worker nodes
  - private interface, using a static IP
  - communication with the other nodes:
    - ssh/scp
    - data transfer
    - parallel communication
Configure the network interfaces eth0 and eth1:
```bash
# Configure public interface (assumes DHCP from institute network)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
# Request a stable address from our institute's DHCP server if possible
# This makes routing more reliable
DHCP_CLIENT_ID=cluster-master
EOF

# Configure private cluster network
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 << EOF
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
EOF

# Apply new network configuration
systemctl restart network
```
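After the restart, it's worth checking that both interfaces came up as expected; a quick sanity check using standard iproute2 commands:

```bash
# Verify both interfaces are up with the expected addresses
ip addr show eth0   # should show a DHCP-assigned institute address
ip addr show eth1   # should show 192.168.10.1/24

# Confirm the private link is actually up
ip link show eth1 | grep "state UP"
```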
NAT
```bash
# Enable IP forwarding persistently
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# Enable connection tracking timeout optimization for HPC workloads
echo "net.netfilter.nf_conntrack_tcp_timeout_established = 86400" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max = 131072" >> /etc/sysctl.conf

# Apply sysctl changes
sysctl -p

# Set up NAT (masquerade traffic from the private cluster network)
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.10.0/24 -j MASQUERADE
```
The last command is important, as it is what allows return traffic to find its way back to the workers. We will explain this later.
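To confirm the masquerading rule is actually in place, a quick sanity check:

```bash
# List the NAT table; the MASQUERADE rule should appear under POSTROUTING
iptables -t nat -L POSTROUTING -n -v
```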
Set up the firewall:
```bash
# Clear existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP   # We'll explain this choice
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established and related connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from institute network
iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT

# Allow all traffic from the cluster's private network
iptables -A INPUT -i eth1 -s 192.168.10.0/24 -j ACCEPT

# Allow forwarding from cluster to internet
iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT

# Allow HTTP/HTTPS for package downloads
iptables -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 443 -j ACCEPT

# For the local package caching repository we set up later (optional)
iptables -A INPUT -i eth1 -p tcp --dport 80 -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
```
I set the default FORWARD policy to DROP for security reasons:
- It prevents unauthorized traffic from traversing the master node
- It creates a default-deny stance, where only explicitly allowed traffic passes
- It prevents potential lateral movement if one node is compromised
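As a quick check that this default-deny stance is active after applying the rules above:

```bash
# The policy in each chain header should read DROP for INPUT and FORWARD
iptables -L -n -v | head -5

# Inspect the FORWARD chain rules and their packet counters
iptables -L FORWARD -n -v --line-numbers
```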
2. Worker nodes
Configuration
```bash
# Worker node 1 (192.168.10.2)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.2
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=8.8.8.8
ONBOOT=yes
EOF
# GATEWAY automatically generates the default route in the routing table

# Worker node 2 (192.168.10.3): change IPADDR=192.168.10.3
# Worker node 3 (192.168.10.4): change IPADDR=192.168.10.4

# Restart network service on each worker node
systemctl restart network
```
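Once a worker is configured, a quick way to confirm both the private link and the NAT path (assuming the master's NAT and firewall rules from above are already in place):

```bash
# From a worker node: test the private link to the master
ping -c 3 192.168.10.1

# Test internet access through the master's NAT
ping -c 3 8.8.8.8

# Test DNS resolution (uses DNS1=8.8.8.8 from the config above)
ping -c 3 google.com
```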
Routing Configuration for Worker Nodes
Since we use the master node's `eth1` (192.168.10.1) as the gateway for the worker nodes (`GATEWAY=192.168.10.1`), the setting above creates a default route on each worker node that sends all traffic not destined for the local network (192.168.10.0/24) to the master node (192.168.10.1).
```bash
$ route -n
# Result: send all traffic (0.0.0.0/0) to gateway 192.168.10.1
0.0.0.0         192.168.10.1    0.0.0.0         UG    0      0        0 eth0
```
Manual Command
A manual command can achieve the same result:
```bash
route add -net 0.0.0.0 gw 192.168.10.1
```
This manually adds a default route to the current routing table. It has the same immediate effect as the configuration file setting, but it's temporary and will be lost after a reboot or network service restart.
The difference is primarily in persistence and when the configuration happens. Using the network configuration file is the standard way to set up permanent routes in CentOS/RHEL systems.
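Besides the `GATEWAY=` line in the ifcfg file, CentOS/RHEL also reads per-interface `route-<interface>` files on every network restart; a minimal sketch of that alternative:

```bash
# Routes in this file survive reboots and network restarts
cat > /etc/sysconfig/network-scripts/route-eth0 << EOF
default via 192.168.10.1 dev eth0
EOF

# Re-read the configuration
systemctl restart network
```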
Firewall
```bash
# Clear existing rules
iptables -F
iptables -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from cluster nodes only
iptables -A INPUT -p tcp --dport 22 -s 192.168.10.0/24 -j ACCEPT

# HPC/MPI communication - comprehensive approach
# Allow all TCP/UDP between cluster nodes for parallel computing
# (optional if you don't plan to do parallel computing)
iptables -A INPUT -s 192.168.10.0/24 -p tcp -j ACCEPT
iptables -A INPUT -s 192.168.10.0/24 -p udp -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
systemctl enable iptables
```
3. Package management
3.1 Local yum Repo
Because the worker nodes stay inside the private network, they have no direct internet access, so we need a way to handle package installation and updates for them. We also want to control exactly which packages are available, so here we build a local yum repo.
```bash
# On master node
# Install required packages
yum install -y createrepo nginx

# Create repository directory
mkdir -p /var/www/html/centos-repo

# Configure Nginx
cat > /etc/nginx/conf.d/repo.conf << EOF
server {
    listen 80;
    server_name _;
    root /var/www/html;
    location / {
        autoindex on;
    }
}
EOF

# Start and enable Nginx
systemctl enable nginx
systemctl start nginx

# Download packages (with their dependencies) to the repository
yum install -y yum-utils
repotrack -p /var/www/html/centos-repo <package-name>
# Repeat for the packages we need

# Create repository metadata
createrepo /var/www/html/centos-repo

# On each worker node: configure it to use this repository
cat > /etc/yum.repos.d/cluster-local.repo << EOF
[cluster-local]
name=Cluster Local Repository
baseurl=http://192.168.10.1/centos-repo
enabled=1
gpgcheck=0
EOF
```
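On a worker node, installation then goes entirely through the local repo; a usage sketch (the `--disablerepo`/`--enablerepo` flags just make the source explicit):

```bash
# On a worker node: install from the local repo only
yum --disablerepo="*" --enablerepo="cluster-local" install -y <package-name>

# On the master node: after adding new RPMs, refresh the metadata
createrepo --update /var/www/html/centos-repo

# Then on the workers: clear the cache so the change is picked up
yum clean all
```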
3.2 Optional: manual package management
To support worker-node package updates and self-managed packages, we can also use scp to transfer packages from the master node.
```bash
# On master node, download the RPM and transfer it to a worker node
yum install -y yum-utils
yumdownloader <package-name>
scp <package-name>.rpm 192.168.10.2:/tmp/

# On the worker node
sudo rpm -ivh /tmp/<package-name>.rpm
```
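To push the same RPM to all three workers and install it in one go, a small loop like this works (assuming passwordless SSH, which we set up in section 4):

```bash
# Distribute and install an RPM on every worker node
for i in {2..4}; do
  scp <package-name>.rpm 192.168.10.$i:/tmp/
  ssh 192.168.10.$i "rpm -ivh /tmp/<package-name>.rpm"
done
```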
3.3 Directly route worker nodes to the internet
If we don't need high security, we can also open the private cluster to the public internet. This requires configuring the routing table, which we don't discuss here.
4. Other Installations
SSH configuration
```bash
# On master node, generate SSH key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

# Copy the key to all nodes (including itself)
for i in {1..4}; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.10.$i
done

# Do the same on each worker node to allow any-to-any communication
# (Run similar commands on each worker node)
```
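Before moving on, a quick loop to verify that passwordless SSH really works everywhere:

```bash
# Each node should print its hostname without prompting for a password
# (BatchMode makes ssh fail instead of prompting)
for i in {1..4}; do
  ssh -o BatchMode=yes 192.168.10.$i hostname
done
```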
Network Monitor & iptables Log
```bash
# Install tools
yum install -y tcpdump nmap iftop

# Set up automatic monitoring with fail2ban to prevent brute force attacks
yum install -y fail2ban
cat > /etc/fail2ban/jail.local << EOF
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/secure
maxretry = 5
bantime = 3600
EOF

# Start and enable fail2ban
systemctl enable fail2ban
systemctl start fail2ban

# Add logging rules before the final DROP rules
iptables -A INPUT -j LOG --log-prefix "IPTables-Input-Dropped: " --log-level 4
iptables -A FORWARD -j LOG --log-prefix "IPTables-Forward-Dropped: " --log-level 4

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
```
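With the LOG rules in place, dropped packets show up in the kernel log; a quick way to inspect them (the log location assumes the default rsyslog setup on CentOS):

```bash
# Review recently dropped packets
grep "IPTables-Input-Dropped" /var/log/messages | tail -20

# Or follow drops live
tail -f /var/log/messages | grep "IPTables-"
```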
Optional Parallel Computing Configuration
```bash
# On all nodes (master and workers)
# Install OpenMPI
yum install -y openmpi openmpi-devel

# Configure environment in /etc/profile.d/
cat > /etc/profile.d/mpi.sh << EOF
export PATH=\$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
EOF

# Source the new environment
source /etc/profile.d/mpi.sh

# Test MPI communication
# Create a hostfile
cat > /home/username/hostfile << EOF
192.168.10.1 slots=128
192.168.10.2 slots=128
192.168.10.3 slots=128
192.168.10.4 slots=128
EOF

# Run a simple MPI test
mpirun -np 4 --hostfile /home/username/hostfile hostname
```
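Beyond `hostname`, a tiny MPI program is a better end-to-end test, since it exercises the actual MPI communication layer. A minimal sketch; run it from a directory visible on all nodes (e.g. a shared NFS home), or copy the binary to the same path on each node first:

```bash
# Write a minimal MPI hello-world
cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */
    MPI_Get_processor_name(name, &len);    /* which node we run on */
    printf("Hello from rank %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}
EOF

# Compile and run across the cluster
mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 --hostfile /home/username/hostfile ./hello_mpi
```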
Torque/PBS
```bash
# Install Torque on master node
yum install -y torque-server torque-scheduler torque-client

# Configure server nodes file
cat > /var/torque/server_priv/nodes << EOF
192.168.10.1 np=128
192.168.10.2 np=128
192.168.10.3 np=128
192.168.10.4 np=128
EOF

# Start Torque server
systemctl enable pbs_server
systemctl start pbs_server

# Install Torque on worker nodes
for i in {2..4}; do
  ssh 192.168.10.$i "yum install -y torque-mom torque-client; systemctl enable pbs_mom; systemctl start pbs_mom"
done
```
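Once `pbs_mom` is running on the workers, submitting a first trivial job confirms the queue works end to end; a minimal sketch (the queue name `batch` is an assumption, adjust it to whatever `qstat -Q` reports on your server):

```bash
# A minimal PBS job script
cat > test.pbs << 'EOF'
#!/bin/bash
#PBS -N test-job
#PBS -l nodes=1:ppn=1
#PBS -q batch
hostname
EOF

# Submit the job and check its status
qsub test.pbs
qstat
```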
5. Traffic Flow
Forward Chain Traffic Flow in Both Directions
When we create the forward chain rule

```bash
iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT
```

we allow traffic that comes in on `eth1` to be forwarded out through `eth0`: traffic from the worker nodes arrives at the master node, and the master node forwards it on to the institute network. Here comes the question: where is the backward flow?

Let's see how the traffic flows first.
Outbound Traffic (Worker → Internet)
The rule above allows packets from the worker nodes (coming in on `eth1`) to be forwarded out to the institute network (through `eth0`). This handles the first half of any connection: the outbound request.
Return Traffic (Internet → Worker)
For the return traffic, we would typically expect to need a rule like:

```bash
iptables -A FORWARD -i eth0 -d 192.168.10.0/24 -o eth1 -j ACCEPT
```
However, if we look closely at the original configuration, it already contains this command:

```bash
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
```
This rule handles the return traffic, because:
- When a worker node initiates a connection, the outbound packet creates an entry in the connection tracking table (establishing the connection state)
- Any returning packets associated with that connection are marked as ESTABLISHED
- The rule above allows all ESTABLISHED connections through, regardless of interface
This is more secure than explicitly allowing all traffic from `eth0` to `eth1`, because it only permits return traffic for connections that were initiated from inside our cluster.
If this state tracking rule weren't present, we would absolutely need the explicit return-traffic rule above. Without either approach, connections would work one way only: worker nodes could send requests but never receive responses.
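You can watch this connection tracking in action; a quick look at the kernel's tracking table (the `conntrack` tool comes from the conntrack-tools package):

```bash
# Install the userspace tool
yum install -y conntrack-tools

# List tracked connections; worker entries show both the original
# tuple (192.168.10.x) and the NAT-ed reply tuple
conntrack -L | grep 192.168.10.
```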
Connection Flow
Let's trace a web request from a worker node:
- Worker (192.168.10.2) tries to access google.com
- Packet travels: Worker → Master's `eth1`
- Master checks the FORWARD chain, matches the `-i eth1 -s 192.168.10.0/24 -o eth0` rule
- Master performs NAT, changing the source IP to its own public IP
- Packet leaves through `eth0` to the institute network
- Google responds to the master's public IP
- Packet arrives at the master's `eth0`
- Master checks the connection tracking table, sees this is a response
- Packet is marked as ESTABLISHED
- Master checks the FORWARD chain, matches the ESTABLISHED rule
- Master performs reverse NAT, changing the destination to the worker's IP
- Packet leaves through `eth1` to the worker
- Worker receives the response
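You can observe both halves of this trace live with tcpdump on the master node; a sketch assuming the interface names and addresses above:

```bash
# On the private side: see the worker's original source IP
tcpdump -ni eth1 host 192.168.10.2 and port 80

# On the public side: the same flow, now with the master's own IP
# as the source (the NAT rewrite happens in between)
tcpdump -ni eth0 port 80
```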
More details
For more details, you can check the earlier post here.