I know how to run two or more Tomcats under one Apache server. I want to know how to run two or more Apache servers in a cluster and run my application on them. I know it's possible. Can someone suggest a simple tutorial? There are many articles on Apache-Tomcat integration, but not on Apache clustering. :( It would be great if you could suggest a basic tutorial. Thanks.
4 Answers
Get a software or hardware load balancer and put it in front of the Apache servers.
Simple tutorial:
1) Install a free load balancer: "balance" from http://www.inlab.de/balance-3.42.tar.gz
2) Run: balance -f 80 server1:80 server2:80 server3:80
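To make the idea behind step 2 concrete, here is a minimal sketch of round-robin backend selection with an optional health check. It only illustrates the general technique; it is not how "balance" is implemented, and the `RoundRobin` class and `is_up` callback are hypothetical names invented for this example.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in rotation, skipping any that are reported down."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._cycle = cycle(self._backends)

    def next_backend(self, is_up=lambda backend: True):
        # Try each backend at most once per call so we never loop forever
        # when everything is down.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if is_up(candidate):
                return candidate
        return None  # no healthy backend available


if __name__ == "__main__":
    rr = RoundRobin(["server1:80", "server2:80", "server3:80"])
    for _ in range(4):
        print(rr.next_backend())
```

A real balancer also has to proxy the TCP connection itself; the sketch covers only the selection step.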
Advanced tutorial: The problem with the solution above is that it introduces a single point of failure. If your "balance" server dies, you cannot access anything behind it.
If you want to go with a home-grown GPL solution, you need to run "vrrpd", "heartbeat" or some other failover cluster solution (or google: linux load balancing).
If you want to stay with a software solution, but have money, you could look at Red Hat Cluster, Veritas Cluster, or some other vendor's cluster software.
Your best and most reliable solution, though, is to get a pair of hardware load balancers (which will do failover automatically).
-  Great answer, I vote the OP makes this the winner. I work with clustering all day long and this is a great, simple suggestion that's legit. – Ethode, Apr 12, 2015 at 12:59
I'm assuming that you want to run this on multiple machines (running multiple Apache instances on the one machine makes very little sense).
Unfortunately, you can't find a simple tutorial for HA clustering because it's not a simple topic. You've got a lot of different concepts, ranging from low-level network stuff to application-specific items like shared data storage.
I started my clustering adventures at http://linuxvirtualserver.org/. You might like to start at http://www.linuxvirtualserver.org/architecture.html and go from there.
Looking at a pair of hardware appliances will certainly be a good start. You can do it in software, but your networking team (if you have one) may object to you performing network-related functions on non-approved equipment -- this can limit your choice of failover technologies.
Apache (httpd) itself doesn't know anything about clustering httpd instances. You should therefore have a way (e.g. a script) to synchronise the config from one node to the other worker nodes. (I'll assume that your httpd instances are working as a reverse proxy, although that doesn't have to be the case.)
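As a rough sketch of such a sync script, the following builds the rsync and ssh commands that would push a config directory to each worker and then gracefully reload httpd once the config test passes. The host names, the `/etc/httpd/conf` path and the `build_sync_commands` helper are assumptions made for illustration, and it presumes passwordless ssh and `apachectl` on each worker's PATH.

```python
import shlex


def build_sync_commands(config_dir, workers):
    """Return the commands that would copy config_dir to every worker
    and reload httpd there, without executing anything."""
    # A trailing slash makes rsync copy the directory's contents.
    src = config_dir.rstrip("/") + "/"
    commands = []
    for host in workers:
        # -a preserves permissions/times, -z compresses, --delete removes
        # files on the worker that no longer exist on the source node.
        commands.append(["rsync", "-az", "--delete", src, f"{host}:{src}"])
        # Only reload if the copied config parses cleanly.
        commands.append(["ssh", host, "apachectl configtest && apachectl graceful"])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    hosts = ["web2.example.com", "web3.example.com"]  # hypothetical workers
    for cmd in build_sync_commands("/etc/httpd/conf", hosts):
        print(shlex.join(cmd))
```

To actually apply the changes, each command could be passed to `subprocess.run(cmd, check=True)`; keeping the default as a dry run makes it easy to review what would happen first.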
You will certainly want to invest some time and energy in centralising the logging, and for this I have found the ELK stack to be tremendously useful. Here's a fairly extended CustomLog declaration -- everything in there has proved to be useful at some point.
    # Note also that httpd will escape " to \", plus various others... (see the docs),
    # which conveniently matches up with JSON's requirements.
    # (well, almost; it doesn't do Unicode correctly)
    #
    # THINGS TO NOTE/CHECK/ADD/REMOVE:
    #   Any session cookies are good to log
    #     Example: JSESSION (change/remove as required)
    #   Any particular HTTP request headers (particularly for servers behind a reverse proxy)
    #   my_application_stack
    #     Set this to something obvious for the stack you're working on (eg. 'main_website')
    #
    # Also, you'll want to remove the ### comments below, and ensure that the \ is the last character on each line
    #
    LogFormat "{ \
        \"@timestamp\":\"%{%FT%T%z}t\", \
        \"client_ip\":\"%a\", \
        \"client_port\":\"%{remote}p\", \
        \"server_ip\":\"%A\", \
        \"X-Forwarded-For\":\"%{X-Forwarded-For}i\", \
        \"user\":\"%u\", \                      ### Note: probably not useful unless Apache is doing auth (eg. Basic auth)
        \"JSESSIONID\":\"%{JSESSIONID}C\", \    ### CHANGE
        \"pid\":\"%P\", \
        \"protocol\":\"%H\", \
        \"http_method\":\"%m\", \
        \"vhost\":\"%{Host}i\", \
        \"service_port\":\"%p\", \
        \"path\":\"%U\", \
        \"query_string\":\"%q\", \
        \"referer\":\"%{Referer}i\", \
        \"user_agent\":\"%{User-agent}i\", \
        \"response_code\":\"%>s\", \
        \"response_location\":\"%{Location}o\", \
        \"Content-Type\":\"%{Content-Type}o\", \
        \"bytes_in\":\"%I\", \
        \"bytes_out\":\"%O\", \
        \"keepalive\":\"%X\", \
        \"duration_micros\":\"%D\", \
        \"my_application_stack\":\"change_me\", \    ### CHANGE
        \"my_environment\":\"prod\" \                ### CHANGE
    }" my_logstash_json
    CustomLog logs/access_log.logstash_json my_logstash_json

To log cache hit/miss in Apache 2.2 (this is better in 2.4, but we don't use it):
    SetEnv CACHE_MISS 1

And then include %{CACHE_MISS}e in your LogFormat.
Don't forget a logrotate rule:
    /var/log/httpd/access_log.logstash_json {
        rotate 1
        # We do reload httpd after each rotation, so we don't want it to be
        # too frequent, so set to a reasonably chunky value
        size 10M
        nocompress
        missingok
        notifempty
        sharedscripts
        postrotate
            /sbin/service httpd reload > /dev/null 2>/dev/null || true
        endscript
    }

You would then need to choose a log shipper (there are many to choose from), which will tail the log file and send the entries off to the ELK stack.
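Before picking a shipper, it can help to check that the lines httpd writes really parse as JSON. The sketch below parses one line of access_log.logstash_json and converts a few numeric fields. It assumes the field names from the LogFormat above, and it works around httpd's `\xhh` escapes for non-printable bytes (not valid JSON) by rewriting them as `\u00hh`; that keeps the line parseable but does not reconstruct multi-byte UTF-8 characters. The function name is made up for this example.

```python
import json
import re

# Match either an escaped backslash (left alone) or an httpd-style \xhh escape.
_HEX_ESCAPE = re.compile(r"\\\\|\\x([0-9a-fA-F]{2})")

NUMERIC_FIELDS = ("response_code", "duration_micros", "bytes_in", "bytes_out")


def _fix_escape(match):
    if match.group(1) is None:
        return match.group(0)          # "\\" pair: keep as-is
    return "\\u00" + match.group(1)    # \xhh -> \u00hh, which JSON accepts


def parse_access_line(line):
    """Parse one JSON access-log line into a dict with numeric fields as int."""
    event = json.loads(_HEX_ESCAPE.sub(_fix_escape, line))
    for key in NUMERIC_FIELDS:
        value = event.get(key)
        # httpd logs "-" when a value is absent, so only convert real digits.
        if isinstance(value, str) and value.isdigit():
            event[key] = int(value)
    return event


if __name__ == "__main__":
    sample = r'{"path":"/index.html","response_code":"200","duration_micros":"1500","user":"-"}'
    print(parse_access_line(sample))
```

A quick loop over the log file with this function will surface any line that still fails to parse, which is cheaper to find here than after it has been shipped.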
Note that comparing configurations is sometimes very useful, whether you are auditing for differences or just finding out where something is set. For that, I wrote a tool called httpd-dump-config, which I use very frequently when working on the 1000+ lines of our reverse-proxy config.
Use DNS round robin if you have more than one IP address.
Otherwise, I would encourage you to use Pacemaker+Corosync with IP and Apache failover:
How To Set Up an Apache Active-Passive Cluster Using Pacemaker on CentOS 7
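For the DNS round-robin option, a quick way to confirm that a name really resolves to several servers is to list every address the resolver returns. This is a minimal sketch using the standard library; the host name in the usage section is a placeholder, and the helper names are invented for this example.

```python
import socket


def unique_addresses(addrinfo):
    """Extract the distinct IP addresses, in order, from getaddrinfo() results."""
    seen = []
    for _family, _socktype, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve_all(hostname, port=80):
    """Return every address the resolver gives for hostname."""
    info = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return unique_addresses(info)


if __name__ == "__main__":
    # Placeholder name: replace with the round-robin record you configured.
    print(resolve_all("localhost"))
```

Keep in mind that plain DNS round robin does no health checking: a dead server's address keeps being handed out until the record is changed, which is why the failover approach is suggested as the alternative.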




