7

I have an Apache/nginx/whatever web server that logs client IP addresses to its access logs. These log files are rotated via logrotate.

I want to keep the IP addresses for a few days; after 7 days, I want to remove the IPs from the log files for privacy reasons (mostly dictated by German law).

Using mod_removeip or something like that doesn't work because I need to filter some requests based on their IP addresses.

Is there any 'standard' way to do it? Maybe even with logrotate?

EDIT

I just found this script, but it depends on the ability to pipe all logging through the script in real time. I'm not really sure about the performance implications of this approach.

Also, this only works for the 'front-end' server logs, not the application server logs.

3 Answers

2

PCRE! (Perl-Compatible Regular Expression)

s/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g 

Use that as a filter in a Perl script or any other suitable language (quite a few use PCRE or a close-enough regex dialect that will work) to rewrite your log files once they are 7 days old.

$ cat > file_with_ip
some text from 192.168.1.1
^D
$ perl -p -i -e 's/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g' file_with_ip
$ cat file_with_ip
some text from REMOVED IP
1

On Ubuntu > 12.04 with Apache 2.4 and the default configuration, you could use something like this:

# Process compressed logs older than 7 days that have not been anonymized yet.
for file in $(find /var/log/apache2 -type f -name "*.gz" ! -name "*.ano.*" -mtime +7)
do
    datestamp=$(date +"%Y%m%d%H%M%S")
    # echo "Process $file"
    # Keep the first two octets of every IPv4 address and zero the last two.
    zcat "$file" | sed -E "s/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g" | gzip > "${file%.*}.ano.${datestamp}.gz"
    # rm -f "$file"  # Only call this if you are sure that the command before succeeds, otherwise you will lose data.
done

This creates a copy of every *.gz file older than 7 days. In the copy, which gets an additional .ano suffix plus a timestamp, the last two bytes of each IP are replaced with 0.0.
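To see what the substitution does to a single log line, here is a small sketch that wraps the same sed expression in a function. The name mask_ip and the sample log line are made up for illustration; the g flag is included so that every match on a line is rewritten.

```shell
# Hypothetical helper: zero the last two octets of every IPv4-looking
# token read from stdin (same sed expression as in the loop above).
mask_ip() {
    sed -E 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g'
}

# Made-up access log line, using a documentation-range address.
printf '%s\n' '203.0.113.42 - - "GET / HTTP/1.1" 200' | mask_ip
# prints: 203.0.0.0 - - "GET / HTTP/1.1" 200
```

Note that the pattern only looks for four dot-separated groups of digits, so anything shaped like that (for example a four-part version number in a user agent) would be rewritten as well.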

If you don't use compression, or use a different compression such as bz2, you have to change the commands accordingly, e.g. zcat -> bzcat.

Finally you can call this routine via cron once per day/week.
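As a rough illustration, a daily entry in root's crontab (edited with crontab -e) might look like the fragment below. The script path is an assumption; it stands for wherever you saved the loop above, and the 03:30 run time is arbitrary.

```crontab
# m  h  dom mon dow  command
# Hypothetical path; run the anonymization loop every day at 03:30.
30 3  *   *   *      /usr/local/sbin/anonymize-apache-logs.sh
```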

0

I don't think logrotate will do it; you may need to look at creating a script that will decompress the files, process them through awk or sed to strip the IPs out, then recompress them. You just can't do it on "active" log files.
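The decompress, filter, recompress cycle could be sketched roughly as follows. The function name anonymize_gz and the temporary file naming are assumptions, and the simple sed pattern matches anything that looks like four dot-separated numbers, so treat it as a starting point only.

```shell
# Hypothetical helper: strip IPv4 addresses from one gzip-compressed log.
# The original file is only replaced after the rewritten copy was written.
anonymize_gz() {
    src=$1
    tmp="$src.tmp.$$"
    # Refuse to touch files that are not intact gzip archives.
    gzip -t "$src" || return 1
    if gzip -dc "$src" \
        | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/REMOVED IP/g' \
        | gzip > "$tmp"
    then
        touch -r "$src" "$tmp"   # keep the original modification time
        mv "$tmp" "$src"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It could then be called for each file returned by find /var/log/apache2 -type f -name '*.gz' -mtime +7. Running it twice on the same file is harmless, since no addresses are left to replace. The rewritten file gets its permissions from the current umask, so adjust ownership and mode afterwards if your setup depends on them.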

3
  • 3
    I believe logrotate has pre/post hooks that you could use to launch the script you mention, then the OP wouldn't need to manage a separate process. Commented Feb 9, 2012 at 14:17
  • 3
    Maybe you can use logrotate's "postrotate" for this. Commented Feb 9, 2012 at 14:21
  • I even thought of creating a "compress" script that filters and then pipes to gzip. This would essentially save the step of decompressing the logs, but it would 'kill' the 7-day time window I want. Commented Feb 9, 2012 at 14:31
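
For reference, the postrotate idea from the comments might look something like the logrotate fragment below. This is only a sketch: the rotate count and the script path are assumptions, and the script is expected to pick files by age itself (for example with find -mtime +7), which is what keeps the 7-day window intact.

```
/var/log/apache2/*.log {
    daily
    rotate 30
    compress
    delaycompress
    sharedscripts
    postrotate
        # Keep whatever reload command your existing config already runs here.
        # Hypothetical script: rewrites rotated files older than 7 days.
        /usr/local/sbin/anonymize-old-logs.sh
    endscript
}
```

With sharedscripts, the postrotate block runs once per rotation rather than once per matching log file.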
