We're looking at a long-term migration towards the cloud. The plan is to start small, and gradually move less essential parts of the infrastructure into the cloud. All good so far.
Part of this migration includes log files from web servers and whatnot. Keep in mind that the servers are still in datacenters outside the cloud. It should be simple to have a cron job grab the log files at the end of each day, compress them, and shove them into Amazon S3, with a possible backup to Glacier. That part is easy.
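For context, the upload side would be something like this rough sketch (Python with boto3; the bucket name, log directory, and rotated-file naming are made-up assumptions, and the Glacier copy would be an S3 lifecycle rule rather than anything in the script):

```python
#!/usr/bin/env python3
"""Daily cron job: compress yesterday's rotated logs and push them to S3.
Sketch only -- bucket, paths, and file naming are hypothetical."""

import gzip
import shutil
from datetime import date, timedelta
from pathlib import Path

import boto3

BUCKET = "example-log-archive"          # hypothetical bucket
LOG_DIR = Path("/var/log/apache2")      # hypothetical log location


def main():
    s3 = boto3.client("s3")
    yesterday = (date.today() - timedelta(days=1)).isoformat()

    # Assumes rotated logs carry the date in their name, e.g. access-2013-05-01.log
    for log in LOG_DIR.glob(f"*{yesterday}*.log"):
        gz_path = log.parent / (log.name + ".gz")

        # Compress the rotated log before upload.
        with open(log, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # Key layout: host/year/month/day/filename.gz
        key = f"webserver01/{yesterday.replace('-', '/')}/{gz_path.name}"
        s3.upload_file(str(gz_path), BUCKET, key)


if __name__ == "__main__":
    main()
```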
The problem occurs when S3 is the only place where the logs are stored and you want to search them for various events. If you don't know the time interval, you may have to download all the logs from S3 for a comprehensive search, and that turns out to be expensive: moving data into the cloud is cheap, but getting it back out is not.
Or I could set up an EC2 instance template. When someone wants to do a log search, fire up the instance, download the logs from S3 to it, and grep away. Downloading files from S3 to EC2 is cheap, but it may take a while; and again, if you don't know what you're looking for, you need to download a lot of logs, which means using a lot of EBS space.
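As a rough idea of what that instance would run, here's a sketch that streams the compressed logs straight out of S3 and greps them with a regex instead of staging everything on EBS first. The bucket and key layout match the hypothetical upload sketch above and are assumptions, not anything we've built:

```python
"""Run on a throwaway EC2 instance: scan archived logs in S3 for a pattern.
Sketch only -- bucket, prefix layout, and regex are illustrative assumptions."""

import gzip
import re
import sys

import boto3

BUCKET = "example-log-archive"   # hypothetical bucket from the upload sketch


def search(prefix: str, pattern: str) -> None:
    s3 = boto3.client("s3")
    regex = re.compile(pattern)
    paginator = s3.get_paginator("list_objects_v2")

    # A prefix like "webserver01/2013/" limits the scan to one host and year.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Stream and decompress the object without writing it to disk.
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            with gzip.GzipFile(fileobj=body) as lines:
                for raw in lines:
                    line = raw.decode("utf-8", "replace").rstrip("\n")
                    if regex.search(line):
                        print(f"{obj['Key']}: {line}")


if __name__ == "__main__":
    search(sys.argv[1], sys.argv[2])
```

Even then, scanning a whole year this way still means pulling hundreds of gigabytes through the instance, so streaming only sidesteps the EBS part, not the time.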
Another option is to load the logs into DynamoDB or something similar. Price might be an issue, and so might the data itself: these are completely unstructured Apache and Squid logs and the like, so queries might take a very long time.
We're talking about 500 GB/year of compressed logs, kept for up to 5 years, so roughly 2.5 TB in total.
To me, storing logs in the cloud like this is starting to sound like a not-so-good idea. Maybe just use Glacier as a "tape backup", but keep the logs locally for now, on a couple of hard drives.
Which way do you lean?