We're looking at a long-term migration towards the cloud. The plan is to start small, and gradually move less essential parts of the infrastructure into the cloud. All good so far.
Part of this migration includes log files from web servers and whatnot. Keep in mind that the servers are still in datacenters outside the cloud. It should be simple to have a cron job grab the log files at the end of each day, compress them, and shove them into Amazon S3, with a possible backup to Glacier. That part is easy.
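For context, the upload side would be something like this rough sketch (Python with boto3; the bucket name, log directory, and rotated-file naming are made-up assumptions, and the Glacier copy would be an S3 lifecycle rule rather than anything in the script):

```python
#!/usr/bin/env python3
"""Daily cron job: compress yesterday's rotated logs and push them to S3.
Sketch only -- bucket, paths, and file naming are hypothetical."""

import gzip
import shutil
from datetime import date, timedelta
from pathlib import Path

import boto3

BUCKET = "example-log-archive"          # hypothetical bucket
LOG_DIR = Path("/var/log/apache2")      # hypothetical log location


def main():
    s3 = boto3.client("s3")
    yesterday = (date.today() - timedelta(days=1)).isoformat()

    # Assumes rotated logs carry the date in their name, e.g. access-2013-05-01.log
    for log in LOG_DIR.glob(f"*{yesterday}*.log"):
        gz_path = log.parent / (log.name + ".gz")

        # Compress the rotated log before upload.
        with open(log, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # Key layout: host/year/month/day/filename.gz
        key = f"webserver01/{yesterday.replace('-', '/')}/{gz_path.name}"
        s3.upload_file(str(gz_path), BUCKET, key)


if __name__ == "__main__":
    main()
```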
The problem occurs when S3 is the only place where the logs are stored and you want to search them for various events. If you don't know the time interval, you may have to download all the logs from S3 for a comprehensive search, and that turns out to be expensive: moving data into the cloud is cheap, but getting it back out is not.
Or I could set up an EC2 instance template. When someone wants to do a log search, fire up the instance, download the logs from S3 to it, and grep away. Downloading files from S3 to EC2 is cheap, but it may take a while; and again, if you don't know what you're looking for, you need to download a lot of logs, which means using a lot of EBS space.
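As a rough idea of what that instance would run, here's a sketch that streams the compressed logs straight out of S3 and greps them with a regex instead of staging everything on EBS first. The bucket and key layout match the hypothetical upload sketch above and are assumptions, not anything we've built:

```python
"""Run on a throwaway EC2 instance: scan archived logs in S3 for a pattern.
Sketch only -- bucket, prefix layout, and regex are illustrative assumptions."""

import gzip
import re
import sys

import boto3

BUCKET = "example-log-archive"   # hypothetical bucket from the upload sketch


def search(prefix: str, pattern: str) -> None:
    s3 = boto3.client("s3")
    regex = re.compile(pattern)
    paginator = s3.get_paginator("list_objects_v2")

    # A prefix like "webserver01/2013/" limits the scan to one host and year.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Stream and decompress the object without writing it to disk.
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            with gzip.GzipFile(fileobj=body) as lines:
                for raw in lines:
                    line = raw.decode("utf-8", "replace").rstrip("\n")
                    if regex.search(line):
                        print(f"{obj['Key']}: {line}")


if __name__ == "__main__":
    search(sys.argv[1], sys.argv[2])
```

Even then, scanning a whole year this way still means pulling hundreds of gigabytes through the instance, so streaming only sidesteps the EBS part, not the time.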
Another option is to load the logs into DynamoDB or something similar. Price might be an issue, and so might the data itself: these are completely unstructured Apache and Squid logs and the like, so queries might take a very long time.
We're talking about 500 GB/year of compressed logs, kept for up to 5 years, so roughly 2.5 TB in total.
To me, storing logs in the cloud like this is starting to sound like a not-so-good idea. Maybe just use Glacier as a "tape backup", but keep the logs locally for now, on a couple of hard drives.
Which way do you lean?