We generate about 3.4 million small JPEG files per day, and we also delete about 3.4 million 90-day-old images each day. To date, we've dealt with this content by storing the images in a hierarchical manner. The hierarchy is something like this:

/Year/Month/Day/Source/

This hierarchy allows us to efficiently delete a full day's worth of content across all sources.
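For the sake of concreteness, the layout and the daily purge can be sketched like this (a minimal sketch; the root directory name and function names are hypothetical, not our actual code):

```python
from datetime import date, timedelta
import os
import shutil

# Hypothetical volume root; in reality this is a drive on the RAID volume.
ROOT = "images"

def day_dir(d, source):
    """Build the /Year/Month/Day/Source path for one day's images."""
    return os.path.join(ROOT, f"{d.year:04d}", f"{d.month:02d}", f"{d.day:02d}", source)

def expired_day_path(today, retention_days=90):
    """Path of the day directory that has just aged past retention."""
    expired = today - timedelta(days=retention_days)
    return os.path.join(ROOT, f"{expired.year:04d}", f"{expired.month:02d}", f"{expired.day:02d}")

def purge_expired(today, retention_days=90):
    """One recursive delete removes the whole day across all sources."""
    path = expired_day_path(today, retention_days)
    if os.path.isdir(path):
        shutil.rmtree(path)
```

The point of the scheme is that expiry is a single directory-tree delete per day rather than 3.4 million individual file lookups.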
The files are stored on a Windows 2003 server connected to a 14 disk SATA RAID6.
We've started having significant performance issues when writing to and reading from the disks.
This may be due to the performance of the hardware, but I suspect that disk fragmentation may be a culprit as well.
Some people have recommended storing the data in a database, but I've been hesitant to do this. Another thought was to use some sort of container file, like a VHD or something.
Does anyone have any advice for mitigating this kind of fragmentation?
Additional Info:
The files average 8-14 KB in size.
Format information from fsutil:
NTFS Volume Serial Number :       0x2ae2ea00e2e9d05d
Version :                         3.1
Number Sectors :                  0x00000001e847ffff
Total Clusters :                  0x000000003d08ffff
Free Clusters :                   0x000000001c1a4df0
Total Reserved :                  0x0000000000000000
Bytes Per Sector :                512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment :    1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x000000208f020000
Mft Start Lcn :                   0x00000000000c0000
Mft2 Start Lcn :                  0x000000001e847fff
Mft Zone Start :                  0x0000000002163b20
Mft Zone End :                    0x0000000007ad2000
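For convenience, here's what those hex fields work out to (just arithmetic on the values fsutil printed above, nothing more):

```python
# Values copied verbatim from the fsutil output above.
BYTES_PER_CLUSTER = 4096
TOTAL_CLUSTERS = 0x3d08ffff
FREE_CLUSTERS = 0x1c1a4df0
MFT_BYTES = 0x208f020000        # Mft Valid Data Length
RECORD_SIZE = 1024              # Bytes Per FileRecord Segment

volume_bytes = TOTAL_CLUSTERS * BYTES_PER_CLUSTER   # ~4.19 TB volume
free_bytes = FREE_CLUSTERS * BYTES_PER_CLUSTER      # ~1.93 TB free
mft_records = MFT_BYTES // RECORD_SIZE              # ~136.5 million file record segments

print(f"volume: {volume_bytes / 1e12:.2f} TB, free: {free_bytes / 1e12:.2f} TB")
print(f"MFT: {MFT_BYTES / 1e9:.1f} GB, ~{mft_records:,} file records")
```

So the MFT alone is roughly 140 GB of valid data, which seems consistent with the sheer number of small files being created and deleted.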