I've got a very simple Java 7 app that reads from a proprietary file format (Hadoop SequenceFile) on disk xvdb and creates millions of 2-20 KB files on disk xvdf.

    ByteBuffer byteBuffer = ByteBuffer.wrap(imageBinary, 0, ((BytesWritable)value).getLength());
    File imageFile = new File(filePath);
    FileOutputStream fos = new FileOutputStream( imageFile );
    fos.getChannel().write(byteBuffer);
    fos.close();

Running iostat -d 30 shows that xvdf is seeing more than twice as much reading as writing. There is no other activity on this volume besides the application above, which only writes to this disk.

    Device:      tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvdap1      0.40         0.00         3.07          0         92
    xvdb       19.90       828.67         0.00      24860          0
    xvdap3      0.00         0.00         0.00          0          0
    xvdf      988.93      3538.93      1385.47     106168      41564

mount options:

 /dev/xvdf on /mnt/ebs1 type ext4 (rw,noatime,nodiratime) 

1 Answer

Creating a file requires first determining whether or not that file already exists. Since these files are so small, the reading of metadata to determine how and where to create the file exceeds the tiny write done once the file has been created.

If you're familiar with data structures, think about adding a tiny leaf node to a binary tree, B-tree, or similar structure. You're going to do a lot of reading to figure out where the leaf node goes, whether it's already in the tree, and so on. That reading will be much greater than the tiny amount of data in the leaf node.
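
To make the tree analogy concrete, here is a minimal sketch. It is not a model of ext4 itself: it uses an in-memory `java.util.TreeMap` (a red-black tree) as a stand-in for an on-disk directory index and counts how many existing keys are examined to place one new, empty leaf. The class name `LeafInsertCost`, the key scheme, and the tree size are illustrative assumptions.

```java
import java.util.Comparator;
import java.util.TreeMap;

class LeafInsertCost {

    // Returns how many existing keys are compared against while inserting
    // one new key into a tree that already holds `existingKeys` entries.
    static int comparisonsForOneInsert(int existingKeys) {
        final int[] comparisons = {0};
        Comparator<Integer> counting = new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                comparisons[0]++;          // each comparison is one "read"
                return a.compareTo(b);
            }
        };

        TreeMap<Integer, byte[]> tree = new TreeMap<Integer, byte[]>(counting);
        for (int i = 0; i < existingKeys; i++) {
            tree.put(2 * i, new byte[0]);  // even keys only
        }

        comparisons[0] = 0;                // ignore the cost of building the tree
        int newKey = existingKeys | 1;     // odd, so it is never already present
        tree.put(newKey, new byte[0]);     // the "tiny write": a zero-byte leaf
        return comparisons[0];
    }

    public static void main(String[] args) {
        int reads = comparisonsForOneInsert(100000);
        System.out.println("Keys examined to insert one empty leaf: " + reads);
    }
}
```

Each comparison here stands in for a metadata lookup. On a real filesystem some of those lookups are served from cache and the rest become disk reads, which is the kind of traffic iostat reports.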

  • Doesn't that metadata get cached by the OS? (Ubuntu Server in this case.) Commented Mar 16, 2013 at 4:48
  • @davidparks21: Yes. It would certainly be much worse were that not true. (And making your metadata access pattern more cache-friendly may reduce the reads significantly. Things such as your choice of filesystem, number of files per directory, metadata journaling options, and so on can make a huge difference.) Commented Mar 16, 2013 at 4:59
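
One of the knobs mentioned above is the number of files per directory. The following is a minimal sketch of one way to bound it, assuming a hypothetical two-level directory layout keyed on the hash of the file name. `ShardedWriter`, `shardedPath`, and the 256 x 256 fan-out are illustrative choices, not something taken from the question. Whether this actually lowers the read rate would need to be measured with iostat on the real workload.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ShardedWriter {

    // Hypothetical layout: root/xx/yy/fileName, where xx and yy are two
    // hex-encoded bytes taken from the file name's hash code.
    static Path shardedPath(Path root, String fileName) {
        int h = fileName.hashCode();
        String level1 = String.format("%02x", (h >>> 8) & 0xff);
        String level2 = String.format("%02x", h & 0xff);
        return root.resolve(level1).resolve(level2).resolve(fileName);
    }

    // Creates the shard directories if needed, then writes the file.
    static Path write(Path root, String fileName, byte[] data) throws IOException {
        Path target = shardedPath(root, fileName);
        Files.createDirectories(target.getParent());
        Files.write(target, data);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("sharded-demo");
        Path written = write(root, "image-000001.jpg", new byte[] {1, 2, 3});
        System.out.println("Wrote " + written);
    }
}
```

Note that calling `Files.createDirectories` on every write is itself metadata work; a real implementation might remember which shard directories it has already created.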
