4

I have a tricky dilemma. I've got some files on two different destination drives, copied from the same source drive. The source drive had been failing, so I used dd to copy the data to one destination (with the options conv=noerror,sync, which fill blocks that hit read errors with zero bytes), and I used ddrescue on the same source drive to copy the data to a second destination. I've heard that ddrescue also fills errors with zero bytes.

Now I have two destination drives with near-duplicate data, except that some of the data definitely differs between them. I can only presume that the differences come from the parts of the files that were zero-filled where errors were encountered during copying, and that the zero-filled spots are in different places on the two drives. So some files are fully intact on one drive while their counterparts on the other are not, and vice versa. Most of the data consists of binary files.

Ideally, I'd like to synchronize both drives as follows:

  • Compare each file, bit-by-bit.
  • If the left file's bit is 1 and the right file's bit is 0, copy the 1 over to the right.
  • If the left file's bit is 0 and the right file's bit is 1, copy that 1 to the left, or at least keep the 1 on the right, if two-way synchronization isn't an option.

This functionality makes sense to me, but is there a utility that can handle this automatically? I thought about using rsync for this, but it seems that rsync only checks the file based on size & timestamp or by checksum, rather than bit-by-bit, and a simple checksum won't tell you where there are 0s when there should be 1s. I also looked into rdiff and bsdiff, both of which support binary files, but both of them seem to just output a diff file, rather than doing any actual copying/synchronizing.

So is there an existing utility that implements the syncing behavior described above? The OS shouldn't necessarily matter, as I have access to OS X, Windows and Ubuntu.

10
  • What you want to do is pretty rough -- it's difficult to determine which copy of the file is "correct" in this situation (is that zero supposed to be a zero, or is it an error?). What you're really asking for is a magic wand that will repair data loss, and I'm afraid that's not something that anyone can offer you in software, beyond what ddrescue already tried or what may be available through a commercial data recovery group. Commented Feb 8, 2011 at 16:38
  • Think about it though... If a zero is supposed to be there, and instead, a zero is there because of an error, both sides will have a zero, regardless of an error. Thus, it will stay a zero anyway. No erroneous data will have anything other than a zero. So technically only 1s will need to be copied over. Commented Feb 8, 2011 at 16:43
  • 1
    @purefusion the problem is the "supposed to be" part. Software doesn't know "supposed to", it knows "is" and "is not". In the pathological case (two copies of a file are bitwise NOTs of each other) your algorithm above will produce a file that's all 1s -- that's almost certainly not what you want... Commented Feb 8, 2011 at 16:46
  • I don't see how that would be the case. If both sides' bits are the same, why would it need to change anything for those bits? It should know that both sides have a zero already, and thus not change anything. Commented Feb 8, 2011 at 16:50
  • @purefusion -- you've got a very special case where the only errors you expect are zeroed-out blocks. I don't think any existing programs are built to deal with this, since it's such a narrow problem. But even if you do make or find something, what are you going to do about overlapping bad areas from your two sources? Commented Feb 8, 2011 at 17:11
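
The two concerns raised in these comments (the pathological case where a bitwise OR blends two different non-zero blocks, and bad areas that overlap in both copies) can be checked before merging anything. The following is a minimal sketch, not an existing utility: the function names `or_bytes` and `classify_blocks` are hypothetical, and the 512-byte block size is an assumption taken from the dd block size mentioned later in the thread.

```python
BLOCK = 512  # assumed block size; adjust to whatever dd/ddrescue actually used


def or_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise OR of two equal-length byte strings."""
    return bytes(x | y for x, y in zip(a, b))


def classify_blocks(a: bytes, b: bytes, block: int = BLOCK):
    """Return (both_zero, conflicting) lists of block indices.

    both_zero:   blocks that are all zeros in both copies, so either the
                 data really is zero or both copies lost it.
    conflicting: blocks that are non-zero in both copies yet differ, where
                 a bitwise OR would produce data that matches neither copy.
    Only the common length of the two inputs is examined.
    """
    both_zero, conflicting = [], []
    for off in range(0, min(len(a), len(b)), block):
        blk_a, blk_b = a[off:off + block], b[off:off + block]
        a_zero = not any(blk_a)
        b_zero = not any(blk_b)
        if a_zero and b_zero:
            both_zero.append(off // block)
        elif not a_zero and not b_zero and blk_a != blk_b:
            conflicting.append(off // block)
    return both_zero, conflicting
```

If `conflicting` comes back empty for a pair of files, the OR merge is safe for that pair in the sense that it never blends two different non-zero blocks. Blocks listed in `both_zero` cannot be repaired from these two copies alone.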

2 Answers

3

It almost sounds like what you want is a tool that reads each block of both files, does a bitwise OR on each pair of blocks, and sends the output to a new file.

The pseudo-code might look like the following. Nothing would happen to identical bits, and wherever the two bits differ, the output bit would be set to 1.

    while not end-of-files:
        read block_a from file_a
        read block_b from file_b
        merged_block = block_a bitwise_or block_b
        write merged_block to file_c
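
Since Python comes up in the comments below, here is one way the pseudo-code might be written in it. This is a sketch under stated assumptions rather than a tested recovery tool: the function name `or_merge` and the 1 MiB chunk size are my own choices, and if one file is shorter than the other it is treated as if it were padded with zero bytes.

```python
def or_merge(path_a, path_b, path_out, chunk_size=1 << 20):
    """Write the bitwise OR of two files to path_out, chunk by chunk."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, \
            open(path_out, "wb") as fc:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files are exhausted
            # Pad the shorter chunk with zero bytes so the lengths match.
            n = max(len(a), len(b))
            a = a.ljust(n, b"\0")
            b = b.ljust(n, b"\0")
            # OR the chunks as big integers, then convert back to n bytes.
            merged = int.from_bytes(a, "big") | int.from_bytes(b, "big")
            fc.write(merged.to_bytes(n, "big"))
```

Writing to a third file, as in the pseudo-code, leaves both originals untouched, which seems the safer choice while the copies are the only surviving data.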
3
  • I was just browsing en.wikipedia.org/wiki/Bitwise_operation and noticed the bitwise OR, which seemed to be what I was after. Of course, finding or scripting a utility that does this in a file-comparative fashion is another story. My programming experience is limited to the Web up until now. What would you recommend if I were to script something like this myself? Commented Feb 8, 2011 at 17:39
  • Python should be pretty easy for something like this. Commented Feb 8, 2011 at 17:43
  • I know a bit of python, maybe I'll give it a go... Commented Feb 8, 2011 at 17:52
0

Rsync should let you do one-way syncing. I believe it also has a checksum option (--checksum) that tells you whether files differ.

7
  • If you had read the question, I already mentioned why rsync would NOT work. It doesn't sync bit-by-bit. It will only replace whole files, as far as I'm aware... Commented Feb 8, 2011 at 16:40
  • rsync will make file A into a copy of file B, but that doesn't seem to be what purefusion wants here... Commented Feb 8, 2011 at 16:43
  • 1
    Rsync does, in fact, use an intelligent binary-diff algorithm, although it works on a block basis rather than bit-by-bit. rsync.samba.org/tech_report/node2.html Commented Feb 8, 2011 at 17:00
  • Ah right, hence the --block-size option. Well, since I was using a block size of 512 with dd, I wonder if simply using the same block size with rsync would allow just those blocks to be copied over? After all, when errors are encountered in dd, the whole block would be filled with 0s... ah, but then again, can rsync know which blocks are zero-filled vs. not? Commented Feb 8, 2011 at 17:05
  • Yeah, rsync doesn't care about your special-case. Commented Feb 8, 2011 at 17:30
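
The block-level idea from the comment above can be sketched without rsync: walk both copies in 512-byte blocks and, wherever exactly one copy's block is all zeros, take the other copy's block. This is only an illustration under assumptions. The name `merge_blocks` is hypothetical, the 512-byte size assumes the zero-filled regions line up with dd's block size inside each file, and ddrescue may have zero-filled regions of a different size.

```python
def merge_blocks(a: bytes, b: bytes, block: int = 512):
    """Merge two copies block by block, preferring non-zero blocks.

    Returns (merged, unresolved), where unresolved lists the indices of
    blocks that are non-zero in both copies but differ. For those blocks
    the data from `a` is kept, since there is no way to tell which copy
    is right. Only the common length of the two inputs is merged.
    """
    out = bytearray()
    unresolved = []
    for off in range(0, min(len(a), len(b)), block):
        blk_a, blk_b = a[off:off + block], b[off:off + block]
        if blk_a == blk_b:
            out += blk_a
        elif not any(blk_a):      # a is zero-filled here, so take b
            out += blk_b
        elif not any(blk_b):      # b is zero-filled here, so take a
            out += blk_a
        else:                     # both non-zero and different
            out += blk_a
            unresolved.append(off // block)
    return bytes(out), unresolved
```

Unlike a plain bitwise OR, this never blends two different non-zero blocks into data that matches neither copy. It reports them instead so they can be inspected by hand.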
