Skip to content

Conversation

@chrisgorgo
Copy link
Member

No description provided.

@satra
Copy link
Member

satra commented Nov 10, 2010

this gets to be a little more complicated. i'll try to fix this up tonight.

say fl = ['a/f.nii','b/f.nii','c/f.nii']
copyfiles(fl, '.', copy=False, create_new=True)
returns f.nii, fc.nii, fcc.nii
do it again and you get:
copyfiles(fl, '.', copy=False, create_new=True)
returns fccc.nii, fcccc.nii, fccccc.nii

these files are exactly the same as the original files.

ps. i don't like the c's increasing.

@chrisgorgo
Copy link
Member Author

You are right but this will never happen since we are deleting folders that are about to be rerun.

The problem with checking if two files are the same before overwriting (which I assume is the solution you are leaning to) is that it is almost impossible to do using timestamp method. I still think (and primarily use) this method because it is so much faster than reading and hashing file content for big files.

@satra
Copy link
Member

satra commented Nov 11, 2010

we are deleting folders, but there are two cases, where i think the delete doesn't take place but don't affect your solution:

  1. if a node crashes and we just load the node from the crashfile and call node.run()
  2. datasink, which also uses copyfile (i guess it will rarely be the case that inputs have the same name).

also talking of this change the split_filename function will fail on a file such as /b/c.d.nii.gz i think that function needs to become specific to neuroimaging extensions .img, .hdr, .nii, .nii.gz, .mgz, .mgh and others
could you please fix the split_filename function to properly account for extensions?

also would you mind using the following instead of adding 'c'

 base, fname, ext = split_filename(newfile) s = re.search('_c[0-9]{4,4}',fname) i = 0 if s: i = int(s.group()[2:])+1 fname += "_c%04d"%i 

(think 2000 con_0001.img :) )

@chrisgorgo
Copy link
Member Author

  1. If you do node.run() on an unfinished node the workdir should be deleted before execution. See https://github.com/nipy/nipype/blob/maint/0.3/nipype/pipeline/engine.py#L1061
  2. DataSink uses copyfile without create_new option specified and it is False by default, therefore DataSinks behaviour will not be affected. I dont like the DataSink always overwrite policy which breaks double workflow scenarios which use timestamp hashing (because copying changes mtime and ctime).

I have included the renaming changes you have suggested. I agree this is cleaner.

I have also slightly modified the split_filename function to correctly split your example. I have assumed that we'll have a list of multiple segment extensions (like nii.gz) and default to single segment extension (like .img). Afterall for uknown (non neuroimaging) extension we'd like it to still work ie split_filename("foo.txt") should give ".txt" as an extension.

…es/copyfile * 'fixes/copyfile' of github.com:chrisfilo/nipype: improved handling of extensions in split_filename incremental numbering instead of stacking "c"s BF: proper linking/copying multiple files of the same name into working directory Conflicts:	nipype/utils/filemanip.py
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants