Checkpoint-restart with dmtcp #1134

@oesteban

Hi there,

How would you feel about integrating dmtcp (https://conference.scipy.org/scipy2013/presentation_detail.php?id=201) with nipype?

I'm currently missing it for the following reason. When using mrtrix3 or any other multithreaded tool on Debian jessie, there seems to be a bug that causes a deadlock on my processor. It happens somewhat randomly, and for interfaces that take a long time to run it would be a great relief to resume from a recent checkpoint instead of starting over from the beginning.

This is just a thought, and I don't see clearly how it would be implemented in distributed environments (maybe by keeping track of which node each job was sent to, and using dmtcp locally on that node?).
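To make the local case a bit more concrete, here is a rough sketch of what wrapping a command could look like. It is only an illustration under some assumptions: `build_dmtcp_command` is a hypothetical helper (nothing like it exists in nipype), the checkpoint directory and interval are made-up parameters, and it assumes the `dmtcp_launch` and `dmtcp_restart` executables are on the PATH and that checkpoint images follow the usual `ckpt_*.dmtcp` naming.

```python
import glob
import os


def build_dmtcp_command(cmd, ckpt_dir, interval=600):
    """Return the argv that starts `cmd` under DMTCP, or resumes it.

    Hypothetical helper for illustration only; not part of nipype.
    `cmd` is the original command as a list of strings, `ckpt_dir` is
    where checkpoint images are written, and `interval` is the number
    of seconds between automatic checkpoints.
    """
    # Assumption: DMTCP names its checkpoint images ckpt_*.dmtcp.
    images = sorted(glob.glob(os.path.join(ckpt_dir, "ckpt_*.dmtcp")))
    if images:
        # Images from an earlier run exist: resume from them.
        return ["dmtcp_restart"] + images
    # No images yet: launch from scratch with periodic checkpointing.
    return (
        ["dmtcp_launch", "--interval", str(interval), "--ckptdir", ckpt_dir]
        + list(cmd)
    )
```

The returned list could be handed to `subprocess.call` in place of the original command line, so a node that deadlocked would resume from its last checkpoint when it is re-run. The sketch only builds the command; it does not handle stale images left by unrelated runs, or the distributed case.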

What do you think?
