Message 139800 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	nirai
Recipients	Giovanni.Bajo, avian, bobbyi, gregory.p.smith, neologix, nirai, pitrou, sdaoden, vstinner
Date	2011-07-04.19:41:36
SpamBayes Score	7.038814e-14
Marked as misclassified	No
Message-id	<1309808497.69.0.0743071616941.issue6721@psf.upfronthosting.co.za>
In-reply-to

Content
> Sorry, I fail to see how the "import graph" is related to the correct > lock acquisition order. Some locks are created dynamically, for > example. Import dependency is a reasonable heuristic to look into for inter-module locking order. The rational is explained in the following pthread_atfork man page: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html "A higher-level package may acquire locks on its own data structures before invoking lower-level packages. Under this scenario, the order specified for fork handler calls allows a simple rule of initialization for avoiding package deadlock: a package initializes all packages on which it depends before it calls the pthread_atfork() function for itself." (The rational section is an interpretation which is not part of the standard) A caveat is that since Python is an object oriented language it is more common than with C that code from a higher level module will be invoked by code from a lower level module, for example by calling an object method that was over-ridden by the higher level module - this actually happens in the logging module (emit method). > That's why I asked for a specific API: when do you register a handler? > When are they called? When are they reset? Read the pthread_atfork man page. > The whole point of atfork is to avoid breaking invariants and > introduce invalid state in the child process. If there is one thing we > want to avoid, it's precisely reading/writting corrupted data from/to > files, so eluding the I/O problem seems foolish to me. Please don't use insulting adjectives. If you think I am wrong, convincing me logically will do. you can "avoid breaking invariants" using two different strategies: 1) Acquire locks before the fork and release/reset them after it. 2) Initialize the module to some known state after the fork. For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page: "Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)." > > A "critical section" lock that protects in-memory data should not be held for long. > > Not necessarily. See for example I/O locks and logging module, which > hold locks until I/O completion. Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls but now I see the Wikipedia defines one to protect any resource including IO. The logging module locks the entire emit() function which I think is wrong. It should let the derived handler take care of locking when it needs to, if it needs to at all. The logging module is an example for a module we should reinitialize after the fork without locking its locks before the fork.

> Sorry, I fail to see how the "import graph" is related to the correct > lock acquisition order. Some locks are created dynamically, for > example. Import dependency is a reasonable heuristic to look into for inter-module locking order. The rational is explained in the following pthread_atfork man page: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html "A higher-level package may acquire locks on its own data structures before invoking lower-level packages. Under this scenario, the order specified for fork handler calls allows a simple rule of initialization for avoiding package deadlock: a package initializes all packages on which it depends before it calls the pthread_atfork() function for itself." (The rational section is an interpretation which is not part of the standard) A caveat is that since Python is an object oriented language it is more common than with C that code from a higher level module will be invoked by code from a lower level module, for example by calling an object method that was over-ridden by the higher level module - this actually happens in the logging module (emit method). > That's why I asked for a specific API: when do you register a handler? > When are they called? When are they reset? Read the pthread_atfork man page. > The whole point of atfork is to avoid breaking invariants and > introduce invalid state in the child process. If there is one thing we > want to avoid, it's precisely reading/writting corrupted data from/to > files, so eluding the I/O problem seems foolish to me. Please don't use insulting adjectives. If you think I am wrong, convincing me logically will do. you can "avoid breaking invariants" using two different strategies: 1) Acquire locks before the fork and release/reset them after it. 2) Initialize the module to some known state after the fork. For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page: "Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)." > > A "critical section" lock that protects in-memory data should not be held for long. > > Not necessarily. See for example I/O locks and logging module, which > hold locks until I/O completion. Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls but now I see the Wikipedia defines one to protect any resource including IO. The logging module locks the entire emit() function which I think is wrong. It should let the derived handler take care of locking when it needs to, if it needs to at all. The logging module is an example for a module we should reinitialize after the fork without locking its locks before the fork.

History
Date	User	Action	Args
2011-07-04 19:41:37	nirai	set	recipients: + nirai, gregory.p.smith, pitrou, vstinner, bobbyi, neologix, Giovanni.Bajo, sdaoden, avian
2011-07-04 19:41:37	nirai	set	messageid: <1309808497.69.0.0743071616941.issue6721@psf.upfronthosting.co.za>
2011-07-04 19:41:37	nirai	link	issue6721 messages
2011-07-04 19:41:36	nirai	create