* [PATCH 0/3] avoid bogus "recursion detected in die handler" message @ 2013-04-15 23:06 Jeff King 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Jeff King @ 2013-04-15 23:06 UTC (permalink / raw) To: git; +Cc: Brandon Casey If upload-pack fails while generating a pack, the receiving side of a fetch will print out a message like this: fatal: git remote host hung up unexpectedly fatal: recursion detected in die handler You can see it by instrumenting upload-pack to fail like this: diff --git a/upload-pack.c b/upload-pack.c index bfa6279..721b416 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -777,6 +777,7 @@ static void upload_pack(void) receive_needs(); if (want_obj.nr) { get_common_commits(); + exit(0); create_pack_file(); } } and doing a clone or fetch. What's actually happening is that we have two threads running, each of which dies, fooling the recursion check. This series attempts to fix it. The errors afterwards look like: fatal: git remote host hung up unexpectedly fatal: index-pack failed which is what was printed pre-cd163d4 (usage.c: detect recursion in die routines and bail out immediately). -Peff ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 1/3] usage: refactor die-recursion checks 2013-04-15 23:06 [PATCH 0/3] avoid bogus "recursion detected in die handler" message Jeff King @ 2013-04-15 23:08 ` Jeff King 2013-04-15 23:45 ` Eric Sunshine 2013-04-16 0:11 ` Brandon Casey 2013-04-15 23:08 ` [PATCH 2/3] run-command: factor out running_main_thread function Jeff King 2013-04-15 23:09 ` [PATCH 3/3] usage: do not check die recursion outside main thread Jeff King 2 siblings, 2 replies; 23+ messages in thread From: Jeff King @ 2013-04-15 23:08 UTC (permalink / raw) To: git; +Cc: Brandon Casey When any git code calls die(), we chain to a custom die_routine, which we expect to print a message and exit the program. To avoid infinite loops, we detect a recursive call to die() with a simple counter, and break out of the loop by printing a message and exiting ourselves, without chaining to the die_routine. But the user does not get to see the message that would have been fed to the die_routine, which makes debugging harder. The user does not know if it was a true infinite loop, or simply a single re-entrant call, since they cannot compare the messages. Furthermore, if we are wrong about detecting the recursion, we have blocked the user from seeing the original message, which is probably the more useful one. This patch teaches die() to print the original die message to stderr before reporting the recursion. The custom die_routine may or may not have put it the message to stderr, but this is the best we can do (it is what most handlers will do anyway, and it is where our recursion error will go). While we're at it, let's mark the "recursion detected" message as a "BUG:", since it should never happen in practice. And let's factor out the repeated code in die and die_errno. This loses the information of which function was called to cause the recursion, but it's important; knowing the actual message fed to the function (which we now do) is much more useful, as it can generally pin-point the actual call-site that caused the recursion. Signed-off-by: Jeff King <peff@peff.net> --- This helped me debug the current problem. And factoring it out helps with patch 3. :) usage.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/usage.c b/usage.c index 40b3de5..c6b7ac5 100644 --- a/usage.c +++ b/usage.c @@ -6,8 +6,6 @@ #include "git-compat-util.h" #include "cache.h" -static int dying; - void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; @@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...) usagef("%s", err); } +static void check_die_recursion(const char *fmt, va_list ap) +{ + static int dying; + + if (!dying++) + return; + + vreportf("fatal: ", fmt, ap); + fputs("BUG: recursion detected in die handler\n", stderr); + exit(128); +} + void NORETURN die(const char *err, ...) { va_list params; - if (dying) { - fputs("fatal: recursion detected in die handler\n", stderr); - exit(128); - } - dying = 1; - va_start(params, err); + check_die_recursion(err, params); die_routine(err, params); va_end(params); } @@ -102,13 +107,6 @@ void NORETURN die_errno(const char *fmt, ...) char str_error[256], *err; int i, j; - if (dying) { - fputs("fatal: recursion detected in die_errno handler\n", - stderr); - exit(128); - } - dying = 1; - err = strerror(errno); for (i = j = 0; err[i] && j < sizeof(str_error) - 1; ) { if ((str_error[j++] = err[i++]) != '%') @@ -126,6 +124,7 @@ void NORETURN die_errno(const char *fmt, ...) snprintf(fmt_with_err, sizeof(fmt_with_err), "%s: %s", fmt, str_error); va_start(params, fmt); + check_die_recursion(fmt_with_err, params); die_routine(fmt_with_err, params); va_end(params); } -- 1.8.2.8.g44e4c28 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King @ 2013-04-15 23:45 ` Eric Sunshine 2013-04-15 23:47 ` Jeff King 2013-04-16 0:11 ` Brandon Casey 1 sibling, 1 reply; 23+ messages in thread From: Eric Sunshine @ 2013-04-15 23:45 UTC (permalink / raw) To: Jeff King; +Cc: Git List, Brandon Casey On Mon, Apr 15, 2013 at 7:08 PM, Jeff King <peff@peff.net> wrote: > This patch teaches die() to print the original die message > to stderr before reporting the recursion. The custom > die_routine may or may not have put it the message to s/put it the/emitted/ perhaps? > stderr, but this is the best we can do (it is what most > handlers will do anyway, and it is where our recursion error > will go). > > While we're at it, let's mark the "recursion detected" > message as a "BUG:", since it should never happen in > practice. And let's factor out the repeated code in die and > die_errno. This loses the information of which function was > called to cause the recursion, but it's important; knowing Was this supposed to be s/important/unimportant/? > the actual message fed to the function (which we now do) is > much more useful, as it can generally pin-point the actual > call-site that caused the recursion. > > Signed-off-by: Jeff King <peff@peff.net> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-15 23:45 ` Eric Sunshine @ 2013-04-15 23:47 ` Jeff King 0 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2013-04-15 23:47 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, Brandon Casey On Mon, Apr 15, 2013 at 07:45:03PM -0400, Eric Sunshine wrote: > On Mon, Apr 15, 2013 at 7:08 PM, Jeff King <peff@peff.net> wrote: > > This patch teaches die() to print the original die message > > to stderr before reporting the recursion. The custom > > die_routine may or may not have put it the message to > > s/put it the/emitted/ perhaps? I meant s/ it//, but I think "sent the message to..." is probably more clear. > > stderr, but this is the best we can do (it is what most > > handlers will do anyway, and it is where our recursion error > > will go). > > > > While we're at it, let's mark the "recursion detected" > > message as a "BUG:", since it should never happen in > > practice. And let's factor out the repeated code in die and > > die_errno. This loses the information of which function was > > called to cause the recursion, but it's important; knowing > > Was this supposed to be s/important/unimportant/? Urgh, yes, it was originally "not important" but I lost the "not" while trying to clarify the wording. Thanks. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King 2013-04-15 23:45 ` Eric Sunshine @ 2013-04-16 0:11 ` Brandon Casey 2013-04-16 0:42 ` Jeff King 1 sibling, 1 reply; 23+ messages in thread From: Brandon Casey @ 2013-04-16 0:11 UTC (permalink / raw) To: Jeff King; +Cc: git On Mon, Apr 15, 2013 at 4:08 PM, Jeff King <peff@peff.net> wrote: > When any git code calls die(), we chain to a custom > die_routine, which we expect to print a message and exit the > program. To avoid infinite loops, we detect a recursive call > to die() with a simple counter, and break out of the loop by > printing a message and exiting ourselves, without chaining > to the die_routine. > > But the user does not get to see the message that would have > been fed to the die_routine, which makes debugging harder. > The user does not know if it was a true infinite loop, or > simply a single re-entrant call, since they cannot compare > the messages. Furthermore, if we are wrong about detecting > the recursion, we have blocked the user from seeing the > original message, which is probably the more useful one. > > This patch teaches die() to print the original die message > to stderr before reporting the recursion. The custom > die_routine may or may not have put it the message to > stderr, but this is the best we can do (it is what most > handlers will do anyway, and it is where our recursion error > will go). > > While we're at it, let's mark the "recursion detected" > message as a "BUG:", since it should never happen in > practice. And let's factor out the repeated code in die and > die_errno. This loses the information of which function was > called to cause the recursion, but it's important; knowing > the actual message fed to the function (which we now do) is > much more useful, as it can generally pin-point the actual > call-site that caused the recursion. > > Signed-off-by: Jeff King <peff@peff.net> > --- > This helped me debug the current problem. And factoring it out helps > with patch 3. :) > > usage.c | 29 ++++++++++++++--------------- > 1 file changed, 14 insertions(+), 15 deletions(-) > > diff --git a/usage.c b/usage.c > index 40b3de5..c6b7ac5 100644 > --- a/usage.c > +++ b/usage.c > @@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...) > usagef("%s", err); > } > > +static void check_die_recursion(const char *fmt, va_list ap) > +{ > + static int dying; > + > + if (!dying++) > + return; > + > + vreportf("fatal: ", fmt, ap); How do you know it's safe to call vreportf() ? If the bug is in the vreportf code path, we will recurse infinitely (at least until the stack is used up). An implementation of vsnprintf exists in compat/snprintf.c for example. It's nice to print out the error message here, but I think doing so defeats the purpose of this "dying" check. Better to get the stack trace from a core dump. > + fputs("BUG: recursion detected in die handler\n", stderr); > + exit(128); > +} > + -Brandon ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 0:11 ` Brandon Casey @ 2013-04-16 0:42 ` Jeff King 2013-04-16 1:41 ` Jonathan Nieder 2013-04-16 2:34 ` Brandon Casey 0 siblings, 2 replies; 23+ messages in thread From: Jeff King @ 2013-04-16 0:42 UTC (permalink / raw) To: Brandon Casey; +Cc: git On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote: > > +static void check_die_recursion(const char *fmt, va_list ap) > > +{ > > + static int dying; > > + > > + if (!dying++) > > + return; > > + > > + vreportf("fatal: ", fmt, ap); > > How do you know it's safe to call vreportf() ? Because I hand-audited it. But I think the more important question you are getting at is: how do I know that it will remain safe to call vreportf? > If the bug is in the vreportf code path, we will recurse infinitely > (at least until the stack is used up). An implementation of vsnprintf > exists in compat/snprintf.c for example. Right. My assumption was that we are primarily interested in protecting against the die_routine. Compat functions should never be calling die. Of course that is assuming nobody violates that rule, which is part of the point of the check. > It's nice to print out the error message here, but I think doing so > defeats the purpose of this "dying" check. Better to get the stack > trace from a core dump. Easier said than done, sometimes, if you are debugging somebody else's problem from a dump of a terminal session. :) But I can live with dropping this patch; my primary goal is the bug-fix in patch three. I was also tempted to suggest just dropping the recursion check altogether. While it is neat to detect such things, it's a "should never happen" bug situation, and an infinite loop of printing out the same message is pretty easy to notice. Do you remember what spurred the original check? The message in cd163d4 doesn't say. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 0:42 ` Jeff King @ 2013-04-16 1:41 ` Jonathan Nieder 2013-04-16 2:34 ` Brandon Casey 1 sibling, 0 replies; 23+ messages in thread From: Jonathan Nieder @ 2013-04-16 1:41 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Casey, git Jeff King wrote: > I was also tempted to suggest just dropping the recursion check > altogether. While it is neat to detect such things, it's a "should never > happen" bug situation, and an infinite loop of printing out the same > message is pretty easy to notice. On servers it might be useful to avoid accidentally filling up logs quickly. IIUC the context is in the following two threads: http://thread.gmane.org/gmane.comp.version-control.git/182982 http://thread.gmane.org/gmane.comp.version-control.git/181421/focus=181443 Thanks, Jonathan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 0:42 ` Jeff King 2013-04-16 1:41 ` Jonathan Nieder @ 2013-04-16 2:34 ` Brandon Casey 2013-04-16 2:50 ` Jeff King 1 sibling, 1 reply; 23+ messages in thread From: Brandon Casey @ 2013-04-16 2:34 UTC (permalink / raw) To: Jeff King; +Cc: git On Mon, Apr 15, 2013 at 5:42 PM, Jeff King <peff@peff.net> wrote: > On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote: > >> > +static void check_die_recursion(const char *fmt, va_list ap) >> > +{ >> > + static int dying; >> > + >> > + if (!dying++) >> > + return; >> > + >> > + vreportf("fatal: ", fmt, ap); >> >> How do you know it's safe to call vreportf() ? > > Because I hand-audited it. :) > But I think the more important question you > are getting at is: how do I know that it will remain safe to call > vreportf? Right. >> If the bug is in the vreportf code path, we will recurse infinitely >> (at least until the stack is used up). An implementation of vsnprintf >> exists in compat/snprintf.c for example. > > Right. My assumption was that we are primarily interested in protecting > against the die_routine. Compat functions should never be calling die. I think the rule we have been enforcing is less strict than that. We have only said that any compat function in a die handler path should never call die. But maybe that's what you meant. > Of course that is assuming nobody violates that rule, which is part of > the point of the check. > >> It's nice to print out the error message here, but I think doing so >> defeats the purpose of this "dying" check. Better to get the stack >> trace from a core dump. > > Easier said than done, sometimes, if you are debugging somebody else's > problem from a dump of a terminal session. :) > > But I can live with dropping this patch; my primary goal is the bug-fix > in patch three. > > I was also tempted to suggest just dropping the recursion check > altogether. While it is neat to detect such things, it's a "should never > happen" bug situation, and an infinite loop of printing out the same > message is pretty easy to notice. Do you remember what spurred the > original check? The message in cd163d4 doesn't say. That's a valid option. The primary motivation was that Hannes Sixt had to step in and point out yet again that the high-level memory allocators should not be called in anything that could be in a die handler code path. I was on the thread, but I don't remember the topic (ah, Jonathan has stepped in with the answer). I do remember that I was not the only one who had forgotten about that rule though. We didn't actually have someone report that they encountered infinite recursion, but it seemed easy enough to add a check for it, so... Unfortunately, I didn't realize that the async functions installed their own die handler which may not exit the process, allowing die to be called multiple times. To implement this check correctly/completely (i.e. detect recursion in the main thread as well as in any child threads), I think you really do need to use thread-local storage as you mentioned in 3/3 which could look something like: static pthread_key_t dying; static pthread_once_t dying_once = PTHREAD_ONCE_INIT; void setup_die_counter(void) { pthread_key_create(&dying, NULL); } check_die_recursion(void) { pthread_once(&dying_once, setup_die_counter); if (pthread(getspecific(dying)) { puts("BUG: recursion..."); exit(128); } pthread_setspecific(dying, &dying); } or maybe the setup could be performed in set_die_routine(), but it does kinda seem like overkill for a "nicety" like this. So maybe checking for recursion in just the main thread as this series does is better than nothing. -Brandon ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 2:34 ` Brandon Casey @ 2013-04-16 2:50 ` Jeff King 2013-04-16 7:18 ` Johannes Sixt 0 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2013-04-16 2:50 UTC (permalink / raw) To: Brandon Casey; +Cc: git On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote: > > Right. My assumption was that we are primarily interested in protecting > > against the die_routine. Compat functions should never be calling die. > > I think the rule we have been enforcing is less strict than that. We > have only said that any compat function in a die handler path should > never call die. But maybe that's what you meant. No, I assumed we were following the stronger rule. If you are a compat function for a C library function, then you should never need to die. You should be conforming to the existing interface, which will have some mechanism for passing back an error. > The primary motivation was that Hannes Sixt had to step in and point > out yet again that the high-level memory allocators should not be > called in anything that could be in a die handler code path. I was on > the thread, but I don't remember the topic (ah, Jonathan has stepped > in with the answer). I do remember that I was not the only one who > had forgotten about that rule though. Yeah, it is subtle enough that it may be worth protecting against. > To implement this check correctly/completely (i.e. detect recursion in > the main thread as well as in any child threads), I think you really > do need to use thread-local storage as you mentioned in 3/3 which > could look something like: > > static pthread_key_t dying; > static pthread_once_t dying_once = PTHREAD_ONCE_INIT; > > void setup_die_counter(void) > { > pthread_key_create(&dying, NULL); > } > > check_die_recursion(void) > { > pthread_once(&dying_once, setup_die_counter); > if (pthread(getspecific(dying)) { > puts("BUG: recursion..."); > exit(128); > } > > pthread_setspecific(dying, &dying); > } Yeah, that seems sane; my biggest worry was that it would create headaches for Windows folks, who would have to emulate pthread_key. But it seems like we already added support in 9ba604a. I'll try to re-work the series with thread-local storage, and I'll leave off the extra printing. This _should_ never happen, so if we are going to put in the check, it is probably better to be more thorough than to worry about what the error message looks like. Thanks for looking it over. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 2:50 ` Jeff King @ 2013-04-16 7:18 ` Johannes Sixt 2013-04-16 13:01 ` Jeff King 0 siblings, 1 reply; 23+ messages in thread From: Johannes Sixt @ 2013-04-16 7:18 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Casey, git Am 4/16/2013 4:50, schrieb Jeff King: > On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote: > >>> Right. My assumption was that we are primarily interested in protecting >>> against the die_routine. Compat functions should never be calling die. >> >> I think the rule we have been enforcing is less strict than that. We >> have only said that any compat function in a die handler path should >> never call die. But maybe that's what you meant. > > No, I assumed we were following the stronger rule. If you are a compat > function for a C library function, then you should never need to die. > You should be conforming to the existing interface, which will have some > mechanism for passing back an error. This rule has been violated LOOOONG ago, and not only in compat/mingw.c (see xmalloc in compat/qsort.c, for example). >> The primary motivation was that Hannes Sixt had to step in and point >> out yet again that the high-level memory allocators should not be >> called in anything that could be in a die handler code path. I was on >> the thread, but I don't remember the topic (ah, Jonathan has stepped >> in with the answer). I do remember that I was not the only one who >> had forgotten about that rule though. > > Yeah, it is subtle enough that it may be worth protecting against. Too late. >> To implement this check correctly/completely (i.e. detect recursion in >> the main thread as well as in any child threads), I think you really >> do need to use thread-local storage as you mentioned in 3/3 which >> could look something like: >> >> static pthread_key_t dying; >> static pthread_once_t dying_once = PTHREAD_ONCE_INIT; >> >> void setup_die_counter(void) >> { >> pthread_key_create(&dying, NULL); >> } >> >> check_die_recursion(void) >> { >> pthread_once(&dying_once, setup_die_counter); >> if (pthread(getspecific(dying)) { >> puts("BUG: recursion..."); >> exit(128); >> } >> >> pthread_setspecific(dying, &dying); >> } > > Yeah, that seems sane; my biggest worry was that it would create > headaches for Windows folks, who would have to emulate pthread_key. But > it seems like we already added support in 9ba604a. pthread_key is not a problem, but pthread_once is. It's certainly solvable, but do we really have to? -- Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 7:18 ` Johannes Sixt @ 2013-04-16 13:01 ` Jeff King 2013-04-16 14:13 ` Johannes Sixt 0 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2013-04-16 13:01 UTC (permalink / raw) To: Johannes Sixt; +Cc: Brandon Casey, git On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote: > > Yeah, that seems sane; my biggest worry was that it would create > > headaches for Windows folks, who would have to emulate pthread_key. But > > it seems like we already added support in 9ba604a. > > pthread_key is not a problem, but pthread_once is. It's certainly > solvable, but do we really have to? I'm not clear on what you are suggesting. That we protect only the main thread from recursion, or that we drop the check entirely? Or that we implement thread-local storage for this case without using pthread_once? -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] usage: refactor die-recursion checks 2013-04-16 13:01 ` Jeff King @ 2013-04-16 14:13 ` Johannes Sixt 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King 0 siblings, 1 reply; 23+ messages in thread From: Johannes Sixt @ 2013-04-16 14:13 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Casey, git Am 4/16/2013 15:01, schrieb Jeff King: > On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote: > >>> Yeah, that seems sane; my biggest worry was that it would create >>> headaches for Windows folks, who would have to emulate pthread_key. But >>> it seems like we already added support in 9ba604a. >> >> pthread_key is not a problem, but pthread_once is. It's certainly >> solvable, but do we really have to? > > I'm not clear on what you are suggesting. That we protect only the main > thread from recursion, or that we drop the check entirely? Or that we > implement thread-local storage for this case without using pthread_once? Anything(*) that does not require pthread_once. A pthread_once implementation on Windows would be tricky and voluminous and and on top of it very likely to be done differently for gcc and MSVC. I don't like to go there if we can avoid it. (*) That includes doing nothing, but does not include ripping out the recursion check, as it protects us from crashes. -- Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message 2013-04-16 14:13 ` Johannes Sixt @ 2013-04-16 19:44 ` Jeff King 2013-04-16 19:46 ` [PATCH v2 1/2] usage: allow pluggable die-recursion checks Jeff King ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Jeff King @ 2013-04-16 19:44 UTC (permalink / raw) To: Johannes Sixt; +Cc: Brandon Casey, git On Tue, Apr 16, 2013 at 04:13:56PM +0200, Johannes Sixt wrote: > > I'm not clear on what you are suggesting. That we protect only the main > > thread from recursion, or that we drop the check entirely? Or that we > > implement thread-local storage for this case without using pthread_once? > > Anything(*) that does not require pthread_once. A pthread_once > implementation on Windows would be tricky and voluminous and and on top of > it very likely to be done differently for gcc and MSVC. I don't like to go > there if we can avoid it. We don't need to use pthread_once, as we can just do the initialization of the thread-local storage along with starting the first thread (where we already do similar initialization). Patch series to follow: [1/2]: usage: allow pluggable die-recursion checks [2/2]: run-command: use thread-aware die_is_recursing routine > (*) That includes doing nothing, but does not include ripping out the > recursion check, as it protects us from crashes. I don't think doing nothing is a good idea. The recursion-detection is triggering erroneously, blocking real error messages and replacing them with the scary red-herring "recursion detected in die handler". The absolute simplest thing I think we could do is basically: diff --git a/run-command.c b/run-command.c index 765c2ce..3b0ad44 100644 --- a/run-command.c +++ b/run-command.c @@ -599,11 +599,14 @@ static NORETURN void die_async(const char *err, va_list params) return (void *)ret; } +extern int dying; + static NORETURN void die_async(const char *err, va_list params) { vreportf("fatal: ", err, params); if (!pthread_equal(main_thread, pthread_self())) { + dying = 0; /* undo counter */ struct async *async = pthread_getspecific(async_key); if (async->proc_in >= 0) close(async->proc_in); diff --git a/usage.c b/usage.c index 40b3de5..cf8a968 100644 --- a/usage.c +++ b/usage.c @@ -6,7 +6,7 @@ #include "git-compat-util.h" #include "cache.h" -static int dying; +int dying; void vreportf(const char *prefix, const char *err, va_list params) { Obviously it would help to wrap it in a "clear_die_counter()" function, but it would still suffer from the problem that there is no synchronization. In the moment between incrementing and resetting the dying counter, another thread (including the main program) could check it. In practice, this does not happen in the current code, because we do not start many async threads (and we only die in the main thread once the async thread dies). But it seems unnecessarily flaky and prone to future problems. It would also be possible to use mutexes to make it work reliably, but I'd be very concerned about increasing the complexity of the die code path. We would never want a hung thread to prevent the main program from successfully exiting, for example. So I think the right solution is just a per-thread counter. -Peff ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 1/2] usage: allow pluggable die-recursion checks 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King @ 2013-04-16 19:46 ` Jeff King 2013-04-16 19:50 ` [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2013-04-16 19:46 UTC (permalink / raw) To: Johannes Sixt; +Cc: Brandon Casey, git When any git code calls die or die_errno, we use a counter to detect recursion into the die functions from any of the helper functions. However, such a simple counter is not good enough for threaded programs, which may call die from a sub-thread, killing only the sub-thread (but incrementing the counter for everyone). Rather than try to deal with threads ourselves here, let's just allow callers to plug in their own recursion-detection function. This is similar to how we handle the die routine (the caller plugs in a die routine which may kill only the sub-thread). Signed-off-by: Jeff King <peff@peff.net> --- git-compat-util.h | 1 + usage.c | 20 ++++++++++++++------ 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index cde442f..e955bb5 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -331,6 +331,7 @@ extern void set_error_routine(void (*routine)(const char *err, va_list params)); extern void set_die_routine(NORETURN_PTR void (*routine)(const char *err, va_list params)); extern void set_error_routine(void (*routine)(const char *err, va_list params)); +extern void set_die_is_recursing_routine(int (*routine)(void)); extern int prefixcmp(const char *str, const char *prefix); extern int suffixcmp(const char *str, const char *suffix); diff --git a/usage.c b/usage.c index 40b3de5..ed14645 100644 --- a/usage.c +++ b/usage.c @@ -6,8 +6,6 @@ #include "git-compat-util.h" #include "cache.h" -static int dying; - void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; @@ -49,12 +47,19 @@ static void (*warn_routine)(const char *err, va_list params) = warn_builtin; vreportf("warning: ", warn, params); } +static int die_is_recursing_builtin(void) +{ + static int dying; + return dying++; +} + /* If we are in a dlopen()ed .so write to a global variable would segfault * (ugh), so keep things static. */ static NORETURN_PTR void (*usage_routine)(const char *err, va_list params) = usage_builtin; static NORETURN_PTR void (*die_routine)(const char *err, va_list params) = die_builtin; static void (*error_routine)(const char *err, va_list params) = error_builtin; static void (*warn_routine)(const char *err, va_list params) = warn_builtin; +static int (*die_is_recursing)(void) = die_is_recursing_builtin; void set_die_routine(NORETURN_PTR void (*routine)(const char *err, va_list params)) { @@ -66,6 +71,11 @@ void set_error_routine(void (*routine)(const char *err, va_list params)) error_routine = routine; } +void set_die_is_recursing_routine(int (*routine)(void)) +{ + die_is_recursing = routine; +} + void NORETURN usagef(const char *err, ...) { va_list params; @@ -84,11 +94,10 @@ void NORETURN die(const char *err, ...) { va_list params; - if (dying) { + if (die_is_recursing()) { fputs("fatal: recursion detected in die handler\n", stderr); exit(128); } - dying = 1; va_start(params, err); die_routine(err, params); @@ -102,12 +111,11 @@ void NORETURN die_errno(const char *fmt, ...) char str_error[256], *err; int i, j; - if (dying) { + if (die_is_recursing()) { fputs("fatal: recursion detected in die_errno handler\n", stderr); exit(128); } - dying = 1; err = strerror(errno); for (i = j = 0; err[i] && j < sizeof(str_error) - 1; ) { -- 1.8.2.8.g44e4c28 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-16 19:46 ` [PATCH v2 1/2] usage: allow pluggable die-recursion checks Jeff King @ 2013-04-16 19:50 ` Jeff King 2013-04-16 22:09 ` Junio C Hamano 2013-04-17 0:49 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jonathan Nieder 2013-04-23 21:27 ` Erik Faye-Lund 3 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2013-04-16 19:50 UTC (permalink / raw) To: Johannes Sixt; +Cc: Brandon Casey, git If we die from an async thread, we do not actually exit the program, but just kill the thread. This confuses the static counter in usage.c's default die_is_recursing function; it updates the counter once for the thread death, and then when the main program calls die() itself, it erroneously thinks we are recursing. The end result is that we print "recursion detected in die handler" instead of the real error in such a case (the easiest way to trigger this is having a remote connection hang up while running a sideband demultiplexer). This patch solves it by using a per-thread counter when the async_die function is installed; we detect recursion in each thread (including the main one), but they do not step on each other's toes. Other threaded code does not need to worry about this, as they do not install specialized die handlers; they just let a die() from a sub-thread take down the whole program. Since we are overriding the default recursion-check function, there is an interesting corner case that is not a problem, but bears some explanation. Imagine the main thread calls die(), and then in the die_routine starts an async call. We will switch to using thread-local storage, which starts at 0, for the main thread's counter, even though the original counter was actually at 1. That's OK, though, for two reasons: 1. It would miss only the first level of recursion, and would still find recursive failures inside the async helper. 2. We do not currently and are not likely to start doing anything as heavyweight as starting an async routine from within a die routine or helper function. Signed-off-by: Jeff King <peff@peff.net> --- run-command.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/run-command.c b/run-command.c index 765c2ce..1b32a12 100644 --- a/run-command.c +++ b/run-command.c @@ -588,6 +588,7 @@ static pthread_key_t async_key; static pthread_t main_thread; static int main_thread_set; static pthread_key_t async_key; +static pthread_key_t async_die_counter; static void *run_thread(void *data) { @@ -614,6 +615,14 @@ static NORETURN void die_async(const char *err, va_list params) exit(128); } + +static int async_die_is_recursing(void) +{ + void *ret = pthread_getspecific(async_die_counter); + pthread_setspecific(async_die_counter, (void *)1); + return ret != NULL; +} + #endif int start_async(struct async *async) @@ -695,7 +704,9 @@ int start_async(struct async *async) main_thread_set = 1; main_thread = pthread_self(); pthread_key_create(&async_key, NULL); + pthread_key_create(&async_die_counter, NULL); set_die_routine(die_async); + set_die_is_recursing_routine(async_die_is_recursing); } if (proc_in >= 0) -- 1.8.2.8.g44e4c28 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine 2013-04-16 19:50 ` [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine Jeff King @ 2013-04-16 22:09 ` Junio C Hamano 0 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2013-04-16 22:09 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Brandon Casey, git Jeff King <peff@peff.net> writes: > diff --git a/run-command.c b/run-command.c > index 765c2ce..1b32a12 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -588,6 +588,7 @@ static pthread_key_t async_key; > static pthread_t main_thread; > static int main_thread_set; > static pthread_key_t async_key; > +static pthread_key_t async_die_counter; > > static void *run_thread(void *data) > { > @@ -614,6 +615,14 @@ static NORETURN void die_async(const char *err, va_list params) > > exit(128); > } > + > +static int async_die_is_recursing(void) > +{ > + void *ret = pthread_getspecific(async_die_counter); > + pthread_setspecific(async_die_counter, (void *)1); > + return ret != NULL; > +} > + > #endif > > int start_async(struct async *async) > @@ -695,7 +704,9 @@ int start_async(struct async *async) > main_thread_set = 1; > main_thread = pthread_self(); > pthread_key_create(&async_key, NULL); > + pthread_key_create(&async_die_counter, NULL); > set_die_routine(die_async); > + set_die_is_recursing_routine(async_die_is_recursing); > } > > if (proc_in >= 0) Will queue; thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-16 19:46 ` [PATCH v2 1/2] usage: allow pluggable die-recursion checks Jeff King 2013-04-16 19:50 ` [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine Jeff King @ 2013-04-17 0:49 ` Jonathan Nieder 2013-04-17 1:37 ` Jeff King 2013-04-23 21:27 ` Erik Faye-Lund 3 siblings, 1 reply; 23+ messages in thread From: Jonathan Nieder @ 2013-04-17 0:49 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Brandon Casey, git Jeff King wrote: > [1/2]: usage: allow pluggable die-recursion checks > [2/2]: run-command: use thread-aware die_is_recursing routine Lovely. This doesn't solve the analagous problem for grep, index-pack, pack-objects, preload-index, or bidirectional_transfer_loop, but it doesn't make them worse and even should make them easier to fix later, so Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message 2013-04-17 0:49 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jonathan Nieder @ 2013-04-17 1:37 ` Jeff King 0 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2013-04-17 1:37 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Johannes Sixt, Brandon Casey, git On Tue, Apr 16, 2013 at 05:49:41PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > [1/2]: usage: allow pluggable die-recursion checks > > [2/2]: run-command: use thread-aware die_is_recursing routine > > Lovely. This doesn't solve the analagous problem for grep, > index-pack, pack-objects, preload-index, or bidirectional_transfer_loop, > but it doesn't make them worse and even should make them easier to fix > later, so > > Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> The problem does exist in those other programs, but is much less likely to be triggered. There it would be a race between two threads calling die() at the same time, because as soon as one finishes, it takes down the whole program. What made this one so easy to trigger was the installation of die_async. So it's probably not that big a deal to leave the race in those other places for the time being. Any refactoring of the "dying" flag should accompany an overall strategy for managing the clean death of sub-threads (which would entail a custom die routine, at which point the recursion-check should just fall out naturally from whatever approach they choose). Thanks for reviewing. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King ` (2 preceding siblings ...) 2013-04-17 0:49 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jonathan Nieder @ 2013-04-23 21:27 ` Erik Faye-Lund 3 siblings, 0 replies; 23+ messages in thread From: Erik Faye-Lund @ 2013-04-23 21:27 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Brandon Casey, git On Tue, Apr 16, 2013 at 9:44 PM, Jeff King <peff@peff.net> wrote: > On Tue, Apr 16, 2013 at 04:13:56PM +0200, Johannes Sixt wrote: > >> > I'm not clear on what you are suggesting. That we protect only the main >> > thread from recursion, or that we drop the check entirely? Or that we >> > implement thread-local storage for this case without using pthread_once? >> >> Anything(*) that does not require pthread_once. A pthread_once >> implementation on Windows would be tricky and voluminous and and on top of >> it very likely to be done differently for gcc and MSVC. I don't like to go >> there if we can avoid it. This seems to have been settled as a "we don't need to go there" already, but just in case; here's an implementation of PTHREAD_MUTEX_INITIALIZER using double-checked locks, and it would probably be repurposable to do pthread_once if needed: https://github.com/kusma/git/commit/c2d7190ed64652c3ac8ea1f2800ff8e7d0b6c01e Yes, it's not pretty, it's error-prone; the best thing would be not to do it. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/3] run-command: factor out running_main_thread function 2013-04-15 23:06 [PATCH 0/3] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King @ 2013-04-15 23:08 ` Jeff King 2013-04-16 1:45 ` Jonathan Nieder 2013-04-15 23:09 ` [PATCH 3/3] usage: do not check die recursion outside main thread Jeff King 2 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2013-04-15 23:08 UTC (permalink / raw) To: git; +Cc: Brandon Casey When the async subsystem is used, we may spawn off sub-threads (if the platform supports it), and consider the original thread to be the "main" thread of execution. We use this information in a custom die_routine to decide whether to call pthread_exit or a regular full-process exit. Let's pull this decision out into its own function so that other parts of the system can use it to make decisions. Signed-off-by: Jeff King <peff@peff.net> --- run-command.c | 15 ++++++++++++++- run-command.h | 10 ++++++++++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/run-command.c b/run-command.c index 765c2ce..69ca052 100644 --- a/run-command.c +++ b/run-command.c @@ -603,7 +603,7 @@ static NORETURN void die_async(const char *err, va_list params) { vreportf("fatal: ", err, params); - if (!pthread_equal(main_thread, pthread_self())) { + if (!running_main_thread()) { struct async *async = pthread_getspecific(async_key); if (async->proc_in >= 0) close(async->proc_in); @@ -614,6 +614,19 @@ static NORETURN void die_async(const char *err, va_list params) exit(128); } + +int running_main_thread(void) +{ + return pthread_equal(main_thread, pthread_self()); +} + +#else /* NO_PTHREADS defined */ + +int running_main_thread(void) +{ + return 1; +} + #endif int start_async(struct async *async) diff --git a/run-command.h b/run-command.h index 221ce33..981dd10 100644 --- a/run-command.h +++ b/run-command.h @@ -92,4 +92,14 @@ int finish_async(struct async *async); int start_async(struct async *async); int finish_async(struct async *async); +/* + * If the platform supports threads, returns 1 if we are running the "main" + * thread that spawned other async threads, and zero if we are executing one + * of the async threads. + * + * If the platform does not support threads, returns 1 (we are always in the + * main thread then). + */ +int running_main_thread(void); + #endif -- 1.8.2.8.g44e4c28 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] run-command: factor out running_main_thread function 2013-04-15 23:08 ` [PATCH 2/3] run-command: factor out running_main_thread function Jeff King @ 2013-04-16 1:45 ` Jonathan Nieder 2013-04-16 2:53 ` Jeff King 0 siblings, 1 reply; 23+ messages in thread From: Jonathan Nieder @ 2013-04-16 1:45 UTC (permalink / raw) To: Jeff King; +Cc: git, Brandon Casey Jeff King wrote: > --- a/run-command.c > +++ b/run-command.c > @@ -603,7 +603,7 @@ static NORETURN void die_async(const char *err, va_list params) > { > vreportf("fatal: ", err, params); > > - if (!pthread_equal(main_thread, pthread_self())) { > + if (!running_main_thread()) { > struct async *async = pthread_getspecific(async_key); > if (async->proc_in >= 0) > close(async->proc_in); > @@ -614,6 +614,19 @@ static NORETURN void die_async(const char *err, va_list params) > > exit(128); > } > + > +int running_main_thread(void) > +{ > + return pthread_equal(main_thread, pthread_self()); > +} Would it make sense to do something like return !main_thread_set || pthread_equal(...); in case someone tries to call this before the first start_async call? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] run-command: factor out running_main_thread function 2013-04-16 1:45 ` Jonathan Nieder @ 2013-04-16 2:53 ` Jeff King 0 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2013-04-16 2:53 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Brandon Casey On Mon, Apr 15, 2013 at 06:45:04PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > --- a/run-command.c > > +++ b/run-command.c > > @@ -603,7 +603,7 @@ static NORETURN void die_async(const char *err, va_list params) > > { > > vreportf("fatal: ", err, params); > > > > - if (!pthread_equal(main_thread, pthread_self())) { > > + if (!running_main_thread()) { > > struct async *async = pthread_getspecific(async_key); > > if (async->proc_in >= 0) > > close(async->proc_in); > > @@ -614,6 +614,19 @@ static NORETURN void die_async(const char *err, va_list params) > > > > exit(128); > > } > > + > > +int running_main_thread(void) > > +{ > > + return pthread_equal(main_thread, pthread_self()); > > +} > > Would it make sense to do something like > > return !main_thread_set || pthread_equal(...); > > in case someone tries to call this before the first start_async call? I considered that, but assumed that pthread_equal would always return false against the zeroed out static main_thread. Mostly because we appeared not to be bothering with such a check already. But actually, in the existing code, it is only checked from die_async, which is only put into place when we _do_ start a thread, so we would never hit this code path without main_thread being set. So yes, a check probably would make sense. However, I think I'm convinced that we should drop this in favor of using thread-local storage for the "dying" counter. So this patch can just go away. Thanks for reviewing this version, though. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/3] usage: do not check die recursion outside main thread 2013-04-15 23:06 [PATCH 0/3] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King 2013-04-15 23:08 ` [PATCH 2/3] run-command: factor out running_main_thread function Jeff King @ 2013-04-15 23:09 ` Jeff King 2 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2013-04-15 23:09 UTC (permalink / raw) To: git; +Cc: Brandon Casey When we have started threads with the async subsystem, both the thread and the main program may call die(). The async subsystem sets up a special die routine to call pthread_exit rather than regular exit, but the recursion check in die() happens outside of this routine, and does not know that it's OK for two dies to run. As a result, we may print "recursion in die handler" when an async thread dies, even though there is no recursion. This can be triggered, for example, by upload-pack errors; the receiving side will die both in the async sideband demultiplexer and in the main program. This patch teaches the recursion check to use the same running_main_thread() check that the async code bases its exit decision on. That means we detect only recursion within the main program itself, and assume that async handlers correctly die (which they do). Signed-off-by: Jeff King <peff@peff.net> --- This is the simplest solution. You could actually have a thread-local dying counter, which would detect die recursion within an async thread. But I did not want to get into designing a lowest-common-denominator of thread-local storage that would work on all of our platforms. usage.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/usage.c b/usage.c index c6b7ac5..33405c8 100644 --- a/usage.c +++ b/usage.c @@ -5,6 +5,7 @@ */ #include "git-compat-util.h" #include "cache.h" +#include "run-command.h" void vreportf(const char *prefix, const char *err, va_list params) { @@ -82,6 +83,9 @@ static void check_die_recursion(const char *fmt, va_list ap) { static int dying; + if (!running_main_thread()) + return; + if (!dying++) return; -- 1.8.2.8.g44e4c28 ^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-04-23 21:28 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-04-15 23:06 [PATCH 0/3] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-15 23:08 ` [PATCH 1/3] usage: refactor die-recursion checks Jeff King 2013-04-15 23:45 ` Eric Sunshine 2013-04-15 23:47 ` Jeff King 2013-04-16 0:11 ` Brandon Casey 2013-04-16 0:42 ` Jeff King 2013-04-16 1:41 ` Jonathan Nieder 2013-04-16 2:34 ` Brandon Casey 2013-04-16 2:50 ` Jeff King 2013-04-16 7:18 ` Johannes Sixt 2013-04-16 13:01 ` Jeff King 2013-04-16 14:13 ` Johannes Sixt 2013-04-16 19:44 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jeff King 2013-04-16 19:46 ` [PATCH v2 1/2] usage: allow pluggable die-recursion checks Jeff King 2013-04-16 19:50 ` [PATCH v2 2/2] run-command: use thread-aware die_is_recursing routine Jeff King 2013-04-16 22:09 ` Junio C Hamano 2013-04-17 0:49 ` [PATCH v2 0/2] avoid bogus "recursion detected in die handler" message Jonathan Nieder 2013-04-17 1:37 ` Jeff King 2013-04-23 21:27 ` Erik Faye-Lund 2013-04-15 23:08 ` [PATCH 2/3] run-command: factor out running_main_thread function Jeff King 2013-04-16 1:45 ` Jonathan Nieder 2013-04-16 2:53 ` Jeff King 2013-04-15 23:09 ` [PATCH 3/3] usage: do not check die recursion outside main thread Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).