Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes
@ 2020-01-11  1:42 Stephen Oberholtzer
  2020-01-16  0:39 ` Christian Couder
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Oberholtzer @ 2020-01-11  1:42 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

After considering the responses I got for my --invert-status proposal, I
went back to the drawing board and considered how it might interact with
another feature I was going to propose.  This is the result.

To avoid repeating myself, I'm going to start with an example of what I
imagine putting into Documentation/git-bisect.txt, to document this
feature.  After the second snip line, you can find my justification for
this feature, as well as some preemptive answers to questions I expect
people to have.

>8----------------------------------------------------------------------8<

{at top}

git bisect run [--(old|good|<term-old>)-status=<list>]
[--(new|bad|<term-new>)-status=<list>]
[--skip-status=<list>] [--] <cmd>...

{below the paragraph beginning with "The special exit code 125")

Custom exit status mappings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The previous paragraphs describe how the exit status of <cmd> is mapped
to one of four actions:

 * `old`/`good`, which defaults to exit status 0.

 * `new`/`bad`, which defaults to exit status 1-127 except 125.

 * `skip`, which defaults to exit status 125.

 * Any status not mapped to one of the first three is treated as a fatal
   error, causing `git bisect run` to immediately terminate with a
   nonzero exit status.

For more precise control over bisect run's interpretation of the command's
exit status, you can use one or more `--<term>-status=<list>` options:

<term>::
  One of the following:
  * `skip` (`--skip-status`)
  * `old` or the corresponding term specified in the bisect start call
     (`--old-status` or `--good-status`)
  * `new` or the corresponding term specified in the bisect start call
     (`--new-status` or `--bad-status`)
<list>::
A comma-separated list of values (e.g. 1) or ranges (e.g. 1-10).

(It should be noted that specifying --old-status is unlikely to be useful
without a corresponding --new-status option.)

This feature can make a few things much easier:

 * If you want to bisect a "fix", you can use (as an example)

---------------
$ git bisect run --old-status=1-127 --new-status=0 my_script arguments
---------------

 * If the test involves some fundamentally-unreliable aspects such as I/O
  (for example, a network connection could be disrupted, or a file write
   could fail due to disk space issues), you can specify something like
   --new-status=99, and then any exit status other than 0 or 99 will be
   treated as a fatal error:

   -----------------
   $ git bisect run --new-status=99 sh -c 'setup-test && (run-test || exit 99)'
   -----------------

   If setup-test exits with a nonzero status code (except 99), then the
   run will abort with an error, rather than assume that the current commit
   contains the issue being bisected.


The default status code lists are as follows:
  --skip-status=125
  --old-status=0
  --new-status=1-127

The priority of each classification is as follows:

 * If the status code is on the --skip-status list, the action is "skip".
   Note that this takes precedence over the --old-status and --new-status
   lists; doing so simplifies the specification of --new-status.

 * If the status is on the --old-status list, *and* not on the --new-status
   list, the action is "old" (or "good").

 * If the status is on the --new-status list, *and* not on the --old-status
   list, the action is "new" (or "bad").

 * Otherwise, the command is treated as having experienced a fatal error,
   and run will terminate with a nonzero exit status.

In the last case, the bisection will be left where it was; you may correct
the error and perform another bisect run, or you may finish the bisection
with manual good/bad commands.

>8----------------------------------------------------------------------8<

The motivation
==============

First, an excerpt from the current documentation for 'git bisect':

> 126 and 127 are used by POSIX shells to signal specific error status
> (127 is for command not found, 126 is for command found but not executable
> - these details do not matter, as they are normal errors in the script,
> as far as bisect run is concerned

This shows a fundamental disconnect between bisect run's view of the
world and reality.  I submit that, in reality, status codes 126 and 127 are
overwhelmingly likely indicators that the script did not work correctly,
in which case the run should be halted so the user may correct the issue.
However, 126 and 127 are mapped to "git bisect bad" -- as in, "this commit
definitely contains the bug that I am searching for".


Let's consider the consequences of an inappropriate status code:

- 'good':  The bisect will incorrectly select the wrong commit (specifically,
  a later commit than the one that actually introduced the issue.)

  This will also indirectly result in more trials than would otherwise be
  necessary to determine the result, because the bisection will have to be
  restarted at the point where the mistake occurred.  (In the worst possible
  case, where a mistake occurs on the first step, the bisection will take
  at least twice as long.)

- 'bad': The bisect will incorrectly select the wrong commit (specifically,
  an earlier commit than the one that actually introduced the issue.)

  Like 'good', this will also indirectly result in extra trials.

- 'skip': The bisect will unnecessarily test at least one extra commit for
  each false 'skip' result.  In the worst case, it may not be able to narrow
  down the issue to a single commit.

- Abort: The bisection stops until the user restarts it.  No extra commits
  are tested, though if the user isn't paying attention, the wallclock
  time will take longer than usual.

In particular, the behavior of a false match on 'good' or 'bad' is *at best*
extra time needed to do the bisection.  At worst -- if the user is not
familiar with the code in question -- a great deal of confusion and time-
wasting can result as the user investigates the wrong commit.  As such, it
is critical that you have no false 'good' or 'bad' results.


If you're using a shell script to run your test, a false 'good' result can
easily be prevented by putting 'set -e' at the top of the script.

Avoiding a false 'bad' result is far more difficult, especially if the test
is complex and you're not familiar with shell scripting in general.  (The
man page for bash, on my machine, is 3725 lines long, and does not lend
itself well to searching.)  For the task that set me down this path, it
took me about six iterations to create a robust script that would return
the exit code that git wanted,  and _it still didn't work right in all
cases_.


What would have been a great deal simpler is if I could have just picked
an exit code that none of the other commands in my script would ever
return (such as 99), and told git to treat any code other than 0 or 99 as
a fatal error.  Which is essentially what I ended up doing (or trying to),
but unfamiliar with shell scripting as I was, I had several issues.

If I can make it easy for someone to use bisect run without them having
to learn any otherwise-unnecessary shell scripting constructs, I would
consider that a win.


Pre-emptive Q&A
===============

Q: Why allow ranges?  Why not restrict it to single status codes?
A: It allows for a simpler implementation, because the default 'bad' status
   list is a already a range (1-124,126-127).  By supporting ranges,
   we need only one single code path:

   exit_bad=1-124,126-127
   # override exit_bad if --bad-status is specified
   # check exit code against exit_bad

Q: Why treat any exit code that's on both the 'old' and 'new' lists as a
   fatal error?
A: Because it's obviously a mistake, and we don't know the correct way
   to interpret it.  By halting the run and returning an error, we can
   return control to the user, who can fix it by fixing the overlapping
   lists and re-running the command.

Q: Why not an error when a code is on both 'old' and 'skip', or 'new' and
   'skip'?
A: Two reasons:
   1. As I detailed earlier, a false 'skip' is less problematic than a
      false 'new'/'bad'.
   2. By prioritizing 'skip' over 'new', the default behavior for 'new'
      can be written as '1-127' instead of '1-124,126-127'.


-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes
  2020-01-11  1:42 [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes Stephen Oberholtzer
@ 2020-01-16  0:39 ` Christian Couder
  2020-01-16  2:40   ` Stephen Oberholtzer
  0 siblings, 1 reply; 3+ messages in thread
From: Christian Couder @ 2020-01-16  0:39 UTC (permalink / raw)
  To: Stephen Oberholtzer; +Cc: git, Christian Couder

On Sat, Jan 11, 2020 at 2:43 AM Stephen Oberholtzer <stevie@qrpff.net> wrote:
>
> After considering the responses I got for my --invert-status proposal, I
> went back to the drawing board and considered how it might interact with
> another feature I was going to propose.  This is the result.
>
> To avoid repeating myself, I'm going to start with an example of what I
> imagine putting into Documentation/git-bisect.txt, to document this
> feature.  After the second snip line, you can find my justification for
> this feature, as well as some preemptive answers to questions I expect
> people to have.

Ok.

> >8----------------------------------------------------------------------8<
>
> {at top}
>
> git bisect run [--(old|good|<term-old>)-status=<list>]
> [--(new|bad|<term-new>)-status=<list>]
> [--skip-status=<list>] [--] <cmd>...

That seems interesting.

In many cases I have found that something like the following would do:

git bisect run sh -c "make || exit 125; setup_script || exit 255;
run_cmd >log 2>&1 || exit 255; (! grep foo log)"

but I understand that some people may prefer options like the one you suggest.

> {below the paragraph beginning with "The special exit code 125")
>
> Custom exit status mappings
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The previous paragraphs describe how the exit status of <cmd> is mapped
> to one of four actions:
>
>  * `old`/`good`, which defaults to exit status 0.
>
>  * `new`/`bad`, which defaults to exit status 1-127 except 125.
>
>  * `skip`, which defaults to exit status 125.
>
>  * Any status not mapped to one of the first three is treated as a fatal
>    error, causing `git bisect run` to immediately terminate with a
>    nonzero exit status.
>
> For more precise control over bisect run's interpretation of the command's
> exit status, you can use one or more `--<term>-status=<list>` options:
>
> <term>::
>   One of the following:
>   * `skip` (`--skip-status`)
>   * `old` or the corresponding term specified in the bisect start call
>      (`--old-status` or `--good-status`)
>   * `new` or the corresponding term specified in the bisect start call
>      (`--new-status` or `--bad-status`)
> <list>::
> A comma-separated list of values (e.g. 1) or ranges (e.g. 1-10).
>
> (It should be noted that specifying --old-status is unlikely to be useful
> without a corresponding --new-status option.)

This is not clear from what has been documented above.

> This feature can make a few things much easier:
>
>  * If you want to bisect a "fix", you can use (as an example)
>
> ---------------
> $ git bisect run --old-status=1-127 --new-status=0 my_script arguments
> ---------------

Yeah, that seems useful. It is not clear though if 125 then means
"old" or still "skip".

>  * If the test involves some fundamentally-unreliable aspects such as I/O
>   (for example, a network connection could be disrupted, or a file write
>    could fail due to disk space issues), you can specify something like
>    --new-status=99, and then any exit status other than 0 or 99 will be
>    treated as a fatal error:
>
>    -----------------
>    $ git bisect run --new-status=99 sh -c 'setup-test && (run-test || exit 99)'
>    -----------------
>
>    If setup-test exits with a nonzero status code (except 99), then the
>    run will abort with an error, rather than assume that the current commit
>    contains the issue being bisected.

Yeah but `sh -c 'setup-test || exit 255; run-test'` should work fine
in this case too.

> The default status code lists are as follows:
>   --skip-status=125
>   --old-status=0
>   --new-status=1-127
>
> The priority of each classification is as follows:
>
>  * If the status code is on the --skip-status list, the action is "skip".
>    Note that this takes precedence over the --old-status and --new-status
>    lists; doing so simplifies the specification of --new-status.
>
>  * If the status is on the --old-status list, *and* not on the --new-status
>    list, the action is "old" (or "good").
>
>  * If the status is on the --new-status list, *and* not on the --old-status
>    list, the action is "new" (or "bad").

The above doesn't tell what happens if a status is both on the
--old-status and on the --new-status lists...

>  * Otherwise, the command is treated as having experienced a fatal error,
>    and run will terminate with a nonzero exit status.

...so it seems that it is an error only when such a status code is
actually received by `git bisect run`.

Some people could argue though that `--new-status=0 --old-status=0`
should be detected as an error as soon as `git bisect run` is
launched.

> In the last case, the bisection will be left where it was; you may correct
> the error and perform another bisect run, or you may finish the bisection
> with manual good/bad commands.

It would be nice if the above clarification about priority was before
the examples you gave.

> >8----------------------------------------------------------------------8<
>
> The motivation
> ==============
>
> First, an excerpt from the current documentation for 'git bisect':
>
> > 126 and 127 are used by POSIX shells to signal specific error status
> > (127 is for command not found, 126 is for command found but not executable
> > - these details do not matter, as they are normal errors in the script,
> > as far as bisect run is concerned
>
> This shows a fundamental disconnect between bisect run's view of the
> world and reality.  I submit that, in reality, status codes 126 and 127 are
> overwhelmingly likely indicators that the script did not work correctly,
> in which case the run should be halted so the user may correct the issue.

Well when bisect run was implemented the main concern was to make it
abort immediately in case of a signal and to consider common non zero
status code as "bad". A signal would result in an status code greater
than 128, so it was decided to split the [1-255] range in two more or
less equal parts, so [1-128] would be "bad" and [129-255] would be
"abort". Then "split" was added and given special code 125.

> However, 126 and 127 are mapped to "git bisect bad" -- as in, "this commit
> definitely contains the bug that I am searching for".

Yeah, it might have been better to use [126-255] for "abort".

I think though that 126 and 127 should still be rare. And git bisect
might be used to debug software written in shell (Git itself had many
commands written in shell a long time ago) where 126 or 127 could come
from errors in the software itself not from the testing scripts.

> Let's consider the consequences of an inappropriate status code:
>
> - 'good':  The bisect will incorrectly select the wrong commit (specifically,
>   a later commit than the one that actually introduced the issue.)
>
>   This will also indirectly result in more trials than would otherwise be
>   necessary to determine the result, because the bisection will have to be
>   restarted at the point where the mistake occurred.  (In the worst possible
>   case, where a mistake occurs on the first step, the bisection will take
>   at least twice as long.)
>
> - 'bad': The bisect will incorrectly select the wrong commit (specifically,
>   an earlier commit than the one that actually introduced the issue.)
>
>   Like 'good', this will also indirectly result in extra trials.
>
> - 'skip': The bisect will unnecessarily test at least one extra commit for
>   each false 'skip' result.  In the worst case, it may not be able to narrow
>   down the issue to a single commit.
>
> - Abort: The bisection stops until the user restarts it.  No extra commits
>   are tested, though if the user isn't paying attention, the wallclock
>   time will take longer than usual.
>
> In particular, the behavior of a false match on 'good' or 'bad' is *at best*
> extra time needed to do the bisection.  At worst -- if the user is not
> familiar with the code in question -- a great deal of confusion and time-
> wasting can result as the user investigates the wrong commit.  As such, it
> is critical that you have no false 'good' or 'bad' results.

I agree with that.

> If you're using a shell script to run your test, a false 'good' result can
> easily be prevented by putting 'set -e' at the top of the script.

Sometimes it is easier to understand a script when it properly checks
for errors and produces good error messages than when it uses `set
-e`.

> Avoiding a false 'bad' result is far more difficult, especially if the test
> is complex and you're not familiar with shell scripting in general.  (The
> man page for bash, on my machine, is 3725 lines long, and does not lend
> itself well to searching.)  For the task that set me down this path, it
> took me about six iterations to create a robust script that would return
> the exit code that git wanted,  and _it still didn't work right in all
> cases_.

I agree that it is not always easy.

> What would have been a great deal simpler is if I could have just picked
> an exit code that none of the other commands in my script would ever
> return (such as 99), and told git to treat any code other than 0 or 99 as
> a fatal error.  Which is essentially what I ended up doing (or trying to),
> but unfamiliar with shell scripting as I was, I had several issues.

I think there might have been easier ways.

See above the alternative I suggested to `git bisect run
--new-status=99 sh -c 'setup-test && (run-test || exit 99)'`.

> If I can make it easy for someone to use bisect run without them having
> to learn any otherwise-unnecessary shell scripting constructs, I would
> consider that a win.

I agree that in some cases the options you suggest could seem simpler
for some people.

> Pre-emptive Q&A
> ===============

Thanks for this!

Best,
Christian.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes
  2020-01-16  0:39 ` Christian Couder
@ 2020-01-16  2:40   ` Stephen Oberholtzer
  0 siblings, 0 replies; 3+ messages in thread
From: Stephen Oberholtzer @ 2020-01-16  2:40 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Christian Couder

On Wed, Jan 15, 2020 at 7:39 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Jan 11, 2020 at 2:43 AM Stephen Oberholtzer <stevie@qrpff.net> wrote:
> >
> In many cases I have found that something like the following would do:
>
> git bisect run sh -c "make || exit 125; setup_script || exit 255;
> run_cmd >log 2>&1 || exit 255; (! grep foo log)"
>
> but I understand that some people may prefer options like the one you suggest.

Yeah,  A lot of this stuff can be easily done if you're a skilled
shell scripter.  But it's easier to do it _wrong_, especially for
something non-trivial.

> >
> > For more precise control over bisect run's interpretation of the command's
> > exit status, you can use one or more `--<term>-status=<list>` options:
> >
> > <term>::
> >   One of the following:
> >   * `skip` (`--skip-status`)
> >   * `old` or the corresponding term specified in the bisect start call
> >      (`--old-status` or `--good-status`)
> >   * `new` or the corresponding term specified in the bisect start call
> >      (`--new-status` or `--bad-status`)
> > <list>::
> > A comma-separated list of values (e.g. 1) or ranges (e.g. 1-10).
> >
> > (It should be noted that specifying --old-status is unlikely to be useful
> > without a corresponding --new-status option.)
>
> This is not clear from what has been documented above.

I'm not sure what you're referring to here, especially regarding "what
has been documented above".
All of that stuff is still in my documentation notes.

>
> > This feature can make a few things much easier:
> >
> >  * If you want to bisect a "fix", you can use (as an example)
> >
> > ---------------
> > $ git bisect run --old-status=1-127 --new-status=0 my_script arguments
> > ---------------
>
> Yeah, that seems useful. It is not clear though if 125 then means
> "old" or still "skip".

Well, it is clear if you read the entire thing.  But as you pointed
out below, I ought to reorder the text a bit.

>
> >    -----------------
> >    $ git bisect run --new-status=99 sh -c 'setup-test && (run-test || exit 99)'
> >    -----------------
> >
>
> Yeah but `sh -c 'setup-test || exit 255; run-test'` should work fine
> in this case too.

Yep, and it only works because there's a flaw in my example: it
assumes that all nonzero exit codes from run-test are conclusive
results.  But if run-test died from a signal (say, SEGV or INT or an
oom KILL) that overly-simplistic || is going to turn all of them into
99s.  If we were watching for a SEGV or INT or KILL, that's awesome.
But if we were looking for something else, then we've misclassified
the result!

That's one of the major problems with || -- it boils down the
process's exit code to a simple boolean.

I'm going to rewrite my example to say "if run-test exits with code 99
if the defect is present" and just do setup-test && run--test.

>
> The above doesn't tell what happens if a status is both on the
> --old-status and on the --new-status lists...

No, the below does.

>
> >  * Otherwise, the command is treated as having experienced a fatal error,
> >    and run will terminate with a nonzero exit status.
>
> ...so it seems that it is an error only when such a status code is
> actually received by `git bisect run`.
>
> Some people could argue though that `--new-status=0 --old-status=0`
> should be detected as an error as soon as `git bisect run` is
> launched.

There are a few reasons for not raising an error immediately:
- Such a check would not be free.  While the example you gave is
simple enough, things can quickly get complicated.  A generalized
version would have to check every single status from 0 to 255
  (that said, I can see some value in proactively checking 0 and 1
before starting the run, just as a sanity check)
-  If there _is_ an ambiguous exit code, it doesn't actually matter
unless it's actually encountered
- If the user makes such a mistake, the bisection doesn't go off the
rails; it just halts.  A simple 'git bisect good' or 'git bisect bad',
followed by another call to "run" with corrected options, will address
the issue.

>
> It would be nice if the above clarification about priority was before
> the examples you gave.

Yeah, I'll reorder things.

>
> Well when bisect run was implemented the main concern was to make it
> abort immediately in case of a signal and to consider common non zero
> status code as "bad". A signal would result in an status code greater
> than 128, so it was decided to split the [1-255] range in two more or
> less equal parts, so [1-128] would be "bad" and [129-255] would be
> "abort". Then "split" was added and given special code 125.

Oh, yes, I understand the reasoning behind the current behavior; that
doesn't change the fact that trying to work around it can be awkward.

>
> > However, 126 and 127 are mapped to "git bisect bad" -- as in, "this commit
> > definitely contains the bug that I am searching for".
>
> Yeah, it might have been better to use [126-255] for "abort".
>
> I think though that 126 and 127 should still be rare. And git bisect
> might be used to debug software written in shell (Git itself had many
> commands written in shell a long time ago) where 126 or 127 could come
> from errors in the software itself not from the testing scripts.

"rare" doesn't mean "impossible" -- and in fact, the 'command not
found' exit code was tripped on one of my early runs when an error in
my test script left it in the wrong directory.

But you make an excellent point that seems to support my proposal: If
you're bisecting a shell script issue, you probably want to treat
126-127 as signifying "bad" commits, but if you're not, you probably
want 126-127 to abort the run because it means your test itself
malfunctioned.

The little project I was working on that set me down this path took
15-20 minutes per iteration, and was bisecting through over 64K
commits; I believe it ended up testing about 20 revisions (due to
merges).  A single false positive/negative, especially early on, could
have resulted in hours wasted.

> > If you're using a shell script to run your test, a false 'good' result can
> > easily be prevented by putting 'set -e' at the top of the script.
>
> Sometimes it is easier to understand a script when it properly checks
> for errors and produces good error messages than when it uses `set
> -e`.

Perhaps, but not necessarily -- and _definitely_ not if you're not
experienced with shell scripting.
Shell scripting feels incredibly fragile; every bit you add is a
chance to make a mistake.

In contrast, "set -exuo pipefail" makes life really easy: You can see
everything the script does, and you can be confident that as soon as
something goes wrong, your script will stop right away.

> > What would have been a great deal simpler is if I could have just picked
> > an exit code that none of the other commands in my script would ever
> > return (such as 99), and told git to treat any code other than 0 or 99 as
> > a fatal error.  Which is essentially what I ended up doing (or trying to),
> > but unfamiliar with shell scripting as I was, I had several issues.
>
> I think there might have been easier ways.
>
> See above the alternative I suggested to `git bisect run
> --new-status=99 sh -c 'setup-test && (run-test || exit 99)'`.

There's a bootstrapping issue, here: If someone's unfamiliar with
shell scripting, how would they know how to construct such a thing?
If you're going to bet, say, five hours of your life on your
one-liner, how can you be confident that it will work reliably?

> Thanks for this!

Thank you for your feedback!

I've got a first draft of the proof-of-concept patch written already;
I'll add a proactive sanity check for exit codes 0 and 1, since that's
cheap and should cover over 99% of cases, do some cleanups to the
documentation and examples, and then add some test cases.

>
> Best,
> Christian.



-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-11  1:42 [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes Stephen Oberholtzer
2020-01-16  0:39 ` Christian Couder
2020-01-16  2:40   ` Stephen Oberholtzer

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git