[RFC] Proposal: New options for git bisect run to control the interpretation of exit codes

* [RFC] Proposal: New options for git bisect run to control the interpretation of exit codes
@ 2020-01-11  1:42 Stephen Oberholtzer
  2020-01-16  0:39 ` Christian Couder
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Oberholtzer @ 2020-01-11  1:42 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

After considering the responses I got for my --invert-status proposal, I
went back to the drawing board and considered how it might interact with
another feature I was going to propose.  This is the result.

To avoid repeating myself, I'm going to start with an example of what I
imagine putting into Documentation/git-bisect.txt, to document this
feature.  After the second snip line, you can find my justification for
this feature, as well as some preemptive answers to questions I expect
people to have.

>8----------------------------------------------------------------------8<

{at top}

git bisect run [--(old|good|<term-old>)-status=<list>]
[--(new|bad|<term-new>)-status=<list>]
[--skip-status=<list>] [--] <cmd>...

{below the paragraph beginning with "The special exit code 125")

Custom exit status mappings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The previous paragraphs describe how the exit status of <cmd> is mapped
to one of four actions:

 * `old`/`good`, which defaults to exit status 0.

 * `new`/`bad`, which defaults to exit status 1-127 except 125.

 * `skip`, which defaults to exit status 125.

 * Any status not mapped to one of the first three is treated as a fatal
   error, causing `git bisect run` to immediately terminate with a
   nonzero exit status.

For more precise control over bisect run's interpretation of the command's
exit status, you can use one or more `--<term>-status=<list>` options:

<term>::
  One of the following:
  * `skip` (`--skip-status`)
  * `old` or the corresponding term specified in the bisect start call
     (`--old-status` or `--good-status`)
  * `new` or the corresponding term specified in the bisect start call
     (`--new-status` or `--bad-status`)
<list>::
A comma-separated list of values (e.g. 1) or ranges (e.g. 1-10).

(It should be noted that specifying --old-status is unlikely to be useful
without a corresponding --new-status option.)

This feature can make a few things much easier:

 * If you want to bisect a "fix", you can use (as an example)

---------------
$ git bisect run --old-status=1-127 --new-status=0 my_script arguments
---------------

 * If the test involves some fundamentally-unreliable aspects such as I/O
  (for example, a network connection could be disrupted, or a file write
   could fail due to disk space issues), you can specify something like
   --new-status=99, and then any exit status other than 0 or 99 will be
   treated as a fatal error:

   -----------------
   $ git bisect run --new-status=99 sh -c 'setup-test && (run-test || exit 99)'
   -----------------

   If setup-test exits with a nonzero status code (except 99), then the
   run will abort with an error, rather than assume that the current commit
   contains the issue being bisected.

The default status code lists are as follows:
  --skip-status=125
  --old-status=0
  --new-status=1-127

The priority of each classification is as follows:

 * If the status code is on the --skip-status list, the action is "skip".
   Note that this takes precedence over the --old-status and --new-status
   lists; doing so simplifies the specification of --new-status.

 * If the status is on the --old-status list, *and* not on the --new-status
   list, the action is "old" (or "good").

 * If the status is on the --new-status list, *and* not on the --old-status
   list, the action is "new" (or "bad").

 * Otherwise, the command is treated as having experienced a fatal error,
   and run will terminate with a nonzero exit status.

In the last case, the bisection will be left where it was; you may correct
the error and perform another bisect run, or you may finish the bisection
with manual good/bad commands.

>8----------------------------------------------------------------------8<

The motivation
==============

First, an excerpt from the current documentation for 'git bisect':

> 126 and 127 are used by POSIX shells to signal specific error status
> (127 is for command not found, 126 is for command found but not executable
> - these details do not matter, as they are normal errors in the script,
> as far as bisect run is concerned

This shows a fundamental disconnect between bisect run's view of the
world and reality.  I submit that, in reality, status codes 126 and 127 are
overwhelmingly likely indicators that the script did not work correctly,
in which case the run should be halted so the user may correct the issue.
However, 126 and 127 are mapped to "git bisect bad" -- as in, "this commit
definitely contains the bug that I am searching for".

Let's consider the consequences of an inappropriate status code:

- 'good':  The bisect will incorrectly select the wrong commit (specifically,
  a later commit than the one that actually introduced the issue.)

  This will also indirectly result in more trials than would otherwise be
  necessary to determine the result, because the bisection will have to be
  restarted at the point where the mistake occurred.  (In the worst possible
  case, where a mistake occurs on the first step, the bisection will take
  at least twice as long.)

- 'bad': The bisect will incorrectly select the wrong commit (specifically,
  an earlier commit than the one that actually introduced the issue.)

  Like 'good', this will also indirectly result in extra trials.

- 'skip': The bisect will unnecessarily test at least one extra commit for
  each false 'skip' result.  In the worst case, it may not be able to narrow
  down the issue to a single commit.

- Abort: The bisection stops until the user restarts it.  No extra commits
  are tested, though if the user isn't paying attention, the wallclock
  time will take longer than usual.

In particular, the behavior of a false match on 'good' or 'bad' is *at best*
extra time needed to do the bisection.  At worst -- if the user is not
familiar with the code in question -- a great deal of confusion and time-
wasting can result as the user investigates the wrong commit.  As such, it
is critical that you have no false 'good' or 'bad' results.

If you're using a shell script to run your test, a false 'good' result can
easily be prevented by putting 'set -e' at the top of the script.

Avoiding a false 'bad' result is far more difficult, especially if the test
is complex and you're not familiar with shell scripting in general.  (The
man page for bash, on my machine, is 3725 lines long, and does not lend
itself well to searching.)  For the task that set me down this path, it
took me about six iterations to create a robust script that would return
the exit code that git wanted,  and _it still didn't work right in all
cases_.

What would have been a great deal simpler is if I could have just picked
an exit code that none of the other commands in my script would ever
return (such as 99), and told git to treat any code other than 0 or 99 as
a fatal error.  Which is essentially what I ended up doing (or trying to),
but unfamiliar with shell scripting as I was, I had several issues.

If I can make it easy for someone to use bisect run without them having
to learn any otherwise-unnecessary shell scripting constructs, I would
consider that a win.

Pre-emptive Q&A
===============

Q: Why allow ranges?  Why not restrict it to single status codes?
A: It allows for a simpler implementation, because the default 'bad' status
   list is a already a range (1-124,126-127).  By supporting ranges,
   we need only one single code path:

   exit_bad=1-124,126-127
   # override exit_bad if --bad-status is specified
   # check exit code against exit_bad

Q: Why treat any exit code that's on both the 'old' and 'new' lists as a
   fatal error?
A: Because it's obviously a mistake, and we don't know the correct way
   to interpret it.  By halting the run and returning an error, we can
   return control to the user, who can fix it by fixing the overlapping
   lists and re-running the command.

Q: Why not an error when a code is on both 'old' and 'skip', or 'new' and
   'skip'?
A: Two reasons:
   1. As I detailed earlier, a false 'skip' is less problematic than a
      false 'new'/'bad'.
   2. By prioritizing 'skip' over 'new', the default behavior for 'new'
      can be written as '1-127' instead of '1-124,126-127'.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE

^ permalink raw reply	[flat|nested] 4+ messages in thread