git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges
@ 2018-08-04 22:18 Johannes Schindelin via GitGitGadget
  2018-08-04 22:18 ` [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges Johannes Schindelin via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-08-04 22:18 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Junio C Hamano

I am a heavy user of git log -L .... In fact, I use the feature where
multiple ranges can be specified extensively, via a not-exactly-trivial
shell script function that takes the currently staged changes (or if none
are staged, the current unstanged changes) and turns them into the
corresponding commit history:

staged_log () { #
        diff="$(git diff --cached -U1)"
        test -n "$diff" ||
        diff="$(git diff -U1)"
        test -n "$diff" ||
        die "No changes"
        eval "git log $(echo "$diff" |
                sed -ne '/^--- a\//{s/^-* a\/\(.*\)/'\''\1'\''/;x}' -e \
                        '/^@@ -/{s/^@@ -\([^, ]*\),\([^ ]*\).*/-L \1,+\2/;G;s/\n/:/g;p}' |
                        tr '\n' ' ')"
}

This is an extremely useful way to look at the history, especially when
trying to fix up a commit deep in a long branch (or a thicket of branches).

Today, however, this method failed me, by greeting me with an assertion.
When I tried to paper over that assertion by joining line ranges that became
adjacent (or overlapping), it still produced a segmentation fault when the
line-log tried to print lines past the file contents.

So I had no choice but to fix this properly.

I still wanted to keep the optimization where multiple line ranges are
joined into a single one (I am convinced that this also affects the output,
where previously multiple hunks would have been displayed, but I ran out of
time to investigate this). This is the 3rd patch. It is not purely an
optimization, as the assertion would still trigger when line ranges could be
joined.

Now, I am fairly certain that the changes are correct, but given my track
record with off-by-one bugs (and once even an off-by-two bug), I would
really appreciate some thorough review of this code, in particular the
second one that is the actual bug fix. I am specifically interested in
reviews from people who know line-log.c pretty well and can tell me whether
the src[i].end > target[j].end is correct, or whether it should actually
have been a >= (I tried to wrap my head around this, but I would feel more
comfortable if a domain expert would analyze this, whistling, and looking
Eric's way).

Cc: Eric Sunshine sunshine@sunshineco.com [sunshine@sunshineco.com]

Johannes Schindelin (4):
  line-log: demonstrate a bug with nearly-overlapping ranges
  line-log: adjust start/end of ranges individually
  line-log: optimize ranges by joining them when possible
  line-log: convert an assertion to a full BUG() call

 line-log.c          | 18 +++++++++++++++---
 t/t4211-line-log.sh | 17 +++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)


base-commit: 1d89318c48d233d52f1db230cf622935ac3c69fa
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-15%2Fdscho%2Fline-log-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-15/dscho/line-log-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/15
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges
  2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
@ 2018-08-04 22:18 ` Johannes Schindelin via GitGitGadget
  2018-08-05  1:59   ` Jonathan Nieder
  2018-08-04 22:18 ` [PATCH 2/4] line-log: adjust start/end of ranges individually Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-08-04 22:18 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Currently, this test case throws an assertion:

	Assertion failed!

	Program: git.exe
	File: line-log.c, Line 71

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4211-line-log.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 436b13ad2..61ff37430 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -115,4 +115,21 @@ test_expect_success 'range_set_union' '
 	git log $(for x in $(test_seq 200); do echo -L $((2*x)),+1:c.c; done)
 '
 
+q_to_lf () {
+	tr Q '\012'
+}
+
+test_expect_failure 'close to overlapping ranges' '
+	test_seq 5 >a1.c &&
+	git add a1.c &&
+	git commit -m "5 lines" a1.c &&
+	sed s/3/3QaQb/ <a1.c | q_to_lf >tmp &&
+	mv tmp a1.c &&
+	git commit -m "2 more lines" a1.c &&
+	sed s/4/cQ4/ <a1.c | q_to_lf >tmp &&
+	mv tmp a1.c &&
+	git commit -m "1 more line" a1.c &&
+	git --no-pager log -L 1,3:a1.c -L 5,8:a1.c
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/4] line-log: adjust start/end of ranges individually
  2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
  2018-08-04 22:18 ` [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges Johannes Schindelin via GitGitGadget
@ 2018-08-04 22:18 ` Johannes Schindelin via GitGitGadget
  2018-08-05 10:14   ` Eric Sunshine
  2018-08-04 22:18 ` [PATCH 3/4] line-log: optimize ranges by joining them when possible Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-08-04 22:18 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When traversing commits and adjusting the ranges, things can get really
tricky. For example, when the line range of interest encloses several
hunks of a commit, the line range can actually shrink.

Currently, range_set_shift_diff() does not anticipate that scenario and
blindly adjusts start and end by the same offset ("shift" the range).

This can lead to a couple of surprising issues, such as assertions in
range_set_append() (when the end of a given range is not adjusted
properly, it can point after the start of the next range) or even
segmentation faults (when t_end in the loop of dump_diff_hacky_one()
points outside the valid line range).

Let's fix this by adjusting the start and the end offsets individually.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 line-log.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/line-log.c b/line-log.c
index 72a5fed66..d8d09b5ee 100644
--- a/line-log.c
+++ b/line-log.c
@@ -427,7 +427,7 @@ static void range_set_shift_diff(struct range_set *out,
 				 struct diff_ranges *diff)
 {
 	unsigned int i, j = 0;
-	long offset = 0;
+	long offset = 0, start_offset;
 	struct range *src = rs->ranges;
 	struct range *target = diff->target.ranges;
 	struct range *parent = diff->parent.ranges;
@@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
 				- (target[j].end-target[j].start);
 			j++;
 		}
-		range_set_append(out, src[i].start+offset, src[i].end+offset);
+		start_offset = offset;
+		while (j < diff->target.nr && src[i].end > target[j].end) {
+			offset += (parent[j].end-parent[j].start)
+				- (target[j].end-target[j].start);
+			j++;
+		}
+		range_set_append(out, src[i].start+start_offset, src[i].end+offset);
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/4] line-log: optimize ranges by joining them when possible
  2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
  2018-08-04 22:18 ` [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges Johannes Schindelin via GitGitGadget
  2018-08-04 22:18 ` [PATCH 2/4] line-log: adjust start/end of ranges individually Johannes Schindelin via GitGitGadget
@ 2018-08-04 22:18 ` Johannes Schindelin via GitGitGadget
  2018-08-05  6:11   ` Junio C Hamano
  2018-08-05  8:45   ` Andrei Rybak
  2018-08-04 22:18 ` [PATCH 4/4] line-log: convert an assertion to a full BUG() call Johannes Schindelin via GitGitGadget
  2018-08-05 10:39 ` [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Eric Sunshine
  4 siblings, 2 replies; 20+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-08-04 22:18 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Technically, it is okay to have line ranges that touch (i.e. the end of
the first range ends just before the next range begins). However, it is
inefficient, and when the user provides such touching ranges via
multiple `-L` options, we already join them.

When we traverse the history, though, we never join ranges, even they
become "touchy-feely" due to added lines (which are "removed" from
line-log's point of view because it traverses the commit history into
the past).

Let's optimize also this case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 line-log.c          | 4 ++++
 t/t4211-line-log.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/line-log.c b/line-log.c
index d8d09b5ee..bc7ef69d6 100644
--- a/line-log.c
+++ b/line-log.c
@@ -68,6 +68,10 @@ void range_set_append_unsafe(struct range_set *rs, long a, long b)
 
 void range_set_append(struct range_set *rs, long a, long b)
 {
+	if (rs->nr > 0 && rs->ranges[rs->nr-1].end + 1 == a) {
+		rs->ranges[rs->nr-1].end = b;
+		return;
+	}
 	assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);
 	range_set_append_unsafe(rs, a, b);
 }
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 61ff37430..ebaf5ea86 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -119,7 +119,7 @@ q_to_lf () {
 	tr Q '\012'
 }
 
-test_expect_failure 'close to overlapping ranges' '
+test_expect_success 'close to overlapping ranges' '
 	test_seq 5 >a1.c &&
 	git add a1.c &&
 	git commit -m "5 lines" a1.c &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/4] line-log: convert an assertion to a full BUG() call
  2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2018-08-04 22:18 ` [PATCH 3/4] line-log: optimize ranges by joining them when possible Johannes Schindelin via GitGitGadget
@ 2018-08-04 22:18 ` Johannes Schindelin via GitGitGadget
  2018-08-05 10:42   ` Eric Sunshine
  2018-08-05 10:39 ` [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Eric Sunshine
  4 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-08-04 22:18 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The assertion in question really indicates a bug, when triggered, so we
might just as well use the sanctioned method to report it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 line-log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/line-log.c b/line-log.c
index bc7ef69d6..0e09df9db 100644
--- a/line-log.c
+++ b/line-log.c
@@ -72,7 +72,9 @@ void range_set_append(struct range_set *rs, long a, long b)
 		rs->ranges[rs->nr-1].end = b;
 		return;
 	}
-	assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);
+	if (rs->nr > 0 && rs->ranges[rs->nr-1].end > a)
+		BUG("append %ld-%ld, after %ld-%ld?!?", a, b,
+		    rs->ranges[rs->nr-1].start, rs->ranges[rs->nr-1].end);
 	range_set_append_unsafe(rs, a, b);
 }
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges
  2018-08-04 22:18 ` [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges Johannes Schindelin via GitGitGadget
@ 2018-08-05  1:59   ` Jonathan Nieder
  2018-08-06 10:27     ` Johannes Schindelin
  0 siblings, 1 reply; 20+ messages in thread
From: Jonathan Nieder @ 2018-08-05  1:59 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Thomas Rast, Junio C Hamano, Johannes Schindelin

Hi,

Johannes Schindelin wrote:

> Currently, this test case throws an assertion:
>
> 	Assertion failed!
>
> 	Program: git.exe
> 	File: line-log.c, Line 71
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t4211-line-log.sh | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)

Thanks for finding and demonstrating it.  Can you say more about what
is going on in the test case?  Alternatively, could it be squashed in
with the patch that fixes it?

The below will be more nitpicky:

[...]
> --- a/t/t4211-line-log.sh
> +++ b/t/t4211-line-log.sh
> @@ -115,4 +115,21 @@ test_expect_success 'range_set_union' '
>  	git log $(for x in $(test_seq 200); do echo -L $((2*x)),+1:c.c; done)
>  '
>  
> +q_to_lf () {
> +	tr Q '\012'
> +}
> +
> +test_expect_failure 'close to overlapping ranges' '
> +	test_seq 5 >a1.c &&
> +	git add a1.c &&
> +	git commit -m "5 lines" a1.c &&

It would be nice to use test_tick or test_commit for a more realistic
history (with time marching forward).

> +	sed s/3/3QaQb/ <a1.c | q_to_lf >tmp &&
> +	mv tmp a1.c &&
> +	git commit -m "2 more lines" a1.c &&

It's probably just me, but the bit with Q makes it hard for me to
follow.  Maybe there's a simpler way?

"sed -e '3aa' -e '3ab'" works here, but I don't know how portable it
is. I'd be more tempted to do

	test_write_lines 1 2 3 4 5 >a1.c &&
	...

	test_write_lines 1 2 3 a b 4 5 >a1.c &&
	...

	test_write_lines 1 2 3 a b c 4 5 >a1.c &&
	...

which is concise and has obvious behavior.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] line-log: optimize ranges by joining them when possible
  2018-08-04 22:18 ` [PATCH 3/4] line-log: optimize ranges by joining them when possible Johannes Schindelin via GitGitGadget
@ 2018-08-05  6:11   ` Junio C Hamano
  2018-08-05  8:45   ` Andrei Rybak
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2018-08-05  6:11 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Thomas Rast, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Technically, it is okay to have line ranges that touch (i.e. the end of
> the first range ends just before the next range begins). However, it is
> inefficient, and when the user provides such touching ranges via
> multiple `-L` options, we already join them.
>
> When we traverse the history, though, we never join ranges, even they
> become "touchy-feely" due to added lines (which are "removed" from
> line-log's point of view because it traverses the commit history into
> the past).

I do not know if that would be an "optimization" (in the sense that
joining would help performance) but I do agree that such a change
makes perfect sense from consistency's point of view.  If two ranges
that were originally apart lose a gap in between them, it does not
make any sense to keep them separate.  Good thinking.

> diff --git a/line-log.c b/line-log.c
> index d8d09b5ee..bc7ef69d6 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -68,6 +68,10 @@ void range_set_append_unsafe(struct range_set *rs, long a, long b)
>  
>  void range_set_append(struct range_set *rs, long a, long b)
>  {
> +	if (rs->nr > 0 && rs->ranges[rs->nr-1].end + 1 == a) {
> +		rs->ranges[rs->nr-1].end = b;
> +		return;
> +	}

Nice.

>  	assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);

>  	range_set_append_unsafe(rs, a, b);
>  }
> diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
> index 61ff37430..ebaf5ea86 100755
> --- a/t/t4211-line-log.sh
> +++ b/t/t4211-line-log.sh
> @@ -119,7 +119,7 @@ q_to_lf () {
>  	tr Q '\012'
>  }
>  
> -test_expect_failure 'close to overlapping ranges' '
> +test_expect_success 'close to overlapping ranges' '
>  	test_seq 5 >a1.c &&
>  	git add a1.c &&
>  	git commit -m "5 lines" a1.c &&

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] line-log: optimize ranges by joining them when possible
  2018-08-04 22:18 ` [PATCH 3/4] line-log: optimize ranges by joining them when possible Johannes Schindelin via GitGitGadget
  2018-08-05  6:11   ` Junio C Hamano
@ 2018-08-05  8:45   ` Andrei Rybak
  2018-08-05 10:31     ` Eric Sunshine
  1 sibling, 1 reply; 20+ messages in thread
From: Andrei Rybak @ 2018-08-05  8:45 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Thomas Rast, Junio C Hamano, Johannes Schindelin

On 2018-08-05 00:18, Johannes Schindelin via GitGitGadget wrote:
> 
> Now, I am fairly certain that the changes are correct, but given my track
> record with off-by-one bugs (and once even an off-by-two bug), I would
> really appreciate some thorough review of this code, in particular the
> second one that is the actual bug fix. I am specifically interested in
> reviews from people who know line-log.c pretty well and can tell me whether
> the src[i].end > target[j].end is correct, or whether it should actually
> have been a >= (I tried to wrap my head around this, but I would feel more
> comfortable if a domain expert would analyze this, whistling, and looking
> Eric's way).

I don't know line-log.c at all, but here are my comments on the more
abstract range and range_set changes:

On 2018-08-05 00:18, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> Technically, it is okay to have line ranges that touch (i.e. the end of
> the first range ends just before the next range begins). However, it is
> inefficient, and when the user provides such touching ranges via
> multiple `-L` options, we already join them.
>
> ...
>
>  void range_set_append(struct range_set *rs, long a, long b)
>  {
> +	if (rs->nr > 0 && rs->ranges[rs->nr-1].end + 1 == a) {
> +		rs->ranges[rs->nr-1].end = b;
> +		return;
> +	}

As I understand it, this patch attempts to make range_set_append extend
the last range in the range set to include [a,b), if [a,b) "touches" the
last range in rs.

Definition of range from line-log.h reads:

  /* A range [start,end].  Lines are numbered starting at 0, and the
   * ranges include start but exclude end. */
  struct range {
          long start, end;
  };

So the optimization described in commit message should take care of
following case, with zero lines between last range in rs and [a,b):

  rs before : [---) ... [---)
  [a,b)     :               [---)
  rs after  : [---) ... [-------)
  
It seems that the first condition in range_set_append should be:

	if (rs->nr > 0 && rs->ranges[rs->nr-1].end == a) {
		// extend the last range to include [a, b)
	}

I think that the comments around struct range could be improved by
switching from using "[]", as in the comment from line-log.h quoted
above, and "|---|" as in various comments in line-log.c to "left-closed,
right-open" interval notation like "[start,end)" and "[---)".

>  	assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);
>  	range_set_append_unsafe(rs, a, b);
>  }

With these consideration in mind the assert should become

	assert(rs->nr == 0 || rs->ranges[rs->nr-1].end < a);
  
to cover cases starting from one line between last range in rs and [a,b)

  rs before : [---) ... [---)
  [a,b)     :                [---)
  rs after  : [---) ... [---)[---)
                            ^
                            |
		this line still not part of the range set.
  

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
  2018-08-04 22:18 ` [PATCH 2/4] line-log: adjust start/end of ranges individually Johannes Schindelin via GitGitGadget
@ 2018-08-05 10:14   ` Eric Sunshine
  2018-08-05 10:57     ` Eric Sunshine
  2018-08-06 12:52     ` Johannes Schindelin
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Sunshine @ 2018-08-05 10:14 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git List, Thomas Rast, Junio C Hamano, Johannes Schindelin

On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> When traversing commits and adjusting the ranges, things can get really
> tricky. For example, when the line range of interest encloses several
> hunks of a commit, the line range can actually shrink.
>
> Currently, range_set_shift_diff() does not anticipate that scenario and
> blindly adjusts start and end by the same offset ("shift" the range).
> [...]
> Let's fix this by adjusting the start and the end offsets individually.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/line-log.c b/line-log.c
> @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
>                                 - (target[j].end-target[j].start);
>                         j++;
>                 }
> -               range_set_append(out, src[i].start+offset, src[i].end+offset);
> +               start_offset = offset;
> +               while (j < diff->target.nr && src[i].end > target[j].end) {
> +                       offset += (parent[j].end-parent[j].start)
> +                               - (target[j].end-target[j].start);
> +                       j++;
> +               }
> +               range_set_append(out, src[i].start+start_offset, src[i].end+offset);

I'm still trying to wrap my head around the original code, so I'm not
even at the point of being able to say if this fix is correct. What
happens if the "start_offset" loop consumes all of 'j' before it even
gets to the new loop? Why does the new loop use '>' whereas the
existing uses '>='?

Having said that, a much easier fix is to use
range_set_append_unsafe() here, and then at the bottom of the loop,
invoke 'sort_and_merge_range_set(out)' to restore range-set invariants
and ensure that neighboring ranges are coalesced. Not only does that
resolve the crash and other weird behavior, but it means you don't
have to add a special-case to range_set_append(), thus the fix becomes
simpler overall.

Aside from simplicity, I think the suggested use of
range_set_append_unsafe() and sort_and_merge_range_set() _is_ the
correct fix anyhow because this code isn't taking care to ensure that
the range, after applying 'offset', doesn't abut or overlap with an
earlier range, and sort_and_merge_range_set() is meant to be used
exactly in cases like this when invariants may be broken.

So, while the suggested fix is simpler and "better" and fixes the
crash, that doesn't necessarily mean that the values computed here are
actually correct. As noted, I'm still trying to grok the computation
of these values, but that's a separate issue from the crash itself.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] line-log: optimize ranges by joining them when possible
  2018-08-05  8:45   ` Andrei Rybak
@ 2018-08-05 10:31     ` Eric Sunshine
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Sunshine @ 2018-08-05 10:31 UTC (permalink / raw)
  To: Andrei Rybak
  Cc: gitgitgadget, Git List, Thomas Rast, Junio C Hamano, Johannes Schindelin

> On 2018-08-05 00:18, Johannes Schindelin via GitGitGadget wrote:
> > Technically, it is okay to have line ranges that touch (i.e. the end of
> > the first range ends just before the next range begins). However, it is
> > inefficient, and when the user provides such touching ranges via
> > multiple `-L` options, we already join them.
> >
> >  void range_set_append(struct range_set *rs, long a, long b)
> >  {
> > +     if (rs->nr > 0 && rs->ranges[rs->nr-1].end + 1 == a) {
> > +             rs->ranges[rs->nr-1].end = b;
> > +             return;
> > +     }
>
> As I understand it, this patch attempts to make range_set_append extend
> the last range in the range set to include [a,b), if [a,b) "touches" the
> last range in rs.
> It seems that the first condition in range_set_append should be:
>
>         if (rs->nr > 0 && rs->ranges[rs->nr-1].end == a) {

I agree that this patch has an off-by-1 bug. 'end' is not included in
the previous range, so it should not be adding 1 to end before
checking against 'a'.

*However*, as mentioned in my review[1] of 2/4, the special-case added
by this patch is unnecessary, so this patch should be scrapped.

> With these consideration in mind the assert should become
>
>         assert(rs->nr == 0 || rs->ranges[rs->nr-1].end < a);

Agreed. The existing assertion() has an off-by-1 error.
range_set_append() is supposed to add a range _without_ breaking the
invariant that no two ranges can abut, and the assertion() was
supposed to protect against that. The existing "<= a" incorrectly
allows the new range to abut an existing one, whereas the proposed "<
a" doesn't.

(For adding abutting or overlapping ranges, range_set_append_unsafe()
followed, at some point, by sort_and_merge_range_set() is the
recommended alternative to the more strict range_set_append().)

[1]: https://public-inbox.org/git/CAPig+cRWcFVbA76_HT2iVD16bsUmbWdCgk_07rmiGneM5czdOQ@mail.gmail.com/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges
  2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2018-08-04 22:18 ` [PATCH 4/4] line-log: convert an assertion to a full BUG() call Johannes Schindelin via GitGitGadget
@ 2018-08-05 10:39 ` Eric Sunshine
  4 siblings, 0 replies; 20+ messages in thread
From: Eric Sunshine @ 2018-08-05 10:39 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git List, Thomas Rast, Junio C Hamano

On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Now, I am fairly certain that the changes are correct, but given my track
> record with off-by-one bugs (and once even an off-by-two bug), I would
> really appreciate some thorough review of this code, in particular the
> second one that is the actual bug fix. I am specifically interested in
> reviews from people who know line-log.c pretty well and can tell me whether
> the src[i].end > target[j].end is correct, or whether it should actually
> have been a >= (I tried to wrap my head around this, but I would feel more
> comfortable if a domain expert would analyze this, whistling, and looking
> Eric's way).

Although I fixed a number of bugs in basic range-set manipulation code
(5 years ago), I have no experience with or knowledge of the code
actually dealing with range-set manipulation in relation to diffs, so
I'm not a domain expert by any means, and I'm _still_ trying to wrap
my head around that code.

That said, I left some comments on the individual patches suggesting a
simpler and more correct fix for the crash. (Despite that suggestion
for fixing the crash, I still can't say whether the actual
computations performed by the existing code are correct, since I still
don't understand that aspect of it.)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] line-log: convert an assertion to a full BUG() call
  2018-08-04 22:18 ` [PATCH 4/4] line-log: convert an assertion to a full BUG() call Johannes Schindelin via GitGitGadget
@ 2018-08-05 10:42   ` Eric Sunshine
  2018-08-06 13:14     ` Johannes Schindelin
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Sunshine @ 2018-08-05 10:42 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git List, Thomas Rast, Junio C Hamano, Johannes Schindelin

On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> The assertion in question really indicates a bug, when triggered, so we
> might just as well use the sanctioned method to report it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/line-log.c b/line-log.c
> @@ -72,7 +72,9 @@ void range_set_append(struct range_set *rs, long a, long b)
> -       assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);
> +       if (rs->nr > 0 && rs->ranges[rs->nr-1].end > a)
> +               BUG("append %ld-%ld, after %ld-%ld?!?", a, b,
> +                   rs->ranges[rs->nr-1].start, rs->ranges[rs->nr-1].end);

Although this appears to be a faithful translation of the assert() to
BUG(), as mentioned by Andrei in his review of 3/4, the existing
assert() seems to have an off-by-1 error, which means that the "> a"
here really ought to be ">= a".

Given that this file is full of assert()'s, it doesn't necessarily
make sense to convert only this one, so perhaps the patch should be
dropped (since I'm guessing you don't want to convert the rest of
them).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
  2018-08-05 10:14   ` Eric Sunshine
@ 2018-08-05 10:57     ` Eric Sunshine
  2018-08-06 12:52     ` Johannes Schindelin
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Sunshine @ 2018-08-05 10:57 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git List, Thomas Rast, Junio C Hamano, Johannes Schindelin

On Sun, Aug 5, 2018 at 6:14 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> Having said that, a much easier fix is to use
> range_set_append_unsafe() here, and then at the bottom of the loop,
> invoke 'sort_and_merge_range_set(out)' to restore range-set invariants

By "bottom", I meant "outside" or "after":

    ...and then invoke sort_and_merge_range_set()
    after the loop to restore range-set invariants.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges
  2018-08-05  1:59   ` Jonathan Nieder
@ 2018-08-06 10:27     ` Johannes Schindelin
  2018-08-06 14:47       ` Jonathan Nieder
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin @ 2018-08-06 10:27 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Johannes Schindelin via GitGitGadget, git, Thomas Rast, Junio C Hamano

Hi Jonathan,

On Sat, 4 Aug 2018, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > Currently, this test case throws an assertion:
> >
> > 	Assertion failed!
> >
> > 	Program: git.exe
> > 	File: line-log.c, Line 71
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  t/t4211-line-log.sh | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> 
> Thanks for finding and demonstrating it.

You're welcome.

> Can you say more about what is going on in the test case?

Certainly. I considered writing more into the commit message, but all my
attempts were really repeating what the test case does, and were therefore
poor commit message material.

Under certain circumstances, multiple ranges specified for the line-log
were adjusted incorrectly, leading to violation of assumptions as well as
to segmentation faults. This test case demonstrates this.

> Alternatively, could it be squashed in with the patch that fixes it?

There is this really awful trend on this mailing list to suggest
conflating the demonstration of a bug with the bug fix.

It is really, really important to realize how valuable it is to have the
regression test as an individual patch that can be used to verify that
there is a bug, to pinpoint where it was introduced, to test alternative
fixes, to keep records separate, and I could go on and on and on. Please
do not ignore these very good reasons, and please refrain from
recommending such conflation in the future.

> The below will be more nitpicky:
> 
> [...]
> > --- a/t/t4211-line-log.sh
> > +++ b/t/t4211-line-log.sh
> > @@ -115,4 +115,21 @@ test_expect_success 'range_set_union' '
> >  	git log $(for x in $(test_seq 200); do echo -L $((2*x)),+1:c.c; done)
> >  '
> >  
> > +q_to_lf () {
> > +	tr Q '\012'
> > +}
> > +
> > +test_expect_failure 'close to overlapping ranges' '
> > +	test_seq 5 >a1.c &&
> > +	git add a1.c &&
> > +	git commit -m "5 lines" a1.c &&
> 
> It would be nice to use test_tick or test_commit for a more realistic
> history (with time marching forward).

As far as this test is concerned, that realism is not necessary, and comes
at the cost of readability.

> > +	sed s/3/3QaQb/ <a1.c | q_to_lf >tmp &&
> > +	mv tmp a1.c &&
> > +	git commit -m "2 more lines" a1.c &&
> 
> It's probably just me, but the bit with Q makes it hard for me to
> follow.  Maybe there's a simpler way?

Maybe. I did not find it, else I would have used it.

> "sed -e '3aa' -e '3ab'" works here, but I don't know how portable it
> is.

My experience with BSD sed and the `a` command made me highly suspicious,
that's why I did not go down that route.

Besides, that `sed` invocation does not really look intuitive either.

> I'd be more tempted to do
> 
> 	test_write_lines 1 2 3 4 5 >a1.c &&
> 	...
> 
> 	test_write_lines 1 2 3 a b 4 5 >a1.c &&
> 	...
> 
> 	test_write_lines 1 2 3 a b c 4 5 >a1.c &&
> 	...
> 
> which is concise and has obvious behavior.

That works for me!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
  2018-08-05 10:14   ` Eric Sunshine
  2018-08-05 10:57     ` Eric Sunshine
@ 2018-08-06 12:52     ` Johannes Schindelin
  1 sibling, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2018-08-06 12:52 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: gitgitgadget, Git List, Thomas Rast, Junio C Hamano

Hi Eric,

On Sun, 5 Aug 2018, Eric Sunshine wrote:

> On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > When traversing commits and adjusting the ranges, things can get really
> > tricky. For example, when the line range of interest encloses several
> > hunks of a commit, the line range can actually shrink.
> >
> > Currently, range_set_shift_diff() does not anticipate that scenario and
> > blindly adjusts start and end by the same offset ("shift" the range).
> > [...]
> > Let's fix this by adjusting the start and the end offsets individually.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/line-log.c b/line-log.c
> > @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
> >                                 - (target[j].end-target[j].start);
> >                         j++;
> >                 }
> > -               range_set_append(out, src[i].start+offset, src[i].end+offset);
> > +               start_offset = offset;
> > +               while (j < diff->target.nr && src[i].end > target[j].end) {
> > +                       offset += (parent[j].end-parent[j].start)
> > +                               - (target[j].end-target[j].start);
> > +                       j++;
> > +               }
> > +               range_set_append(out, src[i].start+start_offset, src[i].end+offset);
> 
> I'm still trying to wrap my head around the original code, so I'm not
> even at the point of being able to say if this fix is correct. What
> happens if the "start_offset" loop consumes all of 'j' before it even
> gets to the new loop?

First of all, it is not really the `start_offset` loop, but more like the
"end_offset loop": we are still adjusting `offset`, and that is what we
use to adjust the end of the current line range, while we take pains to
keep the offset that needs to be used for the start of said line range.

> Why does the new loop use '>' whereas the existing uses '>='?

As a mathematician, I usually think of intervals as inclusive only on one
end. So I intuitively use the non-inclusive comparison on the other end.

In this instance, the first loop tries to make sure that the offset to be
used for the start offset has taken into account all of the relevant diff
hunks (meaning: the diff hunks whose target lines start before src[i],
i.e. the current line range to adjust).

The second loop, however, tries to make sure that the offset to adjust the
*end* of the current line range.

Hence what I wrote.

But now that I think about it again, I am no longer sure that the first
loop (the one I did not change) is sound, to begin with. Let's imagine a
very simple case, where we try to adjust a start line, say, 100, and the
diff has a hunk with header `@@ -90,20 +90,40 @@`. So the start line lies
somewhere around a quarter into the post-image.

The current line-log code adjusts this by a very crude method: since the
line number lies between the post-image's start, it is adjusted by adding
the length of the pre-image and subtracting the length of the post image,
i.e. 20 - 40. The result is that we pretend that it came from line number
80, which is squarely outside the pre-image range.

That method of adjustment is appropriate for lines *outside* the
post-image range, of course. If we had looked at line 140 (which is 10
lines after the post-image), then it would be totally correct to say that
it came from line 120.

But that method is pretty obviously broken for any line that lies *within*
a post-image.

So I think the entire loop is in dear need of adjustment.

The question is: how should it be adjusted? The safest way would be to
adjust start and end of the line range differently, where start is pulled
back to the pre-image's start (in the above example, 90), and end is
pushed up to the pre-image's end (in the above example, 110).

Of course, this has a slightly unintuitive consequence: imagine that some
commit made a block of, say, 20 lines a conditional one. Like,

	/* Some comment */
	do_something = 1;
	...
	print_it();
	...

could have become

	if (!dry_run) {
		/* Some comment */
		do_something = 1;
		...
		print_it();
		...
	}

If you use my proposed method to adjust the line ranges to figure out the
history of that `print_it()` line, this commit would blow up the line
range to the entire conditional block, which is probably not what you
wanted.

I don't really see a better way, though.

> Having said that, a much easier fix is to use range_set_append_unsafe()
> here, and then at the bottom of the loop, invoke
> 'sort_and_merge_range_set(out)' to restore range-set invariants and
> ensure that neighboring ranges are coalesced. Not only does that resolve
> the crash and other weird behavior, but it means you don't have to add a
> special-case to range_set_append(), thus the fix becomes simpler
> overall.

That would only paper over the bug, as the line range is adjusted
incorrectly. It really is.

When looking at a line range, say, 90--110, and traversing a commit that
added 5 lines from line 100-105, then the end and the start of that line
range really need to be adjusted independently. In this example, the start
would not even budge, but the end would need to be adjusted to 105.

The current code would adjust neither because the start of the line range
is outside of the post-image range.

No amount of restoring invariants can make up for that incorrect
adjustment.

> Aside from simplicity, I think the suggested use of
> range_set_append_unsafe() and sort_and_merge_range_set() _is_ the
> correct fix anyhow because this code isn't taking care to ensure that
> the range, after applying 'offset', doesn't abut or overlap with an
> earlier range, and sort_and_merge_range_set() is meant to be used
> exactly in cases like this when invariants may be broken.
> 
> So, while the suggested fix is simpler and "better" and fixes the
> crash, that doesn't necessarily mean that the values computed here are
> actually correct. As noted, I'm still trying to grok the computation
> of these values, but that's a separate issue from the crash itself.

I think sort_and_merge_range_set() is fine for "Postelizing user input"
(Postel's law: be stringent in your output and lenient in your input,
something we should also heed in general, not only in our code).

But I don't think our code should need it. If the line ranges need to
resort to re-sorting (pun intended), that smells a lot like a bug.

And the only way those ranges could be merged is when they start to touch
(like in the test case I provided, when the lines 1, 2, 3 and the lines 4,
5, become a single line range because a, b and then c, are removed from
the original 1, 2, 3, a, b, c, 4, 5). And my suggested fix makes that
change quite elegantly, while appending line ranges (which will work
correctly if done in the correct order, which should always be the case
because both our line ranges as well as the diff hunks are ordered).

What do you think about my proposed change to adjust start and end?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] line-log: convert an assertion to a full BUG() call
  2018-08-05 10:42   ` Eric Sunshine
@ 2018-08-06 13:14     ` Johannes Schindelin
  2018-08-07  9:09       ` Eric Sunshine
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin @ 2018-08-06 13:14 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: gitgitgadget, Git List, Thomas Rast, Junio C Hamano

Hi Eric,

On Sun, 5 Aug 2018, Eric Sunshine wrote:

> On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > The assertion in question really indicates a bug, when triggered, so we
> > might just as well use the sanctioned method to report it.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/line-log.c b/line-log.c
> > @@ -72,7 +72,9 @@ void range_set_append(struct range_set *rs, long a, long b)
> > -       assert(rs->nr == 0 || rs->ranges[rs->nr-1].end <= a);
> > +       if (rs->nr > 0 && rs->ranges[rs->nr-1].end > a)
> > +               BUG("append %ld-%ld, after %ld-%ld?!?", a, b,
> > +                   rs->ranges[rs->nr-1].start, rs->ranges[rs->nr-1].end);
> 
> Although this appears to be a faithful translation of the assert() to
> BUG(), as mentioned by Andrei in his review of 3/4, the existing
> assert() seems to have an off-by-1 error, which means that the "> a"
> here really ought to be ">= a".

I think Andrei's assessment is wrong. The code could not test for that
earlier, as it did allow ranges to become "abutting" in the process, by
failing to merge them. So the invariant you talked about is more of an
invariant for the initial state.

My 3/4 would make that invariant heeded throughout the process.

I am still keen on keeping the invariants straight *without* resorting to
dirty tricks like calling sort_and_merge. Calling that function would just
make it easier for bugs to hide in this code.

> Given that this file is full of assert()'s, it doesn't necessarily
> make sense to convert only this one, so perhaps the patch should be
> dropped (since I'm guessing you don't want to convert the rest of
> them).

Sure, there are 18 of them, and you're right, I lack the time to convert
them.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges
  2018-08-06 10:27     ` Johannes Schindelin
@ 2018-08-06 14:47       ` Jonathan Nieder
  2018-08-06 15:33         ` Jonathan Nieder
  0 siblings, 1 reply; 20+ messages in thread
From: Jonathan Nieder @ 2018-08-06 14:47 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Thomas Rast, Junio C Hamano

Hi Dscho,

Johannes Schindelin wrote:
> On Sat, 4 Aug 2018, Jonathan Nieder wrote:

>> Alternatively, could it be squashed in with the patch that fixes it?
>
> There is this really awful trend on this mailing list to suggest
> conflating the demonstration of a bug with the bug fix.
>
> It is really, really important to realize how valuable it is to have the
> regression test as an individual patch that can be used to verify that
> there is a bug, to pinpoint where it was introduced, to test alternative
> fixes, to keep records separate, and I could go on and on and on. Please
> do not ignore these very good reasons, and please refrain from
> recommending such conflation in the future.

If you want to propose changing the project's style to always separate
tests from the patch that fixes a bug, that's a discussion we can have,
in a separate thread.

Today, we do allow and encourage putting the test with the patch that
fixes it, and that has real advantages:

- the tests are easier to understand when found with "git log" because
  they are in context

- as the patch evolves, it is easier to remember to update the test at
  the same time

- newcomers imitating existing patches have a clear hint to write
  tests

- the beginning of a patch series can be applied and merged down while
  the end is still under review, without either leaving out the tests
  or applying a test that doesn't pass and accomplishes little

I've never found it difficult to use the test from a patch to verify
that there is a bug or pinpoint where it was introduced.  Tests are
separate from the application code since they're in the t/ directory;
this is a very easy thing to do.  That isn't to say that a patch that
only adds a (passing or expected-failure) test isn't valuable, even
without a fix.  It is valuable, precisely when it is self-explanatory.

More importantly, I am a bit surprised that instead of accepting the
feedback, you are basically calling a reviewer complicit, for pointing
out a pretty normal possible improvement that follows the project's
conventions.

I'm beyond words.

Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges
  2018-08-06 14:47       ` Jonathan Nieder
@ 2018-08-06 15:33         ` Jonathan Nieder
  0 siblings, 0 replies; 20+ messages in thread
From: Jonathan Nieder @ 2018-08-06 15:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Thomas Rast, Junio C Hamano

Jonathan Nieder wrote:
> Johannes Schindelin wrote:

>> It is really, really important to realize how valuable it is to have the
>> regression test as an individual patch that can be used to verify that
>> there is a bug, to pinpoint where it was introduced, to test alternative
>> fixes, to keep records separate, and I could go on and on and on. Please
>> do not ignore these very good reasons, and please refrain from
>> recommending such conflation in the future.
>
> If you want to propose changing the project's style to always separate
> tests from the patch that fixes a bug, that's a discussion we can have,
> in a separate thread.

By the way, don't get me wrong: I am sympathetic to the motivation for
such a change.

I have worked on projects where tests were in a separate repository
from the application.  There are costs and benefits.  To support the
kind of use cases you're describing, the tests would include
conditionals to allow running on old versions of the application (the
test expectations were based on the latest version, but setup code
sometimes had to accomodate differences between versions).  It worked
okay and was probably worth it for that project, despite the added
friction.  It's not clear it would be worth it for Git.

Jonathan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] line-log: convert an assertion to a full BUG() call
  2018-08-06 13:14     ` Johannes Schindelin
@ 2018-08-07  9:09       ` Eric Sunshine
  2018-08-07 22:00         ` Eric Sunshine
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Sunshine @ 2018-08-07  9:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: gitgitgadget, Git List, Thomas Rast, Junio C Hamano

On Mon, Aug 6, 2018 at 9:15 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Sun, 5 Aug 2018, Eric Sunshine wrote:
> > Although this appears to be a faithful translation of the assert() to
> > BUG(), as mentioned by Andrei in his review of 3/4, the existing
> > assert() seems to have an off-by-1 error, which means that the "> a"
> > here really ought to be ">= a".
>
> I think Andrei's assessment is wrong. The code could not test for that
> earlier, as it did allow ranges to become "abutting" in the process, by
> failing to merge them. So the invariant you talked about is more of an
> invariant for the initial state.

I'm having trouble interpreting your response.

My understanding is that range_set_append() is intended to be strict
by not allowing addition of a range which abuts an existing range
(although, of course, the assert() checks only the last range, so it's
not full-proof). Assuming that to be correct, then the assertion
contains a one-off-error (according to my reasoning).

I'm sensing from your reply that you seem to have a different idea
about range_set_append()'s intended use.

> My 3/4 would make that invariant heeded throughout the process.
>
> I am still keen on keeping the invariants straight *without* resorting to
> dirty tricks like calling sort_and_merge. Calling that function would just
> make it easier for bugs to hide in this code.

Indeed, avoiding the "dirty trick" would be ideal, although, I still
haven't wrapped my head around it enough to be able to say whether the
computed offset, when applied to the range in question, could cause
that range to abut or overlap an existing range.

(There are, of course, valid uses for range_set_append_unsafe() /
sort_and_merge(), such as allowing -L options to overlap and be
specified in any order. Batch-adding them to the range-set via
range_set_append_unsafe() and letting sort_and_merge() sort them all
out is plenty sensible.)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] line-log: convert an assertion to a full BUG() call
  2018-08-07  9:09       ` Eric Sunshine
@ 2018-08-07 22:00         ` Eric Sunshine
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Sunshine @ 2018-08-07 22:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: gitgitgadget, Git List, Thomas Rast, Junio C Hamano

On Tue, Aug 7, 2018 at 5:09 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Aug 6, 2018 at 9:15 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > I think Andrei's assessment is wrong. The code could not test for that
> > earlier, as it did allow ranges to become "abutting" in the process, by
> > failing to merge them. So the invariant you talked about is more of an
> > invariant for the initial state.
>
> My understanding is that range_set_append() is intended to be strict
> by not allowing addition of a range which abuts an existing range
> (although, of course, the assert() checks only the last range, so it's
> not full-proof).

Ignore my parenthetical comment. It is clearly wrong.

Looking at this again, I see that there is some confusion. The comment
in line-log.h says:

    /* New range must begin at or after end of last added range */
   void range_set_append(struct range_set *, long start, long end);

However, that comment was added by me in c0babbe695 (range-set:
publish API for re-use by git-blame -L, 2013-08-06) -- five years and
one day ago -- and it appears to be based upon a direct interpretation
of the condition in the assert(), including its off-by-one error.

*But*, one of the invariants of range-set is that the ranges must
_not_ abut one another, so the "at or after" is wrong; it should say
instead something like "after, and not touching, the end of the last
added range".

That invariant about having a gap between ranges is enforced
deliberately by range_set_check_invariants(). It's also signaled
implicitly by the fact that no callers of range_set_append() ever
invoke sort_and_merge_range_set() after calling range_set_append().

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-08-07 22:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-04 22:18 [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Johannes Schindelin via GitGitGadget
2018-08-04 22:18 ` [PATCH 1/4] line-log: demonstrate a bug with nearly-overlapping ranges Johannes Schindelin via GitGitGadget
2018-08-05  1:59   ` Jonathan Nieder
2018-08-06 10:27     ` Johannes Schindelin
2018-08-06 14:47       ` Jonathan Nieder
2018-08-06 15:33         ` Jonathan Nieder
2018-08-04 22:18 ` [PATCH 2/4] line-log: adjust start/end of ranges individually Johannes Schindelin via GitGitGadget
2018-08-05 10:14   ` Eric Sunshine
2018-08-05 10:57     ` Eric Sunshine
2018-08-06 12:52     ` Johannes Schindelin
2018-08-04 22:18 ` [PATCH 3/4] line-log: optimize ranges by joining them when possible Johannes Schindelin via GitGitGadget
2018-08-05  6:11   ` Junio C Hamano
2018-08-05  8:45   ` Andrei Rybak
2018-08-05 10:31     ` Eric Sunshine
2018-08-04 22:18 ` [PATCH 4/4] line-log: convert an assertion to a full BUG() call Johannes Schindelin via GitGitGadget
2018-08-05 10:42   ` Eric Sunshine
2018-08-06 13:14     ` Johannes Schindelin
2018-08-07  9:09       ` Eric Sunshine
2018-08-07 22:00         ` Eric Sunshine
2018-08-05 10:39 ` [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges Eric Sunshine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).