All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] more clang/sanitizer fixes
@ 2020-01-25  5:35 Jeff King
  2020-01-25  5:37 ` [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:35 UTC (permalink / raw)
  To: git

Here are some more sanitizer fixes, this time from trying with clang-11.
These should go on top of jk/asan-build-fix if you want to get a
successful run with "make CC=clang-11 SANITIZE=address,undefined test".

I don't think any of them is indicative of a current bug in practice,
but the UBSan stuff makes me worried that an aggressive compiler might
do the wrong thing in some case.

  [1/4]: merge-recursive: silence -Wxor-used-as-pow warning
  [2/4]: avoid computing zero offsets from NULL pointer
  [3/4]: xdiff: avoid computing non-zero offset from NULL pointer
  [4/4]: obstack: avoid computing offsets from NULL pointer

 compat/obstack.h  |  6 ++++--
 merge-recursive.c | 19 ++++++++++++++-----
 sequencer.c       |  6 +++---
 unpack-trees.c    |  2 +-
 xdiff-interface.c | 12 ++++++++----
 5 files changed, 30 insertions(+), 15 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25  5:35 [PATCH 0/4] more clang/sanitizer fixes Jeff King
@ 2020-01-25  5:37 ` Jeff King
  2020-01-25 17:27   ` Junio C Hamano
  2020-01-25  5:38 ` [PATCH 2/4] avoid computing zero offsets from NULL pointer Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:37 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

The merge-recursive code uses stage number constants like this:

  add = &ci->ren1->dst_entry->stages[2 ^ 1];
  ...
  add = &ci->ren2->dst_entry->stages[3 ^ 1];

The xor has the effect of flipping the "1" bit, so that "2 ^ 1" becomes
"3" and "3 ^ 1" becomes "2", which correspond to the "ours" and "theirs"
stages respectively.

Unfortunately, clang-10 and up issue a warning for this code:

  merge-recursive.c:1759:40: error: result of '2 ^ 1' is 3; did you mean '1 << 1' (2)? [-Werror,-Wxor-used-as-pow]
                  add = &ci->ren1->dst_entry->stages[2 ^ 1];
                                                     ~~^~~
                                                     1 << 1
  merge-recursive.c:1759:40: note: replace expression with '0x2 ^ 1' to silence this warning

We could silence it by using 0x2, as the compiler mentions. Or by just
using the constants "2" and "3" directly. But after digging into it, I
do think this bit-flip is telling us something. If we just wrote:

  add = &ci->ren2->dst_entry->stages[2];

for the second one, you might think that "ren2" and "2" correspond. But
they don't. The logic is: ren2 is theirs, which is stage 3, but we
are interested in the opposite side's stage, so flip it to 2.

So let's keep the bit-flipping, but let's also put it behind a named
function, which will make its purpose a bit clearer. This also has the
side effect of suppressing the warning (and an optimizing compiler
should be able to easily turn it into a constant as before).

Signed-off-by: Jeff King <peff@peff.net>
---
It might be more readable still to #define STAGE_OURS and STAGE_THEIRS,
but the use of bare 2/3 is pervasive throughout the file. We'd probably
want to change it consistently, and perhaps call "ren1" "ren_ours" or
something like that. I'm not overly familiar with this code, so I tried
to keep it to what I found an obvious improvement. But I'm also happy to
go the "0x2 ^ 1" route if merge-recursive experts prefer that.

 merge-recursive.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 10dca5644b..e6aedd3cab 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1712,6 +1712,15 @@ static char *find_path_for_conflict(struct merge_options *opt,
 	return new_path;
 }
 
+/*
+ * Toggle the stage number between "ours" and "theirs" (2 and 3) by flipping
+ * the 1-bit.
+ */
+static inline int flip_stage(int stage)
+{
+	return stage ^ 1;
+}
+
 static int handle_rename_rename_1to2(struct merge_options *opt,
 				     struct rename_conflict_info *ci)
 {
@@ -1756,14 +1765,14 @@ static int handle_rename_rename_1to2(struct merge_options *opt,
 		 * such cases, we should keep the added file around,
 		 * resolving the conflict at that path in its favor.
 		 */
-		add = &ci->ren1->dst_entry->stages[2 ^ 1];
+		add = &ci->ren1->dst_entry->stages[flip_stage(2)];
 		if (is_valid(add)) {
 			if (update_file(opt, 0, add, a->path))
 				return -1;
 		}
 		else
 			remove_file_from_index(opt->repo->index, a->path);
-		add = &ci->ren2->dst_entry->stages[3 ^ 1];
+		add = &ci->ren2->dst_entry->stages[flip_stage(3)];
 		if (is_valid(add)) {
 			if (update_file(opt, 0, add, b->path))
 				return -1;
@@ -1776,7 +1785,7 @@ static int handle_rename_rename_1to2(struct merge_options *opt,
 		 * rename/add collision.  If not, we can write the file out
 		 * to the specified location.
 		 */
-		add = &ci->ren1->dst_entry->stages[2 ^ 1];
+		add = &ci->ren1->dst_entry->stages[flip_stage(2)];
 		if (is_valid(add)) {
 			add->path = mfi.blob.path = a->path;
 			if (handle_file_collision(opt, a->path,
@@ -1797,7 +1806,7 @@ static int handle_rename_rename_1to2(struct merge_options *opt,
 				return -1;
 		}
 
-		add = &ci->ren2->dst_entry->stages[3 ^ 1];
+		add = &ci->ren2->dst_entry->stages[flip_stage(3)];
 		if (is_valid(add)) {
 			add->path = mfi.blob.path = b->path;
 			if (handle_file_collision(opt, b->path,
@@ -1846,7 +1855,7 @@ static int handle_rename_rename_2to1(struct merge_options *opt,
 	path_side_1_desc = xstrfmt("version of %s from %s", path, a->path);
 	path_side_2_desc = xstrfmt("version of %s from %s", path, b->path);
 	ostage1 = ci->ren1->branch == opt->branch1 ? 3 : 2;
-	ostage2 = ostage1 ^ 1;
+	ostage2 = flip_stage(ostage1);
 	ci->ren1->src_entry->stages[ostage1].path = a->path;
 	ci->ren2->src_entry->stages[ostage2].path = b->path;
 	if (merge_mode_and_contents(opt, a, c1,
-- 
2.25.0.421.gb74d19af79


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-25  5:35 [PATCH 0/4] more clang/sanitizer fixes Jeff King
  2020-01-25  5:37 ` [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning Jeff King
@ 2020-01-25  5:38 ` Jeff King
  2020-01-27 20:03   ` Junio C Hamano
  2020-01-25  5:39 ` [PATCH 3/4] xdiff: avoid computing non-zero offset " Jeff King
  2020-01-25  5:41 ` [PATCH 4/4] obstack: avoid computing offsets " Jeff King
  3 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:38 UTC (permalink / raw)
  To: git

The Undefined Behavior Sanitizer in clang-11 seems to have learned a new
trick: it complains about computing offsets from a NULL pointer, even if
that offset is 0. This causes numerous test failures. For example, from
t1090:

  unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer
  ...
  not ok 6 - in partial clone, sparse checkout only fetches needed blobs

The code in question looks like this:

  struct cache_entry **cache_end = cache + nr;
  ...
  while (cache != cache_end)

and we sometimes pass in a NULL and 0 for "cache" and "nr". This is
conceptually fine, as "cache_end" would be equal to "cache" in this
case, and we wouldn't enter the loop at all. But computing even a zero
offset violates the C standard. And given the fact that UBSan is
noticing this behavior, this might be a potential problem spot if the
compiler starts making unexpected assumptions based on undefined
behavior.

So let's just avoid it, which is pretty easy. In some cases we can just
switch to iterating with a numeric index (as we do in sequencer.c here).
In other cases (like the cache_end one) the use of an end pointer is
more natural; we can keep that by just explicitly checking for NULL when
assigning the end pointer.

Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c       | 6 +++---
 unpack-trees.c    | 2 +-
 xdiff-interface.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4d31ec3296 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -588,7 +588,7 @@ static int do_recursive_merge(struct repository *r,
 	struct merge_options o;
 	struct tree *next_tree, *base_tree, *head_tree;
 	int clean;
-	char **xopt;
+	int i;
 	struct lock_file index_lock = LOCK_INIT;
 
 	if (repo_hold_locked_index(r, &index_lock, LOCK_REPORT_ON_ERROR) < 0)
@@ -608,8 +608,8 @@ static int do_recursive_merge(struct repository *r,
 	next_tree = next ? get_commit_tree(next) : empty_tree(r);
 	base_tree = base ? get_commit_tree(base) : empty_tree(r);
 
-	for (xopt = opts->xopts; xopt != opts->xopts + opts->xopts_nr; xopt++)
-		parse_merge_opt(&o, *xopt);
+	for (i = 0; i < opts->xopts_nr; i++)
+		parse_merge_opt(&o, opts->xopts[i]);
 
 	clean = merge_trees(&o,
 			    head_tree,
diff --git a/unpack-trees.c b/unpack-trees.c
index d5f4d997da..b4292d2be8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1352,7 +1352,7 @@ static int clear_ce_flags_1(struct index_state *istate,
 			    enum pattern_match_result default_match,
 			    int progress_nr)
 {
-	struct cache_entry **cache_end = cache + nr;
+	struct cache_entry **cache_end = cache ? cache + nr : cache;
 
 	/*
 	 * Process all entries that have the given prefix and meet
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 8509f9ea22..2f1fe48512 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
 {
 	const int blk = 1024;
 	long trimmed = 0, recovered = 0;
-	char *ap = a->ptr + a->size;
-	char *bp = b->ptr + b->size;
+	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
+	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
 	long smaller = (a->size < b->size) ? a->size : b->size;
 
 	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
-- 
2.25.0.421.gb74d19af79


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] xdiff: avoid computing non-zero offset from NULL pointer
  2020-01-25  5:35 [PATCH 0/4] more clang/sanitizer fixes Jeff King
  2020-01-25  5:37 ` [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning Jeff King
  2020-01-25  5:38 ` [PATCH 2/4] avoid computing zero offsets from NULL pointer Jeff King
@ 2020-01-25  5:39 ` Jeff King
  2020-01-25  5:41 ` [PATCH 4/4] obstack: avoid computing offsets " Jeff King
  3 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:39 UTC (permalink / raw)
  To: git

As with the previous commit, clang-11's UBSan complains about computing
offsets from a NULL pointer, causing some tests to fail. In this case,
though, we're actually computing a non-zero offset, which is even more
dubious. From t7810:

  xdiff-interface.c:268:14: runtime error: applying non-zero offset 1 to null pointer
  ...
  not ok 131 - grep -p with userdiff

The problem is our parsing of the funcname config. We count the number
of lines in the string, allocate an array, and then loop over our
allocated entries, parsing each line and moving our cursor to one past
the trailing newline for the next iteration.

But the final line will not generally have a trailing newline (since
it's a config value), and hence we go to one past NULL. In practice this
is OK, since our loop should terminate before we look at the value. But
even computing such an invalid pointer technically violates the
standard.

We can fix it by leaving the pointer at NULL if we're at the end, rather
than one-past. And while we're thinking about it, we can also document
the variant by asserting that our initial line-count matches the
second-pass of parsing.

Signed-off-by: Jeff King <peff@peff.net>
---
 xdiff-interface.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 2f1fe48512..3cd2ac2855 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -250,9 +250,13 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
 	ALLOC_ARRAY(regs->array, regs->nr);
 	for (i = 0; i < regs->nr; i++) {
 		struct ff_reg *reg = regs->array + i;
-		const char *ep = strchr(value, '\n'), *expression;
+		const char *ep, *expression;
 		char *buffer = NULL;
 
+		if (!value)
+			BUG("mismatch between line count and parsing");
+		ep = strchr(value, '\n');
+
 		reg->negate = (*value == '!');
 		if (reg->negate && i == regs->nr - 1)
 			die("Last expression must not be negated: %s", value);
@@ -265,7 +269,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
 		if (regcomp(&reg->re, expression, cflags))
 			die("Invalid regexp to look for hunk header: %s", expression);
 		free(buffer);
-		value = ep + 1;
+		value = ep ? ep + 1 : NULL;
 	}
 }
 
-- 
2.25.0.421.gb74d19af79


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] obstack: avoid computing offsets from NULL pointer
  2020-01-25  5:35 [PATCH 0/4] more clang/sanitizer fixes Jeff King
                   ` (2 preceding siblings ...)
  2020-01-25  5:39 ` [PATCH 3/4] xdiff: avoid computing non-zero offset " Jeff King
@ 2020-01-25  5:41 ` Jeff King
  2020-01-25  5:44   ` [PATCH v2 " Jeff King
  3 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:41 UTC (permalink / raw)
  To: git

As with the previous two commits, UBSan with clang-11 complains about
computing offsets from a NULL pointer. The failures in t4013 (and
elsewhere) look like this:

  kwset.c:102:23: runtime error: applying non-zero offset 107820859019600 to null pointer
  ...
  not ok 79 - git log -SF master # magic is (not used)

That line of kwset.c is not enlightening:

  ... = obstack_alloc(&kwset->obstack, sizeof (struct trie));

because obstack is implemented almost entirely in macros, and the actual
problem is five macros deep (I temporarily converted them to inline
functions to get better compiler errors, which was tedious but worked
reasonably well).

The actual problem is in these pointer-alignment macros:

  /* If B is the base of an object addressed by P, return the result of
     aligning P to the next multiple of A + 1.  B and P must be of type
     char *.  A + 1 must be a power of 2.  */

  #define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))

  /* Similar to _BPTR_ALIGN (B, P, A), except optimize the common case
     where pointers can be converted to integers, aligned as integers,
     and converted back again.  If PTR_INT_TYPE is narrower than a
     pointer (e.g., the AS/400), play it safe and compute the alignment
     relative to B.  Otherwise, use the faster strategy of computing the
     alignment relative to 0.  */

  #define __PTR_ALIGN(B, P, A)                                                \
    __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
                  P, A)

If we have a sufficiently-large integer pointer type, then we do the
computation using a NULL pointer constant. That turns __BPTR_ALIGN()
into something like:

  NULL + (P - NULL + A) & ~A

and UBSan is complaining about adding the full value of P to that
initial NULL. We can fix this by doing our math as an integer type, and
then casting the result back to a pointer. The problem case only happens
when we know that the integer type is large enough, so there should be
no issue with truncation.

Another option would be just simplify out all the 0's from
__BPTR_ALIGN() for the NULL-pointer case. That probably wouldn't work
for a platform where the NULL pointer isn't all-zeroes, but Git already
wouldn't work on such a platform (due to our use of memset to set
pointers in structs to NULL). But I tried here to keep as close to the
original as possible.

Signed-off-by: Jeff King <peff@peff.net>
---
 compat/obstack.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/compat/obstack.h b/compat/obstack.h
index 01e7c81840..fbcf68c67c 100644
--- a/compat/obstack.h
+++ b/compat/obstack.h
@@ -135,8 +135,10 @@ extern "C" {
    alignment relative to 0.  */
 
 #define __PTR_ALIGN(B, P, A)						    \
-  __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
-		P, A)
+  (sizeof (PTR_INT_TYPE) < sizeof(void *) ?                                 \
+   __BPTR_ALIGN((B), (P), (A)) :                                            \
+   (void *)__BPTR_ALIGN((PTR_INT_TYPE)0, (PTR_INT_TYPE)(P), (A))            \
+  )
 
 #include <string.h>
 
-- 
2.25.0.421.gb74d19af79

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/4] obstack: avoid computing offsets from NULL pointer
  2020-01-25  5:41 ` [PATCH 4/4] obstack: avoid computing offsets " Jeff King
@ 2020-01-25  5:44   ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2020-01-25  5:44 UTC (permalink / raw)
  To: git

On Sat, Jan 25, 2020 at 12:41:18AM -0500, Jeff King wrote:

> diff --git a/compat/obstack.h b/compat/obstack.h
> index 01e7c81840..fbcf68c67c 100644
> --- a/compat/obstack.h
> +++ b/compat/obstack.h
> @@ -135,8 +135,10 @@ extern "C" {
>     alignment relative to 0.  */
>  
>  #define __PTR_ALIGN(B, P, A)						    \
> -  __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
> -		P, A)
> +  (sizeof (PTR_INT_TYPE) < sizeof(void *) ?                                 \
> +   __BPTR_ALIGN((B), (P), (A)) :                                            \
> +   (void *)__BPTR_ALIGN((PTR_INT_TYPE)0, (PTR_INT_TYPE)(P), (A))            \
> +  )

Oops, I added in one change at the last minute but failed to commit it.
The first argument to __BPTR_ALIGN() should be:

  (PTR_INT_TYPE)(void *)0

to ensure that it's a NULL pointer, even if the platform doesn't have
an all-bits-zero NULL. Again, this is probably academic for Git's
codebase, but I was trying to keep as close to the original as possible.

Here's a fixed patch.

-- >8 --
Subject: obstack: avoid computing offsets from NULL pointer

As with the previous two commits, UBSan with clang-11 complains about
computing offsets from a NULL pointer. The failures in t4013 (and
elsewhere) look like this:

  kwset.c:102:23: runtime error: applying non-zero offset 107820859019600 to null pointer
  ...
  not ok 79 - git log -SF master # magic is (not used)

That line is not enlightening:

  ... = obstack_alloc(&kwset->obstack, sizeof (struct trie));

because obstack is implemented almost entirely in macros, and the actual
problem is five macros deep (I temporarily converted them to inline
functions to get better compiler errors, which was tedious but worked
reasonably well).

The actual problem is in these pointer-alignment macros:

  /* If B is the base of an object addressed by P, return the result of
     aligning P to the next multiple of A + 1.  B and P must be of type
     char *.  A + 1 must be a power of 2.  */

  #define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))

  /* Similar to _BPTR_ALIGN (B, P, A), except optimize the common case
     where pointers can be converted to integers, aligned as integers,
     and converted back again.  If PTR_INT_TYPE is narrower than a
     pointer (e.g., the AS/400), play it safe and compute the alignment
     relative to B.  Otherwise, use the faster strategy of computing the
     alignment relative to 0.  */

  #define __PTR_ALIGN(B, P, A)                                                \
    __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
                  P, A)

If we have a sufficiently-large integer pointer type, then we do the
computation using a NULL pointer constant. That turns __BPTR_ALIGN()
into something like:

  NULL + (P - NULL + A) & ~A

and UBSan is complaining about adding the full value of P to that
initial NULL. We can fix this by doing our math as an integer type, and
then casting the result back to a pointer. The problem case only happens
when we know that the integer type is large enough, so there should be
no issue with truncation.

Another option would be just simplify out all the 0's from
__BPTR_ALIGN() for the NULL-pointer case. That probably wouldn't work
for a platform where the NULL pointer isn't all-zeroes, but Git already
wouldn't work on such a platform (due to our use of memset to set
pointers in structs to NULL). But I tried here to keep as close to the
original as possible.

Signed-off-by: Jeff King <peff@peff.net>
---
 compat/obstack.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/compat/obstack.h b/compat/obstack.h
index 01e7c81840..f90a46d9b9 100644
--- a/compat/obstack.h
+++ b/compat/obstack.h
@@ -135,8 +135,10 @@ extern "C" {
    alignment relative to 0.  */
 
 #define __PTR_ALIGN(B, P, A)						    \
-  __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
-		P, A)
+  (sizeof (PTR_INT_TYPE) < sizeof(void *) ?                                 \
+   __BPTR_ALIGN((B), (P), (A)) :                                            \
+   (void *)__BPTR_ALIGN((PTR_INT_TYPE)(void *)0, (PTR_INT_TYPE)(P), (A))            \
+  )
 
 #include <string.h>
 
-- 
2.25.0.421.gb74d19af79


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25  5:37 ` [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning Jeff King
@ 2020-01-25 17:27   ` Junio C Hamano
  2020-01-25 19:55     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2020-01-25 17:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren

Jeff King <peff@peff.net> writes:

> The merge-recursive code uses stage number constants like this:
>
>   add = &ci->ren1->dst_entry->stages[2 ^ 1];
>   ...
>   add = &ci->ren2->dst_entry->stages[3 ^ 1];
>
> The xor has the effect of flipping the "1" bit, so that "2 ^ 1" becomes
> "3" and "3 ^ 1" becomes "2", which correspond to the "ours" and "theirs"
> stages respectively.
>
> Unfortunately, clang-10 and up issue a warning for this code:
>
>   merge-recursive.c:1759:40: error: result of '2 ^ 1' is 3; did you mean '1 << 1' (2)? [-Werror,-Wxor-used-as-pow]
>                   add = &ci->ren1->dst_entry->stages[2 ^ 1];
>                                                      ~~^~~
>                                                      1 << 1
>   merge-recursive.c:1759:40: note: replace expression with '0x2 ^ 1' to silence this warning
>
> We could silence it by using 0x2, as the compiler mentions. Or by just
> using the constants "2" and "3" directly. But after digging into it, I
> do think this bit-flip is telling us something. If we just wrote:
>
>   add = &ci->ren2->dst_entry->stages[2];
>
> for the second one, you might think that "ren2" and "2" correspond. But
> they don't. The logic is: ren2 is theirs, which is stage 3, but we
> are interested in the opposite side's stage, so flip it to 2.

So, the logical name for "^1" operator applied to 2 (ours) and 3
(theirs) is "the_other_side()"?  the_other_side(theirs) == ours.

I would have written (5 - side) instead of (side ^ 1) if I were
writing this, though.

> So let's keep the bit-flipping, but let's also put it behind a named
> function, which will make its purpose a bit clearer. This also has the
> side effect of suppressing the warning (and an optimizing compiler
> should be able to easily turn it into a constant as before).

OK.  Now I see you named it flip_stage(), which is even better than
"the-other-side" above.  Makes sense.

I still think ((2 + 3) - two_or_three_to_be_flipped) easier to
reason about than the bit flipping, as the implementation detail,
though.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25 17:27   ` Junio C Hamano
@ 2020-01-25 19:55     ` Jeff King
  2020-01-25 20:50       ` Elijah Newren
  2020-01-27 19:17       ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2020-01-25 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren

On Sat, Jan 25, 2020 at 09:27:36AM -0800, Junio C Hamano wrote:

> > So let's keep the bit-flipping, but let's also put it behind a named
> > function, which will make its purpose a bit clearer. This also has the
> > side effect of suppressing the warning (and an optimizing compiler
> > should be able to easily turn it into a constant as before).
> 
> OK.  Now I see you named it flip_stage(), which is even better than
> "the-other-side" above.  Makes sense.
> 
> I still think ((2 + 3) - two_or_three_to_be_flipped) easier to
> reason about than the bit flipping, as the implementation detail,
> though.

Yeah, the existing one relies on the coincidence that the two stages
differ by a single bit (in another universe, they could well be stages
"3" and "4").

I don't overly care on the implementation either way, since it's now
hidden in the helper. I mostly chose the bit-flip to match the existing
code, but I'd be happy to change it. Other people who actually work on
merge-recursive may have other opinions, though.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25 19:55     ` Jeff King
@ 2020-01-25 20:50       ` Elijah Newren
  2020-01-25 23:57         ` Jeff King
  2020-01-27 19:17       ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Elijah Newren @ 2020-01-25 20:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

On Sat, Jan 25, 2020 at 11:55 AM Jeff King <peff@peff.net> wrote:
>
> On Sat, Jan 25, 2020 at 09:27:36AM -0800, Junio C Hamano wrote:
>
> > > So let's keep the bit-flipping, but let's also put it behind a named
> > > function, which will make its purpose a bit clearer. This also has the
> > > side effect of suppressing the warning (and an optimizing compiler
> > > should be able to easily turn it into a constant as before).
> >
> > OK.  Now I see you named it flip_stage(), which is even better than
> > "the-other-side" above.  Makes sense.
> >
> > I still think ((2 + 3) - two_or_three_to_be_flipped) easier to
> > reason about than the bit flipping, as the implementation detail,
> > though.
>
> Yeah, the existing one relies on the coincidence that the two stages
> differ by a single bit (in another universe, they could well be stages
> "3" and "4").
>
> I don't overly care on the implementation either way, since it's now
> hidden in the helper. I mostly chose the bit-flip to match the existing
> code, but I'd be happy to change it. Other people who actually work on
> merge-recursive may have other opinions, though.

Interesting.  In merge-ort (my in-development attempt at a replacement
for merge-recursive), I'm currently storing the stages in an array
with indices 0-2 rather than the 1-3 used by merge-recursive.  This
removes the empty/unused array entry at the beginning, and also works
a bit better with the masks that traverse_trees() returns (as 1<<index
corresponds to the bit in the mask from the traverse_trees()
callback).  In that scheme, bitflip won't work, but the subtraction
idea still does.  So, I'd tend to agree with Junio, but I think the
helper you added here is probably the more important improvement.

However, all that said, I don't care all that much about what to do
with merge-recursive in this case, because it currently looks like not
much of the code is going to survive anyway.  merge-ort isn't even
alpha quality yet[1], but I seem to be moving towards totally
different data structures and copying/sharing less and less code from
merge-recursive with each change.

Elijah

[1] merge-ort still isn't functional yet other than in extremely
narrow circumstances, I'm still experimenting with the data
structures, and I've written several hundred lines of code and then
thrown it all away at least once -- and may do so again.  Whenever I
find a useful patch I can separate and submit upstream, I have been
doing so, but until the risk of another complete rewrite goes down,
there's no point in me sending my half-baked ideas in for review.
They need to be at least three-quarters baked first.  :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25 20:50       ` Elijah Newren
@ 2020-01-25 23:57         ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2020-01-25 23:57 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, Git Mailing List

On Sat, Jan 25, 2020 at 12:50:38PM -0800, Elijah Newren wrote:

> Interesting.  In merge-ort (my in-development attempt at a replacement
> for merge-recursive), I'm currently storing the stages in an array
> with indices 0-2 rather than the 1-3 used by merge-recursive.  This
> removes the empty/unused array entry at the beginning, and also works
> a bit better with the masks that traverse_trees() returns (as 1<<index
> corresponds to the bit in the mask from the traverse_trees()
> callback).  In that scheme, bitflip won't work, but the subtraction
> idea still does.  So, I'd tend to agree with Junio, but I think the
> helper you added here is probably the more important improvement.

OK, that's three people who think the subtraction is more obvious. I
agree it's not that big a deal now that it's hidden in the helper (and
the code may eventually go away anyway), but it's easy enough to fix it
while we're thinking about it.

Patch is below (as an incremental on top attributed to Junio, who
proposed it; but it would be fine to squash it in with a Helped-by,
too).

> [1] merge-ort still isn't functional yet other than in extremely
> narrow circumstances, I'm still experimenting with the data
> structures, and I've written several hundred lines of code and then
> thrown it all away at least once -- and may do so again.  Whenever I
> find a useful patch I can separate and submit upstream, I have been
> doing so, but until the risk of another complete rewrite goes down,
> there's no point in me sending my half-baked ideas in for review.
> They need to be at least three-quarters baked first.  :-)

Thank you, by the way, for all of the work you have done on
merge-recursive. I generally find it one of the scariest bits of the
code to look at, so I am happy to be able to punt it off to you. :)

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH] merge-recursive: use subtraction to flip stage

The flip_stage() helper uses a bit-flipping xor to switch between "2"
and "3". While clever, this relies on a property of those two numbers
that is mostly coincidence. Let's write it as a subtraction; that's more
clear and would extend to other numbers if somebody copies the logic.

Signed-off-by: Jeff King <peff@peff.net>
---
 merge-recursive.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index e6aedd3cab..aee1769a7a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1713,12 +1713,11 @@ static char *find_path_for_conflict(struct merge_options *opt,
 }
 
 /*
- * Toggle the stage number between "ours" and "theirs" (2 and 3) by flipping
- * the 1-bit.
+ * Toggle the stage number between "ours" and "theirs" (2 and 3).
  */
 static inline int flip_stage(int stage)
 {
-	return stage ^ 1;
+	return (2 + 3) - stage;
 }
 
 static int handle_rename_rename_1to2(struct merge_options *opt,
-- 
2.25.0.430.g8dfc7de6f7


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning
  2020-01-25 19:55     ` Jeff King
  2020-01-25 20:50       ` Elijah Newren
@ 2020-01-27 19:17       ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2020-01-27 19:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren

Jeff King <peff@peff.net> writes:

> I don't overly care on the implementation either way, since it's now
> hidden in the helper. I mostly chose the bit-flip to match the existing
> code, but I'd be happy to change it.

I don't, either, and as a step in this series, it makes perfect
sense to keep the implementation detail the same in the new helper.

Making it more readable would be a separate issue.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-25  5:38 ` [PATCH 2/4] avoid computing zero offsets from NULL pointer Jeff King
@ 2020-01-27 20:03   ` Junio C Hamano
  2020-01-27 21:19     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2020-01-27 20:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> switch to iterating with a numeric index (as we do in sequencer.c here).
> In other cases (like the cache_end one) the use of an end pointer is
> more natural; we can keep that by just explicitly checking for NULL when
> assigning the end pointer.
>
> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index 8509f9ea22..2f1fe48512 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
>  {
>  	const int blk = 1024;
>  	long trimmed = 0, recovered = 0;
> -	char *ap = a->ptr + a->size;
> -	char *bp = b->ptr + b->size;
> +	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
> +	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
>  	long smaller = (a->size < b->size) ? a->size : b->size;
>  
>  	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {

Isn't it a bug for a->ptr or b->ptr to be NULL here?  Even if we
manage to assign ap = a->ptr = NULL without complaints, how would
that memcmp work?

Is it that the corresponding .size would always be 0 if .ptr is NULL
that protects us?

A bit puzzled.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-27 20:03   ` Junio C Hamano
@ 2020-01-27 21:19     ` Jeff King
  2020-01-28 18:03       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2020-01-27 21:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Jan 27, 2020 at 12:03:55PM -0800, Junio C Hamano wrote:

> > diff --git a/xdiff-interface.c b/xdiff-interface.c
> > index 8509f9ea22..2f1fe48512 100644
> > --- a/xdiff-interface.c
> > +++ b/xdiff-interface.c
> > @@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
> >  {
> >  	const int blk = 1024;
> >  	long trimmed = 0, recovered = 0;
> > -	char *ap = a->ptr + a->size;
> > -	char *bp = b->ptr + b->size;
> > +	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
> > +	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
> >  	long smaller = (a->size < b->size) ? a->size : b->size;
> >  
> >  	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
> 
> Isn't it a bug for a->ptr or b->ptr to be NULL here?  Even if we
> manage to assign ap = a->ptr = NULL without complaints, how would
> that memcmp work?
> 
> Is it that the corresponding .size would always be 0 if .ptr is NULL
> that protects us?
> 
> A bit puzzled.

Yes, that's what's happening; all of the cases in this first patch are
dealing with "NULL + 0". Which isn't to say somebody couldn't pass in an
mmfile_t with NULL and a non-zero size, but obviously that would be a
bug. Before my patch that would be a segfault, but afterwards we'd
quietly treat it as if the size were zero.

If we want to be more defensive, we could do something like this:

  /* dual inline/macro magic to avoid evaluating ptr twice but knowing
   * enough about the type of *ptr to get the size. */
  #define SAFE_END_PTR(ptr, len) safe_end_ptr(ptr, len, sizeof(*ptr))
  static inline void *safe_end_ptr(void *ptr, size_t nr, size_t elem_size)
  {
	if (!ptr) {
		if (nr)
			BUG("non-zero size coupled with NULL pointer");
		return NULL;
	}
	return (char *)ptr + nr * elem_size;
  }

  ...
  char *ap = SAFE_END_PTR(a->ptr, a->size);

I'm not sure if it's worth it, though.

Yet another alternative is to consider it a bug to use an mmfile_t with
a NULL pointer, figure out where that's being set up, and fix it.

As an aside, I also wondered whether we could run into problems with
"memcmp(NULL, ..., 0)", which is also undefined behavior. But we don't
here because the first half of the while() condition wouldn't trigger.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-27 21:19     ` Jeff King
@ 2020-01-28 18:03       ` Junio C Hamano
  2020-01-29  2:31         ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2020-01-28 18:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> >  	const int blk = 1024;
>> >  	long trimmed = 0, recovered = 0;
>> > -	char *ap = a->ptr + a->size;
>> > -	char *bp = b->ptr + b->size;
>> > +	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
>> > +	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
>> >  	long smaller = (a->size < b->size) ? a->size : b->size;
>> >  
>> >  	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
>> 
>> Isn't it a bug for a->ptr or b->ptr to be NULL here?  Even if we
>> manage to assign ap = a->ptr = NULL without complaints, how would
>> that memcmp work?
>> 
>> Is it that the corresponding .size would always be 0 if .ptr is NULL
>> that protects us?
>
> If we want to be more defensive, we could do something like this:
>
>   /* dual inline/macro magic to avoid evaluating ptr twice but knowing
>    * enough about the type of *ptr to get the size. */
>   #define SAFE_END_PTR(ptr, len) safe_end_ptr(ptr, len, sizeof(*ptr))
>   static inline void *safe_end_ptr(void *ptr, size_t nr, size_t elem_size)
>   {
> 	if (!ptr) {
> 		if (nr)
> 			BUG("non-zero size coupled with NULL pointer");
> 		return NULL;
> 	}
> 	return (char *)ptr + nr * elem_size;
>   }
>
>   ...
>   char *ap = SAFE_END_PTR(a->ptr, a->size);
>
> I'm not sure if it's worth it, though.

As long as we make it clear to those who add new callers that
any mmfile_t with .ptr==NULL must come with .size==0, it is fine.

> Yet another alternative is to consider it a bug to use an mmfile_t with
> a NULL pointer, figure out where that's being set up, and fix it.

But that would still require us to make it clear to those who add
new callers that mmfile_t with .ptr==NULL is a bug, and the current
callers must be using that as it is convenient for them, I presume,
so I think a simple comment should probably be sufficient.

> As an aside, I also wondered whether we could run into problems with
> "memcmp(NULL, ..., 0)", which is also undefined behavior. But we don't
> here because the first half of the while() condition wouldn't trigger.

Yes, although the details slightly differ ;-)

What is problematic actually is "memcmp(NULL - 1024, ..., 1024)",
which is guarded with "1024 + trimmed <= smaller &&" that will never
be true as long as "mmfile_t with .ptr==NULL must have .size==0"
holds true, right?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-28 18:03       ` Junio C Hamano
@ 2020-01-29  2:31         ` Jeff King
  2020-01-29  5:16           ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2020-01-29  2:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jan 28, 2020 at 10:03:49AM -0800, Junio C Hamano wrote:

> > I'm not sure if it's worth it, though.
> 
> As long as we make it clear to those who add new callers that
> any mmfile_t with .ptr==NULL must come with .size==0, it is fine.

TBH, I'm not sure it _is_ fine. The concept that it's safe for a ptr/len
pair to use NULL/0 is true in a lot of places, but the mmfile struct
gets used in a lot of places, much of which is xdiff code we didn't
write. I have no idea if that assumption holds everywhere.

We'd be fixing this one spot, and that's enough to make the tests happy
with UBSan. But I don't know if it's something we ought to be
recommending.

> > Yet another alternative is to consider it a bug to use an mmfile_t with
> > a NULL pointer, figure out where that's being set up, and fix it.
> 
> But that would still require us to make it clear to those who add
> new callers that mmfile_t with .ptr==NULL is a bug, and the current
> callers must be using that as it is convenient for them, I presume,
> so I think a simple comment should probably be sufficient.

Yep, but it's not much different than the hundreds of other function
interfaces we have where sometimes you can pass NULL and sometimes not.

So anyway. What do we want to do here? The fix I have? Something more
elaborate and reusable? Or perhaps just switch it to:

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 3cd2ac2855..4d20069302 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
 {
 	const int blk = 1024;
 	long trimmed = 0, recovered = 0;
-	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
-	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
+	char *ap = a->size ? a->ptr + a->size : a->ptr;
+	char *bp = b->size ? b->ptr + b->size : b->ptr;
 	long smaller = (a->size < b->size) ? a->size : b->size;
 
 	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {

By checking "size" instead of "ptr", then we know that the addition is a
noop. And we'd continue to catch a NULL pointer mixed with a non-zero
length (as a segfault). And a non-NULL pointer with a zero length does
the right thing.

> > As an aside, I also wondered whether we could run into problems with
> > "memcmp(NULL, ..., 0)", which is also undefined behavior. But we don't
> > here because the first half of the while() condition wouldn't trigger.
> 
> Yes, although the details slightly differ ;-)
> 
> What is problematic actually is "memcmp(NULL - 1024, ..., 1024)",
> which is guarded with "1024 + trimmed <= smaller &&" that will never
> be true as long as "mmfile_t with .ptr==NULL must have .size==0"
> holds true, right?

Yes, because "smaller" would always be "0". And that part of the code
always uses a 1024-size blk, so it would never have passed "0" to memcmp
anyway.

-Peff

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-29  2:31         ` Jeff King
@ 2020-01-29  5:16           ` Junio C Hamano
  2020-01-29  5:46             ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2020-01-29  5:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Yep, but it's not much different than the hundreds of other function
> interfaces we have where sometimes you can pass NULL and sometimes not.
>
> So anyway. What do we want to do here? The fix I have? Something more
> elaborate and reusable? Or perhaps just switch it to:

My preference was to take the patch as-is, as it was clear enough,
before seeing this one ...

> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index 3cd2ac2855..4d20069302 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
>  {
>  	const int blk = 1024;
>  	long trimmed = 0, recovered = 0;
> -	char *ap = a->ptr ? a->ptr + a->size : a->ptr;
> -	char *bp = b->ptr ? b->ptr + b->size : b->ptr;
> +	char *ap = a->size ? a->ptr + a->size : a->ptr;
> +	char *bp = b->size ? b->ptr + b->size : b->ptr;
>  	long smaller = (a->size < b->size) ? a->size : b->size;
>  
>  	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
>
> By checking "size" instead of "ptr", then we know that the addition is a
> noop. And we'd continue to catch a NULL pointer mixed with a non-zero
> length (as a segfault). And a non-NULL pointer with a zero length does
> the right thing.

which makes quite a lot of sense.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] avoid computing zero offsets from NULL pointer
  2020-01-29  5:16           ` Junio C Hamano
@ 2020-01-29  5:46             ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2020-01-29  5:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jan 28, 2020 at 09:16:39PM -0800, Junio C Hamano wrote:

> My preference was to take the patch as-is, as it was clear enough,
> before seeing this one ...
> [..]
> which makes quite a lot of sense.

Yeah, about an hour after I wrote it, I thought "gee, that's clearly
better than the other way; I should have done that from the start".

So here's a re-roll of the patch using that technique, with a note in
the commit message.

-- >8 --
Subject: [PATCH] avoid computing zero offsets from NULL pointer

The Undefined Behavior Sanitizer in clang-11 seems to have learned a new
trick: it complains about computing offsets from a NULL pointer, even if
that offset is 0. This causes numerous test failures. For example, from
t1090:

  unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer
  ...
  not ok 6 - in partial clone, sparse checkout only fetches needed blobs

The code in question looks like this:

  struct cache_entry **cache_end = cache + nr;
  ...
  while (cache != cache_end)

and we sometimes pass in a NULL and 0 for "cache" and "nr". This is
conceptually fine, as "cache_end" would be equal to "cache" in this
case, and we wouldn't enter the loop at all. But computing even a zero
offset violates the C standard. And given the fact that UBSan is
noticing this behavior, this might be a potential problem spot if the
compiler starts making unexpected assumptions based on undefined
behavior.

So let's just avoid it, which is pretty easy. In some cases we can just
switch to iterating with a numeric index (as we do in sequencer.c here).
In other cases (like the cache_end one) the use of an end pointer is
more natural; we can keep that by just explicitly checking for the
NULL/0 case when assigning the end pointer.

Note that there are two ways you can write this latter case, checking
for the pointer:

  cache_end = cache ? cache + nr : cache;

or the size:

  cache_end = nr ? cache + nr : cache;

For the case of a NULL/0 ptr/len combo, they are equivalent. But writing
it the second way (as this patch does) has the property that if somebody
were to incorrectly pass a NULL pointer with a non-zero length, we'd
continue to notice and segfault, rather than silently pretending the
length was zero.

Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c       | 6 +++---
 unpack-trees.c    | 2 +-
 xdiff-interface.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4d31ec3296 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -588,7 +588,7 @@ static int do_recursive_merge(struct repository *r,
 	struct merge_options o;
 	struct tree *next_tree, *base_tree, *head_tree;
 	int clean;
-	char **xopt;
+	int i;
 	struct lock_file index_lock = LOCK_INIT;
 
 	if (repo_hold_locked_index(r, &index_lock, LOCK_REPORT_ON_ERROR) < 0)
@@ -608,8 +608,8 @@ static int do_recursive_merge(struct repository *r,
 	next_tree = next ? get_commit_tree(next) : empty_tree(r);
 	base_tree = base ? get_commit_tree(base) : empty_tree(r);
 
-	for (xopt = opts->xopts; xopt != opts->xopts + opts->xopts_nr; xopt++)
-		parse_merge_opt(&o, *xopt);
+	for (i = 0; i < opts->xopts_nr; i++)
+		parse_merge_opt(&o, opts->xopts[i]);
 
 	clean = merge_trees(&o,
 			    head_tree,
diff --git a/unpack-trees.c b/unpack-trees.c
index d5f4d997da..a027504b56 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1352,7 +1352,7 @@ static int clear_ce_flags_1(struct index_state *istate,
 			    enum pattern_match_result default_match,
 			    int progress_nr)
 {
-	struct cache_entry **cache_end = cache + nr;
+	struct cache_entry **cache_end = nr ? cache + nr : cache;
 
 	/*
 	 * Process all entries that have the given prefix and meet
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 8509f9ea22..99661d43c6 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -84,8 +84,8 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b)
 {
 	const int blk = 1024;
 	long trimmed = 0, recovered = 0;
-	char *ap = a->ptr + a->size;
-	char *bp = b->ptr + b->size;
+	char *ap = a->size ? a->ptr + a->size : a->ptr;
+	char *bp = b->size ? b->ptr + b->size : b->ptr;
 	long smaller = (a->size < b->size) ? a->size : b->size;
 
 	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
-- 
2.25.0.515.g69a699b7aa


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-01-29  5:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-25  5:35 [PATCH 0/4] more clang/sanitizer fixes Jeff King
2020-01-25  5:37 ` [PATCH 1/4] merge-recursive: silence -Wxor-used-as-pow warning Jeff King
2020-01-25 17:27   ` Junio C Hamano
2020-01-25 19:55     ` Jeff King
2020-01-25 20:50       ` Elijah Newren
2020-01-25 23:57         ` Jeff King
2020-01-27 19:17       ` Junio C Hamano
2020-01-25  5:38 ` [PATCH 2/4] avoid computing zero offsets from NULL pointer Jeff King
2020-01-27 20:03   ` Junio C Hamano
2020-01-27 21:19     ` Jeff King
2020-01-28 18:03       ` Junio C Hamano
2020-01-29  2:31         ` Jeff King
2020-01-29  5:16           ` Junio C Hamano
2020-01-29  5:46             ` Jeff King
2020-01-25  5:39 ` [PATCH 3/4] xdiff: avoid computing non-zero offset " Jeff King
2020-01-25  5:41 ` [PATCH 4/4] obstack: avoid computing offsets " Jeff King
2020-01-25  5:44   ` [PATCH v2 " Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.